
Optimization Methods with Signal Denoising Applications

Paul Tseng
Mathematics, University of Washington, Seattle

Taiwan Normal University
March 2, 2006

(Joint works with Sylvain Sardy (EPFL), Andrew Bruce (MathSoft), and Sangwoon Yun (UW))

Abstract: This is a talk given at the optimization seminar, Jan 2006, and at Taiwan Normal University in March 2006.


Talk Outline

• Basic Problem Model
  – Primal-Dual Interior Point Method
  – Block Coordinate Minimization Method
  – Applications

• General Problem Model
  – Block Coordinate Gradient Descent Method
  – Convergence
  – Numerical Testing (ongoing)

• Conclusions & Future Work


Basic Problem Model

[Figure: a wavelet basis function and its translates b_1(t), b_2(t), ..., b_{m+1}(t), ...; the observed noisy signal b(t) sampled at t_1, ..., t_m; and the denoised signal.]

Each column B_j stacks a basis function at the sample points, and b stacks the observations:

  B_1 = [b_1(t_1), ..., b_1(t_m)]^T,   B_2 = [b_2(t_1), ..., b_2(t_m)]^T, ...,   b = [b(t_1), ..., b(t_m)]^T

  B_1 w_1 + · · · + B_n w_n = [B_1 · · · B_n] [w_1, ..., w_n]^T = Bw   (n ≥ m)


Find w so that Bw − b ≈ 0 and w has “few” nonzeros.

Formulate this as an unconstrained convex optimization problem:

P1:   min_{w ∈ R^n}  ‖Bw − b‖_2^2 + c‖w‖_1   (c > 0)   "Basis Pursuit" (Chen, Donoho, Saunders)

Difficulty: Typically m ≥ 1000, n ≥ 8000, and B is dense; ‖·‖_1 is nonsmooth.
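As a concrete illustration of the quantities in P1, the objective is cheap to evaluate in NumPy. The sketch below uses a random matrix and data as stand-ins for a wavelet dictionary and a noisy signal; the names and sizes are illustrative, not from the talk.

```python
import numpy as np

def p1_objective(B, b, w, c):
    """Basis-pursuit objective: ||Bw - b||_2^2 + c*||w||_1."""
    r = B @ w - b
    return r @ r + c * np.abs(w).sum()

# Small overcomplete system (n >= m), mimicking the talk's setting.
rng = np.random.default_rng(0)
m, n = 4, 8
B = rng.standard_normal((m, n))
b = rng.standard_normal(m)
print(p1_objective(B, b, np.zeros(n), c=1.0))  # with w = 0 this is just ||b||^2
```

The penalty weight c trades data fidelity against sparsity of w: a larger c only increases the objective at any fixed w, pushing the minimizer toward fewer nonzeros.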


Primal-Dual Interior Point Method for P1

Idea: Reformulate P1 as a convex QP, and apply primal-dual IP method.

QP Reformulation of P1:

P1:   min_{w ∈ R^n}  ‖Bw − b‖_2^2 + c‖w‖_1

Substitute w = w+ − w− with w+ ≥ 0, w− ≥ 0, so that ‖w‖_1 = e^T(w+ + w−):

  min_{w+ ≥ 0, w− ≥ 0}  ‖Bw+ − Bw− − b‖_2^2 + c e^T(w+ + w−)

Writing A = [B  −B], x = [w+; w−], and introducing y via Ax + y = b:

  min  ‖y‖_2^2 + c e^T x   s.t.  x ≥ 0,  Ax + y = b
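The splitting can be checked numerically. In this sketch (illustrative data), w+ = max(w, 0) and w− = max(−w, 0) give the minimal nonnegative split, the ℓ1 norm becomes linear in x = (w+, w−), and A x reproduces Bw.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 5, 12
B = rng.standard_normal((m, n))
w = rng.standard_normal(n)

# Minimal nonnegative split: w = w_plus - w_minus, both parts >= 0.
w_plus, w_minus = np.maximum(w, 0.0), np.maximum(-w, 0.0)
e = np.ones(n)

A = np.hstack([B, -B])                  # A = [B  -B], size m x 2n
x = np.concatenate([w_plus, w_minus])   # x = (w_plus, w_minus) >= 0

print(np.isclose(e @ (w_plus + w_minus), np.abs(w).sum()))  # l1 norm is linear in x
print(np.allclose(A @ x, B @ w))                            # Ax reproduces Bw
```

Both checks print True: the nonsmooth ℓ1 term has been traded for nonnegativity constraints, at the cost of doubling the variable count.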


QP Reformulation of P1:

min  ‖y‖_2^2 + c e^T x   s.t.  x ≥ 0,  Ax + y = b

KKT Optimality Condition for QP:

  Ax + y = b,   x ≥ 0
  A^T y + z = ce,   z ≥ 0
  Xz = 0        (X = diag[x_1, ..., x_{2n}])

Perturbed KKT Optimality Condition:

  Ax + y = b,   x > 0
  A^T y + z = ce,   z > 0
  Xz = µe   (µ > 0)

Primal-Dual IP method: Apply a damped Newton method to solve inexactly the perturbed KKT equations while maintaining x > 0, z > 0. Decrease µ after each iteration. Fiacco-McCormick '68, Karmarkar '84, ...


Method description:

Given µ > 0, x > 0, y, z > 0, solve the Newton equations

  A ∆x + ∆y = b − Ax − y,
  A^T ∆y + ∆z = ce − A^T y − z,
  Z ∆x + X ∆z = µe − Xz.

Update

  x_new = x + .99 β_p ∆x,
  y_new = y + .99 β_d ∆y,
  z_new = z + .99 β_d ∆z,
  µ_new = (1 − min{.99, β_p, β_d}) µ,

where β_p = min_{i: ∆x_i < 0} { x_i / (−∆x_i) } and β_d = min_{i: ∆z_i < 0} { z_i / (−∆z_i) }.
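One such damped Newton step can be sketched in NumPy. This is illustrative only: the Newton system is assembled and solved densely here (the talk instead reduces it and uses CG), and the step lengths β_p, β_d are capped at 1, a common implementation choice not stated on the slide.

```python
import numpy as np

def frac_to_boundary(v, dv):
    """Largest step beta keeping v + beta*dv >= 0 along decreasing components."""
    neg = dv < 0
    return np.min(v[neg] / -dv[neg]) if np.any(neg) else np.inf

def ip_newton_step(A, b, c, x, y, z, mu):
    """One damped Newton step on the perturbed KKT system of
    min ||y||^2 + c e^T x  s.t.  Ax + y = b, x >= 0  (sketch)."""
    m, N = A.shape  # N = 2n in the talk's notation
    e = np.ones(N)
    # Newton equations: A dx + dy = rp, A^T dy + dz = rd, Z dx + X dz = rc.
    M = np.block([
        [A,                np.eye(m),        np.zeros((m, N))],
        [np.zeros((N, N)), A.T,              np.eye(N)],
        [np.diag(z),       np.zeros((N, m)), np.diag(x)],
    ])
    rhs = np.concatenate([b - A @ x - y, c * e - A.T @ y - z, mu * e - x * z])
    d = np.linalg.solve(M, rhs)
    dx, dy, dz = d[:N], d[N:N + m], d[N + m:]
    beta_p = min(1.0, frac_to_boundary(x, dx))
    beta_d = min(1.0, frac_to_boundary(z, dz))
    x_new = x + 0.99 * beta_p * dx
    y_new = y + 0.99 * beta_d * dy
    z_new = z + 0.99 * beta_d * dz
    mu_new = (1.0 - min(0.99, beta_p, beta_d)) * mu
    return x_new, y_new, z_new, mu_new

# Tiny instance with A = [B  -B], as in the QP reformulation (data illustrative).
rng = np.random.default_rng(3)
m, n = 3, 4
B = rng.standard_normal((m, n))
A = np.hstack([B, -B])
b = rng.standard_normal(m)
x, y, z = np.ones(2 * n), np.zeros(m), np.ones(2 * n)
x, y, z, mu = ip_newton_step(A, b, c=1.0, x=x, y=y, z=z, mu=1.0)
```

The 0.99 factor keeps the iterates strictly interior (x > 0, z > 0), and µ shrinks after every step, driving the iterates toward the KKT complementarity condition Xz = 0.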


Implementation & Initialization:

• Newton Eqs. reduce to

  (I + A Z^{-1} X A^T) ∆y = r.

  Solve by the Conjugate Gradient (CG) method. Multiplication by A (m × 2n) and by A^T requires O(m log m) and O(m (log m)^2) operations, respectively.

• Initialization as in Chen-Donoho-Saunders '96.

• Theoretical convergence? CG preconditioning?


Block Coord. Minimization Method for P1

Method description:

Given w, choose I ⊆ {1, ..., n} with |I| = m such that {B_i}_{i∈I} is orthogonal.

Update

  w_new = arg min_{u: u_i = w_i ∀ i∉I}  ‖Bu − b‖_2^2 + c‖u‖_1   ← closed-form soln

• Choose I to maximize  min_{v ∈ ∂_{u_I}(‖Bu−b‖_2^2 + c‖u‖_1)|_{u=w}} ‖v‖_2. Requires O(m log m) opers. by the algorithm of Coifman & Wickerhauser.

• Theoretical convergence: the w-sequence is bounded & each cluster point solves P1.
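When the chosen columns are orthonormal, the subproblem separates and the closed-form solution is coordinate-wise soft-thresholding. The sketch below makes that assumption explicit (orthonormal B[:, I]); the data and block choice are illustrative, not the talk's transform.

```python
import numpy as np

def soft(v, t):
    """Soft-threshold: argmin_u (u - v)^2 + 2*t*|u|, applied coordinate-wise."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def bcm_update(B, b, w, I, c):
    """Minimize ||Bu - b||_2^2 + c*||u||_1 over u_I with u_i = w_i for i not in I,
    assuming B[:, I] has orthonormal columns (then the solution is closed-form)."""
    mask = np.zeros(w.size, dtype=bool)
    mask[I] = True
    r = B[:, ~mask] @ w[~mask] - b      # residual from the frozen coordinates
    w_new = w.copy()
    w_new[I] = soft(-B[:, I].T @ r, c / 2.0)
    return w_new

# Demo: make the first m columns orthonormal via a QR factorization.
rng = np.random.default_rng(4)
m, n = 6, 9
Q, _ = np.linalg.qr(rng.standard_normal((m, m)))
B = np.hstack([Q, rng.standard_normal((m, n - m))])
b = rng.standard_normal(m)
w = rng.standard_normal(n)
w_new = bcm_update(B, b, w, I=np.arange(m), c=0.5)
```

Since orthonormality makes ‖B_I u_I + r‖_2^2 = ‖u_I‖_2^2 + 2 u_I^T B_I^T r + const, each coordinate is a scalar problem u^2 + 2au + c|u|, solved exactly by the threshold; the full P1 objective can never increase at such a step.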


Convergence of BCM method depends crucially on

• differentiability of ‖·‖_2^2

• separability of ‖·‖_1

• convexity ⇒ global minimum


Application 1: Electronic surveillance

[Figure: spectrograms of the observed and denoised signals (bp.cptable.spectrogram, ew.cptable.spectrogram); horizontal axis 50–65 microseconds, vertical axis 0–60 KHz, intensity scale −40 to 20.]

m = 2^11 = 2048, c = 4, local cosine transform, all but 4 levels.


Method efficiency :

[Figure: objective value vs. CPU time (log scale) for the IP and BCM (BCR) methods, real and imaginary parts.]

Comparing CPU times of IP and BCM methods (S-Plus, Sun Ultra 1).


ML Estimation

P2:   min_w  −ℓ(Bw; b) + c Σ_{i∈J} |w_i|   (c > 0)

ℓ is the log likelihood; {B_i}_{i∉J} are lin. indep. "coarse-scale wavelets".

• −ℓ(y; b) = (1/2) ‖y − b‖_2^2   (Gaussian noise)

• −ℓ(y; b) = Σ_{i=1}^m (y_i − b_i ln y_i)  (y_i ≥ 0)   (Poisson noise)

Solve P2 by adapting the IP method. BCM?
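The two noise models above can be sketched directly (constants independent of y are dropped, data illustrative). Both fidelity terms are minimized at y = b, which is what P2 trades against the ℓ1 penalty.

```python
import numpy as np

def neg_loglik_gaussian(y, b):
    """-l(y; b) = (1/2)*||y - b||_2^2 (Gaussian noise)."""
    return 0.5 * np.sum((y - b) ** 2)

def neg_loglik_poisson(y, b):
    """-l(y; b) = sum_i (y_i - b_i * ln y_i), valid for y > 0 (Poisson noise)."""
    return np.sum(y - b * np.log(y))

b = np.array([2.0, 5.0, 1.0])
print(neg_loglik_gaussian(b, b))   # 0.0: Gaussian fit is exact at y = b
print(neg_loglik_poisson(b, b))    # Poisson term is also minimized at y = b
```

Unlike the Gaussian case, the Poisson negative log-likelihood is not quadratic in y, which is why P2 needs the IP method adapted rather than reused verbatim.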


Application 2: Γ ray astronomy

[Figure: four panels — Measurements, Anscombe-based estimate, TIPSH estimate, l1-penalized likelihood estimate — each on a 360 × 720 grid, intensity scale 0 to 1.5.]

m = 720 · 360, c chosen by CV, Symmlets of order 8 (levels 3–8). Spatially inhomogeneous Poisson noise.


But the IP method is slow (many CG steps).

Adapt BCM method?


General Problem Model

P3:   min_w  F_c(w) := f(w) + c P(w)   (c ≥ 0)

f : R^N → R is smooth.

P : R^N → (−∞, ∞] is proper, convex, lsc, and P(w) = Σ_{j=1}^n P_j(w_j)   (w = (w_1, ..., w_n)).

• P(w) = ‖w‖_1

• P(w) = 0 if l ≤ w ≤ u, ∞ else


Block Coord. Gradient Descent Method for P3

Idea: Do BCM on a quadratic approx. of f .

For w ∈ dom P, I ⊆ {1, ..., n}, and H ≻ 0_n, let d_H(w; I) and q_H(w; I) be the optimal solution and objective value of the direction subproblem

  min_{d: d_i = 0 ∀ i∉I}  { g^T d + (1/2) d^T H d + c P(w + d) − c P(w) }

with g = ∇f(w).

Facts:

• d_H(w; {1, ..., n}) = 0  ⇔  F_c′(w; d) ≥ 0 ∀ d ∈ R^N.   (stationarity)

• H diagonal  ⇒  d_H(w; I) = Σ_{i∈I} d_H(w; i),  q_H(w; I) = Σ_{i∈I} q_H(w; i).   (separability)


Method description:

Given w ∈ dom P, choose I ⊆ {1, ..., n} and H ≻ 0_n. Let d = d_H(w; I).

Update  w_new = w + α d   (α > 0)

• α = largest element of {1, β, β^2, ...} satisfying F_c(w + αd) − F_c(w) ≤ σ α q_H(w; I)  (0 < β < 1, 0 < σ < 1).   (Armijo)

• I = {1}, {2}, ..., {n}, {1}, {2}, ...   (Gauss-Seidel)

• ‖d_D(w; I)‖_∞ ≥ υ ‖d_D(w; {1, ..., n})‖_∞  (0 < υ ≤ 1, D ≻ 0_n is diagonal, e.g., D = I or D = diag(H)).   (Gauss-Southwell-d)

• q_D(w; I) ≤ υ q_D(w; {1, ..., n}).   (Gauss-Southwell-q)
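For P(w) = ‖w‖_1 and diagonal H, the direction subproblem separates into scalar problems solved by soft-thresholding, so a full BCGD iteration with the Armijo rule is short to sketch. The code below is illustrative (random least-squares f, not the authors' implementation).

```python
import numpy as np

def soft(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def bcgd_step(f, grad_f, w, I, h, c, beta=0.5, sigma=0.1):
    """One BCGD step for F_c(w) = f(w) + c*||w||_1 with diagonal H = diag(h):
    closed-form direction d_H(w; I), then Armijo backtracking on alpha."""
    g = grad_f(w)
    d = np.zeros_like(w)
    # Per coordinate: min_d g*d + (h/2)*d^2 + c*|w + d|  =>  soft-threshold.
    d[I] = soft(w[I] - g[I] / h[I], c / h[I]) - w[I]
    # q_H(w; I): optimal value of the direction subproblem (q <= 0).
    q = g @ d + 0.5 * np.sum(h * d * d) + c * (np.abs(w + d).sum() - np.abs(w).sum())
    Fc = lambda v: f(v) + c * np.abs(v).sum()
    alpha = 1.0
    while Fc(w + alpha * d) - Fc(w) > sigma * alpha * q:
        alpha *= beta
    return w + alpha * d, q

# Demo with a least-squares f, so the gradient is explicit (data illustrative).
rng = np.random.default_rng(5)
m, n = 6, 10
B = rng.standard_normal((m, n))
b = rng.standard_normal(m)
f = lambda v: np.sum((B @ v - b) ** 2)
grad_f = lambda v: 2.0 * B.T @ (B @ v - b)
h = 2.0 * np.sum(B * B, axis=0)   # diagonal of the Hessian 2*B^T*B
w = np.zeros(n)
w_new, q = bcgd_step(f, grad_f, w, I=np.arange(n), h=h, c=1.0)
```

Since q ≤ 0 with q = 0 exactly at stationarity, the Armijo condition forces F_c to be nonincreasing; at a stationary point d = 0 and the step is a no-op.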


Convergence Results: (a) If

• 0 < λ ≤ λ_i(D), λ_i(H) ≤ λ̄ ∀ i (eigenvalues of D and H uniformly bounded above and away from zero),

• α is chosen by the Armijo rule,

• I is chosen by Gauss-Seidel, Gauss-Southwell-d, or Gauss-Southwell-q,

then every cluster point of the w-sequence generated by the BCGD method is a stationary point of F_c.

(b) If, in addition, P and f satisfy any of the following assumptions, then the w-sequence converges at an R-linear rate (excepting Gauss-Southwell-d).

C1 f is strongly convex, ∇f is Lipschitz cont. on domP .

C2 f is (nonconvex) quadratic. P is polyhedral.

C3 f(w) = g(Ew) + q^T w, where E ∈ R^{m×N}, q ∈ R^N, g is strongly convex, ∇g is Lipschitz cont. on R^m. P is polyhedral.


C4 f(w) = max_{y∈Y} {(Ew)^T y − g(y)} + q^T w, where Y ⊆ R^m is polyhedral, E ∈ R^{m×N}, q ∈ R^N, g is strongly convex, ∇g is Lipschitz cont. on R^m. P is polyhedral.

Notes:

• BCGD has stronger global convergence property (and cheaper iteration)than BCM.

• Proof of (b) uses a local error bound on dist (w, {stat. pts. of Fc}).


Numerical Testing (ongoing):

• Implement BCGD method in Matlab.

• Numerical tests with f from the Moré-Garbow-Hillstrom set and the CUTEr set (Gould, Orban, Toint '05), P(w) = ‖w‖_1, and different c (e.g., c = .1, 1, 10).

• Comparison with MINOS 5.5.1 (Murtagh, Saunders '05), a Fortran implementation of an active-set method, applied to a reformulation of P3 with P(w) = ‖w‖_1 as

  min_{w+ ≥ 0, w− ≥ 0}  f(w+ − w−) + c e^T (w+ + w−).

• Preliminary results are “promising”.


f(w)   n     Description
BAL    1000  Brown almost-linear func, nonconvex, dense Hessian.
BT     1000  Broyden tridiagonal func, nonconvex, sparse Hessian.
DBV    1000  Discrete boundary value func, nonconvex, sparse Hessian.
EPS    1000  Extended Powell singular func, convex, 4-block diag. Hessian.
ER     1000  Extended Rosenbrock func, nonconvex, 2-block diag. Hessian.
QD1    1000  f(w) = (Σ_{i=1}^n w_i − 1)^2, convex, quad., rank-1 Hessian.
QD2    1000  f(w) = Σ_{i=1}^n (w_i − (2/(n+1)) Σ_{j=1}^n w_j − 1)^2 + ((2/(n+1)) Σ_{j=1}^n w_j + 1)^2, strongly convex, quad., dense Hessian.
VD     1000  f(w) = Σ_{i=1}^n (w_i − 1)^2 + (Σ_{j=1}^n j(w_j − 1))^2 + (Σ_{j=1}^n j(w_j − 1))^4, strongly convex, dense ill-conditioned Hessian.

Table 1: Least squares problems from Moré, Garbow, Hillstrom, 1981.


              MINOS               BCGD-G-Southwell-d   BCGD-G-Southwell-q
f(w)   c      #nz/objec/cpu       #nz/objec/cpu        #nz/objec/cpu
BAL    1      1000/1000/43.9      1000/1000/.1         1000/1000/.1
       10     1000/9999.9/43.9    1000/9999.9/.1       1000/9999.9/.1
       100    1000/99997.5/44.3   1000/99997.5/.1      1000/99997.5/.2
BT     .1     1000/71.725/134.4   999/71.394/4.5       999/71.394/5.0
       1      999/672.41/95.3     21/672.70/292.6      995/991.06/1.3(?)
       10     0/1000/77.7         0/1000/.01           0/1000/.01
DBV    .1     0/0/52.7            0/4.5E-9/.1          0/4.5E-9/.04
       1      0/0/52.9            0/4.5E-9/.1          0/4.5E-9/.04
       10     0/0/53.0            0/4.5E-9/.01         0/4.5E-9/.01
EPS    1      1000/351.14/58.5    500/351.14/.3        500/351.14/.3
       10     243/1250/45.7       250/1250/.05         250/1250/.05
       100    0/1250/50.7         0/1250/.01           0/1250/.02
ER     1      1000/436.25/72.0    1000/436.25/.5       1000/436.25/.4
       10     0/500/51.5          0/500/.1             0/500/.01
       100    0/500/52.4          0/500/.03            0/500/.01
QD1    .1     1000/.0975/29.9     1/.0975/.01          1/.0975/.02
       1      1000/.75/37.8       1/.75/.01            1/.75/.01
       10     0/1/38.6            0/1/.01              0/1/.01
QD2    .1     1000/98.5/74.2      0/98.5/.01           0/98.5/.03
       1      1000/751/75.8       0/751/.01            0/751/.02
       10     0/1001/53.1         0/1001/.01           0/1001/.01
VD     1      1000/937.59/43.9    1000/937.66/856.3    1000/937.66/869.0
       10     413/6726.80/57.1    1000/6746.74/235.7   999/6746.74/246.9
       100    136/55043/57.8      1000/55078/12.6      1000/55078/13.3

Table 2: Performance of MINOS and BCGD-Gauss-Southwell-d/q, with w_init = (1, 1, ..., 1).


Conclusions & Future Work

1. For ML estimation, the ℓ1-penalty imparts parsimony in the coefficients and avoids oversmoothing the signals.

2. The resulting estimation problem can be solved effectively by the IP method or the BCM method, exploiting the problem structure, including the nondifferentiability of the ℓ1-norm. Which to use depends on the problem.

3. Problem reformulation may be needed.

4. For the general problem model, we propose the BCGD method. Numerical testing is ongoing.

5. Applications to denoising, regression, SVM?