Newton’s Method applied to a scalar function


Page 1:

Newton’s Method applied to a scalar function

Newton’s method for minimizing f(x):

Twice differentiable function f(x), initial solution x0. Generate a sequence of solutions x1, x2, … and stop if the sequence converges to a solution with ∇f(x) = 0.

1. Solve -∇f(xk) ≈ ∇²f(xk) Δx for Δx.

2. Let xk+1 = xk + Δx.

3. Let k = k+1.
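For concreteness, a minimal MATLAB sketch of these three steps for a one-variable f; the handles fp and fpp (first and second derivatives), and the inputs x0, tol, and maxiter, are illustrative names, not part of the slides.

function x = newton_min(fp, fpp, x0, tol, maxiter)
% Minimal Newton's method sketch for minimizing a scalar function f(x).
% fp and fpp are handles for f'(x) and f''(x); x0 is the initial solution.
x = x0;
for k = 1:maxiter
  dx = -fp(x)/fpp(x);        % step 1: solve -f'(xk) = f''(xk)*dx
  x = x + dx;                % step 2: xk+1 = xk + dx
  if abs(fp(x)) < tol        % stop when the derivative is ~0
    return;
  end
end                          % step 3: k = k+1 via the loop

For example, newton_min(@(x) 2*(x-2), @(x) 2, 10, 1e-8, 50) reaches the minimizer x = 2 of f(x) = (x-2)² in a single step, since f is quadratic.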

Page 2:

Newton’s Method applied to LS

Not directly applicable to most nonlinear regression and inverse problems (unequal numbers of model parameters and data points; no exact solution to G(m)=d). Instead we will use Newton's method to minimize a nonlinear LS problem, e.g., fit a vector of n parameters to a data vector d.

f(m) = ∑ [(G(m)i - di)/σi]²   (sum over i = 1, …, m)

Let fi(m) = (G(m)i - di)/σi, i = 1, 2, …, m, and F(m) = [f1(m) … fm(m)]T,

so that f(m) = ∑ fi(m)² and ∇f(m) = ∑ ∇[fi(m)²]   (sums over i = 1, …, m).
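As a concrete reading of these definitions, a minimal MATLAB sketch that evaluates F(m) and f(m); Gfun (a handle returning the predicted data G(m)) and sigma (the vector of data standard deviations σi) are assumed, illustrative names not from the slides.

% Weighted residual vector F(m) and misfit f(m) for nonlinear LS.
% Gfun is a hypothetical forward-model handle returning predicted data G(m).
F = (Gfun(m) - d) ./ sigma;   % fi(m) = (G(m)i - di)/sigma_i, i = 1..m
f = sum(F.^2);                % f(m) = sum of squared weighted residuals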

Page 3:

NM: Solve -∇f(mk) ≈ ∇²f(mk) Δm

LHS: -∇f(mk), where ∇f(mk)j = ∑ 2 fi(mk) ∇fi(mk)j, so -∇f(mk) = -2 J(mk)T F(mk)

RHS: ∇²f(mk) Δm = [2 J(mk)T J(mk) + Q(mk)] Δm, where

Q(m) = 2 ∑ fi(m) ∇²fi(m)

Hence -2 J(mk)T F(mk) = H(mk) Δm, and

Δm = -H(mk)-1 2 J(mk)T F(mk) = -H(mk)-1 ∇f(mk)   (eq. 9.19)

H(m) = 2 J(m)T J(m) + Q(m)

fi(m) = (G(m)i - di)/σi, i = 1, 2, …, m,

F(m)=[f1(m) … fm(m)]T

Page 4:

Gauss-Newton (GN) method

∇²f(mk) Δm = H(mk) Δm = [2 J(mk)T J(mk) + Q(mk)] Δm

GN ignores Q(m) = 2 ∑ fi(m) ∇²fi(m):

∇²f(m) ≈ 2 J(m)T J(m), assuming the fi(m) will be reasonably small as we approach m*. That is,

solve -∇f(mk) ≈ 2 J(mk)T J(mk) Δm, with ∇f(m)j = ∑ 2 fi(m) ∇fi(m)j = 2 [J(m)T F(m)]j, i.e.

J(mk)T J(mk) Δm = -J(mk)T F(mk)

fi(m) = (G(m)i - di)/σi, i = 1, 2, …, m,

F(m)=[f1(m) … fm(m)]T
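A minimal MATLAB sketch of one possible Gauss-Newton loop under these assumptions; Gfun, Jfun (Jacobian of F), sigma, m0, tol, and maxiter are illustrative names for user-supplied inputs, not from the slides.

% Gauss-Newton sketch for min f(m) = sum(((G(m)-d)./sigma).^2).
m = m0;
for k = 1:maxiter
  F = (Gfun(m) - d) ./ sigma;    % weighted residual vector F(mk)
  J = Jfun(m);                   % Jacobian of F at mk
  dm = -(J'*J) \ (J'*F);         % GN step: J'J*dm = -J'*F
  m = m + dm;
  if norm(dm) < tol, break; end  % simple convergence test
end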

Page 5:

Newton’s Method applied to LS

Levenberg-Marquardt (LM) method uses

[J(mk)T J(mk) + λI] Δm = -J(mk)T F(mk)

λ -> 0: GN

λ -> large: steepest descent (SD) (moves down-gradient most rapidly). SD provides slow but certain convergence.

Which value of λ to use? Small values when GN is working well; switch to larger values in problem areas. Start with a small value of λ, then adjust.
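A minimal MATLAB sketch of this λ-adjustment strategy, reusing the hypothetical Gfun, Jfun, and sigma inputs from the Gauss-Newton sketch; the factor-of-10 updates are a common but illustrative choice.

% Levenberg-Marquardt sketch: lambda is decreased when a step reduces the
% misfit (GN-like behavior) and increased when it does not (SD-like).
m = m0; n = length(m0); lambda = 1e-3;
for k = 1:maxiter
  F = (Gfun(m) - d) ./ sigma;
  J = Jfun(m);
  dm = -(J'*J + lambda*eye(n)) \ (J'*F);   % LM step
  Fnew = (Gfun(m + dm) - d) ./ sigma;
  if sum(Fnew.^2) < sum(F.^2)
    m = m + dm; lambda = lambda/10;        % step accepted: act more like GN
  else
    lambda = lambda*10;                    % step rejected: act more like SD
  end
  if norm(dm) < tol, break; end
end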

Page 6:

Steepest descent (SD)

Problem of minimizing a quadratic function

f(x) = 0.5 xTAx − bTx (equivalent to solving Ax = b)

where b is in Rn, A is an n x n symmetric positive definite matrix. The gradient of f(x) is Ax-b, i.e.

xk+1=xk-αk(Axk-b)

where αk is chosen to minimize f(x) along the direction of the negative gradient.

Page 7:

Choose the search direction di = -∇f(xi) = ri = b - Axi, and pick αi so that the directional derivative along ri vanishes:

d/dα f(xi + α ri) = 0

∇f(xi + αi ri)T ri = 0

(A(xi + αi ri) - b)T ri = 0

αi riT A ri = (b - Axi)T ri = riT ri

Thus for the quadratic case:

αi = riT ri / (riT A ri)
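A minimal MATLAB sketch of steepest descent for the quadratic problem, using the exact step length derived above (A, b, tol, and maxiter assumed given):

% Steepest descent for f(x) = 0.5*x'*A*x - b'*x with exact line search.
x = zeros(size(b));
for k = 1:maxiter
  r = b - A*x;                   % residual = negative gradient
  if norm(r) < tol, break; end
  alpha = (r'*r) / (r'*(A*r));   % exact step for the quadratic case
  x = x + alpha*r;
end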

Page 8:

Alternative: line search

Page 9:

Example of slow convergence for SD

Rosenbrock function: f(x1, x2) = (1 − x1)² + 100 (x2 − x1²)²
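For experimenting with the descent methods above, the Rosenbrock function and its gradient can be coded directly in MATLAB (the minimum is at x = [1; 1]); the handle names are illustrative.

% Rosenbrock test function and its gradient.
rosen     = @(x) (1 - x(1))^2 + 100*(x(2) - x(1)^2)^2;
rosengrad = @(x) [-2*(1 - x(1)) - 400*x(1)*(x(2) - x(1)^2);
                   200*(x(2) - x(1)^2)];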

Page 10:

Statistics of iterative methods

Cov(Ad) = A Cov(d) AT (d has a multivariate normal distribution)

Cov(mL2)=(GTG)-1GT Cov(d) G(GTG)-1

Cov(d) = σ²I: Cov(mL2) = σ²(GTG)-1

However, we don’t have a linear relationship between data and estimated model parameters for the nonlinear regression, so cannot use these formulas. Instead:

F(m* + Δm) ≈ F(m*) + J(m*) Δm

Cov(m*) ≈ (J(m*)T J(m*))-1; not exact due to linearization, so confidence intervals may not be accurate

ri = G(m*)i - di, i = 1, 2, …, m

s = [∑ ri² / (m - n)]^(1/2)

Cov(m*) = s²(J(m*)T J(m*))-1, used to establish confidence intervals
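A minimal MATLAB sketch of these estimates at the converged solution; r (residual vector), J (Jacobian at m*), m (number of data), and n (number of parameters) are assumed to be available from the fit.

% Approximate covariance and ~95% confidence intervals at m*.
s2   = sum(r.^2) / (m - n);    % s^2, estimated residual variance
Cm   = s2 * inv(J'*J);         % Cov(m*) = s^2*(J'J)^-1
ci95 = 1.96*sqrt(diag(Cm));    % ~95% intervals: m* +/- ci95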

Page 11:

Implementation Issues

1. Explicit (analytical) expressions for derivatives

2. Finite difference approximation for derivatives (see the sketch after this list)

3. When to stop iterating?

∇f(m) ≈ 0

||mk+1 - mk|| ≈ 0, |f(mk+1) - f(mk)| ≈ 0

eqs 9.47-9.49

4. Multistart method to optimize globally
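As referenced in the list above, a minimal MATLAB sketch of a forward-difference Jacobian (item 2) and of simple stopping tests (item 3); Ffun, m, mnew, fold, fnew, and tol are illustrative names assumed to come from the surrounding iteration.

% Item 2: forward-difference approximation of the Jacobian of F(m).
h  = sqrt(eps);                    % perturbation size
F0 = Ffun(m);
J  = zeros(length(F0), length(m));
for j = 1:length(m)
  mp = m; mp(j) = mp(j) + h;
  J(:,j) = (Ffun(mp) - F0) / h;    % finite-difference column j
end

% Item 3: stop when successive models and misfit values barely change.
done = norm(mnew - m) < tol*(1 + norm(m)) && ...
       abs(fnew - fold) < tol*(1 + abs(fold));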

Page 12:

Iterative Methods

SVD impractical when matrix has 1000’s of rows and columns

e.g., 256 x 256 cell tomography model, 100,000 ray paths, < 1% ray hits

50 GB storage of the system matrix, U ~ 80 GB, V ~ 35 GB

Waste of storage when system matrix is sparse

Iterative methods do not store the system matrix

Provide an approximate solution, rather than the exact solution obtained with Gaussian elimination

Definition of iterative method:

Starting point x0, do steps x1, x2, …

Hopefully the sequence converges toward the right solution x

Page 13:

Iterative Methods

Kaczmarz’s algorithm:

Each of the m rows Gi.m = di defines a hyperplane in Rn

1) Project initial m(0) solution onto hyperplane defined by first row of G

2) Project m(1) solution onto hyperplane defined by second row of G

3) … until projections have been done onto all hyperplanes

4) Start a new cycle of projections; repeat until convergence

If Gm=d has a unique solution, Kaczmarz’s algorithm will converge towards this

If several solutions, it will converge toward the solution closest to m(0)

If m(0)=0, we obtain the minimum length solution

If there is no exact solution, convergence fails; the iterates bounce around near an approximate solution

If hyperplanes are nearly orthogonal, convergence is fast

If hyperplanes are nearly parallel, convergence is slow

Page 14:

Let Gi. be the ith row of G

Consider the hyperplane defined by Gi+1.m = di+1. Since GTi+1. is perpendicular to this hyperplane, the update to m(i) from the constraint due to row i+1 of G will be proportional to GTi+1.:

m(i+1) = m(i)+βGTi+1.

Since Gi+1. m(i+1) = di+1 then

Gi+1. (m(i) +βGTi+1. ) = di+1

Gi+1. m(i) - di+1 = - βGi+1. GTi+1.

β = -[Gi+1. m(i) - di+1] / [Gi+1. GTi+1.]

Kaczmarz’s algorithm:

Page 15:

1) Let m(0) = 0

2) For i = 0, 1, …, m-1, let

m(i+1) = m(i) - GTi+1. [Gi+1. m(i) - di+1] / [Gi+1. GTi+1.]

3) If the solution has not yet converged, repeat step 2 starting from the current solution

Kaczmarz’s algorithm:

Page 16:

Kaczmarz’s Method

[Figure: Kaczmarz iterations projecting alternately onto the lines y = x − 1 and y = 1 in the (x, y) plane.]

Page 17:

Specialized for tomography

Kaczmarz: m(i+1) = m(i) - GTi+1. [Gi+1. m(i) - di+1] / [Gi+1. GTi+1.]

Thus always adding a multiple of a row of G to the current solution

[Gi+1. m(i) - di+1] / [Gi+1. GTi+1.] is a normalized error in equation i+1, and the correction is spread over the elements of m appearing in equation i+1

An often-used approximation in ART is to replace all non-zero elements in row i+1 of G with ones, thus smearing the needed traveltime correction equally over all cells along ray path i+1

SIRT algorithm: ART with higher accuracy; the corrections for each cell are accumulated over all equations and applied together, averaged over the rays hitting that cell, once per sweep

ART algorithm:

Page 18:

function x=kac(A,b,tolx,maxiter)
% Kaczmarz's algorithm for Ax = b.
[m,n]=size(A);
AP=A';                        % transpose once: columns of AP are rows of A
x=zeros(n,1);
iter=0;
n2=zeros(m,1);
for i=1:m
  n2(i)=norm(AP(:,i),2)^2;    % squared norms of the rows of A
end
while (iter <= maxiter)
  iter=iter+1;
  newx=x;
  for i=1:m                   % project onto each row's hyperplane in turn
    newx=newx-((newx'*AP(:,i)-b(i))/(n2(i)))*AP(:,i);
  end
  if (norm(newx-x)/(1+norm(x)) < tolx)
    x=newx;
    return;
  end
  x=newx;
end
disp('Max iterations exceeded.');
return;

Kaczmarz’s Algorithm

Page 19:

function x=sirt(A,b,tolx,maxiter)
% SIRT: corrections are accumulated over all rays, then applied together.
alpha=1.0;
[m,n]=size(A);
A1=(A>0);             % logical ray-path matrix: 1 where a ray hits a cell
AP=A';
A1P=A1';
x=zeros(n,1);
iter=0;
N=zeros(m,1);         % number of cells hit by each ray
L=zeros(m,1);         % length of each ray path
NRAYS=zeros(n,1);     % number of rays hitting each cell
for i=1:m
  N(i)=sum(A1P(:,i));
  L(i)=sum(AP(:,i));
end
for i=1:n
  NRAYS(i)=sum(A1(:,i));
end
while (1==1)
  iter=iter+1;
  if (iter > maxiter)
    disp('Max iterations exceeded.');
    return;
  end
  newx=x;
  deltax=zeros(n,1);
  for i=1:m
    q=A1P(:,i)'*newx;
    delta=b(i)/L(i)-q/N(i);         % normalized error for ray i
    deltax=deltax+delta*A1P(:,i);   % accumulate corrections over all rays
  end
  newx=newx+alpha*deltax./NRAYS;    % average correction per cell
  if (norm(newx-x)/(1+norm(x)) < tolx)
    x=newx;
    return;
  end
  x=newx;
end

SIRT Algorithm

Page 20:

function x=art(A,b,tolx,maxiter)
% ART: each row's normalized error is applied immediately, smeared equally
% over the cells in that ray path.
alpha=1.0;
[m,n]=size(A);
A1=(A>0);            % logical ray-path matrix
AP=A';
A1P=A1';
x=zeros(n,1);
iter=0;
N=zeros(m,1);        % number of cells hit by each ray
L=zeros(m,1);        % length of each ray path
for i=1:m
  N(i)=sum(A1(i,:));
  L(i)=sum(A(i,:));
end
while (1==1)
  iter=iter+1;
  if (iter > maxiter)
    disp('Max iterations exceeded.');
    x=newx;
    return;
  end
  newx=x;
  for i=1:m
    q=A1P(:,i)'*newx;
    delta=b(i)/L(i)-q/N(i);            % normalized error for ray i
    newx=newx+alpha*delta*A1P(:,i);    % same correction to every cell on ray i
  end
  if (norm(newx-x)/(1+norm(x)) < tolx)
    x=newx;
    return;
  end
  x=newx;
end

ART Algorithm

Page 21:

True Model Reconstruction From Kaczmarz, ART, SIRT (all similar)

Page 22:

Conjugate Gradients Method

Symmetric, positive definite system of equations Ax=b

min φ(x) = min (1/2 xTAx - bTx)

∇φ(x) = Ax - b = 0, or Ax = b

CG method: construct a basis p0, p1, …, pn-1 such that piT A pj = 0 when i ≠ j. Such a basis is mutually conjugate with respect to A. We only walk once in each direction and minimize along it:

x = ∑ αi pi, i = 0, …, n-1 (a maximum of n steps required!)

φ(α) = 1/2 [∑ αi pi]T A [∑ αi pi] - bT [∑ αi pi]
     = 1/2 ∑ ∑ αi αj piT A pj - bT [∑ αi pi]   (summations over 0, …, n-1)

Page 23:

Conjugate Gradients Method

piT A pj = 0 when i ≠ j. Such a basis is mutually conjugate with respect to A (the pi are said to be 'A-orthogonal').

φ(α) = 1/2 ∑ ∑ αi αj piT A pj - bT [∑ αi pi]
     = 1/2 ∑ αi² piT A pi - bT [∑ αi pi]
     = 1/2 ∑ (αi² piT A pi - 2 αi bT pi)   (n independent terms)

Thus we minimize φ(α) by minimizing the ith term, αi² piT A pi - 2 αi bT pi, for each i: differentiate with respect to αi and set the derivative to zero, giving αi = bT pi / (piT A pi).

i.e., IF we have a mutually conjugate basis, it is easy to minimize φ(α)

Page 24:

Conjugate Gradients Method

CG constructs a sequence of xi, ri = b - Axi, and pi.

Start: x0 = 0, r0 = b, p0 = r0, α0 = r0T r0 / (p0T A p0)

Assume at the kth iteration we have x0, x1, …, xk; r0, r1, …, rk; p0, p1, …, pk; α0, α1, …, αk.

Assume the first k+1 basis vectors pi are mutually conjugate with respect to A, the first k+1 ri are mutually orthogonal, and riT pj = 0 for i ≠ j.

Let xk+1 = xk + αk pk and rk+1 = rk - αk A pk, which updates the residual correctly, since:

rk+1 = b - Axk+1 = b - A(xk + αk pk) = (b - Axk) - αk A pk = rk - αk A pk

Page 25:

Conjugate Gradients Method

Let xk+1 = xk + αk pk and rk+1 = rk - αk A pk

βk+1 = rk+1T rk+1 / (rkT rk)

pk+1 = rk+1 + βk+1 pk

bT pk = rkT rk   (eqs. 6.33-6.38)

Now we need proof of the assumptions

• rk+1 is orthogonal to ri for i≤k (eq 6.39-6.43)

• rk+1Trk=0 (eq 6.44-6.48)

• rk+1 is orthogonal to pi for i≤k (eq 6.49-6.54)

• pk+1TApi = 0 for i≤k (eq 6.55-6.60)

• For i = k: pk+1T A pk = 0, i.e., CG generates a mutually conjugate basis

Page 26:

Conjugate Gradients Method

Thus we have shown that CG generates a sequence of mutually conjugate basis vectors. In theory, the method will find an exact solution in n iterations.

Given a symmetric, positive definite system of equations Ax = b and an initial solution x0, let β0 = 0, p-1 = 0, r0 = b - Ax0, k = 0.

1. If k > 0, let βk = rkT rk / (rk-1T rk-1)

2. Let pk = rk + βk pk-1

3. Let αk = rkT rk / (pkT A pk)

4. Let xk+1 = xk + αk pk

5. Let rk+1 = rk - αk A pk

6. Let k = k+1
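A minimal MATLAB sketch of steps 1-6 for a symmetric positive definite A (x0, tol, and maxiter assumed given); in exact arithmetic at most n iterations are needed.

% Conjugate gradient sketch for Ax = b, A symmetric positive definite.
x = x0; r = b - A*x; p = zeros(size(x)); rho = r'*r;
for k = 0:maxiter
  if k > 0
    beta = rho/rho_old;        % step 1: beta_k = rk'rk / r(k-1)'r(k-1)
  else
    beta = 0;                  % beta_0 = 0, p_(-1) = 0
  end
  p  = r + beta*p;             % step 2: new search direction
  Ap = A*p;
  alpha = rho/(p'*Ap);         % step 3: exact step length
  x = x + alpha*p;             % step 4
  r = r - alpha*Ap;            % step 5
  rho_old = rho; rho = r'*r;   % step 6: k = k+1 bookkeeping
  if sqrt(rho) < tol, break; end
end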

Page 27:

Conjugate Gradients Least Squares Method

CG can only be applied to positive definite systems of equations, thus not applicable to general LS problems. Instead, we can apply the CGLS method to

min ||Gm - d||2,

i.e., the normal equations GTG m = GT d.

Page 28:

Conjugate Gradients Least Squares Method

GTGm=GTd

rk = GTd-GTGmk = GT(d-Gmk) = GTsk

sk+1 = d - Gmk+1 = d - G(mk + αk pk) = (d - Gmk) - αk G pk = sk - αk G pk

Given a system of equations Gm = d, let k = 0, m0 = 0, p-1 = 0, β0 = 0, s0 = d - Gm0 = d, r0 = GT s0.

1. If k > 0, let βk = rkT rk / [rk-1T rk-1]

2. Let pk = rk + βk pk-1

3. Let αk = rkT rk / ([G pk]T [G pk])

4. Let mk+1 = mk + αk pk

5. Let sk+1 = sk - αk G pk

6. Let rk+1 = GT sk+1 (= rk - αk GT G pk)

7. Let k = k+1; GTG is never computed explicitly, only the products G pk and GT sk+1.
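A minimal MATLAB sketch of steps 1-7; note that G enters only through the products G*p and G'*s, so GTG is never formed (tol and maxiter assumed given).

% CGLS sketch for min ||Gm - d||_2 via the normal equations.
m = zeros(size(G,2),1);          % m0 = 0
s = d - G*m;                     % s0 = d
r = G'*s;                        % r0 = G'*s0
p = zeros(size(m)); rho = r'*r;
for k = 0:maxiter
  if k > 0
    beta = rho/rho_old;          % step 1
  else
    beta = 0;                    % beta_0 = 0, p_(-1) = 0
  end
  p  = r + beta*p;               % step 2
  Gp = G*p;
  alpha = rho/(Gp'*Gp);          % step 3
  m = m + alpha*p;               % step 4
  s = s - alpha*Gp;              % step 5
  r = G'*s;                      % step 6: r(k+1) = G'*s(k+1)
  rho_old = rho; rho = r'*r;     % step 7 bookkeeping
  if sqrt(rho) < tol, break; end
end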