Optimization

Transcript of lecture slides on optimization, based on Michael T. Heath, Scientific Computing.


Outline

1 Optimization Problems

2 One-Dimensional Optimization

3 Multi-Dimensional Optimization



Optimization

Given function f : ℝⁿ → ℝ, and set S ⊆ ℝⁿ, find x* ∈ S such that f(x*) ≤ f(x) for all x ∈ S

x* is called minimizer or minimum of f

It suffices to consider only minimization, since maximum of f is minimum of −f

Objective function f is usually differentiable, and may be linear or nonlinear

Constraint set S is defined by system of equations and inequalities, which may be linear or nonlinear

Points x ∈ S are called feasible points

If S = ℝⁿ, problem is unconstrained

Optimization Problems

General continuous optimization problem:

    min f(x) subject to g(x) = 0 and h(x) ≤ 0

where f : ℝⁿ → ℝ, g : ℝⁿ → ℝᵐ, h : ℝⁿ → ℝᵖ

Linear programming: f, g, and h are all linear

Nonlinear programming: at least one of f, g, and h is nonlinear

Examples: Optimization Problems

Minimize weight of structure subject to constraint on its strength, or maximize its strength subject to constraint on its weight

Minimize cost of diet subject to nutritional constraints

Minimize surface area of cylinder subject to constraint on its volume:

    min over x1, x2 of  f(x1, x2) = 2π x1 (x1 + x2)
    subject to  g(x1, x2) = π x1² x2 − V = 0

where x1 and x2 are radius and height of cylinder, and V is required volume
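As a quick illustration (my own sketch, not part of the slides), the cylinder problem can be handed to a general-purpose constrained solver; the volume V = 2π below is an arbitrary illustrative choice.

```python
# Hedged sketch: minimize cylinder surface area subject to a volume constraint.
import numpy as np
from scipy.optimize import minimize

V = 2 * np.pi                          # required volume (illustrative value)

def surface(x):                        # f(x1, x2) = 2*pi*x1*(x1 + x2), x = [radius, height]
    return 2 * np.pi * x[0] * (x[0] + x[1])

def volume_constraint(x):              # g(x1, x2) = pi*x1^2*x2 - V = 0
    return np.pi * x[0]**2 * x[1] - V

res = minimize(surface, x0=[1.0, 1.0], constraints=[{"type": "eq", "fun": volume_constraint}])
print(res.x)                           # expected near [1, 2]: optimal height equals diameter
```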

Local vs Global Optimization

x* ∈ S is global minimum if f(x*) ≤ f(x) for all x ∈ S

x* ∈ S is local minimum if f(x*) ≤ f(x) for all feasible x in some neighborhood of x*

Global Optimization

Finding, or even verifying, global minimum is difficult, in general

Most optimization methods are designed to find local minimum, which may or may not be global minimum

If global minimum is desired, one can try several widely separated starting points and see if all produce same result

For some problems, such as linear programming, global optimization is more tractable

Existence of Minimum

If f is continuous on closed and bounded set S ⊆ ℝⁿ, then f has global minimum on S

If S is not closed or is unbounded, then f may have no local or global minimum on S

Continuous function f on unbounded set S ⊆ ℝⁿ is coercive if

    f(x) → +∞ as ‖x‖ → ∞

i.e., f(x) must be large whenever ‖x‖ is large

If f is coercive on closed, unbounded set S ⊆ ℝⁿ, then f has global minimum on S

An Example of a Coercive Function

[Figure: level sets (contours, isosurfaces) of a coercive function; f(x) → ∞ as ‖x‖ → ∞]

Level Sets

Level set for function f : S ⊆ ℝⁿ → ℝ is set of all points in S for which f has some given constant value

For given γ ∈ ℝ, sublevel set is

    L_γ = {x ∈ S : f(x) ≤ γ}

If continuous function f on S ⊆ ℝⁿ has nonempty sublevel set that is closed and bounded, then f has global minimum on S

If S is unbounded, then f is coercive on S if, and only if, all of its sublevel sets are bounded

(No open contours…)

Uniqueness of Minimum

Set S ⊆ ℝⁿ is convex if it contains line segment between any two of its points

Function f : S ⊆ ℝⁿ → ℝ is convex on convex set S if its graph along any line segment in S lies on or below chord connecting function values at endpoints of segment

Any local minimum of convex function f on convex set S ⊆ ℝⁿ is global minimum of f on S

Any local minimum of strictly convex function f on convex set S ⊆ ℝⁿ is unique global minimum of f on S

Convex Domains and Convex Functions

First-Order Optimality Condition

For function of one variable, one can find extremum by differentiating function and setting derivative to zero

Generalization to function of n variables is to find critical point, i.e., solution of nonlinear system

    ∇f(x) = 0

where ∇f(x) is gradient vector of f, whose ith component is ∂f(x)/∂xᵢ

For continuously differentiable f : S ⊆ ℝⁿ → ℝ, any interior point x* of S at which f has local minimum must be critical point of f

But not all critical points are minima: they can also be maxima or saddle points

•  Saddle points in higher dimensions are more common than zero-slope inflection points in 1D


Note the open contours….


•  We use the solution techniques of Chapter 5 to solve this nonlinear system.

Second-Order Optimality Condition

For twice continuously differentiable f : S ⊆ ℝⁿ → ℝ, we can distinguish among critical points by considering Hessian matrix H_f(x) defined by

    {H_f(x)}_ij = ∂²f(x) / ∂xᵢ∂xⱼ

which is symmetric

At critical point x*, if H_f(x*) is

    positive definite, then x* is minimum of f
    negative definite, then x* is maximum of f
    indefinite, then x* is saddle point of f
    singular, then various pathological situations are possible

Classifying Critical Points

• Example 6.5 from the text:

    f(x) = 2x1³ + 3x1² + 12 x1 x2 + 3x2² − 6x2 + 6

• Gradient:

    ∇f(x) = [ 6x1² + 6x1 + 12x2 ;  12x1 + 6x2 − 6 ]

• Hessian is symmetric:

    H_f = [ ∂²f/∂x1∂x1   ∂²f/∂x1∂x2 ;  ∂²f/∂x1∂x2   ∂²f/∂x2∂x2 ] = [ 12x1 + 6   12 ;  12   6 ]

• Critical points, solutions to ∇f(x) = 0, are

    x* = [1, −1]ᵀ  and  x* = [2, −3]ᵀ

• At first critical point, x* = [1, −1]ᵀ,

    H_f(1, −1) = [ 18   12 ;  12   6 ],

which does not have all positive eigenvalues, so [1, −1]ᵀ is a saddle point.

• At second critical point, x* = [2, −3]ᵀ,

    H_f(2, −3) = [ 30   12 ;  12   6 ],

which does have all positive eigenvalues, so [2, −3]ᵀ is a local minimum.

• First critical point, x* = [1, −1]ᵀ, is a saddle point. Hessian not SPD.

• Second critical point, x* = [2, −3]ᵀ, is a local minimum. Hessian is SPD.
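A small numerical cross-check of Example 6.5 (my own sketch, not from the text): evaluate the gradient and the Hessian eigenvalues at both critical points.

```python
# Hedged sketch: classify the critical points of Example 6.5 via Hessian eigenvalues.
import numpy as np

def grad(x):
    x1, x2 = x
    return np.array([6*x1**2 + 6*x1 + 12*x2, 12*x1 + 6*x2 - 6])

def hess(x):
    x1, _ = x
    return np.array([[12*x1 + 6, 12.0], [12.0, 6.0]])

for x_star in ([1.0, -1.0], [2.0, -3.0]):
    eigs = np.linalg.eigvalsh(hess(x_star))   # symmetric Hessian -> real eigenvalues
    if np.all(eigs > 0):
        kind = "local minimum"
    elif np.any(eigs < 0) and np.any(eigs > 0):
        kind = "saddle point"
    else:
        kind = "needs further analysis"
    print(x_star, grad(x_star), eigs, kind)
# Expected: gradient = [0, 0] at both points; mixed-sign eigenvalues at [1, -1] (saddle),
# all-positive eigenvalues at [2, -3] (local minimum).
```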


Example of Singular Hessian in 1D

• Consider

    f(x) = (x − 1)⁴
    ∇f = ∂f/∂x = 4(x − 1)³ = 0  when x = x* = 1
    H = ∂²f/∂x² = 12(x − 1)² = 0  at x = x*

Here, the Hessian is singular at x = x* and x* is a minimizer of f.

• However, if

    f(x) = (x − 1)³
    ∇f = ∂f/∂x = 3(x − 1)² = 0  when x = x* = 1
    H = ∂²f/∂x² = 6(x − 1) = 0  at x = x*

Here, the Hessian is singular at x = x* and x* is not a minimizer of f.

Constrained Optimality

If problem is constrained, only feasible directions are relevant

For equality-constrained problem

    min f(x) subject to g(x) = 0

where f : ℝⁿ → ℝ and g : ℝⁿ → ℝᵐ, with m ≤ n, necessary condition for feasible point x* to be solution is that negative gradient of f lie in space spanned by constraint normals,

    −∇f(x*) = J_gᵀ(x*) λ

where J_g is Jacobian matrix of g, and λ is vector of Lagrange multipliers

This condition says we cannot reduce objective function without violating constraints

Constrained Optimality, continued

Lagrangian function L : ℝⁿ⁺ᵐ → ℝ is defined by

    L(x, λ) = f(x) + λᵀ g(x)

Its gradient is given by

    ∇L(x, λ) = [ ∇f(x) + J_gᵀ(x) λ ;  g(x) ]

Its Hessian is given by

    H_L(x, λ) = [ B(x, λ)   J_gᵀ(x) ;  J_g(x)   O ]

where

    B(x, λ) = H_f(x) + Σᵢ₌₁ᵐ λᵢ H_gᵢ(x)

Constrained Optimality, continued

Together, necessary condition and feasibility imply critical point of Lagrangian function,

    ∇L(x, λ) = [ ∇f(x) + J_gᵀ(x) λ ;  g(x) ] = 0

Hessian of Lagrangian is symmetric, but not positive definite, so critical point of L is saddle point rather than minimum or maximum

Critical point (x*, λ*) of L is constrained minimum of f if B(x*, λ*) is positive definite on null space of J_g(x*)

If columns of Z form basis for null space, then test projected Hessian ZᵀBZ for positive definiteness
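As a concrete sketch of this test (my own worked example, not from the slides): for the cylinder problem used earlier with V = 2π, the constrained minimizer is x* = (1, 2) with Lagrange multiplier λ* = −2, and the projected Hessian ZᵀBZ comes out positive. scipy.linalg.null_space supplies the basis Z.

```python
# Hedged sketch: projected-Hessian test Z^T B Z for the cylinder example (V = 2*pi)
# at the constrained minimizer x* = (1, 2) with Lagrange multiplier lambda* = -2.
import numpy as np
from scipy.linalg import null_space

x1, x2, lam = 1.0, 2.0, -2.0
Hf = 2 * np.pi * np.array([[2.0, 1.0], [1.0, 0.0]])     # Hessian of f = 2*pi*x1*(x1 + x2)
Hg = np.pi * np.array([[2*x2, 2*x1], [2*x1, 0.0]])      # Hessian of g = pi*x1^2*x2 - V
Jg = np.array([[2*np.pi*x1*x2, np.pi*x1**2]])           # Jacobian (row vector) of g at x*

B = Hf + lam * Hg                    # B(x*, lambda*) = H_f + lambda* * H_g
Z = null_space(Jg)                   # basis for null space of J_g (tangent to constraint)
print(Z.T @ B @ Z)                   # expected positive (about 2.22 = 12*pi/17): constrained minimum
```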

Equality Constraints: Case g(x) a Scalar

    min f(x) subject to g(x) = 0

• Lagrangian

    L(x, λ) := f(x) + λ g(x),   λ := Lagrange multiplier

    ∇L = [ ∂L/∂xᵢ ;  ∂L/∂λ ] = [ ∇f + λ∇g ;  g ] = [ 0 ;  0 ]

at critical point (x*, λ*).

• Note, ∇f(x*) ≠ 0.

• Instead, ∇f(x*) = −λ* ∇g(x*).

• The gradient of f at x* is a multiple of the gradient of g.

Constrained Optimality Example, g(x) a Scalar

❑  Comment on: “This condition says we cannot reduce objective function without violating constraints.”

❑  A key point is that, at x*, ∇f is parallel to ∇g

❑  If there are multiple constraints, then ∇f lies in span{∇g_k}, i.e., −∇f = J_gᵀ λ

[Figure: contour g(x) = 0 with gradient vectors ∇g and −∇f at x*]

Constrained Optimality Example, g(x) a Scalar (continued)

❑  Here, we see the gradients of f and g at a point that is not x*.

❑  Clearly, we can make more progress in reducing f(x) by moving along the g(x) = 0 contour until ∇f is parallel to ∇g.

[Figure: contour g(x) = 0 with ∇g and −∇f at a non-optimal point, and the minimizer x* further along the contour]

Example: Two Equality Constraints

    min f(x) subject to g1(x) = 0 and g2(x) = 0

• Lagrangian

    L(x, λ) := f(x) + λ1 g1(x) + λ2 g2(x)

• First-order conditions

    ∇L = [ ∂L/∂xᵢ ;  ∂L/∂λ_k ] = [ ∂f/∂xᵢ + Σ_k λ_k ∂g_k/∂xᵢ ;  (g1, g2) ] = [ ∇f + J_gᵀ λ ;  g ] = 0 at (x*, λ*)


•  Note: null space of J_g(x*) is the set of vectors tangent to the constraint surface (or surfaces, if more than one g is present).


Constrained Optimality, continued

If inequalities are present, then KKT optimality conditions also require nonnegativity of Lagrange multipliers corresponding to inequalities, and complementarity condition

Karush-Kuhn-Tucker conditions:

    ∇_x L(x*, λ*) = 0                                                   (1)
    g(x*) = 0                                                           (2)
    h(x*) ≤ 0                                                           (3)
    λᵢ* ≥ 0,   i = m+1, …, m+p     (inequality Lagrange multipliers)    (4)
    λᵢ* hᵢ(x*) = 0,   i = m+1, …, m+p     (complementarity condition)   (5)

Inequality Constraint and Sign of λ

❑  A key point is that, at x*, ∇f is parallel to ∇h

❑  If there are multiple constraints, then ∇f lies in span{∇h_k}, i.e., −∇f = J_hᵀ λ

[Figure: boundary h(x) = 0 of feasible region h(x) < 0, with −∇h and −∇f at x*]

Inequality Constraint and Sign of λ (continued)

[Figure: boundary h(x) = 0 of feasible region h(x) < 0, with −∇h and −∇f at x*]

• If constraint is active (h(x*) = 0), then ∇f and ∇h point in opposite directions.

• ∇f + λ*∇h = 0.

• λ* > 0.

Sensitivity and Conditioning

Function minimization and equation solving are closely related problems, but their sensitivities differ

In one dimension, absolute condition number of root x* of equation f(x) = 0 is 1/|f′(x*)|, so if |f(x̂)| ≤ ε, then |x̂ − x*| may be as large as ε/|f′(x*)|

For minimizing f, Taylor series expansion

    f(x̂) = f(x* + h) = f(x*) + f′(x*) h + (1/2) f″(x*) h² + O(h³)

shows that, since f′(x*) = 0, if |f(x̂) − f(x*)| ≤ ε, then |x̂ − x*| may be as large as √(2ε/|f″(x*)|)

Thus, based on function values alone, minima can be computed to only about half precision

Consider f = 1 − cos(x)

[Figure: successive zooms on f(x) = 1 − cos(x) near its minimum at x* = 0]

❑  As you zoom in, f(x) → 0 quadratically, but x → x* only linearly.

❑  So, if |f(x) − f(x*)| ≈ ε, then ‖x − x*‖ = O(ε^(1/2))

Sensitivity of Minimization

Terminate search when

    |f(x* + Δx) − f(x*)| ≤ ε

Taylor series:

    f(x* + Δx) = f(x*) + Δx f′(x*) + (Δx²/2) f″(x*) + O(Δx³)

Since f′(x*) = 0,

    Δx²/2 ≈ (f(x* + Δx) − f(x*)) / f″(x*),   so   |Δx| ≈ √(2ε/|f″(x*)|)

• So, if ε ≈ ε_M, can expect accuracy of approximately √ε_M.

• Question: Is a small f″ good? Or bad?
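A quick numerical illustration of this √ε limit (my own sketch), using f(x) = 1 − cos(x), which has minimizer x* = 0 and f″(x*) = 1:

```python
# Hedged sketch: near a minimum, function values are flat to within machine epsilon
# over an interval of width ~sqrt(eps), so x* is resolvable only to about half precision.
import numpy as np

f = lambda x: 1.0 - np.cos(x)        # minimum value 0 at x* = 0, with f''(0) = 1
eps = np.finfo(float).eps

for dx in [1e-7, 1e-8, 1e-9]:
    print(f"dx = {dx:.0e}   f(dx) = {f(dx):.3e}   distinguishable from f(0)? {f(dx) > eps}")
# Expected: for dx below about 1e-8 ~ sqrt(eps), the computed f(dx) falls to eps or below
# (here it even rounds to 0), so such points cannot be told apart from x* by f-values alone.
```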

Example: Minimize Cost over Some Design Parameter, x

[Figure: cost as a function of a design parameter, with a broad minimum]

❑  While having a small f″(x*) makes it difficult to find the optimal x*, it is in fact a happy circumstance because it gives you liberty to add additional constraints at no cost.

❑  So, often, having a broad minimum is good.

Methods for One-Dimensional Problems

❑  Demonstrate basic techniques, bracketing, and convergence rates

❑  Useful for line search in multi-dimensional problems

Unimodality

For minimizing function of one variable, we need “bracket” for solution analogous to sign change for nonlinear equation

Real-valued function f is unimodal on interval [a, b] if there is unique x* ∈ [a, b] such that f(x*) is minimum of f on [a, b], and f is strictly decreasing for x ≤ x*, strictly increasing for x* ≤ x

Unimodality enables discarding portions of interval based on sample function values, analogous to interval bisection


Golden Section Search

Suppose f is unimodal on [a, b], and let x1 and x2 be two points within [a, b], with x1 < x2

Evaluating and comparing f(x1) and f(x2), we can discard either (x2, b] or [a, x1), with minimum known to lie in remaining subinterval

To repeat process, we need to compute only one new function evaluation

To reduce length of interval by fixed fraction at each iteration, each new pair of points must have same relationship with respect to new interval that previous pair had with respect to previous interval

Golden Section Search, continued

To accomplish this, we choose relative positions of two points as τ and 1 − τ, where τ² = 1 − τ, so

    τ = (√5 − 1)/2 ≈ 0.618  and  1 − τ ≈ 0.382

Whichever subinterval is retained, its length will be τ relative to previous interval, and interior point retained will be at position either τ or 1 − τ relative to new interval

To continue iteration, we need to compute only one new function value, at complementary point

This choice of sample points is called golden section search

Golden section search is safe but convergence rate is only linear, with constant C ≈ 0.618

Requires Unimodality!

Golden Section Search

❑  f(x) unimodal on [a, b].

❑  Subdivide [a, b] into 3 parts.

❑  If f1 > f2, discard (a, x1).

❑  Choose new point in larger of remaining two intervals: (x1, x2) or (x2, b).

❑  → x1 and x2 should be closer to center and not at 1/3, 2/3.

[Figure: interval a < x1 < x2 < b, and the relabeled points a′, x′1, x′2 after discarding (a, x1) when f1 > f2]

Golden Section Geometry

❑  Want new section [1 − τ, τ] to have same relation as [0, 1 − τ] to [0, 1].

[Figure: unit interval with interior points at 1 − τ ≈ 0.382 and τ ≈ 0.618]

    (τ − (1 − τ)) / τ = (1 − τ) / 1
    2τ − 1 = τ − τ²
    τ² + τ − 1 = 0
    τ = (−1 + √(1 + 4)) / 2 = (√5 − 1)/2 ≈ 0.618

Golden Section Search, continued

    τ = (√5 − 1)/2
    x1 = a + (1 − τ)(b − a); f1 = f(x1)
    x2 = a + τ(b − a);       f2 = f(x2)
    while ((b − a) > tol) do
        if (f1 > f2) then
            a = x1
            x1 = x2; f1 = f2
            x2 = a + τ(b − a)
            f2 = f(x2)
        else
            b = x2
            x2 = x1; f2 = f1
            x1 = a + (1 − τ)(b − a)
            f1 = f(x1)
        end
    end
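A runnable translation of this pseudocode (my own Python sketch; the Matlab demos mentioned later in the transcript are not reproduced), applied to the example function used on the next slides:

```python
# Hedged sketch: golden section search, translated directly from the pseudocode above.
import math

def golden_section(f, a, b, tol=1e-6):
    tau = (math.sqrt(5.0) - 1.0) / 2.0
    x1 = a + (1 - tau) * (b - a); f1 = f(x1)
    x2 = a + tau * (b - a);       f2 = f(x2)
    while (b - a) > tol:
        if f1 > f2:                            # minimum cannot lie in [a, x1): discard it
            a, x1, f1 = x1, x2, f2
            x2 = a + tau * (b - a); f2 = f(x2)
        else:                                  # minimum cannot lie in (x2, b]: discard it
            b, x2, f2 = x2, x1, f1
            x1 = a + (1 - tau) * (b - a); f1 = f(x1)
    return 0.5 * (a + b)

f = lambda x: 0.5 - x * math.exp(-x**2)
print(golden_section(f, 0.0, 2.0))             # expected near 0.7071 (x* = 1/sqrt(2))
```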

Example: Golden Section Search

Use golden section search to minimize

    f(x) = 0.5 − x exp(−x²)

Example, continued

    x1      f1      x2      f2
    0.764   0.074   1.236   0.232
    0.472   0.122   0.764   0.074
    0.764   0.074   0.944   0.113
    0.652   0.074   0.764   0.074
    0.584   0.085   0.652   0.074
    0.652   0.074   0.695   0.071
    0.695   0.071   0.721   0.071
    0.679   0.072   0.695   0.071
    0.695   0.071   0.705   0.071
    0.705   0.071   0.711   0.071

Successive Parabolic Interpolation

Fit quadratic polynomial to three function values

Take minimum of quadratic to be new approximation to minimum of function

New point replaces oldest of three previous points and process is repeated until convergence

Convergence rate of successive parabolic interpolation is superlinear, with r ≈ 1.324
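A minimal sketch of one way to implement this (my own code; it uses the standard three-point parabola-vertex formula and the "replace oldest point" rule):

```python
# Hedged sketch: successive parabolic interpolation, replacing the oldest point each step.
import math

def parabola_vertex(xs, fs):
    """Minimizer of the quadratic interpolating (x0,f0), (x1,f1), (x2,f2)."""
    x0, x1, x2 = xs
    f0, f1, f2 = fs
    num = (x1 - x0)**2 * (f1 - f2) - (x1 - x2)**2 * (f1 - f0)
    den = (x1 - x0) * (f1 - f2) - (x1 - x2) * (f1 - f0)
    return x1 - 0.5 * num / den

def spi(f, x0, x1, x2, niter=8):
    xs, fs = [x0, x1, x2], [f(x0), f(x1), f(x2)]
    for _ in range(niter):
        x_new = parabola_vertex(xs, fs)
        xs.pop(0); fs.pop(0)                   # drop the oldest of the three points
        xs.append(x_new); fs.append(f(x_new))
        print(f"{x_new:.3f}   {fs[-1]:.3f}")
    return xs[-1]

f = lambda x: 0.5 - x * math.exp(-x**2)
spi(f, 0.0, 0.6, 1.2)                          # starting points as in the example; -> ~0.707
```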

Example: Successive Parabolic Interpolation

Use successive parabolic interpolation to minimize

    f(x) = 0.5 − x exp(−x²)

Example, continued

    xk      f(xk)
    0.000   0.500
    0.600   0.081
    1.200   0.216
    0.754   0.073
    0.721   0.071
    0.692   0.071
    0.707   0.071

Not monotone; superlinear convergence.

Successive parabolic interpolation is Newton’s method applied to a quadratic model of the data (r ≈ 1.324). We turn to Newton’s method next (r = 2).

Matlab: Successive Parabolic Interpolation – Naive Bracketing

Convergence Behavior

[Figure: convergence plot from the Matlab demo]

❑  Question 1: What type of convergence is this?

❑  Question 2: What does this plot say about conditioning of …?

Matlab: Successive Parabolic Interpolation – Replace Oldest

Newton’s Method

Another local quadratic approximation is truncated Taylor series

    f(x + h) ≈ f(x) + f′(x) h + (f″(x)/2) h²

By differentiation, minimum of this quadratic function of h is given by h = −f′(x)/f″(x)

Suggests iteration scheme

    x_{k+1} = x_k − f′(x_k)/f″(x_k)

which is Newton’s method for solving nonlinear equation f′(x) = 0

Newton’s method for finding minimum normally has quadratic convergence rate, but must be started close enough to solution to converge

Example: Newton’s Method

Use Newton’s method to minimize f(x) = 0.5 − x exp(−x²)

First and second derivatives of f are given by

    f′(x) = (2x² − 1) exp(−x²)
    f″(x) = 2x(3 − 2x²) exp(−x²)

Newton iteration for zero of f′ is given by

    x_{k+1} = x_k − (2x_k² − 1) / (2x_k (3 − 2x_k²))

Using starting guess x0 = 1, we obtain

    xk      f(xk)
    1.000   0.132
    0.500   0.111
    0.700   0.071
    0.707   0.071
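The next slide refers to a Matlab script, newton.m, which the transcript does not reproduce; a rough Python equivalent of the iteration above might look like this (my sketch):

```python
# Hedged sketch: Newton's method for minimizing f(x) = 0.5 - x*exp(-x^2),
# i.e., Newton's method applied to the equation f'(x) = 0.
import math

def fp(x):    # f'(x) = (2x^2 - 1) exp(-x^2)
    return (2*x**2 - 1) * math.exp(-x**2)

def fpp(x):   # f''(x) = 2x (3 - 2x^2) exp(-x^2)
    return 2*x * (3 - 2*x**2) * math.exp(-x**2)

x = 1.0                          # starting guess from the example
for k in range(5):
    x = x - fp(x) / fpp(x)       # Newton step for f'(x) = 0
    print(f"x_{k+1} = {x:.6f}")
# Expected to reproduce 0.5, 0.7, 0.707, ..., converging to 1/sqrt(2) ~ 0.707107
```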

Matlab Example: newton.m

Safeguarded Methods

As with nonlinear equations in one dimension, slow-but-sure and fast-but-risky optimization methods can be combined to provide both safety and efficiency

Most library routines for one-dimensional optimization are based on this hybrid approach

Popular combination is golden section search and successive parabolic interpolation, for which no derivatives are required
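For instance (my own illustration), SciPy's one-dimensional minimizer uses Brent's method, which combines golden section search with successive parabolic interpolation:

```python
# Hedged sketch: calling a library hybrid routine on the same example function.
import math
from scipy.optimize import minimize_scalar

f = lambda x: 0.5 - x * math.exp(-x**2)
res = minimize_scalar(f, bracket=(0.0, 1.0, 2.0), method="brent")
print(res.x)    # expected near 0.7071, found without any derivative information
```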

Methods for Multi-Dimensional Problems

Steepest Descent Method

Let f : ℝⁿ → ℝ be real-valued function of n real variables

At any point x where gradient vector is nonzero, negative gradient, −∇f(x), points downhill toward lower values of f

In fact, −∇f(x) is locally direction of steepest descent: f decreases more rapidly along direction of negative gradient than along any other

Steepest descent method: starting from initial guess x0, successive approximate solutions given by

    x_{k+1} = x_k − α_k ∇f(x_k)

where α_k is line search parameter that determines how far to go in given direction

Steepest Descent, continued

Given descent direction, such as negative gradient, determining appropriate value for α_k at each iteration is one-dimensional minimization problem

    min over α_k of  f(x_k − α_k ∇f(x_k))

that can be solved by methods already discussed

Steepest descent method is very reliable: it can always make progress provided gradient is nonzero

But method is myopic in its view of function’s behavior, and resulting iterates can zigzag back and forth, making very slow progress toward solution

In general, convergence rate of steepest descent is only linear, with constant factor that can be arbitrarily close to 1


Example: Steepest Descent

Use steepest descent method to minimize

    f(x) = 0.5 x1² + 2.5 x2²

Gradient is given by

    ∇f(x) = [ x1 ;  5x2 ]

Taking x0 = [5 ; 1], we have ∇f(x0) = [5 ; 5]

Performing line search along negative gradient direction,

    min over α0 of  f(x0 − α0 ∇f(x0))

exact minimum along line is given by α0 = 1/3, so next approximation is x1 = [3.333 ; −0.667]

Example, continued

    x_k               f(x_k)    ∇f(x_k)
    5.000    1.000    15.000    5.000    5.000
    3.333   −0.667     6.667    3.333   −3.333
    2.222    0.444     2.963    2.222    2.222
    1.481   −0.296     1.317    1.481   −1.481
    0.988    0.198     0.585    0.988    0.988
    0.658   −0.132     0.260    0.658   −0.658
    0.439    0.088     0.116    0.439    0.439
    0.293   −0.059     0.051    0.293   −0.293
    0.195    0.039     0.023    0.195    0.195
    0.130   −0.026     0.010    0.130   −0.130
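A short sketch reproducing this table (my own code): for a quadratic f(x) = ½ xᵀHx, the exact line-search step along −∇f is α_k = (gᵀg)/(gᵀHg), which gives α0 = 1/3 above.

```python
# Hedged sketch: steepest descent with exact line search for f(x) = 0.5*x1^2 + 2.5*x2^2.
import numpy as np

H = np.diag([1.0, 5.0])                    # f(x) = 0.5 * x^T H x
grad = lambda x: H @ x

x = np.array([5.0, 1.0])
for _ in range(10):
    g = grad(x)
    alpha = (g @ g) / (g @ H @ g)          # exact minimizer along -g (quadratic case only)
    x = x - alpha * g
    print(f"{x[0]:7.3f} {x[1]:7.3f}   f = {0.5 * x @ H @ x:7.3f}")
# Expected to reproduce the slowly converging, zigzagging iterates in the table above.
```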


Newton’s Method

Broader view can be obtained by local quadratic approximation, which is equivalent to Newton’s method

In multidimensional optimization, we seek zero of gradient, so Newton iteration has form

    x_{k+1} = x_k − H_f(x_k)⁻¹ ∇f(x_k)

where H_f(x) is Hessian matrix of second partial derivatives of f,

    {H_f(x)}_ij = ∂²f(x) / ∂xᵢ∂xⱼ

Newton’s Method, continued

Do not explicitly invert Hessian matrix, but instead solve linear system

    H_f(x_k) s_k = −∇f(x_k)

for Newton step s_k, then take as next iterate

    x_{k+1} = x_k + s_k

Convergence rate of Newton’s method for minimization is normally quadratic

As usual, Newton’s method is unreliable unless started close enough to solution to converge

Example: Newton’s Method

Use Newton’s method to minimize

    f(x) = 0.5 x1² + 2.5 x2²

Gradient and Hessian are given by

    ∇f(x) = [ x1 ;  5x2 ]   and   H_f(x) = [ 1  0 ;  0  5 ]

Taking x0 = [5 ; 1], we have ∇f(x0) = [5 ; 5]

Linear system for Newton step is

    [ 1  0 ;  0  5 ] s0 = [ −5 ;  −5 ]

so x1 = x0 + s0 = [5 ; 1] + [−5 ; −1] = [0 ; 0], which is exact solution for this problem, as expected for quadratic function
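The same computation in code (my sketch), using a linear solve for the Newton step rather than an explicit inverse, as the previous slide recommends:

```python
# Hedged sketch: one Newton step for f(x) = 0.5*x1^2 + 2.5*x2^2, solving H s = -grad f.
import numpy as np

grad = lambda x: np.array([x[0], 5.0 * x[1]])
hess = lambda x: np.array([[1.0, 0.0], [0.0, 5.0]])

x0 = np.array([5.0, 1.0])
s0 = np.linalg.solve(hess(x0), -grad(x0))   # Newton step from H_f(x0) s0 = -grad f(x0)
x1 = x0 + s0
print(x1)                                    # expected [0, 0]: exact minimizer in one step for a quadratic
```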

Newton’s Method, continued

In principle, line search parameter is unnecessary with Newton’s method, since quadratic model determines length, as well as direction, of step to next approximate solution

When started far from solution, however, it may still be advisable to perform line search along direction of Newton step s_k to make method more robust (damped Newton)

Once iterates are near solution, then α_k = 1 should suffice for subsequent iterations

Newton’s Method, continued

If objective function f has continuous second partial derivatives, then Hessian matrix H_f is symmetric, and near minimum it is positive definite

Thus, linear system for step to next iterate can be solved in only about half of work required for LU factorization

Far from minimum, H_f(x_k) may not be positive definite, so Newton step s_k may not be descent direction for function, i.e., we may not have

    ∇f(x_k)ᵀ s_k < 0

In this case, alternative descent direction can be computed, such as negative gradient or direction of negative curvature, and then perform line search