Optimization Problems, One-Dimensional Optimization, and Multi-Dimensional Optimization
Transcript of lecture slides (Michael T. Heath, Scientific Computing)
Outline
1 Optimization Problems
2 One-Dimensional Optimization
3 Multi-Dimensional Optimization
Michael T. Heath Scientific Computing 2 / 74
Optimization
Given function f : R^n → R and set S ⊆ R^n, find x* ∈ S such that f(x*) ≤ f(x) for all x ∈ S
x* is called minimizer or minimum of f
It suffices to consider only minimization, since maximum of f is minimum of −f
Objective function f is usually differentiable, and may be linear or nonlinear
Constraint set S is defined by system of equations and inequalities, which may be linear or nonlinear
Points x ∈ S are called feasible points
If S = R^n, problem is unconstrained
Optimization Problems
General continuous optimization problem:
min f(x) subject to g(x) = 0 and h(x) ≤ 0
where f : R^n → R, g : R^n → R^m, h : R^n → R^p
Linear programming: f, g, and h are all linear
Nonlinear programming: at least one of f, g, and h is nonlinear
Examples: Optimization Problems
Minimize weight of structure subject to constraint on its strength, or maximize its strength subject to constraint on its weight
Minimize cost of diet subject to nutritional constraints
Minimize surface area of cylinder subject to constraint on its volume:
min over x1, x2 of f(x1, x2) = 2π x1 (x1 + x2)
subject to g(x1, x2) = π x1² x2 − V = 0
where x1 and x2 are radius and height of cylinder, and V is required volume
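As a concrete check, here is a minimal Python sketch (not from the slides) that eliminates the volume constraint, giving A(r) = 2πr² + 2V/r, and minimizes it by bisecting A′(r); the known optimal proportion for a closed cylinder is height equal to diameter, h = 2r:

```python
import math

def min_cylinder_area(V, lo=1e-3, hi=10.0, tol=1e-10):
    """Minimize surface area A(r) = 2*pi*r*(r + h) of a closed cylinder of
    volume V. Eliminating the constraint pi*r^2*h = V gives
    A(r) = 2*pi*r^2 + 2*V/r; find the root of A'(r) by bisection."""
    dA = lambda r: 4.0 * math.pi * r - 2.0 * V / r**2   # A'(r)
    # A' is negative for small r and positive for large r, so bisect:
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if dA(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    r = 0.5 * (lo + hi)
    h = V / (math.pi * r**2)        # height recovered from the constraint
    return r, h

r, h = min_cylinder_area(V=1.0)
# Optimal closed cylinder has height equal to its diameter: h = 2r
```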
Local vs Global Optimization
x* ∈ S is global minimum if f(x*) ≤ f(x) for all x ∈ S
x* ∈ S is local minimum if f(x*) ≤ f(x) for all feasible x in some neighborhood of x*
Global Optimization
Finding, or even verifying, global minimum is difficult in general
Most optimization methods are designed to find local minimum, which may or may not be global minimum
If global minimum is desired, one can try several widely separated starting points and see if all produce same result
For some problems, such as linear programming, global optimization is more tractable
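The multistart heuristic above can be sketched in Python; the function (with two local minima) and the crude gradient-descent local method here are illustrative assumptions, not from the slides:

```python
# Multistart heuristic: run a local method from widely separated starts
# and keep the best result. Function and method are illustrative choices.
def f(x):
    return x**4 - 3 * x**2 + x      # assumed example with two local minima

def df(x):
    return 4 * x**3 - 6 * x + 1

def local_min(x, step=0.01, iters=5000):
    """Crude fixed-step gradient descent; finds a nearby local minimum."""
    for _ in range(iters):
        x -= step * df(x)
    return x

starts = [-2.0, -0.5, 0.5, 2.0]          # widely separated starting points
candidates = [local_min(x0) for x0 in starts]
best = min(candidates, key=f)            # global minimum is near x = -1.30
```

Different starts converge to different local minima; comparing their function values picks out the (likely) global one.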
Existence of Minimum
If f is continuous on closed and bounded set S ⊆ R^n, then f has global minimum on S
If S is not closed or is unbounded, then f may have no local or global minimum on S
Continuous function f on unbounded set S ⊆ R^n is coercive if
lim_{||x|| → ∞} f(x) = +∞
i.e., f(x) must be large whenever ||x|| is large
If f is coercive on closed, unbounded set S ⊆ R^n, then f has global minimum on S
An Example of a Coercive Function
[Figure: level sets (contours, isosurfaces) of a coercive function, which goes to ∞ as ||x|| → ∞]
Level Sets
Level set for function f : S ⊆ R^n → R is set of all points in S for which f has some given constant value
For given γ ∈ R, sublevel set is
L_γ = {x ∈ S : f(x) ≤ γ}
If continuous function f on S ⊆ R^n has nonempty sublevel set that is closed and bounded, then f has global minimum on S
If S is unbounded, then f is coercive on S if, and only if, all of its sublevel sets are bounded
(No open contours…)
Uniqueness of Minimum
Set S ⊆ R^n is convex if it contains line segment between any two of its points
Function f : S ⊆ R^n → R is convex on convex set S if its graph along any line segment in S lies on or below chord connecting function values at endpoints of segment
Any local minimum of convex function f on convex set S ⊆ R^n is global minimum of f on S
Any local minimum of strictly convex function f on convex set S ⊆ R^n is unique global minimum of f on S
Convex Domains and Convex Functions
First-Order Optimality Condition
For function of one variable, one can find extremum by differentiating function and setting derivative to zero
Generalization to function of n variables is to find critical point, i.e., solution of nonlinear system
∇f(x) = 0
where ∇f(x) is gradient vector of f, whose ith component is ∂f(x)/∂x_i
For continuously differentiable f : S ⊆ R^n → R, any interior point x* of S at which f has local minimum must be critical point of f
But not all critical points are minima: they can also be maxima or saddle points
• Saddle points in higher dimensions are more common than zero-slope inflection points in 1D
[Figure: contour plot; note the open contours]
• We use the solution techniques of Chapter 5 to solve this nonlinear system.
Second-Order Optimality Condition
For twice continuously differentiable f : S ⊆ R^n → R, we can distinguish among critical points by considering Hessian matrix H_f(x) defined by
{H_f(x)}_ij = ∂²f(x) / (∂x_i ∂x_j)
which is symmetric
At critical point x*, if H_f(x*) is
  positive definite, then x* is minimum of f
  negative definite, then x* is maximum of f
  indefinite, then x* is saddle point of f
  singular, then various pathological situations are possible
Classifying Critical Points
• Example 6.5 from the text:
f(x) = 2x1³ + 3x1² + 12x1x2 + 3x2² − 6x2 + 6.
• Gradient:
∇f(x) = [6x1² + 6x1 + 12x2,  12x1 + 6x2 − 6]ᵀ.
• Hessian is symmetric:
H_f = [∂²f/∂x1∂x1  ∂²f/∂x1∂x2; ∂²f/∂x1∂x2  ∂²f/∂x2∂x2] = [12x1 + 6  12; 12  6].
• Critical points, solutions to ∇f(x) = 0, are
x* = [1, −1]ᵀ and x* = [2, −3]ᵀ.
• At first critical point,
H_f(1, −1) = [18  12; 12  6],
which does not have all positive eigenvalues (it is indefinite), so [1, −1]ᵀ is a saddle point. Hessian is not SPD.
• At second critical point,
H_f(2, −3) = [30  12; 12  6],
which does have all positive eigenvalues (it is positive definite), so [2, −3]ᵀ is a local minimum. Hessian is SPD.
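The example can be checked numerically. The sketch below (mine, not the text's) applies Newton's method to the nonlinear system ∇f(x) = 0, as suggested, and classifies each critical point from the eigenvalues of the 2×2 Hessian:

```python
import math

# Gradient and Hessian of f(x) = 2*x1^3 + 3*x1^2 + 12*x1*x2 + 3*x2^2 - 6*x2 + 6
def grad(x1, x2):
    return (6*x1**2 + 6*x1 + 12*x2, 12*x1 + 6*x2 - 6)

def hess(x1, x2):
    return ((12*x1 + 6, 12.0), (12.0, 6.0))

def newton(x1, x2, iters=20):
    """Newton's method for the system grad f = 0 (techniques of Chapter 5)."""
    for _ in range(iters):
        g1, g2 = grad(x1, x2)
        (a, b), (c, d) = hess(x1, x2)
        det = a*d - b*c
        # Solve H * s = -g for the Newton step s by Cramer's rule (2x2)
        s1 = (-g1*d + g2*b) / det
        s2 = (-a*g2 + c*g1) / det
        x1, x2 = x1 + s1, x2 + s2
    return x1, x2

def eigenvalues_2x2(H):
    (a, b), (c, d) = H
    t, disc = a + d, math.sqrt((a - d)**2 + 4*b*c)
    return (t - disc) / 2, (t + disc) / 2

p = newton(0.5, -0.5)   # converges to the saddle point (1, -1)
q = newton(2.5, -2.5)   # converges to the local minimum (2, -3)
```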
Example of Singular Hessian in 1D
• Consider
f(x) = (x − 1)⁴
∇f = df/dx = 4(x − 1)³ = 0 when x = x* = 1.
H = d²f/dx² = 12(x − 1)² = 0 at x = x*.
Here, the Hessian is singular at x = x*, and x* is a minimizer of f.
• However, if
f(x) = (x − 1)³
∇f = df/dx = 3(x − 1)² = 0 when x = x* = 1.
H = d²f/dx² = 6(x − 1) = 0 at x = x*.
Here, the Hessian is singular at x = x*, and x* is not a minimizer of f.
Constrained Optimality
If problem is constrained, only feasible directions are relevant
For equality-constrained problem
min f(x) subject to g(x) = 0
where f : R^n → R and g : R^n → R^m, with m ≤ n, necessary condition for feasible point x* to be solution is that negative gradient of f lie in space spanned by constraint normals,
−∇f(x*) = J_g^T(x*) λ
where J_g is Jacobian matrix of g, and λ is vector of Lagrange multipliers
This condition says we cannot reduce objective function without violating constraints
Constrained Optimality, continued
Lagrangian function L : R^{n+m} → R is defined by
L(x, λ) = f(x) + λᵀ g(x)
Its gradient is given by
∇L(x, λ) = [∇f(x) + J_g^T(x) λ; g(x)]
Its Hessian is given by
H_L(x, λ) = [B(x, λ)  J_g^T(x); J_g(x)  O]
where
B(x, λ) = H_f(x) + Σ_{i=1}^m λ_i H_{g_i}(x)
Constrained Optimality, continued
Together, necessary condition and feasibility imply critical point of Lagrangian function,
∇L(x, λ) = [∇f(x) + J_g^T(x) λ; g(x)] = 0
Hessian of Lagrangian is symmetric, but not positive definite, so critical point of L is saddle point rather than minimum or maximum
Critical point (x*, λ*) of L is constrained minimum of f if B(x*, λ*) is positive definite on null space of J_g(x*)
If columns of Z form basis for null space, then test projected Hessian ZᵀBZ for positive definiteness
Equality Constraints: Case g(x) a Scalar
min f(x) subject to g(x) = 0.
• Lagrangian:
L(x, λ) := f(x) + λ g(x),   λ := Lagrange multiplier
• First-order conditions:
∇L = [∂L/∂x_i; ∂L/∂λ] = [∇f + λ∇g; g] = [0; 0]
at critical point (x*, λ*).
• Note, ∇f(x*) ≠ 0.
• Instead, ∇f(x*) = −λ* ∇g(x*).
• The gradient of f at x* is a multiple of the gradient of g.
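A quick numerical check (mine, not from the slides) of this condition for the earlier cylinder example, whose optimum is known to satisfy h = 2r:

```python
import math

# Check -grad f(x*) = lam * grad g(x*) for the cylinder problem:
# f(x1, x2) = 2*pi*x1*(x1 + x2),  g(x1, x2) = pi*x1^2*x2 - V
V = 1.0
r = (V / (2 * math.pi)) ** (1.0 / 3.0)   # optimal radius, for which h = 2r
h = 2.0 * r

grad_f = (2 * math.pi * (2 * r + h), 2 * math.pi * r)   # partials of f
grad_g = (2 * math.pi * r * h, math.pi * r**2)          # partials of g

lam = -grad_f[0] / grad_g[0]             # multiplier from first component
residual = max(abs(grad_f[i] + lam * grad_g[i]) for i in range(2))
# residual ~ 0: grad f is indeed a multiple of grad g at x*
```

For equality constraints the sign of λ is unrestricted; here λ = −2/r.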
Constrained Optimality Example, g(x) a Scalar
❑ Comment on: “This condition says we cannot reduce objective function without violating constraints.”
❑ A key point is that, at x*, ∇f is parallel to ∇g
❑ If there are multiple constraints, then ∇f lies in span{∇g_k}, i.e., −∇f = J_g^T λ
[Figure: contour g(x) = 0 with vectors ∇g and −∇f]
Constrained Optimality Example, g(x) a Scalar
❑ Here, we see the gradients of f and g at a point that is not x*.
❑ Clearly, we can make more progress in reducing f(x) by moving along the g(x) = 0 contour until ∇f is parallel to ∇g.
[Figure: contour g(x) = 0 with vectors ∇g and −∇f, and the point x*]
Example: Two Equality Constraints
min f(x) subject to g1(x) = 0 and g2(x) = 0.
• Lagrangian:
L(x, λ) := f(x) + λ1 g1(x) + λ2 g2(x).
• First-order conditions:
∇L = [∂L/∂x_i; ∂L/∂λ_k] = [∂f/∂x_i + Σ_k λ_k ∂g_k/∂x_i; (g1; g2)] = [∇f + J_g^T λ; g] = 0
at critical point (x*, λ*).
• Note: null space of J_g(x*) is the set of vectors tangent to the constraint surface (or surfaces, if more than one constraint g is present).
Constrained Optimality, continued
If inequalities are present, then KKT optimality conditions also require nonnegativity of Lagrange multipliers corresponding to inequalities, and complementarity condition
Karush-Kuhn-Tucker
∇_x L(x*, λ*) = 0,   (1)
g(x*) = 0,   (2)
h(x*) ≤ 0,   (3)
λ*_i ≥ 0,  i = m+1, …, m+p   (inequality Lagrange multipliers)   (4)
λ*_i h_i(x*) = 0,  i = m+1, …, m+p   (complementarity condition)   (5)
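A minimal illustration of conditions (1)–(5) on a hypothetical one-variable problem (my example, not the text's): min x² subject to h(x) = 1 − x ≤ 0, whose solution x* = 1 has an active constraint:

```python
# Toy inequality-constrained problem (illustrative assumption):
#   min f(x) = x^2  subject to  h(x) = 1 - x <= 0   (i.e., x >= 1)
# The unconstrained minimum x = 0 is infeasible, so the constraint is
# active and the solution is x* = 1, with lam* from grad_x L = 0.

x_star = 1.0
h = 1.0 - x_star            # h(x*) = 0: constraint active
lam = 2.0 * x_star          # from dL/dx = 2x - lam = 0

kkt_stationarity = 2.0 * x_star - lam   # (1) grad_x L(x*, lam*) = 0
kkt_feasible = h <= 0.0                 # (3) h(x*) <= 0
kkt_sign = lam >= 0.0                   # (4) lam* >= 0
kkt_complementarity = lam * h           # (5) lam* h(x*) = 0
```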
Inequality Constraint and Sign of λ
❑ A key point is that, at x*, ∇f is parallel to ∇h
❑ If there are multiple constraints, then ∇f lies in span{∇h_k}, i.e., −∇f = J_h^T λ
[Figure: region h(x) < 0 with boundary h(x) = 0, vectors −∇h and −∇f at x*]
Inequality Constraint and Sign of λ
[Figure: region h(x) < 0 with boundary h(x) = 0, vectors −∇h and −∇f at x*]
• If constraint is active (h(x*) = 0), then ∇f and ∇h point in opposite directions.
• ∇f + λ* ∇h = 0.
• λ* > 0.
Sensitivity and Conditioning
Function minimization and equation solving are closely related problems, but their sensitivities differ
In one dimension, absolute condition number of root x* of equation f(x) = 0 is 1/|f′(x*)|, so if |f(x̂)| ≤ ε, then |x̂ − x*| may be as large as ε/|f′(x*)|
For minimizing f, Taylor series expansion
f(x̂) = f(x* + h) = f(x*) + f′(x*) h + (1/2) f′′(x*) h² + O(h³)
shows that, since f′(x*) = 0, if |f(x̂) − f(x*)| ≤ ε, then |x̂ − x*| may be as large as √(2ε/|f′′(x*)|)
Thus, based on function values alone, minima can be computed to only about half precision
Consider f = 1 − cos(x)
❑ As you zoom in, f(x) → 0 quadratically, but x → x* only linearly.
❑ So, if |f(x) − f(x*)| ≈ ε, then ||x − x*|| = O(ε^{1/2})
[Figure: successive zooms near x* = 0; the interval where f is within ε of the minimum has half-width ≈ ε^{1/2}]
Sensitivity of Minimization
Terminate search when
|f(x* + Δx) − f(x*)| ≤ ε.
Taylor series:
f(x* + Δx) = f(x*) + Δx f′(x*) + (Δx²/2) f′′(x*) + O(Δx³)
Since f′(x*) = 0,
Δx²/2 ≈ (f(x* + Δx) − f(x*)) / f′′(x*)
|Δx| ≈ √(2ε / |f′′(x*)|)
• So, if ε ≈ ε_M, can expect accuracy to approximately √ε_M.
• Question: Is a small f′′ good? Or bad?
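A numeric illustration of this √ε effect (my sketch, not from the slides), using f(x) = 1 − cos x from the figures above, for which f′′(x*) = 1:

```python
import math

# f(x) = 1 - cos(x): minimum f(x*) = 0 at x* = 0, with f''(x*) = 1.
# March away from x* and record how far x gets while f stays within eps.
eps = 1e-12
x = 0.0
while 1.0 - math.cos(x) <= eps:
    x += 1e-8
# Half-width predicted by |dx| ~ sqrt(2*eps/|f''(x*)|)
predicted = math.sqrt(2.0 * eps)
```

The measured half-width matches the prediction: a function-value tolerance of 10⁻¹² locates x* only to about 10⁻⁶.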
Example: Minimize Cost over Some Design Parameter, x
[Figure: cost vs. design parameter, showing a broad minimum]
❑ While having small f′′(x*) makes it difficult to find the optimal x*, it in fact is a happy circumstance, because it gives you liberty to add additional constraints at no cost.
❑ So, often, having a broad minimum is good.
Methods for One-Dimensional Problems
❑ Demonstrate basic techniques: bracketing, convergence rates
❑ Useful for line search in multi-dimensional problems
Unimodality
For minimizing function of one variable, we need “bracket” for solution analogous to sign change for nonlinear equation
Real-valued function f is unimodal on interval [a, b] if there is unique x* ∈ [a, b] such that f(x*) is minimum of f on [a, b], and f is strictly decreasing for x ≤ x*, strictly increasing for x* ≤ x
Unimodality enables discarding portions of interval based on sample function values, analogous to interval bisection
Golden Section Search
Suppose f is unimodal on [a, b], and let x1 and x2 be two points within [a, b], with x1 < x2
Evaluating and comparing f(x1) and f(x2), we can discard either (x2, b] or [a, x1), with minimum known to lie in remaining subinterval
To repeat process, we need to compute only one new function evaluation
To reduce length of interval by fixed fraction at each iteration, each new pair of points must have same relationship with respect to new interval that previous pair had with respect to previous interval
Golden Section Search, continued
To accomplish this, we choose relative positions of two points as τ and 1 − τ, where τ² = 1 − τ, so
τ = (√5 − 1)/2 ≈ 0.618 and 1 − τ ≈ 0.382
Whichever subinterval is retained, its length will be τ relative to previous interval, and interior point retained will be at position either τ or 1 − τ relative to new interval
To continue iteration, we need to compute only one new function value, at complementary point
This choice of sample points is called golden section search
Golden section search is safe, but convergence rate is only linear, with constant C ≈ 0.618
Requires Unimodality!
Golden Section Search
❑ f(x) unimodal on [a, b].
❑ Subdivide [a, b] into 3 parts.
❑ If f1 > f2, discard (a, x1).
❑ Choose new point in larger of remaining two intervals: (x1, x2) or (x2, b).
❑ → x1 and x2 should be closer to center and not at 1/3, 2/3.
[Figure: points a, x1, x2, b on an axis; when f1 > f2, the new bracket has endpoints a′, x1′, x2′]
Golden Section Geometry
❑ Want new section [1 − τ, τ] to have same relation as [0, 1 − τ] to [0, 1].
[Figure: points 0, 0.382, 0.618, 1 on an axis, with segments of length 1 − τ and τ marked]
(τ − (1 − τ)) / τ = 1 − τ
2τ − 1 = τ − τ²
τ² + τ − 1 = 0
τ = (−1 + √(1 + 4))/2 = (√5 − 1)/2 ≈ 0.618
Golden Section Search, continued
τ = (√5 − 1)/2
x1 = a + (1 − τ)(b − a); f1 = f(x1)
x2 = a + τ(b − a); f2 = f(x2)
while ((b − a) > tol) do
  if (f1 > f2) then
    a = x1
    x1 = x2; f1 = f2
    x2 = a + τ(b − a); f2 = f(x2)
  else
    b = x2
    x2 = x1; f2 = f1
    x1 = a + (1 − τ)(b − a); f1 = f(x1)
  end
end
Example: Golden Section Search
Use golden section search to minimize
f(x) = 0.5 − x exp(−x²)
Example, continued
x1 f1 x2 f2
0.764 0.074 1.236 0.232
0.472 0.122 0.764 0.074
0.764 0.074 0.944 0.113
0.652 0.074 0.764 0.074
0.584 0.085 0.652 0.074
0.652 0.074 0.695 0.071
0.695 0.071 0.721 0.071
0.679 0.072 0.695 0.071
0.695 0.071 0.705 0.071
0.705 0.071 0.711 0.071
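The pseudocode and the table above can be reproduced by a direct Python translation (the starting interval [0, 2] is an assumption inferred from the table's first row):

```python
import math

def golden_section(f, a, b, tol=1e-3):
    """Golden section search, translated from the slide's pseudocode."""
    tau = (math.sqrt(5.0) - 1.0) / 2.0
    x1 = a + (1 - tau) * (b - a); f1 = f(x1)
    x2 = a + tau * (b - a);       f2 = f(x2)
    while (b - a) > tol:
        if f1 > f2:
            a = x1
            x1, f1 = x2, f2
            x2 = a + tau * (b - a); f2 = f(x2)
        else:
            b = x2
            x2, f2 = x1, f1
            x1 = a + (1 - tau) * (b - a); f1 = f(x1)
    return 0.5 * (a + b)

f = lambda x: 0.5 - x * math.exp(-x * x)
x_min = golden_section(f, 0.0, 2.0)     # minimum at x* = 1/sqrt(2)
```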
Successive Parabolic Interpolation
Fit quadratic polynomial to three function values
Take minimum of quadratic to be new approximation to minimum of function
New point replaces oldest of three previous points and process is repeated until convergence
Convergence rate of successive parabolic interpolation is superlinear, with r ≈ 1.324
Example: Successive Parabolic Interpolation
Use successive parabolic interpolation to minimize
f(x) = 0.5 − x exp(−x²)
Example, continued
x_k      f(x_k)
0.000    0.500
0.600    0.081
1.200    0.216
0.754    0.073
0.721    0.071
0.692    0.071
0.707    0.071
Convergence is not monotone, but it is superlinear.
Successive parabolic interpolation is Newton’s method applied to a quadratic model of the data (r ≈ 1.324). We turn to Newton’s method next (r = 2).
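A Python sketch of the method (mine, not from the slides), starting from the table's three points 0.0, 0.6, 1.2 and replacing the oldest point each iteration:

```python
import math

def f(x):
    return 0.5 - x * math.exp(-x * x)

# Three starting points taken from the table above (oldest first)
pts = [0.0, 0.6, 1.2]
history = []
for _ in range(20):
    x0, x1, x2 = pts
    f0, f1, f2 = f(x0), f(x1), f(x2)
    # Vertex of the parabola interpolating the three points
    num = (x1 - x0)**2 * (f1 - f2) - (x1 - x2)**2 * (f1 - f0)
    den = (x1 - x0) * (f1 - f2) - (x1 - x2) * (f1 - f0)
    x_new = x1 - 0.5 * num / den
    history.append(x_new)
    pts = [x1, x2, x_new]       # new point replaces oldest of the three
    if abs(x_new - x2) < 1e-6:  # stop when successive iterates agree
        break
x_min = history[-1]
```

The first computed point is ≈ 0.754, matching the table, and the iterates converge superlinearly to x* = 1/√2 ≈ 0.7071.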
Matlab: Successive Parabolic Interpolation: Naive Bracketing
Convergence Behavior
❑ Question 1: What type of convergence is this?
❑ Question 2: What does this plot say about conditioning of the problem?
Matlab: Successive Parabolic Interpolation – Replace Oldest
Newton’s Method
Another local quadratic approximation is truncated Taylor series
f(x + h) ≈ f(x) + f'(x) h + (f''(x)/2) h^2
By differentiation, minimum of this quadratic function of h is given by h = -f'(x)/f''(x)
Suggests iteration scheme
x_{k+1} = x_k - f'(x_k)/f''(x_k)
which is Newton’s method for solving nonlinear equation f'(x) = 0
Newton’s method for finding a minimum normally has quadratic convergence rate, but must be started close enough to solution to converge
< interactive example >
Michael T. Heath Scientific Computing 27 / 74
Example: Newton’s Method
Use Newton’s method to minimize f(x) = 0.5 - x exp(-x^2)
First and second derivatives of f are given by
f'(x) = (2x^2 - 1) exp(-x^2)
and
f''(x) = 2x(3 - 2x^2) exp(-x^2)
Newton iteration for zero of f' is given by
x_{k+1} = x_k - (2x_k^2 - 1) / (2x_k(3 - 2x_k^2))
Using starting guess x_0 = 1, we obtain

x_k     f(x_k)
1.000   0.132
0.500   0.111
0.700   0.071
0.707   0.071
Michael T. Heath Scientific Computing 28 / 74
Matlab Example: newton.m
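The newton.m demo itself is not reproduced in this transcript; the iteration is short enough to check with a Python stand-in (illustrative, not Heath's code). Note the exponential factors cancel in f'/f'', leaving the rational update from the slide:

```python
import math

# Minimize f(x) = 0.5 - x*exp(-x^2) by finding a zero of f'(x)
# via Newton's method: x_{k+1} = x_k - f'(x_k)/f''(x_k).
def fprime(x):
    return (2 * x * x - 1) * math.exp(-x * x)

def fsecond(x):
    return 2 * x * (3 - 2 * x * x) * math.exp(-x * x)

x = 1.0
for _ in range(5):
    x = x - fprime(x) / fsecond(x)   # iterates: 0.5, 0.7, 0.707072, ...
print(round(x, 6))                   # 0.707107, i.e. 1/sqrt(2)
```

The first three iterates reproduce the table above; the doubling of correct digits per step is the quadratic convergence rate.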
Safeguarded Methods
As with nonlinear equations in one dimension, slow-but-sure and fast-but-risky optimization methods can be combined to provide both safety and efficiency
Most library routines for one-dimensional optimization are based on this hybrid approach
Popular combination is golden section search and successive parabolic interpolation, for which no derivatives are required
Michael T. Heath Scientific Computing 29 / 74
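This hybrid is exactly what Brent's method implements, and library routines expose it directly; for example, SciPy's minimize_scalar (shown here as an illustration; SciPy is not part of the lecture, and the bracket triple is my choice):

```python
import math
from scipy.optimize import minimize_scalar

f = lambda x: 0.5 - x * math.exp(-x * x)

# Brent's method: golden-section safety net + parabolic-interpolation speed.
# bracket=(a, b, c) requires f(b) < f(a) and f(b) < f(c).
res = minimize_scalar(f, bracket=(0.0, 0.6, 1.2), method='brent')
print(round(res.x, 4))   # 0.7071
```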
Methods for Multi-Dimensional Problems
Unconstrained Optimization, Nonlinear Least Squares, Constrained Optimization
Steepest Descent Method
Let f : R^n → R be real-valued function of n real variables
At any point x where gradient vector is nonzero, negative gradient, -∇f(x), points downhill toward lower values of f
In fact, -∇f(x) is locally direction of steepest descent: f decreases more rapidly along direction of negative gradient than along any other
Steepest descent method: starting from initial guess x_0, successive approximate solutions given by
x_{k+1} = x_k - α_k ∇f(x_k)
where α_k is line search parameter that determines how far to go in given direction
Michael T. Heath Scientific Computing 31 / 74
Steepest Descent, continued
Given descent direction, such as negative gradient, determining appropriate value for α_k at each iteration is one-dimensional minimization problem
min over α_k of f(x_k - α_k ∇f(x_k))
that can be solved by methods already discussed
Steepest descent method is very reliable: it can always make progress provided gradient is nonzero
But method is myopic in its view of function’s behavior, and resulting iterates can zigzag back and forth, making very slow progress toward solution
In general, convergence rate of steepest descent is only linear, with constant factor that can be arbitrarily close to 1
Michael T. Heath Scientific Computing 32 / 74
Example: Steepest Descent
Use steepest descent method to minimize
f(x) = 0.5 x_1^2 + 2.5 x_2^2

Gradient is given by ∇f(x) = [x_1, 5 x_2]^T

Taking x_0 = [5, 1]^T, we have ∇f(x_0) = [5, 5]^T

Performing line search along negative gradient direction,
min over α_0 of f(x_0 - α_0 ∇f(x_0))
exact minimum along line is given by α_0 = 1/3, so next approximation is x_1 = [3.333, -0.667]^T
Michael T. Heath Scientific Computing 33 / 74
Example, continued
x_k               f(x_k)    ∇f(x_k)
 5.000   1.000    15.000     5.000   5.000
 3.333  -0.667     6.667     3.333  -3.333
 2.222   0.444     2.963     2.222   2.222
 1.481  -0.296     1.317     1.481  -1.481
 0.988   0.198     0.585     0.988   0.988
 0.658  -0.132     0.260     0.658  -0.658
 0.439   0.088     0.116     0.439   0.439
 0.293  -0.059     0.051     0.293  -0.293
 0.195   0.039     0.023     0.195   0.195
 0.130  -0.026     0.010     0.130  -0.130
Michael T. Heath Scientific Computing 34 / 74
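The zigzag iterates can be reproduced with a short NumPy sketch (illustrative; for a quadratic f(x) = 0.5 x^T A x, the exact line-search parameter has the closed form α = g^T g / g^T A g, which is used here in place of a 1-D solver):

```python
import numpy as np

# f(x) = 0.5*x1^2 + 2.5*x2^2 = 0.5 * x^T A x  with  A = diag(1, 5)
A = np.diag([1.0, 5.0])

x = np.array([5.0, 1.0])
for _ in range(3):
    g = A @ x                        # gradient: [x1, 5*x2]
    alpha = (g @ g) / (g @ (A @ g))  # exact line search for a quadratic
    x = x - alpha * g
print(np.round(x, 3))                # [ 1.481 -0.296], matching the table
```

Each step scales the iterate by 2/3 while flipping the sign of the second component, which is the zigzag visible in the table.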
Example, continued
< interactive example >
Michael T. Heath Scientific Computing 35 / 74
Newton’s Method
Broader view can be obtained by local quadratic approximation, which is equivalent to Newton’s method
In multidimensional optimization, we seek zero of gradient, so Newton iteration has form
x_{k+1} = x_k - H_f^{-1}(x_k) ∇f(x_k)
where H_f(x) is Hessian matrix of second partial derivatives of f,
{H_f(x)}_ij = ∂^2 f(x) / (∂x_i ∂x_j)
Michael T. Heath Scientific Computing 36 / 74
Newton’s Method, continued
Do not explicitly invert Hessian matrix, but instead solve linear system
H_f(x_k) s_k = -∇f(x_k)
for Newton step s_k, then take as next iterate
x_{k+1} = x_k + s_k
Convergence rate of Newton’s method for minimization is normally quadratic
As usual, Newton’s method is unreliable unless started close enough to solution to converge
< interactive example >
Michael T. Heath Scientific Computing 37 / 74
Example: Newton’s Method
Use Newton’s method to minimize
f(x) = 0.5 x_1^2 + 2.5 x_2^2

Gradient and Hessian are given by
∇f(x) = [x_1, 5 x_2]^T  and  H_f(x) = [1 0; 0 5]

Taking x_0 = [5, 1]^T, we have ∇f(x_0) = [5, 5]^T

Linear system for Newton step is [1 0; 0 5] s_0 = [-5, -5]^T, so
x_1 = x_0 + s_0 = [5, 1]^T + [-5, -1]^T = [0, 0]^T, which is exact solution for this problem, as expected for quadratic function
Michael T. Heath Scientific Computing 38 / 74
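The example can be checked with NumPy (an illustration, not the lecture's code), solving H s = -∇f as the slide prescribes rather than forming H^{-1}:

```python
import numpy as np

# f(x) = 0.5*x1^2 + 2.5*x2^2: gradient [x1, 5*x2], constant Hessian diag(1, 5)
def grad(x):
    return np.array([x[0], 5.0 * x[1]])

H = np.diag([1.0, 5.0])

x = np.array([5.0, 1.0])
s = np.linalg.solve(H, -grad(x))   # Newton step: solve, don't invert
x = x + s
print(x)                           # [0. 0.] -- exact minimum in one step
```

One step suffices because the quadratic model is not an approximation here: it is the objective itself.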
Newton’s Method, continued
In principle, line search parameter is unnecessary with Newton’s method, since quadratic model determines length, as well as direction, of step to next approximate solution
When started far from solution, however, it may still be advisable to perform line search along direction of Newton step s_k to make method more robust (damped Newton)
Once iterates are near solution, then α_k = 1 should suffice for subsequent iterations
Michael T. Heath Scientific Computing 39 / 74
Newton’s Method, continued
If objective function f has continuous second partial derivatives, then Hessian matrix H_f is symmetric, and near minimum it is positive definite
Thus, linear system for step to next iterate can be solved by Cholesky factorization in only about half of work required for LU factorization
Far from minimum, H_f(x_k) may not be positive definite, so Newton step s_k may not be descent direction for function, i.e., we may not have
∇f(x_k)^T s_k < 0
In this case, alternative descent direction can be computed, such as negative gradient or direction of negative curvature, and line search performed along it
Michael T. Heath Scientific Computing 40 / 74
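That safeguard can be sketched as follows (illustrative Python, not from the lecture: the backtracking parameters, function names, and the nonconvex test function are my own choices). Attempting a Cholesky factorization doubles as the positive-definiteness test: if it fails, fall back to the negative gradient, then apply a crude backtracking line search.

```python
import numpy as np

def safeguarded_step(grad, hess, x):
    """Return a descent direction: Newton step if the Hessian is positive
    definite (checked via Cholesky), otherwise the negative gradient."""
    g, H = grad(x), hess(x)
    try:
        L = np.linalg.cholesky(H)          # raises LinAlgError if H not SPD
        s = np.linalg.solve(L.T, np.linalg.solve(L, -g))
    except np.linalg.LinAlgError:
        s = -g                             # fallback: steepest descent
    if g @ s >= 0:                         # insist on a descent direction
        s = -g
    return s

def minimize(f, grad, hess, x, iters=20):
    for _ in range(iters):
        s = safeguarded_step(grad, hess, x)
        alpha = 1.0
        while f(x + alpha * s) > f(x):     # crude backtracking line search
            alpha *= 0.5
        x = x + alpha * s
    return x

# Hypothetical test function, nonconvex near x1 = 0: the Hessian there is
# indefinite, so early iterations exercise the gradient fallback.
f = lambda x: x[0]**4 - 2*x[0]**2 + x[0] + 2.5*x[1]**2
grad = lambda x: np.array([4*x[0]**3 - 4*x[0] + 1, 5*x[1]])
hess = lambda x: np.array([[12*x[0]**2 - 4, 0.0], [0.0, 5.0]])

x = minimize(f, grad, hess, np.array([0.1, 1.0]))
print(np.round(grad(x), 6))   # ≈ [0, 0]: iterates reach a stationary point
```

Once the iterates enter the region where the Hessian is positive definite, the method reverts to pure Newton steps and converges quadratically.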