Optimization
Issues
What is optimization? What real-life situations give rise to optimization problems?
When is it easy to optimize? What are we trying to optimize?
What can cause problems when we try to optimize? What methods can we use to optimize?
One-Dimensional Minimization
Golden section search
Brent’s method
One-Dimensional Minimization
Goal: find min_{x in R} f(x).
Golden section search: successively narrow a bracket of lower and upper bounds on the minimum.
Initial bracketing: start with x1 < x2 < x3 where f2 is smaller than both f1 and f3.
Iteration: choose x4 somewhere in the larger of the two subintervals (say between x2 and x3). Two cases for f4:
• f4a (f4 > f2): new bracket [x1, x2, x4]
• f4b (f4 < f2): new bracket [x2, x4, x3]
Terminating condition: |x3 − x1| < τ for a chosen tolerance τ
Lower bound a, upper bound b, initial estimate x, with f(a) > f(x) < f(b). This condition guarantees that a minimum is contained somewhere within the interval.
On each iteration a new point x' is selected using one of the available algorithms. If the new point is a better estimate of the minimum, i.e. f(x') < f(x), then the current estimate of the minimum x is updated.
The new point also allows the size of the bounded interval to be reduced, by choosing the most compact set of points which satisfies the constraint f(a) > f(x) < f(b).
The interval is reduced until it encloses the true minimum to a desired tolerance.
This provides a best estimate of the location of the minimum and a rigorous error estimate.
From GSL
Golden Section Search
With bracket segments a, b, c such that a = b + c and b/a = c/b, the ratio satisfies (a/b)² = (a/b) + 1, giving
a/b = (1 + √5)/2 ≈ 1.618 (the golden ratio)
Guaranteed linear convergence: [x1,x3] / [x1,x4] = 1.618
[GSL] Choosing the golden section as the bisection ratio can be shown to provide the fastest convergence for this type of algorithm.
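A minimal sketch of this bracket-narrowing loop in Python (the function name and test function are illustrative, not from the slides):

```python
import math

def golden_minimize(f, a, b, tol=1e-8):
    """Golden-section search: shrink [a, b] while keeping the golden ratio."""
    invphi = (math.sqrt(5.0) - 1.0) / 2.0   # 1/phi = 0.618...
    x1 = b - invphi * (b - a)                # lower interior point
    x2 = a + invphi * (b - a)                # upper interior point
    f1, f2 = f(x1), f(x2)
    while (b - a) > tol:                     # terminating condition
        if f1 < f2:                          # minimum bracketed in [a, x2]
            b, x2, f2 = x2, x1, f1
            x1 = b - invphi * (b - a)
            f1 = f(x1)
        else:                                # minimum bracketed in [x1, b]
            a, x1, f1 = x1, x2, f2
            x2 = a + invphi * (b - a)
            f2 = f(x2)
    return 0.5 * (a + b)

print(golden_minimize(lambda x: (x - 2.0) ** 2, 0.0, 5.0))  # ~ 2.0
```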
Golden Section (reference)
Fibonacci Search (ref)
lim_{k→∞} F_{k+1}/F_k = (1 + √5)/2
F_i: 0, 1, 1, 2, 3, 5, 8, 13, …
Related…
Parabolic Interpolation (Brent)
Brent Details (From GSL)
The minimum of the parabola is taken as a guess for the minimum.
If it lies within the bounds of the current interval then the interpolating point is accepted, and used to generate a smaller interval.
If the interpolating point is not accepted then the algorithm falls back to an ordinary golden section step.
The full details of Brent's method include some additional checks to improve convergence.
Brent (details)
The abscissa x that is the minimum of a parabola through three points (a,f(a)), (b,f(b)), (c,f(c))
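The standard closed form for this abscissa (as given, e.g., in Numerical Recipes, eq. 10.2.1) is:

$$
x \;=\; b \;-\; \frac{1}{2}\,
\frac{(b-a)^2\,[f(b)-f(c)] \;-\; (b-c)^2\,[f(b)-f(a)]}
     {(b-a)\,[f(b)-f(c)] \;-\; (b-c)\,[f(b)-f(a)]}
$$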
Multi-Dimensional Minimization
Gradient Descent
Conjugate Gradient
Gradient and Hessian
Objective function f: Rⁿ → R, assumed to be of class C².
Gradient of f: ∇f = (∂f/∂x1, …, ∂f/∂xn)ᵀ
Hessian of f: (Hf)ij = ∂²f/∂xi∂xj
Optimality
A local minimum requires a zero gradient and a positive semi-definite Hessian.
This follows from Taylor's expansion of f around the candidate point.
For one-dimensional f(x): f′(x*) = 0 and f″(x*) ≥ 0.
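In symbols, the standard second-order conditions at a candidate minimum x*:

$$\nabla f(x^*) = 0, \qquad H(x^*) \succeq 0,$$
$$f(x^* + \delta) \;\approx\; f(x^*) \;+\; \nabla f(x^*)^{\mathsf T}\delta \;+\; \tfrac{1}{2}\,\delta^{\mathsf T} H(x^*)\,\delta .$$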
Multi-Dimensional Optimization
f: Rⁿ → R
Critical points: ∇f(x) = 0
Higher-dimensional root finding is no easier than minimization (it is generally more difficult).
Quasi-Newton Method
Taylor series of f(x) around xk, with B an approximation to the Hessian matrix:
  f(xk + Δx) ≈ f(xk) + ∇f(xk)ᵀ Δx + ½ Δxᵀ B Δx
The gradient of this approximation: ∇f(xk + Δx) ≈ ∇f(xk) + B Δx
Setting this gradient to zero provides the Newton step: Δx = −B⁻¹ ∇f(xk)
The various quasi-Newton methods (DFP, BFGS, Broyden) differ in their choice of the update for B.
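A quick way to try a quasi-Newton method is SciPy's BFGS implementation; the Rosenbrock test problem below is an illustrative stand-in, not from the slides:

```python
import numpy as np
from scipy.optimize import minimize, rosen, rosen_der

# BFGS builds its Hessian approximation B from successive gradient
# differences; SciPy ships an implementation plus the classic Rosenbrock
# test function and its gradient.
x0 = np.array([-1.2, 1.0])
res = minimize(rosen, x0, jac=rosen_der, method="BFGS")
print(res.x, res.nit)   # ~ [1. 1.] after a few dozen iterations
```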
Gradient Descent
Are successive search directions always orthogonal? Yes: an exact line search stops where the directional derivative vanishes, i.e. where the new gradient is orthogonal to the direction just used.
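A small numerical illustration of this, assuming a made-up SPD quadratic (A, b, and the starting point are not from the slides):

```python
import numpy as np

# Steepest descent with exact line search on f(x) = (1/2) x^T A x - b^T x.
# For quadratics the exact step has the closed form alpha = g.g / g.Ag.
A = np.array([[16.0, 2.0], [2.0, 10.0]])   # assumed test data
b = np.array([1.0, 2.0])

x = np.array([5.0, 7.0])
d_prev = None
for k in range(5):
    g = A @ x - b                      # gradient at the current iterate
    d = -g                             # steepest-descent direction
    alpha = (g @ g) / (g @ (A @ g))    # exact minimizer along d
    x = x + alpha * d
    if d_prev is not None:
        print(f"d_{k} . d_{k-1} = {d @ d_prev: .2e}")   # ~ 0: orthogonal
    d_prev = d
```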
Example: minimize (objective function and its minimum shown in the slide figure)
…
The gradient is perpendicular to level curves and surfaces.
(Proof: for any curve r(t) lying in a level set, f(r(t)) is constant, so by the chain rule ∇f(r(t)) · r′(t) = 0; the gradient is orthogonal to every tangent direction of the level set.)
Weakness of Gradient Descent
Narrow valley: the steepest-descent iterates zig-zag from one wall of the valley to the other, making slow progress along the floor.
Any function f(x) can be locally approximated by a quadratic:
  f(x) ≈ c − bᵀx + ½ xᵀA x,
where c = f(p), b = −∇f(p), and A is the Hessian matrix of f at p.
The conjugate gradient method works well on problems of this kind.
Conjugate Gradient
An iterative method for solving linear systems Ax=b, where A is symmetric and positive definite
Guaranteed to converge in at most n steps (in exact arithmetic), where n is the system size
Symmetric A is positive definite if (any of these equivalent conditions holds):
1. All n eigenvalues are positive
2. All n upper-left (leading principal) determinants are positive
3. All n pivots are positive
4. xᵀAx is positive except at x = 0
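A sketch of how these criteria can be checked numerically, with an assumed test matrix:

```python
import numpy as np

# Numerical checks of positive definiteness (criteria 1 and 4);
# the matrix A is an assumed example.
A = np.array([[16.0, 2.0], [2.0, 10.0]])
print(np.linalg.eigvalsh(A))   # criterion 1: all eigenvalues > 0
np.linalg.cholesky(A)          # succeeds only for positive definite A
x = np.array([3.0, -1.0])
print(x @ A @ x)               # criterion 4: > 0 for this (any) x != 0
```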
Details (from Wikipedia)
Two nonzero vectors u and v are conjugate w.r.t. A if uᵀAv = 0.
{pk} are n mutually conjugate directions. {pk} form a basis of Rn.
x*, the solution to Ax = b, can be expressed in this basis: x* = Σk αk pk
Therefore the task splits in two: find the pk's, then solve for the αk's (conjugacy gives αk = pkᵀb / pkᵀA pk).
The Iterative Method
Equivalent problem: find the minimizer of the quadratic function
  f(x) = ½ xᵀA x − bᵀx
Take the first basis vector p1 to be the negative gradient of f at x = x0; the other vectors in the basis will be conjugate to the gradient.
rk: the residual at the kth step, rk = b − A xk
Note that rk is the negative gradient of f at x = xk
The Algorithm
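A minimal Python sketch of the algorithm, following the standard update formulas implied above (the identifiers are mine):

```python
import numpy as np

def conjugate_gradient(A, b, x0=None, tol=1e-10):
    """Solve A x = b for symmetric positive definite A (standard CG)."""
    x = np.zeros_like(b, dtype=float) if x0 is None else np.asarray(x0, float)
    r = b - A @ x                  # residual = negative gradient at x
    p = r.copy()                   # first direction: steepest descent
    rr = r @ r
    for _ in range(len(b)):        # at most n steps in exact arithmetic
        Ap = A @ p
        alpha = rr / (p @ Ap)      # exact step length along p
        x = x + alpha * p
        r = r - alpha * Ap
        rr_new = r @ r
        if np.sqrt(rr_new) < tol:
            break
        p = r + (rr_new / rr) * p  # new direction, A-conjugate to the old ones
        rr = rr_new
    return x
```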
Example
Minimize f(x) = ½ xᵀA x + bᵀx with
  A = [[16, 2], [2, 10]],  b = [1, 2]ᵀ,
i.e. f(x, y) = 8x² + 2xy + 5y² + x + 2y.
Stationary point at [-1/26, -5/26]
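A quick numerical check of this stationary point, assuming the A and b reconstructed above:

```python
import numpy as np

# grad f = A x + b = 0  =>  x* = -A^{-1} b
A = np.array([[16.0, 2.0], [2.0, 10.0]])
b = np.array([1.0, 2.0])
print(np.linalg.solve(A, -b))       # [-0.03846 -0.19231]
print(np.array([-1/26, -5/26]))     # the slide's answer, for comparison
```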
Solving Linear Equations
The optimality condition seems to suggest that CG can be used to solve linear equations
CG is only applicable when A is symmetric positive definite. For an arbitrary linear system, solve the normal equations instead: AᵀA x = Aᵀb,
since AᵀA is symmetric and positive semi-definite for any A. But κ(AᵀA) = κ(A)²: slower convergence and worse accuracy.
BiCG (biconjugate gradient) is the approach to use for a general A.
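The squared condition number is easy to observe numerically (random matrix, purely illustrative):

```python
import numpy as np

# kappa(A^T A) = kappa(A)^2 in the 2-norm.
rng = np.random.default_rng(0)
A = rng.standard_normal((40, 40))
print(np.linalg.cond(A))            # kappa(A)
print(np.linalg.cond(A.T @ A))      # equals kappa(A)**2 up to round-off
print(np.linalg.cond(A) ** 2)
```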
Multidimensional Minimizer [GSL]
Conjugate gradient: Fletcher-Reeves, Polak-Ribiere
Quasi-Newton: Broyden-Fletcher-Goldfarb-Shanno (BFGS); utilizes a 2nd-order approximation
Steepest descent: inefficient (included for demonstration purposes)
Simplex algorithm (Nelder and Mead): works without derivatives
GSL Example
Objective function: paraboloid
  f(x, y) = p2 (x − p0)² + p3 (y − p1)² + p4
with parameters (p0, p1, p2, p3, p4) = (1, 2, 10, 20, 30), i.e.
  f(x, y) = 10 (x − 1)² + 20 (y − 2)² + 30
Starting from (5, 7)
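For comparison outside GSL, the same problem can be minimized with SciPy's conjugate-gradient method (a sketch; its iteration counts will differ from the slide's GSL run):

```python
import numpy as np
from scipy.optimize import minimize

# The slide's paraboloid: minimum value 30 at (1, 2), started from (5, 7).
def f(p):
    x, y = p
    return 10.0 * (x - 1.0) ** 2 + 20.0 * (y - 2.0) ** 2 + 30.0

def grad(p):
    x, y = p
    return np.array([20.0 * (x - 1.0), 40.0 * (y - 2.0)])

res = minimize(f, x0=[5.0, 7.0], jac=grad, method="CG")
print(res.x, res.nit)   # ~ [1. 2.]; iteration count depends on the variant
```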
Conjugate gradient: converges in 12 iterations
Steepest descent: converges in 158 iterations
[Solutions in Numerical Recipes]
Sec. 2.7 linbcg (biconjugate gradient): general A; references A implicitly through the user-supplied routine atimes
Sec. 10.6 frprmn (minimization); model test problem: spacetime, …