Optimization
Issues
What is optimization? What real-life situations give rise to optimization problems?
When is it easy to optimize? What are we trying to optimize?
What can cause problems when we try to optimize? What methods can we use to optimize?
One-Dimensional Minimization
Golden section search
Brent’s method
One-Dimensional Minimization
Goal: find min_{x in R} f(x).
Golden section search: successively narrow a bracket of lower and upper bounds on the minimum.
Initial bracketing: start with x1 < x2 < x3 where f2 is smaller than both f1 and f3.
Iteration: choose x4 somewhere in the larger of the two subintervals (say between x2 and x3). Two cases for f4:
• f4a (f4 > f2): new bracket [x1, x2, x4]
• f4b (f4 < f2): new bracket [x2, x4, x3]
Terminating condition: |x3 − x1| < τ for a chosen tolerance τ
Lower bound a, upper bound b, initial estimate x, with f(a) > f(x) < f(b). This condition guarantees that a minimum is contained somewhere within the interval.
On each iteration a new point x' is selected using one of the available algorithms. If the new point is a better estimate of the minimum, i.e. f(x') < f(x), then the current estimate of the minimum x is updated.
The new point also allows the size of the bounded interval to be reduced, by choosing the most compact set of points which satisfies the constraint f(a) > f(x) < f(b).
The interval is reduced until it encloses the true minimum to a desired tolerance.
This provides a best estimate of the location of the minimum and a rigorous error estimate.
From GSL
Golden Section Search
With bracket segments a, b, c such that a = b + c and b/a = c/b, the ratio satisfies (a/b)² = (a/b) + 1, giving
a/b = (1 + √5)/2 ≈ 1.618 (the golden ratio)
Guaranteed linear convergence: [x1,x3] / [x1,x4] = 1.618
[GSL] Choosing the golden section as the bisection ratio can be shown to provide the fastest convergence for this type of algorithm.
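A minimal sketch of this bracket-narrowing loop in Python (the function name and test function are illustrative, not from the slides):

```python
import math

def golden_minimize(f, a, b, tol=1e-8):
    """Golden-section search: shrink [a, b] while keeping the golden ratio."""
    invphi = (math.sqrt(5.0) - 1.0) / 2.0   # 1/phi = 0.618...
    x1 = b - invphi * (b - a)                # lower interior point
    x2 = a + invphi * (b - a)                # upper interior point
    f1, f2 = f(x1), f(x2)
    while (b - a) > tol:                     # terminating condition
        if f1 < f2:                          # minimum bracketed in [a, x2]
            b, x2, f2 = x2, x1, f1
            x1 = b - invphi * (b - a)
            f1 = f(x1)
        else:                                # minimum bracketed in [x1, b]
            a, x1, f1 = x1, x2, f2
            x2 = a + invphi * (b - a)
            f2 = f(x2)
    return 0.5 * (a + b)

print(golden_minimize(lambda x: (x - 2.0) ** 2, 0.0, 5.0))  # ~ 2.0
```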
Golden Section (reference)
Fibonacci Search (ref)
lim_{k→∞} F_{k+1}/F_k = (1 + √5)/2
F_i: 0, 1, 1, 2, 3, 5, 8, 13, …
Related…
Parabolic Interpolation (Brent)
Brent Details (From GSL)
The minimum of the parabola is taken as a guess for the minimum.
If it lies within the bounds of the current interval then the interpolating point is accepted, and used to generate a smaller interval.
If the interpolating point is not accepted then the algorithm falls back to an ordinary golden section step.
The full details of Brent's method include some additional checks to improve convergence.
Brent (details)
The abscissa x that is the minimum of a parabola through three points (a,f(a)), (b,f(b)), (c,f(c))
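The standard closed form for this abscissa (as given, e.g., in Numerical Recipes, eq. 10.2.1) is:

$$
x \;=\; b \;-\; \frac{1}{2}\,
\frac{(b-a)^2\,[f(b)-f(c)] \;-\; (b-c)^2\,[f(b)-f(a)]}
     {(b-a)\,[f(b)-f(c)] \;-\; (b-c)\,[f(b)-f(a)]}
$$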
Multi-Dimensional Minimization
Gradient Descent
Conjugate Gradient
Gradient and Hessian
Objective function f: Rⁿ → R, assumed to be of class C².
Gradient of f: ∇f = (∂f/∂x1, …, ∂f/∂xn)ᵀ
Hessian of f: (Hf)ij = ∂²f/∂xi∂xj
Optimality
A local minimum requires a zero gradient and a positive semi-definite Hessian.
This follows from Taylor's expansion of f around the candidate point.
For one-dimensional f(x): f′(x*) = 0 and f″(x*) ≥ 0.
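In symbols, the standard second-order conditions at a candidate minimum x*:

$$\nabla f(x^*) = 0, \qquad H(x^*) \succeq 0,$$
$$f(x^* + \delta) \;\approx\; f(x^*) \;+\; \nabla f(x^*)^{\mathsf T}\delta \;+\; \tfrac{1}{2}\,\delta^{\mathsf T} H(x^*)\,\delta .$$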
Multi-Dimensional Optimization
f: Rⁿ → R
Critical points: ∇f(x) = 0
Higher-dimensional root finding is no easier than minimization (it is generally more difficult).
Quasi-Newton Method
Taylor series of f(x) around xk, with B an approximation to the Hessian matrix:
  f(xk + Δx) ≈ f(xk) + ∇f(xk)ᵀ Δx + ½ Δxᵀ B Δx
The gradient of this approximation: ∇f(xk + Δx) ≈ ∇f(xk) + B Δx
Setting this gradient to zero provides the Newton step: Δx = −B⁻¹ ∇f(xk)
The various quasi-Newton methods (DFP, BFGS, Broyden) differ in their choice of the update for B.
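A quick way to try a quasi-Newton method is SciPy's BFGS implementation; the Rosenbrock test problem below is an illustrative stand-in, not from the slides:

```python
import numpy as np
from scipy.optimize import minimize, rosen, rosen_der

# BFGS builds its Hessian approximation B from successive gradient
# differences; SciPy ships an implementation plus the classic Rosenbrock
# test function and its gradient.
x0 = np.array([-1.2, 1.0])
res = minimize(rosen, x0, jac=rosen_der, method="BFGS")
print(res.x, res.nit)   # ~ [1. 1.] after a few dozen iterations
```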
Gradient Descent
Are successive search directions always orthogonal? Yes: an exact line search stops where the directional derivative vanishes, i.e. where the new gradient is orthogonal to the direction just used.
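A small numerical illustration of this, assuming a made-up SPD quadratic (A, b, and the starting point are not from the slides):

```python
import numpy as np

# Steepest descent with exact line search on f(x) = (1/2) x^T A x - b^T x.
# For quadratics the exact step has the closed form alpha = g.g / g.Ag.
A = np.array([[16.0, 2.0], [2.0, 10.0]])   # assumed test data
b = np.array([1.0, 2.0])

x = np.array([5.0, 7.0])
d_prev = None
for k in range(5):
    g = A @ x - b                      # gradient at the current iterate
    d = -g                             # steepest-descent direction
    alpha = (g @ g) / (g @ (A @ g))    # exact minimizer along d
    x = x + alpha * d
    if d_prev is not None:
        print(f"d_{k} . d_{k-1} = {d @ d_prev: .2e}")   # ~ 0: orthogonal
    d_prev = d
```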
Example: minimize (objective function and its minimum shown in the slide figure)
…
The gradient is perpendicular to level curves and surfaces.
(Proof: for any curve r(t) lying in a level set, f(r(t)) is constant, so by the chain rule ∇f(r(t)) · r′(t) = 0; the gradient is orthogonal to every tangent direction of the level set.)
Weakness of Gradient Descent
Narrow valley: the steepest-descent iterates zig-zag from one wall of the valley to the other, making slow progress along the floor.
Any function f(x) can be locally approximated by a quadratic:
  f(x) ≈ c − bᵀx + ½ xᵀA x,
where c = f(p), b = −∇f(p), and A is the Hessian matrix of f at p.
The conjugate gradient method works well on problems of this kind.
Conjugate Gradient
An iterative method for solving linear systems Ax=b, where A is symmetric and positive definite
Guaranteed to converge in at most n steps (in exact arithmetic), where n is the system size
Symmetric A is positive definite if (any of these equivalent conditions holds):
1. All n eigenvalues are positive
2. All n upper-left (leading principal) determinants are positive
3. All n pivots are positive
4. xᵀAx is positive except at x = 0
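A sketch of how these criteria can be checked numerically, with an assumed test matrix:

```python
import numpy as np

# Numerical checks of positive definiteness (criteria 1 and 4);
# the matrix A is an assumed example.
A = np.array([[16.0, 2.0], [2.0, 10.0]])
print(np.linalg.eigvalsh(A))   # criterion 1: all eigenvalues > 0
np.linalg.cholesky(A)          # succeeds only for positive definite A
x = np.array([3.0, -1.0])
print(x @ A @ x)               # criterion 4: > 0 for this (any) x != 0
```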
Details (from Wikipedia)
Two nonzero vectors u and v are conjugate w.r.t. A if uᵀAv = 0.
{pk} are n mutually conjugate directions. {pk} form a basis of Rn.
x*, the solution to Ax = b, can be expressed in this basis: x* = Σk αk pk
Therefore the task splits in two: find the pk's, then solve for the αk's (conjugacy gives αk = pkᵀb / pkᵀA pk).
The Iterative Method
Equivalent problem: find the minimizer of the quadratic function
  f(x) = ½ xᵀA x − bᵀx
Take the first basis vector p1 to be the negative gradient of f at x = x0; the other vectors in the basis will be conjugate to the gradient.
rk: the residual at the kth step, rk = b − A xk
Note that rk is the negative gradient of f at x = xk
The Algorithm
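A minimal Python sketch of the algorithm, following the standard update formulas implied above (the identifiers are mine):

```python
import numpy as np

def conjugate_gradient(A, b, x0=None, tol=1e-10):
    """Solve A x = b for symmetric positive definite A (standard CG)."""
    x = np.zeros_like(b, dtype=float) if x0 is None else np.asarray(x0, float)
    r = b - A @ x                  # residual = negative gradient at x
    p = r.copy()                   # first direction: steepest descent
    rr = r @ r
    for _ in range(len(b)):        # at most n steps in exact arithmetic
        Ap = A @ p
        alpha = rr / (p @ Ap)      # exact step length along p
        x = x + alpha * p
        r = r - alpha * Ap
        rr_new = r @ r
        if np.sqrt(rr_new) < tol:
            break
        p = r + (rr_new / rr) * p  # new direction, A-conjugate to the old ones
        rr = rr_new
    return x
```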
Example
Minimize f(x) = ½ xᵀA x + bᵀx with
  A = [[16, 2], [2, 10]],  b = [1, 2]ᵀ,
i.e. f(x, y) = 8x² + 2xy + 5y² + x + 2y.
Stationary point at [-1/26, -5/26]
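A quick numerical check of this stationary point, assuming the A and b reconstructed above:

```python
import numpy as np

# grad f = A x + b = 0  =>  x* = -A^{-1} b
A = np.array([[16.0, 2.0], [2.0, 10.0]])
b = np.array([1.0, 2.0])
print(np.linalg.solve(A, -b))       # [-0.03846 -0.19231]
print(np.array([-1/26, -5/26]))     # the slide's answer, for comparison
```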
Solving Linear Equations
The optimality condition seems to suggest that CG can be used to solve linear equations
CG is only applicable when A is symmetric positive definite. For an arbitrary linear system, solve the normal equations instead: AᵀA x = Aᵀb,
since AᵀA is symmetric and positive semi-definite for any A. But κ(AᵀA) = κ(A)²: slower convergence and worse accuracy.
BiCG (biconjugate gradient) is the approach to use for a general A.
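The squared condition number is easy to observe numerically (random matrix, purely illustrative):

```python
import numpy as np

# kappa(A^T A) = kappa(A)^2 in the 2-norm.
rng = np.random.default_rng(0)
A = rng.standard_normal((40, 40))
print(np.linalg.cond(A))            # kappa(A)
print(np.linalg.cond(A.T @ A))      # equals kappa(A)**2 up to round-off
print(np.linalg.cond(A) ** 2)
```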
Multidimensional Minimizer [GSL]
Conjugate gradient: Fletcher-Reeves, Polak-Ribiere
Quasi-Newton: Broyden-Fletcher-Goldfarb-Shanno (BFGS); utilizes a 2nd-order approximation
Steepest descent: inefficient (included for demonstration purposes)
Simplex algorithm (Nelder and Mead): works without derivatives
GSL Example
Objective function: paraboloid
  f(x, y) = p2 (x − p0)² + p3 (y − p1)² + p4
with parameters (p0, p1, p2, p3, p4) = (1, 2, 10, 20, 30), i.e.
  f(x, y) = 10 (x − 1)² + 20 (y − 2)² + 30
Starting from (5, 7)
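For comparison outside GSL, the same problem can be minimized with SciPy's conjugate-gradient method (a sketch; its iteration counts will differ from the slide's GSL run):

```python
import numpy as np
from scipy.optimize import minimize

# The slide's paraboloid: minimum value 30 at (1, 2), started from (5, 7).
def f(p):
    x, y = p
    return 10.0 * (x - 1.0) ** 2 + 20.0 * (y - 2.0) ** 2 + 30.0

def grad(p):
    x, y = p
    return np.array([20.0 * (x - 1.0), 40.0 * (y - 2.0)])

res = minimize(f, x0=[5.0, 7.0], jac=grad, method="CG")
print(res.x, res.nit)   # ~ [1. 2.]; iteration count depends on the variant
```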
Conjugate gradient: converges in 12 iterations
Steepest descent: converges in 158 iterations
[Solutions in Numerical Recipes]
Sec. 2.7 linbcg (biconjugate gradient): general A; references A implicitly through the user-supplied routine atimes
Sec. 10.6 frprmn (minimization); model test problem: spacetime, …