Optimization II: Unconstrained Multivariable
Transcript of lecture slides: Optimization II: Unconstrained Multivariable
Source: graphics.stanford.edu/.../lecture_slides/optimization_ii.pdf (2018-02-22)
Outline: Announcements | Multivariable Problems | Gradient Descent | Newton's Method | Quasi-Newton | Missing Details | AutoDiff
Optimization II: Unconstrained Multivariable
CS 205A:Mathematical Methods for Robotics, Vision, and Graphics
Doug James (and Justin Solomon)
CS 205A: Mathematical Methods Optimization II: Unconstrained Multivariable 1 / 24
Announcements
- Today's class: unconstrained optimization
  - Newton's method (uses Hessians)
  - BFGS method (no Hessians)
Unconstrained Multivariable Problems

minimize f(~x) over ~x ∈ Rⁿ, where f : Rⁿ → R
Recall
∇f(~x): "direction of steepest ascent"
Recall
−∇f(~x): "direction of steepest descent"
Observation
If ∇f(~x) ≠ ~0, then for sufficiently small α > 0,

f(~x − α∇f(~x)) ≤ f(~x)
Gradient Descent Algorithm
Iterate until convergence:
1. Define g_k(t) ≡ f(~x_k − t ∇f(~x_k))
2. Find t* ≥ 0 minimizing (or at least decreasing) g_k
3. Set ~x_{k+1} ≡ ~x_k − t* ∇f(~x_k)
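The three steps above can be sketched in a few lines of Python. The quadratic test function and the grid-based line search below are illustrative choices, not part of the slides:

```python
import numpy as np

def gradient_descent(f, grad, x0, ts=np.linspace(0, 1, 101), tol=1e-6, max_iter=1000):
    """Gradient descent with a crude line search over a fixed grid of step sizes."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:          # stopping condition: gradient ≈ 0
            break
        # Steps 1-2: minimize g_k(t) = f(x - t*grad) over the candidate t values
        t_star = min(ts, key=lambda t: f(x - t * g))
        x = x - t_star * g                   # step 3
    return x

# Illustrative example: f(x) = x^T A x / 2 - b^T x, minimized where A x = b
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
f = lambda x: 0.5 * x @ A @ x - b @ x
grad = lambda x: A @ x - b
x_min = gradient_descent(f, grad, np.zeros(2))
```

For this positive definite quadratic, the minimizer satisfies A x = b, so the result can be checked against a direct linear solve.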
Stopping Condition
∇f(~x_k) ≈ ~0

Don't forget: check optimality!
Line Search
g_k(t) ≡ f(~x_k − t ∇f(~x_k))

- One-dimensional optimization
- Don't have to minimize completely: Wolfe conditions
- Constant t: "learning rate"

Worth reading about: Nesterov's accelerated gradient descent
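A common way to "not minimize completely" is backtracking line search. The sketch below enforces only the sufficient-decrease (Armijo) condition, the first of the two Wolfe conditions; the curvature condition is omitted for brevity, and the example function is illustrative:

```python
import numpy as np

def backtracking(f, grad_x, x, direction, t0=1.0, c1=1e-4, shrink=0.5, max_halvings=50):
    """Shrink t until f(x + t*d) <= f(x) + c1 * t * (grad . d)  (Armijo condition)."""
    t = t0
    slope = grad_x @ direction           # directional derivative at x (negative for descent)
    fx = f(x)
    for _ in range(max_halvings):
        if f(x + t * direction) <= fx + c1 * t * slope:
            break
        t *= shrink
    return t

# Illustrative example on f(x) = ||x||^2 with the steepest-descent direction
f = lambda x: x @ x
x = np.array([1.0, 2.0])
g = 2 * x
t = backtracking(f, g, x, -g)
```

Here t0 = 1 overshoots (it reflects x through the origin), so one halving to t = 0.5 lands exactly on the minimizer.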
Gradient Descent (book)
Newton’s Method (again!)
f(~x) ≈ f(~x_k) + ∇f(~x_k)ᵀ(~x − ~x_k) + ½ (~x − ~x_k)ᵀ H_f(~x_k) (~x − ~x_k)

⟹ ~x_{k+1} = ~x_k − [H_f(~x_k)]⁻¹ ∇f(~x_k)

Consideration: What if H_f is not positive (semi-)definite?
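The Newton update above is a short loop in code. This is a minimal sketch; the test function f(x, y) = cosh(x) + cosh(y) is an illustrative choice (its Hessian is positive definite everywhere and its minimum is at the origin), sidestepping the indefiniteness issue just raised:

```python
import numpy as np

def newton(grad, hess, x0, tol=1e-10, max_iter=50):
    """Newton's method: x_{k+1} = x_k - [H_f(x_k)]^{-1} grad f(x_k)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            break
        x = x - np.linalg.solve(hess(x), g)  # solve the linear system; never invert H
    return x

# Illustrative example: f(v) = cosh(v_1) + cosh(v_2), minimized at the origin
grad = lambda v: np.sinh(v)
hess = lambda v: np.diag(np.cosh(v))
x_min = newton(grad, hess, np.array([1.0, -0.5]))
```

Note the quadratic convergence: each iteration roughly squares the error, so a handful of steps reach machine precision.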
Motivation
- ∇f might be hard to compute, but H_f is harder
- H_f might be dense: n² entries
Quasi-Newton Methods
Approximate derivatives to avoid expensive calculations

e.g., the secant method, Broyden's method, ...
Common Optimization Assumption
- ∇f known
- H_f unknown or hard to compute
Quasi-Newton Optimization
~x_{k+1} = ~x_k − α_k B_k⁻¹ ∇f(~x_k), where B_k ≈ H_f(~x_k)
Warning
<advanced material>
See Nocedal & Wright
Broyden-Style Update
B_{k+1}(~x_{k+1} − ~x_k) = ∇f(~x_{k+1}) − ∇f(~x_k)
Additional Considerations
- B_k should be symmetric
- B_k should be positive (semi-)definite
Davidon-Fletcher-Powell (DFP)
min_{B_{k+1}} ‖B_{k+1} − B_k‖
s.t. B_{k+1}ᵀ = B_{k+1}
     B_{k+1}(~x_{k+1} − ~x_k) = ∇f(~x_{k+1}) − ∇f(~x_k)
Observation
‖B_{k+1} − B_k‖ small does not mean ‖B_{k+1}⁻¹ − B_k⁻¹‖ is small

Idea: try to approximate B_k⁻¹ directly
BFGS Update
min_{H_{k+1}} ‖H_{k+1} − H_k‖
s.t. H_{k+1}ᵀ = H_{k+1}
     ~x_{k+1} − ~x_k = H_{k+1}(∇f(~x_{k+1}) − ∇f(~x_k))
State of the art!
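For a particular weighted norm, this minimization has a closed-form solution: the familiar rank-two BFGS update of the inverse-Hessian approximation (see Nocedal & Wright). A sketch, with an Armijo backtracking line search and an illustrative quadratic test problem (both choices mine, not from the slides):

```python
import numpy as np

def bfgs(f, grad, x0, tol=1e-8, max_iter=200):
    """BFGS: maintain H_k ≈ [H_f]^{-1} via the rank-two secant update."""
    x = np.asarray(x0, dtype=float)
    n = x.size
    H = np.eye(n)                             # initial inverse-Hessian approximation
    g = grad(x)
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        d = -H @ g                            # quasi-Newton search direction
        t, fx = 1.0, f(x)
        for _ in range(60):                   # Armijo backtracking
            if f(x + t * d) <= fx + 1e-4 * t * (g @ d):
                break
            t *= 0.5
        s = t * d                             # s_k = x_{k+1} - x_k
        x_new = x + s
        g_new = grad(x_new)
        y = g_new - g                         # y_k = grad f_{k+1} - grad f_k
        if y @ s > 1e-12:                     # curvature condition; skip update otherwise
            rho = 1.0 / (y @ s)
            I = np.eye(n)
            H = (I - rho * np.outer(s, y)) @ H @ (I - rho * np.outer(y, s)) \
                + rho * np.outer(s, s)        # rank-two BFGS update; keeps H symmetric PD
        x, g = x_new, g_new
    return x

# Illustrative example: quadratic f(x) = x^T A x / 2 - b^T x
A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
x_min = bfgs(lambda x: 0.5 * x @ A @ x - b @ x, lambda x: A @ x - b, np.zeros(2))
```

Skipping the update when yᵀs is not positive is one simple way to keep H positive definite, so each d is a descent direction.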
BFGS (book - typo)
Lots of Missing Details
- Choice of ‖·‖
- Limited-memory alternative (L-BFGS)
Automatic Differentiation
- Techniques to numerically evaluate the derivative of a function specified by a computer program.
- https://en.wikipedia.org/wiki/Automatic_differentiation
- Different from finite differences (an approximation) and symbolic differentiation.
- In Julia: http://www.juliadiff.org
- Example: ForwardDiff.jl (uses dual numbers): https://github.com/JuliaDiff/ForwardDiff.jl
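The dual-number idea behind ForwardDiff.jl can be sketched in Python. This is a toy single-variable version for illustration, not the library's actual implementation: each value carries a derivative component ε with ε² = 0, and arithmetic propagates it by the chain rule, giving exact derivatives rather than finite-difference approximations:

```python
import math

class Dual:
    """Dual number a + b*eps with eps^2 = 0: forward-mode AD in one variable."""
    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot
    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.dot + other.dot)
    __radd__ = __add__
    def __mul__(self, other):             # product rule on the derivative part
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val * other.val,
                    self.val * other.dot + self.dot * other.val)
    __rmul__ = __mul__
    def sin(self):                        # chain rule: (sin u)' = cos(u) * u'
        return Dual(math.sin(self.val), math.cos(self.val) * self.dot)

def derivative(f, x):
    """Seed the derivative part with 1 and read off f'(x)."""
    return f(Dual(x, 1.0)).dot

# f(x) = x^2 * sin(x), so f'(x) = 2x sin(x) + x^2 cos(x)
fprime = derivative(lambda x: x * x * x.sin(), 2.0)
```

Evaluating f once on Dual(2.0, 1.0) yields both f(2) and f'(2), with no step-size tuning and no symbolic expression swell.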