Benchmarking of quasi-Newton methods
Igor Sokolov
Optimization Class Project, MIPT
Introduction
The most well-known minimization technique for unconstrained problems is Newton's method. In each iteration, the step update is
x_{k+1} = x_k − (∇²f_k)^{−1} ∇f_k.
However, the inverse of the Hessian has to be computed in every iteration, which takes O(n³) operations. Moreover, in some applications the second derivatives may be unavailable. One fix to the problem is to use a finite-difference approximation to the Hessian.
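As a minimal sketch of this fix (helper names are illustrative, NumPy assumed), one can approximate the Hessian by forward differences of the gradient and take a Newton step without ever forming an explicit inverse:

```python
import numpy as np

def fd_hessian(grad, x, eps=1e-6):
    """Forward-difference approximation of the Hessian from gradient calls."""
    n = x.size
    g0 = grad(x)
    H = np.empty((n, n))
    for i in range(n):
        e = np.zeros(n)
        e[i] = eps
        H[:, i] = (grad(x + e) - g0) / eps   # i-th Hessian column
    return 0.5 * (H + H.T)                   # symmetrize

def newton_step(grad, x):
    """One Newton step; solve the linear system instead of inverting."""
    H = fd_hessian(grad, x)
    return x - np.linalg.solve(H, grad(x))

# Quadratic test problem: f(x) = 0.5 x^T A x - b^T x, minimizer solves A x = b
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
grad = lambda x: A @ x - b
x_star = newton_step(grad, np.zeros(2))
```

On a quadratic the finite-difference Hessian is exact, so a single step lands on the minimizer.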
We consider solving the nonlinear unconstrained minimization problem
min f(x), x ∈ R^n.
Let us consider the following quadratic model of the objective function:
m_k(p) = f_k + ∇f_k^T p + ½ p^T B_k p,  where B_k = B_k^T, B_k ≻ 0 is an n × n symmetric positive-definite matrix.
The minimizer p_k = −B_k^{−1} ∇f_k of this convex quadratic model is used as the search direction, and the new iterate is
x_{k+1} = x_k + α_k p_k;  let s_k = α_k p_k.
The general structure of a quasi-Newton method can be summarized as follows:
• Given x_0 and B_0 (or H_0), set k ← 0.
• For k = 0, 1, 2, . . .
  – Evaluate the gradient g_k.
  – Calculate s_k by a line search or trust-region method.
  – x_{k+1} ← x_k + s_k
  – y_k ← g_{k+1} − g_k
  – Update B_{k+1} or H_{k+1} according to the quasi-Newton formulas.
• End (for)
The basic requirement in each iteration is the secant equation B_{k+1} s_k = y_k (or H_{k+1} y_k = s_k).
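A minimal Python sketch of this loop, instantiated with the BFGS inverse update and a simple Armijo backtracking line search (simplified relative to the strong Wolfe search used in the experiments; all names are illustrative):

```python
import numpy as np

def bfgs_minimize(f, grad, x0, tol=1e-8, max_iter=200):
    """Quasi-Newton loop with the BFGS inverse update
    H_{k+1} = (I - rho s y^T) H_k (I - rho y s^T) + rho s s^T."""
    n = x0.size
    x, H = x0.astype(float), np.eye(n)          # H_0 = I
    g = grad(x)
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        p = -H @ g                               # search direction p_k = -H_k g_k
        a = 1.0
        while f(x + a * p) > f(x) + 1e-4 * a * (g @ p) and a > 1e-12:
            a *= 0.5                             # Armijo backtracking
        s = a * p
        x_new = x + s
        g_new = grad(x_new)
        y = g_new - g
        if y @ s > 1e-10:                        # curvature condition y^T s > 0
            rho = 1.0 / (y @ s)
            V = np.eye(n) - rho * np.outer(s, y)
            H = V @ H @ V.T + rho * np.outer(s, s)
        x, g = x_new, g_new
    return x

# Convex quadratic test: minimizer solves A x = b
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
f = lambda x: 0.5 * x @ A @ x - b @ x
grad = lambda x: A @ x - b
x_star = bfgs_minimize(f, grad, np.zeros(2))
```

Skipping the update when y_k^T s_k ≤ 0 is a common safeguard that keeps H_k positive definite even when the simple backtracking search does not enforce the Wolfe curvature condition.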
Quasi-Newton Formulas for Optimization
BFGS
min_H ||H − H_k||,  s.t.  H = H^T,  H y_k = s_k

H_{k+1} = (I − ρ_k s_k y_k^T) H_k (I − ρ_k y_k s_k^T) + ρ_k s_k s_k^T,  where ρ_k = 1 / (y_k^T s_k)

B_{k+1} = B_k − (B_k s_k s_k^T B_k) / (s_k^T B_k s_k) + (y_k y_k^T) / (y_k^T s_k)

DFP
min_B ||B − B_k||,  s.t.  B = B^T,  B s_k = y_k

B_{k+1} = (I − γ_k y_k s_k^T) B_k (I − γ_k s_k y_k^T) + γ_k y_k y_k^T,  where γ_k = 1 / (y_k^T s_k)

H_{k+1} = H_k − (H_k y_k y_k^T H_k) / (y_k^T H_k y_k) + (s_k s_k^T) / (y_k^T s_k)

PSB
min_B ||B − B_k||,  s.t.  (B − B_k) = (B − B_k)^T,  B s_k = y_k

B_{k+1} = B_k + ((y_k − B_k s_k) s_k^T + s_k (y_k − B_k s_k)^T) / (s_k^T s_k) − (s_k^T (y_k − B_k s_k)) s_k s_k^T / (s_k^T s_k)²

H_{k+1} = H_k + ((s_k − H_k y_k) y_k^T + y_k (s_k − H_k y_k)^T) / (y_k^T y_k) − (y_k^T (s_k − H_k y_k)) y_k y_k^T / (y_k^T y_k)²
SR1
B_{k+1} = B_k + σ v v^T,  s.t.  B_{k+1} s_k = y_k

B_{k+1} = B_k + (y_k − B_k s_k)(y_k − B_k s_k)^T / ((y_k − B_k s_k)^T s_k)

H_{k+1} = H_k + (s_k − H_k y_k)(s_k − H_k y_k)^T / ((s_k − H_k y_k)^T y_k)
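All four direct updates satisfy the secant equation B_{k+1} s_k = y_k by construction, which is easy to check numerically. A NumPy sketch of the update formulas above (an illustrative helper, not the project's code):

```python
import numpy as np

def update_B(B, s, y, method="BFGS"):
    """Direct Hessian-approximation updates: BFGS, DFP, PSB, SR1."""
    r = y - B @ s                        # residual of the secant equation
    if method == "BFGS":
        Bs = B @ s
        return B - np.outer(Bs, Bs) / (s @ Bs) + np.outer(y, y) / (y @ s)
    if method == "DFP":
        g = 1.0 / (y @ s)
        V = np.eye(len(s)) - g * np.outer(y, s)
        return V @ B @ V.T + g * np.outer(y, y)
    if method == "PSB":
        ss = s @ s
        return (B + (np.outer(r, s) + np.outer(s, r)) / ss
                  - (r @ s) * np.outer(s, s) / ss**2)
    if method == "SR1":
        return B + np.outer(r, r) / (r @ s)
    raise ValueError(method)
```

Multiplying each updated matrix by s_k and simplifying term by term recovers y_k in every case.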
Comparison of the methods

BFGS
Advantages:
• preserves positive definiteness: H_{k+1} ≻ 0 whenever H_k ≻ 0
• self-correcting property if the Wolfe conditions are enforced
• superlinear convergence
Disadvantages:
• when y_k^T s_k ≈ 0 the formula produces bad results
• sensitive to round-off error
• sensitive to an inaccurate line search
• can get stuck at a saddle point on nonlinear problems

DFP
Disadvantages:
• can be highly inefficient at correcting large eigenvalues of the approximation matrices
• sensitive to round-off error
• sensitive to an inaccurate line search
• can get stuck at a saddle point on nonlinear problems

PSB
Advantages:
• superlinear convergence

SR1
Advantages:
• very good approximations to the Hessian matrices, often better than BFGS
Disadvantages:
• does not guarantee B_{k+1} ≻ 0, even when y_k^T s_k > 0 is satisfied
Line Search vs. Trust Region
• Line search (strong Wolfe conditions):
  f(x_k + α_k p_k) ≤ f(x_k) + c_1 α_k ∇f_k^T p_k
  |∇f(x_k + α_k p_k)^T p_k| ≤ c_2 |∇f_k^T p_k|
• Trust region: both direction and step size are found by solving
  min_{p ∈ R^n} m_k(p) = f_k + ∇f_k^T p + ½ p^T B_k p,  s.t.  ||p|| ≤ Δ_k
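The trust-region subproblem need only be solved approximately. For instance, the Cauchy point, the minimizer of m_k along −∇f_k clipped to ||p|| ≤ Δ_k, gives a cheap approximate solution; a sketch (illustrative, not necessarily the subproblem solver used in the experiments):

```python
import numpy as np

def cauchy_point(g, B, delta):
    """Minimize m(p) = g^T p + 0.5 p^T B p along -g within ||p|| <= delta."""
    gn = np.linalg.norm(g)
    p = -(delta / gn) * g                 # full step to the boundary along -g
    gBg = g @ B @ g
    if gBg > 0:
        # unconstrained minimizer along -g, clipped to the boundary
        tau = min(1.0, gn**3 / (delta * gBg))
    else:
        tau = 1.0                         # negative curvature: go to boundary
    return tau * p
```

With B = I and a large radius this reduces to the full steepest-descent step; with a small radius the step is clipped to the trust-region boundary.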
Numerical Experiment
• All quasi-Newton methods (BFGS, DFP, PSB, SR1) with two strategies (line search, trust region) were implemented in Python (8 algorithms overall).
• 165 various N-dimensional (N ≥ 2) benchmark problems were used.
• For each algorithm, every problem was launched from a random point of the domain 100 times and the results were averaged.
Examples of benchmark problems
[Figure: surface plots of two benchmark problems, f(x1, x2) for x1, x2 ∈ [−10, 10]: the NewFunction02 test function and the CrossInTray test function]
Results
[Figure: histograms of the number of problems versus the proportion of successful launches, for line search (BFGS_LS, DFP_LS, PSB_LS, SR1_LS) and trust region (BFGS_TR, DFP_TR, PSB_TR, SR1_TR)]
Three metrics are reported per method: (a) the proportion of problems where the global minimizer was successfully found in more than half of all launches, (b) the proportion where it was found in more than 99% of all launches, and (c) the area under the performance-profile curve.

Strategy  Metric  BFGS  DFP   PSB   SR1   Total
LS        (a)     0.21  0.16  0.09  0.09  0.07
LS        (b)     0.12  0.08  0.05  0.05  0.04
LS        (c)     2.58  2.54  2.66  2.97
TR        (a)     0.18  0.16  0.19  0.11  0.10
TR        (b)     0.12  0.10  0.14  0.06  0.05
TR        (c)     2.93  2.91  2.93  2.83
Performance evaluation: n_s is the number of solvers, n_p the number of problems, and t_{s,p} the CPU time of solver s on problem p. The performance ratio is
r_{s,p} = t_{s,p} / min{t_{s,p} : s ∈ S},
and the performance profile
ρ_s(τ) = (1/n_p) · size{p : 1 ≤ p ≤ n_p, log(r_{s,p}) ≤ τ}
defines the probability for solver s that the performance ratio r_{s,p} is within a factor τ of the best possible ratio.
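The profile ρ_s(τ) is straightforward to compute from a timing table; a small NumPy sketch (assuming base-2 logarithms in the Dolan–Moré style; function and variable names are illustrative):

```python
import numpy as np

def performance_profile(T, taus):
    """T[s, p] = CPU time of solver s on problem p (np.inf if it failed).
    Returns rho[s, i] = fraction of problems with log2(r_{s,p}) <= taus[i]."""
    best = T.min(axis=0)                  # best time per problem, over solvers
    logr = np.log2(T / best)              # log performance ratios
    return np.array([[np.mean(logr[s] <= t) for t in taus]
                     for s in range(T.shape[0])])

# Tiny example: two solvers, two problems, each fastest on one problem
rho = performance_profile(np.array([[1.0, 2.0], [2.0, 1.0]]), [0.0, 1.0])
```

At τ = 0 each solver's value is the fraction of problems on which it was fastest; as τ grows, ρ_s(τ) approaches the fraction of problems the solver eventually solved.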
[Figure: performance profiles ρ_s(τ) for CPU time, line search (BFGS_LS, DFP_LS, PSB_LS, SR1_LS) and trust region (BFGS_TR, DFP_TR, PSB_TR, SR1_TR)]
Conclusions and Further Work
• The trust-region strategy solved more problems than line search for all methods except BFGS, and its area-under-the-performance-profile metric was also better.
• Although SR1 solved the fewest problems, it found solutions fastest under line search.
• As further work, L-BFGS-B could be added to the comparison.
Source code of this work is available here [3]
Acknowledgements
The author of the project expresses gratitude to Daniil Merkulov for his lectures on optimization and his support in this work.
[1] Ding, Yang, Enkeleida Lushi, and Qingguo Li. "Investigation of quasi-Newton methods for unconstrained optimization." Simon Fraser University, Canada (2004).
[2] Nocedal, Jorge, and Stephen J. Wright. Numerical Optimization. Springer, 1999.
[3] https://github.com/IgorSokoloff/benchmarking_of_quasi_newton_methods