
Benchmarking of quasi-Newton methods
Igor Sokolov
Optimization Class Project. MIPT

Introduction

The most well-known minimization technique for unconstrained problems is Newton's method. In each iteration, the step update is

$x_{k+1} = x_k - (\nabla^2 f_k)^{-1} \nabla f_k.$

However, the inverse of the Hessian has to be computed in every iteration, which takes $O(n^3)$ operations. Moreover, in some applications the second derivatives may be unavailable. One fix to the problem is to use a finite-difference approximation to the Hessian.
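Below is a minimal numpy sketch of one Newton step (the toy quadratic and all names are illustrative, not part of the poster). It solves the Newton linear system instead of forming the inverse explicitly, which is numerically preferable but still costs $O(n^3)$:

```python
import numpy as np

def newton_step(x, grad, hess):
    """One Newton step: solve (grad^2 f_k) p = -grad f_k for p.

    Solving the n-by-n linear system avoids an explicit inverse
    but still costs O(n^3), motivating quasi-Newton approximations.
    """
    p = np.linalg.solve(hess(x), -grad(x))
    return x + p

# Hypothetical toy problem: f(x) = 0.5 x^T A x - b^T x,
# so grad f = A x - b and hess f = A.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
x1 = newton_step(np.zeros(2), lambda x: A @ x - b, lambda x: A)
```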

We consider solving the nonlinear unconstrained minimization problem

$\min f(x), \quad x \in \mathbb{R}^n$

Consider the following quadratic model of the objective function:

$m_k(p) = f_k + \nabla f_k^T p + \frac{1}{2} p^T B_k p$, where $B_k = B_k^T$, $B_k \succ 0$ is an $n \times n$ matrix.

The minimizer $p_k$ of this convex quadratic model, $p_k = -B_k^{-1} \nabla f_k$, is used as the search direction, and the new iterate is

$x_{k+1} = x_k + \alpha p_k$; let $s_k = \alpha p_k$.

The general structure of a quasi-Newton method can be summarized as follows:

• Given $x_0$ and $B_0$ (or $H_0$); set $k \leftarrow 0$.

• For $k = 0, 1, 2, \ldots$

  Evaluate the gradient $g_k$.

  Calculate $s_k$ by a line-search or trust-region method.

  $x_{k+1} \leftarrow x_k + s_k$

  $y_k \leftarrow g_{k+1} - g_k$

  Update $B_{k+1}$ or $H_{k+1}$ according to the quasi-Newton formulas.

• End (for)

The basic requirement in each iteration is the secant equation, $B_{k+1} s_k = y_k$ (or $H_{k+1} y_k = s_k$). A minimal Python sketch of this generic loop is given below.
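The sketch below uses the inverse-Hessian ($H$) form and substitutes a simple Armijo backtracking helper for a full strong-Wolfe search; all names are illustrative, and the poster's actual implementation is in [3]:

```python
import numpy as np

def backtracking(f, grad, x, p, c1=1e-4, shrink=0.5):
    """Armijo backtracking: a crude stand-in for a proper Wolfe search."""
    alpha, fx, gtp = 1.0, f(x), grad(x) @ p
    while f(x + alpha * p) > fx + c1 * alpha * gtp:
        alpha *= shrink
    return alpha

def quasi_newton(f, grad, x0, update, tol=1e-8, max_iter=500):
    """Generic quasi-Newton loop in the inverse-Hessian (H) form.

    `update(H, s, y)` is any rule (BFGS, DFP, PSB, SR1) returning
    H_{k+1} that satisfies the secant equation H_{k+1} y_k = s_k.
    """
    x = np.asarray(x0, dtype=float)
    H = np.eye(x.size)                 # H_0 = I
    g = grad(x)
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        p = -H @ g                     # search direction p_k = -H_k g_k
        s = backtracking(f, grad, x, p) * p   # s_k = alpha * p_k
        x_new = x + s
        g_new = grad(x_new)
        H = update(H, s, g_new - g)    # y_k = g_{k+1} - g_k
        x, g = x_new, g_new
    return x
```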

Quasi-Newton Formulas for Optimization

BFGS

$\min \|H - H_k\|$  s.t. $H = H^T$, $H y_k = s_k$

$H_{k+1} = (I - \rho s_k y_k^T) H_k (I - \rho y_k s_k^T) + \rho s_k s_k^T$, where $\rho = \frac{1}{y_k^T s_k}$

$B_{k+1} = B_k - \frac{B_k s_k s_k^T B_k}{s_k^T B_k s_k} + \frac{y_k y_k^T}{y_k^T s_k}$

DFP

$\min \|B - B_k\|$  s.t. $B = B^T$, $B s_k = y_k$

$B_{k+1} = (I - \gamma y_k s_k^T) B_k (I - \gamma s_k y_k^T) + \gamma y_k y_k^T$, where $\gamma = \frac{1}{y_k^T s_k}$

$H_{k+1} = H_k - \frac{H_k y_k y_k^T H_k}{y_k^T H_k y_k} + \frac{s_k s_k^T}{y_k^T s_k}$

PSB

$\min \|B - B_k\|$  s.t. $(B - B_k) = (B - B_k)^T$, $B s_k = y_k$

$B_{k+1} = B_k + \frac{(y_k - B_k s_k) s_k^T + s_k (y_k - B_k s_k)^T}{s_k^T s_k} - \frac{s_k^T (y_k - B_k s_k)}{(s_k^T s_k)^2} s_k s_k^T$

$H_{k+1} = H_k + \frac{(s_k - H_k y_k) y_k^T + y_k (s_k - H_k y_k)^T}{y_k^T y_k} - \frac{y_k^T (s_k - H_k y_k)}{(y_k^T y_k)^2} y_k y_k^T$

SR1

$B_{k+1} = B_k + \sigma \nu \nu^T$  s.t. $B_{k+1} s_k = y_k$, which gives

$B_{k+1} = B_k + \frac{(y_k - B_k s_k)(y_k - B_k s_k)^T}{(y_k - B_k s_k)^T s_k}, \quad H_{k+1} = H_k + \frac{(s_k - H_k y_k)(s_k - H_k y_k)^T}{(s_k - H_k y_k)^T y_k}$
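These updates translate directly into numpy; the following is a minimal sketch (function names are illustrative, and the SR1 skip rule is the standard safeguard from [2], not something stated on the poster). The $H$-form functions plug straight into the generic loop sketched above:

```python
import numpy as np

def bfgs_H(H, s, y):
    """BFGS update of the inverse-Hessian approximation H_k."""
    rho = 1.0 / (y @ s)
    V = np.eye(len(s)) - rho * np.outer(s, y)
    return V @ H @ V.T + rho * np.outer(s, s)

def dfp_H(H, s, y):
    """DFP update of H_k."""
    Hy = H @ y
    return H - np.outer(Hy, Hy) / (y @ Hy) + np.outer(s, s) / (y @ s)

def psb_B(B, s, y):
    """PSB update of the Hessian approximation B_k."""
    r = y - B @ s                       # secant residual y_k - B_k s_k
    ss = s @ s
    return (B
            + (np.outer(r, s) + np.outer(s, r)) / ss
            - ((s @ r) / ss**2) * np.outer(s, s))

def sr1_B(B, s, y):
    """SR1 update of B_k, with the standard skip safeguard from [2]."""
    r = y - B @ s
    denom = r @ s
    if abs(denom) < 1e-8 * np.linalg.norm(r) * np.linalg.norm(s):
        return B                        # denominator too small: skip update
    return B + np.outer(r, r) / denom
```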

Method: advantages and disadvantages

BFGS
Advantages:
• $H_{k+1} \succ 0$ whenever $H_k \succ 0$; hence if $H_0 \succ 0$, every iterate stays positive definite
• self-correcting property if the Wolfe conditions are enforced
• superlinear convergence
Disadvantages:
• when $y_k^T s_k \approx 0$, the formula produces bad results
• sensitive to round-off error
• sensitive to an inaccurate line search
• can get stuck at a saddle point for nonlinear problems

DFP
Disadvantages:
• can be highly inefficient at correcting large eigenvalues of the matrices
• sensitive to round-off error
• sensitive to an inaccurate line search
• can get stuck at a saddle point for nonlinear problems

PSB
Advantages:
• superlinear convergence

SR1
Advantages:
• satisfies the secant equation $B_{k+1} s_k = y_k$ even when the curvature condition $s_k^T y_k > 0$ is not satisfied (positive definiteness of $B_{k+1}$ is not guaranteed)
• very good approximations to the Hessian matrices, often better than BFGS

Line Search vs. Trust Region

• Line search (strong Wolfe conditions; a sketch of checking them follows this list):

$f(x_k + \alpha_k p_k) \le f(x_k) + c_1 \alpha_k \nabla f_k^T p_k$

$|\nabla f(x_k + \alpha_k p_k)^T p_k| \le c_2 |\nabla f_k^T p_k|$

• Trust region: both the direction and the step size are found by solving

$\min_{p \in \mathbb{R}^n} m_k(p) = f_k + \nabla f_k^T p + \frac{1}{2} p^T B_k p$  s.t. $\|p\| \le \Delta_k$
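A minimal Python sketch of verifying the strong Wolfe conditions for a trial step (the constants $c_1$, $c_2$ are typical textbook choices, not values reported on the poster):

```python
import numpy as np

def satisfies_strong_wolfe(f, grad, x, p, alpha, c1=1e-4, c2=0.9):
    """Check both strong Wolfe conditions for a trial step length alpha.

    A line search would shrink or expand alpha until this returns True;
    0 < c1 < c2 < 1 is required.
    """
    g0p = grad(x) @ p                                # grad f_k^T p_k
    x_new = x + alpha * p
    decrease = f(x_new) <= f(x) + c1 * alpha * g0p   # sufficient decrease
    curvature = abs(grad(x_new) @ p) <= c2 * abs(g0p)
    return decrease and curvature
```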

Numerical Experiment

• All quasi-Newton methods (BFGS, DFP, PSB, SR1) with two step strategies (line search, trust region) were implemented in Python: 8 algorithms overall.

• 165 various $N$-dimensional ($N \ge 2$) strong benchmark problems were used.

• For each algorithm, every problem was launched from a random point of the domain 100 times and the results were averaged; a sketch of this protocol follows this list.
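The restart protocol might look like the following sketch, assuming a hypothetical problem interface exposing a random start, the objective, its gradient, and the known optimal value (the actual benchmark harness in [3] may differ):

```python
import numpy as np

def success_rate(solver, problem, n_starts=100, tol=1e-4, seed=0):
    """Fraction of random restarts on which the global minimum is reached.

    `problem` is a hypothetical interface with `sample_start(rng)`,
    `f`, `grad`, and the known optimum `f_min`.
    """
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_starts):
        x0 = problem.sample_start(rng)
        x = solver(problem.f, problem.grad, x0)
        if problem.f(x) <= problem.f_min + tol:
            hits += 1
    return hits / n_starts
```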

Examples of benchmark problems

[Figure: 3D surface plots of two example benchmark problems over $x_1, x_2 \in [-10, 10]$: the NewFunction02 test function and the CrossInTray test function.]

Results

[Figure: "Distribution of successfully solved problems (line search)" and "Distribution of successfully solved problems (trust region)": histograms of the number of problems vs. the proportion of successful launches for BFGS, DFP, PSB, and SR1, with the corresponding _LS and _TR suffixes.]

Proportion of problems where the global minimizer was successfully found (a) in more than half of all launches and (b) in more than 99% of launches, together with (c) the area under the performance-profile curve:

Strategy | BFGS (a / b / c)   | DFP (a / b / c)    | PSB (a / b / c)    | SR1 (a / b / c)    | Total (a / b)
LS       | 0.21 / 0.12 / 2.58 | 0.16 / 0.08 / 2.54 | 0.09 / 0.05 / 2.66 | 0.09 / 0.05 / 2.97 | 0.07 / 0.04
TR       | 0.18 / 0.12 / 2.93 | 0.16 / 0.10 / 2.91 | 0.19 / 0.14 / 2.93 | 0.11 / 0.06 / 2.83 | 0.10 / 0.05

Performance evaluation: $n_s$ is the number of solvers, $n_p$ the number of problems, and $t_{s,p}$ the CPU time of solver $s$ on problem $p$; $r_{s,p} = \frac{t_{s,p}}{\min\{t_{s,p} : s \in S\}}$ is the performance ratio.

$\rho_s(\tau) = \frac{1}{n_p} \operatorname{size}\{p : 1 \le p \le n_p,\ \log(r_{s,p}) \le \tau\}$ defines, for solver $s$, the probability that the performance ratio $r_{s,p}$ is within a factor $\tau$ of the best possible ratio.
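A minimal numpy sketch of this computation, assuming natural-log scaling to match the $\tau \in [0, 3]$ axis of the plots below (the time matrix in the example is hypothetical):

```python
import numpy as np

def performance_profile(T, taus):
    """Dolan-More performance profile from a solver-by-problem time matrix.

    T[s, p] is the CPU time of solver s on problem p (np.inf on failure).
    Returns rho[s, i] = fraction of problems with log(r_{s,p}) <= taus[i].
    """
    best = T.min(axis=0)               # best time per problem
    logR = np.log(T / best)            # log performance ratios
    return np.array([[np.mean(logR[s] <= t) for t in taus]
                     for s in range(T.shape[0])])

# Hypothetical example: 2 solvers on 3 problems (times in seconds).
T = np.array([[1.0, 2.0, 4.0],
              [2.0, 1.0, np.inf]])     # solver 1 failed on problem 2
rho = performance_profile(T, np.linspace(0.0, 3.0, 7))
```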

[Figure: performance profiles $\rho_s(\tau)$ for CPU time, $\tau \in [0, 3]$: "Performance profile, line search" with BFGS_LS, DFP_LS, PSB_LS, SR1_LS, and "Performance profile, trust region" with BFGS_TR, DFP_TR, PSB_TR, SR1_TR.]

Conclusions and Further Work

• For all methods except BFGS, the trust-region strategy successfully solved more problems than line search. Its area-under-profile metric also showed better results.

• Though SR1 solved the fewest problems, it found solutions faster than the other methods with line search.

• As further work, one may add L-BFGS-B to the comparison.

The source code of this work is available at [3].

Acknowledgements

The author of the project expresses gratitude to Daniil Merkulov for his lectures on optimization and his support in this work.

[1] Ding, Yang, Enkeleida Lushi, and Qingguo Li. "Investigation of quasi-Newton methods for unconstrained optimization." Simon Fraser University, Canada (2004).

[2] Nocedal, Jorge, and Stephen J. Wright. Numerical Optimization. Springer, 1999.

[3] https://github.com/IgorSokoloff/benchmarking_of_quasi_newton_methods