Math 408A: Non-Linear Optimization
Lecture 15
February 12
Broyden Updates
Given $g : \mathbb{R}^n \to \mathbb{R}^n$, solve $g(x) = 0$.

Algorithm: Broyden's Method

Initialization: $x_0 \in \mathbb{R}^n$, $B_0 \in \mathbb{R}^{n \times n}$.
Having $(x_k, B_k)$, compute $(x_{k+1}, B_{k+1})$ as follows.
Solve $B_k s_k = -g(x_k)$ for $s_k$ and set
\[
x_{k+1} := x_k + s_k, \qquad
y_k := g(x_{k+1}) - g(x_k), \qquad
B_{k+1} := B_k + \frac{(y_k - B_k s_k)\, s_k^T}{s_k^T s_k}.
\]
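As a concrete illustration (not part of the lecture), the iteration above can be sketched in NumPy. The test system and starting point below are made-up examples; a finite-difference Jacobian is used for $B_0$, a common practical choice that is not specified on the slide.

```python
import numpy as np

def fd_jacobian(g, x, h=1e-6):
    """Forward-difference Jacobian of g at x (used to initialize B_0)."""
    n = x.size
    J = np.empty((n, n))
    for j in range(n):
        e = np.zeros(n); e[j] = h
        J[:, j] = (g(x + e) - g(x)) / h
    return J

def broyden(g, x0, tol=1e-10, max_iter=100):
    """Broyden's method for g(x) = 0, following the algorithm above."""
    x = np.asarray(x0, dtype=float)
    B = fd_jacobian(g, x)
    for _ in range(max_iter):
        gx = g(x)
        if np.linalg.norm(gx) < tol:
            break
        s = np.linalg.solve(B, -gx)                # solve B_k s_k = -g(x_k)
        x_new = x + s
        y = g(x_new) - gx                          # y_k = g(x_{k+1}) - g(x_k)
        B = B + np.outer(y - B @ s, s) / (s @ s)   # rank-one Broyden update
        x = x_new
    return x

# Made-up test system: x0^2 + x1^2 = 4 and x0 = x1, root at (sqrt(2), sqrt(2)).
g = lambda x: np.array([x[0]**2 + x[1]**2 - 4.0, x[0] - x[1]])
root = broyden(g, [1.0, 2.0])
```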
Algorithm: Broyden’s Method (Inverse Updating)
Initialization: $x_0 \in \mathbb{R}^n$, $J_0 \in \mathbb{R}^{n \times n}$.
Having $(x_k, J_k)$, compute $(x_{k+1}, J_{k+1})$ as follows.
Set $s_k = -J_k g(x_k)$ and
\[
x_{k+1} := x_k + s_k, \qquad
y_k := g(x_{k+1}) - g(x_k), \qquad
J_{k+1} := J_k + \frac{(s_k - J_k y_k)\, s_k^T J_k}{s_k^T J_k y_k}.
\]
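A quick numerical check (not from the lecture) that this inverse update is exactly the Sherman-Morrison inverse of the direct Broyden update; the matrix and vectors are arbitrary test data.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
B = np.eye(n) + 0.1 * rng.standard_normal((n, n))   # invertible test matrix
J = np.linalg.inv(B)
s = rng.standard_normal(n)
y = rng.standard_normal(n)

# Direct Broyden update of B and the corresponding inverse update of J.
B_new = B + np.outer(y - B @ s, s) / (s @ s)
J_new = J + np.outer(s - J @ y, s @ J) / (s @ J @ y)

assert np.allclose(J_new, np.linalg.inv(B_new))
```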
MSE for Optimization
\[
P : \ \underset{x \in \mathbb{R}^n}{\text{minimize}}\ f(x),
\]
where $f : \mathbb{R}^n \to \mathbb{R}$ is twice continuously differentiable.

Solve $\nabla f(x) = 0$ using the Newton-like iteration
\[
x_{k+1} = x_k - M_k^{-1} \nabla f(x_k).
\]
The matrix secant equation (MSE) becomes
\[
M_{k+1} s_k = y_k,
\]
where $s_k := x_{k+1} - x_k$ and $y_k := \nabla f(x_{k+1}) - \nabla f(x_k)$.
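For intuition (a made-up quadratic example, not from the slides): when $f$ is quadratic with Hessian $A$, the gradient difference satisfies $y_k = A s_k$ exactly, so the true Hessian solves the MSE.

```python
import numpy as np

# f(x) = 0.5 x^T A x - b^T x, so grad f(x) = A x - b and the Hessian is A.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, -1.0])
grad = lambda x: A @ x - b

x_k  = np.array([0.0, 0.0])
x_k1 = np.array([0.5, -0.25])
s = x_k1 - x_k
y = grad(x_k1) - grad(x_k)

assert np.allclose(A @ s, y)   # the Hessian satisfies the matrix secant equation
```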
The Broyden update gives
\[
M_{k+1} = M_k + \frac{(y_k - M_k s_k)\, s_k^T}{s_k^T s_k}.
\]
This is unsatisfactory for two reasons:

1. Since $M_k$ approximates $\nabla^2 f(x_k)$, it must be symmetric.
2. Since we are minimizing, $M_k$ must be positive definite to ensure that $s_k = -M_k^{-1} \nabla f(x_k)$ is a direction of descent for $f$ at $x_k$.
SR1 Update
The Broyden class of updates is
\[
M_{k+1} = M_k + \frac{(y_k - M_k s_k)\, v^T}{v^T s_k}.
\]
The only symmetric member of this class is obtained by taking
\[
v = y_k - M_k s_k,
\]
giving
\[
M_{k+1} = M_k + \frac{(y_k - M_k s_k)(y_k - M_k s_k)^T}{(y_k - M_k s_k)^T s_k}.
\]
The corresponding inverse update (by the Sherman-Morrison-Woodbury formula) is
\[
H_{k+1} = H_k + \frac{(s_k - H_k y_k)(s_k - H_k y_k)^T}{(s_k - H_k y_k)^T y_k}.
\]
But this update may not be positive definite.
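A small made-up instance (not from the lecture) showing both points at once: the SR1 update is symmetric and satisfies the secant equation, but it can destroy positive definiteness (here $s^T y < 0$ and $M_+$ picks up a negative eigenvalue).

```python
import numpy as np

n = 3
M = np.eye(n)                        # spd starting matrix
s = np.array([1.0, 0.0, 0.0])
y = np.array([-1.0, 0.0, 0.0])       # note s^T y = -1 < 0
r = y - M @ s                        # residual y - M s

M_plus = M + np.outer(r, r) / (r @ s)   # SR1 update

assert np.allclose(M_plus, M_plus.T)        # symmetric
assert np.allclose(M_plus @ s, y)           # secant equation holds
assert np.linalg.eigvalsh(M_plus)[0] < 0    # ...but M_plus is indefinite
```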
Positive Definite Updating
Basic Problem: Let $M \in \mathbb{R}^{n \times n}$ be a given symmetric positive definite (spd) matrix. Given $s, y \in \mathbb{R}^n$, find an spd matrix $M_+$ such that $M_+ s = y$ and $M_+$ is "close" to $M$.

We know that "close" cannot mean a rank-one modification, since the SR1 update may not preserve positive definiteness.

Let's try another approach that combines the Broyden update with yet another important property of symmetric matrices.
Every spd matrix $M$ can be written in the form $M = LL^T$ for some nonsingular matrix $L$.

Indeed, the LU factorization can be used to give the so-called Cholesky factorization $M = LL^T$, where $L$ is lower triangular.

Similarly, if $M_+$ exists, then there must be a nonsingular matrix $J$ such that $M_+ = JJ^T$.
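For example (illustrative matrix, not from the lecture), NumPy's Cholesky routine returns exactly such a lower-triangular factor:

```python
import numpy as np

M = np.array([[4.0, 2.0],
              [2.0, 3.0]])           # spd test matrix
L = np.linalg.cholesky(M)            # lower triangular, nonsingular

assert np.allclose(L, np.tril(L))    # L is lower triangular
assert np.allclose(L @ L.T, M)       # M = L L^T
```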
So $M = LL^T$ and $M_+ = JJ^T$ with $M_+ s = y$.

Therefore, if we set $v = J^T s$, then $Jv = y$.

We apply the Broyden update to $Jv = y$ and $L$ to get
\[
J = L + \frac{(y - Lv)\, v^T}{v^T v}.
\]
Using $v = J^T s$, we get
\[
v = J^T s = L^T s + \frac{v\, (y - Lv)^T s}{v^T v}.
\]
But then $v = \alpha L^T s$ for some $\alpha \in \mathbb{R}$.
Hence
\[
v = J^T s = L^T s + \frac{v\, (y - Lv)^T s}{v^T v}
\]
with $v = \alpha L^T s$. Plugging in, we get
\[
\alpha L^T s = L^T s + \frac{\alpha L^T s\, (y - \alpha L L^T s)^T s}{\alpha^2\, s^T L L^T s}.
\]
Hence
\[
\alpha^2 = \frac{s^T y}{s^T M s}.
\]
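The derivation can be checked numerically on made-up data (assuming $s^T y > 0$): with $\alpha$ from the formula above, $v = \alpha L^T s$ is consistent with $v = J^T s$, and the resulting $JJ^T$ maps $s$ to $y$.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 3
Lm = np.tril(rng.standard_normal((n, n))) + 3.0 * np.eye(n)  # nonsingular lower triangular
M = Lm @ Lm.T                                                # spd
s = rng.standard_normal(n)
y = M @ s + 0.1 * rng.standard_normal(n)                     # test data with s^T y > 0
assert s @ y > 0

alpha = np.sqrt((s @ y) / (s @ M @ s))       # alpha^2 = s^T y / (s^T M s)
v = alpha * (Lm.T @ s)
J = Lm + np.outer(y - Lm @ v, v) / (v @ v)   # Broyden update of L

assert np.allclose(J.T @ s, v)               # consistency: v = J^T s
assert np.allclose(J @ J.T @ s, y)           # M_+ s = y with M_+ = J J^T
```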
Broyden-Fletcher-Goldfarb-Shanno (BFGS) Updating
Therefore, this update exists only if $s^T y > 0$, in which case
\[
J = L + \frac{(y - \alpha M s)\, s^T L}{\alpha\, s^T M s},
\qquad \text{with} \qquad
\alpha = \left[\frac{s^T y}{s^T M s}\right]^{1/2},
\]
yielding
\[
M_+ = M + \frac{y y^T}{y^T s} - \frac{M s s^T M}{s^T M s}.
\]
This is called the BFGS update. It is currently viewed as the best MSE update available for optimization due to its outstanding performance in practice.
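A direct numerical check on made-up data (assuming $s^T y > 0$): the BFGS formula above is symmetric, satisfies $M_+ s = y$, and stays positive definite.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 4
A = rng.standard_normal((n, n))
M = A @ A.T + n * np.eye(n)          # spd test matrix
s = rng.standard_normal(n)
y = rng.standard_normal(n)
if s @ y <= 0:                       # BFGS requires the curvature condition
    y = -y

M_plus = (M + np.outer(y, y) / (y @ s)
            - (M @ np.outer(s, s) @ M) / (s @ M @ s))

assert np.allclose(M_plus, M_plus.T)          # symmetric
assert np.allclose(M_plus @ s, y)             # secant equation
assert np.linalg.eigvalsh(M_plus)[0] > 0      # positive definite
```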
BFGS Updating
In addition, we note that $M_+$ can be obtained directly from the matrix $J$. If the QR factorization of $J^T$ is $J^T = QR$, we can set $L_+ = R^T$, yielding
\[
M_+ = JJ^T = R^T Q^T Q R = L_+ L_+^T.
\]
Inverse BFGS Updating
The Sherman-Morrison-Woodbury formula again gives the inverse update
\[
H_+ = H + \frac{(s + Hy)^T y}{(s^T y)^2}\, s s^T - \frac{H y s^T + s y^T H}{s^T y},
\]
where $H = M^{-1}$.
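The two forms can be checked against each other on made-up data: updating $M$ directly and inverting agrees with updating $H$ by the formula above.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 3
A = rng.standard_normal((n, n))
M = A @ A.T + n * np.eye(n)          # spd test matrix
H = np.linalg.inv(M)
s = rng.standard_normal(n)
y = rng.standard_normal(n)
if s @ y <= 0:                       # curvature condition s^T y > 0
    y = -y

# Direct BFGS update of M, then its inverse BFGS counterpart for H.
M_plus = M + np.outer(y, y) / (y @ s) - (M @ np.outer(s, s) @ M) / (s @ M @ s)
H_plus = (H + ((s + H @ y) @ y) * np.outer(s, s) / (s @ y)**2
            - (np.outer(H @ y, s) + np.outer(s, y @ H)) / (s @ y))

assert np.allclose(H_plus, np.linalg.inv(M_plus))
```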
How to ensure $s^T y > 0$
The BFGS update preserves both symmetry and positive definiteness when $s_k^T y_k > 0$. But is this condition reasonable, or even implementable?

Recall that
\[
y_k = \nabla f(x_{k+1}) - \nabla f(x_k)
\qquad \text{and} \qquad
s_k = x_{k+1} - x_k = t_k d_k,
\]
where $d_k = -H_k \nabla f(x_k)$ is the search direction and $t_k > 0$ is the stepsize. Hence
\[
y_k^T s_k = \nabla f(x_{k+1})^T s_k - \nabla f(x_k)^T s_k
= t_k \left( \nabla f(x_k + t_k d_k)^T d_k - \nabla f(x_k)^T d_k \right).
\]
Since $H_k$ is spd, $\nabla f(x_k)^T d_k < 0$, so $d_k$ is a direction of descent and we may take $t_k > 0$. Therefore, to get $s_k^T y_k > 0$ we must show that $t_k > 0$ can be chosen so that
\[
\nabla f(x_k + t_k d_k)^T d_k \ge \beta\, \nabla f(x_k)^T d_k
\]
for some $\beta \in (0, 1)$, since in this case
\[
\nabla f(x_k + t_k d_k)^T d_k - \nabla f(x_k)^T d_k \ge (\beta - 1)\, \nabla f(x_k)^T d_k > 0.
\]
But this is precisely the second of the weak Wolfe conditions, with $\beta = c_2$. Hence a successful BFGS update can always be obtained.
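Putting the pieces together, here is a sketch (a made-up example, with $c_2$ playing the role of $\beta$): a bisection line search for the weak Wolfe conditions drives an inverse-BFGS iteration, and by the argument above the curvature condition guarantees $s^T y > 0$ at every step.

```python
import numpy as np

def weak_wolfe(f, grad, x, d, c1=1e-4, c2=0.9):
    """Bisection line search for the weak Wolfe conditions."""
    f0, g0 = f(x), grad(x) @ d            # g0 < 0 for a descent direction
    t, lo, hi = 1.0, 0.0, np.inf
    for _ in range(60):
        if f(x + t * d) > f0 + c1 * t * g0:      # sufficient decrease fails
            hi = t
        elif grad(x + t * d) @ d < c2 * g0:      # curvature condition fails
            lo = t
        else:
            return t
        t = 2.0 * lo if np.isinf(hi) else 0.5 * (lo + hi)
    return t

# Inverse-BFGS iteration on a strictly convex quadratic (illustrative problem).
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, -1.0])
f = lambda x: 0.5 * x @ A @ x - b @ x
grad = lambda x: A @ x - b

x, H = np.zeros(2), np.eye(2)
for _ in range(50):
    g = grad(x)
    if np.linalg.norm(g) < 1e-10:
        break
    d = -H @ g                            # search direction
    t = weak_wolfe(f, grad, x, d)
    s = t * d
    y = grad(x + s) - g
    assert s @ y > 0                      # guaranteed by the curvature condition
    H = (H + ((s @ y + y @ H @ y) / (s @ y)**2) * np.outer(s, s)
           - (np.outer(H @ y, s) + np.outer(s, y @ H)) / (s @ y))
    x = x + s
```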