Lecture 14: Stochastic Variational Inference I

Scribes: Dat, Jabien, Ting
Recap: Variational Inference

Goal: Approximate the posterior.

Generative model: $p(x, z, \beta) = p(x \mid z, \beta)\, p(z)\, p(\beta)$

Variational distribution: $q(z; \phi)\, q(\beta; \lambda) \approx p(z, \beta \mid x)$

Objective: Evidence Lower Bound (ELBO)

$\mathcal{L}(\lambda, \phi) = \mathbb{E}_{q(z;\phi)\, q(\beta;\lambda)}\left[ \log \frac{p(x, z, \beta)}{q(z;\phi)\, q(\beta;\lambda)} \right] = \log p(x) - \mathrm{KL}\big( q(z, \beta; \phi, \lambda) \,\|\, p(z, \beta \mid x) \big)$
Latent Dirichlet Allocation: Generative Model

Generative model:

$\beta_k \sim \mathrm{Dirichlet}(\eta)$ for $k = 1, \dots, K$
$\theta_d \sim \mathrm{Dirichlet}(\alpha)$ for $d = 1, \dots, D$
$z_{dn} \sim \mathrm{Discrete}(\theta_d)$ for $n = 1, \dots, N_d$
$x_{dn} \mid z_{dn} = k \sim \mathrm{Discrete}(\beta_k)$

$p(x, z, \beta) = \left( \prod_d p(x_d \mid z_d, \beta)\, p(z_d) \right) \left( \prod_k p(\beta_k) \right)$

$p(z_d) = \int d\theta_d\; p(z_d \mid \theta_d)\, p(\theta_d)$
LDA: Global vs Local Parameters

Generative model: $p(x, z, \beta) = p(x \mid z, \beta)\, p(z)\, p(\beta)$

Variational approx: $q(z; \phi)\, q(\beta; \lambda) \approx p(z, \beta \mid x)$, with $\phi$ local and $\lambda$ global.

$p(x, z, \beta) = \left( \prod_{d=1}^{D} p(x_d \mid z_d, \beta)\, p(z_d) \right) \prod_{k=1}^{K} p(\beta_k)$

$q(z; \phi)\, q(\beta; \lambda) = \left( \prod_{d=1}^{D} q(z_d; \phi_d) \right) \prod_{k=1}^{K} q(\beta_k; \lambda_k)$

The $\phi_d$ are local variational parameters (they only depend on document $d$); the $\lambda_k$ are global variational parameters.
Stochastic Variational Inference

Problem: Wikipedia has 5M+ entries. If we wanted to do VBEM, we would need the updates

$\lambda = \mathrm{argmax}_{\lambda}\, \mathcal{L}(\lambda, \phi) \qquad \phi_d = \mathrm{argmax}_{\phi_d}\, \mathcal{L}(\lambda^{\mathrm{old}}, \phi_d)$

Each of these updates requires a full pass over the 5M+ documents.

Solution: Optimize $\mathcal{L}$ with stochastic gradient ascent.
Stochastic Gradient Descent / Ascent

Idea: Optimize the objective with a noisy estimate of the gradient,

$\lambda^t = \lambda^{t-1} + \rho_t\, \tilde{\nabla}_\lambda \mathcal{L}(\lambda^{t-1})$

where $\rho_t$ is the step size and $\tilde{\nabla}_\lambda \mathcal{L}$ is an approximation of the gradient.

Requirement 1: The gradient estimate should be unbiased,

$\mathbb{E}\big[ \tilde{\nabla}_\lambda \mathcal{L}(\lambda) \big] = \nabla_\lambda \mathcal{L}(\lambda)$

(Intuition: a random walk with drift in the direction of steepest ascent.)

Requirement 2: Robbins-Monro conditions on the step sizes,

$\sum_{t=1}^{\infty} \rho_t = \infty$ (infinite mean displacement) $\qquad \sum_{t=1}^{\infty} \rho_t^2 < \infty$ (finite variance)
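To make the two requirements concrete, here is a minimal sketch (a toy example, not from the lecture) of stochastic gradient ascent on $\mathcal{L}(\lambda) = -\tfrac{1}{2}(\lambda - 3)^2$, using step sizes $\rho_t = t^{-0.75}$, which satisfy the Robbins-Monro conditions, and additive Gaussian noise, which keeps the gradient estimate unbiased:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy objective L(lam) = -0.5 * (lam - 3)^2, with true gradient -(lam - 3).
lam = 0.0
for t in range(1, 5001):
    rho = t ** -0.75                         # Robbins-Monro: sum rho = inf, sum rho^2 < inf
    grad_est = -(lam - 3.0) + rng.normal()   # unbiased noisy gradient estimate
    lam += rho * grad_est                    # ascent step
```

Despite the noise, the iterates drift toward the optimum $\lambda^* = 3$, and the shrinking step sizes damp the random-walk behavior.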
Intuition: Choosing the Step Size

Intuition: Choose a step size inversely proportional to the curvature of the objective. A Newton-style update scales the gradient by the inverse Hessian,

$x^{t+1} = x^t + \rho\, H^{-1} \nabla_x \mathcal{L}(x^t), \qquad H_{ij} = \frac{\partial^2 \mathcal{L}}{\partial x_i\, \partial x_j}$ (Hessian)
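As a sketch of this intuition (a hypothetical quadratic objective, not from the notes): for $\mathcal{L}(x) = -\tfrac{1}{2} x^\top A x$, scaling the gradient by the inverse Hessian reaches the optimum in a single step even when the curvatures differ wildly across dimensions:

```python
import numpy as np

# Quadratic objective L(x) = -0.5 * x^T A x with very different curvatures.
A = np.array([[4.0, 0.0],
              [0.0, 0.25]])
x = np.array([1.0, 1.0])

grad = -A @ x                          # gradient of L at x
H = -A                                 # Hessian of L (constant for a quadratic)
x_new = x - np.linalg.inv(H) @ grad    # curvature-scaled (Newton) ascent step
# x_new lands exactly at the optimum x* = (0, 0)
```

A plain gradient step with a single scalar step size would either overshoot the high-curvature direction or crawl along the low-curvature one.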
Stochastic Variational Inference

Approximation: Compute the ELBO for a batch of docs.

$\mathcal{L}(\lambda) = \max_{\phi}\, \mathbb{E}_{q(z;\phi)\, q(\beta;\lambda)}\left[ \log \frac{p(x, z, \beta)}{q(z;\phi)\, q(\beta;\lambda)} \right]$
$\phantom{\mathcal{L}(\lambda)} = \sum_{d=1}^{D} \max_{\phi_d}\, \mathbb{E}_{q(z_d;\phi_d)\, q(\beta;\lambda)}\left[ \log \frac{p(x_d, z_d \mid \beta)}{q(z_d;\phi_d)} \right] + \mathbb{E}_{q(\beta;\lambda)}\left[ \log \frac{p(\beta)}{q(\beta;\lambda)} \right]$
Stochastic Variational Inference

Approximation: Compute the ELBO for a batch of docs.

$\mathcal{L}(\lambda) = \sum_{d=1}^{D} \left( \max_{\phi_d}\, \mathbb{E}_{q(z_d;\phi_d)\, q(\beta;\lambda)}\left[ \log \frac{p(x_d, z_d \mid \beta)}{q(z_d;\phi_d)} \right] \right) - \mathrm{KL}\big( q(\beta;\lambda) \,\|\, p(\beta) \big)$

Choose a batch of docs: $b \sim \mathrm{Unif}(\{1, \dots, D\})$,

$\tilde{\mathcal{L}}(\lambda) = \frac{D}{B} \sum_{b=1}^{B} \left( \max_{\phi_b}\, \mathbb{E}_{q(z_b;\phi_b)\, q(\beta;\lambda)}\left[ \log \frac{p(x_b, z_b \mid \beta)}{q(z_b;\phi_b)} \right] \right) - \mathrm{KL}\big( q(\beta;\lambda) \,\|\, p(\beta) \big)$
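The $D/B$ rescaling is what makes the batch estimate unbiased. A small numerical sketch, with toy numbers standing in for the per-document terms (none of these values are from the lecture):

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-ins for the D per-document ELBO contributions.
D = 10_000
terms = rng.normal(loc=1.0, scale=2.0, size=D)
full_sum = terms.sum()

def batch_estimate(B=100):
    idx = rng.integers(0, D, size=B)   # b ~ Unif({1, ..., D})
    return (D / B) * terms[idx].sum()  # rescale so the estimate is unbiased

# Averaging many independent estimates recovers the full sum (unbiasedness).
avg = np.mean([batch_estimate() for _ in range(5000)])
```

A single batch estimate is noisy, but its expectation equals the full-data sum, which is all that stochastic gradient ascent requires.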
Stochastic Variational Inference

TL;DR: For conditionally conjugate exponential family models we can compute a natural gradient,

$\tilde{\nabla}_\lambda \mathcal{L}(\lambda) = \mathbb{E}_{q(z;\phi)}\big[ \eta_g(x, z, \alpha) \big] - \lambda$

where $\eta_g$ is the natural parameter of $p(\beta \mid x, z, \alpha)$,

$\eta_g(x, z, \alpha) = \left( \alpha_1 + \sum_{d=1}^{D} t(x_d, z_d),\; \alpha_2 + D \right)$

with sufficient statistics $t(x_d, z_d)$. This yields the gradient updates

$\lambda^t = \lambda^{t-1} + \rho_t\, \tilde{\nabla}_\lambda \mathcal{L}(\lambda)\big|_{\lambda = \lambda^{t-1}} = (1 - \rho_t)\, \lambda^{t-1} + \rho_t\, \mathbb{E}_q\big[ \eta_g(x, z, \alpha) \big]$
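The convex-combination form of the update is easy to read off in code. A sketch (the name `lam_hat` and the fixed target are illustrative, not the lecture's notation): holding the expected natural parameter fixed, repeated updates with Robbins-Monro step sizes converge to it.

```python
import numpy as np

def natural_gradient_step(lam, lam_hat, rho):
    """One SVI update: lam <- (1 - rho) * lam + rho * lam_hat,
    where lam_hat stands in for E_q[eta_g(x, z, alpha)]."""
    return (1.0 - rho) * lam + rho * lam_hat

lam = np.array([1.0, 1.0])
lam_hat = np.array([4.0, 2.0])     # fixed target, for illustration only
for t in range(1, 200):
    lam = natural_gradient_step(lam, lam_hat, rho=(t + 1.0) ** -0.6)
# lam is now numerically equal to lam_hat
```

In the real algorithm `lam_hat` changes every iteration (it is recomputed from a sampled minibatch), but each step still just interpolates between the old parameter and the batch's expected natural parameter.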
Natural Gradients: Coordinate Transformations

Example: Suppose we have two equivalent sets of coordinates $(x_1, x_2)$ and $(\tilde{x}_1, \tilde{x}_2)$,

$\tilde{x}_1 = x_1 / \sigma_1, \qquad \tilde{x}_2 = x_2 / \sigma_2$

$\mathcal{L}(x_1, x_2) = -\left( \frac{x_1^2}{\sigma_1^2} + \frac{x_2^2}{\sigma_2^2} \right), \qquad \tilde{\mathcal{L}}(\tilde{x}_1, \tilde{x}_2) = -\left( \tilde{x}_1^2 + \tilde{x}_2^2 \right)$
Gradient Values Change with Coordinates

With $\tilde{x}_1 = x_1/\sigma_1$ and $\tilde{x}_2 = x_2/\sigma_2$,

$\mathcal{L}(x_1, x_2) = -\left( \frac{x_1^2}{\sigma_1^2} + \frac{x_2^2}{\sigma_2^2} \right), \qquad \tilde{\mathcal{L}}(\tilde{x}_1, \tilde{x}_2) = -\left( \tilde{x}_1^2 + \tilde{x}_2^2 \right)$

$\frac{\partial \mathcal{L}}{\partial x_1} = -\frac{2 x_1}{\sigma_1^2}, \qquad \frac{\partial \tilde{\mathcal{L}}}{\partial \tilde{x}_1} = -2 \tilde{x}_1 = \sigma_1 \frac{\partial \mathcal{L}}{\partial x_1}$

$\frac{\partial \mathcal{L}}{\partial x_2} = -\frac{2 x_2}{\sigma_2^2}, \qquad \frac{\partial \tilde{\mathcal{L}}}{\partial \tilde{x}_2} = -2 \tilde{x}_2 = \sigma_2 \frac{\partial \mathcal{L}}{\partial x_2}$
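These relations can be checked numerically; the specific values of $\sigma_i$ and $x_i$ below are arbitrary:

```python
# Check the gradient transformation for x_tilde_i = x_i / sigma_i, with
# L(x) = -(x1^2/s1^2 + x2^2/s2^2) and L_tilde(xt) = -(xt1^2 + xt2^2).
s1, s2 = 2.0, 0.5
x1, x2 = 1.2, -0.7
xt1, xt2 = x1 / s1, x2 / s2

dL_dx1 = -2.0 * x1 / s1**2    # gradient in the original coordinates
dL_dx2 = -2.0 * x2 / s2**2
dLt_dxt1 = -2.0 * xt1         # gradient in the rescaled coordinates
dLt_dxt2 = -2.0 * xt2
# dL~/dx~_i = sigma_i * dL/dx_i: the same point, but different gradient values
```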
Coordinate-Invariant Gradients

Idea: Enforce the requirement that the rate of change must be invariant,

$\frac{d\mathcal{L}}{dt} = \left( \frac{dx}{dt} \right)^{\!\top} \nabla_x \mathcal{L}(x) = \left( \frac{d\tilde{x}}{dt} \right)^{\!\top} \nabla_{\tilde{x}} \tilde{\mathcal{L}}(\tilde{x})$

Jacobian: matrix of partial derivatives,

$J_{ij} = \frac{\partial \tilde{x}_i}{\partial x_j}$

Chain rule:

$\frac{\partial \mathcal{L}}{\partial x_j} = \sum_i \frac{\partial \tilde{x}_i}{\partial x_j} \frac{\partial \tilde{\mathcal{L}}}{\partial \tilde{x}_i} = \sum_i J_{ij} \frac{\partial \tilde{\mathcal{L}}}{\partial \tilde{x}_i}, \qquad \text{i.e.} \quad \nabla_x \mathcal{L}(x) = J^\top \nabla_{\tilde{x}} \tilde{\mathcal{L}}(\tilde{x})$
Coordinate-Invariant Gradients

Assumption: $dx/dt$ moves along the gradient,

$\frac{dx}{dt} = \nabla_x \mathcal{L}(x) = J^\top \nabla_{\tilde{x}} \tilde{\mathcal{L}}(\tilde{x}), \qquad \frac{d\tilde{x}}{dt} = J \frac{dx}{dt}$

$\frac{d\mathcal{L}}{dt} = \left( \frac{dx}{dt} \right)^{\!\top} \nabla_x \mathcal{L}(x) = \big( \nabla_x \mathcal{L}(x) \big)^\top \big( \nabla_x \mathcal{L}(x) \big) = \big( \nabla_{\tilde{x}} \tilde{\mathcal{L}}(\tilde{x}) \big)^\top J J^\top \big( \nabla_{\tilde{x}} \tilde{\mathcal{L}}(\tilde{x}) \big) \neq \big( \nabla_{\tilde{x}} \tilde{\mathcal{L}}(\tilde{x}) \big)^\top \big( \nabla_{\tilde{x}} \tilde{\mathcal{L}}(\tilde{x}) \big)$

So ordinary gradient ascent produces a rate of change that depends on the choice of coordinates.
Coordinate-Invariant Gradients

Solution: Define the natural gradient $\frac{dx}{dt} = \hat{\nabla}_x \mathcal{L}(x)$ so that the rate of change is invariant,

$\hat{\nabla}_x \mathcal{L}(x) = \nabla_x \mathcal{L}(x), \qquad \hat{\nabla}_{\tilde{x}} \tilde{\mathcal{L}}(\tilde{x}) = \big( J J^\top \big)\, \nabla_{\tilde{x}} \tilde{\mathcal{L}}(\tilde{x})$

$\frac{d\mathcal{L}}{dt} = \big( \hat{\nabla}_x \mathcal{L}(x) \big)^\top \nabla_x \mathcal{L}(x) = \big( \nabla_{\tilde{x}} \tilde{\mathcal{L}}(\tilde{x}) \big)^\top J J^\top \nabla_{\tilde{x}} \tilde{\mathcal{L}}(\tilde{x}) = \big( \hat{\nabla}_{\tilde{x}} \tilde{\mathcal{L}}(\tilde{x}) \big)^\top \nabla_{\tilde{x}} \tilde{\mathcal{L}}(\tilde{x})$

Both coordinate systems now report the same rate of change.
Coordinate-Invariant Gradients

Distance metrics: In the $x$ coordinates the metric is $G = I$,

$dx^\top dx = dx_1^2 + dx_2^2$

In the $\tilde{x}$ coordinates, $dx = J^{-1} d\tilde{x}$, so

$dx^\top dx = d\tilde{x}^\top \big( J^{-\top} J^{-1} \big)\, d\tilde{x} = d\tilde{x}^\top \tilde{G}\, d\tilde{x}, \qquad \tilde{G} = \big( J J^\top \big)^{-1}$

For $\tilde{x}_i = x_i/\sigma_i$ we have $J = \mathrm{diag}(1/\sigma_1, 1/\sigma_2)$ and $\tilde{G} = \mathrm{diag}(\sigma_1^2, \sigma_2^2)$,

$dx^\top dx = \sigma_1^2\, d\tilde{x}_1^2 + \sigma_2^2\, d\tilde{x}_2^2$

The natural gradient rescales the ordinary gradient by the inverse metric, $\hat{\nabla} \mathcal{L} = G^{-1} \nabla \mathcal{L}$.
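A quick numerical check of the metric identity (arbitrary $\sigma$ values and displacement; a sketch, not lecture material):

```python
import numpy as np

# x_tilde = J x with J = diag(1/s1, 1/s2); the Euclidean length of dx
# equals the G_tilde-length of d(x_tilde) with G_tilde = (J J^T)^{-1}.
s1, s2 = 3.0, 0.5
J = np.diag([1.0 / s1, 1.0 / s2])
G_tilde = np.linalg.inv(J @ J.T)      # = diag(s1^2, s2^2)

dx = np.array([0.2, -0.4])
dxt = J @ dx                          # the same displacement in new coordinates

lhs = dx @ dx                         # dx^T dx
rhs = dxt @ G_tilde @ dxt             # dxt^T G_tilde dxt
```

The two quadratic forms agree, which is exactly the statement that $\tilde{G}$ measures distances in $\tilde{x}$ the same way the identity metric measures them in $x$.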
Natural Gradients in Variational Inference

Distance metric: Symmetric KL divergence,

$\mathrm{KL}_{\mathrm{sym}}(q, q') = \mathbb{E}_{q(\beta;\lambda)}\left[ \log \frac{q(\beta;\lambda)}{q(\beta;\lambda')} \right] + \mathbb{E}_{q(\beta;\lambda')}\left[ \log \frac{q(\beta;\lambda')}{q(\beta;\lambda)} \right] = \mathrm{KL}\big( q(\beta;\lambda) \,\|\, q(\beta;\lambda') \big) + \mathrm{KL}\big( q(\beta;\lambda') \,\|\, q(\beta;\lambda) \big)$

$d\lambda^\top G(\lambda)\, d\lambda \approx \mathrm{KL}_{\mathrm{sym}}\big( q(\beta; \lambda),\, q(\beta; \lambda + d\lambda) \big)$

$\hat{\nabla}_\lambda \mathcal{L}(\lambda) = G(\lambda)^{-1} \nabla_\lambda \mathcal{L}(\lambda), \qquad G(\lambda) = \nabla^2_\lambda a(\lambda)$ (the Fisher information)
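As a sanity check of $G(\lambda) = \nabla^2_\lambda a(\lambda)$, consider a unit-variance Gaussian with mean parameter $\mu$ (a standard exponential-family example, not from the lecture): the log-normalizer is $a(\mu) = \mu^2/2$, so $G = 1$, and the symmetric KL between $q_\mu$ and $q_{\mu + d\lambda}$ equals $d\lambda^\top G\, d\lambda = d\lambda^2$:

```python
def kl_gauss(m1, m2):
    """KL( N(m1, 1) || N(m2, 1) ) = (m1 - m2)^2 / 2."""
    return 0.5 * (m1 - m2) ** 2

mu, d = 0.3, 1e-3
kl_sym = kl_gauss(mu, mu + d) + kl_gauss(mu + d, mu)  # symmetric KL
G = 1.0                                               # a''(mu) for a(mu) = mu^2 / 2
quad = d * G * d                                      # d_lambda^T G d_lambda
# kl_sym matches the quadratic form to numerical precision
```

For this family the match is exact; in general the quadratic form is the second-order approximation of the symmetric KL for small $d\lambda$.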