Lecture 14: Stochastic Variational Inference I
Scribes: Dat, Jabien, Ting

Page 1:

Lecture 14: Stochastic Variational Inference I
Scribes: Dat, Jabien, Ting

Page 2:

Recap: Variational Inference

Goal: Approximate the posterior

Generative Model: $p(x, z, \beta) = p(x \mid z, \beta)\, p(z)\, p(\beta)$

Variational Dist: $q(z; \phi)\, q(\beta; \lambda) \approx p(z, \beta \mid x)$

Objective: Evidence Lower Bound (ELBO)

$$\mathcal{L}(\lambda, \phi) = \mathbb{E}_{q(z;\phi)\, q(\beta;\lambda)}\left[ \log \frac{p(x, z, \beta)}{q(z;\phi)\, q(\beta;\lambda)} \right] = \log p(x) - \mathrm{KL}\big(q(z, \beta; \phi, \lambda) \,\|\, p(z, \beta \mid x)\big)$$
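The second equality follows from Bayes' rule; writing it out in one line (a standard step, included here for completeness):

$$\mathrm{KL}\big(q(z, \beta; \phi, \lambda)\,\|\,p(z, \beta \mid x)\big) = \mathbb{E}_{q}\left[\log \frac{q(z;\phi)\, q(\beta;\lambda)}{p(z, \beta \mid x)}\right] = \mathbb{E}_{q}\left[\log \frac{q(z;\phi)\, q(\beta;\lambda)}{p(x, z, \beta)}\right] + \log p(x) = \log p(x) - \mathcal{L}(\lambda, \phi).$$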

Page 3:

Latent Dirichlet Allocation: Generative Model

Generative model:

$$\beta_k \sim \mathrm{Dirichlet}(\eta), \quad k = 1, \ldots, K$$
$$\theta_d \sim \mathrm{Dirichlet}(\alpha), \quad d = 1, \ldots, D$$
$$z_{dn} \sim \mathrm{Discrete}(\theta_d), \quad n = 1, \ldots, N_d$$
$$x_{dn} \mid z_{dn} = k \sim \mathrm{Discrete}(\beta_k)$$

[Plate diagram: topics $\beta_k$ (plate $K$); per-document $\theta_d$, $z_{dn}$, $x_{dn}$ (plates $D$ and $N_d$)]

$$p(x, z, \beta) = \left( \prod_{d=1}^{D} p(x_d \mid z_d, \beta)\, p(z_d) \right) \prod_{k=1}^{K} p(\beta_k), \qquad p(z_d) = \int d\theta_d \; p(z_d \mid \theta_d)\, p(\theta_d)$$
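As a concrete illustration, a minimal ancestral-sampling sketch of this generative process (toy sizes and symmetric Dirichlet hyperparameters chosen for illustration; none of these values come from the notes):

import numpy as np

rng = np.random.default_rng(0)
K, D, V = 3, 5, 20          # topics, documents, vocabulary size (toy sizes)
eta, alpha = 0.1, 0.5       # symmetric Dirichlet hyperparameters
N_d = 50                    # words per document (fixed here for simplicity)

beta = rng.dirichlet(eta * np.ones(V), size=K)            # topic-word distributions beta_k
docs = []
for d in range(D):
    theta_d = rng.dirichlet(alpha * np.ones(K))           # topic proportions theta_d
    z_d = rng.choice(K, size=N_d, p=theta_d)              # topic assignments z_dn
    x_d = np.array([rng.choice(V, p=beta[k]) for k in z_d])  # words x_dn
    docs.append(x_d)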

Page 4:

LDA Global vs Local Parameters

Generative Model: $p(x, z, \beta) = p(x \mid z, \beta)\, p(z)\, p(\beta)$

Variational Approx: $q(z; \phi)\, q(\beta; \lambda) \approx p(z, \beta \mid x)$, with $\phi$ local and $\lambda$ global

$$p(x, z, \beta) = \left( \prod_{d=1}^{D} p(x_d \mid z_d, \beta)\, p(z_d) \right) \prod_{k=1}^{K} p(\beta_k)$$

$$q(z; \phi)\, q(\beta; \lambda) = \left( \prod_{d=1}^{D} q(z_d; \phi_d) \right) \prod_{k=1}^{K} q(\beta_k; \lambda_k)$$

Local variational params $\phi_d$ only depend on doc $d$; the variational params $\lambda$ are global.

Page 5:

Stochastic Variational Inference

Problem: Wikipedia has 5M+ entries. If we wanted to do VBEM then we would need the updates

$$\lambda = \arg\max_{\lambda}\, \mathcal{L}(\lambda, \phi) \qquad \phi_d = \arg\max_{\phi_d}\, \mathcal{L}(\lambda, \phi)$$

Each of these updates requires a full pass over the 5M+ documents.

Solution: Optimize $\mathcal{L}$ with stochastic gradient ascent.

Page 6:

Stochastic Gradient Descent/Ascent

Idea: Optimize the objective with a noisy estimate of the gradient

$$\lambda^{t} = \lambda^{t-1} + \rho_t\, \hat{\nabla}_{\lambda} \mathcal{L}(\lambda)$$

where $\rho_t$ is the step size and $\hat{\nabla}_{\lambda} \mathcal{L}(\lambda)$ is an approximation of the gradient.

Requirement 1: The gradient estimate should be unbiased

$$\mathbb{E}\big[\hat{\nabla}_{\lambda} \mathcal{L}(\lambda)\big] = \nabla_{\lambda} \mathcal{L}(\lambda)$$

(Intuition: a random walk with drift in the direction of the true gradient.)

Requirement 2: The step sizes satisfy the Robbins-Monro conditions

$$\sum_{t=1}^{\infty} \rho_t = \infty \;\; \text{(infinite mean displacement)} \qquad \sum_{t=1}^{\infty} \rho_t^2 < \infty \;\; \text{(finite variance)}$$
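A minimal sketch of such an update rule on a toy concave quadratic objective, using the common step-size schedule $\rho_t = (t + \tau)^{-\kappa}$ with $\kappa \in (0.5, 1]$, which satisfies both Robbins-Monro conditions (the objective, noise level, and schedule constants here are illustrative):

import numpy as np

rng = np.random.default_rng(1)
lam_star = np.array([2.0, -1.0])            # optimum of the toy objective

def noisy_grad(lam):
    # Unbiased estimate of the gradient of L(lam) = -0.5 * ||lam - lam_star||^2
    return (lam_star - lam) + rng.normal(scale=0.5, size=lam.shape)

lam = np.zeros(2)
tau, kappa = 1.0, 0.7                       # Robbins-Monro schedule parameters
for t in range(1, 5001):
    rho_t = (t + tau) ** (-kappa)           # sum rho_t = inf, sum rho_t^2 < inf
    lam = lam + rho_t * noisy_grad(lam)     # stochastic gradient *ascent* step

print(lam)                                  # approaches lam_star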

Page 7:

Intuition: Choosing the Step Size

Intuition: Choose a step size inversely proportional to the curvature of the objective: take large steps where the objective is nearly flat and small steps where it is sharply curved.

[Figure: 1D objectives with low and high curvature and the corresponding step sizes]

$$x^{t} = x^{t-1} + \rho\, H^{-1}\, \nabla_{x} \mathcal{L}(x), \qquad H_{ij} = \frac{\partial^2 \mathcal{L}}{\partial x_i\, \partial x_j} \;\;\text{(Hessian)}$$
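A one-dimensional sanity check of this intuition: for a concave quadratic with curvature magnitude $a$, a gradient step scaled by $1/a$ lands exactly on the optimum,

$$\mathcal{L}(x) = -\tfrac{a}{2}\,(x - x^{*})^2, \qquad \nabla_x \mathcal{L}(x) = -a\,(x - x^{*}), \qquad x + \tfrac{1}{a}\, \nabla_x \mathcal{L}(x) = x^{*},$$

so the more sharply curved the objective, the smaller the appropriate step.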

Page 8:

Stochastic Variational Inference

Approximation: Compute the ELBO for a batch of docs

$$\mathcal{L}(\lambda) = \max_{\phi}\; \mathbb{E}_{q(z;\phi)\, q(\beta;\lambda)}\left[\log \frac{p(x, z, \beta)}{q(z;\phi)\, q(\beta;\lambda)}\right]$$

$$= \sum_{d=1}^{D} \max_{\phi_d}\; \mathbb{E}_{q(z_d;\phi_d)\, q(\beta;\lambda)}\left[\log \frac{p(x_d, z_d \mid \beta)}{q(z_d;\phi_d)}\right] + \mathbb{E}_{q(\beta;\lambda)}\left[\log \frac{p(\beta)}{q(\beta;\lambda)}\right]$$

Page 9:

Stochastic Variational Inference

Approximation: Compute the ELBO for a batch of docs

$$\mathcal{L}(\lambda) = \sum_{d=1}^{D} \left( \max_{\phi_d}\; \mathbb{E}_{q(z_d;\phi_d)\, q(\beta;\lambda)}\left[\log \frac{p(x_d, z_d \mid \beta)}{q(z_d;\phi_d)}\right] \right) - \mathrm{KL}\big(q(\beta;\lambda)\,\|\,p(\beta)\big)$$

Choose a batch of docs: $d_b \sim \mathrm{Unif}(\{1, \ldots, D\})$ for $b = 1, \ldots, B$

$$\hat{\mathcal{L}}(\lambda) = \frac{D}{B} \sum_{b=1}^{B} \left( \max_{\phi_{d_b}}\; \mathbb{E}_{q(z_{d_b};\phi_{d_b})\, q(\beta;\lambda)}\left[\log \frac{p(x_{d_b}, z_{d_b} \mid \beta)}{q(z_{d_b};\phi_{d_b})}\right] \right) - \mathrm{KL}\big(q(\beta;\lambda)\,\|\,p(\beta)\big)$$
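A small numerical check (not from the notes) that the $D/B$ rescaling makes the subsampled sum an unbiased estimate of the full sum over documents; the per-document terms are stand-in numbers rather than actual ELBO contributions:

import numpy as np

rng = np.random.default_rng(2)
D, B = 5000, 50
per_doc_term = rng.normal(size=D)           # stand-in for the per-document ELBO terms
full_sum = per_doc_term.sum()

estimates = []
for _ in range(2000):
    batch = rng.choice(D, size=B, replace=True)      # d_b ~ Unif({1, ..., D})
    estimates.append(D / B * per_doc_term[batch].sum())

print(full_sum, np.mean(estimates))         # the two values should be close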

Page 10:

Stochastic Variational Inference

TL;DR: For conditionally conjugate exponential family models (with prior $p(\beta; \alpha)$) we can compute a natural gradient

$$\hat{\nabla}_{\lambda}\, \mathcal{L}(\lambda) = \mathbb{E}_{q(z;\phi)}\big[\eta_g(x, z, \alpha)\big] - \lambda$$

where $\eta_g$ is the natural parameter

$$\eta_g(x, z, \alpha) = \Big(\alpha_1 + \sum_{d=1}^{D} t(x_d, z_d),\;\; \alpha_2 + D\Big)$$

and $t(x_d, z_d)$ are the sufficient statistics.

This yields the gradient updates

$$\lambda^{t} = \lambda^{t-1} + \rho_t\, \hat{\nabla}_{\lambda} \mathcal{L}(\lambda)\Big|_{\lambda = \lambda^{t-1}} = (1 - \rho_t)\, \lambda^{t-1} + \rho_t\, \mathbb{E}_{q(z;\phi)}\big[\eta_g(x, z, \alpha)\big]$$
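A minimal sketch of how this update is typically implemented for the global parameter, assuming the expected sufficient statistics of a sampled minibatch have already been computed; the function and variable names are illustrative, not from the notes or any particular library:

import numpy as np

def svi_global_step(lam, alpha, expected_stats_batch, D, B, rho_t):
    # One stochastic natural-gradient step on the global variational parameter:
    # lambda^t = (1 - rho_t) * lambda^{t-1} + rho_t * lambda_hat, where
    # lambda_hat = alpha + (D / B) * sum of expected sufficient statistics
    # over the sampled batch of documents.
    lam_hat = alpha + (D / B) * expected_stats_batch
    return (1.0 - rho_t) * lam + rho_t * lam_hat

# Toy usage with made-up numbers (V = 4 vocabulary entries for one topic):
lam = np.ones(4)
alpha = 0.1 * np.ones(4)
expected_stats_batch = np.array([3.0, 1.0, 0.0, 2.0])
lam = svi_global_step(lam, alpha, expected_stats_batch, D=5000, B=10, rho_t=0.1)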

Page 11:

Natural Gradients: Coordinate Transformations

Example: Suppose we have two equivalent sets of coordinates $(x_1, x_2)$ and $(\tilde{x}_1, \tilde{x}_2)$

$$\tilde{x}_1 = \frac{x_1}{\sigma_1}, \qquad \tilde{x}_2 = \frac{x_2}{\sigma_2}$$

$$\mathcal{L}(x_1, x_2) = -\left(\frac{x_1^2}{\sigma_1^2} + \frac{x_2^2}{\sigma_2^2}\right), \qquad \mathcal{L}(\tilde{x}_1, \tilde{x}_2) = -\left(\tilde{x}_1^2 + \tilde{x}_2^2\right)$$

Page 12:

Gradient Values Change with Coordinates

[Figure: contour plots of the objective in $(x_1, x_2)$ and $(\tilde{x}_1, \tilde{x}_2)$ coordinates, with $\tilde{x}_1 = x_1 / \sigma_1$ and $\tilde{x}_2 = x_2 / \sigma_2$]

$$\mathcal{L}(x_1, x_2) = -\left(\frac{x_1^2}{\sigma_1^2} + \frac{x_2^2}{\sigma_2^2}\right) \qquad \mathcal{L}(\tilde{x}_1, \tilde{x}_2) = -\left(\tilde{x}_1^2 + \tilde{x}_2^2\right)$$

$$\frac{\partial \mathcal{L}}{\partial x_1} = -\frac{2 x_1}{\sigma_1^2} \qquad \frac{\partial \mathcal{L}}{\partial \tilde{x}_1} = -2 \tilde{x}_1 = \sigma_1 \frac{\partial \mathcal{L}}{\partial x_1}$$

$$\frac{\partial \mathcal{L}}{\partial x_2} = -\frac{2 x_2}{\sigma_2^2} \qquad \frac{\partial \mathcal{L}}{\partial \tilde{x}_2} = -2 \tilde{x}_2 = \sigma_2 \frac{\partial \mathcal{L}}{\partial x_2}$$
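A concrete instance (numbers chosen for illustration): take $\sigma_1 = 2$ and the point $x_1 = 1$, so $\tilde{x}_1 = 1/2$; then

$$\frac{\partial \mathcal{L}}{\partial x_1} = -\frac{2 \cdot 1}{2^2} = -\frac{1}{2}, \qquad \frac{\partial \mathcal{L}}{\partial \tilde{x}_1} = -2 \cdot \frac{1}{2} = -1,$$

so the same point of the same objective yields different gradient values, and hence different gradient-ascent steps, in the two coordinate systems.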

Page 13:

Coordinate-Invariant Gradients

Idea: Enforce the requirement that the rate of change of the objective must be invariant

$$\frac{d\mathcal{L}}{dt} = \frac{dx}{dt}^{\top} \nabla_{x} \mathcal{L}(x) = \frac{d\tilde{x}}{dt}^{\top} \nabla_{\tilde{x}} \mathcal{L}(\tilde{x})$$

Jacobian: Matrix of partial derivatives of the coordinate change,

$$J = \begin{pmatrix} \frac{\partial x_1}{\partial \tilde{x}_1} & \frac{\partial x_1}{\partial \tilde{x}_2} \\[4pt] \frac{\partial x_2}{\partial \tilde{x}_1} & \frac{\partial x_2}{\partial \tilde{x}_2} \end{pmatrix}$$

Chain rule:

$$\nabla_{\tilde{x}} \mathcal{L}(\tilde{x}) = J^{\top} \nabla_{x} \mathcal{L}(x), \qquad \frac{\partial \mathcal{L}}{\partial \tilde{x}_i} = \sum_{j} \frac{\partial x_j}{\partial \tilde{x}_i}\, \frac{\partial \mathcal{L}}{\partial x_j}$$
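Checking the chain rule on the running example: with $x_1 = \sigma_1 \tilde{x}_1$ and $x_2 = \sigma_2 \tilde{x}_2$, the Jacobian is $J = \mathrm{diag}(\sigma_1, \sigma_2)$, so

$$\frac{\partial \mathcal{L}}{\partial \tilde{x}_1} = \sigma_1\, \frac{\partial \mathcal{L}}{\partial x_1}, \qquad \frac{\partial \mathcal{L}}{\partial \tilde{x}_2} = \sigma_2\, \frac{\partial \mathcal{L}}{\partial x_2},$$

which matches the gradients computed directly on the previous page.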

Page 14:

Coordinate-Invariant Gradients

Assumption: $\frac{dx}{dt}$ moves along $\nabla_x \mathcal{L}(x)$, i.e. we follow the plain gradient in each coordinate system:

$$\frac{dx}{dt} = \nabla_{x} \mathcal{L}(x) \qquad \frac{d\tilde{x}}{dt} = \nabla_{\tilde{x}} \mathcal{L}(\tilde{x}) = J^{\top} \nabla_{x} \mathcal{L}(x)$$

Then the rates of change disagree:

$$\frac{d\mathcal{L}}{dt} = \frac{dx}{dt}^{\top} \nabla_{x} \mathcal{L}(x) = \big(\nabla_{x} \mathcal{L}(x)\big)^{\top} \big(\nabla_{x} \mathcal{L}(x)\big)$$

$$\frac{d\mathcal{L}}{dt}\Big|_{\tilde{x}} = \frac{d\tilde{x}}{dt}^{\top} \nabla_{\tilde{x}} \mathcal{L}(\tilde{x}) = \big(\nabla_{x} \mathcal{L}(x)\big)^{\top} J\, J^{\top}\, \big(\nabla_{x} \mathcal{L}(x)\big) \neq \big(\nabla_{x} \mathcal{L}(x)\big)^{\top} \big(\nabla_{x} \mathcal{L}(x)\big)$$

so following the plain gradient gives a rate of change that depends on the choice of coordinates.

Page 15:

Coordinate-Invariant Gradients

Solution: Define the natural gradient by $\frac{dx}{dt} = \hat{\nabla}_{x} \mathcal{L}(x)$, with

$$\hat{\nabla}_{x} \mathcal{L}(x) = \nabla_{x} \mathcal{L}(x), \qquad \hat{\nabla}_{\tilde{x}} \mathcal{L}(\tilde{x}) = \big(J^{\top} J\big)^{-1}\, \nabla_{\tilde{x}} \mathcal{L}(\tilde{x}) = J^{-1}\, \nabla_{x} \mathcal{L}(x)$$

Now the rate of change is the same in both coordinate systems:

$$\frac{d\mathcal{L}}{dt}\Big|_{\tilde{x}} = \big(\hat{\nabla}_{\tilde{x}} \mathcal{L}(\tilde{x})\big)^{\top}\, \nabla_{\tilde{x}} \mathcal{L}(\tilde{x}) = \big(J^{-1} \nabla_{x} \mathcal{L}(x)\big)^{\top} J^{\top}\, \nabla_{x} \mathcal{L}(x) = \big(\nabla_{x} \mathcal{L}(x)\big)^{\top} \nabla_{x} \mathcal{L}(x) = \frac{d\mathcal{L}}{dt}$$
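On the running example, $J = \mathrm{diag}(\sigma_1, \sigma_2)$ and $\nabla_{\tilde{x}} \mathcal{L}(\tilde{x}) = (-2\tilde{x}_1, -2\tilde{x}_2)^{\top}$, so

$$\hat{\nabla}_{\tilde{x}} \mathcal{L}(\tilde{x}) = \big(J^{\top} J\big)^{-1} \nabla_{\tilde{x}} \mathcal{L}(\tilde{x}) = \begin{pmatrix} -2\tilde{x}_1 / \sigma_1^2 \\ -2\tilde{x}_2 / \sigma_2^2 \end{pmatrix} = J^{-1} \nabla_{x} \mathcal{L}(x),$$

i.e. the natural-gradient step in the $\tilde{x}$ coordinates is exactly the ordinary gradient step in the $x$ coordinates, mapped through the change of variables.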

Page 16:

Coordinate-Invariant Gradients

Distance Metrics: In the original coordinates the metric is the identity, $G = I$; under the change of coordinates $x = x(\tilde{x})$ it transforms as $\tilde{G} = J^{\top} G\, J$

$$dx^{\top} dx = dx_1^2 + dx_2^2, \qquad dx = J\, d\tilde{x} \;\Rightarrow\; dx^{\top} dx = d\tilde{x}^{\top} \big(J^{\top} J\big)\, d\tilde{x}$$

Example: $x_1 = \sigma_1 \tilde{x}_1$, $x_2 = \sigma_2 \tilde{x}_2$ gives $J = \mathrm{diag}(\sigma_1, \sigma_2)$, $\tilde{G} = \mathrm{diag}(\sigma_1^2, \sigma_2^2)$, and

$$dx^{\top} dx = \sigma_1^2\, d\tilde{x}_1^2 + \sigma_2^2\, d\tilde{x}_2^2$$

Note that $\tilde{G}^{-1} = (J^{\top} J)^{-1}$ is exactly the matrix appearing in the natural gradient on the previous page.

Page 17:

Natural Gradients in Variational Inference

Distance Metric: Symmetric KL divergence

$$\mathrm{KL}^{\mathrm{sym}}(\lambda, \lambda') = \underbrace{\mathbb{E}_{q(\beta;\lambda)}\left[\log \frac{q(\beta;\lambda)}{q(\beta;\lambda')}\right]}_{\mathrm{KL}(q(\beta;\lambda)\,\|\,q(\beta;\lambda'))} + \underbrace{\mathbb{E}_{q(\beta;\lambda')}\left[\log \frac{q(\beta;\lambda')}{q(\beta;\lambda)}\right]}_{\mathrm{KL}(q(\beta;\lambda')\,\|\,q(\beta;\lambda))}$$

The metric $G(\lambda)$ is defined locally by this divergence,

$$d\lambda^{\top} G(\lambda)\, d\lambda = \mathrm{KL}^{\mathrm{sym}}(\lambda, \lambda + d\lambda)$$

which gives the natural gradient

$$\hat{\nabla}_{\lambda} \mathcal{L}(\lambda) = G(\lambda)^{-1}\, \nabla_{\lambda} \mathcal{L}(\lambda), \qquad G(\lambda) = \nabla^2_{\lambda}\, a(\lambda)$$

where $a(\lambda)$ is the log-normalizer of $q(\beta; \lambda)$.
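Connecting this back to the TL;DR slide earlier (this step is standard for exponential-family $q$, though not spelled out in the notes): for $q(\beta;\lambda) = h(\beta)\exp\{\lambda^{\top} t(\beta) - a(\lambda)\}$, the symmetric KL between nearby parameters is, to second order, given by the Fisher information,

$$\mathrm{KL}^{\mathrm{sym}}(\lambda, \lambda + d\lambda) \approx d\lambda^{\top}\, \nabla^2_{\lambda} a(\lambda)\, d\lambda,$$

and premultiplying the ordinary ELBO gradient by $G(\lambda)^{-1} = \big(\nabla^2_{\lambda} a(\lambda)\big)^{-1}$ is what produces the simple form $\hat{\nabla}_{\lambda} \mathcal{L}(\lambda) = \mathbb{E}_{q(z;\phi)}[\eta_g(x, z, \alpha)] - \lambda$ from the earlier slide.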