Lecture 13: Variational Inference


  • Lecture 13: Variational Inference

    Scribes: Kaushal Panini, Niklas Smedemark-Margulies

  • Variational Inference

    Idea: Approximate the posterior by maximizing a variational lower bound,

        p(z, \theta \mid y) \approx q(z, \theta)

    \mathcal{L}(q) = E_{q(z, \theta)}\!\left[ \log \frac{p(y, z, \theta)}{q(z, \theta)} \right]

    = \log p(y) + E_{q(z, \theta)}\!\left[ \log \frac{p(z, \theta \mid y)}{q(z, \theta)} \right]

    = \log p(y) - \mathrm{KL}\big( q(z, \theta) \,\|\, p(z, \theta \mid y) \big)

    \leq \log p(y)

    Maximizing \mathcal{L}(q) is the same as minimizing the KL divergence.
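
    A minimal numerical check of this bound (not from the lecture; the toy model and all
    values are assumptions for illustration): take the conjugate model \theta \sim Norm(0, 1),
    y \mid \theta \sim Norm(\theta, 1), for which \log p(y) = \log Norm(y; 0, 2) is known
    in closed form.

    import numpy as np

    rng = np.random.default_rng(0)

    # Toy conjugate model: theta ~ Norm(0, 1), y | theta ~ Norm(theta, 1).
    # Marginal: y ~ Norm(0, 2), so log p(y) is available in closed form.
    y = 1.5

    def log_norm(x, mean, var):
        return -0.5 * np.log(2 * np.pi * var) - 0.5 * (x - mean) ** 2 / var

    # Variational family q(theta) = Norm(m, s^2).
    m, s = 0.3, 1.2  # deliberately not the exact posterior

    # Monte Carlo estimate of L(q) = E_q[log p(y, theta) - log q(theta)].
    theta = rng.normal(m, s, size=100_000)
    elbo = np.mean(log_norm(theta, 0.0, 1.0)      # log p(theta)
                   + log_norm(y, theta, 1.0)      # log p(y | theta)
                   - log_norm(theta, m, s ** 2))  # log q(theta)

    print(f"ELBO estimate: {elbo:.4f}")
    print(f"log p(y):      {log_norm(y, 0.0, 2.0):.4f}")  # ELBO <= log p(y)
    # The gap is KL(q(theta) || p(theta | y)); it closes at m = y/2, s^2 = 1/2.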

  • Variational Inference: Interpretation as Regularization

    Equivalent interpretation: regularized maximum likelihood.

    \mathcal{L}(\phi) = E_{q(z, \theta; \phi)}\!\left[ \log \frac{p(y, z, \theta)}{q(z, \theta; \phi)} \right],
    \qquad p(y, z, \theta) = p(y \mid z, \theta)\, p(z, \theta)

    = E_{q(z, \theta; \phi)}\big[ \log p(y \mid z, \theta) \big]
      + E_{q(z, \theta; \phi)}\!\left[ \log \frac{p(z, \theta)}{q(z, \theta; \phi)} \right]

    = E_{q(z, \theta; \phi)}\big[ \log p(y \mid z, \theta) \big]
      - \mathrm{KL}\big( q(z, \theta; \phi) \,\|\, p(z, \theta) \big)

    "make the log-likelihood as large as possible" + "make sure q(z, \theta; \phi) is similar to the prior"
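
    Continuing the same toy Gaussian model, the regularized-likelihood form can be evaluated
    in closed form and agrees with the Monte Carlo estimate above (again a sketch, with
    illustrative values):

    import numpy as np

    # Same toy model: theta ~ Norm(0, 1), y | theta ~ Norm(theta, 1), q = Norm(m, s^2).
    y, m, s = 1.5, 0.3, 1.2

    # E_q[log p(y | theta)] for a Gaussian likelihood, in closed form:
    # E_q[-(y - theta)^2 / 2] - log(2 pi)/2 = -((y - m)^2 + s^2)/2 - log(2 pi)/2
    exp_loglik = -0.5 * np.log(2 * np.pi) - 0.5 * ((y - m) ** 2 + s ** 2)

    # KL(Norm(m, s^2) || Norm(0, 1)) in closed form.
    kl_to_prior = 0.5 * (s ** 2 + m ** 2 - 1.0 - np.log(s ** 2))

    print(f"ELBO via regularized likelihood: {exp_loglik - kl_to_prior:.4f}")
    # Matches the Monte Carlo estimate of E_q[log p(y, theta) - log q(theta)] above.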

  • Intuition: Minimizing KL divergences

    p(y, x_1, x_2) = p(y \mid x_1, x_2)\, p(x_1, x_2)

    Suppose the exact posterior is a correlated Gaussian, p(x_1, x_2 \mid y) = \mathrm{Norm}(x; \mu, \Sigma),
    and we use a mean-field (fully factorized) approximation:

    q(x_1, x_2) := q(x_1)\, q(x_2), \qquad
    q(x_1) := \mathrm{Norm}(x_1; \mu_1, \sigma_1^2), \qquad
    q(x_2) := \mathrm{Norm}(x_2; \mu_2, \sigma_2^2)

    \mathcal{L}(q) = E_{q(x_1, x_2)}\!\left[ \log \frac{p(y, x_1, x_2)}{q(x_1, x_2)} \right]
    = \log p(y) - \mathrm{KL}\big( q(x_1, x_2) \,\|\, p(x_1, x_2 \mid y) \big)

    Intuition: minimizing this KL divergence under-approximates the posterior variance.
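
    A short sketch of this effect (the bivariate Gaussian and the closed-form mean-field
    solution are standard results, but the numbers are illustrative):

    import numpy as np

    # Correlated 2D Gaussian posterior p(x1, x2 | y) = Norm(0, Sigma).
    rho = 0.9
    Sigma = np.array([[1.0, rho],
                      [rho, 1.0]])
    Lambda = np.linalg.inv(Sigma)  # precision matrix

    # For a factorized Gaussian q(x1) q(x2), minimizing KL(q || p) gives
    # (a standard mean-field result) q(x_i) = Norm(0, 1 / Lambda_ii).
    var_q = 1.0 / np.diag(Lambda)

    print(f"true marginal variances:      {np.diag(Sigma)}")  # [1.0, 1.0]
    print(f"mean-field (reverse-KL) vars: {var_q}")           # [0.19, 0.19]
    # Reverse KL under-approximates the variance: 1 - rho^2 = 0.19 < 1.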

  • Intuition: Minimizing KL divergences

    Same setup: p(y, x_1, x_2) = p(y \mid x_1, x_2)\, p(x_1, x_2), with the mean-field
    approximation q(x_1, x_2) := q(x_1)\, q(x_2).

    Note that variational inference minimizes the reverse KL divergence
    \mathrm{KL}(q(x_1, x_2) \,\|\, p(x_1, x_2 \mid y)), not the forward divergence
    \mathrm{KL}(p(x_1, x_2 \mid y) \,\|\, q(x_1, x_2)):

    \mathrm{KL}\big( q(x_1, x_2) \,\|\, p(x_1, x_2 \mid y) \big)
    = \int dx_1\, dx_2\; q(x_1, x_2) \log \frac{q(x_1, x_2)}{p(x_1, x_2 \mid y)}

    \lim_{q \to 0} q \log \frac{q}{p} = 0, \qquad \lim_{p \to 0} q \log \frac{q}{p} = \infty

    Intuition: q(x_1, x_2) \to 0 whenever p(x_1, x_2 \mid y) \to 0 (the reverse KL is "zero-forcing").
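
    A sketch contrasting the two divergences by brute-force grid search; the bimodal
    target and the grids are assumptions for illustration:

    import numpy as np

    # Bimodal target p(x): mixture of Norm(-2, 0.5^2) and Norm(+2, 0.5^2).
    xs = np.linspace(-8, 8, 4001)
    dx = xs[1] - xs[0]

    def norm_pdf(x, mean, std):
        return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2 * np.pi))

    p = 0.5 * norm_pdf(xs, -2, 0.5) + 0.5 * norm_pdf(xs, 2, 0.5)

    def kl(a, b):
        # KL(a || b) on the grid; contributions where a ~ 0 are zero.
        mask = a > 1e-300
        return np.sum(a[mask] * np.log(a[mask] / np.maximum(b[mask], 1e-300))) * dx

    # Fit a single Gaussian q = Norm(mu, sigma^2) by grid search.
    mus = np.linspace(-4, 4, 81)
    sigmas = np.linspace(0.2, 4, 77)
    grid = [(m, s) for m in mus for s in sigmas]

    rev = min(grid, key=lambda ms: kl(norm_pdf(xs, *ms), p))  # KL(q || p)
    fwd = min(grid, key=lambda ms: kl(p, norm_pdf(xs, *ms)))  # KL(p || q)

    print(f"reverse KL picks mu={rev[0]:+.2f}, sigma={rev[1]:.2f}  (locks onto one mode)")
    print(f"forward KL picks mu={fwd[0]:+.2f}, sigma={fwd[1]:.2f}  (covers both modes)")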

  • Algorithm: Variational Expectation Maximization

    Define: q(z, \theta; \phi) = q(z; \phi_z)\, q(\theta; \phi_\theta)

    Objective:

    \mathcal{L}(\phi_z, \phi_\theta)
    = E_{q(z, \theta; \phi)}\!\left[ \log \frac{p(y, z, \theta)}{q(z, \theta; \phi)} \right]
    \leq \log p(y)

    Repeat until \mathcal{L}(\phi_z, \phi_\theta) converges (change smaller than some threshold);
    see the loop sketch after this list:

    1. Expectation step

       \phi_z = \underset{\phi_z}{\arg\max}\; \mathcal{L}(\phi_z, \phi_\theta)
       (analogous to the E-step in EM, but for z)

    2. Maximization step

       \phi_\theta = \underset{\phi_\theta}{\arg\max}\; \mathcal{L}(\phi_z, \phi_\theta)
       (updates a distribution q(\theta; \phi_\theta) instead of a point estimate of \theta)
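
    As a sketch, the loop structure might look as follows; e_step, m_step, and elbo are
    hypothetical model-specific callables, not functions defined in the lecture:

    def variational_em(e_step, m_step, elbo, phi_z, phi_theta, tol=1e-6, max_iters=1000):
        """Generic coordinate-ascent loop. Each *_step returns updated variational
        parameters, and elbo evaluates L(phi_z, phi_theta)."""
        prev = -float("inf")
        for _ in range(max_iters):
            phi_z = e_step(phi_z, phi_theta)       # maximize L over phi_z
            phi_theta = m_step(phi_z, phi_theta)   # maximize L over phi_theta
            curr = elbo(phi_z, phi_theta)
            if curr - prev < tol:                  # each step can only increase L
                break
            prev = curr
        return phi_z, phi_theta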

  • Example: Gaussian Mixture (Simplified)

    Generative model:

    \mu_{k,d} \sim \mathrm{Norm}(\mu_{0,d}, \sigma_0^2)      (cluster means, k = 1, ..., K, d = 1, ..., D)

    z_n \sim \mathrm{Discrete}(1/K, \ldots, 1/K)             (cluster assignments, n = 1, ..., N)

    y_n \mid z_n = k \sim \mathrm{Norm}(\mu_k, \sigma^2 I)   (observations)
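
    A minimal sampler for this generative model (hyperparameter values are illustrative):

    import numpy as np

    rng = np.random.default_rng(1)

    def sample_gmm(N=500, K=3, D=2, mu0=0.0, sigma0=5.0, sigma=0.7):
        """Draw one dataset from the simplified GMM generative model above."""
        mu = rng.normal(mu0, sigma0, size=(K, D))        # mu_kd ~ Norm(mu0, sigma0^2)
        z = rng.integers(K, size=N)                      # z_n ~ Discrete(1/K, ..., 1/K)
        y = rng.normal(mu[z], sigma)                     # y_n | z_n = k ~ Norm(mu_k, sigma^2 I)
        return y, z, mu

    y, z, mu = sample_gmm()
    print(y.shape, z.shape, mu.shape)  # (500, 2) (500,) (3, 2)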

  • Model Selection

    Marginal likelihood ("evidence"):

    \log p(y) = \log \int dz\, d\theta\; p(y, z, \theta), \qquad \mathcal{L} \leq \log p(y)

    [Figure: the lower bound \mathcal{L} plotted against the number of clusters K; the curve peaks at K = 2.]

    Intuition: We can avoid overfitting by keeping the model with the highest \mathcal{L}.
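
    A hypothetical sketch of this selection procedure; fit_gmm_vem is an assumed helper
    that runs variational EM and returns the converged ELBO (a concrete version appears
    at the end of these notes):

    def select_k(y, candidate_ks=(1, 2, 3, 4, 5)):
        # Fit each candidate model size and record its converged ELBO.
        elbos = {k: fit_gmm_vem(y, K=k) for k in candidate_ks}  # hypothetical helper
        best = max(elbos, key=elbos.get)
        return best, elbos

    # best_k, elbos = select_k(y)  # keep the model with the highest ELBO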

  • Variational Expectation Maximization: Updates

    (Writing \eta for the model parameters; below these will be natural parameters of
    exponential families.)

    \mathcal{L}(\phi_z, \phi_\eta)
    = E_{q(z; \phi_z)\, q(\eta; \phi_\eta)}\!\left[ \log \frac{p(y, z, \eta)}{q(z; \phi_z)\, q(\eta; \phi_\eta)} \right]

    = E_{q(z; \phi_z)\, q(\eta; \phi_\eta)}[\log p(y, z, \eta)]    ← depends on \phi_z and \phi_\eta
      - E_{q(z; \phi_z)}[\log q(z; \phi_z)]                        ← depends on \phi_z
      - E_{q(\eta; \phi_\eta)}[\log q(\eta; \phi_\eta)]            ← depends on \phi_\eta

    E-step: 0 = \frac{\delta}{\delta q(z; \phi_z)}
                \big( E_{q(\eta; \phi_\eta)}[\log p(y, z, \eta)] - \log q(z; \phi_z) \big)
    \;\Rightarrow\; q(z; \phi_z) \propto \exp\big( E_{q(\eta; \phi_\eta)}[\log p(y, z, \eta)] \big)

    M-step: 0 = \frac{\delta}{\delta q(\eta; \phi_\eta)}
                \big( E_{q(z; \phi_z)}[\log p(y, z, \eta)] - \log q(\eta; \phi_\eta) \big)
    \;\Rightarrow\; q(\eta; \phi_\eta) \propto \exp\big( E_{q(z; \phi_z)}[\log p(y, z, \eta)] \big)
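
    In practice the E-step normalization is computed with the log-sum-exp trick; a small
    sketch (not from the lecture):

    import numpy as np

    def exp_normalize(log_rho):
        """Turn unnormalized log-probabilities log_rho[n, k] =
        E_q(eta)[log p(y_n, z_n = k, eta)] + const into responsibilities
        q(z_n = k), using the log-sum-exp trick for numerical stability."""
        log_rho = log_rho - log_rho.max(axis=1, keepdims=True)
        r = np.exp(log_rho)
        return r / r.sum(axis=1, keepdims=True)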

  • Intermezzo: Functional Derivatives

    Idea: Compute the derivative of an integral w.r.t. a function.

    0 = \frac{\delta}{\delta q(\eta)} \int dz\, d\eta\; q(z)\, q(\eta) \log \frac{p(y, z, \eta)}{q(z)\, q(\eta)}

    = \frac{\delta}{\delta q(\eta)} \int d\eta\; q(\eta)
      \Big( E_{q(z)}\big[ \log p(y, z, \eta) - \log q(z) \big] - \log q(\eta) \Big)

    Drop the integral over \eta and take the derivative of the integrand:

    = E_{q(z)}\big[ \log p(y, z, \eta) - \log q(z) \big] - \log q(\eta) - 1

    \Rightarrow\; \log q(\eta) = E_{q(z)}[\log p(y, z, \eta)] + \mathrm{const}

    (the first term depends on \eta; the constant ensures normalization)
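
    The normalization can be made explicit with a Lagrange multiplier; a brief sketch of
    this step (not spelled out in the notes):

    % Add a Lagrange multiplier \lambda for the constraint \int d\eta\, q(\eta) = 1:
    \begin{aligned}
    0 &= \frac{\delta}{\delta q(\eta)}
         \left[ \int d\eta\, q(\eta)
         \Big( E_{q(z)}[\log p(y, z, \eta) - \log q(z)] - \log q(\eta) \Big)
         + \lambda \Big( \textstyle\int d\eta\, q(\eta) - 1 \Big) \right] \\
      &= E_{q(z)}[\log p(y, z, \eta) - \log q(z)] - \log q(\eta) - 1 + \lambda \\
    \Rightarrow\quad
    q(\eta) &= \frac{\exp\big( E_{q(z)}[\log p(y, z, \eta)] \big)}
                    {\int d\eta'\, \exp\big( E_{q(z)}[\log p(y, z, \eta')] \big)} .
    \end{aligned}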

  • Variational Expectation Maximization: Updates

    Restating the updates, now justified by the functional-derivative argument:

    E-step: q(z; \phi_z) \propto \exp\big( E_{q(\eta; \phi_\eta)}[\log p(y, z, \eta)] \big)

    M-step: q(\eta; \phi_\eta) \propto \exp\big( E_{q(z; \phi_z)}[\log p(y, z, \eta)] \big)

  • Gaussian Mixture: Derivation of Updates

    Idea: Exploit exponential families.

    \log p(y, z, \eta) = \log p(y \mid z, \eta) + \log p(z \mid \eta) + \log p(\eta)

    All of these are exponential family:

    \log p(y \mid z, \eta) = \sum_n \Big( \sum_k I[z_n = k] \big( \eta_k^\top t(y_n) - a(\eta_k) \big) + \log h(y_n) \Big)

    \log p(z \mid \eta_z) = \sum_n \sum_k \eta_{z,k}\, I[z_n = k]

    \log p(\eta_k \mid \lambda) = \lambda_1^\top \eta_k - \lambda_2\, a(\eta_k) + \log h(\lambda)    (conjugate prior)

    \log p(\eta_z \mid \lambda_z) = \lambda_z^\top \eta_z + \log h(\eta_z)
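
    For the simplified model above (spherical Gaussian components with known \sigma^2),
    these quantities work out as follows; this instantiation is a sketch, not written out
    in the notes:

    % Spherical Gaussian likelihood with known variance \sigma^2:
    % log Norm(y_n; \mu_k, \sigma^2 I) in exponential-family form.
    \begin{aligned}
    \eta_k &= \mu_k / \sigma^2,
    & t(y_n) &= y_n, \\
    a(\eta_k) &= \frac{\sigma^2}{2} \|\eta_k\|^2 = \frac{\|\mu_k\|^2}{2\sigma^2},
    & \log h(y_n) &= -\frac{\|y_n\|^2}{2\sigma^2} - \frac{D}{2}\log(2\pi\sigma^2).
    \end{aligned}
    % The conjugate prior \lambda_1^T \eta_k - \lambda_2 a(\eta_k) is then itself a
    % Gaussian density over \mu_k, matching \mu_{k,d} ~ Norm(\mu_{0,d}, \sigma_0^2)
    % when \lambda_1 = (\sigma^2 / \sigma_0^2)\, \mu_0 and \lambda_2 = \sigma^2 / \sigma_0^2.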

  • Gaussian Mixture: Derivation of Updates

    E-step: Collect all terms that depend on z_n.

    \log q(z_n; \phi_z) = E_{q(\eta; \phi_\eta)}[\log p(y_n, z_n, \eta)] + \ldots

    = \sum_k E_{q(\eta; \phi_\eta)}[\eta_k]^\top I[z_n = k]\, t(y_n)
      - \sum_k E_{q(\eta; \phi_\eta)}[a(\eta_k)]\, I[z_n = k]
      + \sum_k E_{q(\eta; \phi_\eta)}[\eta_{z,k}]\, I[z_n = k] + \ldots

    Need the expected values E_q[\eta_k], E_q[a(\eta_k)], and E_q[\eta_{z,k}].
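
    Assuming a Gaussian variational factor q(\mu_k) = Norm(m_k, s_k^2 I), so that
    q(\eta_k) is the induced distribution over \eta_k = \mu_k / \sigma^2 (a sketch, not
    written out in the notes), these expectations are:

    \begin{aligned}
    E_q[\eta_k] &= m_k / \sigma^2, \\
    E_q[a(\eta_k)] &= \frac{1}{2\sigma^2} E_q\big[ \|\mu_k\|^2 \big]
                    = \frac{\|m_k\|^2 + D\, s_k^2}{2\sigma^2}.
    \end{aligned}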

  • Gaussian Mixture: Derivation of Updates

    M-step: Collect all terms that depend on \eta.

    \log q(\eta; \phi_\eta) = E_{q(z; \phi_z)}[\log p(y, z, \eta)] + \ldots

    = \sum_k \eta_k^\top \Big( \lambda_1 + \sum_n E_{q(z; \phi_z)}\big[ I[z_n = k] \big]\, t(y_n) \Big)
      - \sum_k a(\eta_k) \Big( \lambda_2 + \sum_n E_{q(z; \phi_z)}\big[ I[z_n = k] \big] \Big) + \ldots

    Need the expected values E_{q(z; \phi_z)}[I[z_n = k]] (the responsibilities).
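
    In the spherical-Gaussian instance, reading off the natural parameters gives a
    Gaussian q(\mu_k); a sketch of this step:

    % q(\eta_k) has the conjugate form with
    % \phi_{k,1} = \lambda_1 + \sum_n r_{nk}\, y_n and \phi_{k,2} = \lambda_2 + N_k,
    % where r_{nk} = E_{q(z)}[I[z_n = k]] and N_k = \sum_n r_{nk}. Equivalently,
    % q(\mu_k) = Norm(m_k, s_k^2 I) with
    \begin{aligned}
    s_k^2 = \frac{\sigma^2}{\phi_{k,2}} = \frac{\sigma^2}{\lambda_2 + N_k},
    \qquad
    m_k = \frac{\phi_{k,1}}{\phi_{k,2}}
        = \frac{\lambda_1 + \sum_n r_{nk}\, y_n}{\lambda_2 + N_k}.
    \end{aligned}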

  • Gaussian Mixture: Variational EM

    Objective: Variational Evidence Lower Bound (ELBO)

    \mathcal{L}(\phi_z, \phi_\eta)
    = E_{q(z; \phi_z)\, q(\eta; \phi_\eta)}\!\left[ \log \frac{p(y, z, \eta)}{q(z; \phi_z)\, q(\eta; \phi_\eta)} \right]

    Repeat until \mathcal{L}(\phi_z, \phi_\eta) converges:

    1. Expectation step: Update q(z) (keeping q(\eta) fixed)

       r_{nk} = E_{q(z)}\big[ I[z_n = k] \big]
              = \frac{\exp\big( E_{q(\eta)}[\log p(y_n, z_n = k, \eta)] \big)}
                     {\sum_l \exp\big( E_{q(\eta)}[\log p(y_n, z_n = l, \eta)] \big)}

    2. Maximization step: Update q(\eta) (keeping q(z) fixed)

       \phi_{k,1} = \lambda_1 + \sum_n r_{nk}\, t(y_n), \qquad
       \phi_{k,2} = \lambda_2 + N_k, \quad N_k = \sum_n r_{nk}
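
    Putting the pieces together, here is a self-contained sketch of the full algorithm in
    numpy, under the simplified model's assumptions (spherical components with known
    \sigma, uniform weights); all hyperparameter values are illustrative:

    import numpy as np

    rng = np.random.default_rng(2)

    def fit_gmm_vem(y, K, sigma=0.7, mu0=0.0, sigma0=5.0, iters=100, tol=1e-8):
        """Variational EM for the simplified GMM: q(mu_k) = Norm(m_k, s_k^2 I),
        q(z_n) = Discrete(r_n1, ..., r_nK), prior mu_kd ~ Norm(mu0, sigma0^2)."""
        N, D = y.shape
        lam2 = sigma ** 2 / sigma0 ** 2              # conjugate prior: lambda_2
        lam1 = lam2 * mu0 * np.ones(D)               # conjugate prior: lambda_1
        m = y[rng.choice(N, size=K, replace=False)]  # init means at random data points
        s2 = np.full(K, sigma0 ** 2)
        prev_elbo = -np.inf
        for _ in range(iters):
            # E-step: r_nk propto exp(E_q(mu)[log p(y_n, z_n = k, mu)]).
            sq = ((y[:, None, :] - m[None, :, :]) ** 2).sum(-1)   # ||y_n - m_k||^2
            log_rho = (-(sq + D * s2[None, :]) / (2 * sigma ** 2)
                       - 0.5 * D * np.log(2 * np.pi * sigma ** 2) - np.log(K))
            log_rho -= log_rho.max(axis=1, keepdims=True)         # log-sum-exp trick
            r = np.exp(log_rho)
            r /= r.sum(axis=1, keepdims=True)
            # M-step: conjugate update of q(mu_k).
            Nk = r.sum(axis=0)
            phi2 = lam2 + Nk
            m = (lam1[None, :] + r.T @ y) / phi2[:, None]
            s2 = sigma ** 2 / phi2
            # ELBO = E_q[log-lik] - KL(q(z) || p(z)) - KL(q(mu) || p(mu)).
            sq = ((y[:, None, :] - m[None, :, :]) ** 2).sum(-1)
            exp_loglik = (-(sq + D * s2[None, :]) / (2 * sigma ** 2)
                          - 0.5 * D * np.log(2 * np.pi * sigma ** 2))
            kl_mu = 0.5 * ((D * s2 + ((m - mu0) ** 2).sum(-1)) / sigma0 ** 2
                           - D - D * np.log(s2 / sigma0 ** 2)).sum()
            kl_z = (r * (np.log(np.maximum(r, 1e-12)) + np.log(K))).sum()
            elbo = (r * exp_loglik).sum() - kl_z - kl_mu
            if elbo - prev_elbo < tol:               # monotone by coordinate ascent
                break
            prev_elbo = elbo
        return m, s2, r, elbo

    # Data drawn from the generative model (see the sampler earlier in these notes).
    mu_true = rng.normal(0.0, 5.0, size=(3, 2))
    y = rng.normal(mu_true[rng.integers(3, size=500)], 0.7)
    m, s2, r, elbo = fit_gmm_vem(y, K=3)
    print("ELBO:", round(float(elbo), 2))
    print("posterior means for mu_k:\n", np.round(m, 2))

    Because each iteration performs exact coordinate ascent on the ELBO, the reported
    ELBO can only increase, which doubles as a useful correctness check.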