
Variational Bayes and Variational Message Passing

Mohammad Emtiyaz Khan

CS, UBC

Variational Bayes and Variational Message Passing – p.1/16

Variational Inference

Find a tractable distribution Q(H) that closely approximates the true posterior distribution P(H|V).

log P(V) = ∑_H Q(H) log P(V)

         = ∑_H Q(H) log [ P(H,V) / P(H|V) ]

         = ∑_H Q(H) log [ (P(H,V)/Q(H)) · (Q(H)/P(H|V)) ]

         = ∑_H Q(H) log [ P(H,V)/Q(H) ]  +  ∑_H −Q(H) log [ P(H|V)/Q(H) ]

where the first sum is the lower bound L(Q) and the second is the KL divergence KL(Q||P).


Variational Inference

log P(V) = L(Q) + KL(Q||P)                                  (1)

L(Q) = ∑_H Q(H) log [ P(H,V)/Q(H) ]                         (2)

KL(Q||P) = −∑_H Q(H) log [ P(H|V)/Q(H) ]                    (3)

Find Q(H) that maximizes the lower bound L(Q) (and hence minimizes the KL divergence).

For Q(H) = P(H|V) the KL divergence vanishes, but P(H|V) is intractable (that is why we take a variational approach).

Trick: consider a restricted class of distributions Q(H), and then find the member of that class which minimizes the KL divergence.
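These quantities are easy to verify numerically. The following sketch (Python; the two-state toy model and its probabilities are invented for illustration) computes L(Q) and KL(Q||P) for an arbitrary Q and checks the decomposition in Eq. (1):

```python
import math

# Toy model: hidden H in {0,1}, observed V = 1 (illustrative numbers only).
p_h = [0.6, 0.4]              # prior P(H)
p_v_given_h = [0.2, 0.9]      # likelihood P(V=1|H)

p_joint = [p_h[h] * p_v_given_h[h] for h in (0, 1)]  # P(H, V)
p_v = sum(p_joint)                                   # evidence P(V)
p_post = [p / p_v for p in p_joint]                  # posterior P(H|V)

q = [0.5, 0.5]                # an arbitrary approximation Q(H)

# L(Q) = sum_H Q(H) log[P(H,V)/Q(H)]                 (Eq. 2)
L = sum(q[h] * math.log(p_joint[h] / q[h]) for h in (0, 1))
# KL(Q||P) = -sum_H Q(H) log[P(H|V)/Q(H)]            (Eq. 3)
kl = -sum(q[h] * math.log(p_post[h] / q[h]) for h in (0, 1))

assert abs(math.log(p_v) - (L + kl)) < 1e-12  # log P(V) = L(Q) + KL(Q||P)
assert kl >= 0.0                              # hence L(Q) <= log P(V)
```

Since KL(Q||P) ≥ 0, the identity makes explicit that L(Q) is a lower bound on log P(V).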


Factorized Distributions

Q(H) = ∏_i Q_i(H_i)                                         (4)

Substituting this into the expression for the lower bound,

L(Q) = ∑_H ∏_i Q_i(H_i) log [ P(H,V) / ∏_i Q_i(H_i) ]

     = ∑_H ∏_i Q_i(H_i) log P(H,V) − ∑_H ∏_i Q_i(H_i) ∑_i log Q_i(H_i)

     = ∑_H ∏_i Q_i(H_i) log P(H,V) − ∑_i ∑_{H_i} Q_i(H_i) log Q_i(H_i)

     = ∑_H ∏_i Q_i(H_i) log P(H,V) + ∑_i H(Q_i)

where H(Q_i) denotes the entropy of Q_i.


Factorized Distributions

Now separate out all the terms involving one factor Q_j.

L(Q) = ∑_{H_j} Q_j(H_j) ⟨log P(H,V)⟩_{∼Q_j(H_j)} + H(Q_j) + ∑_{i≠j} H(Q_i)

     = −KL(Q_j || Q*_j) + terms not involving Q_j           (5)

where log Q*_j(H_j) is defined (up to a constant) as ⟨log P(H,V)⟩_{∼Q_j(H_j)}, the expectation over all factors except Q_j.

This bound is maximized with respect to Q_j when

log Q_j(H_j) = log Q*_j(H_j) = ⟨log P(H,V)⟩_{∼Q_j(H_j)} + c (6)

Now iterate over the factors; each update can only increase L(Q), so convergence (to a local maximum of the bound) is guaranteed.
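The resulting scheme is coordinate ascent on L(Q). A minimal sketch (Python/NumPy; the two-binary-variable model and its random joint table are invented for illustration) applies update (6) to each factor in turn and checks that the bound never decreases:

```python
import numpy as np

rng = np.random.default_rng(0)
# log P(H1, H2, V=v): a random 2x2 table over two binary hidden variables
log_joint = np.log(rng.dirichlet(np.ones(4)).reshape(2, 2))

q1 = np.array([0.5, 0.5])     # Q1(H1)
q2 = np.array([0.5, 0.5])     # Q2(H2)

def lower_bound(q1, q2):
    q = np.outer(q1, q2)      # factorized Q(H) = Q1(H1) Q2(H2)
    return float(np.sum(q * (log_joint - np.log(q))))

prev = -np.inf
for _ in range(50):
    # log Q1*(H1) = <log P(H,V)>_{Q2} + c, then exponentiate and normalize
    l1 = log_joint @ q2
    q1 = np.exp(l1 - l1.max()); q1 /= q1.sum()
    # log Q2*(H2) = <log P(H,V)>_{Q1} + c
    l2 = q1 @ log_joint
    q2 = np.exp(l2 - l2.max()); q2 /= q2.sum()
    cur = lower_bound(q1, q2)
    assert cur >= prev - 1e-10  # each sweep can only increase L(Q)
    prev = cur
```

Each update is exact for its factor, so L(Q) is monotonically non-decreasing and the iteration converges to a local maximum of the bound.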


Variational Bayes for Bayesian Networks

log Q*_j(H_j) = ⟨log P(H,V)⟩_{∼Q_j(H_j)} + c

              = ∑_i ⟨log P(X_i|pa_i)⟩_{∼Q_j(H_j)} + c

              = ⟨log P(H_j|pa_j)⟩_{∼Q_j(H_j)} + ∑_{k∈ch_j} ⟨log P(X_k|pa_k)⟩_{∼Q_j(H_j)} + c


Exponential-Conjugate Models

P(Y|θ) = exp[ φ_Y(θ)ᵀ u(Y) + f(Y) + g(θ) ]                  (7)

u(Y)   = natural statistic vector                           (8)
φ_Y(θ) = natural parameter vector                           (9)
g(θ)   = normalization term (ensures P integrates to one)   (10)

Example I: Bernoulli Distribution

p(x|μ) = μ^x (1 − μ)^{1−x}                                  (11)

log p(x|μ) = x log μ + (1 − x) log(1 − μ)                   (12)

           = φ(μ) u(x) + g(μ),  with
             φ(μ) = log[μ/(1 − μ)], u(x) = x, g(μ) = log(1 − μ)   (13)
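Equations (11)–(13) can be checked numerically: writing the Bernoulli pmf in the form exp[φ(μ)u(x) + g(μ)] reproduces μ^x(1−μ)^{1−x} exactly (Python; the values of μ are illustrative):

```python
import math

def bern_pmf(x, mu):
    """Bernoulli pmf, Eq. (11)."""
    return mu**x * (1 - mu)**(1 - x)

def bern_expfam(x, mu):
    """Exponential-family form exp[phi(mu) * u(x) + g(mu)], Eq. (13)."""
    phi = math.log(mu / (1 - mu))   # natural parameter phi(mu)
    g = math.log(1 - mu)            # normalizer term g(mu)
    return math.exp(phi * x + g)    # u(x) = x, f(x) = 0

# The two forms agree for every x and mu
for mu in (0.1, 0.5, 0.9):
    for x in (0, 1):
        assert abs(bern_pmf(x, mu) - bern_expfam(x, mu)) < 1e-12
```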


Exponential-Conjugate Models

P(Y|θ) = exp[ φ_Y(θ)ᵀ u(Y) + f(Y) + g(θ) ]                  (14)

P(Y|φ) = exp[ φᵀ u(Y) + f(Y) + g̃(φ) ]      (re-parametrization)

Property I: ⟨u(Y)⟩_{P(Y|θ)} = −dg̃(φ)/dφ

For the Bernoulli example,

log p(x|μ) = φ(μ) u(x) + g(μ),  with
             φ(μ) = log[μ/(1 − μ)], u(x) = x, g(μ) = log(1 − μ)   (15)

φ = log[μ/(1 − μ)]  ⇒  μ = e^φ/(1 + e^φ)                    (16)

g(μ) = log(1 − μ) = −log(1 + e^φ) = g̃(φ)                   (17)

E(x) = ⟨u(x)⟩ = −dg̃(φ)/dφ = e^φ(1 + e^φ)^{−1} = μ          (18)
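Property I and Eq. (18) can be verified by finite differences: the negative derivative of g̃(φ) = −log(1 + e^φ) recovers ⟨u(x)⟩ = μ (Python sketch; μ = 0.3 is an arbitrary illustrative value):

```python
import math

mu = 0.3
phi = math.log(mu / (1 - mu))             # Eq. (16): natural parameter

def g_tilde(p):
    return -math.log(1 + math.exp(p))     # Eq. (17): reparametrized normalizer

# Property I: <u(x)> = -d g~(phi)/d phi, checked by central differences
eps = 1e-6
deriv = (g_tilde(phi + eps) - g_tilde(phi - eps)) / (2 * eps)

assert abs(-deriv - mu) < 1e-6            # Eq. (18): E[x] = mu
```

This is why expectations of natural statistics are cheap to obtain once the factor is in exponential-family form: they fall out of the log-normalizer.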


Exponential-Conjugate Models

P(Y|θ) = exp[ φ_Y(θ)ᵀ u(Y) + f(Y) + g(θ) ]                  (19)

Example II: Gaussian Distribution, θ → Y → X ← β

p(Y|θ) = (2π)^{−1/2} exp[ −(Y − θ)²/2 ]

log p(Y|θ) = φ_Y(θ)ᵀ u_Y(Y) + g_Y(θ) + f_Y(Y),  where

    φ_Y(θ) = [θ, −1/2],  u_Y(Y) = [Y, Y²]ᵀ,
    g_Y(θ) = −θ²/2,      f_Y(Y) = −(1/2) log(2π)

p(X|Y, β) = (2π)^{−1/2} β^{1/2} exp[ −β(X − Y)²/2 ]

log p(X|Y, β) = φ_X(Y, β)ᵀ u_X(X) + g_X(Y, β) + f_X(X),  where

    φ_X(Y, β) = [βY, −β/2],           u_X(X) = [X, X²]ᵀ,
    g_X(Y, β) = −(1/2)(βY² − log β),  f_X(X) = −(1/2) log(2π)


Exponential-Conjugate Models

Property II: Multi-linearity, θ → Y → X ← β. The child's log-density is linear not only in its own natural statistics u_X(X), but also in those of its parent, u_Y(Y):

log p(X|Y, β) = φ_X(Y, β)ᵀ u_X(X) + g_X(Y, β) + f_X(X),  where
    φ_X(Y, β) = [βY, −β/2],  g_X(Y, β) = −(1/2)(βY² − log β)

              = φ_XY(X, β)ᵀ u_Y(Y) + g_XY(X, β) + f_Y(Y),  where
    φ_XY(X, β) = [βX, −β/2],  g_XY(X, β) = −(1/2)(βX² − log β)

log p(Y|θ) = φ_Y(θ)ᵀ u_Y(Y) + g_Y(θ) + f_Y(Y),  where
    φ_Y(θ) = [θ, −1/2],  g_Y(θ) = −θ²/2,  f_Y(Y) = −(1/2) log(2π)


Exponential-Conjugate Models

Consider the node Y and its children in θ → Y → X ← β:

log P(Y|θ) = φ_Y(θ)ᵀ u_Y(Y) + f_Y(Y) + g_Y(θ)

log P(X|Y, β) = φ_X(Y, β)ᵀ u_X(X) + f_X(X) + g_X(Y, β)

              = φ_XY(X, β)ᵀ u_Y(Y) + g_XY(X, β) + f_Y(Y)

Recall that

log Q*_Y(Y) = ⟨log P(Y|θ)⟩_{∼Q_Y(Y)} + ⟨log P(X|Y, β)⟩_{∼Q_Y(Y)} + c

            = ⟨φ_Y(θ)ᵀ u_Y(Y) + f_Y(Y) + g_Y(θ)⟩_{∼Q_Y(Y)}
              + ⟨φ_XY(X, β)ᵀ u_Y(Y) + g_XY(X, β)⟩_{∼Q_Y(Y)} + c

            = ⟨φ_Y(θ) + φ_XY(X, β)⟩ᵀ_{∼Q_Y(Y)} u_Y(Y) + f_Y(Y) + c₁


Exponential-Conjugate Models

log Q*_Y(Y) = ⟨φ_Y(θ) + φ_XY(X, β)⟩ᵀ_{∼Q_Y(Y)} u_Y(Y) + f_Y(Y) + c₁

Finally,

⟨φ_Y(θ)⟩ = [θ, −1/2]

⟨φ_XY(X, β)⟩ = ⟨[βX, −β/2]⟩

The latter is found using Property I: the required expectations of the natural statistics of X and β are obtained from the derivatives of their log-normalizers g̃.


Back to Bayesian Networks

Take each node and write the expression as a function of the natural statistics of that node.

log Q*_Y(Y) = ⟨log P(Y|pa_Y)⟩_{∼Q_Y(Y)} + ∑_{k∈ch_Y} ⟨log P(X_k|pa_k)⟩_{∼Q_Y(Y)} + c

            = ⟨φ_Y(θ) + ∑_{k∈ch_Y} φ_{X_k Y}(X_k, β)⟩ᵀ_{∼Q_Y(Y)} u_Y(Y) + f_Y(Y) + c₁

Then compute the expectations of the natural statistics of each child node, and use them to find the quantity in the brackets.


Variational Message Passing

Message from a parent node Y to a child node X:

m_{Y→X} = ⟨u_Y⟩                                             (20)

Message from a child node X to a parent node Y:

m_{X→Y} = φ̃_XY( ⟨u_X⟩, {m_{i→X}}_{i∈cp_Y} )                (21)

where cp_Y denotes the co-parents of Y (the other parents of X).

Node Y updates its posterior Q*_Y:

φ*_Y = φ̃_Y( {m_{i→Y}}_{i∈pa_Y} ) + ∑_{j∈ch_Y} m_{j→Y}      (22)
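For the earlier chain θ → Y → X ← β, one update of node Y can be traced by hand. In this Python sketch θ and β are treated as known constants and X as observed (in the general algorithm they would themselves be nodes exchanging messages); the numerical values are illustrative. Gaussian natural parameters are taken against u(Y) = [Y, Y²]ᵀ, as in Eq. (19):

```python
theta, beta, x_obs = 1.0, 2.0, 3.0   # illustrative: Y ~ N(theta, 1), X ~ N(Y, 1/beta)

# Message from parent theta to Y: the prior's natural parameter phi_Y(theta)
m_parent = (theta, -0.5)
# Message from child X to Y: phi_XY(X, beta) = [beta*X, -beta/2]   (Eq. 21)
m_child = (beta * x_obs, -beta / 2)

# Node update (Eq. 22): phi*_Y is the sum of the incoming messages
phi_star = (m_parent[0] + m_child[0], m_parent[1] + m_child[1])

# Recover the moments of Q*_Y(Y) from its natural parameters
precision = -2 * phi_star[1]
mean = phi_star[0] / precision

# For this linear-Gaussian model the update gives the exact posterior:
# precision 1 + beta, mean (theta + beta*X)/(1 + beta)
assert abs(precision - (1 + beta)) < 1e-12
assert abs(mean - (theta + beta * x_obs) / (1 + beta)) < 1e-12
```

Because the model is linear-Gaussian and Y's factor is updated exactly, a single update already yields the true posterior here; in general the messages are iterated according to a schedule until the lower bound converges.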



Discussion

Initialization and message passing schedule.

Calculation of Lower Bound

Allowable Model

VIBES
