A Note on Expectation-Propagation for Latent Dirichlet Allocation
Click here to load reader
-
Upload
tomonari-masada -
Category
Technology
-
view
133 -
download
3
Transcript of A Note on Expectation-Propagation for Latent Dirichlet Allocation
Expectation-Propagation for Latent Dirichlet Allocation
Tomonari MASADA @ Nagasaki University
January 15, 2014
1
This manuscript contains a derivation of the update formulae of expectation-propagation (EP) presentedin the following paper:
Thomas Minka and John Lafferty.Expectation-Propagation for the Generative Aspect Model.in Proc. of the 18th Conference on Uncertainty in Artificial Intelligence, pp. 352-359, 2002.1
2
A joint distribution for document d in LDA can be written as below.
p(x,θd|α) = p(θd|α)p(x|θd)
= p(θd|α)
nd∏i=1
p(xdi|θd) = p(θd|α)
nd∏i=1
(∑k
θdkφkxdi
)= p(θd|α)
∏w
(∑k
θdkφkw
)ndw
, (1)
where p(θd|α) is a Dirichlet prior distribution.
We approximate p(w|θd) =∑k θdkφkw by tw(θd) = sw
∏k θ
βwk
dk .
Then we obtain an approximated joint distribution as follows:
p(x,θd|α) ≈ p(θd|α)∏w
tw(θd)ndw = p(θd|α)
∏w
(sw∏k
θβwk
dk
)ndw
=Γ(∑k αk)∏
k Γ(αk)
∏w
sndww
∏k
θαk−1+
∑w βwkndw
dk (2)
Let γdk = αk +∑w βwkndw. That is,
p(xd,θd|α) ≈Γ(∑k αk)∏
k Γ(αk)
∏w
sndww
∏k
θγdk−1dk . (3)
An approximated posterior q(θd) can be obtained as follows:
q(θd) =
Γ(∑
k αk)∏k Γ(αk)
∏w s
ndww
∏k θ
γdk−1dk∫ Γ(
∑k αk)∏
k Γ(αk)
∏w s
ndww
∏k θ
γdk−1dk dθd
=
∏k θ
γdk−1dk∫ ∏
k θγdk−1dk dθd
=Γ(∑k γdk)∏
k Γ(γdk)
∏k
θγdk−1dk . (4)
1http://research.microsoft.com/en-us/um/people/minka/papers/aspect/
1
3
For each word token in document d, we remove its contribution to∏w tw(θd)
ndw =∏w
(sw∏k θ
βwk
dk
)ndw
.
When the token is a token of word w, this corresponds to a division by sw∏k θ
βwk
dk .Note that we here consider a removal of word token, not a removal of word type.Then we obtain an ‘old’ posterior after this division as follows:
q\w(θd) =Γ(∑
k(γdk − βwk))∏
k Γ(γdk − βwk)
∏k
θγdk−βwk−1dk =
Γ(∑k γ\wdk )∏
k Γ(γ\wdk )
∏k
θγ\wdk −1
dk , (5)
where γ\wdk = γdk − βwk.
By combining p(w|θd) =∑k θdkΦkw with q\w(θd), we obtain a something similar to joint distribution
as follows:
(∑k
θdkφkw) ·Γ(∑k γ\wdk )∏
k Γ(γ\wdk )
∏k
θγ\wdk −1
dk .
Therefore, by integrating θd out, we obtain a something similar to evidence as follows:∫ ∑k
θdkφkwΓ(∑k γ\wdk )∏
k′ Γ(γ\wdk′)
∏k′
θγ\wdk′−1
dk′ dθd =
∑k φkwγ
\wdk∑
k γ\wdk
.
Consequently, we obtain a new posterior as follows:
q̃(θd) =
∑k γ\wdk∑
k φkwγ\wdk
(∑k
θdkφkw
)Γ(∑k γ\wdk )∏
k Γ(γ\wdk )
∏k
θγ\wdk −1
dk . (6)
4
Based on Eq. (6), we calculate Eq̃[θdk] and Eq̃[θ2dk].
Eq̃[θdk] =
∫θdkq̃(θd)dθdk
=
∫θdk
γ̂d\w∑
k φkwγ\wdk
(∑k
θdkφkw
) Γ(γ̂d\w)∏
k Γ(γ\wdk )
∏k
θγ\wdk −1
dk dθdk
=γ̂d\w∑
k φkwγ\wdk
{φkw
γ\wdk (γ
\wdk + 1)
γ̂d\w(γ̂d
\w + 1)+∑k′ 6=k
φk′wγ\wdk γ
\wdk′
γ̂d\w(γ̂d
\w + 1)
}
=γ̂d\w∑
k φkwγ\wdk
·γ\wdk
γ̂d\w ·
φkw +∑k φkwγ
\wdk
γ̂d\w + 1
(7)
Eq̃[θ2dk] =
∫θ2dkq̃(θd)dθdk
=
∫θ2dk
γ̂d\w∑
k φkwγ\wdk
(∑k
θdkφkw
) Γ(γ̂d\w)∏
k Γ(γ\wdk )
∏k
θγ\wdk −1
dk dθdk
=γ̂d\w∑
k φkwγ\wdk
{φkw
γ\wdk (γ
\wdk + 1)(γ
\wdk + 2)
γ̂d\w(γ̂d
\w + 1)(γ̂d\w + 2)
+∑k′ 6=k
φk′wγ\wdk (γ
\wdk + 1)γ
\wdk′
γ̂d\w(γ̂d
\w + 1)(γ̂d\w + 2)
}
=γ̂d\w∑
k φkwγ\wdk
·γ\wdk
γ̂d\w ·
γ\wdk + 1
γ̂d\w + 1
·2φkw +
∑k φkwγ
\wdk
γ̂d\w + 2
, (8)
where γ̂d\w =
∑k γ\wdk .
2
We determine a Dirichlet distribution Dir(γ′d) so that the meanγ′dk
γ̂′d
and the average second moment2
1K
∑kγ′dk(γ′
dk+1)γ̂′d(γ̂′
d+1) are equal to those of q̃(θd), where γ̂′d =∑k γ′dk. This procedure is called moment
matching.To be precise, we obtain γ′d by solving the following equations:
γ′dkγ̂′d
= Eq̃[θdk] for each k, and (9)
1
K
∑k
γ′dk(γ′dk + 1)
γ̂′d(γ̂′d + 1)
=1
K
∑k
Eq̃[θ2dk] . (10)
We can obtain γ′dk as below.
γ′dk = Eq̃[θdk]γ̂′d∑k
Eq̃[θdk]γ̂′d(Eq̃[θdk]γ̂′d + 1)
γ̂′d(γ̂′d + 1)
=∑k
Eq̃[θ2dk]∑
k
Eq̃[θdk](Eq̃[θdk]γ̂′d + 1) =∑k
Eq̃[θ2dk](γ̂′d + 1)∑
k
Eq̃[θdk]2γ̂′d +∑k
Eq̃[θdk] =∑k
Eq̃[θ2dk]γ̂′d +
∑k
Eq̃[θ2dk]∑
k
(Eq̃[θdk]2 − Eq̃[θ2
dk])γ̂′d =
∑k
(Eq̃[θ
2dk]− Eq̃[θdk]
)∴ γ̂′d =
∑k(Eq̃[θ
2dk]− Eq̃[θdk])∑
k(Eq̃[θdk]2 − Eq̃[θ2dk])
(11)
∴ γ′dk =
∑k(Eq̃[θ
2dk]− Eq̃[θdk])∑
k(Eq̃[θdk]2 − Eq̃[θ2dk])· Eq̃[θdk] (12)
5
We now have a Dirichlet distribution that approxiates the posterior q̃(θd) in Eq. (6), i.e.,
q̃(θd) =
∑k γ\wdk∑
k φkwγ\wdk
(∑k
θdkφkw
)Γ(∑k γ\wdk )∏
k Γ(γ\wdk )
∏k
θγ\wdk −1
dk .
To obtain a density function of Dirichlet distribution from Eq. (6), we replace the term (∑k θdkφkw)
by t′w(θd) = s′w∏k θ
β′wk
dk . We set the resulting function equal to the density function of Dir(γ′d) and obtainthe following equation:∑
k γ\wdk∑
k φkwγ\wdk
s′w∏k
θβ′wk
dk
Γ(∑k γ\wdk )∏
k Γ(γ\wdk )
∏k
θγ\wdk −1
dk =Γ(∑k γ′dk)∏
k Γ(γ′dk)
∏k
θγ′dk−1dk . (13)
Consequently, s′w and β′wk are determined as follows:
β′wk = γ′dk − γ\wdk , (14)
s′w =
∑k φkwγ
\wdk∑
k γ\wdk
Γ(∑k γ′dk)∏
k Γ(γ′dk)
∏k Γ(γ
\wdk )
Γ(∑k γ\wdk )
. (15)
2http://www.ucl.ac.uk/statistics/research/pdfs/135.zip
3