Applying Dynamic Language Models for Streaming Text to LDA
Tomonari MASADA @ Nagasaki University
August 27, 2014
The evidence is given as follows:
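\[
p(w \mid \alpha, \phi, \gamma, \delta)
= \int \sum_{z} \prod_{d=1}^{D} p(\theta_d \mid \gamma)
\prod_{t=1}^{T} \prod_{k=1}^{K} p(\beta_{tk} \mid \beta_{1:t-1,k}, \alpha_k, \delta_{1:t-1}, \phi)
\prod_{d=1}^{D} p(z_d \mid \theta_d)\, p(w_d \mid z_d, \beta_{t_d})\, d\theta\, d\beta .
\tag{1}
\]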
A lower bound of the log of the evidence is obtained as follows based on Jensen’s inequality:
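\[
\begin{aligned}
\ln p(w \mid \alpha, \phi, \gamma, \delta)
\geq{} & \sum_d \Big\{ \int q(\theta_d) \ln p(\theta_d \mid \gamma_d)\, d\theta_d - \int q(\theta_d) \ln q(\theta_d)\, d\theta_d \Big\} \\
& + \sum_t \sum_k \sum_v \int q(\beta_{kv}) \ln p(\beta_{tkv} \mid \beta_{1:t-1,kv}, \alpha_k, \delta, \phi_v)\, d\beta_{kv}
  - \sum_t \sum_k \sum_v \int q(\beta_{tkv}) \ln q(\beta_{tkv})\, d\beta_{tkv} \\
& + \sum_d \int \sum_{z_d} q(\beta_{t_d})\, q(z_d) \ln p(w_d \mid z_d, \beta_{t_d})\, d\beta_{t_d}
  + \sum_d \int \sum_{z_d} q(\theta_d)\, q(z_d) \ln p(z_d \mid \theta_d)\, d\theta_d \\
& - \sum_d \sum_{z_d} q(z_d) \ln q(z_d) .
\end{aligned}
\tag{2}
\]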
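Let this lower bound be denoted by $L$. We assume that
\[
q(\theta_d) = \frac{\Gamma(\sum_k \eta_{dk})}{\prod_k \Gamma(\eta_{dk})} \prod_k \theta_{dk}^{\eta_{dk}-1} .
\]
Then the first term of $L$ can be rewritten as follows:
\[
\begin{aligned}
& \int q(\theta_d) \ln p(\theta_d \mid \gamma_d)\, d\theta_d - \int q(\theta_d) \ln q(\theta_d)\, d\theta_d \\
& = \ln \Gamma\Big(\sum_k \gamma_k\Big) - \sum_k \ln \Gamma(\gamma_k) - \ln \Gamma\Big(\sum_k \eta_{dk}\Big) + \sum_k \ln \Gamma(\eta_{dk})
  + \sum_k (\gamma_k - \eta_{dk}) \Big\{ \psi(\eta_{dk}) - \psi\Big(\sum_{k'} \eta_{dk'}\Big) \Big\} .
\end{aligned}
\tag{3}
\]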
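As a rough numerical illustration of Eq. (3), the following Python sketch evaluates its right-hand side for one document. The function names `digamma` and `dirichlet_term` are illustrative and not from the original note; the digamma function is approximated here with the standard recurrence plus asymptotic series, which is an implementation choice of this sketch.

```python
import math

def digamma(x: float) -> float:
    """Approximate psi(x) for x > 0 (recurrence plus asymptotic series)."""
    result = 0.0
    # Shift x upward using psi(x) = psi(x + 1) - 1/x until the series is accurate.
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    # psi(x) ~ ln x - 1/(2x) - 1/(12x^2) + 1/(120x^4) - 1/(252x^6)
    result += math.log(x) - 0.5 / x - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return result

def dirichlet_term(gamma, eta):
    """Right-hand side of Eq. (3) for one document.

    gamma: prior Dirichlet parameters gamma_k
    eta:   variational Dirichlet parameters eta_dk
    """
    psi_sum = digamma(sum(eta))
    value = math.lgamma(sum(gamma)) - sum(math.lgamma(g) for g in gamma)
    value += -math.lgamma(sum(eta)) + sum(math.lgamma(e) for e in eta)
    value += sum((g - e) * (digamma(e) - psi_sum) for g, e in zip(gamma, eta))
    return value
```

The quantity equals the negative Kullback-Leibler divergence from $q(\theta_d)$ to the prior, so it is zero when `eta` equals `gamma` and negative otherwise.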
Define
\[
C_{k:t,s} \equiv \frac{\exp(\alpha_k^\top f(x_t, x_s))}{\sum_{s'=t-c}^{t-1} \exp(\alpha_k^\top f(x_t, x_{s'}))} .
\]
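The weights $C_{k:t,s}$ are a softmax over the $c$ previous time steps. A minimal Python sketch is given below, assuming the feature vectors $f(x_t, x_s)$ are already available as plain lists; the function name `transition_weights` and the input layout are hypothetical.

```python
import math

def transition_weights(alpha_k, features):
    """Return C_{k:t,s} for the past steps s = t-c, ..., t-1.

    alpha_k:  weight vector alpha_k (list of floats)
    features: one feature vector f(x_t, x_s) per past step s
    """
    scores = [sum(a * f for a, f in zip(alpha_k, feat)) for feat in features]
    top = max(scores)  # subtract the maximum before exponentiating, for stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]
```

With `alpha_k` set to zero the weights are uniform over the window, and they always sum to one.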
We assume that
\[
q(\beta_{tkv}) = \frac{1}{\sqrt{2\pi\sigma_{tkv}}} \exp\Big\{ -\frac{(\beta_{tkv} - \mu_{tkv})^2}{2\sigma_{tkv}} \Big\} .
\]
Then the second and the third terms of $L$ can be rewritten as follows:
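\[
\begin{aligned}
& \sum_{t=1}^{T} \int q(\beta_{kv}) \ln p(\beta_{tkv} \mid \beta_{1:t-1,kv}, \alpha_k, \delta, \phi_v)\, d\beta_{kv}
  - \sum_{t=1}^{T} \int q(\beta_{tkv}) \ln q(\beta_{tkv})\, d\beta_{tkv} \\
& = \sum_{t=1}^{T} \int q(\beta_{kv}) \ln \Big[ \frac{1}{\sqrt{2\pi\phi_v}} \exp\Big\{ -\frac{(\beta_{tkv} - \sum_{s=t-c}^{t-1} C_{k:t,s}\beta_{skv})^2}{2\phi_v} \Big\} \Big]\, d\beta_{kv}
  + \frac{T}{2} + \frac{T}{2}\ln(2\pi) + \frac{1}{2}\sum_{t=1}^{T} \ln \sigma_{tkv} \\
& = \frac{1}{2}\sum_{t=1}^{T} \ln \frac{\sigma_{tkv}}{\phi_v}
  - \frac{1}{2\phi_v} \sum_{t=1}^{T} \int q(\beta_{kv}) \Big( \beta_{tkv} - \sum_{s=t-c}^{t-1} C_{k:t,s}\beta_{skv} \Big)^2 d\beta_{kv} + \mathrm{const.} \\
& = \frac{1}{2}\sum_{t=1}^{T} \ln \frac{\sigma_{tkv}}{\phi_v}
  - \sum_{t=1}^{T} \frac{\big( \mu_{tkv} - \sum_{s=t-c}^{t-1} C_{k:t,s}\mu_{skv} \big)^2}{2\phi_v}
  - \sum_{t=1}^{T} \frac{\sigma_{tkv} + \sum_{s=t-c}^{t-1} C_{k:t,s}^2 \sigma_{skv}}{2\phi_v} + \mathrm{const.} ,
\end{aligned}
\tag{4}
\]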
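where the last rewrite can be obtained based on the following equation:
\[
\int q(\beta_{tkv})\, \beta_{tkv}^2\, d\beta_{tkv}
= \int q(\beta_{tkv}) \big\{ (\beta_{tkv} - \mu_{tkv})^2 + 2\beta_{tkv}\mu_{tkv} - \mu_{tkv}^2 \big\}\, d\beta_{tkv}
= \sigma_{tkv} + \mu_{tkv}^2 .
\tag{5}
\]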
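The last line of Eq. (4) rests on the expected squared residual under $q$. The sketch below computes that expectation for a single $(t, k, v)$, assuming, as the final form of Eq. (4) implies, that $q$ factorizes over time steps so that cross-covariances vanish. The function name and argument layout are illustrative only.

```python
def expected_sq_residual(mu_t, sigma_t, mu_past, sigma_past, weights):
    """E_q[(beta_t - sum_s C_s beta_s)^2] for independent Gaussian factors.

    mu_t, sigma_t:        mean and variance of beta_tkv
    mu_past, sigma_past:  means and variances of beta_skv for s = t-c, ..., t-1
    weights:              the matching C_{k:t,s}
    """
    # Squared mean of the residual.
    mean_resid = mu_t - sum(c * m for c, m in zip(weights, mu_past))
    # Variance of the residual: weights enter squared, as in Eq. (4).
    var_resid = sigma_t + sum(c * c * s for c, s in zip(weights, sigma_past))
    return mean_resid ** 2 + var_resid
```

For example, with weights `[0.5, 0.5]`, past means `[1.0, 3.0]`, past variances `[0.2, 0.4]`, and a current factor with mean 2.5 and variance 0.1, the mean part is 0.25 and the variance part is 0.25.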
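We denote the posterior probability that the word $v$ is assigned to topic $k$ in document $d$ as $\iota_{dvk}$, where $\sum_{k=1}^{K} \iota_{dvk} = 1$ holds. Then the fourth term of $L$ can be rewritten as follows:
\[
\begin{aligned}
& \int \sum_{z_d} q(\beta_{t_d})\, q(z_d) \ln p(w_d \mid z_d, \beta_{t_d})\, d\beta_{t_d} \\
& = \int q(\beta_{t_d}) \sum_{i=1}^{n_d} \sum_{k=1}^{K} \iota_{d w_{di} k}
    \ln \Big\{ \frac{\exp(a_{1:t_d-1,k w_{di}} + \beta_{t_d k w_{di}})}{\sum_{v=1}^{V} \exp(a_{1:t_d-1,kv} + \beta_{t_d kv})} \Big\}\, d\beta_{t_d} \\
& = \int q(\beta_{t_d}) \sum_{v=1}^{V} n_{dv} \sum_{k=1}^{K} \iota_{dvk}
    \ln \Big\{ \frac{\exp(a_{1:t_d-1,kv} + \beta_{t_d kv})}{\sum_{v'=1}^{V} \exp(a_{1:t_d-1,kv'} + \beta_{t_d kv'})} \Big\}\, d\beta_{t_d} \\
& = \int q(\beta_{t_d}) \sum_{v=1}^{V} n_{dv} \sum_{k=1}^{K} \iota_{dvk} (a_{1:t_d-1,kv} + \beta_{t_d kv})\, d\beta_{t_d} \\
& \quad - \int q(\beta_{t_d}) \sum_{v=1}^{V} n_{dv} \sum_{k=1}^{K} \iota_{dvk}
    \ln \Big\{ \sum_{v'=1}^{V} \exp(a_{1:t_d-1,kv'} + \beta_{t_d kv'}) \Big\}\, d\beta_{t_d} .
\end{aligned}
\tag{6}
\]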
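The first term of the RHS of Eq. (6) can be rewritten as follows:
\[
\int q(\beta_{t_d}) \sum_{v=1}^{V} n_{dv} \sum_{k=1}^{K} \iota_{dvk} (a_{1:t_d-1,kv} + \beta_{t_d kv})\, d\beta_{t_d}
= \sum_{v=1}^{V} n_{dv} \sum_{k=1}^{K} \iota_{dvk}\, a_{1:t_d-1,kv}
+ \sum_{v=1}^{V} n_{dv} \sum_{k=1}^{K} \iota_{dvk}\, \mu_{t_d kv} .
\tag{7}
\]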
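We can obtain an upper bound of the second term of the RHS of Eq. (6) by using the inequality $\ln(x) \leq -1 + x/\zeta + \ln(\zeta)$ as follows:
\[
\begin{aligned}
& \int q(\beta_{t_d}) \sum_{v=1}^{V} n_{dv} \sum_{k=1}^{K} \iota_{dvk}
    \ln \Big\{ \sum_{v'=1}^{V} \exp(a_{1:t_d-1,kv'} + \beta_{t_d kv'}) \Big\}\, d\beta_{t_d} \\
& = \sum_{k=1}^{K} \Big[ \Big( \sum_v n_{dv}\iota_{dvk} \Big) \int q(\beta_{t_d k})
    \ln \Big\{ \sum_{v'=1}^{V} \exp(a_{1:t_d-1,kv'} + \beta_{t_d kv'}) \Big\}\, d\beta_{t_d k} \Big] \\
& \leq \sum_{k=1}^{K} \Big[ \Big( \sum_v n_{dv}\iota_{dvk} \Big) \Big\{ -1 + \ln(\zeta_{t_d k})
    + \frac{1}{\zeta_{t_d k}} \sum_{v'=1}^{V} \int q(\beta_{t_d k}) \exp(a_{1:t_d-1,kv'} + \beta_{t_d kv'})\, d\beta_{t_d k} \Big\} \Big] \\
& = -\sum_{v=1}^{V} n_{dv} + \sum_{k=1}^{K} \Big[ \Big( \sum_v n_{dv}\iota_{dvk} \Big) \Big\{ \ln(\zeta_{t_d k})
    + \frac{1}{\zeta_{t_d k}} \sum_{v} \exp(a_{1:t_d-1,kv}) \exp\Big( \mu_{t_d kv} + \frac{\sigma_{t_d kv}}{2} \Big) \Big\} \Big] ,
\end{aligned}
\tag{8}
\]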
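where
\[
\begin{aligned}
\int q(\beta_{tkv}) \exp(\beta_{tkv})\, d\beta_{tkv}
& = \frac{1}{\sqrt{2\pi\sigma_{tkv}}} \int \exp\Big\{ -\frac{(\beta_{tkv} - \mu_{tkv})^2}{2\sigma_{tkv}} \Big\} \exp(\beta_{tkv})\, d\beta_{tkv} \\
& = \frac{1}{\sqrt{2\pi\sigma_{tkv}}} \int \exp\Big\{ -\frac{(\beta_{tkv} - \mu_{tkv})^2 - 2\sigma_{tkv}\beta_{tkv}}{2\sigma_{tkv}} \Big\}\, d\beta_{tkv} \\
& = \frac{1}{\sqrt{2\pi\sigma_{tkv}}} \int \exp\Big\{ \frac{-(\beta_{tkv} - \mu_{tkv} - \sigma_{tkv})^2 + 2\mu_{tkv}\sigma_{tkv} + \sigma_{tkv}^2}{2\sigma_{tkv}} \Big\}\, d\beta_{tkv} \\
& = \exp\Big( \frac{2\mu_{tkv}\sigma_{tkv} + \sigma_{tkv}^2}{2\sigma_{tkv}} \Big) \times \frac{1}{\sqrt{2\pi\sigma_{tkv}}} \int \exp\Big\{ -\frac{(\beta_{tkv} - \mu_{tkv} - \sigma_{tkv})^2}{2\sigma_{tkv}} \Big\}\, d\beta_{tkv} \\
& = \exp\Big( \mu_{tkv} + \frac{\sigma_{tkv}}{2} \Big) .
\end{aligned}
\tag{9}
\]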
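The closed form in Eq. (9) can be checked numerically. The sketch below compares $\exp(\mu + \sigma/2)$ with a trapezoid-rule approximation of the integral, where $\sigma$ is the variance of the Gaussian as in the definition of $q(\beta_{tkv})$. The function names, integration range, and step count are arbitrary choices of this sketch.

```python
import math

def expected_exp_closed_form(mu, sigma):
    """Closed form of Eq. (9): E_q[exp(beta)] for mean mu and variance sigma."""
    return math.exp(mu + sigma / 2.0)

def expected_exp_numeric(mu, sigma, half_width=12.0, steps=20000):
    """Trapezoid-rule approximation of the integral of q(beta) * exp(beta)."""
    sd = math.sqrt(sigma)
    lo, hi = mu - half_width * sd, mu + half_width * sd
    h = (hi - lo) / steps

    def integrand(b):
        # Gaussian density times exp(b), combined in one exponent.
        return math.exp(b - (b - mu) ** 2 / (2.0 * sigma)) / math.sqrt(2.0 * math.pi * sigma)

    total = 0.5 * (integrand(lo) + integrand(hi))
    for i in range(1, steps):
        total += integrand(lo + i * h)
    return total * h
```

The two functions should agree closely for moderate values such as `mu = 0.3`, `sigma = 0.8`.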
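The fifth term of $L$ can be rewritten as follows:
\[
\begin{aligned}
\int \sum_{z_d} q(\theta_d)\, q(z_d) \ln p(z_d \mid \theta_d)\, d\theta_d
& = \int q(\theta_d) \sum_{i=1}^{n_d} \sum_{k=1}^{K} \iota_{d w_{di} k} \ln \theta_{dk}\, d\theta_d
  = \int q(\theta_d) \sum_{v=1}^{V} n_{dv} \sum_{k=1}^{K} \iota_{dvk} \ln \theta_{dk}\, d\theta_d \\
& = \sum_k \int q(\theta_{dk}) \Big( \sum_v n_{dv}\iota_{dvk} \Big) \ln \theta_{dk}\, d\theta_{dk}
  = \sum_k \Big( \sum_v n_{dv}\iota_{dvk} \Big) \Big\{ \psi(\gamma_{dk}) - \psi\Big( \sum_{k'} \gamma_{dk'} \Big) \Big\} .
\end{aligned}
\tag{10}
\]
Since the expectation is taken under $q(\theta_d)$, $\gamma_{dk}$ here denotes the parameter of $q(\theta_d)$, which is written $\eta_{dk}$ in Eq. (3).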
And the sixth term of $L$ can be rewritten as follows:
\[
\sum_{z_d} q(z_d) \ln q(z_d) = \sum_{k=1}^{K} \sum_{v=1}^{V} \iota_{dvk} \ln \iota_{dvk} .
\tag{11}
\]
Consequently, we obtain a lower bound of L as follows:
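\[
\begin{aligned}
L \geq{} & \sum_{d=1}^{N} \Big[ \ln \Gamma\Big(\sum_k \gamma_k\Big) - \sum_{k=1}^{K} \ln \Gamma(\gamma_k) - \ln \Gamma\Big(\sum_k \eta_{dk}\Big) + \sum_{k=1}^{K} \ln \Gamma(\eta_{dk})
  + \sum_{k=1}^{K} (\gamma_k - \eta_{dk}) \Big\{ \psi(\eta_{dk}) - \psi\Big(\sum_{k'} \eta_{dk'}\Big) \Big\} \Big] \\
& + \frac{1}{2} \sum_{k=1}^{K} \sum_{v=1}^{V} \sum_{t=1}^{T} \ln \frac{\sigma_{tkv}}{\phi_v}
  - \sum_{k=1}^{K} \sum_{v=1}^{V} \sum_{t=1}^{T} \frac{\big( \mu_{tkv} - \sum_{s=t-c}^{t-1} C_{k:t,s}\mu_{skv} \big)^2}{2\phi_v}
  - \sum_{k=1}^{K} \sum_{v=1}^{V} \sum_{t=1}^{T} \frac{\sigma_{tkv} + \sum_{s=t-c}^{t-1} C_{k:t,s}^2 \sigma_{skv}}{2\phi_v} \\
& + \sum_{d=1}^{N} \sum_{v=1}^{V} n_{dv} \sum_{k=1}^{K} \iota_{dvk}\, a_{1:t_d-1,kv}
  + \sum_{d=1}^{N} \sum_{v=1}^{V} n_{dv} \sum_{k=1}^{K} \iota_{dvk}\, \mu_{t_d kv} \\
& - \sum_{d=1}^{N} \sum_{k=1}^{K} \Big[ \Big( \sum_{v=1}^{V} n_{dv}\iota_{dvk} \Big) \Big\{ \ln(\zeta_{t_d k})
  + \frac{1}{\zeta_{t_d k}} \sum_{v=1}^{V} \exp(a_{1:t_d-1,kv}) \exp\Big( \mu_{t_d kv} + \frac{\sigma_{t_d kv}}{2} \Big) \Big\} \Big] \\
& + \sum_{d=1}^{N} \sum_{k=1}^{K} \Big( \sum_{v=1}^{V} n_{dv}\iota_{dvk} \Big) \Big\{ \psi(\gamma_{dk}) - \psi\Big(\sum_{k'} \gamma_{dk'}\Big) \Big\}
  - \sum_{d=1}^{N} \sum_{k=1}^{K} \sum_{v=1}^{V} \iota_{dvk} \ln \iota_{dvk} + \mathrm{const.}
\end{aligned}
\tag{12}
\]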
From here on, $L$ denotes this new lower bound, that is, the RHS of Eq. (12).
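\[
\frac{\partial L}{\partial \zeta_{tk}}
= -\frac{1}{\zeta_{tk}} \sum_{\{d:t_d=t\}} \Big( \sum_{v=1}^{V} n_{dv}\iota_{dvk} \Big)
+ \frac{1}{\zeta_{tk}^2} \Big\{ \sum_{\{d:t_d=t\}} \Big( \sum_{v=1}^{V} n_{dv}\iota_{dvk} \Big) \Big\}
  \sum_{v=1}^{V} \exp(a_{1:t-1,kv}) \exp\Big( \mu_{tkv} + \frac{\sigma_{tkv}}{2} \Big) .
\tag{13}
\]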
Setting $\partial L / \partial \zeta_{tk} = 0$ gives the following update:
\[
\zeta_{tk} = \sum_{v=1}^{V} \exp(a_{1:t-1,kv}) \exp\Big( \mu_{tkv} + \frac{\sigma_{tkv}}{2} \Big) .
\]
This can be used in the formulas presented below.
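The update for $\zeta_{tk}$ is the value at which the bound $\ln(x) \leq -1 + x/\zeta + \ln(\zeta)$ becomes tight. A small Python sketch follows, assuming the per-word values $a_{1:t-1,kv}$, $\mu_{tkv}$, and $\sigma_{tkv}$ for a fixed $(t, k)$ are given as lists; the function names are illustrative.

```python
import math

def zeta_update(a_k, mu_k, sigma_k):
    """zeta_tk = sum_v exp(a_{1:t-1,kv}) * exp(mu_tkv + sigma_tkv / 2)."""
    return sum(math.exp(a + m + s / 2.0) for a, m, s in zip(a_k, mu_k, sigma_k))

def log_upper_bound(x, zeta):
    """Right-hand side of ln(x) <= -1 + x / zeta + ln(zeta)."""
    return -1.0 + x / zeta + math.log(zeta)
```

Evaluating `log_upper_bound(x, zeta)` at `zeta = x` returns `ln(x)` exactly, while any other positive `zeta` gives a strictly larger value, which is why the update removes the slack introduced in Eq. (8).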
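\[
\begin{aligned}
\frac{\partial L}{\partial \iota_{dvk}}
={} & n_{dv}\, a_{1:t_d-1,kv} + n_{dv}\, \mu_{t_d kv}
  - n_{dv} \Big\{ \ln(\zeta_{t_d k}) + \frac{1}{\zeta_{t_d k}} \sum_{v'=1}^{V} \exp(a_{1:t_d-1,kv'}) \exp\Big( \mu_{t_d kv'} + \frac{\sigma_{t_d kv'}}{2} \Big) \Big\} \\
& + n_{dv} \Big\{ \psi(\gamma_{dk}) - \psi\Big(\sum_{k'} \gamma_{dk'}\Big) \Big\} - \ln \iota_{dvk} + \mathrm{const.}
\end{aligned}
\tag{14}
\]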
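Therefore,
\[
\begin{aligned}
\iota_{dvk}
& \propto \frac{1}{\zeta_{t_d k}} \exp\Big\{ a_{1:t_d-1,kv} + \mu_{t_d kv} + \psi(\gamma_{dk}) - \psi\Big(\sum_{k'} \gamma_{dk'}\Big)
  - \frac{1}{\zeta_{t_d k}} \sum_{v'=1}^{V} \exp(a_{1:t_d-1,kv'}) \exp\Big( \mu_{t_d kv'} + \frac{\sigma_{t_d kv'}}{2} \Big) \Big\} \\
& \propto \frac{1}{\zeta_{t_d k}} \exp(a_{1:t_d-1,kv} + \mu_{t_d kv}) \exp\Big\{ \psi(\gamma_{dk}) - \psi\Big(\sum_{k'} \gamma_{dk'}\Big) \Big\} .
\end{aligned}
\tag{15}
\]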
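A minimal sketch of the update in Eq. (15) for one word $v$ in one document $d$ is shown below. It assumes the digamma difference $\psi(\gamma_{dk}) - \psi(\sum_{k'} \gamma_{dk'})$ has been computed elsewhere and is passed in as `elog_theta`; that argument name, the function name, and the list layout are all hypothetical. Normalization is done in log space to avoid overflow.

```python
import math

def update_iota(a_v, mu_v, zeta, elog_theta):
    """Return iota_dvk for k = 1..K, normalized over topics.

    a_v[k]:        a_{1:t_d-1,kv}
    mu_v[k]:       mu_{t_d kv}
    zeta[k]:       zeta_{t_d k} (positive)
    elog_theta[k]: psi(gamma_dk) - psi(sum_k' gamma_dk')
    """
    log_w = [a + m - math.log(z) + e
             for a, m, z, e in zip(a_v, mu_v, zeta, elog_theta)]
    top = max(log_w)  # shift by the maximum before exponentiating
    w = [math.exp(x - top) for x in log_w]
    total = sum(w)
    return [x / total for x in w]
```

The returned values satisfy the constraint $\sum_k \iota_{dvk} = 1$ stated before Eq. (6).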
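\[
\begin{aligned}
\frac{\partial L}{\partial \mu_{tkv}}
={} & -\frac{1}{\phi_v} \Big\{ \mu_{tkv} - \sum_{s=t-c}^{t-1} C_{k:t,s}\mu_{skv}
  + \sum_{u=t+1}^{\min(T,t+c)} \Big( -C_{k:u,t}\mu_{ukv} + C_{k:u,t}^2 \mu_{tkv} + \sum_{s=u-c}^{t-1} C_{k:u,t} C_{k:u,s} \mu_{skv} \Big) \Big\} \\
& + \sum_{\{d:t_d=t\}} n_{dv}\iota_{dvk}
  - \frac{1}{\zeta_{tk}} \Big( \sum_{\{d:t_d=t\}} \sum_{v=1}^{V} n_{dv}\iota_{dvk} \Big) \exp\big( a_{1:t-1,kv} + \mu_{tkv} + \sigma_{tkv}/2 \big) .
\end{aligned}
\tag{16}
\]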
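However, when we update $\mu_{tkv}$ only at timestep $t$,
\[
\frac{\partial L}{\partial \mu_{tkv}}
= -\frac{1}{\phi_v} \Big( \mu_{tkv} - \sum_{s=t-c}^{t-1} C_{k:t,s}\mu_{skv} \Big)
+ \sum_{\{d:t_d=t\}} n_{dv}\iota_{dvk}
- \frac{1}{\zeta_{tk}} \Big( \sum_{\{d:t_d=t\}} \sum_{v=1}^{V} n_{dv}\iota_{dvk} \Big) \exp\big( a_{1:t-1,kv} + \mu_{tkv} + \sigma_{tkv}/2 \big) ,
\tag{17}
\]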
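where we can regard $\mu_{skv}$ for $s < t$ as a constant, because $\mu_{skv}$ is not updated at any timestep $t > s$. Setting $\partial L / \partial \mu_{tkv} = 0$ gives the following equation:
\[
0 = \mu_{tkv} + \frac{n_{tk}\, \phi_v \exp(a_{1:t-1,kv} + \sigma_{tkv}/2)}{\zeta_{tk}} \exp(\mu_{tkv})
- \sum_{s=t-c}^{t-1} C_{k:t,s}\mu_{skv} - \phi_v n_{tvk} ,
\tag{18}
\]
where $n_{tvk} \equiv \sum_{\{d:t_d=t\}} n_{dv}\iota_{dvk}$ and $n_{tk} \equiv \sum_v n_{tvk}$. The RHS of this equation has the form $f(x) = x + Ae^x - B$ with $x = \mu_{tkv}$, $A = n_{tk}\phi_v \exp(a_{1:t-1,kv} + \sigma_{tkv}/2)/\zeta_{tk} > 0$, and $B = \sum_{s=t-c}^{t-1} C_{k:t,s}\mu_{skv} + \phi_v n_{tvk}$. Since $f'(x) = 1 + Ae^x > 0$, $f$ is strictly increasing and $f(x) = 0$ has a unique root, which can be found by the bisection method. For example, $x = B - A$ can be used as an initial endpoint when $B > A$, because $f(B - A) = A(e^{B-A} - 1) > 0$ in that case.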
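The root of $f(x) = x + Ae^x - B$ can be located with a few lines of Python. The sketch below is one possible bracketing strategy and not necessarily the one used by the author: it takes $x = B$ as the upper endpoint, because $f(B) = Ae^B > 0$ for any $A > 0$, and walks left with a doubling step until the sign changes. The function name `solve_mu` and the tolerance are arbitrary, and very large $B$ would overflow `math.exp`.

```python
import math

def solve_mu(A, B, tol=1e-10, max_iter=200):
    """Solve x + A*exp(x) - B = 0 by bisection, assuming A > 0."""
    def f(x):
        return x + A * math.exp(x) - B

    hi = B                 # f(B) = A*exp(B) > 0, so the root lies below B
    step = 1.0
    lo = hi - step
    while f(lo) > 0.0:     # walk left until f changes sign
        step *= 2.0
        lo = hi - step

    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if f(mid) > 0.0:
            hi = mid
        else:
            lo = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)
```

Because $f$ is strictly increasing, the bracket always contains exactly one root, so the loop cannot converge to a spurious solution.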
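\[
\frac{\partial L}{\partial \sigma_{tkv}}
= \frac{1}{2\sigma_{tkv}}
- \frac{1 + \sum_{u=t+1}^{\min(T,t+c)} C_{k:u,t}^2}{2\phi_v}
- \frac{n_{tk}}{2\zeta_{tk}} \exp\big( a_{1:t-1,kv} + \mu_{tkv} + \sigma_{tkv}/2 \big) ,
\tag{19}
\]
where the weights enter squared, as in the variance term of Eq. (12).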
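However, when we update $\sigma_{tkv}$ only at timestep $t$,
\[
\frac{\partial L}{\partial \sigma_{tkv}}
= \frac{1}{2\sigma_{tkv}} - \frac{1}{2\phi_v}
- \frac{n_{tk}}{2\zeta_{tk}} \exp\big( a_{1:t-1,kv} + \mu_{tkv} + \sigma_{tkv}/2 \big) .
\tag{20}
\]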
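The RHS has the form $f(x) = \frac{1}{2x} - Ae^{x/2} - B$ with $x = \sigma_{tkv}$, $A = \frac{n_{tk}}{2\zeta_{tk}} \exp(a_{1:t-1,kv} + \mu_{tkv}) > 0$, and $B = \frac{1}{2\phi_v} > 0$. Its derivative is $f'(x) = -\frac{1}{2x^2} - \frac{A}{2} e^{x/2} < 0$. Since $f(x) \to +\infty$ as $x \to 0^+$ and $f(x) \to -\infty$ as $x \to \infty$, $f(x) = 0$ has a unique positive root, which can be found by the bisection method. For example, $x = \frac{1}{2B}$ can be used as an initial endpoint, since $f\big(\frac{1}{2B}\big) = -Ae^{1/(4B)} < 0$.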
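A bisection sketch for the variance update is given below. It assumes $A > 0$ and $B > 0$ and uses the interval $(0, 1/(2B))$: the function tends to $+\infty$ at the left end, and at $x = 1/(2B)$ the first and last terms cancel, leaving $-Ae^{1/(4B)} < 0$. The function name `solve_sigma` and the tolerance are choices of this sketch.

```python
import math

def solve_sigma(A, B, tol=1e-12, max_iter=200):
    """Solve 1/(2x) - A*exp(x/2) - B = 0 for x > 0 by bisection (A > 0, B > 0)."""
    def f(x):
        return 0.5 / x - A * math.exp(0.5 * x) - B

    lo, hi = 0.0, 0.5 / B   # f > 0 near 0+, f(1/(2B)) < 0
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)   # always strictly positive, so f(mid) is defined
        if f(mid) > 0.0:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)
```

With $B = 1/(2\phi_v)$ the upper endpoint equals $\phi_v$, so under these assumptions the updated variance stays below $\phi_v$.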
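\[
\frac{\partial L}{\partial \phi_v}
= -\frac{KT}{2\phi_v}
+ \sum_{k=1}^{K} \sum_{t=1}^{T} \frac{\big( \mu_{tkv} - \sum_{s=t-c}^{t-1} C_{k:t,s}\mu_{skv} \big)^2}{2\phi_v^2}
+ \sum_{k=1}^{K} \sum_{t=1}^{T} \frac{\sigma_{tkv} + \sum_{s=t-c}^{t-1} C_{k:t,s}^2 \sigma_{skv}}{2\phi_v^2} ,
\tag{21}
\]
where the weights in the variance term are squared, as in Eq. (12).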
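Setting $\partial L / \partial \phi_v = 0$ gives the following formula, with the weights in the variance term squared as in Eq. (12):
\[
\phi_v = \frac{1}{KT} \sum_{k=1}^{K} \sum_{t=1}^{T} \Big( \mu_{tkv} - \sum_{s=t-c}^{t-1} C_{k:t,s}\mu_{skv} \Big)^2
+ \frac{1}{KT} \sum_{k=1}^{K} \sum_{t=1}^{T} \Big( \sigma_{tkv} + \sum_{s=t-c}^{t-1} C_{k:t,s}^2 \sigma_{skv} \Big) .
\tag{22}
\]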
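The closed-form update for $\phi_v$ averages the expected squared residuals over topics and time steps. The Python sketch below is written for a single word $v$ and assumes a hypothetical data layout in which `weights[k][t]` is a list of `(s, C)` pairs for the past steps actually available at time `t` (empty at the first step). It squares the weights in the variance part, following the last term of Eq. (4) and Eq. (12).

```python
def update_phi_v(mu, sigma, weights):
    """Closed-form update of phi_v for one word v.

    mu[k][t], sigma[k][t]: variational mean and variance of beta_tkv
    weights[k][t]:         list of (s, C_{k:t,s}) pairs with s < t
    """
    K, T = len(mu), len(mu[0])
    total = 0.0
    for k in range(K):
        for t in range(T):
            # Predicted mean from the weighted past means.
            pred = sum(c * mu[k][s] for s, c in weights[k][t])
            # Variance contribution: own variance plus squared-weight past variances.
            var = sigma[k][t] + sum(c * c * sigma[k][s] for s, c in weights[k][t])
            total += (mu[k][t] - pred) ** 2 + var
    return total / (K * T)
```

How the window is truncated near $t = 1$ is not specified in the note, so the handling of the first steps here is an assumption.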
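\[
\begin{aligned}
\frac{\partial L}{\partial \alpha_{km}}
={} & \sum_{v=1}^{V} \frac{1}{\phi_v} \sum_{t=1}^{T} \Big( \sum_{s=t-c}^{t-1} \frac{\partial C_{k:t,s}}{\partial \alpha_{km}} \mu_{skv} \Big) \Big( \mu_{tkv} - \sum_{s=t-c}^{t-1} C_{k:t,s}\mu_{skv} \Big) \\
& - \sum_{v=1}^{V} \frac{1}{\phi_v} \sum_{t=1}^{T} \sum_{s=t-c}^{t-1} \frac{\partial C_{k:t,s}}{\partial \alpha_{km}} C_{k:t,s}\, \sigma_{skv} ,
\end{aligned}
\tag{23}
\]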
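where
\[
\begin{aligned}
\frac{\partial C_{k:t,s}}{\partial \alpha_{km}}
& = \frac{\partial}{\partial \alpha_{km}} \frac{\exp(\alpha_k^\top f(x_t, x_s))}{\sum_{s'=t-c}^{t-1} \exp(\alpha_k^\top f(x_t, x_{s'}))} \\
& = \frac{f_m(x_t, x_s) \exp(\alpha_k^\top f(x_t, x_s))}{\sum_{s'=t-c}^{t-1} \exp(\alpha_k^\top f(x_t, x_{s'}))}
  - \frac{\exp(\alpha_k^\top f(x_t, x_s)) \sum_{s'=t-c}^{t-1} f_m(x_t, x_{s'}) \exp(\alpha_k^\top f(x_t, x_{s'}))}{\big\{ \sum_{s'=t-c}^{t-1} \exp(\alpha_k^\top f(x_t, x_{s'})) \big\}^2} \\
& = C_{k:t,s}\, f_m(x_t, x_s)
  - C_{k:t,s} \frac{\sum_{s'=t-c}^{t-1} f_m(x_t, x_{s'}) \exp(\alpha_k^\top f(x_t, x_{s'}))}{\sum_{s'=t-c}^{t-1} \exp(\alpha_k^\top f(x_t, x_{s'}))} .
\end{aligned}
\tag{24}
\]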
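The last line of Eq. (24) is the usual softmax gradient: the ratio in its second term equals $\sum_{s'} C_{k:t,s'} f_m(x_t, x_{s'})$, so the derivative is $C_{k:t,s}$ times the deviation of $f_m(x_t, x_s)$ from its weighted mean. The Python sketch below implements that form for a fixed $(k, t)$; the function names and the list-based feature layout are illustrative assumptions.

```python
import math

def weights(alpha, feats):
    """Softmax weights C_s over the window, given feature vectors feats[s]."""
    scores = [sum(a * f for a, f in zip(alpha, feat)) for feat in feats]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def weight_gradient(alpha, feats, m):
    """dC_s / dalpha_m = C_s * (f_m(s) - sum_s' C_s' * f_m(s')), as in Eq. (24)."""
    C = weights(alpha, feats)
    mean_fm = sum(c * feat[m] for c, feat in zip(C, feats))
    return [c * (feat[m] - mean_fm) for c, feat in zip(C, feats)]
```

Since the weights sum to one for any $\alpha_k$, the gradients over the window sum to zero, which gives a cheap sanity check alongside a finite-difference comparison.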