Transcript of: Variational Approximation Methods for Graphical Models (slides)
.
Variational Approximation Methods for Graphical Models
Slides by Ydo Wexler
PDF created with pdfFactory Pro trial version www.pdffactory.com
.
Graphical Models – Bayes Nets
[Figure: the Asia network — Visit to Asia → Tuberculosis; Smoking → Lung Cancer, Bronchitis; Tuberculosis and Lung Cancer → Abnormality in Chest; Abnormality in Chest → X-Ray; Abnormality in Chest and Bronchitis → Dyspnea]
.
Graphical Models – Bayes Nets
[Figure: a second example network with nodes Visit to Asia, EarthQuake, Tsunami, Sun-Tan, Washed by Waves, Surfing, Dead, Missing]
P(x_1, …, x_n) = ∏_{i=1}^{n} P(x_i | pa_i)
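The chain-rule factorization above can be sketched numerically; the two-node network A → B and its CPT numbers below are invented for illustration.

```python
# Sketch of the chain rule P(x1..xn) = prod_i P(xi | pa_i)
# on a hypothetical two-node network A -> B (CPT numbers invented).
p_a = {0: 0.7, 1: 0.3}                                        # P(A)
p_b_a = {(0, 0): 0.9, (0, 1): 0.1, (1, 0): 0.4, (1, 1): 0.6}  # P(B | A)

def joint(a, b):
    # One conditional-probability factor per node, given its parents
    return p_a[a] * p_b_a[(a, b)]

# The local factors define a proper joint distribution: entries sum to 1
total = sum(joint(a, b) for a in (0, 1) for b in (0, 1))
print(total)  # → 1.0 (up to float rounding)
```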
.
Queries

There are many types of queries.
Most queries involve evidence.
An evidence e is an assignment of values to a set E of variables in the domain.

P(Dyspnea = Yes | Visit to Asia = Yes, Smoking = Yes)
P(Smoking = Yes | Dyspnea = Yes)

[Figure: the Asia network with abbreviated nodes V, S, T, L, A, B, X, D]
.
Queries

We are particularly interested in the marginal probability P(e), called the likelihood of the evidence.

Example: P(Dyspnea = Yes)

P(d) = Σ_V Σ_T Σ_S Σ_L Σ_B Σ_A Σ_X P(d, x, a, b, l, s, t, v)
     = Σ_V Σ_T Σ_S Σ_L Σ_B Σ_A Σ_X P(v) P(s) P(t|v) P(l|s) P(b|s) P(a|t,l) P(x|a) P(d|a,b)

[Figure: the Asia network with abbreviated nodes V, S, T, L, A, B, X, D]
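The summation above can be carried out by brute-force enumeration over the hidden variables. The sketch below does this on a hypothetical three-node chain S → L → D (all CPT values invented), not on the full Asia network.

```python
import itertools

# Likelihood of evidence by brute-force summation, on a hypothetical
# three-node chain S -> L -> D (all CPT numbers invented).
cpts = {
    "S": lambda a: {0: 0.7, 1: 0.3}[a["S"]],
    "L": lambda a: {(0, 0): 0.9, (0, 1): 0.1,
                    (1, 0): 0.4, (1, 1): 0.6}[(a["S"], a["L"])],
    "D": lambda a: {(0, 0): 0.8, (0, 1): 0.2,
                    (1, 0): 0.3, (1, 1): 0.7}[(a["L"], a["D"])],
}

def joint(assignment):
    # Chain rule: P(x1..xn) = prod_i P(xi | pa_i)
    p = 1.0
    for var in cpts:
        p *= cpts[var](assignment)
    return p

def likelihood(evidence):
    # P(e) = sum over all assignments to the hidden variables H = X \ E
    hidden = [v for v in cpts if v not in evidence]
    total = 0.0
    for values in itertools.product([0, 1], repeat=len(hidden)):
        a = dict(evidence, **dict(zip(hidden, values)))
        total += joint(a)
    return total

print(likelihood({"D": 1}))  # → 0.325 for these invented CPTs
```

The cost is exponential in the number of hidden variables, which is exactly why the rest of the slides turn to approximations.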
.
Likelihood of evidence
[Figure: Bayesian network for genetic linkage analysis over four loci (Locus 2 is the disease locus), with selector variables S, allele variables L, marker genotypes X, and phenotypes Y]
.
Inference

This computation is called inference.

In general, inference is NP-hard! (we can simulate Boolean gates in the network – reduction from 3-SAT)

For some graphical models inference is polynomial (for example, chains and trees).

[Figure: a tree-structured version of the second example network]
Can use the Forward-Backward algorithm for trees
.
Approximations
For genetic linkage analysis inference is hard (on large pedigrees) – we turn to approximations
• Sampling
• Markov Chain Monte Carlo (MCMC)
• Variational techniques
  – Mean-Field algorithm
  – Structure-based approximations
.
Preliminaries – Relative Entropy
(Shannon) Entropy – a measure of information:

H_P[X] = −Σ_x P(x) log₂ P(x)

where P(x) is the probability that X is in the state x.

Expectation – the weighted average according to a probability:

E_P[X] = Σ_x x · P(x)
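These two definitions can be checked on a toy distribution (the probabilities below are made up):

```python
import math

# Shannon entropy H_P[X] = -sum_x P(x) log2 P(x)
# and expectation E_P[X] = sum_x x * P(x), for a toy distribution.
p = {0: 0.5, 1: 0.25, 2: 0.25}

entropy = -sum(px * math.log2(px) for px in p.values() if px > 0)
expectation = sum(x * px for x, px in p.items())

print(entropy)      # → 1.5 bits
print(expectation)  # → 0*0.5 + 1*0.25 + 2*0.25 = 0.75
```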
.
Preliminaries – Relative Entropy
Relative Entropy – Kullback-Leibler distance

A distance measure between two probability distributions P, Q:

D(Q||P) = Σ_x Q(x) log ( Q(x) / P(x) ) = −( H_Q[X] + E_Q[log P(X)] )

where

H_Q[X] = −Σ_x Q(x) log Q(x)

E_Q[log P(X)] = Σ_x Q(x) log P(x)
.
Relative Entropy – Example

Two distributions over binary variables A, B:

Q(0,0) = 0.3   Q(0,1) = 0.1   Q(1,0) = 0.1   Q(1,1) = 0.5
P(0,0) = 0.2   P(0,1) = 0.2   P(1,0) = 0.2   P(1,1) = 0.4

D(Q||P) = Σ_{a,b} Q(a,b) log ( Q(a,b) / P(a,b) )
        = 0.3 log(0.3/0.2) + 0.1 log(0.1/0.2) + 0.1 log(0.1/0.2) + 0.5 log(0.5/0.4)
        = 0.175 − 0.1 − 0.1 + 0.161 ≈ 0.136     (logs base 2)
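The example above can be recomputed directly (base-2 logs, matching the entropy definition):

```python
import math

# Recomputing the slide's example D(Q||P) with base-2 logs.
q = {(0, 0): 0.3, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.5}
p = {(0, 0): 0.2, (0, 1): 0.2, (1, 0): 0.2, (1, 1): 0.4}

def kl(q, p):
    # D(Q||P) = sum_x Q(x) log2( Q(x) / P(x) )
    return sum(qx * math.log2(qx / p[x]) for x, qx in q.items() if qx > 0)

print(kl(q, p))   # ≈ 0.1365 (the slide rounds the partial terms)
print(kl(p, p))   # → 0.0 — the distance of a distribution to itself is zero
```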
.
Approximating using relative entropy
We want to approximate the likelihood of evidence P(e).

Given evidence e on a set E of variables, the hidden variables are H = X \ E.

The approximating distribution Q should be easy for inference – otherwise we gain nothing.
.
Approximating using relative entropy
We want to approximate the likelihood of evidence P(e).

Given evidence e, the hidden variables are H = X \ E.
The approximating distribution Q is defined only on H.

log P(e) = log Σ_H P(e,h) = log Σ_H Q(h) · ( P(e,h) / Q(h) )

         ≥ Σ_H Q(h) log ( P(e,h) / Q(h) )     (by Jensen's inequality)

         = −D( Q(h) || P(h,e) )
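The lower bound above can be checked numerically. The sketch below uses a single binary hidden variable with invented joint values P(e,h), and also verifies the equality at Q(h) = P(h|e).

```python
import math, random

# Numeric check of the variational lower bound (toy, hypothetical numbers):
# joint values P(e, h) for a single binary hidden variable h, evidence fixed.
p_eh = {0: 0.1, 1: 0.3}
log_pe = math.log(sum(p_eh.values()))      # log P(e) = log 0.4

def bound(q):
    # F(Q) = sum_h Q(h) log( P(e,h) / Q(h) )
    return sum(q[h] * math.log(p_eh[h] / q[h]) for h in q if q[h] > 0)

random.seed(0)
for _ in range(5):
    q1 = random.random()
    # Jensen's inequality: the bound never exceeds the true log-likelihood
    assert bound({0: 1 - q1, 1: q1}) <= log_pe + 1e-12

# Equality holds at the posterior Q(h) = P(h|e)
posterior = {h: p / sum(p_eh.values()) for h, p in p_eh.items()}
print(abs(bound(posterior) - log_pe))  # → ~0.0
```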
.
Approximating using relative entropy
−D(Q||P) = Σ_H Q(h) log ( P(e,h) / Q(h) )

Claim: the maximal value of −D(Q||P) is exactly the log-likelihood log P(e), and it is achieved when Q(h) = P(h|e).

Proof: substituting Q(h) = P(h|e),

Σ_H P(h|e) log ( P(h,e) / P(h|e) ) = Σ_H P(h|e) log ( P(h|e) P(e) / P(h|e) ) = Σ_H P(h|e) log P(e) = log P(e)
.
Mean-Field Approximation
Why are we better off maximizing −D(Q||P) instead of computing the log-likelihood directly?

Because we get to choose a distribution Q for which inference is not hard.

Mean-Field approximation – we choose the simplest possible Q:

Q(h) = ∏_j q_j(x_j)

where each X_j is a single variable (node) in the network.
.
Mean-Field Approximation - Example
[Figure: the Asia network]

Q(v, s, t, l, a, b, x) = ∏_{r ∈ {v,s,t,l,a,b,x}} Q_r(r)

For example: Q(X=0) = 0.6, Q(X=1) = 0.4
.
Mean-Field Approximation - Example
[Figure: the Asia network]

Q(V=0) = 0.6   Q(V=1) = 0.4
Q(S=0) = 0.8   Q(S=1) = 0.2
Q(T=0) = 0.1   Q(T=1) = 0.9
Q(L=0) = 0.5   Q(L=1) = 0.5
Q(A=0) = 0.3   Q(A=1) = 0.7
Q(B=0) = 0.4   Q(B=1) = 0.6
Q(X=0) = 0.6   Q(X=1) = 0.4

Q(v=1, s=0, t=0, l=0, a=1, b=0, x=1) = 0.4 · 0.8 · 0.1 · 0.5 · 0.7 · 0.4 · 0.4 = 0.001792
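Evaluating the fully factored Q at one assignment is just a product of per-node marginals (the numbers below are the slide's example values):

```python
# Evaluating a fully factored mean-field Q at one assignment
# (marginals taken from the slide's example).
q = {
    "V": {0: 0.6, 1: 0.4}, "S": {0: 0.8, 1: 0.2},
    "T": {0: 0.1, 1: 0.9}, "L": {0: 0.5, 1: 0.5},
    "A": {0: 0.3, 1: 0.7}, "B": {0: 0.4, 1: 0.6},
    "X": {0: 0.6, 1: 0.4},
}

def q_joint(assignment):
    # Q(h) = prod_j q_j(x_j): one factor per node, no interactions
    prod = 1.0
    for var, val in assignment.items():
        prod *= q[var][val]
    return prod

print(q_joint({"V": 1, "S": 0, "T": 0, "L": 0, "A": 1, "B": 0, "X": 1}))
# → ≈ 0.001792
```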
.
Mean-Field Approximation
Computing D(Q||P):

−D(Q||P) = Σ_H Q(h) log ( P(e,h) / Q(h) )

         = Σ_H ∏_j q_j(x_j) · log ( ∏_i P(x_i | pa(x_i)) / ∏_k q_k(x_k) )

         = Σ_i Σ_H ∏_j q_j(x_j) log P(x_i | pa(x_i)) − Σ_k Σ_H ∏_j q_j(x_j) log q_k(x_k)

         = Σ_i Σ_{x_i, pa(X_i)} ( ∏_{j: X_j ∈ {X_i, pa(X_i)}} q_j(x_j) ) log P(x_i | pa(x_i)) − Σ_k Σ_{x_k} q_k(x_k) log q_k(x_k)
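The last equality — E_Q[log P] splitting into local sums over the families {X_i, pa(X_i)} — can be verified on a tiny A → B network (CPT and q numbers invented):

```python
import itertools, math

# Numeric check (toy, hypothetical numbers): for a fully factored Q, the
# term E_Q[log P] decomposes into local sums over each family {X_i, pa(X_i)}.
p_a = {0: 0.6, 1: 0.4}                                        # psi_A = P(A)
p_b_a = {(0, 0): 0.9, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.8}  # psi_B = P(B|A)
q = {"A": {0: 0.5, 1: 0.5}, "B": {0: 0.3, 1: 0.7}}

def joint_p(a, b): return p_a[a] * p_b_a[(a, b)]
def joint_q(a, b): return q["A"][a] * q["B"][b]

# Brute force: D(Q||P) = sum_x Q(x) log( Q(x) / P(x) )
brute = sum(joint_q(a, b) * math.log(joint_q(a, b) / joint_p(a, b))
            for a, b in itertools.product([0, 1], repeat=2))

# Local decomposition:
#   D(Q||P) = sum_k sum_{x_k} q_k log q_k - sum_i sum_{D_i} (prod q_j) log psi_i
neg_entropy = sum(q[v][x] * math.log(q[v][x]) for v in q for x in (0, 1))
e_log_psi = (sum(q["A"][a] * math.log(p_a[a]) for a in (0, 1)) +
             sum(q["A"][a] * q["B"][b] * math.log(p_b_a[(a, b)])
                 for a, b in itertools.product([0, 1], repeat=2)))
local = neg_entropy - e_log_psi

print(abs(brute - local))  # the two computations agree (up to float error)
```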
.
Mean-Field Approximation – Setting Q
Problems:
• How do we find a “good” Q?
• What is the best Q?
• Does such a distribution Q exist?

Some answers:
• We don’t know what the best Q is.
• A “good” distribution Q exists if P(h|e) is approximately of the same form as Q.
• We try to find a stationary point of D(Q||P), which is a local minimum of the KL-distance.
.
Mean-Field Approximation – Setting Q
Notation:

f_ij = 1 if X_j ∈ {X_i, pa(X_i)}, and 0 otherwise

D_i = {X_i, pa(X_i)}

ψ_i = P(X_i | pa(X_i))
.
Mean-Field Algorithm

Input: a distribution P = ∏_i ψ_i over a Bayesian network, and a distribution Q = ∏_j q_j(X_j)

Output: a revised set q_j(X_j) such that Q is a stationary point of D(Q||P)

Iterate over the nodes j:

  γ_j(x_j) := Σ_{i: f_ij = 1} Σ_{D_i \ X_j} ∏_{k ≠ j: X_k ∈ D_i} q_k(x_k) · log ψ_i

  q_j(x_j) ← e^{γ_j(x_j)}

  Normalize q_j
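A minimal sketch of this update loop, on a hypothetical two-node network A → B with no evidence (so H = {A, B}); for each node j, γ_j is accumulated exactly from the families D_i that contain X_j:

```python
import math

# Mean-field updates on a tiny two-node network A -> B
# (hypothetical CPT numbers; no evidence, so H = {A, B}).
p_a = {0: 0.6, 1: 0.4}                                        # psi_A = P(A)
p_b_a = {(0, 0): 0.9, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.8}  # psi_B = P(B|A)

q = {"A": {0: 0.5, 1: 0.5}, "B": {0: 0.5, 1: 0.5}}  # initial factored Q

def normalize(table):
    z = sum(table.values())
    return {x: v / z for x, v in table.items()}

for _ in range(50):  # iterate over the nodes until (practical) convergence
    # gamma_A(a): the families containing A are D_A = {A} and D_B = {A, B}
    g_a = {a: math.log(p_a[a]) +
              sum(q["B"][b] * math.log(p_b_a[(a, b)]) for b in (0, 1))
           for a in (0, 1)}
    q["A"] = normalize({a: math.exp(g_a[a]) for a in (0, 1)})

    # gamma_B(b): the only family containing B is D_B = {A, B}
    g_b = {b: sum(q["A"][a] * math.log(p_b_a[(a, b)]) for a in (0, 1))
           for b in (0, 1)}
    q["B"] = normalize({b: math.exp(g_b[b]) for b in (0, 1)})

print(q["A"], q["B"])  # a stationary point of D(Q||P); each q_j sums to 1
```

Each update sets q_j ∝ e^{γ_j}, so D(Q||P) never increases; for larger networks only the loop over families changes.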
.
Mean-Field Approximation – Setting Q
Why will this work (converge to a stationary point)?

Lemma: Let P = ∏_i ψ_i and Q = ∏_j q_j(X_j). Then, for each j,

D(Q||P) = Σ_{x_j} q_j(x_j) log ( q_j(x_j) / Γ_j(x_j) ) + terms independent of q_j

where Γ_j(x_j) = e^{γ_j(x_j)}.

Proof:

D(Q||P) = −( H(Q) + E_Q[log P(x)] )
.
Mean-Field Approximation – Setting Q

H(Q) = −Σ_{x_j} q_j(x_j) log q_j(x_j) − Σ_{x_j} q_j(x_j) Σ_{X\X_j} Q(x|x_j) log Q(x|x_j)

     = −Σ_{x_j} q_j(x_j) log q_j(x_j) − Σ_{x_j} q_j(x_j) Σ_{X\X_j} ∏_{k≠j} q_k(x_k) log ∏_{k≠j} q_k(x_k)

E_Q[log P(x)] = Σ_i Σ_{x_j} q_j(x_j) Σ_{X\X_j} Q(x|x_j) log ψ_i

             = Σ_{x_j} q_j(x_j) Σ_{i: f_ij = 1} Σ_{D_i \ X_j} ∏_{k≠j: X_k ∈ D_i} q_k(x_k) log ψ_i + terms independent of q_j

(the families with f_ij = 0 do not involve X_j and contribute only constants)
.
Mean-Field Approximation – Setting Q
D(Q||P) = −( H(Q) + E_Q[log P(x)] )

        = Σ_{x_j} q_j(x_j) log q_j(x_j) − Σ_{x_j} q_j(x_j) Σ_{i: f_ij = 1} Σ_{D_i \ X_j} ∏_{k≠j: X_k ∈ D_i} q_k(x_k) log ψ_i + const

The inner double sum is exactly γ_j(x_j), so

        = Σ_{x_j} q_j(x_j) log q_j(x_j) − Σ_{x_j} q_j(x_j) log Γ_j(x_j) + const

Hence updating q_j ∝ Γ_j = e^{γ_j} minimizes D(Q||P) with respect to q_j while the other factors are held fixed.
.
Generalized Mean-Field Approximation
When there are strong dependencies between variables in the network, the mean-field approximation may be far off.

We want to take advantage of the network structure.

Simple solution: let Q factor into fewer terms, where each term covers several variables of the network.

[Figure: a four-node network over A, B, C, D]

Q = q_ab(a, b) · q_cd(c, d)
.
Generalized Mean-Field Approximation
Complexity of the algorithm is exponential in the tree-width of the terms.

Terms that group together strongly dependent variables promise a better approximation.
More flexible forms of Q require inference (over the network formed by Q)
.
GMF Approximation – Genetic Linkage Example
[Figure: the genetic linkage network from before — loci 1–4 (Locus 2 is the disease locus), with selector variables S, allele variables L, marker genotypes X, and phenotypes Y]
.
Summary
Inference has limitations – no way around NP-hardness.

Approximation quality depends on several things:
• the tradeoff between time and quality
• understanding the problem
• the flexibility of the approximating distribution
• guarantees on the approximation quality