Overview
• Introduction: Mixed graphical models
• SampleSearch: Sampling with Searching
• Exploiting structure in sampling: AND/OR Importance sampling
Bayesian Networks: Representation (Pearl, 1988)

P(X) = ∏_{i=1}^{n} P(X_i | pa(X_i))
CPTs: P(X_i | pa(X_i))

[Figure: a five-variable network over Smoking (S), lung Cancer (C), Bronchitis (B), X-ray (X), and Dyspnoea (D), with CPTs P(S), P(C|S), P(B|S), P(X|C,S), P(D|C,B).]
P(S, C, B, X, D) = P(S) P(C|S) P(B|S) P(X|C,S) P(D|C,B)
Belief Updating: P(lung cancer = yes | smoking = no, dyspnoea = yes) = ?
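To make the factorization concrete, here is a minimal sketch that evaluates the joint above by multiplying CPT entries; all the CPT numbers below are hypothetical placeholders, not values from the talk.

    # P(S, C, B, X, D) = P(S) P(C|S) P(B|S) P(X|C,S) P(D|C,B)
    # Hypothetical CPTs (0 = no, 1 = yes); a real network would supply these.
    P_S = {0: 0.7, 1: 0.3}
    P_C = {(0, 0): 0.99, (0, 1): 0.01, (1, 0): 0.90, (1, 1): 0.10}  # P(C|S), key (S, C)
    P_B = {(0, 0): 0.80, (0, 1): 0.20, (1, 0): 0.60, (1, 1): 0.40}  # P(B|S), key (S, B)
    P_X = {(c, s, x): 0.5 for c in (0, 1) for s in (0, 1) for x in (0, 1)}  # P(X|C,S), flat placeholder
    P_D = {(c, b, d): 0.5 for c in (0, 1) for b in (0, 1) for d in (0, 1)}  # P(D|C,B), flat placeholder

    def joint(s, c, b, x, d):
        # One full assignment's probability is a product of local CPT entries.
        return P_S[s] * P_C[(s, c)] * P_B[(s, b)] * P_X[(c, s, x)] * P_D[(c, b, d)]

    print(joint(0, 0, 0, 1, 1))  # probability of one full assignment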
Mixed Networks: Mixing Belief and Constraints

Belief or Bayesian Networks
Variables: A, B, C, D, E, F
Domains: D_A, ..., D_F = {0, 1}
CPTs: P(A), P(B|A), P(C|A), P(D|B,C), P(E|A,B), P(F|A)

[Figure: the directed graph over A, B, C, D, E, F and the corresponding constraint graph.]

Constraint Networks
Variables: A, B, C, D, E, F
Domains: D_A, ..., D_F = {0, 1}
Constraints: R1(A,B,C), R2(A,C,F), R3(B,C,D), R4(A,E)
Expresses the set of solutions sol(R).

Example: the CPT P(D|B,C) versus the constraint R3(B,C,D):

P(D|B,C):
B C | D=0  D=1
0 0 |  0    1
0 1 | .1   .9
1 0 | .3   .7
1 1 |  1    0

R3(B,C,D): B=1, C=1, D=1 is not allowed; (0,0,0) is not allowed.
Constraints could be specified externally or may occur as zeros in the Belief network
Mixed networks: Distribution and Queries

• The distribution represented by a mixed network T = (B, R):

  P_T(x) = (1/M) P_B(x) if x ∈ sol(R), and 0 otherwise,
  where M = Σ_{x ∈ sol(R)} P_B(x)

• Queries:
  – Weighted counting: compute M (equivalent to P(e), the partition function, and solution counting)
  – Marginal distribution: P_T(X_i)
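Reading the definition directly, M sums P_B(x) over the solutions of R. A brute-force sketch follows (exponential in the number of variables, for intuition only; `prob` and `constraints` are generic stand-ins, not the talk's code):

    from itertools import product

    def weighted_count(variables, domains, prob, constraints):
        """M = sum of P_B(x) over all assignments x in sol(R)."""
        M = 0.0
        for values in product(*(domains[v] for v in variables)):
            x = dict(zip(variables, values))
            if all(c(x) for c in constraints):   # x in sol(R)?
                M += prob(x)                     # P_B(x)
        return M

    # Toy usage: two binary variables, uniform P_B, constraint A != B.
    M = weighted_count(["A", "B"], {"A": [0, 1], "B": [0, 1]},
                       lambda x: 0.25, [lambda x: x["A"] != x["B"]])
    print(M)  # 0.5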
Applications

• Determinism: More ubiquitous than you may think!
• Transportation planning (Liao et al. 2004, Gogate et al. 2005)
  – Predicting and inferring the car travel activity of individuals
• Genetic linkage analysis (Fishelson and Geiger, 2002)
  – Associate the functionality of genes with their location on chromosomes
• Functional/software verification (Bergeron, 2000)
  – Generating random test programs to check the validity of hardware
• First-order probabilistic models (Domingos et al. 2006, Milch et al. 2005)
  – Citation matching
[Figure: a two-locus pedigree network for genetic linkage analysis. The model for locus 1 contains the variables L11m, L11f, L12m, L12f, L13m, L13f, S13m, S13f, X11, X12, X13; the model for locus 2 contains the corresponding variables L21m through X23.]
Functions on Orange nodes are deterministic
Approximate Inference

• Approximations are hard with determinism.
• A randomized polynomial ε-approximation is possible when no zeros are present (Karp 1993, Cheng 2001).
• ε-approximation is NP-hard in the presence of zeros.
• Gibbs sampling is problematic when the MCMC chain is not ergodic.
• Current remedies:
  – Replace zeros with very small values (Laplace correction: Naive Bayes, NLP)
  – Bad performance when the zeros or determinism are real!
Overview
• Introduction: Mixed graphical models
• SampleSearch: Sampling with Searching
  – Rejection problem
  – Recovery and analysis
  – Empirical evaluation
• Exploiting structure in sampling: AND/OR Importance sampling
Importance Sampling: Overview
• Given a proposal or importance distribution Q(z) such that f(z)>0 implies Q(z)>0, rewrite
  M = P(e) = Σ_{z∈Z} f(z),  where Z = X \ E and f(z) = P_B(z, e)

  M = Σ_{z∈Z} f(z) · (Q(z)/Q(z)) = E_Q[ f(z)/Q(z) ]

• Given i.i.d. samples z^1, ..., z^N from Q(z),

  M̂ = (1/N) Σ_{j=1}^{N} f(z^j)/Q(z^j) = (1/N) Σ_{j=1}^{N} w(z^j),  and E[M̂] = M = P(e)
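A generic sketch of this estimator; `sample_Q`, `pdf_Q`, and `f` are assumed callables, not part of the talk's implementation.

    def importance_sampling(sample_Q, pdf_Q, f, N):
        """M-hat = (1/N) sum_j f(z_j) / Q(z_j) over i.i.d. samples z_j ~ Q."""
        total = 0.0
        for _ in range(N):
            z = sample_Q()
            total += f(z) / pdf_Q(z)   # importance weight w(z)
        return total / N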
Generating i.i.d. samples from Q
[Figure: the full sampling tree over A, B, C; each arc is labeled with its proposal probability.]

Q(X) = Q(X1) Q(X2|X1) ... Q(Xn|X1, ..., X_{n-1})
Q(A,B,C) = Q(A) Q(B|A) Q(C|A,B)
Q(A) = (0.8, 0.2), Q(B|A) = (0.4, 0.6, 0.2, 0.8), Q(C|A,B) = (0.2, 0.8)
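A sketch of drawing one assignment along the ordering A, B, C with the numbers on this slide (the tree shows Q(C|A,B) = (0.2, 0.8) for every parent context):

    import random

    Q_A = [0.8, 0.2]
    Q_B = {0: [0.4, 0.6], 1: [0.2, 0.8]}   # Q(B|A)
    Q_C = [0.2, 0.8]                       # Q(C|A,B), same for all (A, B)

    def draw(p):
        # Sample an index i with probability p[i].
        return random.choices(range(len(p)), weights=p)[0]

    def sample_Q():
        a = draw(Q_A)
        b = draw(Q_B[a])
        c = draw(Q_C)
        return (a, b, c)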
Rejection Problem

Importance sampling: M̃ = (1/N) Σ_{i=1}^{N} f(x^i)/Q(x^i), with f(x^i) = 0 if x^i is not a solution.

[Figure: the full sampling tree over A, B, C.]
Toss a biased coin with P(H) = 0.8, P(T) = 0.2. Say we get H: set A = 0.
Rejection Problem (continued)

[Figure: the sampled arc A = 0 and the subtree below it.]
Toss a biased coin with P(H) = 0.4, P(T) = 0.6. Say we get a head: set B = 0.
Rejection Problem (continued)

• A large number of the assignments generated will be rejected, i.e., thrown away.

[Figure: the sampled path A = 0, B = 0 and the two C-arcs with probabilities 0.2 and 0.8.]
Rejection Problem (continued)

All blue leaves correspond to solutions, i.e., f(x) > 0; all red leaves correspond to non-solutions, i.e., f(x) = 0.

Importance sampling: M̂ = (1/N) Σ_{i=1}^{N} f(x^i)/Q(x^i), with f(x^i) = 0 if x^i is not a solution.

[Figure: the full sampling tree with solution leaves in blue and non-solution leaves in red.]
Constraints: A≠B A≠C
Revising Q to the backtrack-free distribution:

All blue leaves correspond to solutions, i.e., f(x) > 0; all red leaves correspond to non-solutions, i.e., f(x) = 0.

[Figure: the sampling tree with its arc probabilities revised; e.g., under A = 0 the B-arcs change from (0.4, 0.6) to (0, 1), and under A = 1 from (0.2, 0.8) to (1, 0).]

QF(branch) = 0 if no solutions under it
QF(branch) = 1 if no solutions under the other branch
QF(branch) = Q(branch) otherwise
Constraints: A≠B A≠C
Generating samples from QF

• Invoke an oracle at each branch.
• The oracle returns True if there is a solution under the branch, and False otherwise.

[Figure: the sampling tree with a "Solution?" query at every arc; arcs are re-weighted to 0, 1, or Q accordingly.]

Constraints: A≠B, A≠C
QF(branch) = 0 if no solutions under it
QF(branch) = 1 if no solutions under the other branch
QF(branch) = Q(branch) otherwise
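A minimal sketch of this procedure for binary domains; `has_solution` stands in for the oracle (in practice a CSP/SAT solver) and `Q` is an assumed callable giving the proposal probability of each value given the partial assignment, not the talk's API.

    import random

    def sample_QF(n, Q, has_solution):
        """Sample x1..xn from the backtrack-free distribution QF (binary domains).
        Q(i, v, x): proposal probability of X_i = v given partial assignment x.
        has_solution(x): oracle; True iff x can be extended to a solution."""
        x = []
        for i in range(n):
            # Keep only the values that can be completed to a solution.
            alive = [v for v in (0, 1) if has_solution(x + [v])]
            # Renormalizing Q over the surviving values realizes the 0/1/Q rules above.
            weights = [Q(i, v, x) for v in alive]
            x.append(random.choices(alive, weights=weights)[0])
        return x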
Generating samples from QF

• Oracles in practice:
  – Adaptive consistency as a pre-processing step: O(exp(treewidth)) time and space
  – A complete CSP solver: invoked O(nd) times for each sample
  – Both are too costly.

[Figure: a single backtrack-free sample path A = 0, B = 1, C = 1, whose revised arc probabilities are 0.8, 1, and 1.]

Constraints: A≠B, A≠C
Gogate et al., UAI 2005, Gogate and Dechter, UAI 2005
Approximations of QF

• Use i-consistency instead of adaptive consistency
  – O(n^i) time and space complexity
  – Identifies some of the zeros so that they are never sampled
• Cons: too weak when the constraint portion is hard.

[Figure: rejection rate versus the percentage of zeros identified: no processing identifies none, polynomial consistency enforcement identifies some, and full processing (NP-hard, exponential time and space) identifies all.]
Gogate et al., UAI 2005, Gogate and Dechter, UAI 2005
Algorithm SampleSearch

M̂ = (1/N) Σ_{j=1}^{N} f(z^j)/Q(z^j)

[Figure: the full sampling tree over A, B, C with the proposal probabilities.]

Gogate and Dechter, AISTATS 2007, AAAI 2007
Constraints: A≠B, A≠C
Algorithm SampleSearch (continued)

[Figure: a sample is drawn from Q; when a partial assignment violates a constraint, that branch's probability is set to 0 and the remaining mass shifts to its sibling.]

Gogate and Dechter, AISTATS 2007, AAAI 2007
Constraints: A≠B, A≠C
Algorithm SampleSearch (continued)

Resume sampling from the revised distribution.

[Figure: the violated branch is pruned and sampling continues down the surviving subtree.]

Gogate and Dechter, AISTATS 2007, AAAI 2007
Constraints: A≠B, A≠C
Algorithm SampleSearch (continued)

Constraint violated: backtrack and continue until f(sample) > 0.

[Figure: another dead end is hit; its arc is zeroed out and the search proceeds.]

Gogate and Dechter, AISTATS 2007, AAAI 2007
Constraints: A≠B, A≠C
Generate more Samples

Each new sample restarts from the root with the original proposal Q.

[Figure: the full sampling tree, ready for the next sample.]

Gogate and Dechter, AISTATS 2007, AAAI 2007
Constraints: A≠B, A≠C
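A sketch of the basic loop just illustrated (binary domains, chronological backtracking only, and a solution assumed to exist; real implementations plug in a full SAT/CSP solver, see the IJGP-wc-SS slide later). `Q` and `consistent` are generic stand-ins.

    import random

    def sample_search(n, Q, consistent):
        """Draw one sample that is guaranteed to be a solution.
        Q(i, v, x): proposal probability of X_i = v given partial assignment x.
        consistent(x): True iff partial assignment x violates no constraint."""
        x = []
        banned = [set() for _ in range(n)]   # values that led to dead ends, per depth
        i = 0
        while i < n:
            options = [v for v in (0, 1)
                       if v not in banned[i] and consistent(x + [v])]
            if not options:                  # dead end: backtrack one level
                banned[i] = set()            # (no backtrack past the root: a solution is assumed)
                i -= 1
                banned[i].add(x.pop())       # this value provably leads nowhere
                continue
            # Resample from Q renormalized over the surviving values.
            weights = [Q(i, v, x) for v in options]
            x.append(random.choices(options, weights=weights)[0])
            i += 1
        return x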
Traces of SampleSearch

[Figure: four different traces of SampleSearch on the same problem; each trace visits a different set of dead ends before terminating in the solution A = 0, B = 1, C = 1.]

Constraints: A≠B, A≠C
Gogate and Dechter, AISTATS 2007, AAAI 2007
SampleSearch: Sampling Distribution

• Problem: Due to the search, the samples are no longer i.i.d. from Q:

  M̂ = (1/N) Σ_{j=1}^{N} f(z^j)/Q(z^j),  E[M̂] ≠ M

• Theorem: SampleSearch generates i.i.d. samples from the backtrack-free distribution QF:

  M̂ = (1/N) Σ_{j=1}^{N} f(z^j)/QF(z^j),  E[M̂] = M
Gogate and Dechter, AISTATS 2007, AAAI 2007
The Sampling distribution QF of SampleSearch

[Figure: the backtrack-free tree: the A-arcs keep their probabilities 0.8 and 0.2; under A = 0 the arc B = 0 gets probability 0 and B = 1 gets probability 1, and the consistent C-arcs get probability 1.]

What is the probability of generating A = 0? QF(A=0) = 0.8. Why? SampleSearch is systematic.
What is the probability of generating (A=0, B=1)? QF(B=1|A=0) = 1. Why? SampleSearch is systematic.
What is the probability of generating (A=0, B=0)? Simple: QF(B=0|A=0) = 0.
All samples generated by SampleSearch are solutions.

Backtrack-free distribution: M̃ = (1/N) Σ_{i=1}^{N} f(x^i)/QF(x^i)

Gogate and Dechter, AISTATS 2007, AAAI 2007
Constraints: A≠B A≠C
Asymptotic approximations of QF

[Figure: a partially explored tree: some branches are known to have no solutions below them, while one unexplored branch is a "hole", i.e., we don't know whether it has a solution.]

• IF Hole THEN
  – UF = Q (i.e., assume there is a solution at the other branch)
  – LF = 0 (i.e., assume no solution at the other branch)
Gogate and Dechter, AISTATS 2007, AAAI 2007
QF(branch)=0 if no solutions under it
QF(branch) =Q(branch) otherwise
Approximations: Convergence in the limit

• Store all possible traces.

[Figure: three traces of SampleSearch combined into one sample tree: explored arcs carry their discovered 0/1 revisions, while unexplored arcs remain holes marked "?".]

Gogate and Dechter, AISTATS 2007, AAAI 2007
Approximations: Convergence in the limit

• From the combined sample tree, update U and L:
  IF Hole THEN U_N^F = Q and L_N^F = 0

  M = Σ_x f(x) = E_{QF}[ f(x)/QF(x) ]

  lim_{N→∞} E[ f(x)/U_N^F(x) ] = lim_{N→∞} E[ f(x)/L_N^F(x) ] = M   (asymptotically unbiased)

  Bounding: L_N^F(x) ≤ QF(x) ≤ U_N^F(x)

[Figure: the combined sample tree with its remaining holes.]
Gogate and Dechter, AISTATS 2007, AAAI 2007
Improving Naive SampleSearch: The IJGP-wc-SS algorithm

• Better search strategy
  – Can use any state-of-the-art CSP/SAT solver, e.g., MiniSat (Eén and Sörensson, 2006)
• Better proposal distribution
  – Use the output of IJGP, a generalized belief propagation scheme, to compute the initial importance function
• w-cutset importance sampling (Bidyuk and Dechter, 2007)
  – Reduce variance by sampling from a subspace
Gogate and Dechter, AISTATS 2007, AAAI 2007
Experiments

• Tasks
  – Weighted counting
  – Marginals
• Benchmarks
  – Satisfiability problems (counting solutions)
  – Linkage networks
  – Relational instances (first-order probabilistic networks)
  – Grid networks
  – Logistics planning instances
• Algorithms
  – IJGP-wc-SS/LB and IJGP-wc-SS/UB
  – IJGP-wc-IS (vanilla algorithm that does not perform search)
  – SampleCount (Gomes et al. 2007, SAT)
  – ApproxCount (Wei and Selman, 2007, SAT)
  – EPIS (Yuan and Druzdzel, 2006)
  – RELSAT (Bayardo and Pehoushek, 2000, SAT)
  – Edge Deletion Belief Propagation (Choi and Darwiche, 2006)
  – Iterative Join Graph Propagation (Dechter et al., 2002)
  – Variable Elimination and Conditioning (VEC)
Results: Probability of Evidence
Linkage instances (UAI 2006 evaluation)
Time bound: 3 hrs. M: number of samples generated within the time bound. Z: probability of evidence.
Results: Probability of Evidence
Relational instances (UAI 2008 evaluation)
Time bound: 10 hrs. M: number of samples generated in 10 hrs. Z: probability of evidence.
Results: Solution Counts
Latin Square instances (size 8 to 16)
Time bound: 10 hrs. M: number of samples generated in 10 hrs. Z: solution counts.
Results: Solution Counts
Langford instances
Time bound: 10 hrs. M: number of samples generated in 10 hrs. Z: solution counts.
Results on Marginals

• Evaluation criterion: the Hellinger distance between the exact marginals P(x_i) and the approximate marginals A(x_i),

  Δ = (1/n) Σ_{i=1}^{n} (1/2) Σ_{x_i ∈ D_i} ( √P(x_i) − √A(x_i) )²

• Always bounded between 0 and 1.
• Lower-bounds the KL distance.
• When probabilities close to zero are present, the KL distance may tend to infinity.
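A direct transcription of this criterion (the placement of the 1/2 follows my reading of the slide's formula; the example marginals are made up):

    from math import sqrt

    def hellinger(exact, approx):
        """Average squared Hellinger distance between exact marginals P and
        approximate marginals A; each list entry maps value -> probability."""
        total = 0.0
        for P, A in zip(exact, approx):
            total += 0.5 * sum((sqrt(P[v]) - sqrt(A.get(v, 0.0))) ** 2 for v in P)
        return total / len(exact)

    # Example: one binary variable.
    print(hellinger([{0: 0.7, 1: 0.3}], [{0: 0.6, 1: 0.4}]))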
Results: Posterior Marginals
Linkage instances (UAI 2006 evaluation)
Time bound: 3 hrs. The table shows the Hellinger distance (Δ) and the number of samples M.
Summary: SampleSearch

• Manages the rejection problem while sampling.
• The sampling distribution is the backtrack-free distribution QF.
• Approximating QF by storing all traces yields an asymptotically unbiased estimator
  – Linear time and space overhead
  – Bounds the weighted counts from above and below
• Empirically, when a substantial number of zero probabilities are present, SampleSearch dominates.
Overview
• Introduction: Mixed graphical models
• SampleSearch: Sampling with Searching
• Exploiting structure in sampling: AND/OR Importance sampling
Motivation

M = Σ_{z∈Z} f(z) = Σ_{z∈Z} f(z) · (Q(z)/Q(z)) = E_Q[ f(z)/Q(z) ]

For example, over z = (x1, x2, x3, x4):

M = Σ_{x1,x2,x3,x4} f(x1,x2,x3) f(x2,x3,x4) f(x4)
M = Σ_{x1,x2,x3,x4} f(x1) f(x2) f(x3,x4)
Given Q(z), Importance sampling totally disregards the structure of f(z) while approximating it.
OR Search Tree

[Figure: the constraint graph over A, B, C, D, E, F and the full OR search tree along a fixed variable ordering; every root-to-leaf path is a full assignment.]

A B C | R_ABC    A B E | R_ABE    A E F | R_AEF    B C D | R_BCD
0 0 0 |   1      0 0 0 |   1      0 0 0 |   0      0 0 0 |   1
0 0 1 |   1      0 0 1 |   0      0 0 1 |   1      0 0 1 |   1
0 1 0 |   0      0 1 0 |   1      0 1 0 |   1      0 1 0 |   1
0 1 1 |   1      0 1 1 |   1      0 1 1 |   1      0 1 1 |   0
1 0 0 |   1      1 0 0 |   0      1 0 0 |   1      1 0 0 |   1
1 0 1 |   1      1 0 1 |   1      1 0 1 |   1      1 0 1 |   0
1 1 0 |   1      1 1 0 |   1      1 1 0 |   1      1 1 0 |   1
1 1 1 |   0      1 1 1 |   0      1 1 1 |   0      1 1 1 |   1

Constraint Satisfaction – Counting Solutions
AND/OR Search Tree

[Figure: the primal graph over A, B, C, D, E, F; the pseudo tree rooted at A (A → B, B → {C, E}, C → D, E → F); and the corresponding AND/OR search tree, which alternates OR nodes (variables) and AND nodes (values). The constraints R_ABC, R_ABE, R_AEF, R_BCD are the same as on the previous slide.]

Constraint Satisfaction – Counting Solutions
AND/OR Tree

[Figure: the pseudo tree and the AND/OR search tree with leaf values 0/1 marking which assignments are consistent; the values are propagated upward to count solutions.]

OR – case analysis on variable values
AND – decomposition into independent subproblems
Complexity of AND/OR Tree Search

         AND/OR tree                    OR tree
Space    O(n)                           O(n)
Time     O(n k^m), O(n k^(w* log n))    O(k^n)

[Freuder & Quinn 1985], [Collin, Dechter & Katz 1991], [Bayardo & Miranker 1995], [Darwiche 2001]

k = domain size, m = depth of the pseudo tree, n = number of variables, w* = treewidth
Background: AND/OR search space

[Figure: a three-variable problem X, Y, Z with domains {0,1,2} and constraints X ≠ Z and Y ≠ Z; a chain pseudo tree over X, Z, Y versus the pseudo tree rooted at Z with children X and Y; and the corresponding OR tree versus the AND/OR tree.]
Example Bayesian network

[Figure: a network with arcs Z → X and Z → Y, plus observed children A of X and B of Y; evidence A = 0, B = 0.]

P(Z): Z=0: 0.8, Z=1: 0.2

P(X|Z)   X=0   X=1   X=2
Z=0      0.3   0.4   0.3
Z=1      0.2   0.7   0.1

P(Y|Z)   Y=0   Y=1   Y=2
Z=0      0.5   0.1   0.4
Z=1      0.2   0.6   0.2

P(X): X=0: 0.1, X=1: 0.2, X=2: 0.6
P(Y): Y=0: 0.2, Y=1: 0.7, Y=2: 0.1

With the evidence absorbed, the problem reduces to the functions f(Z), f(X,Z), f(Y,Z):

M = P(a, b) = Σ_{X,Y,Z} f(X,Z) f(Y,Z) f(Z)
Recap: Conventional Importance Sampling

Q(X,Y,Z) = Q(Z) Q(X|Z) Q(Y|Z)

M = Σ_{X,Y,Z} f(Z) f(X,Z) f(Y,Z)
  = Σ_{X,Y,Z} [ f(Z) f(X,Z) f(Y,Z) / (Q(Z) Q(X|Z) Q(Y|Z)) ] · Q(Z) Q(X|Z) Q(Y|Z)
  = E_Q[ f(Z) f(X,Z) f(Y,Z) / (Q(Z) Q(X|Z) Q(Y|Z)) ]

Gogate and Dechter, UAI 2008, CP2008

AND/OR idea! Decompose this expectation.
AND/OR Importance Sampling (General Idea)

Pseudo tree: Z with children X and Y.

• Decompose the expectation:

M = Σ_Z (f(Z)/Q(Z)) Q(Z) [ Σ_X (f(X,Z)/Q(X|Z)) Q(X|Z) ] [ Σ_Y (f(Y,Z)/Q(Y|Z)) Q(Y|Z) ]

M = E_Q[ (f(Z)/Q(Z)) · E_Q[ f(X,Z)/Q(X|Z) | Z ] · E_Q[ f(Y,Z)/Q(Y|Z) | Z ] ]
AND/OR Importance Sampling (General Idea, continued)

• Estimate all the conditional expectations separately. How?
  – Record all samples.
  – For each sample that has Z = j, estimate the conditional expectations of X|Z and Y|Z using the samples corresponding to X|Z=j and Y|Z=j respectively.
• Combine the results:

M = E_Q[ (f(Z)/Q(Z)) · E_Q[ f(X,Z)/Q(X|Z) | Z ] · E_Q[ f(Y,Z)/Q(Y|Z) | Z ] ]
AND/OR Importance Sampling

Sample #   Z   X   Y
1          0   1   0
2          0   2   1
3          1   1   1
4          1   2   0

[Figure: the AND/OR sample tree built from these four samples. Each arc carries a pair <weight, #samples>: the Z-arcs carry <1.6, 2> (Z=0) and <0.4, 2> (Z=1); under Z=0 the X-arcs carry <0.16, 1> and <0.36, 1> and the Y-arcs <0.14, 1> and <0.2, 1>; under Z=1 the X-arcs carry <0.08, 1> and <0.84, 1> and the Y-arcs <0.12, 1> and <0.28, 1>.]

Estimate of E[ f(X,Z)/Q(X|Z) | Z=0 ]: the average weight of the samples on X having Z = 0, i.e., (w(X=1, Z=0) + w(X=2, Z=0)) / 2; the estimate for Y given Z = 0 is analogous.

Gogate and Dechter, UAI 2008, CP2008
AND/OR Importance Sampling (continued)

• All OR nodes: conditional expectations given the assignment above them; operator: average.
• All AND nodes: separate components; operator: product.

OR averages: under Z=0, X: (0.16 + 0.36)/2 = 0.26 and Y: (0.14 + 0.2)/2 = 0.17; under Z=1, X: (0.08 + 0.84)/2 = 0.46 and Y: (0.12 + 0.28)/2 = 0.2.
AND products: 0.26 × 0.17 = 0.0442 for Z=0, and 0.46 × 0.2 = 0.092 for Z=1.
Root: M̂ = (2 × 1.6 × 0.0442 + 2 × 0.4 × 0.092) / 4 = 0.05376.

Sample #   Z   X   Y
1          0   1   0
2          0   2   1
3          1   1   1
4          1   2   0

Gogate and Dechter, UAI 2008, CP2008
Algorithm AND/OR Importance Sampling

1. Generate samples x1, ..., xN from Q along the ordering O.
2. Build an AND/OR sample tree for the samples x1, ..., xN along O.
3. FOR all leaf nodes i of the AND/OR tree:
   IF i is an AND node, v(i) = 1; ELSE v(i) = 0.
4. FOR every node n from the leaves to the root:
   IF n is an AND node, v(n) = the product of its children's values.
   IF n is an OR node, v(n) = the average of its children's values.
5. Return v(root).
Gogate and Dechter, UAI 2008, CP2008
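A compact sketch of steps 2–5 on the slide's four samples. The arc weights are the <weight, count> pairs from the worked example; the flat dictionary encoding of the sample tree is my own, not the talk's.

    # Samples (Z, X, Y) and per-arc weights from the worked example.
    samples = [(0, 1, 0), (0, 2, 1), (1, 1, 1), (1, 2, 0)]
    w_Z = {0: 1.6, 1: 0.4}
    w_X = {(0, 1): 0.16, (0, 2): 0.36, (1, 1): 0.08, (1, 2): 0.84}   # keyed by (Z, X)
    w_Y = {(0, 0): 0.14, (0, 1): 0.20, (1, 1): 0.12, (1, 0): 0.28}   # keyed by (Z, Y)

    est = 0.0
    for z in sorted(set(s[0] for s in samples)):
        group = [s for s in samples if s[0] == z]
        ex = sum(w_X[(z, x)] for _, x, _ in group) / len(group)  # OR node: average
        ey = sum(w_Y[(z, y)] for _, _, y in group) / len(group)  # OR node: average
        est += len(group) * w_Z[z] * ex * ey                     # AND node: product
    est /= len(samples)                                          # root OR node
    print(est)   # ~0.05376, matching the slide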
Properties of AND/OR Importance Sampling

• Unbiased estimate of the weighted counts.
• The AND/OR estimate has lower variance than the conventional importance sampling estimate.
• Variance reduction:
  – Easy to prove for the case of complete independence (Goodman, 1960)
  – Complicated to prove for the general conditional-independence case (Gogate thesis, papers!)

Gogate and Dechter, UAI 2008, CP2008
AND/OR w-cutset (Rao-Blackwellised) sampling

• Rao-Blackwellisation (Rao, 1963)
  – Partition X into K and R, such that we can compute P(R|k) efficiently.
  – Sample from K and sum out R.
  – Estimate:

    M̂_RB = (1/N) Σ_{i=1}^{N} [ Σ_R P(R|k^i) ] / Q(k^i)

    (the numerator is the weighted count conditioned on K = k^i)

  – w-cutset sampling (Bidyuk and Dechter, 2003): select K such that the treewidth of R after removing K is bounded by w.
Gogate and Dechter, Under review, CP 2009
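A sketch of this estimator; the brute-force inner sum stands in for the exact O(exp(w)) computation over R, and all the callables (`sample_K`, `pdf_Q`, `weight`) are generic stand-ins rather than the talk's code.

    from itertools import product

    def rb_estimate(N, sample_K, pdf_Q, domains_R, weight):
        """Rao-Blackwellised estimate of the weighted counts:
        sample the cutset K, sum out the remaining variables R exactly.
        weight(k, r): P(k, r) restricted to solutions (0 for non-solutions)."""
        total = 0.0
        for _ in range(N):
            k = sample_K()
            inner = sum(weight(k, r) for r in product(*domains_R))  # exact sum over R
            total += inner / pdf_Q(k)
        return total / N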
AND/OR w-cutset sampling

• Perform AND/OR tree or graph sampling on K.
• Exact inference on R.
• Orthogonal approaches:
  – Theorem: Combining AND/OR sampling and w-cutset sampling yields further variance reduction.

[Figure: a graphical model over A, B, C, D, E, F, G; its full pseudo tree; the start pseudo tree restricted to the cutset variables A, B, C; and an OR (chain) pseudo tree over the cutset variables.]

Gogate and Dechter, Under review, CP 2009
From Search Trees to Search Graphs

• Any two nodes that root identical subtrees (subgraphs) can be merged.
Merging Based on Context

• One way of recognizing nodes that can be merged:

  context(X) = the ancestors of X in the pseudo tree that are connected to X or to descendants of X

[Figure: the pseudo tree for the six-variable example with the contexts [ ] for A, [A] for B, [AB] for C and E, [BC] for D, and [AE] for F.]
AND/OR Graphs

[Figure: a four-variable problem over A, B, C, D with domains {0,1,2} and binary ≠ constraints; its AND/OR tree and the smaller AND/OR graph obtained by merging nodes with identical contexts.]

AND/OR graph sampling

[Figure: the same set of samples arranged as an OR sample tree, an AND/OR sample tree, and an AND/OR sample graph.]
Variance Hierarchy and Complexity

Scheme                        Time                            Space
IS                            O(nN)                           O(1)
AND/OR Tree IS                O(nN)                           O(h)
w-cutset IS                   O(cN + (n−c)N exp(w))           O(1)
AND/OR w-cutset Tree IS       O(cN + (n−c)N exp(w))           O(h + (n−c) exp(w))
AND/OR Graph IS               O(nN w*)                        O(nN)
AND/OR w-cutset Graph IS      O(cN w* + (n−c)N exp(w))        O(cN + (n−c) exp(w))
Experiments

• Benchmarks
  – Linkage analysis
  – Graph coloring
  – Grids
• Algorithms
  – OR tree sampling
  – AND/OR tree sampling
  – AND/OR graph sampling
  – w-cutset versions of the three schemes above
Results: Probability of Evidence
Linkage instances (UAI 2006 evaluation)
Time bound: 1 hr
Summary: AND/OR Importance sampling
• AND/OR sampling: A general scheme to exploit conditional independence in sampling
• Theoretical guarantees: lower sampling error than conventional sampling
• Variance reduction orthogonal to Rao-Blackwellised sampling.
• Better empirical performance than conventional sampling.