Overview
• Introduction: Mixed graphical models
• SampleSearch: Sampling with Searching
• Exploiting structure in sampling: AND/OR Importance sampling
Bayesian Networks: Representation (Pearl, 1988)

P(X) = ∏_{i=1}^{n} P(X_i | pa(X_i))
CPTs: P(X_i | pa(X_i))

[Figure: a five-variable network over Smoking (S), lung Cancer (C), Bronchitis (B), X-ray (X), and Dyspnoea (D), with CPTs P(S), P(C|S), P(B|S), P(X|C,S), P(D|C,B).]
P(S, C, B, X, D) = P(S) P(C|S) P(B|S) P(X|C,S) P(D|C,B)
Belief Updating: P(lung cancer = yes | smoking = no, dyspnoea = yes) = ?
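To make the factorization concrete, here is a minimal sketch that evaluates the joint above by multiplying CPT entries; all the CPT numbers below are hypothetical placeholders, not values from the talk.

    # P(S, C, B, X, D) = P(S) P(C|S) P(B|S) P(X|C,S) P(D|C,B)
    # Hypothetical CPTs (0 = no, 1 = yes); a real network would supply these.
    P_S = {0: 0.7, 1: 0.3}
    P_C = {(0, 0): 0.99, (0, 1): 0.01, (1, 0): 0.90, (1, 1): 0.10}  # P(C|S), key (S, C)
    P_B = {(0, 0): 0.80, (0, 1): 0.20, (1, 0): 0.60, (1, 1): 0.40}  # P(B|S), key (S, B)
    P_X = {(c, s, x): 0.5 for c in (0, 1) for s in (0, 1) for x in (0, 1)}  # P(X|C,S), flat placeholder
    P_D = {(c, b, d): 0.5 for c in (0, 1) for b in (0, 1) for d in (0, 1)}  # P(D|C,B), flat placeholder

    def joint(s, c, b, x, d):
        # One full assignment's probability is a product of local CPT entries.
        return P_S[s] * P_C[(s, c)] * P_B[(s, b)] * P_X[(c, s, x)] * P_D[(c, b, d)]

    print(joint(0, 0, 0, 1, 1))  # probability of one full assignment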
Mixed Networks: Mixing Belief and Constraints

Belief or Bayesian Networks
Variables: A, B, C, D, E, F
Domains: D_A, ..., D_F = {0, 1}
CPTs: P(A), P(B|A), P(C|A), P(D|B,C), P(E|A,B), P(F|A)

[Figure: the directed graph over A, B, C, D, E, F and the corresponding constraint graph.]

Constraint Networks
Variables: A, B, C, D, E, F
Domains: D_A, ..., D_F = {0, 1}
Constraints: R1(A,B,C), R2(A,C,F), R3(B,C,D), R4(A,E)
Expresses the set of solutions sol(R).

Example: the CPT P(D|B,C) versus the constraint R3(B,C,D):

P(D|B,C):
B C | D=0  D=1
0 0 |  0    1
0 1 | .1   .9
1 0 | .3   .7
1 1 |  1    0

R3(B,C,D): B=1, C=1, D=1 is not allowed; (0,0,0) is not allowed.
Constraints could be specified externally or may occur as zeros in the Belief network
Mixed networks: Distribution and Queries

• The distribution represented by a mixed network T = (B, R):

  P_T(x) = (1/M) P_B(x) if x ∈ sol(R), and 0 otherwise,
  where M = Σ_{x ∈ sol(R)} P_B(x)

• Queries:
  – Weighted counting: compute M (equivalent to P(e), the partition function, and solution counting)
  – Marginal distribution: P_T(X_i)
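Reading the definition directly, M sums P_B(x) over the solutions of R. A brute-force sketch follows (exponential in the number of variables, for intuition only; `prob` and `constraints` are generic stand-ins, not the talk's code):

    from itertools import product

    def weighted_count(variables, domains, prob, constraints):
        """M = sum of P_B(x) over all assignments x in sol(R)."""
        M = 0.0
        for values in product(*(domains[v] for v in variables)):
            x = dict(zip(variables, values))
            if all(c(x) for c in constraints):   # x in sol(R)?
                M += prob(x)                     # P_B(x)
        return M

    # Toy usage: two binary variables, uniform P_B, constraint A != B.
    M = weighted_count(["A", "B"], {"A": [0, 1], "B": [0, 1]},
                       lambda x: 0.25, [lambda x: x["A"] != x["B"]])
    print(M)  # 0.5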
Applications

• Determinism: More ubiquitous than you may think!
• Transportation planning (Liao et al. 2004, Gogate et al. 2005)
  – Predicting and inferring the car travel activity of individuals
• Genetic linkage analysis (Fishelson and Geiger, 2002)
  – Associate the functionality of genes with their location on chromosomes
• Functional/software verification (Bergeron, 2000)
  – Generating random test programs to check the validity of hardware
• First-order probabilistic models (Domingos et al. 2006, Milch et al. 2005)
  – Citation matching
[Figure: a two-locus pedigree network for genetic linkage analysis. The model for locus 1 contains the variables L11m, L11f, L12m, L12f, L13m, L13f, S13m, S13f, X11, X12, X13; the model for locus 2 contains the corresponding variables L21m through X23.]
Functions on Orange nodes are deterministic
Approximate Inference

• Approximations are hard with determinism.
• A randomized polynomial ε-approximation is possible when no zeros are present (Karp 1993, Cheng 2001).
• ε-approximation is NP-hard in the presence of zeros.
• Gibbs sampling is problematic when the MCMC chain is not ergodic.
• Current remedies:
  – Replace zeros with very small values (Laplace correction: Naive Bayes, NLP)
  – Bad performance when the zeros or determinism are real!
Overview
• Introduction: Mixed graphical models
• SampleSearch: Sampling with Searching
  – Rejection problem
  – Recovery and analysis
  – Empirical evaluation
• Exploiting structure in sampling: AND/OR Importance sampling
Importance Sampling: Overview
• Given a proposal or importance distribution Q(z) such that f(z)>0 implies Q(z)>0, rewrite
  M = P(e) = Σ_{z∈Z} f(z),  where Z = X \ E and f(z) = P_B(z, e)

  M = Σ_{z∈Z} f(z) · (Q(z)/Q(z)) = E_Q[ f(z)/Q(z) ]

• Given i.i.d. samples z^1, ..., z^N from Q(z),

  M̂ = (1/N) Σ_{j=1}^{N} f(z^j)/Q(z^j) = (1/N) Σ_{j=1}^{N} w(z^j),  and E[M̂] = M = P(e)
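A generic sketch of this estimator; `sample_Q`, `pdf_Q`, and `f` are assumed callables, not part of the talk's implementation.

    def importance_sampling(sample_Q, pdf_Q, f, N):
        """M-hat = (1/N) sum_j f(z_j) / Q(z_j) over i.i.d. samples z_j ~ Q."""
        total = 0.0
        for _ in range(N):
            z = sample_Q()
            total += f(z) / pdf_Q(z)   # importance weight w(z)
        return total / N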
Generating i.i.d. samples from Q
[Figure: the full sampling tree over A, B, C; each arc is labeled with its proposal probability.]

Q(X) = Q(X1) Q(X2|X1) ... Q(Xn|X1, ..., X_{n-1})
Q(A,B,C) = Q(A) Q(B|A) Q(C|A,B)
Q(A) = (0.8, 0.2), Q(B|A) = (0.4, 0.6, 0.2, 0.8), Q(C|A,B) = (0.2, 0.8)
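A sketch of drawing one assignment along the ordering A, B, C with the numbers on this slide (the tree shows Q(C|A,B) = (0.2, 0.8) for every parent context):

    import random

    Q_A = [0.8, 0.2]
    Q_B = {0: [0.4, 0.6], 1: [0.2, 0.8]}   # Q(B|A)
    Q_C = [0.2, 0.8]                       # Q(C|A,B), same for all (A, B)

    def draw(p):
        # Sample an index i with probability p[i].
        return random.choices(range(len(p)), weights=p)[0]

    def sample_Q():
        a = draw(Q_A)
        b = draw(Q_B[a])
        c = draw(Q_C)
        return (a, b, c)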
Rejection Problem

Importance sampling: M̃ = (1/N) Σ_{i=1}^{N} f(x^i)/Q(x^i), with f(x^i) = 0 if x^i is not a solution.

[Figure: the full sampling tree over A, B, C.]
Toss a biased coin with P(H) = 0.8, P(T) = 0.2. Say we get H: set A = 0.
Rejection Problem (continued)

[Figure: the sampled arc A = 0 and the subtree below it.]
Toss a biased coin with P(H) = 0.4, P(T) = 0.6. Say we get a head: set B = 0.
Rejection Problem (continued)

• A large number of the assignments generated will be rejected, i.e., thrown away.

[Figure: the sampled path A = 0, B = 0 and the two C-arcs with probabilities 0.2 and 0.8.]
Rejection Problem (continued)

All blue leaves correspond to solutions, i.e., f(x) > 0; all red leaves correspond to non-solutions, i.e., f(x) = 0.

Importance sampling: M̂ = (1/N) Σ_{i=1}^{N} f(x^i)/Q(x^i), with f(x^i) = 0 if x^i is not a solution.

[Figure: the full sampling tree with solution leaves in blue and non-solution leaves in red.]
Constraints: A≠B A≠C
Revising Q to the backtrack-free distribution:

All blue leaves correspond to solutions, i.e., f(x) > 0; all red leaves correspond to non-solutions, i.e., f(x) = 0.

[Figure: the sampling tree with its arc probabilities revised; e.g., under A = 0 the B-arcs change from (0.4, 0.6) to (0, 1), and under A = 1 from (0.2, 0.8) to (1, 0).]

QF(branch) = 0 if no solutions under it
QF(branch) = 1 if no solutions under the other branch
QF(branch) = Q(branch) otherwise
Constraints: A≠B A≠C
Generating samples from QF

• Invoke an oracle at each branch.
• The oracle returns True if there is a solution under the branch, and False otherwise.

[Figure: the sampling tree with a "Solution?" query at every arc; arcs are re-weighted to 0, 1, or Q accordingly.]

Constraints: A≠B, A≠C
QF(branch) = 0 if no solutions under it
QF(branch) = 1 if no solutions under the other branch
QF(branch) = Q(branch) otherwise
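A minimal sketch of this procedure for binary domains; `has_solution` stands in for the oracle (in practice a CSP/SAT solver) and `Q` is an assumed callable giving the proposal probability of each value given the partial assignment, not the talk's API.

    import random

    def sample_QF(n, Q, has_solution):
        """Sample x1..xn from the backtrack-free distribution QF (binary domains).
        Q(i, v, x): proposal probability of X_i = v given partial assignment x.
        has_solution(x): oracle; True iff x can be extended to a solution."""
        x = []
        for i in range(n):
            # Keep only the values that can be completed to a solution.
            alive = [v for v in (0, 1) if has_solution(x + [v])]
            # Renormalizing Q over the surviving values realizes the 0/1/Q rules above.
            weights = [Q(i, v, x) for v in alive]
            x.append(random.choices(alive, weights=weights)[0])
        return x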
Generating samples from QF

• Oracles in practice:
  – Adaptive consistency as a pre-processing step: O(exp(treewidth)) time and space
  – A complete CSP solver: invoked O(nd) times for each sample
  – Both are too costly.

[Figure: a single backtrack-free sample path A = 0, B = 1, C = 1, whose revised arc probabilities are 0.8, 1, and 1.]

Constraints: A≠B, A≠C
Gogate et al., UAI 2005, Gogate and Dechter, UAI 2005
Approximations of QF

• Use i-consistency instead of adaptive consistency
  – O(n^i) time and space complexity
  – Identifies some of the zeros so that they are never sampled
• Cons: too weak when the constraint portion is hard.

[Figure: rejection rate versus the percentage of zeros identified: no processing identifies none, polynomial consistency enforcement identifies some, and full processing (NP-hard, exponential time and space) identifies all.]
Gogate et al., UAI 2005, Gogate and Dechter, UAI 2005
Algorithm SampleSearch

M̂ = (1/N) Σ_{j=1}^{N} f(z^j)/Q(z^j)

[Figure: the full sampling tree over A, B, C with the proposal probabilities.]

Gogate and Dechter, AISTATS 2007, AAAI 2007
Constraints: A≠B, A≠C
Algorithm SampleSearch (continued)

[Figure: a sample is drawn from Q; when a partial assignment violates a constraint, that branch's probability is set to 0 and the remaining mass shifts to its sibling.]

Gogate and Dechter, AISTATS 2007, AAAI 2007
Constraints: A≠B, A≠C
Algorithm SampleSearch (continued)

Resume sampling from the revised distribution.

[Figure: the violated branch is pruned and sampling continues down the surviving subtree.]

Gogate and Dechter, AISTATS 2007, AAAI 2007
Constraints: A≠B, A≠C
Algorithm SampleSearch (continued)

Constraint violated: backtrack and continue until f(sample) > 0.

[Figure: another dead end is hit; its arc is zeroed out and the search proceeds.]

Gogate and Dechter, AISTATS 2007, AAAI 2007
Constraints: A≠B, A≠C
Generate more Samples

Each new sample restarts from the root with the original proposal Q.

[Figure: the full sampling tree, ready for the next sample.]

Gogate and Dechter, AISTATS 2007, AAAI 2007
Constraints: A≠B, A≠C
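A sketch of the basic loop just illustrated (binary domains, chronological backtracking only, and a solution assumed to exist; real implementations plug in a full SAT/CSP solver, see the IJGP-wc-SS slide later). `Q` and `consistent` are generic stand-ins.

    import random

    def sample_search(n, Q, consistent):
        """Draw one sample that is guaranteed to be a solution.
        Q(i, v, x): proposal probability of X_i = v given partial assignment x.
        consistent(x): True iff partial assignment x violates no constraint."""
        x = []
        banned = [set() for _ in range(n)]   # values that led to dead ends, per depth
        i = 0
        while i < n:
            options = [v for v in (0, 1)
                       if v not in banned[i] and consistent(x + [v])]
            if not options:                  # dead end: backtrack one level
                banned[i] = set()            # (no backtrack past the root: a solution is assumed)
                i -= 1
                banned[i].add(x.pop())       # this value provably leads nowhere
                continue
            # Resample from Q renormalized over the surviving values.
            weights = [Q(i, v, x) for v in options]
            x.append(random.choices(options, weights=weights)[0])
            i += 1
        return x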
Traces of SampleSearch

[Figure: four different traces of SampleSearch on the same problem; each trace visits a different set of dead ends before terminating in the solution A = 0, B = 1, C = 1.]

Constraints: A≠B, A≠C
Gogate and Dechter, AISTATS 2007, AAAI 2007
SampleSearch: Sampling Distribution

• Problem: Due to the search, the samples are no longer i.i.d. from Q:

  M̂ = (1/N) Σ_{j=1}^{N} f(z^j)/Q(z^j),  E[M̂] ≠ M

• Theorem: SampleSearch generates i.i.d. samples from the backtrack-free distribution QF:

  M̂ = (1/N) Σ_{j=1}^{N} f(z^j)/QF(z^j),  E[M̂] = M
Gogate and Dechter, AISTATS 2007, AAAI 2007
The Sampling distribution QF of SampleSearch

[Figure: the backtrack-free tree: the A-arcs keep their probabilities 0.8 and 0.2; under A = 0 the arc B = 0 gets probability 0 and B = 1 gets probability 1, and the consistent C-arcs get probability 1.]

What is the probability of generating A = 0? QF(A=0) = 0.8. Why? SampleSearch is systematic.
What is the probability of generating (A=0, B=1)? QF(B=1|A=0) = 1. Why? SampleSearch is systematic.
What is the probability of generating (A=0, B=0)? Simple: QF(B=0|A=0) = 0.
All samples generated by SampleSearch are solutions.

Backtrack-free distribution: M̃ = (1/N) Σ_{i=1}^{N} f(x^i)/QF(x^i)

Gogate and Dechter, AISTATS 2007, AAAI 2007
Constraints: A≠B A≠C
Asymptotic approximations of QF

[Figure: a partially explored tree: some branches are known to have no solutions below them, while one unexplored branch is a "hole", i.e., we don't know whether it has a solution.]

• IF Hole THEN
  – UF = Q (i.e., assume there is a solution at the other branch)
  – LF = 0 (i.e., assume no solution at the other branch)
Gogate and Dechter, AISTATS 2007, AAAI 2007
QF(branch)=0 if no solutions under it
QF(branch) =Q(branch) otherwise
Approximations: Convergence in the limit

• Store all possible traces.

[Figure: three traces of SampleSearch combined into one sample tree: explored arcs carry their discovered 0/1 revisions, while unexplored arcs remain holes marked "?".]

Gogate and Dechter, AISTATS 2007, AAAI 2007
Approximations: Convergence in the limit

• From the combined sample tree, update U and L:
  IF Hole THEN U_N^F = Q and L_N^F = 0

  M = Σ_x f(x) = E_{QF}[ f(x)/QF(x) ]

  lim_{N→∞} E[ f(x)/U_N^F(x) ] = lim_{N→∞} E[ f(x)/L_N^F(x) ] = M   (asymptotically unbiased)

  Bounding: L_N^F(x) ≤ QF(x) ≤ U_N^F(x)

[Figure: the combined sample tree with its remaining holes.]
Gogate and Dechter, AISTATS 2007, AAAI 2007
Improving Naive SampleSearch: The IJGP-wc-SS algorithm

• Better search strategy
  – Can use any state-of-the-art CSP/SAT solver, e.g., MiniSat (Eén and Sörensson, 2006)
• Better proposal distribution
  – Use the output of IJGP, a generalized belief propagation scheme, to compute the initial importance function
• w-cutset importance sampling (Bidyuk and Dechter, 2007)
  – Reduce variance by sampling from a subspace
Gogate and Dechter, AISTATS 2007, AAAI 2007
Experiments

• Tasks
  – Weighted counting
  – Marginals
• Benchmarks
  – Satisfiability problems (counting solutions)
  – Linkage networks
  – Relational instances (first-order probabilistic networks)
  – Grid networks
  – Logistics planning instances
• Algorithms
  – IJGP-wc-SS/LB and IJGP-wc-SS/UB
  – IJGP-wc-IS (vanilla algorithm that does not perform search)
  – SampleCount (Gomes et al. 2007, SAT)
  – ApproxCount (Wei and Selman, 2007, SAT)
  – EPIS (Yuan and Druzdzel, 2006)
  – RELSAT (Bayardo and Pehoushek, 2000, SAT)
  – Edge Deletion Belief Propagation (Choi and Darwiche, 2006)
  – Iterative Join Graph Propagation (Dechter et al., 2002)
  – Variable Elimination and Conditioning (VEC)
Results: Probability of Evidence
Linkage instances (UAI 2006 evaluation)
Time bound: 3 hrs. M: number of samples generated within the time bound. Z: probability of evidence.
Results: Probability of Evidence
Relational instances (UAI 2008 evaluation)
Time bound: 10 hrs. M: number of samples generated in 10 hrs. Z: probability of evidence.
Results: Solution Counts
Latin Square instances (size 8 to 16)
Time bound: 10 hrs. M: number of samples generated in 10 hrs. Z: solution counts.
Results: Solution Counts
Langford instances
Time bound: 10 hrs. M: number of samples generated in 10 hrs. Z: solution counts.
Results on Marginals

• Evaluation criterion: the Hellinger distance between the exact marginals P(x_i) and the approximate marginals A(x_i),

  Δ = (1/n) Σ_{i=1}^{n} (1/2) Σ_{x_i ∈ D_i} ( √P(x_i) − √A(x_i) )²

• Always bounded between 0 and 1.
• Lower-bounds the KL distance.
• When probabilities close to zero are present, the KL distance may tend to infinity.
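A direct transcription of this criterion (the placement of the 1/2 follows my reading of the slide's formula; the example marginals are made up):

    from math import sqrt

    def hellinger(exact, approx):
        """Average squared Hellinger distance between exact marginals P and
        approximate marginals A; each list entry maps value -> probability."""
        total = 0.0
        for P, A in zip(exact, approx):
            total += 0.5 * sum((sqrt(P[v]) - sqrt(A.get(v, 0.0))) ** 2 for v in P)
        return total / len(exact)

    # Example: one binary variable.
    print(hellinger([{0: 0.7, 1: 0.3}], [{0: 0.6, 1: 0.4}]))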
Results: Posterior Marginals
Linkage instances (UAI 2006 evaluation)
Time bound: 3 hrs. The table shows the Hellinger distance (Δ) and the number of samples M.
Summary: SampleSearch

• Manages the rejection problem while sampling.
• The sampling distribution is the backtrack-free distribution QF.
• Approximating QF by storing all traces yields an asymptotically unbiased estimator
  – Linear time and space overhead
  – Bounds the weighted counts from above and below
• Empirically, when a substantial number of zero probabilities are present, SampleSearch dominates.
Overview
• Introduction: Mixed graphical models
• SampleSearch: Sampling with Searching
• Exploiting structure in sampling: AND/OR Importance sampling
Motivation

M = Σ_{z∈Z} f(z) = Σ_{z∈Z} f(z) · (Q(z)/Q(z)) = E_Q[ f(z)/Q(z) ]

For example, over z = (x1, x2, x3, x4):

M = Σ_{x1,x2,x3,x4} f(x1,x2,x3) f(x2,x3,x4) f(x4)
M = Σ_{x1,x2,x3,x4} f(x1) f(x2) f(x3,x4)
Given Q(z), Importance sampling totally disregards the structure of f(z) while approximating it.
OR Search Tree

[Figure: the constraint graph over A, B, C, D, E, F and the full OR search tree along a fixed variable ordering; every root-to-leaf path is a full assignment.]

A B C | R_ABC    A B E | R_ABE    A E F | R_AEF    B C D | R_BCD
0 0 0 |   1      0 0 0 |   1      0 0 0 |   0      0 0 0 |   1
0 0 1 |   1      0 0 1 |   0      0 0 1 |   1      0 0 1 |   1
0 1 0 |   0      0 1 0 |   1      0 1 0 |   1      0 1 0 |   1
0 1 1 |   1      0 1 1 |   1      0 1 1 |   1      0 1 1 |   0
1 0 0 |   1      1 0 0 |   0      1 0 0 |   1      1 0 0 |   1
1 0 1 |   1      1 0 1 |   1      1 0 1 |   1      1 0 1 |   0
1 1 0 |   1      1 1 0 |   1      1 1 0 |   1      1 1 0 |   1
1 1 1 |   0      1 1 1 |   0      1 1 1 |   0      1 1 1 |   1

Constraint Satisfaction – Counting Solutions
AND/OR Search Tree

[Figure: the primal graph over A, B, C, D, E, F; the pseudo tree rooted at A (A → B, B → {C, E}, C → D, E → F); and the corresponding AND/OR search tree, which alternates OR nodes (variables) and AND nodes (values). The constraints R_ABC, R_ABE, R_AEF, R_BCD are the same as on the previous slide.]

Constraint Satisfaction – Counting Solutions
AND/OR Tree

[Figure: the pseudo tree and the AND/OR search tree with leaf values 0/1 marking which assignments are consistent; the values are propagated upward to count solutions.]

OR – case analysis on variable values
AND – decomposition into independent subproblems
Complexity of AND/OR Tree Search

         AND/OR tree                    OR tree
Space    O(n)                           O(n)
Time     O(n k^m), O(n k^(w* log n))    O(k^n)

[Freuder & Quinn 1985], [Collin, Dechter & Katz 1991], [Bayardo & Miranker 1995], [Darwiche 2001]

k = domain size, m = depth of the pseudo tree, n = number of variables, w* = treewidth
Background: AND/OR search space

[Figure: a three-variable problem X, Y, Z with domains {0,1,2} and constraints X ≠ Z and Y ≠ Z; a chain pseudo tree over X, Z, Y versus the pseudo tree rooted at Z with children X and Y; and the corresponding OR tree versus the AND/OR tree.]
Example Bayesian network

[Figure: a network with arcs Z → X and Z → Y, plus observed children A of X and B of Y; evidence A = 0, B = 0.]

P(Z): Z=0: 0.8, Z=1: 0.2

P(X|Z)   X=0   X=1   X=2
Z=0      0.3   0.4   0.3
Z=1      0.2   0.7   0.1

P(Y|Z)   Y=0   Y=1   Y=2
Z=0      0.5   0.1   0.4
Z=1      0.2   0.6   0.2

P(X): X=0: 0.1, X=1: 0.2, X=2: 0.6
P(Y): Y=0: 0.2, Y=1: 0.7, Y=2: 0.1

With the evidence absorbed, the problem reduces to the functions f(Z), f(X,Z), f(Y,Z):

M = P(a, b) = Σ_{X,Y,Z} f(X,Z) f(Y,Z) f(Z)
Recap: Conventional Importance Sampling

Q(X,Y,Z) = Q(Z) Q(X|Z) Q(Y|Z)

M = Σ_{X,Y,Z} f(Z) f(X,Z) f(Y,Z)
  = Σ_{X,Y,Z} [ f(Z) f(X,Z) f(Y,Z) / (Q(Z) Q(X|Z) Q(Y|Z)) ] · Q(Z) Q(X|Z) Q(Y|Z)
  = E_Q[ f(Z) f(X,Z) f(Y,Z) / (Q(Z) Q(X|Z) Q(Y|Z)) ]

Gogate and Dechter, UAI 2008, CP2008

AND/OR idea! Decompose this expectation.
AND/OR Importance Sampling (General Idea)

Pseudo tree: Z with children X and Y.

• Decompose the expectation:

M = Σ_Z (f(Z)/Q(Z)) Q(Z) [ Σ_X (f(X,Z)/Q(X|Z)) Q(X|Z) ] [ Σ_Y (f(Y,Z)/Q(Y|Z)) Q(Y|Z) ]

M = E_Q[ (f(Z)/Q(Z)) · E_Q[ f(X,Z)/Q(X|Z) | Z ] · E_Q[ f(Y,Z)/Q(Y|Z) | Z ] ]
AND/OR Importance Sampling (General Idea, continued)

• Estimate all the conditional expectations separately. How?
  – Record all samples.
  – For each sample that has Z = j, estimate the conditional expectations of X|Z and Y|Z using the samples corresponding to X|Z=j and Y|Z=j respectively.
• Combine the results:

M = E_Q[ (f(Z)/Q(Z)) · E_Q[ f(X,Z)/Q(X|Z) | Z ] · E_Q[ f(Y,Z)/Q(Y|Z) | Z ] ]
AND/OR Importance Sampling

Sample #   Z   X   Y
1          0   1   0
2          0   2   1
3          1   1   1
4          1   2   0

[Figure: the AND/OR sample tree built from these four samples. Each arc carries a pair <weight, #samples>: the Z-arcs carry <1.6, 2> (Z=0) and <0.4, 2> (Z=1); under Z=0 the X-arcs carry <0.16, 1> and <0.36, 1> and the Y-arcs <0.14, 1> and <0.2, 1>; under Z=1 the X-arcs carry <0.08, 1> and <0.84, 1> and the Y-arcs <0.12, 1> and <0.28, 1>.]

Estimate of E[ f(X,Z)/Q(X|Z) | Z=0 ]: the average weight of the samples on X having Z = 0, i.e., (w(X=1, Z=0) + w(X=2, Z=0)) / 2; the estimate for Y given Z = 0 is analogous.

Gogate and Dechter, UAI 2008, CP2008
AND/OR Importance Sampling (continued)

• All OR nodes: conditional expectations given the assignment above them; operator: average.
• All AND nodes: separate components; operator: product.

OR averages: under Z=0, X: (0.16 + 0.36)/2 = 0.26 and Y: (0.14 + 0.2)/2 = 0.17; under Z=1, X: (0.08 + 0.84)/2 = 0.46 and Y: (0.12 + 0.28)/2 = 0.2.
AND products: 0.26 × 0.17 = 0.0442 for Z=0, and 0.46 × 0.2 = 0.092 for Z=1.
Root: M̂ = (2 × 1.6 × 0.0442 + 2 × 0.4 × 0.092) / 4 = 0.05376.

Sample #   Z   X   Y
1          0   1   0
2          0   2   1
3          1   1   1
4          1   2   0

Gogate and Dechter, UAI 2008, CP2008
Algorithm AND/OR Importance Sampling

1. Generate samples x1, ..., xN from Q along the ordering O.
2. Build an AND/OR sample tree for the samples x1, ..., xN along O.
3. FOR all leaf nodes i of the AND/OR tree:
   IF i is an AND node, v(i) = 1; ELSE v(i) = 0.
4. FOR every node n from the leaves to the root:
   IF n is an AND node, v(n) = the product of its children's values.
   IF n is an OR node, v(n) = the average of its children's values.
5. Return v(root).
Gogate and Dechter, UAI 2008, CP2008
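A compact sketch of steps 2–5 on the slide's four samples. The arc weights are the <weight, count> pairs from the worked example; the flat dictionary encoding of the sample tree is my own, not the talk's.

    # Samples (Z, X, Y) and per-arc weights from the worked example.
    samples = [(0, 1, 0), (0, 2, 1), (1, 1, 1), (1, 2, 0)]
    w_Z = {0: 1.6, 1: 0.4}
    w_X = {(0, 1): 0.16, (0, 2): 0.36, (1, 1): 0.08, (1, 2): 0.84}   # keyed by (Z, X)
    w_Y = {(0, 0): 0.14, (0, 1): 0.20, (1, 1): 0.12, (1, 0): 0.28}   # keyed by (Z, Y)

    est = 0.0
    for z in sorted(set(s[0] for s in samples)):
        group = [s for s in samples if s[0] == z]
        ex = sum(w_X[(z, x)] for _, x, _ in group) / len(group)  # OR node: average
        ey = sum(w_Y[(z, y)] for _, _, y in group) / len(group)  # OR node: average
        est += len(group) * w_Z[z] * ex * ey                     # AND node: product
    est /= len(samples)                                          # root OR node
    print(est)   # ~0.05376, matching the slide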
Properties of AND/OR Importance Sampling

• Unbiased estimate of the weighted counts.
• The AND/OR estimate has lower variance than the conventional importance sampling estimate.
• Variance reduction:
  – Easy to prove for the case of complete independence (Goodman, 1960)
  – Complicated to prove for the general conditional-independence case (Gogate thesis, papers!)

Gogate and Dechter, UAI 2008, CP2008
AND/OR w-cutset (Rao-Blackwellised) sampling

• Rao-Blackwellisation (Rao, 1963)
  – Partition X into K and R, such that we can compute P(R|k) efficiently.
  – Sample from K and sum out R.
  – Estimate:

    M̂_RB = (1/N) Σ_{i=1}^{N} [ Σ_R P(R|k^i) ] / Q(k^i)

    (the numerator is the weighted count conditioned on K = k^i)

  – w-cutset sampling (Bidyuk and Dechter, 2003): select K such that the treewidth of R after removing K is bounded by w.
Gogate and Dechter, Under review, CP 2009
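A sketch of this estimator; the brute-force inner sum stands in for the exact O(exp(w)) computation over R, and all the callables (`sample_K`, `pdf_Q`, `weight`) are generic stand-ins rather than the talk's code.

    from itertools import product

    def rb_estimate(N, sample_K, pdf_Q, domains_R, weight):
        """Rao-Blackwellised estimate of the weighted counts:
        sample the cutset K, sum out the remaining variables R exactly.
        weight(k, r): P(k, r) restricted to solutions (0 for non-solutions)."""
        total = 0.0
        for _ in range(N):
            k = sample_K()
            inner = sum(weight(k, r) for r in product(*domains_R))  # exact sum over R
            total += inner / pdf_Q(k)
        return total / N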
AND/OR w-cutset sampling

• Perform AND/OR tree or graph sampling on K.
• Exact inference on R.
• Orthogonal approaches:
  – Theorem: Combining AND/OR sampling and w-cutset sampling yields further variance reduction.

[Figure: a graphical model over A, B, C, D, E, F, G; its full pseudo tree; the start pseudo tree restricted to the cutset variables A, B, C; and an OR (chain) pseudo tree over the cutset variables.]

Gogate and Dechter, Under review, CP 2009
From Search Trees to Search Graphs

• Any two nodes that root identical subtrees (subgraphs) can be merged.
Merging Based on Context

• One way of recognizing nodes that can be merged:

  context(X) = the ancestors of X in the pseudo tree that are connected to X or to descendants of X

[Figure: the pseudo tree for the six-variable example with the contexts [ ] for A, [A] for B, [AB] for C and E, [BC] for D, and [AE] for F.]
AND/OR Graphs

[Figure: a four-variable problem over A, B, C, D with domains {0,1,2} and binary ≠ constraints; its AND/OR tree and the smaller AND/OR graph obtained by merging nodes with identical contexts.]

AND/OR graph sampling

[Figure: the same set of samples arranged as an OR sample tree, an AND/OR sample tree, and an AND/OR sample graph.]
Variance Hierarchy and Complexity

Scheme                        Time                            Space
IS                            O(nN)                           O(1)
AND/OR Tree IS                O(nN)                           O(h)
w-cutset IS                   O(cN + (n−c)N exp(w))           O(1)
AND/OR w-cutset Tree IS       O(cN + (n−c)N exp(w))           O(h + (n−c) exp(w))
AND/OR Graph IS               O(nN w*)                        O(nN)
AND/OR w-cutset Graph IS      O(cN w* + (n−c)N exp(w))        O(cN + (n−c) exp(w))
Experiments

• Benchmarks
  – Linkage analysis
  – Graph coloring
  – Grids
• Algorithms
  – OR tree sampling
  – AND/OR tree sampling
  – AND/OR graph sampling
  – w-cutset versions of the three schemes above
Results: Probability of Evidence
Linkage instances (UAI 2006 evaluation)
Time bound: 1 hr
Summary: AND/OR Importance sampling
• AND/OR sampling: A general scheme to exploit conditional independence in sampling
• Theoretical guarantees: lower sampling error than conventional sampling
• Variance reduction orthogonal to Rao-Blackwellised sampling.
• Better empirical performance than conventional sampling.