Sampling Techniques for Probabilistic and Deterministic Graphical models Bozhena Bidyuk Vibhav Gogate Rina Dechter

Page 1: Sampling Techniques for Probabilistic and Deterministic Graphical models Bozhena Bidyuk Vibhav Gogate Rina Dechter.

Sampling Techniques for Probabilistic and Deterministic Graphical models

Bozhena Bidyuk, Vibhav Gogate, Rina Dechter

Page 2:

Overview

1. Probabilistic Reasoning/Graphical models
2. Importance Sampling
3. Markov Chain Monte Carlo: Gibbs Sampling
4. Sampling in presence of Determinism
5. Rao-Blackwellisation
6. AND/OR importance sampling

Page 3:

Overview

1. Probabilistic Reasoning/Graphical models
2. Importance Sampling
3. Markov Chain Monte Carlo: Gibbs Sampling
4. Sampling in presence of Determinism
5. Cutset-based Variance Reduction
6. AND/OR importance sampling

Page 4:

Probabilistic Reasoning; Graphical models

• Graphical models:
  – Bayesian networks, constraint networks, mixed networks
• Queries
• Exact algorithms
  – using inference,
  – search and hybrids
• Graph parameters:
  – tree-width, cycle-cutset, w-cutset

Page 5:

Bayesian Networks (Pearl, 1988)

$P(S, C, B, X, D) = P(S)\,P(C \mid S)\,P(B \mid S)\,P(X \mid C,S)\,P(D \mid C,B)$

[Figure: Bayesian network over Smoking (S), lung Cancer (C), Bronchitis (B), X-ray (X), and Dyspnoea (D), annotated with the CPTs P(S), P(C|S), P(B|S), P(X|C,S), and P(D|C,B).]

CPT for P(D|C,B):

C  B  P(D|C,B)
0  0  0.1  0.9
0  1  0.7  0.3
1  0  0.8  0.2
1  1  0.9  0.1

CPTs: $P(X_i \mid pa(X_i))$

$P(X) = \prod_{i=1}^{n} P(X_i \mid pa(X_i))$

Belief Updating: P(lung cancer=yes | smoking=no, dyspnoea=yes) = ?

Probability of evidence: P(smoking=no, dyspnoea=yes) = ?

Page 6:

Queries

• Probability of evidence (or partition function):

$P(e) = \sum_{X \setminus var(e)} \prod_{i=1}^{n} P(x_i \mid pa_i)\big|_e \qquad Z = \sum_{X} \prod_{i} C_i(\cdot)$

• Posterior marginal (beliefs):

$P(x_i \mid e) = \frac{P(x_i, e)}{P(e)} = \frac{\sum_{X \setminus (var(e) \cup X_i)} \prod_{j=1}^{n} P(x_j \mid pa_j)\big|_e}{\sum_{X \setminus var(e)} \prod_{j=1}^{n} P(x_j \mid pa_j)\big|_e}$

• Most Probable Explanation:

$x^* = \arg\max_{x} P(x, e)$

Page 7:

Constraint Networks

Map coloring

Variables: countries (A B C etc.)
Values: colors (red green blue)
Constraints: $A \neq B$, $A \neq D$, $D \neq E$, etc.

Allowed (A, B) pairs:

A       B
red     green
red     yellow
green   red
green   yellow
yellow  green
yellow  red

[Figure: map with regions A to G and the corresponding constraint graph over nodes A, B, C, D, E, F, G.]

Task: find a solution. Count solutions, find a good one.

Page 8:

Propositional Satisfiability

φ = {(¬C), (A v B v C), (¬A v B v E), (¬B v C v D)}.

Page 9:

Mixed Networks: Mixing Belief and Constraints

Belief or Bayesian Networks

Variables: A, B, C, D, E, F
Domains: $D_A = D_B = D_C = D_D = D_E = D_F = \{0, 1\}$
CPTs: P(A), P(B|A), P(C|A), P(D|B,C), P(E|A,B), P(F|A)

Constraint Networks

Variables: A, B, C, D, E, F
Domains: $D_A = D_B = D_C = D_D = D_E = D_F = \{0, 1\}$
Constraints: $R_1(ABC)$, $R_2(ACF)$, $R_3(BCD)$, $R_4(A,E)$
Expresses the set of solutions: sol(R)

[Figure: the graph over A, B, C, D, E, F, drawn once for the belief network B and once for the constraint network R.]

P(D|B,C):

B  C  D=0  D=1
0  0  0    1
0  1  .1   .9
1  0  .3   .7
1  1  1    0

$R_3(BCD)$:
B=0, C=0, D=0 is not allowed
B=1, C=1, D=1 is not allowed

Constraints could be specified externally or may occur as zeros in the Belief network

Same queries (e.g., weighted counts):

$M = \sum_{x \in sol(R)} P_B(x)$

Page 10:

Belief Updating

[Figure: “Moral” graph over A, B, C, D, E.]

$P(a \mid e=0) \propto P(a, e=0) = \sum_{e=0,d,c,b} P(a)\,P(b|a)\,P(c|a)\,P(d|b,a)\,P(e|b,c)$

$= P(a) \sum_{e=0} \sum_{d} \sum_{c} P(c|a) \sum_{b} P(b|a)\,P(d|b,a)\,P(e|b,c)$

Variable Elimination: the innermost sum over $b$ defines the function $h^B(a,d,c,e)$.

Page 11:

Bucket Elimination
Algorithm elim-bel (Dechter 1996)

Elimination operator: $\sum_b$

bucket B: P(b|a) P(d|b,a) P(e|b,c)  →  $h^B(a,d,c,e)$
bucket C: P(c|a), $h^B(a,d,c,e)$  →  $h^C(a,d,e)$
bucket D: $h^C(a,d,e)$  →  $h^D(a,e)$
bucket E: e=0, $h^D(a,e)$  →  $h^E(a)$
bucket A: P(a), $h^E(a)$  →  P(a|e=0)

w* = 4; complexity: exp(w*)

Page 12:

Bucket Elimination

Query: $P(a \mid e=0) \propto P(a, e=0)$
Elimination Order: d, e, b, c

$P(a, e=0) = \sum_{c,b,e=0,d} P(a)\,P(b|a)\,P(c|a)\,P(d|a,b)\,P(e|b,c)$

$= P(a) \sum_{c} P(c|a) \sum_{b} P(b|a) \sum_{e=0} P(e|b,c) \sum_{d} P(d|a,b)$

Original Functions → Messages:

D: P(d|a,b)  →  $f_D(a,b) = \sum_d P(d|a,b)$
E: P(e|b,c)  →  $f_E(b,c) = P(e=0|b,c)$
B: P(b|a)  →  $f_B(a,c) = \sum_b P(b|a)\,f_D(a,b)\,f_E(b,c)$
C: P(c|a)  →  $f_C(a) = \sum_c P(c|a)\,f_B(a,c)$
A: P(a)  →  $P(a, e=0) = P(a)\,f_C(a)$

[Figure: Bucket Tree with clusters {D,A,B}, {E,B,C}, {B,A,C}, {C,A}, {A}, passing the messages $f_D(a,b)$, $f_E(b,c)$, $f_B(a,c)$, $f_C(a)$.]

Time and space exp(w*)
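The bucket schedule for this query can be checked numerically. The following is a minimal sketch, assuming binary variables and made-up CPT values (none of the numbers come from the slides); it eliminates d, e, b, c in the stated order and compares the result with brute-force summation over the joint.

```python
from itertools import product

# Hypothetical CPTs (assumed values): each entry is P(child = 1 | parents).
P_A = 0.6
P_B_GIVEN_A = {0: 0.3, 1: 0.8}
P_C_GIVEN_A = {0: 0.5, 1: 0.2}
P_D_GIVEN_A_B = {(0, 0): 0.1, (0, 1): 0.4, (1, 0): 0.7, (1, 1): 0.9}
P_E_GIVEN_B_C = {(0, 0): 0.2, (0, 1): 0.6, (1, 0): 0.5, (1, 1): 0.95}

def pr(p_one, value):
    """Probability of a binary value given P(value = 1)."""
    return p_one if value == 1 else 1.0 - p_one

def eliminate(a):
    """P(a, e=0) via buckets D, E, B, C, A (elimination order d, e, b, c)."""
    # Bucket D: f_D(a, b) = sum_d P(d | a, b)
    f_d = {b: sum(pr(P_D_GIVEN_A_B[(a, b)], d) for d in (0, 1)) for b in (0, 1)}
    # Bucket E: f_E(b, c) = P(e=0 | b, c)
    f_e = {(b, c): pr(P_E_GIVEN_B_C[(b, c)], 0) for b in (0, 1) for c in (0, 1)}
    # Bucket B: f_B(a, c) = sum_b P(b | a) f_D(a, b) f_E(b, c)
    f_b = {c: sum(pr(P_B_GIVEN_A[a], b) * f_d[b] * f_e[(b, c)] for b in (0, 1))
           for c in (0, 1)}
    # Bucket C: f_C(a) = sum_c P(c | a) f_B(a, c)
    f_c = sum(pr(P_C_GIVEN_A[a], c) * f_b[c] for c in (0, 1))
    # Bucket A: P(a, e=0) = P(a) f_C(a)
    return pr(P_A, a) * f_c

def brute_force(a):
    """P(a, e=0) by summing the full joint over b, c, d."""
    total = 0.0
    for b, c, d in product((0, 1), repeat=3):
        total += (pr(P_A, a) * pr(P_B_GIVEN_A[a], b) * pr(P_C_GIVEN_A[a], c)
                  * pr(P_D_GIVEN_A_B[(a, b)], d) * pr(P_E_GIVEN_B_C[(b, c)], 0))
    return total

ve_result = {a: eliminate(a) for a in (0, 1)}
bf_result = {a: brute_force(a) for a in (0, 1)}
```

Normalizing `ve_result` over `a` would give the posterior P(a|e=0); the sketch stops at the unnormalized quantity because that is what the buckets compute.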

Page 13:

Complexity of Elimination

$O(n \exp(w^*(d)))$, where $w^*(d)$ is the induced width of the moral graph along ordering $d$.

The effect of the ordering: $w^*(d_1) = 4$, $w^*(d_2) = 2$

[Figure: “Moral” graph over A, B, C, D, E and its induced graphs along two orderings, one listing B, C, D, E, A and the other E, D, C, B, A.]

Page 14:

Cutset-Conditioning

[Figure: a graph over nodes A, B, C, D, E, F, G, H, J, K, L, M, N, O, P, redrawn as A, then B, then C are instantiated.]

Cycle cutset = {A,B,C}

Page 15:

Search Over the Cutset

• Inference may require too much memory
• Condition on some of the variables

[Figure: graph coloring problem over nodes A, B, C, D, E, F, G, H, J, K, L, M. A search tree branches on A (A=yellow, A=green) and then on B (red, blue, green, yellow); each leaf holds the conditioned subproblem over the remaining nodes.]

Space: exp(w), where w is a user-controlled parameter
Time: exp(w+c(w))

Page 16:

Linkage Analysis

[Figure: pedigree of individuals numbered 1 to 6, with alleles A/a and B/b at two loci; entries marked ? are unobserved.]

• 6 individuals
• Haplotype: {2, 3}
• Genotype: {6}
• Unknown

Page 17:

Linkage Analysis: 6 People, 3 Markers

[Figure: Bayesian network for the linkage problem. For each marker i = 1, 2, 3 and person j = 1, ..., 6 it contains nodes Lijm, Lijf, and Xij, together with nodes of the form Sijm.]

Page 18:

Applications

• Determinism: more ubiquitous than you may think!
• Transportation Planning (Liao et al. 2004, Gogate et al. 2005)
  – Predicting and inferring car travel activity of individuals
• Genetic Linkage Analysis (Fishelson and Geiger, 2002)
  – Associating the functionality of genes with their location on chromosomes
• Functional/Software Verification (Bergeron, 2000)
  – Generating random test programs to check validity of hardware
• First Order Probabilistic models (Domingos et al. 2006, Milch et al. 2005)
  – Citation matching

Page 19:

Inference vs Conditioning-Search

Inference: exp(w*) time/space
Search: exp(n) time, O(n) space
Search+inference: Space: exp(w), Time: exp(w+c(w))

[Figure: left, a network over A, B, C, D, E, F with its full binary search tree; centre, a graph over A to M with clusters ABC, BDEF, DGF, EFH, FHK, HJ, KLM; right, a cutset search tree branching on A (yellow, green) and B (blue, red, green) with the conditioned subgraph at each leaf.]

Page 20:

Approximation

• Inference, search, and hybrids are too expensive when the graph is dense (high treewidth), so we approximate:
• Bounding inference:
  • mini-bucket and mini-clustering
  • Belief propagation
• Bounding search:
  • Sampling
• Goal: an anytime scheme

Page 22:

Overview

1. Probabilistic Reasoning/Graphical models
2. Importance Sampling
3. Markov Chain Monte Carlo: Gibbs Sampling
4. Sampling in presence of Determinism
5. Rao-Blackwellisation
6. AND/OR importance sampling

Page 23:

Outline

• Definitions and Background on Statistics
• Theory of importance sampling
• Likelihood weighting
• State-of-the-art importance sampling techniques

Page 24:

A sample

• Given a set of variables X={X1,...,Xn}, a sample, denoted by $S^t$, is an instantiation of all variables:

$S^t = (x_1^t, x_2^t, \ldots, x_n^t)$

Page 25:

How to draw a sample? Univariate distribution

• Example: Given random variable X having domain {0, 1} and a distribution P(X) = (0.3, 0.7).
• Task: Generate samples of X from P.
• How?
  – Draw random number r ∈ [0, 1]
  – If (r < 0.3) then set X=0
  – Else set X=1
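The three steps above generalize to any finite domain by walking the cumulative distribution. A minimal sketch, assuming Python's standard `random` module; the function name `sample_univariate` and the fixed seed are illustrative choices, not part of the slides.

```python
import random

def sample_univariate(probs, rng):
    """Draw one value index from a discrete distribution given as a list of probabilities."""
    r = rng.random()              # uniform random number in [0, 1)
    cumulative = 0.0
    for value, p in enumerate(probs):
        cumulative += p
        if r < cumulative:        # for P(X) = (0.3, 0.7): r < 0.3 gives X = 0
            return value
    return len(probs) - 1         # guard against floating-point round-off

rng = random.Random(0)            # fixed seed (arbitrary) so runs are repeatable
samples = [sample_univariate([0.3, 0.7], rng) for _ in range(10000)]
frac_zero = samples.count(0) / len(samples)
```

With many samples, `frac_zero` should be close to 0.3.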

Page 26:

How to draw a sample? Multi-variate distribution

• Let X={X1,..,Xn} be a set of variables
• Express the distribution in product form:

$P(X) = P(X_1)\,P(X_2 \mid X_1) \cdots P(X_n \mid X_1, \ldots, X_{n-1})$

• Sample variables one by one from left to right, along the ordering dictated by the product form.
• Bayesian network literature: Logic sampling

Page 27:

Logic sampling (example)

[Figure: network X1 → X2, X1 → X3, (X2, X3) → X4 with CPTs P(X1), P(X2|X1), P(X3|X1), P(X4|X2,X3).]

$P(X_1, X_2, X_3, X_4) = P(X_1)\,P(X_2 \mid X_1)\,P(X_3 \mid X_1)\,P(X_4 \mid X_2, X_3)$

No Evidence
// generate sample k
1. Sample $x_1$ from $P(x_1)$
2. Sample $x_2$ from $P(x_2 \mid X_1 = x_1)$
3. Sample $x_3$ from $P(x_3 \mid X_1 = x_1)$
4. Sample $x_4$ from $P(x_4 \mid X_2 = x_2, X_3 = x_3)$
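The four sampling steps for this network can be sketched as follows. The CPT values are invented for illustration (the slides give none), and all variables are assumed binary.

```python
import random

# Hypothetical CPTs (assumed values): each entry is P(variable = 1 | parents).
P_X1 = 0.6
P_X2_GIVEN_X1 = {0: 0.2, 1: 0.7}
P_X3_GIVEN_X1 = {0: 0.5, 1: 0.1}
P_X4_GIVEN_X2_X3 = {(0, 0): 0.1, (0, 1): 0.4, (1, 0): 0.6, (1, 1): 0.9}

def bernoulli(p_one, rng):
    """Return 1 with probability p_one, else 0."""
    return 1 if rng.random() < p_one else 0

def logic_sample(rng):
    """Sample X1..X4 in topological order; each draw conditions on sampled parents."""
    x1 = bernoulli(P_X1, rng)
    x2 = bernoulli(P_X2_GIVEN_X1[x1], rng)
    x3 = bernoulli(P_X3_GIVEN_X1[x1], rng)
    x4 = bernoulli(P_X4_GIVEN_X2_X3[(x2, x3)], rng)
    return (x1, x2, x3, x4)

rng = random.Random(1)
samples = [logic_sample(rng) for _ in range(20000)]
est_x2 = sum(s[1] for s in samples) / len(samples)
# Exact marginal P(X2 = 1) for comparison.
exact_x2 = (1 - P_X1) * P_X2_GIVEN_X1[0] + P_X1 * P_X2_GIVEN_X1[1]
```

Because there is no evidence, every sample is kept and any marginal is estimated by a simple frequency count.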

Page 28:

Expected value and Variance

Expected value: Given a probability distribution P(X) and a function g(X) defined over a set of variables X = {X1, X2, … Xn}, the expected value of g w.r.t. P is

$E_P[g(x)] = \sum_{x} g(x)\,P(x)$

Variance: The variance of g w.r.t. P is:

$Var_P[g(x)] = \sum_{x} \left(g(x) - E_P[g(x)]\right)^2 P(x)$

Page 29:

Monte Carlo Estimate

• Estimator:
  – An estimator is a function of the samples.
  – It produces an estimate of the unknown parameter of the sampling distribution.

Given i.i.d. samples $S^1, S^2, \ldots, S^T$ drawn from P, the Monte Carlo estimate of $E_P[g(x)]$ is given by:

$\hat{g} = \frac{1}{T} \sum_{t=1}^{T} g(S^t)$

Page 30:

Example: Monte Carlo estimate

• Given:
  – A distribution P(X) = (0.3, 0.7).
  – g(X) = 40 if X equals 0, and 50 if X equals 1.
• Estimate $E_P[g(x)] = (40 \times 0.3 + 50 \times 0.7) = 47$.
• Generate k samples from P: 0,1,1,1,0,1,1,0,1,0

$\hat{g} = \frac{40 \times \#samples(X=0) + 50 \times \#samples(X=1)}{\#samples} = \frac{40 \times 4 + 50 \times 6}{10} = 46$
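The arithmetic of this example can be reproduced directly. A minimal sketch using the ten samples listed on the slide; the helper name `monte_carlo_estimate` is an illustrative choice.

```python
def monte_carlo_estimate(g, samples):
    """Average of g over the samples: (1/T) * sum_t g(S^t)."""
    return sum(g(s) for s in samples) / len(samples)

g_table = {0: 40, 1: 50}                      # g(X) from the example
samples = [0, 1, 1, 1, 0, 1, 1, 0, 1, 0]      # the ten samples from the slide
estimate = monte_carlo_estimate(lambda x: g_table[x], samples)

# Exact expectation under P(X) = (0.3, 0.7), for comparison.
exact = 40 * 0.3 + 50 * 0.7
```

The estimate from ten samples differs from the exact expectation by 1; the gap shrinks as the number of samples grows.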

Page 31:

Outline

• Definitions and Background on Statistics
• Theory of importance sampling
• Likelihood weighting
• State-of-the-art importance sampling techniques

Page 32:

Importance sampling: Main idea

• Transform the probabilistic inference problem into the problem of computing the expected value of a random variable w.r.t. a distribution Q.
• Generate random samples from Q.
• Estimate the expected value from the generated samples.

Page 33:

Importance sampling for P(e)

Let $Z = X \setminus E$.

Let Q(Z) be a (proposal) distribution, satisfying $P(z, e) > 0 \Rightarrow Q(z) > 0$.

Then, we can rewrite P(e) as:

$P(e) = \sum_{z} P(z, e) = \sum_{z} P(z, e)\,\frac{Q(z)}{Q(z)} = E_Q\!\left[\frac{P(z, e)}{Q(z)}\right] = E_Q[w(z)]$

Monte Carlo estimate:

$\hat{P}(e) = \frac{1}{T} \sum_{t=1}^{T} w(z^t)$, where $z^t \sim Q(Z)$
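A minimal sketch of this estimator on a two-variable model Z → E with evidence E = 1. The model, its probabilities, and the uniform proposal are assumptions made for illustration only.

```python
import random

# Hypothetical model (assumed values): binary Z with prior P_Z, evidence E = 1.
P_Z = [0.8, 0.2]             # P(Z = 0), P(Z = 1)
P_E1_GIVEN_Z = [0.1, 0.9]    # P(E = 1 | Z = z)
Q = [0.5, 0.5]               # proposal; positive wherever P(z, e) > 0

def joint(z):
    """P(z, e) for the clamped evidence E = 1."""
    return P_Z[z] * P_E1_GIVEN_Z[z]

def estimate_p_e(num_samples, rng):
    """Importance sampling estimate: average of w(z) = P(z, e) / Q(z) over z ~ Q."""
    total = 0.0
    for _ in range(num_samples):
        z = 0 if rng.random() < Q[0] else 1
        total += joint(z) / Q[z]
    return total / num_samples

rng = random.Random(2)
p_e_hat = estimate_p_e(50000, rng)
p_e_exact = joint(0) + joint(1)      # sum over z, feasible only for tiny models
```

Here the exact value is available by enumeration, which makes the sketch easy to check; in realistic networks only the sampled estimate is tractable.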

Page 34:

Properties of IS estimate of P(e)

• Convergence: by law of large numbers

$\hat{P}(e) = \frac{1}{T} \sum_{i=1}^{T} w(z^i) \xrightarrow{a.s.} P(e)$ for $T \to \infty$

• Unbiased:

$E_Q[\hat{P}(e)] = P(e)$

• Variance:

$Var_Q[\hat{P}(e)] = Var_Q\!\left[\frac{1}{T} \sum_{i=1}^{T} w(z^i)\right] = \frac{Var_Q[w(z)]}{T}$

Page 35:

Properties of IS estimate of P(e)

• Mean Squared Error of the estimator:

$MSE_Q[\hat{P}(e)] = E_Q\!\left[\left(\hat{P}(e) - P(e)\right)^2\right]$

$= \left(P(e) - E_Q[\hat{P}(e)]\right)^2 + Var_Q[\hat{P}(e)]$

$= Var_Q[\hat{P}(e)]$

$= \frac{Var_Q[w(x)]}{T}$

The quantity enclosed in the brackets, $P(e) - E_Q[\hat{P}(e)]$, is zero because the estimator is unbiased: its expected value equals the quantity being estimated, P(e).

Page 36:

Estimating P(Xi|e)

Let $\delta_{x_i}(z)$ be a Dirac delta function, which is 1 if z contains $x_i$ and 0 otherwise.

$P(x_i \mid e) = \frac{P(x_i, e)}{P(e)} = \frac{\sum_{z} \delta_{x_i}(z)\,P(z, e)}{\sum_{z} P(z, e)} = \frac{E_Q\!\left[\delta_{x_i}(z)\,P(z, e) / Q(z)\right]}{E_Q\!\left[P(z, e) / Q(z)\right]}$

Idea: Estimate numerator and denominator by IS.

Ratio estimate:

$\hat{P}(x_i \mid e) = \frac{\hat{P}(x_i, e)}{\hat{P}(e)} = \frac{\sum_{k=1}^{T} \delta_{x_i}(z^k)\,w(z^k, e)}{\sum_{k=1}^{T} w(z^k, e)}$

Estimate is biased: $E[\hat{P}(x_i \mid e)] \neq P(x_i \mid e)$
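The ratio estimate can be sketched on a small model Z → E with evidence E = 1, where the query variable is Z itself. All probabilities and the uniform proposal are assumed values for illustration.

```python
import random

# Hypothetical model (assumed values): binary Z, evidence E = 1.
P_Z = [0.8, 0.2]
P_E1_GIVEN_Z = [0.1, 0.9]
Q = [0.5, 0.5]               # proposal distribution over Z

def weight(z):
    """w(z, e) = P(z, e) / Q(z)."""
    return P_Z[z] * P_E1_GIVEN_Z[z] / Q[z]

def ratio_estimate(target, num_samples, rng):
    """Estimate P(Z = target | e) as (sum of weights where z == target) / (sum of all weights)."""
    numerator = 0.0
    denominator = 0.0
    for _ in range(num_samples):
        z = 0 if rng.random() < Q[0] else 1
        w = weight(z)
        denominator += w
        if z == target:          # the delta function picks out matching samples
            numerator += w
    return numerator / denominator

rng = random.Random(3)
posterior_hat = ratio_estimate(1, 50000, rng)
p_e = P_Z[0] * P_E1_GIVEN_Z[0] + P_Z[1] * P_E1_GIVEN_Z[1]
posterior_exact = P_Z[1] * P_E1_GIVEN_Z[1] / p_e
```

The same samples and weights feed both the numerator and the denominator, which is why the estimator is biased for finite T yet still converges.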

Page 37:

Properties of the IS estimator for P(Xi|e)

• Convergence: By Weak law of large numbers

$\hat{P}(x_i \mid e) \to P(x_i \mid e)$ as $T \to \infty$

• Asymptotically unbiased

$\lim_{T \to \infty} E_P[\hat{P}(x_i \mid e)] = P(x_i \mid e)$

• Variance
  – Harder to analyze
  – Liu suggests a measure called “Effective sample size”

Page 38:

Effective Sample size

Given samples from P(z|e), we can estimate $P(x_i \mid e)$ using the ideal estimator:

$P(x_i \mid e) = \sum_{z} g_{x_i}(z)\,P(z \mid e)$, $\quad \hat{P}_P(x_i \mid e) = \frac{1}{T} \sum_{t=1}^{T} g_{x_i}(z^t)$

Define:

$ESS(Q, T) = \frac{T}{1 + var_Q[w(z)]}$

ESS measures how much the estimator deviates from the ideal one:

$\frac{Var_P[\hat{P}(x_i \mid e)]}{Var_Q[\hat{P}(x_i \mid e)]} \approx \frac{ESS(Q, T)}{T}$

Thus T samples from Q are worth ESS(Q,T) samples from P.

Therefore, the variance of the weights must be as small as possible.
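A minimal sketch of the ESS computation from a list of importance weights. It assumes the weights are first rescaled to have mean 1 (the usual convention when the normalizing constant is unknown) and uses the population variance; both are assumptions of this sketch rather than details stated on the slide.

```python
def effective_sample_size(weights):
    """ESS = T / (1 + var(w)), with weights rescaled so their mean is 1."""
    t = len(weights)
    mean_w = sum(weights) / t
    normalized = [w / mean_w for w in weights]
    var_w = sum((w - 1.0) ** 2 for w in normalized) / t   # population variance
    return t / (1.0 + var_w)

# Equal weights: the proposal matches the target, so no samples are "wasted".
ess_equal = effective_sample_size([0.2] * 100)

# One dominant weight: the 100 samples are worth roughly a single one.
ess_skewed = effective_sample_size([1.0] + [0.0] * 99)
```

The two calls illustrate the extremes: ESS equals T when all weights agree and drops towards 1 when a single sample carries all the weight.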

Page 39:

Outline

• Definitions and Background on Statistics
• Theory of importance sampling
• Likelihood weighting
• State-of-the-art importance sampling techniques

Page 40:

Likelihood Weighting: Proposal Distribution

$Q(X \setminus E) = \prod_{X_i \in X \setminus E} P(X_i \mid pa_i, e)$

Example: Given a Bayesian network $P(X_1, X_2, X_3) = P(X_1)\,P(X_2 \mid X_1)\,P(X_3 \mid X_1, X_2)$ and evidence $X_2 = x_2$:

$Q(X_1, X_3) = P(X_1)\,P(X_3 \mid X_1, X_2 = x_2)$

Weights: given a sample $x = (x_1, \ldots, x_n)$,

$w = \frac{P(x, e)}{Q(x)} = \frac{\prod_{X_i \in X \setminus E} P(x_i \mid pa_i, e) \prod_{E_j \in E} P(e_j \mid pa_j)}{\prod_{X_i \in X \setminus E} P(x_i \mid pa_i, e)} = \prod_{E_j \in E} P(e_j \mid pa_j)$

Page 41:

Likelihood Weighting: Sampling

[Figure: network with the evidence nodes marked e.]

Sample in topological order over X!

Clamp evidence; sample $x_i \leftarrow P(X_i \mid pa_i)$. $P(X_i \mid pa_i)$ is a look-up in the CPT!
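Likelihood weighting on the three-variable example network P(X1) P(X2|X1) P(X3|X1,X2) with evidence on X2 can be sketched as below. The CPT values and the choice X2 = 1 are assumptions for illustration; variables are binary.

```python
import random

# Hypothetical CPTs (assumed values): each entry is P(variable = 1 | parents).
P_X1 = 0.3
P_X2_GIVEN_X1 = {0: 0.2, 1: 0.9}
P_X3_GIVEN_X1_X2 = {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.4, (1, 1): 0.8}
EVIDENCE_X2 = 1

def likelihood_weighting(num_samples, rng):
    """Return (estimate of P(e), estimate of P(X3 = 1 | e))."""
    total_weight = 0.0
    weight_x3_is_1 = 0.0
    for _ in range(num_samples):
        # Topological order: X1, X2 (clamped), X3.
        x1 = 1 if rng.random() < P_X1 else 0
        x2 = EVIDENCE_X2
        p_one = P_X2_GIVEN_X1[x1]
        w = p_one if x2 == 1 else 1.0 - p_one    # weight = P(e_j | pa_j), a CPT look-up
        x3 = 1 if rng.random() < P_X3_GIVEN_X1_X2[(x1, x2)] else 0
        total_weight += w
        if x3 == 1:
            weight_x3_is_1 += w
    return total_weight / num_samples, weight_x3_is_1 / total_weight

rng = random.Random(4)
p_e_hat, p_x3_hat = likelihood_weighting(50000, rng)

# Exact values by enumeration, for comparison.
p_e_exact = (1 - P_X1) * P_X2_GIVEN_X1[0] + P_X1 * P_X2_GIVEN_X1[1]
p_x3_exact = ((1 - P_X1) * P_X2_GIVEN_X1[0] * P_X3_GIVEN_X1_X2[(0, 1)]
              + P_X1 * P_X2_GIVEN_X1[1] * P_X3_GIVEN_X1_X2[(1, 1)]) / p_e_exact
```

Only the evidence variable contributes to the weight, because the sampled variables are drawn from exactly the CPT factors that would otherwise appear in both numerator and denominator.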

Page 42:

Outline

• Definitions and Background on Statistics
• Theory of importance sampling
• Likelihood weighting
• State-of-the-art importance sampling techniques

Page 43:

Proposal selection

• One should try to select a proposal that is as close as possible to the posterior distribution.

$Var_Q[\hat{P}(e)] = \frac{Var_Q[w(z)]}{T} = \frac{1}{T} \sum_{z \in Z} \left(\frac{P(z, e)}{Q(z)} - P(e)\right)^2 Q(z)$

To have a zero-variance estimator:

$\frac{P(z, e)}{Q(z)} - P(e) = 0 \;\Rightarrow\; Q(z) = \frac{P(z, e)}{P(e)} = P(z \mid e)$

Page 44:

Proposal Distributions used in Literature

• AIS-BN (Adaptive proposal)
  • Cheng and Druzdzel, 2000
• Iterative Belief Propagation
  • Yuan and Druzdzel, 2003
• Iterative Join Graph Propagation (IJGP) and variable ordering
  • Gogate and Dechter, 2005

Page 45:

Perfect sampling using Bucket Elimination

• Algorithm:
  – Run Bucket elimination on the problem along an ordering o=(XN,..,X1).
  – Sample along the reverse ordering: (X1,..,XN)
  – At each variable Xi, recover the probability P(Xi|x1,...,xi-1) by referring to the bucket.

Page 46:

Bucket elimination (BE)
Algorithm elim-bel (Dechter 1996)

Elimination operator: $\sum_b$

bucket B: P(B|A) P(D|B,A) P(e|B,C)  →  $h^B(A,D,C,e)$
bucket C: P(C|A), $h^B(A,D,C,e)$  →  $h^C(A,D,e)$
bucket D: $h^C(A,D,e)$  →  $h^D(A,e)$
bucket E: $h^D(A,e)$  →  $h^E(a)$
bucket A: P(a), $h^E(a)$  →  P(e)

Page 47:

Sampling from the output of BE (Dechter 2002)

bucket B: P(B|A) P(D|B,A) P(e|B,C)
bucket C: P(C|A), $h^B(A,D,C,e)$
bucket D: $h^C(A,D,e)$
bucket E: $h^D(A,e)$
bucket A: P(A), $h^E(A)$

Sampling proceeds from bucket A back to bucket B:

bucket A: $Q(A) \propto P(A)\,h^E(A)$. Sample: $A = a \leftarrow Q(A)$
bucket E: evidence bucket, ignore
bucket D: set A=a in the bucket. Sample: $D = d \leftarrow Q(D \mid a, e) \propto h^C(a, D, e)$
bucket C: set A=a, D=d in the bucket. Sample: $C = c \leftarrow Q(C \mid a, e, d) \propto P(C \mid a)\,h^B(a, d, C, e)$
bucket B: set A=a, D=d, C=c in the bucket. Sample: $B = b \leftarrow Q(B \mid a, e, d, c) \propto P(B \mid a)\,P(d \mid B, a)\,P(e \mid B, c)$

Page 48:

Mini-buckets: “local inference”

• Computation in a bucket is time and space exponential in the number of variables involved
• Therefore, partition the functions in a bucket into “mini-buckets” over a smaller number of variables
• Can control the size of each “mini-bucket”, yielding polynomial complexity.

Page 49:

Mini-Bucket Elimination

Space and Time constraints: Maximum scope size of the new function generated should be bounded by 2.

BE generates a function having scope size 3. So it cannot be used.

bucket B, split into two mini-buckets, each with its own $\sum_B$:
  mini-bucket 1: P(e|B,C)  →  $h^B(C,e)$
  mini-bucket 2: P(B|A) P(D|B,A)  →  $h^B(A,D)$
bucket C: P(C|A), $h^B(C,e)$  →  $h^C(A,e)$
bucket D: $h^B(A,D)$  →  $h^D(A)$
bucket E: $h^C(A,e)$  →  $h^E(A)$
bucket A: P(A), $h^E(A)$, $h^D(A)$  →  Approximation of P(e)

Page 50:

Sampling from the output of MBE

bucket B: mini-buckets {P(e|B,C)} and {P(B|A) P(D|B,A)}
bucket C: P(C|A), $h^B(C,e)$
bucket D: $h^B(A,D)$
bucket E: $h^C(A,e)$
bucket A: $h^E(A)$, $h^D(A)$

Sampling is the same as in BE-sampling, except that now we construct Q from a randomly selected “mini-bucket”.

Page 51:

IJGP-Sampling (Gogate and Dechter, 2005)

• Iterative Join Graph Propagation (IJGP)
  – A Generalized Belief Propagation scheme (Yedidia et al., 2002)
• IJGP yields better approximations of P(X|E) than MBE
  – (Dechter, Kask and Mateescu, 2002)
• Output of IJGP is same as mini-bucket “clusters”
• Currently the best performing IS scheme!

Page 52:

Adaptive Importance Sampling

Initial Proposal: $Q^1(Z) = Q(Z_1)\,Q(Z_2 \mid pa(Z_2)) \cdots Q(Z_n \mid pa(Z_n))$
$\hat{P}(E=e) = 0$
For i = 1 to k do
    Generate samples $z^1, \ldots, z^N$ from $Q^k$
    $\hat{P}(E=e) = \hat{P}(E=e) + \frac{1}{N} \sum_{j=1}^{N} w_k(z^j)$
    Update $Q^{k+1}$ from $Q^k$ and $Q'$, with a step that depends on k
End
Return $\hat{P}(E=e) / k$

Page 53:

Adaptive Importance Sampling

• General case
• Given k proposal distributions
• Take N samples out of each distribution
• Approximate P(e):

$\hat{P}(e) = \frac{1}{k} \sum_{j=1}^{k} \text{AvgWeight}(j\text{-th proposal})$

Page 54:

Estimating Q'(z)

$Q'(Z) = Q'(Z_1)\,Q'(Z_2 \mid pa(Z_2)) \cdots Q'(Z_n \mid pa(Z_n))$

where each $Q'(Z_i \mid Z_1, \ldots, Z_{i-1})$ is estimated by importance sampling

Page 55:

Overview

1. Probabilistic Reasoning/Graphical models
2. Importance Sampling
3. Markov Chain Monte Carlo: Gibbs Sampling
4. Sampling in presence of Determinism
5. Rao-Blackwellisation
6. AND/OR importance sampling

Page 56:

Markov Chain

• A Markov chain is a discrete random process with the property that the next state depends only on the current state (Markov Property):

$P(x^t \mid x^1, x^2, \ldots, x^{t-1}) = P(x^t \mid x^{t-1})$

[Figure: chain x1 → x2 → x3 → x4.]

• If $P(X^t \mid x^{t-1})$ does not depend on t (time homogeneous) and state space is finite, then it is often expressed as a transition function (aka transition matrix), with $\sum_{x} P(X = x) = 1$

Page 57:

Example: Drunkard’s Walk

• A random walk on the number line where, at each step, the position may change by +1 or −1 with equal probability

$D(X) = \{0, 1, 2, \ldots\}$

Transition matrix P(X):

state  P(n−1)  P(n+1)
n      0.5     0.5

[Figure: number line with states 1, 2, 3.]

Page 58:

Example: Weather Model

$D(X) = \{rainy, sunny\}$

Transition matrix P(X):

        P(rainy)  P(sunny)
rainy   0.9       0.1
sunny   0.5       0.5

[Figure: example state sequence rain, rain, rain, rain, sun.]

Page 59:

Multi-Variable System

$X = \{X_1, X_2, X_3\}$, $D(X_i)$ discrete, finite

• state is an assignment of values to all the variables:

$x^t = \{x_1^t, x_2^t, \ldots, x_n^t\}$

[Figure: state $(x_1^t, x_2^t, x_3^t)$ transitioning to $(x_1^{t+1}, x_2^{t+1}, x_3^{t+1})$.]

Page 60:

Bayesian Network System

• Bayesian Network is a representation of the joint probability distribution over 2 or more variables

$X = \{X_1, X_2, X_3\}$, $\quad x^t = \{x_1^t, x_2^t, x_3^t\}$

[Figure: Bayesian network over X1, X2, X3; the state $(X_1^t, X_2^t, X_3^t)$ transitions to $(x_1^{t+1}, x_2^{t+1}, x_3^{t+1})$.]

Page 61:

Stationary Distribution Existence

• If the Markov chain is time-homogeneous, then the vector π(X) is a stationary distribution (aka invariant or equilibrium distribution, aka “fixed point”), if its entries sum up to 1 and satisfy:

$\pi(x_j) = \sum_{x_i \in D(X)} \pi(x_i)\,P(x_j \mid x_i)$

• Finite state space Markov chain has a unique stationary distribution if and only if:
  – The chain is irreducible
  – All of its states are positive recurrent
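The fixed-point condition can be checked numerically by repeatedly applying the transition matrix to a starting distribution. A minimal sketch, assuming the two-state weather matrix is read as rows (rainy, sunny) = (0.9, 0.1) and (0.5, 0.5); that reading of the garbled slide is an assumption.

```python
def step(dist, transition):
    """One application of the chain: new[j] = sum_i dist[i] * P(j | i)."""
    n = len(dist)
    return [sum(dist[i] * transition[i][j] for i in range(n)) for j in range(n)]

def stationary_distribution(transition, iterations=200):
    """Approximate the stationary distribution by power iteration from a uniform start."""
    n = len(transition)
    dist = [1.0 / n] * n
    for _ in range(iterations):
        dist = step(dist, transition)
    return dist

# Rows: current state (rainy, sunny); columns: next state (rainy, sunny).
WEATHER = [[0.9, 0.1],
           [0.5, 0.5]]

pi = stationary_distribution(WEATHER)
pi_after_step = step(pi, WEATHER)    # should equal pi up to round-off
```

For this matrix the fixed point can also be solved by hand: 0.1 π(rainy) = 0.5 π(sunny), so π = (5/6, 1/6).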

Page 62:

Irreducible

• A state x is irreducible if under the transition rule one has nonzero probability of moving from x to any other state and then coming back in a finite number of steps
• If one state is irreducible, then all the states must be irreducible

(Liu, Ch. 12, p. 249, Def. 12.1.1)

Page 63:

Recurrent

• A state x is recurrent if the chain returns to x with probability 1
• Let M(x) be the expected number of steps to return to state x
• State x is positive recurrent if M(x) is finite

The recurrent states in a finite state chain are positive recurrent.

Page 64:

Stationary Distribution Convergence

• Consider infinite Markov chain: $P^{(n)} = P(x^n \mid x^0) = P^0 P^n$
• Initial state is not important in the limit
  “The most useful feature of a “good” Markov chain is its fast forgetfulness of its past…” (Liu, Ch. 12.1)
• If the chain is both irreducible and aperiodic, then:

$\pi = \lim_{n \to \infty} P^{(n)}$

Page 65:

Aperiodic

• Define d(i) = g.c.d.{n > 0 | it is possible to go from i to i in n steps}. Here, g.c.d. means the greatest common divisor of the integers in the set. If d(i)=1 for all i, then the chain is aperiodic
• Positive recurrent, aperiodic states are ergodic

Page 66:

Markov Chain Monte Carlo

• How do we estimate P(X), e.g., P(X|e)?
• Generate samples that form a Markov chain with stationary distribution π = P(X|e)
• Estimate π from samples (observed states): visited states $x^0, \ldots, x^n$ can be viewed as “samples” from distribution π

$\hat{\pi}_T(x) = \frac{1}{T} \sum_{t=1}^{T} \delta(x, x^t)$, $\quad \pi(x) = \lim_{T \to \infty} \hat{\pi}_T(x)$

Page 67:

MCMC Summary

• Convergence is guaranteed in the limit
• Samples are dependent, not i.i.d.
• Convergence (mixing rate) may be slow
• The stronger the correlation between states, the slower the convergence!
• Initial state is not important, but… typically, we throw away first K samples - “burn-in”

Page 68:

Gibbs Sampling (Geman & Geman, 1984)

• Gibbs sampler is an algorithm to generate a sequence of samples from the joint probability distribution of two or more random variables
• Sample new variable value one variable at a time from the variable’s conditional distribution:

$P(X_i) = P(X_i \mid x_1^t, \ldots, x_{i-1}^t, x_{i+1}^t, \ldots, x_n^t) = P(X_i \mid x^t \setminus x_i)$

• Samples form a Markov chain with stationary distribution P(X|e)

Page 69:

Gibbs Sampling: Illustration

The process of Gibbs sampling can be understood as a random walk in the space of all instantiations of X=x (remember drunkard’s walk):

In one step we can reach instantiations that differ from the current one by the value assignment to at most one variable (assume randomized choice of variables Xi).

Page 70:

Ordered Gibbs Sampler

Generate sample $x^{t+1}$ from $x^t$ (process all variables in some order):

$X_1 = x_1^{t+1} \leftarrow P(X_1 \mid x_2^t, x_3^t, \ldots, x_N^t, e)$
$X_2 = x_2^{t+1} \leftarrow P(X_2 \mid x_1^{t+1}, x_3^t, \ldots, x_N^t, e)$
...
$X_N = x_N^{t+1} \leftarrow P(X_N \mid x_1^{t+1}, x_2^{t+1}, \ldots, x_{N-1}^{t+1}, e)$

In short, for i=1 to N:

$X_i = x_i^{t+1}$ sampled from $P(X_i \mid x^t \setminus x_i, e)$

Page 71:

Transition Probabilities in BN

Given Markov blanket (parents, children, and their parents), Xi is independent of all other nodes

Markov blanket:

$markov_i = pa_i \cup ch_i \cup \left(\bigcup_{X_j \in ch_i} pa_j\right)$

$P(X_i \mid x^t \setminus x_i) = P(X_i \mid markov_i^t)$:

$P(x_i \mid x^t \setminus x_i) \propto P(x_i \mid pa_i) \prod_{X_j \in ch_i} P(x_j \mid pa_j)$

Computation is linear in the size of Markov blanket!

Page 72:

Ordered Gibbs Sampling Algorithm (Pearl, 1988)

Input: X, E=e
Output: T samples {$x^t$}
Fix evidence E=e, initialize $x^0$ at random
1. For t = 1 to T (compute samples)
2.     For i = 1 to N (loop through variables)
3.         $x_i^{t+1} \leftarrow P(X_i \mid markov_i^t)$
4.     End For
5. End For
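The algorithm can be sketched on a small network X1 → X2, X1 → X3 with evidence X3 = 1, so that only X1 and X2 are resampled. The network, its CPT values, and the burn-in length are assumptions for illustration; each conditional uses only the Markov blanket factors.

```python
import random

# Hypothetical CPTs (assumed values): each entry is P(variable = 1 | parents).
P_X1 = 0.3
P_X2_GIVEN_X1 = {0: 0.2, 1: 0.9}
P_X3_GIVEN_X1 = {0: 0.1, 1: 0.7}
X3_EVIDENCE = 1

def pr(p_one, value):
    """Probability of a binary value given P(value = 1)."""
    return p_one if value == 1 else 1.0 - p_one

def conditional_x1(x2, x3):
    """P(X1 = 1 | x2, x3): prior times the CPT entries of both children, normalized."""
    score = [pr(P_X1, x1) * pr(P_X2_GIVEN_X1[x1], x2) * pr(P_X3_GIVEN_X1[x1], x3)
             for x1 in (0, 1)]
    return score[1] / (score[0] + score[1])

def conditional_x2(x1):
    """P(X2 = 1 | x1): X2 has no children, so its blanket is just its parent."""
    return P_X2_GIVEN_X1[x1]

def ordered_gibbs(num_samples, burn_in, rng):
    """Resample X1 then X2 in a fixed order; discard the first burn_in sweeps."""
    x1, x2 = rng.randint(0, 1), rng.randint(0, 1)   # random initial state
    samples = []
    for t in range(burn_in + num_samples):
        x1 = 1 if rng.random() < conditional_x1(x2, X3_EVIDENCE) else 0
        x2 = 1 if rng.random() < conditional_x2(x1) else 0
        if t >= burn_in:
            samples.append((x1, x2))
    return samples

rng = random.Random(5)
samples = ordered_gibbs(40000, 1000, rng)
p_x1_hat = sum(s[0] for s in samples) / len(samples)

# Exact P(X1 = 1 | X3 = 1) by enumeration, for comparison.
p_x1_exact = (P_X1 * P_X3_GIVEN_X1[1]) / (
    P_X1 * P_X3_GIVEN_X1[1] + (1 - P_X1) * P_X3_GIVEN_X1[0])
```

All CPT entries here are strictly positive, so the chain can reach every state and the estimate converges to the exact posterior.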

Page 73:

Gibbs Sampling Example - BN

$X = \{X_1, X_2, \ldots, X_9\}$, $E = \{X_9\}$

[Figure: Bayesian network over X1, ..., X9.]

Initial assignment:

$X_1 = x_1^0$, $X_2 = x_2^0$, $X_3 = x_3^0$, $X_4 = x_4^0$
$X_5 = x_5^0$, $X_6 = x_6^0$, $X_7 = x_7^0$, $X_8 = x_8^0$

Page 74:

Gibbs Sampling Example - BN

$X = \{X_1, X_2, \ldots, X_9\}$, $E = \{X_9\}$

[Figure: Bayesian network over X1, ..., X9.]

$x_1^1 \leftarrow P(X_1 \mid x_2^0, \ldots, x_8^0, x_9)$

$x_2^1 \leftarrow P(X_2 \mid x_1^1, \ldots, x_8^0, x_9)$

Page 75:

Answering Queries P(xi|e) = ?

• Method 1: count # of samples where Xi = xi (histogram estimator):

$\hat{P}(X_i = x_i) = \frac{1}{T} \sum_{t=1}^{T} \delta(x_i, x_i^t)$, where δ is the Dirac delta function

• Method 2: average probability (mixture estimator):

$\hat{P}(X_i = x_i) = \frac{1}{T} \sum_{t=1}^{T} P(X_i = x_i \mid markov_i^t)$

• Mixture estimator converges faster (consider estimates for the unobserved values of Xi; prove via Rao-Blackwell theorem)
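The two estimators differ only in what is averaged over the visited states. A minimal sketch on a hand-written trace of four states (x1, x2), with the query P(X1 = 1 | e); the trace and the conditional probabilities P(X1 = 1 | x2, e) are assumed values, not output of a real run.

```python
# Hypothetical values of P(X1 = 1 | markov blanket) for each blanket state x2.
P_X1_GIVEN_BLANKET = {0: 0.25, 1: 0.9}

# A made-up trace of Gibbs states (x1, x2).
trace = [(1, 1), (1, 1), (0, 0), (1, 0)]

def histogram_estimate(states, value):
    """Fraction of visited states in which X1 takes the queried value."""
    return sum(1 for x1, _ in states if x1 == value) / len(states)

def mixture_estimate(states):
    """Average of P(X1 = 1 | markov blanket) over the visited states."""
    return sum(P_X1_GIVEN_BLANKET[x2] for _, x2 in states) / len(states)

hist = histogram_estimate(trace, 1)
mix = mixture_estimate(trace)
```

The histogram estimator only sees the sampled value of X1, while the mixture estimator uses the full conditional at every step, including the probability mass of the value that was not sampled.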

Page 76:

Rao-Blackwell Theorem

Rao-Blackwell Theorem: Let random variable set X be composed of two groups of variables, R and L. Then, for the joint distribution π(R,L) and function g, the following result applies:

$Var[E\{g(R) \mid L\}] \leq Var[g(R)]$

for a function of interest g, e.g., the mean or covariance (Casella & Robert, 1996; Liu et al., 1995).

• The theorem makes a weak promise, but works well in practice!
• The improvement depends on the choice of R and L

Page 77:

Importance vs. Gibbs

Gibbs:

$x^t \leftarrow \hat{P}(X \mid e)$, $\quad \hat{P}(X \mid e) \xrightarrow{T \to \infty} P(X \mid e)$

$\hat{g}(X) = \frac{1}{T} \sum_{t=1}^{T} g(x^t)$

Importance:

$X^t \leftarrow Q(X \mid e)$

$\hat{g} = \frac{1}{T} \sum_{t=1}^{T} \frac{g(x^t)\,P(x^t)}{Q(x^t)}$, with weight $w^t = \frac{P(x^t)}{Q(x^t)}$

Page 78: Sampling Techniques for Probabilistic and Deterministic Graphical models Bozhena Bidyuk Vibhav Gogate Rina Dechter.

Gibbs Sampling: Convergence

• Sample from $\hat{P}(X \mid e)$, which converges to $P(X \mid e)$
• Converges iff chain is irreducible and ergodic
• Intuition: must be able to explore all states:
  – if Xi and Xj are strongly correlated, Xi=0 ⇔ Xj=0, then we cannot explore states with Xi=1 and Xj=1
• All conditions are satisfied when all probabilities are positive
• Convergence rate can be characterized by the second eigenvalue of the transition matrix

Page 79: Sampling Techniques for Probabilistic and Deterministic Graphical models Bozhena Bidyuk Vibhav Gogate Rina Dechter.

Gibbs: Speeding Convergence

Reduce dependence between samples (autocorrelation)
• Skip samples
• Randomize Variable Sampling Order
• Employ blocking (grouping)
• Multiple chains

Reduce variance (covered in the next section)

Page 80: Sampling Techniques for Probabilistic and Deterministic Graphical models Bozhena Bidyuk Vibhav Gogate Rina Dechter.

Blocking Gibbs Sampler
• Sample several variables together, as a block
• Example: Given three variables X,Y,Z, with domains of size 2, group Y and Z together to form a variable W={Y,Z} with domain size 4. Then, given sample (x^t,y^t,z^t), compute next sample:

$x^{t+1} \leftarrow P(X \mid y^t, z^t) = P(X \mid w^t)$

$(y^{t+1}, z^{t+1}) = w^{t+1} \leftarrow P(Y, Z \mid x^{t+1})$

+ Can improve convergence greatly when two variables are strongly correlated!

- Domain of the block variable grows exponentially with the number of variables in a block!

Page 81: Sampling Techniques for Probabilistic and Deterministic Graphical models Bozhena Bidyuk Vibhav Gogate Rina Dechter.

Gibbs: Multiple Chains
• Generate M chains of size K
• Each chain produces an independent estimate P_m:

$\hat{P}_m(x_i \mid e) = \frac{1}{K} \sum_{t=1}^{K} P(x_i \mid x^t \setminus x_i)$

Treat P_m as independent random variables.

• Estimate P(xi|e) as the average of P_m(xi|e):

$\hat{P} = \frac{1}{M} \sum_{m=1}^{M} \hat{P}_m$

Page 82: Sampling Techniques for Probabilistic and Deterministic Graphical models Bozhena Bidyuk Vibhav Gogate Rina Dechter.

Gibbs Sampling Summary
• Markov Chain Monte Carlo method (Gelfand and Smith, 1990; Smith and Roberts, 1993; Tierney, 1994)
• Samples are dependent, form a Markov Chain
• Sample from $\hat{P}(X \mid e)$, which converges to $P(X \mid e)$
• Guaranteed to converge when all P > 0
• Methods to improve convergence:
  – Blocking
  – Rao-Blackwellised

Page 83: Sampling Techniques for Probabilistic and Deterministic Graphical models Bozhena Bidyuk Vibhav Gogate Rina Dechter.

Overview

1. Probabilistic Reasoning/Graphical models
2. Importance Sampling
3. Markov Chain Monte Carlo: Gibbs Sampling
4. Sampling in presence of Determinism
5. Rao-Blackwellisation
6. AND/OR importance sampling

Page 84: Sampling Techniques for Probabilistic and Deterministic Graphical models Bozhena Bidyuk Vibhav Gogate Rina Dechter.

Outline

• Rejection problem
• Backtrack-free distribution
  – Constructing it in practice
• SampleSearch
  – Construct the backtrack-free distribution on the fly.
• Approximate estimators
• Experiments

Page 85: Sampling Techniques for Probabilistic and Deterministic Graphical models Bozhena Bidyuk Vibhav Gogate Rina Dechter.

Outline

• Rejection problem
• Backtrack-free distribution
  – Constructing it in practice
• SampleSearch
  – Construct the backtrack-free distribution on the fly.
• Approximate estimators
• Experiments

Page 86: Sampling Techniques for Probabilistic and Deterministic Graphical models Bozhena Bidyuk Vibhav Gogate Rina Dechter.

Rejection problem

• Importance sampling requirement:
  – P(z,e) > 0 ⇒ Q(z) > 0
• When P(z,e)=0 but Q(z) > 0, the weight of the sample is zero and it is rejected.
• The probability of generating a rejected sample should be very small.
  – Otherwise the estimate will be zero.

$\hat{P}(e) = \frac{1}{N} \sum_{i=1}^{N} \frac{P(z^i, e)}{Q(z^i)}$

Page 87: Sampling Techniques for Probabilistic and Deterministic Graphical models Bozhena Bidyuk Vibhav Gogate Rina Dechter.

Rejection Problem

All blue leaves correspond to solutions, i.e. g(x) > 0. All red leaves correspond to non-solutions, i.e. g(x) = 0.

[Figure: search tree over A, B, C under a positive proposal Q, with branch probabilities 0.8/0.2 at the root, 0.4/0.6 and 0.2/0.8 at the B level, and 0.2/0.8 at the C level]

Importance sampling: $\hat{P}(e) = \frac{1}{N} \sum_{i=1}^{N} \frac{P(z^i, e)}{Q(z^i)}$, where $P(z^i, e) = 0$ if $z^i$ is not a solution.

Constraints: A≠B, A≠C

If z violates either A≠B or A≠C then P(z,e)=0

Q: Positive
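To see how severe rejection can be even on this three-variable example, the sketch below enumerates all assignments and adds up the proposal mass that falls on non-solutions. The proposal tables are an assumed reading of the branch labels in the figure (with Q(C) taken to be the same under every branch), so the numbers are illustrative only.

```python
from itertools import product

# Assumed proposal Q(A) Q(B|A) Q(C); an illustrative reading of the tree labels.
Q_A = {0: 0.8, 1: 0.2}
Q_B = {0: {0: 0.4, 1: 0.6}, 1: {0: 0.2, 1: 0.8}}   # Q_B[a][b]
Q_C = {0: 0.2, 1: 0.8}


def is_solution(a, b, c):
    """Constraints from the slide: A != B and A != C."""
    return a != b and a != c


def rejection_probability():
    """Total proposal mass placed on assignments with P(z, e) = 0."""
    rejected = 0.0
    for a, b, c in product((0, 1), repeat=3):
        if not is_solution(a, b, c):
            rejected += Q_A[a] * Q_B[a][b] * Q_C[c]
    return rejected
```

Under these assumed tables only (A=0,B=1,C=1) and (A=1,B=0,C=0) are solutions, with proposal mass 0.384 + 0.008 = 0.392, so roughly 61% of the samples would be rejected.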

Page 88: Sampling Techniques for Probabilistic and Deterministic Graphical models Bozhena Bidyuk Vibhav Gogate Rina Dechter.

Outline

• Rejection problem
• Backtrack-free distribution
  – Constructing it in practice
• SampleSearch
  – Construct the backtrack-free distribution on the fly.
• Approximate estimators
• Experiments

Page 89: Sampling Techniques for Probabilistic and Deterministic Graphical models Bozhena Bidyuk Vibhav Gogate Rina Dechter.

Backtrack-free distribution: A rejection-free distribution

All blue leaves correspond to solutions, i.e. g(x) > 0. All red leaves correspond to non-solutions, i.e. g(x) = 0.

[Figure: the same search tree over A, B, C labelled with the backtrack-free distribution Q_F; every branch with no solution below it has probability 0]

$Q_F(branch) = 0$ if no solutions under it

$Q_F(branch) \propto Q(branch)$ otherwise

Constraints: A≠B, A≠C

Page 90: Sampling Techniques for Probabilistic and Deterministic Graphical models Bozhena Bidyuk Vibhav Gogate Rina Dechter.

Generating samples from QF

[Figure: sampling the path A=0, B=1, C=1 from the root; at each branch the oracle is asked whether a solution exists below it, and branches without solutions get probability 0 while the remaining branch is renormalized to 1]

$Q_F(branch) = 0$ if no solutions under it

$Q_F(branch) \propto Q(branch)$ otherwise

• Invoke an oracle at each branch.
• Oracle returns
  – True if there is a solution under a branch
  – False, otherwise

Constraints: A≠B, A≠C

Page 91: Sampling Techniques for Probabilistic and Deterministic Graphical models Bozhena Bidyuk Vibhav Gogate Rina Dechter.

Generating samples from QF

• Oracles
  – Adaptive consistency as a pre-processing step
    • Constant time table look-up
    • Exponential in the treewidth of the constraint portion.
  – A complete CSP solver
    • Need to run it at each assignment.

[Figure: the sampled path Root, A=0, B=1, C=1 with backtrack-free branch probabilities]

Constraints: A≠B, A≠C

Gogate et al., UAI 2005; Gogate and Dechter, UAI 2005

Page 92: Sampling Techniques for Probabilistic and Deterministic Graphical models Bozhena Bidyuk Vibhav Gogate Rina Dechter.

Outline

• Rejection problem
• Backtrack-free distribution
  – Constructing it in practice
• SampleSearch
  – Construct the backtrack-free distribution on the fly.
• Approximate estimators
• Experiments

Page 93: Sampling Techniques for Probabilistic and Deterministic Graphical models Bozhena Bidyuk Vibhav Gogate Rina Dechter.

93

Algorithm SampleSearch

[Figure: search tree over A, B, C under proposal Q, with branch probabilities 0.8/0.2 at the root, 0.4/0.6 and 0.2/0.8 at the B level, and 0.2/0.8 at the C level]

$\hat{P}(e) = \frac{1}{N} \sum_{j=1}^{N} \frac{P(z^j, e)}{Q(z^j)}$

Gogate and Dechter, AISTATS 2007, AAAI 2007

Constraints: A≠B, A≠C

Page 94: Sampling Techniques for Probabilistic and Deterministic Graphical models Bozhena Bidyuk Vibhav Gogate Rina Dechter.

94

Algorithm SampleSearch

[Figure: the same search tree over A, B, C with the first sampling step marked]

$\hat{P}(e) = \frac{1}{N} \sum_{j=1}^{N} \frac{P(z^j, e)}{Q(z^j)}$

Gogate and Dechter, AISTATS 2007, AAAI 2007

Constraints: A≠B, A≠C

Page 95: Sampling Techniques for Probabilistic and Deterministic Graphical models Bozhena Bidyuk Vibhav Gogate Rina Dechter.

95

Algorithm SampleSearch

[Figure: the search tree after the inconsistent branch B=0 under A=0 has been removed; annotation: Resume Sampling]

$\hat{P}(e) = \frac{1}{N} \sum_{j=1}^{N} \frac{P(z^j, e)}{Q(z^j)}$

Gogate and Dechter, AISTATS 2007, AAAI 2007

Constraints: A≠B, A≠C

Page 96: Sampling Techniques for Probabilistic and Deterministic Graphical models Bozhena Bidyuk Vibhav Gogate Rina Dechter.

96

Algorithm SampleSearch

[Figure: the pruned search tree; annotation: Constraint Violated]

Until P(sample,e)>0

$\hat{P}(e) = \frac{1}{N} \sum_{j=1}^{N} \frac{P(z^j, e)}{Q(z^j)}$

Gogate and Dechter, AISTATS 2007, AAAI 2007

Constraints: A≠B, A≠C

Page 97: Sampling Techniques for Probabilistic and Deterministic Graphical models Bozhena Bidyuk Vibhav Gogate Rina Dechter.

97

Generate more Samples

[Figure: the full search tree over A, B, C under proposal Q, restored for the next sample]

$\hat{P}(e) = \frac{1}{N} \sum_{j=1}^{N} \frac{P(z^j, e)}{Q(z^j)}$

Gogate and Dechter, AISTATS 2007, AAAI 2007

Constraints: A≠B, A≠C

Page 98: Sampling Techniques for Probabilistic and Deterministic Graphical models Bozhena Bidyuk Vibhav Gogate Rina Dechter.

98

Generate more Samples

[Figure: the full search tree over A, B, C with the first step of the next sample marked]

$\hat{P}(e) = \frac{1}{N} \sum_{j=1}^{N} \frac{P(z^j, e)}{Q(z^j)}$

Gogate and Dechter, AISTATS 2007, AAAI 2007

Constraints: A≠B, A≠C

Page 99: Sampling Techniques for Probabilistic and Deterministic Graphical models Bozhena Bidyuk Vibhav Gogate Rina Dechter.

Traces of SampleSearch

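[Figure: four possible traces of SampleSearch, all ending in the solution A=0, B=1, C=1; they differ in which dead-end branches (B=0, C=0) were visited before the solution was found]

Constraints: A≠B, A≠C

Gogate and Dechter, AISTATS 2007, AAAI 2007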

Page 100: Sampling Techniques for Probabilistic and Deterministic Graphical models Bozhena Bidyuk Vibhav Gogate Rina Dechter.

100

SampleSearch: Sampling Distribution
• Problem: Due to search, the samples are no longer i.i.d. from Q.

$\hat{P}(e) = \frac{1}{N} \sum_{j=1}^{N} \frac{P(z^j, e)}{Q(z^j)}, \quad E_Q[\hat{P}(e)] \ne P(e)$

• Thm: SampleSearch generates i.i.d. samples from the backtrack-free distribution $Q_F$.

$\hat{P}_F(e) = \frac{1}{N} \sum_{j=1}^{N} \frac{P(z^j, e)}{Q_F(z^j)}, \quad E_{Q_F}[\hat{P}_F(e)] = P(e)$

Gogate and Dechter, AISTATS 2007, AAAI 2007
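A minimal sketch of the sample-then-backtrack idea on the running example follows. It is not the authors' implementation: the proposal tables in `Q` are assumed (and, for brevity, do not depend on the parent value), and the helper names are hypothetical. A value that leads to a dead end is removed and the remaining values are renormalized, which is what makes the output follow the backtrack-free distribution.

```python
import random

VARS = ["A", "B", "C"]
# Assumed proposal, independent of the parent value to keep the sketch short.
Q = {"A": {0: 0.8, 1: 0.2}, "B": {0: 0.4, 1: 0.6}, "C": {0: 0.2, 1: 0.8}}


def consistent(assign):
    """Check the constraints A != B and A != C on a partial assignment."""
    a = assign.get("A")
    return all(assign[v] != a for v in ("B", "C") if v in assign)


def sample_search(rng, assign=None, depth=0):
    """Return a full consistent assignment, or None if none extends `assign`."""
    assign = dict(assign or {})
    if depth == len(VARS):
        return assign
    var = VARS[depth]
    live = dict(Q[var])                     # values not yet proven dead
    while live:
        r = rng.random() * sum(live.values())
        for val, p in live.items():         # sample from the renormalized Q
            r -= p
            if r < 0:
                break
        assign[var] = val
        if consistent(assign):
            result = sample_search(rng, assign, depth + 1)
            if result is not None:
                return result
        del live[val]                       # dead end: prune and resample
        del assign[var]
    return None                             # every value failed: backtrack
```

With these tables both values of A admit a solution, so A=0 should appear with frequency close to 0.8 and every returned sample satisfies both constraints.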

Page 101: Sampling Techniques for Probabilistic and Deterministic Graphical models Bozhena Bidyuk Vibhav Gogate Rina Dechter.

The Sampling distribution QF of SampleSearch

[Figure: the subtree under A=0 labelled with Q_F: Q_F(A=0)=0.8, Q_F(B=0|A=0)=0, Q_F(B=1|A=0)=1; C=1 has probability 1 under B=1 and all other leaves have probability 0]

What is the probability of generating A=0?
  Q_F(A=0)=0.8
  Why? SampleSearch is systematic.

What is the probability of generating (A=0,B=1)?
  Q_F(B=1|A=0)=1
  Why? SampleSearch is systematic.

What is the probability of generating (A=0,B=0)?
  Simple: Q_F(B=0|A=0)=0
  All samples generated by SampleSearch are solutions.

Backtrack-free distribution

$\hat{P}(e) = \frac{1}{N} \sum_{j=1}^{N} \frac{P(z^j, e)}{Q(z^j)}$

Gogate and Dechter, AISTATS 2007, AAAI 2007

Constraints: A≠B, A≠C

Page 102: Sampling Techniques for Probabilistic and Deterministic Graphical models Bozhena Bidyuk Vibhav Gogate Rina Dechter.

Outline

• Rejection problem
• Backtrack-free distribution
  – Constructing it in practice
• SampleSearch
  – Construct the backtrack-free distribution on the fly.
• Approximate estimators
• Experiments

Page 103: Sampling Techniques for Probabilistic and Deterministic Graphical models Bozhena Bidyuk Vibhav Gogate Rina Dechter.

Asymptotic approximations of QF

[Figure: the subtree under A=0 from a SampleSearch trace; explored dead ends are marked "No solutions here" and an unexplored branch is marked "Hole? Don't know"]

• IF Hole THEN
  – U_F = Q (i.e. there is a solution at the other branch)
  – L_F = 0 (i.e. no solution at the other branch)

Gogate and Dechter, AISTATS 2007, AAAI 2007

Page 104: Sampling Techniques for Probabilistic and Deterministic Graphical models Bozhena Bidyuk Vibhav Gogate Rina Dechter.

Approximations:Convergence in the limit

• Store all possible traces

[Figure: three stored traces ending in A=0, B=1, C=1; unexplored branches (holes) are marked "?"]

Gogate and Dechter, AISTATS 2007, AAAI 2007

Page 105: Sampling Techniques for Probabilistic and Deterministic Graphical models Bozhena Bidyuk Vibhav Gogate Rina Dechter.

Approximations:Convergence in the limit

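• From the combined sample tree, update U and L. IF Hole THEN $U_F^N = Q$ and $L_F^N = 0$

Asymptotically unbiased:

$\lim_{N \to \infty} E\left[\frac{P(z,e)}{U_F^N(z)}\right] = \lim_{N \to \infty} E\left[\frac{P(z,e)}{L_F^N(z)}\right] = P(e)$

Bounding:

$L_F^N(z) \le Q_F(z) \le U_F^N(z) \;\Rightarrow\; \hat{P}_F^U(e) \le \hat{P}_F(e) \le \hat{P}_F^L(e)$

[Figure: the combined sample tree for the path A=0, B=1, C=1 with a remaining hole marked "?"]

Gogate and Dechter, AISTATS 2007, AAAI 2007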

Page 106: Sampling Techniques for Probabilistic and Deterministic Graphical models Bozhena Bidyuk Vibhav Gogate Rina Dechter.

Upper and Lower Approximations

• Asymptotically unbiased.
• Upper and lower bound on the unbiased sample mean
• Linear time and space overhead
• Bias versus variance tradeoff
  – Bias = difference between the upper and lower approximation.

Page 107: Sampling Techniques for Probabilistic and Deterministic Graphical models Bozhena Bidyuk Vibhav Gogate Rina Dechter.

Improving Naive SampleSearch

• Better Search Strategy
  – Can use any state-of-the-art CSP/SAT solver, e.g. minisat (Eén and Sörensson, 2006)
  – All theorems and results hold
• Better Importance Function
  – Use output of generalized belief propagation to compute the initial importance function Q (Gogate and Dechter, 2005)

Gogate and Dechter, AISTATS 2007, AAAI 2007

Page 108: Sampling Techniques for Probabilistic and Deterministic Graphical models Bozhena Bidyuk Vibhav Gogate Rina Dechter.

Experiments
• Tasks
  – Weighted Counting
  – Marginals
• Benchmarks
  – Satisfiability problems (counting solutions)
  – Linkage networks
  – Relational instances (First order probabilistic networks)
  – Grid networks
  – Logistics planning instances
• Algorithms
  – SampleSearch/UB, SampleSearch/LB
  – SampleCount (Gomes et al. 2007, SAT)
  – ApproxCount (Wei and Selman, 2007, SAT)
  – RELSAT (Bayardo and Pehoushek, 2000, SAT)
  – Edge Deletion Belief Propagation (Choi and Darwiche, 2006)
  – Iterative Join Graph Propagation (Dechter et al., 2002)
  – Variable Elimination and Conditioning (VEC)
  – EPIS (Yuan and Druzdzel, 2006)

Page 109: Sampling Techniques for Probabilistic and Deterministic Graphical models Bozhena Bidyuk Vibhav Gogate Rina Dechter.

Results: Solution Counts
Langford instances

Time Bound: 10 hrs

Page 110: Sampling Techniques for Probabilistic and Deterministic Graphical models Bozhena Bidyuk Vibhav Gogate Rina Dechter.
Page 111: Sampling Techniques for Probabilistic and Deterministic Graphical models Bozhena Bidyuk Vibhav Gogate Rina Dechter.

Results: Probability of Evidence
Linkage instances (UAI 2006 evaluation)

Time Bound: 3 hrs

Page 112: Sampling Techniques for Probabilistic and Deterministic Graphical models Bozhena Bidyuk Vibhav Gogate Rina Dechter.

Results: Probability of Evidence
Linkage instances (UAI 2008 evaluation)

Time Bound: 3 hrs

Page 113: Sampling Techniques for Probabilistic and Deterministic Graphical models Bozhena Bidyuk Vibhav Gogate Rina Dechter.
Page 114: Sampling Techniques for Probabilistic and Deterministic Graphical models Bozhena Bidyuk Vibhav Gogate Rina Dechter.

Results on Marginals

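• Evaluation Criteria

$\text{Hellinger distance} = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{2} \sum_{x_i \in D_i} \left(\sqrt{P(x_i)} - \sqrt{A(x_i)}\right)^2$

where $P(x_i)$ is the exact marginal and $A(x_i)$ is the approximate marginal.

• Always bounded between 0 and 1
• Lower bounds the KL distance
• When probabilities close to zero are present, KL distance may tend to infinity.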
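The criterion is short enough to state as code. The following sketch assumes marginals are given as one dictionary per variable mapping each domain value to its probability; that data layout and the function name are choices made here for illustration.

```python
import math


def hellinger_distance(exact, approx):
    """Average Hellinger distance between exact and approximate marginals.

    exact, approx: lists with one dict per variable, value -> probability.
    """
    n = len(exact)
    total = 0.0
    for p, a in zip(exact, approx):
        total += 0.5 * sum(
            (math.sqrt(p[x]) - math.sqrt(a.get(x, 0.0))) ** 2 for x in p
        )
    return total / n
```

Identical marginals give 0, and marginals that put all mass on different values give 1, which matches the stated bounds.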

Page 115: Sampling Techniques for Probabilistic and Deterministic Graphical models Bozhena Bidyuk Vibhav Gogate Rina Dechter.

Results: Posterior Marginals
Linkage instances (UAI 2006 evaluation)

Time Bound: 3 hrs
Distance measure: Hellinger distance

Page 116: Sampling Techniques for Probabilistic and Deterministic Graphical models Bozhena Bidyuk Vibhav Gogate Rina Dechter.
Page 117: Sampling Techniques for Probabilistic and Deterministic Graphical models Bozhena Bidyuk Vibhav Gogate Rina Dechter.

Summary: SampleSearch
• Manages rejection problem while sampling
  – Systematic backtracking search
• Sampling distribution of SampleSearch is the backtrack-free distribution Q_F
  – Expensive to compute
• Approximation of Q_F based on storing all traces that yields an asymptotically unbiased estimator
  – Linear time and space overhead
  – Bound the sample mean from above and below
• Empirically, when a substantial number of zero probabilities are present, SampleSearch-based schemes dominate their pure sampling counterparts and Generalized Belief Propagation.

Page 118: Sampling Techniques for Probabilistic and Deterministic Graphical models Bozhena Bidyuk Vibhav Gogate Rina Dechter.

Overview

1. Probabilistic Reasoning/Graphical models
2. Importance Sampling
3. Markov Chain Monte Carlo: Gibbs Sampling
4. Sampling in presence of Determinism
5. Rao-Blackwellisation
6. AND/OR importance sampling

Page 119: Sampling Techniques for Probabilistic and Deterministic Graphical models Bozhena Bidyuk Vibhav Gogate Rina Dechter.

Sampling: Performance

• Gibbs sampling
  – Reduce dependence between samples
• Importance sampling
  – Reduce variance
• Achieve both by sampling a subset of variables and integrating out the rest (reduce dimensionality), aka Rao-Blackwellisation
• Exploit graph structure to manage the extra cost

Page 120: Sampling Techniques for Probabilistic and Deterministic Graphical models Bozhena Bidyuk Vibhav Gogate Rina Dechter.

Smaller Subset State-Space

• Smaller state-space is easier to cover

$X = \{X_1, X_2, X_3, X_4\}$, $|D(X)| = 64$

$X = \{X_1, X_2\}$, $|D(X)| = 16$

Page 121: Sampling Techniques for Probabilistic and Deterministic Graphical models Bozhena Bidyuk Vibhav Gogate Rina Dechter.

Smoother Distribution

[Figure: bar charts of P(X1,X2,X3,X4) and of the marginal P(X1,X2), vertical axis from 0 to 0.25; the marginal over fewer variables is smoother]

Page 122: Sampling Techniques for Probabilistic and Deterministic Graphical models Bozhena Bidyuk Vibhav Gogate Rina Dechter.

Speeding Up Convergence
• Mean Squared Error of the estimator:

$MSE_Q[\hat{P}] = BIAS^2 + Var_Q[\hat{P}]$

• In case of unbiased estimator, BIAS=0:

$MSE_Q[\hat{P}] = Var_Q[\hat{P}] = E_Q[\hat{P}^2] - \left(E_Q[\hat{P}]\right)^2$

• Reduce variance ⇒ speed up convergence!

Page 123: Sampling Techniques for Probabilistic and Deterministic Graphical models Bozhena Bidyuk Vibhav Gogate Rina Dechter.

Rao-Blackwellisation

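$X = R \cup L$

$\hat{g}(x) = \frac{1}{T}\{h(x^1) + \cdots + h(x^T)\}$

$\tilde{g}(x) = \frac{1}{T}\{E[h(x) \mid l^1] + \cdots + E[h(x) \mid l^T]\}$

$Var\{g(x)\} = Var\{E[g(x) \mid l]\} + E\{var[g(x) \mid l]\}$

$Var\{g(x)\} \ge Var\{E[g(x) \mid l]\}$

$Var\{\hat{g}(x)\} = \frac{Var\{h(x)\}}{T} \ge \frac{Var\{E[h(x) \mid l]\}}{T} = Var\{\tilde{g}(x)\}$

Liu, Ch.2.3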
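The inequality can be checked empirically on a two-variable toy model. In the sketch below the distribution (L is a fair coin, R depends on L) and the function names are assumptions made for illustration; the plain estimator averages sampled values of R, while the Rao-Blackwellised one averages E[R | l] for the sampled l.

```python
import random
import statistics

P_L1 = 0.5                            # assumed P(L = 1)
P_R1_GIVEN_L = {0: 0.3, 1: 0.7}       # assumed P(R = 1 | L = l)


def one_run(T, rng):
    """Return (plain, rao_blackwellised) estimates of E[R] from T i.i.d. samples."""
    plain = 0.0
    rb = 0.0
    for _ in range(T):
        l = 1 if rng.random() < P_L1 else 0
        r = 1 if rng.random() < P_R1_GIVEN_L[l] else 0
        plain += r                    # h(x^t) = r
        rb += P_R1_GIVEN_L[l]         # E[h(x) | l^t], computed analytically
    return plain / T, rb / T


def compare_variances(T=50, reps=2000, seed=1):
    """Empirical variance of each estimator over `reps` independent runs."""
    rng = random.Random(seed)
    runs = [one_run(T, rng) for _ in range(reps)]
    return (statistics.pvariance([p for p, _ in runs]),
            statistics.pvariance([q for _, q in runs]))
```

For this model Var{R} = 0.25 while Var{E[R | L]} = 0.04, so the second returned variance should be clearly smaller than the first.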

Page 124: Sampling Techniques for Probabilistic and Deterministic Graphical models Bozhena Bidyuk Vibhav Gogate Rina Dechter.

Rao-Blackwellisation

• $X = R \cup L$
• Importance Sampling:

$Var_Q\left\{\frac{P(R,L)}{Q(R,L)}\right\} \ge Var_Q\left\{\frac{P(R)}{Q(R)}\right\}$

Liu, Ch.2.5.5

• Gibbs Sampling:
  – autocovariances are lower (less correlation between samples)
  – if Xi and Xj are strongly correlated, Xi=0 ⇔ Xj=0, include only one of them in the sampling set

“Carry out analytical computation as much as possible” - Liu

Page 125: Sampling Techniques for Probabilistic and Deterministic Graphical models Bozhena Bidyuk Vibhav Gogate Rina Dechter.

Blocking Gibbs Sampler vs. Collapsed

[Figure: network over X, Y, Z]

• Standard Gibbs: $P(x \mid y,z),\ P(y \mid x,z),\ P(z \mid x,y)$ (1)

• Blocking: $P(x \mid y,z),\ P(y,z \mid x)$ (2)

• Collapsed: $P(x \mid y),\ P(y \mid x)$ (3)

Faster convergence from (1) to (3)

Page 126: Sampling Techniques for Probabilistic and Deterministic Graphical models Bozhena Bidyuk Vibhav Gogate Rina Dechter.

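Collapsed Gibbs Sampling
Generating Samples

Generate sample c^{t+1} from c^t:

$C_1 = c_1^{t+1}$ sampled from $P(c_1 \mid c_2^t, c_3^t, \ldots, c_K^t, e)$

$C_2 = c_2^{t+1}$ sampled from $P(c_2 \mid c_1^{t+1}, c_3^t, \ldots, c_K^t, e)$

...

$C_K = c_K^{t+1}$ sampled from $P(c_K \mid c_1^{t+1}, c_2^{t+1}, \ldots, c_{K-1}^{t+1}, e)$

In short, for i=1 to K:

$C_i = c_i^{t+1}$ sampled from $P(c_i \mid c^t \setminus c_i, e)$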

Page 127: Sampling Techniques for Probabilistic and Deterministic Graphical models Bozhena Bidyuk Vibhav Gogate Rina Dechter.

Collapsed Gibbs Sampler

Input: C ⊆ X, E=e
Output: T samples {c^t}
Fix evidence E=e, initialize c^0 at random
1. For t = 1 to T (compute samples)
2. For i = 1 to N (loop through variables)
3. $c_i^{t+1} \leftarrow P(C_i \mid c^t \setminus c_i, e)$
4. End For
5. End For

Page 128: Sampling Techniques for Probabilistic and Deterministic Graphical models Bozhena Bidyuk Vibhav Gogate Rina Dechter.

Calculation Time

• Computing P(ci| ct\ci,e) is more expensive (requires inference)

• Trading #samples for smaller variance:– generate more samples with higher covariance– generate fewer samples with lower covariance

• Must control the time spent computing sampling probabilities in order to be time-effective!

128

Page 129: Sampling Techniques for Probabilistic and Deterministic Graphical models Bozhena Bidyuk Vibhav Gogate Rina Dechter.

Exploiting Graph Properties
Recall: computation time is exponential in the adjusted induced width of a graph
• A w-cutset is a subset of variables s.t. when they are observed, the induced width of the graph is w
• When sampled variables form a w-cutset, inference is exp(w) (e.g., using Bucket Tree Elimination)
• Cycle-cutset is a special case of w-cutset

Sampling w-cutset ⇒ w-cutset sampling!

Page 130: Sampling Techniques for Probabilistic and Deterministic Graphical models Bozhena Bidyuk Vibhav Gogate Rina Dechter.

What If C=Cycle-Cutset ?

$c^0 = \{x_2^0, x_5^0\}$, $E = \{X_9\}$

[Figure: the network over X1, ..., X9, and the same network after the cutset nodes X2 and X5 are observed]

P(x2,x5,x9) – can compute using Bucket Elimination

P(x2,x5,x9) – computation complexity is O(N)

Page 131: Sampling Techniques for Probabilistic and Deterministic Graphical models Bozhena Bidyuk Vibhav Gogate Rina Dechter.

Computing Transition Probabilities

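[Figure: network over X1, ..., X9]

Compute joint probabilities:

BE: $P(x_2 = 0, x_3, x_9)$

BE: $P(x_2 = 1, x_3, x_9)$

Normalize:

$\alpha = P(x_2 = 0, x_3, x_9) + P(x_2 = 1, x_3, x_9)$

$P(x_2 = 0 \mid x_3) = \frac{P(x_2 = 0, x_3, x_9)}{\alpha}$

$P(x_2 = 1 \mid x_3) = \frac{P(x_2 = 1, x_3, x_9)}{\alpha}$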
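The compute-then-normalize step can be sketched as follows. The joint values passed in would come from an exact inference routine such as bucket elimination; here they are plain numbers supplied by the caller, and both function names are hypothetical.

```python
import random


def normalize(joint):
    """Turn joint values P(x_i = v, rest) into the conditional P(x_i = v | rest)."""
    alpha = sum(joint.values())
    return {v: p / alpha for v, p in joint.items()}


def sample_value(dist, rng):
    """Draw one value from a normalized distribution given as value -> probability."""
    r = rng.random()
    acc = 0.0
    for v, p in dist.items():
        acc += p
        if r < acc:
            return v
    return v  # guard against floating-point round-off in the cumulative sum
```

For example, joint values 0.02 and 0.06 for the two states normalize to 0.25 and 0.75, and `sample_value` then draws the next state of the cutset variable from that distribution.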

Page 132: Sampling Techniques for Probabilistic and Deterministic Graphical models Bozhena Bidyuk Vibhav Gogate Rina Dechter.

Cutset Sampling-Answering Queries

• Query: $\forall c_i \in C$, P(ci |e)=? Same as Gibbs:

$\hat{P}(c_i \mid e) = \frac{1}{T} \sum_{t=1}^{T} P(c_i \mid c^t \setminus c_i, e)$

computed while generating sample t using bucket tree elimination

• Query: $\forall x_i \in X \setminus C$, P(xi |e)=?

$\hat{P}(x_i \mid e) = \frac{1}{T} \sum_{t=1}^{T} P(x_i \mid c^t, e)$

computed after generating sample t using bucket tree elimination

Page 133: Sampling Techniques for Probabilistic and Deterministic Graphical models Bozhena Bidyuk Vibhav Gogate Rina Dechter.

Cutset Sampling vs. Cutset Conditioning

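• Cutset Conditioning

$P(x_i \mid e) = \sum_{c \in D(C)} P(x_i \mid c, e)\, P(c \mid e)$

• Cutset Sampling

$\hat{P}(x_i \mid e) = \frac{1}{T} \sum_{t=1}^{T} P(x_i \mid c^t, e) = \sum_{c \in D(C)} P(x_i \mid c, e)\, \frac{count(c)}{T} = \sum_{c \in D(C)} P(x_i \mid c, e)\, \hat{P}(c \mid e)$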
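The identity between the per-sample average and the count-weighted sum is easy to confirm numerically. In this sketch the conditional `cond_prob(c)` stands in for P(x_i | c, e), which would normally come from exact inference; the function names and sample values in the usage note are made up for illustration.

```python
from collections import Counter


def estimate_from_samples(samples, cond_prob):
    """(1/T) * sum over samples t of P(x_i | c^t, e)."""
    return sum(cond_prob(c) for c in samples) / len(samples)


def estimate_from_counts(samples, cond_prob):
    """Sum over distinct cutset instances c of P(x_i | c, e) * count(c) / T."""
    T = len(samples)
    return sum(cond_prob(c) * n / T for c, n in Counter(samples).items())
```

Given cutset samples such as `[(0, 1), (0, 1), (1, 0), (1, 1)]` and any lookup table for the conditional, the two functions return the same number up to floating-point round-off; the second form makes the comparison with cutset conditioning explicit, with count(c)/T playing the role of P(c | e).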

Page 134: Sampling Techniques for Probabilistic and Deterministic Graphical models Bozhena Bidyuk Vibhav Gogate Rina Dechter.

Cutset Sampling Example

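[Figure: network over X1, ..., X9]

Estimating P(x2|e) for sampling node X2:

Sample 1: $x_2^1 \leftarrow P(x_2 \mid x_5^0, x_9)$

Sample 2: $x_2^2 \leftarrow P(x_2 \mid x_5^1, x_9)$

Sample 3: $x_2^3 \leftarrow P(x_2 \mid x_5^2, x_9)$

$\hat{P}(x_2 \mid x_9) = \frac{1}{3}\left[P(x_2 \mid x_5^0, x_9) + P(x_2 \mid x_5^1, x_9) + P(x_2 \mid x_5^2, x_9)\right]$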

Page 135: Sampling Techniques for Probabilistic and Deterministic Graphical models Bozhena Bidyuk Vibhav Gogate Rina Dechter.

Cutset Sampling Example

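[Figure: network over X1, ..., X9]

Estimating P(x3 |e) for non-sampled node X3:

$c^1 = \{x_2^1, x_5^1\} \Rightarrow P(x_3 \mid x_2^1, x_5^1, x_9)$

$c^2 = \{x_2^2, x_5^2\} \Rightarrow P(x_3 \mid x_2^2, x_5^2, x_9)$

$c^3 = \{x_2^3, x_5^3\} \Rightarrow P(x_3 \mid x_2^3, x_5^3, x_9)$

$\hat{P}(x_3 \mid x_9) = \frac{1}{3}\left[P(x_3 \mid x_2^1, x_5^1, x_9) + P(x_3 \mid x_2^2, x_5^2, x_9) + P(x_3 \mid x_2^3, x_5^3, x_9)\right]$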

Page 136: Sampling Techniques for Probabilistic and Deterministic Graphical models Bozhena Bidyuk Vibhav Gogate Rina Dechter.

CPCS54 Test Results

[Figure: CPCS54, n=54, |C|=15, |E|=3. MSE vs. # samples (left) and MSE vs. time in seconds (right) for Cutset and Gibbs]

Ergodic, |X|=54, D(Xi)=2, |C|=15, |E|=3

Exact Time = 30 sec using Cutset Conditioning

Page 137: Sampling Techniques for Probabilistic and Deterministic Graphical models Bozhena Bidyuk Vibhav Gogate Rina Dechter.

CPCS179 Test Results

[Figure: CPCS179, n=179, |C|=8, |E|=35. MSE vs. # samples (left) and MSE vs. time in seconds (right) for Cutset and Gibbs]

Non-Ergodic (1 deterministic CPT entry), |X| = 179, |C| = 8, 2 <= D(Xi) <= 4, |E| = 35

Exact Time = 122 sec using Cutset Conditioning

Page 138: Sampling Techniques for Probabilistic and Deterministic Graphical models Bozhena Bidyuk Vibhav Gogate Rina Dechter.

CPCS360b Test Results

[Figure: CPCS360b, n=360, |C|=21, |E|=36. MSE vs. # samples (left) and MSE vs. time in seconds (right) for Cutset and Gibbs]

Ergodic, |X| = 360, D(Xi)=2, |C| = 21, |E| = 36

Exact Time > 60 min using Cutset Conditioning

Exact Values obtained via Bucket Elimination

Page 139: Sampling Techniques for Probabilistic and Deterministic Graphical models Bozhena Bidyuk Vibhav Gogate Rina Dechter.

Random Networks

[Figure: RANDOM, n=100, |C|=13, |E|=15-20. MSE vs. # samples (left) and MSE vs. time in seconds (right) for Cutset and Gibbs]

|X| = 100, D(Xi)=2, |C| = 13, |E| = 15-20

Exact Time = 30 sec using Cutset Conditioning

Page 140: Sampling Techniques for Probabilistic and Deterministic Graphical models Bozhena Bidyuk Vibhav Gogate Rina Dechter.

Coding Networks
Cutset Transforms Non-Ergodic Chain to Ergodic

[Figure: coding network with layers x1..x4, u1..u4, p1..p4, y1..y4 (left); Coding Networks, n=100, |C|=12-14, MSE vs. time in seconds on a log scale for IBP, Gibbs and Cutset (right)]

Non-Ergodic, |X| = 100, D(Xi)=2, |C| = 13-16, |E| = 50

Sample Ergodic Subspace U={U1, U2,…Uk}

Exact Time = 50 sec using Cutset Conditioning

Page 141: Sampling Techniques for Probabilistic and Deterministic Graphical models Bozhena Bidyuk Vibhav Gogate Rina Dechter.

Non-Ergodic Hailfinder

[Figure: HailFinder, n=56, |C|=5, |E|=1. MSE (log scale) vs. # samples (left) and MSE vs. time in seconds (right) for Cutset and Gibbs]

Non-Ergodic, |X| = 56, |C| = 5, 2 <= D(Xi) <= 11, |E| = 0

Exact Time = 2 sec using Loop-Cutset Conditioning

Page 142: Sampling Techniques for Probabilistic and Deterministic Graphical models Bozhena Bidyuk Vibhav Gogate Rina Dechter.

CPCS360b - MSE

[Figure: cpcs360b, N=360, |E|=[20-34], w*=20. MSE vs. time in seconds for Gibbs, IBP, cutset sampling with |C|=26, fw=3 and cutset sampling with |C|=48, fw=2]

Ergodic, |X| = 360, |C| = 26, D(Xi)=2

Exact Time = 50 min using BTE

Page 143: Sampling Techniques for Probabilistic and Deterministic Graphical models Bozhena Bidyuk Vibhav Gogate Rina Dechter.

Cutset Importance Sampling

• Apply Importance Sampling over cutset C

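$\hat{P}(e) = \frac{1}{T} \sum_{t=1}^{T} \frac{P(c^t, e)}{Q(c^t)} = \frac{1}{T} \sum_{t=1}^{T} w^t$

$\hat{P}(c_i \mid e) = \alpha \frac{1}{T} \sum_{t=1}^{T} \delta(c_i, c^t)\, w^t$

$\hat{P}(x_i \mid e) = \alpha \frac{1}{T} \sum_{t=1}^{T} P(x_i \mid c^t, e)\, w^t$

where $P(c^t, e)$ is computed using Bucket Elimination and $\alpha$ is a normalization constant

(Gogate & Dechter, 2005) and (Bidyuk & Dechter, 2006)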

Page 144: Sampling Techniques for Probabilistic and Deterministic Graphical models Bozhena Bidyuk Vibhav Gogate Rina Dechter.

Likelihood Cutset Weighting (LCS)
• Z = Topological Order{C,E}
• Generating sample t+1:

For $Z_i \in Z$ do:
  If $Z_i \in E$
    $z_i^{t+1} = z_i$, $z_i \in e$
  Else
    $z_i^{t+1} \leftarrow P(Z_i \mid z_1^{t+1}, \ldots, z_{i-1}^{t+1})$
  End If
End For

• computed while generating sample t using bucket tree elimination
• can be memoized for some number of instances K (based on memory available)

KL[P(C|e), Q(C)] ≤ KL[P(X|e), Q(X)]

Page 145: Sampling Techniques for Probabilistic and Deterministic Graphical models Bozhena Bidyuk Vibhav Gogate Rina Dechter.

Pathfinder 1

145

Page 146: Sampling Techniques for Probabilistic and Deterministic Graphical models Bozhena Bidyuk Vibhav Gogate Rina Dechter.

Pathfinder 2

146

Page 147: Sampling Techniques for Probabilistic and Deterministic Graphical models Bozhena Bidyuk Vibhav Gogate Rina Dechter.

Link

147

Page 148: Sampling Techniques for Probabilistic and Deterministic Graphical models Bozhena Bidyuk Vibhav Gogate Rina Dechter.

Summary

Importance Sampling
• i.i.d. samples
• Unbiased estimator
• Generates samples fast
• Samples from Q
• Reject samples with zero-weight
• Improves on cutset

Gibbs Sampling
• Dependent samples
• Biased estimator
• Generates samples slower
• Samples from $\hat{P}(X \mid e)$
• Does not converge in presence of constraints
• Improves on cutset

Page 149: Sampling Techniques for Probabilistic and Deterministic Graphical models Bozhena Bidyuk Vibhav Gogate Rina Dechter.

CPCS360b

[Figure: cpcs360b, N=360, |LC|=26, w*=21, |E|=15. MSE (log scale) vs. time in seconds for LW, AIS-BN, Gibbs, LCS and IBP]

LW – likelihood weighting
LCS – likelihood weighting on a cutset

Page 150: Sampling Techniques for Probabilistic and Deterministic Graphical models Bozhena Bidyuk Vibhav Gogate Rina Dechter.

CPCS422b

[Figure: cpcs422b, N=422, |LC|=47, w*=22, |E|=28. MSE (log scale) vs. time in seconds for LW, AIS-BN, Gibbs, LCS and IBP]

LW – likelihood weighting
LCS – likelihood weighting on a cutset

Page 151: Sampling Techniques for Probabilistic and Deterministic Graphical models Bozhena Bidyuk Vibhav Gogate Rina Dechter.

Coding Networks

[Figure: coding, N=200, P=3, |LC|=26, w*=21. MSE (log scale) vs. time in seconds for LW, AIS-BN, Gibbs, LCS and IBP]

LW – likelihood weighting
LCS – likelihood weighting on a cutset

Page 152: Sampling Techniques for Probabilistic and Deterministic Graphical models Bozhena Bidyuk Vibhav Gogate Rina Dechter.

Overview

1. Probabilistic Reasoning/Graphical models
2. Importance Sampling
3. Markov Chain Monte Carlo: Gibbs Sampling
4. Sampling in presence of Determinism
5. Rao-Blackwellisation
6. AND/OR importance sampling

Page 153: Sampling Techniques for Probabilistic and Deterministic Graphical models Bozhena Bidyuk Vibhav Gogate Rina Dechter.

Motivation

Expected value of the number on the face of a die:

$\frac{1 + 2 + 3 + 4 + 5 + 6}{6} = 3.5$

What is the expected value of the product of the numbers on the faces of k dice?

$(3.5)^k$

Page 154: Sampling Techniques for Probabilistic and Deterministic Graphical models Bozhena Bidyuk Vibhav Gogate Rina Dechter.

Monte Carlo estimate

• Perform the following experiment N times.
  – Toss all the k dice.
  – Record the product of the numbers on the top face of each die.
• Report the average over the N runs.

$\hat{Z} = \frac{1}{N} \sum_{i=1}^{N} \prod_{j=1}^{k} (\text{the number on the face of the } j\text{-th die in the } i\text{-th run})$

Page 155: Sampling Techniques for Probabilistic and Deterministic Graphical models Bozhena Bidyuk Vibhav Gogate Rina Dechter.

How does the sample average converge?

[Figure: convergence of the sample average for 10 dice]

10 dice. Exact answer is $(3.5)^{10}$

Page 156: Sampling Techniques for Probabilistic and Deterministic Graphical models Bozhena Bidyuk Vibhav Gogate Rina Dechter.

But This is Really Dumb?

The dice are independent. A better Monte Carlo estimate:

1. Perform the experiment N times
2. For each die, record the average
3. Take the product of the averages

$\hat{Z}_{new} = \prod_{j=1}^{k} \frac{1}{N} \sum_{i=1}^{N} (\text{the number on the face of the } j\text{-th die in the } i\text{-th run})$

• Conventional estimate: average of products.

$\hat{Z} = \frac{1}{N} \sum_{i=1}^{N} \prod_{j=1}^{k} (\text{the number on the face of the } j\text{-th die in the } i\text{-th run})$
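Both estimators are easy to try out. The sketch below simulates N tosses of k fair dice and returns the conventional average of products together with the product of per-die averages; the function name and default seed are illustrative choices.

```python
import random


def dice_estimators(k, N, seed=0):
    """Return (average_of_products, product_of_averages) from N tosses of k dice."""
    rng = random.Random(seed)
    rolls = [[rng.randint(1, 6) for _ in range(k)] for _ in range(N)]

    # Conventional estimate: average, over runs, of the product of the faces.
    avg_of_products = 0.0
    for run in rolls:
        prod = 1
        for face in run:
            prod *= face
        avg_of_products += prod
    avg_of_products /= N

    # Estimate exploiting independence: product, over dice, of per-die averages.
    product_of_averages = 1.0
    for j in range(k):
        product_of_averages *= sum(run[j] for run in rolls) / N

    return avg_of_products, product_of_averages
```

With k = 3 the exact answer is 3.5^3 = 42.875; across repeated runs with different seeds the product of averages typically stays much closer to it than the average of products does.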

Page 157: Sampling Techniques for Probabilistic and Deterministic Graphical models Bozhena Bidyuk Vibhav Gogate Rina Dechter.

157

How the sample Average Converges

Page 158: Sampling Techniques for Probabilistic and Deterministic Graphical models Bozhena Bidyuk Vibhav Gogate Rina Dechter.

158

Moral of the story

• Make use of (conditional) independence to get better results
• Used for exact inference extensively
  – Bucket Elimination (Dechter, 1996)
  – Junction tree (Lauritzen and Spiegelhalter, 1988)
  – Value Elimination (Bacchus et al. 2004)
  – Recursive Conditioning (Darwiche, 2001)
  – BTD (Jegou et al., 2002)
  – AND/OR search (Dechter and Mateescu, 2007)
• How to use it for sampling?
  – AND/OR Importance sampling

Page 159: Sampling Techniques for Probabilistic and Deterministic Graphical models Bozhena Bidyuk Vibhav Gogate Rina Dechter.

Background: AND/OR search space

[Figure: a problem over variables A, B, and C, each with domain {0,1,2}; its pseudo-tree (A at the root with children B and C) and a chain pseudo-tree (A, B, C); the corresponding AND/OR tree, with alternating OR and AND levels, and the OR tree.]
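The difference between the OR tree and the AND/OR tree can be illustrated by counting solutions. This is a minimal sketch, assuming the problem is a coloring-style constraint A ≠ B and A ≠ C (an inference from the values shown in the figure, where A=0 leaves only the values 1 and 2 for B and C); the function names are hypothetical.

```python
from itertools import product

DOMAIN = (0, 1, 2)


def consistent(a, other):
    """Assumed constraint: a child must take a value different from A."""
    return a != other


def count_or_tree():
    """Enumerate full assignments along the chain A, B, C (OR tree)."""
    return sum(
        1
        for a, b, c in product(DOMAIN, repeat=3)
        if consistent(a, b) and consistent(a, c)
    )


def count_and_or_tree():
    """Use the pseudo-tree with root A and children B and C.

    Given A, the subproblems over B and C are independent, so their counts
    are multiplied (AND node) and the results are summed over the values
    of A (OR node).
    """
    total = 0
    for a in DOMAIN:
        count_b = sum(1 for b in DOMAIN if consistent(a, b))
        count_c = sum(1 for c in DOMAIN if consistent(a, c))
        total += count_b * count_c
    return total


if __name__ == "__main__":
    print(count_or_tree(), count_and_or_tree())
```

Both functions return the same count; the AND/OR version visits the values of B and C once per value of A instead of once per (A, B) pair.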

Page 160: Sampling Techniques for Probabilistic and Deterministic Graphical models Bozhena Bidyuk Vibhav Gogate Rina Dechter.

AND/OR sampling: Example

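[Figure: Bayesian network over A, B, C, D, and F with CPTs P(A), P(B|A), P(C|A), P(D|B), and P(F|C).]

$$P(d,f) = \sum_{a,b,c} P(a)\,P(c \mid a)\,P(b \mid a)\,P(d \mid b)\,P(f \mid c)$$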

Page 161: Sampling Techniques for Probabilistic and Deterministic Graphical models Bozhena Bidyuk Vibhav Gogate Rina Dechter.

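AND/OR Importance Sampling (General Idea)

• Decompose the expectation

[Figure: pseudo-tree with root A and children B and C.]

$$P(d,f) = \sum_{a,b,c} P(a)\,P(c \mid a)\,P(b \mid a)\,P(d \mid b)\,P(f \mid c)$$

$$Q(A,B,C) = Q(A)\,Q(B \mid A)\,Q(C \mid A)$$

$$P(d,f) = \sum_{a,b,c} \frac{P(a)\,P(c \mid a)\,P(b \mid a)\,P(d \mid b)\,P(f \mid c)}{Q(a)\,Q(b \mid a)\,Q(c \mid a)}\; Q(a)\,Q(b \mid a)\,Q(c \mid a) = E_Q\!\left[\frac{P(a)\,P(c \mid a)\,P(b \mid a)\,P(d \mid b)\,P(f \mid c)}{Q(a)\,Q(b \mid a)\,Q(c \mid a)}\right]$$

Gogate and Dechter, UAI 2008, CP2008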

Page 162: Sampling Techniques for Probabilistic and Deterministic Graphical models Bozhena Bidyuk Vibhav Gogate Rina Dechter.

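AND/OR Importance Sampling (General Idea)

• Decompose the expectation

[Figure: pseudo-tree with root A and children B and C.]

$$P(d,f) = \sum_{a,b,c} \frac{P(a)\,P(c \mid a)\,P(b \mid a)\,P(d \mid b)\,P(f \mid c)}{Q(a)\,Q(b \mid a)\,Q(c \mid a)}\; Q(a)\,Q(b \mid a)\,Q(c \mid a)$$

$$P(d,f) = \sum_{a} \frac{P(a)}{Q(a)}\,Q(a) \left[\sum_{b} \frac{P(b \mid a)\,P(d \mid b)}{Q(b \mid a)}\,Q(b \mid a)\right] \left[\sum_{c} \frac{P(c \mid a)\,P(f \mid c)}{Q(c \mid a)}\,Q(c \mid a)\right]$$

$$P(d,f) = E_Q\!\left[\frac{P(a)}{Q(a)}\; E_Q\!\left[\frac{P(b \mid a)\,P(d \mid b)}{Q(b \mid a)} \,\middle|\, a\right] E_Q\!\left[\frac{P(c \mid a)\,P(f \mid c)}{Q(c \mid a)} \,\middle|\, a\right]\right]$$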

Page 163: Sampling Techniques for Probabilistic and Deterministic Graphical models Bozhena Bidyuk Vibhav Gogate Rina Dechter.

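AND/OR Importance Sampling (General Idea)

• Compute all expectations separately.
• How?
  – Record all samples.
  – For each sample that has A=a:
    • Estimate the conditional expectations separately using the generated samples.
    • Combine the results.

[Figure: pseudo-tree with root A and children B and C.]

$$P(d,f) = E_Q\!\left[\frac{P(a)}{Q(a)}\; E_Q\!\left[\frac{P(b \mid a)\,P(d \mid b)}{Q(b \mid a)} \,\middle|\, a\right] E_Q\!\left[\frac{P(c \mid a)\,P(f \mid c)}{Q(c \mid a)} \,\middle|\, a\right]\right]$$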

Page 164: Sampling Techniques for Probabilistic and Deterministic Graphical models Bozhena Bidyuk Vibhav Gogate Rina Dechter.

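AND/OR Importance Sampling

[Figure: pseudo-tree with root A and children B and C, and the AND/OR sample tree built from the four samples below.]

Sample #   A   B   C
1          0   1   0
2          0   2   1
3          1   1   1
4          1   2   0

$$\text{Estimate of } E_Q\!\left[\frac{P(b \mid A=0)\,P(d \mid b)}{Q(b \mid A=0)} \,\middle|\, A=0\right] = \text{average weight of the samples of } B \text{ having } A=0 = \frac{w(B=1, A=0) + w(B=2, A=0)}{2}$$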

Page 165: Sampling Techniques for Probabilistic and Deterministic Graphical models Bozhena Bidyuk Vibhav Gogate Rina Dechter.

AND/OR Importance Sampling

• All AND nodes: separate components. Take the product.
  – Operator: Product
• All OR nodes: conditional expectations given the assignment above it.
  – Operator: Weighted average

Sample #   A   B   C
1          0   1   0
2          0   2   1
3          1   1   1
4          1   2   0

Gogate and Dechter, UAI 2008, CP2008

[Figure: AND/OR sample tree over A, B, and C built from the samples above, with an average (Avg) computed at each OR node.]

Page 166: Sampling Techniques for Probabilistic and Deterministic Graphical models Bozhena Bidyuk Vibhav Gogate Rina Dechter.

Algorithm AND/OR Importance Sampling

1. Construct a pseudo-tree.
2. Construct a proposal distribution Q along the pseudo-tree.
3. Generate samples x1, ..., xN from Q along the ordering O.
4. Build an AND/OR sample tree for the samples x1, ..., xN along the ordering O.
5. FOR all leaf nodes i of the AND/OR tree do
   1. IF AND-node THEN v(i) = 1 ELSE v(i) = 0
6. FOR every node n from the leaves to the root do
   1. IF AND-node THEN v(n) = product of children
   2. IF OR-node THEN v(n) = (weighted) average of children
7. Return v(root-node)

Gogate and Dechter, UAI 2008, CP2008
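The algorithm can be sketched for the running example with pseudo-tree A → {B, C}. This is a minimal illustration under stated assumptions: all variables are binary, the evidence is D=1 and F=1, the CPT values are made up, and the proposal is uniform; none of these numbers come from the slides.

```python
from collections import defaultdict

# Hypothetical CPTs for binary variables; the evidence is D = 1, F = 1.
P_A = {0: 0.6, 1: 0.4}
P_B_GIVEN_A = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.2, 1: 0.8}}  # P_B_GIVEN_A[a][b]
P_C_GIVEN_A = {0: {0: 0.5, 1: 0.5}, 1: {0: 0.9, 1: 0.1}}  # P_C_GIVEN_A[a][c]
P_D1_GIVEN_B = {0: 0.1, 1: 0.6}  # P(D=1 | b)
P_F1_GIVEN_C = {0: 0.3, 1: 0.8}  # P(F=1 | c)

Q = 0.5  # assumed uniform proposal: Q(a) = Q(b|a) = Q(c|a) = 0.5


def w_a(a):
    """Weight P(a) / Q(a) at the root."""
    return P_A[a] / Q


def w_b(a, b):
    """Weight P(b|a) P(d|b) / Q(b|a) on the B branch."""
    return P_B_GIVEN_A[a][b] * P_D1_GIVEN_B[b] / Q


def w_c(a, c):
    """Weight P(c|a) P(f|c) / Q(c|a) on the C branch."""
    return P_C_GIVEN_A[a][c] * P_F1_GIVEN_C[c] / Q


def exact():
    """Exact P(d, f) by summing out A, B, and C."""
    total = 0.0
    for a in (0, 1):
        sum_b = sum(P_B_GIVEN_A[a][b] * P_D1_GIVEN_B[b] for b in (0, 1))
        sum_c = sum(P_C_GIVEN_A[a][c] * P_F1_GIVEN_C[c] for c in (0, 1))
        total += P_A[a] * sum_b * sum_c
    return total


def conventional_estimate(samples):
    """Conventional importance sampling: average of full-sample weights."""
    return sum(w_a(a) * w_b(a, b) * w_c(a, c) for a, b, c in samples) / len(samples)


def and_or_estimate(samples):
    """AND/OR tree estimate: average B and C weights separately per value of A."""
    by_a = defaultdict(list)
    for a, b, c in samples:
        by_a[a].append((b, c))
    n = len(samples)
    total = 0.0
    for a, children in by_a.items():
        # OR nodes B and C: averages over the samples that have A = a.
        est_b = sum(w_b(a, b) for b, _ in children) / len(children)
        est_c = sum(w_c(a, c) for _, c in children) / len(children)
        # AND node under A = a: product; OR node A: frequency-weighted average.
        total += (len(children) / n) * w_a(a) * est_b * est_c
    return total


if __name__ == "__main__":
    samples = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]  # (a, b, c)
    print(exact(), conventional_estimate(samples), and_or_estimate(samples))
```

In the usage above, every value of B and of C appears equally often under each value of A, so under the uniform proposal the AND/OR estimate coincides with the exact answer, while the conventional estimate from the same four samples does not.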

Page 167: Sampling Techniques for Probabilistic and Deterministic Graphical models Bozhena Bidyuk Vibhav Gogate Rina Dechter.

# samples in AND/OR vs Conventional

• 8 samples in the AND/OR space versus 4 samples in importance sampling.
• Example: A=0, B=2, C=0 is not generated but is still considered in the AND/OR space.

[Figure: AND/OR sample tree built from the four samples below.]

Sample #   A   B   C
1          0   1   0
2          0   2   1
3          1   1   1
4          1   2   0

Gogate and Dechter, UAI 2008, CP2008
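The count of 8 can be checked by enumerating the full assignments that the AND/OR sample tree represents for the four samples in the table. A minimal sketch, assuming the pseudo-tree with root A and children B and C; the function name is hypothetical.

```python
from itertools import product

SAMPLES = [(0, 1, 0), (0, 2, 1), (1, 1, 1), (1, 2, 0)]  # (A, B, C) from the table


def and_or_assignments(samples):
    """Return all full assignments represented by the AND/OR sample tree.

    Under each sampled value of A, every sampled value of B can be paired
    with every sampled value of C, because B and C are independent given A.
    """
    b_values, c_values = {}, {}
    for a, b, c in samples:
        b_values.setdefault(a, set()).add(b)
        c_values.setdefault(a, set()).add(c)
    return {
        (a, b, c)
        for a in b_values
        for b, c in product(b_values[a], c_values[a])
    }


if __name__ == "__main__":
    virtual = and_or_assignments(SAMPLES)
    print(len(virtual), (0, 2, 0) in virtual, (0, 2, 0) in SAMPLES)
```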

Page 168: Sampling Techniques for Probabilistic and Deterministic Graphical models Bozhena Bidyuk Vibhav Gogate Rina Dechter.

Why AND/OR Importance Sampling

• AND/OR estimates have smaller variance.
• Variance reduction:
  – Easy to prove for the case of complete independence (Goodman, 1960).
  – Complicated to prove for the general conditional independence case (see Vibhav Gogate's thesis)!

Not independent (average of products):

$$V[\widehat{xy}] = \frac{V[x]\,E[y]^2}{N} + \frac{V[y]\,E[x]^2}{N} + \frac{V[x]\,V[y]}{N}$$

Independent (product of averages):

$$V[\hat{x}\,\hat{y}] = \frac{V[x]\,E[y]^2}{N} + \frac{V[y]\,E[x]^2}{N} + \frac{V[x]\,V[y]}{N^2}$$

Gogate and Dechter, UAI 2008, CP2008

Note the squared term.
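To see the size of the effect, here is a worked instance of the two variance formulas, assuming x and y are two independent fair six-sided dice and N = 100 (both assumptions are illustrative, not from the slide):

```latex
\begin{align*}
E[x] &= E[y] = \tfrac{7}{2}, \qquad V[x] = V[y] = \tfrac{35}{12}, \\
\text{average of products:}\quad
 &\frac{V[x]E[y]^2 + V[y]E[x]^2 + V[x]V[y]}{N}
 \approx \frac{71.458 + 8.507}{100} \approx 0.7997, \\
\text{product of averages:}\quad
 &\frac{V[x]E[y]^2 + V[y]E[x]^2}{N} + \frac{V[x]V[y]}{N^2}
 \approx \frac{71.458}{100} + \frac{8.507}{100^2} \approx 0.7154 .
\end{align*}
```

Only the last term differs: dividing it by N squared instead of N makes it negligible, and the gap widens as more independent components are multiplied.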

Page 169: Sampling Techniques for Probabilistic and Deterministic Graphical models Bozhena Bidyuk Vibhav Gogate Rina Dechter.

AND/OR Graph sampling

[Figure: a problem over variables A, B, C, and D, each with domain {0,1,2}, and its pseudo-tree; the AND/OR sample tree represents 8 samples, while the AND/OR sample graph represents 12 samples.]

Page 170: Sampling Techniques for Probabilistic and Deterministic Graphical models Bozhena Bidyuk Vibhav Gogate Rina Dechter.

Combining AND/OR sampling and w-cutset sampling

• Reduce the variance of the weights
  – Rao-Blackwellised w-cutset sampling (Bidyuk and Dechter, 2007)
• Increase the number of samples, in effect
  – AND/OR Tree and Graph sampling (Gogate and Dechter, 2008)
• Combine the two

$$Var_Q[\hat{P}(e)] = Var_Q\!\left[\frac{1}{N} \sum_{i=1}^{N} w(z^i)\right] = \frac{Var_Q[w(z)]}{N}$$

Algorithm AND/OR w-cutset sampling

Given an integer constant w:
1. Partition the set of variables into K and R, such that the treewidth of R is bounded by w.
2. AND/OR sampling on K
   1. Construct a pseudo-tree of K and compute Q(K) consistent with K.
   2. Generate samples from Q(K) and store them on an AND/OR tree.
3. Rao-Blackwellisation (exact inference) at each leaf
   1. For each leaf node of the tree, compute Z(R|g), where g is the assignment from the leaf to the root.
4. Value computation: recursively from the leaves to the root
   1. At each AND node, compute the product of the values at the children.
   2. At each OR node, compute a weighted average over the values at the children.
5. Return the value of the root node.
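The steps above can be sketched on a seven-variable model with cutset K = {A, B, C} and R = {D, E, F, G}. This is an illustration under stated assumptions, not the authors' implementation: the variables are binary, the edge set and the shared potential `H` are hypothetical, the proposal is uniform, and brute-force enumeration stands in for bounded-treewidth exact inference.

```python
from itertools import product

VALUES = (0, 1)


def H(x, y):
    """Hypothetical pairwise potential shared by every edge of the model."""
    return 2.0 if x == y else 1.0


def exact_partition_function():
    """Brute-force sum over all seven variables (reference value only).

    Assumed edges: C-A, A-D, A-E, D-E, C-E, C-B, B-G, B-F, G-F, C-G.
    """
    total = 0.0
    for a, b, c, d, e, f, g in product(VALUES, repeat=7):
        total += (H(c, a) * H(a, d) * H(a, e) * H(d, e) * H(c, e)
                  * H(c, b) * H(b, g) * H(b, f) * H(g, f) * H(c, g))
    return total


def leaf_value_a(a, c):
    """Step 3: exact inference over the non-cutset variables D, E below A."""
    return sum(H(a, d) * H(a, e) * H(d, e) * H(c, e)
               for d, e in product(VALUES, repeat=2))


def leaf_value_b(b, c):
    """Step 3: exact inference over the non-cutset variables F, G below B."""
    return sum(H(c, g) * H(b, g) * H(g, f) * H(b, f)
               for f, g in product(VALUES, repeat=2))


def and_or_w_cutset_estimate(samples, q=0.5):
    """Step 4 for (c, a, b) samples drawn from a uniform proposal.

    The proposal is Q(c) Q(a|c) Q(b|c) with every factor equal to q.
    """
    n = len(samples)
    total = 0.0
    for c in VALUES:
        group = [(a, b) for (c_i, a, b) in samples if c_i == c]
        if not group:
            continue
        # OR nodes A and B: averages of weighted leaf values given C = c.
        value_a = sum(H(c, a) * leaf_value_a(a, c) / q for a, _ in group) / len(group)
        value_b = sum(H(c, b) * leaf_value_b(b, c) / q for _, b in group) / len(group)
        # AND node under C = c: product of the two components.
        # OR node C: frequency-weighted average with importance weight 1/q.
        total += (len(group) / n) * (value_a * value_b) / q
    return total


if __name__ == "__main__":
    samples = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]  # (c, a, b)
    print(exact_partition_function(), and_or_w_cutset_estimate(samples))
```

With the assumed edges, removing the cutset leaves only the edges D-E and G-F among the remaining variables, so the exact-inference step operates on a structure of treewidth 1.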

Page 172: Sampling Techniques for Probabilistic and Deterministic Graphical models Bozhena Bidyuk Vibhav Gogate Rina Dechter.

AND/OR w-cutset sampling: Step 1: Partition the set of variables

[Figure: graphical model over the variables A, B, C, D, E, F, and G.]

Practical constraint: exact inference can only be performed if the treewidth is bounded by 1.

Page 173: Sampling Techniques for Probabilistic and Deterministic Graphical models Bozhena Bidyuk Vibhav Gogate Rina Dechter.

AND/OR w-cutset sampling: Step 2: AND/OR sampling over {A,B,C}

[Figure: graphical model over A, B, C, D, E, F, and G, and the pseudo-tree over the cutset, with root C and children A and B.]

Page 174: Sampling Techniques for Probabilistic and Deterministic Graphical models Bozhena Bidyuk Vibhav Gogate Rina Dechter.

AND/OR w-cutset sampling: Step 2: AND/OR sampling over {A,B,C}

Samples: (C=0,A=0,B=1), (C=0,A=1,B=1), (C=1,A=0,B=0), (C=1,A=1,B=0)

[Figure: pseudo-tree with root C and children A and B, and the AND/OR sample tree built from the samples above.]

Page 175: Sampling Techniques for Probabilistic and Deterministic Graphical models Bozhena Bidyuk Vibhav Gogate Rina Dechter.

AND/OR w-cutset sampling: Step 3: Exact inference at each leaf

[Figure: AND/OR sample tree over C, A, and B.]

Value of B=1 | C=0:

$$v(B=1 \mid C=0) = \sum_{f \in F} \sum_{g \in G} H(C=0, g)\,H(B=1, g)\,H(g, f)\,H(B=1, f)$$

Page 176: Sampling Techniques for Probabilistic and Deterministic Graphical models Bozhena Bidyuk Vibhav Gogate Rina Dechter.

AND/OR w-cutset sampling: Step 4: Value computation

[Figure: AND/OR sample tree over C, A, and B.]

Value of B=1 given C=0:

$$v(B=1 \mid C=0) = \sum_{f \in F} \sum_{g \in G} H(C=0, g)\,H(B=1, g)\,H(g, f)\,H(B=1, f)$$

Value of B given C=0: $\hat{E}[B, F, G \mid C=0]$

Value of A given C=0: $\hat{E}[A, D, E \mid C=0]$

Value of C: estimate of the partition function

Page 177: Sampling Techniques for Probabilistic and Deterministic Graphical models Bozhena Bidyuk Vibhav Gogate Rina Dechter.

Properties and Improvements

• The basic underlying scheme for sampling remains the same.
  – The only thing that changes is what you estimate from the samples.
  – Can be combined with any state-of-the-art importance sampling technique.
• Graph vs tree sampling
  – Take full advantage of the conditional independence properties uncovered from the primal graph.

Page 178: Sampling Techniques for Probabilistic and Deterministic Graphical models Bozhena Bidyuk Vibhav Gogate Rina Dechter.

AND/OR w-cutset sampling: Advantages and Disadvantages

• Advantages
  – Variance reduction
  – Relatively fewer calls to the Rao-Blackwellisation step due to efficient caching (Lazy Rao-Blackwellisation)
  – Dynamic Rao-Blackwellisation when context-specific or logical dependencies are present
    • Particularly suitable for Markov logic networks (Richardson and Domingos, 2006).
• Disadvantages
  – Increases time and space complexity, and therefore fewer samples may be generated.

Page 179: Sampling Techniques for Probabilistic and Deterministic Graphical models Bozhena Bidyuk Vibhav Gogate Rina Dechter.

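Take-away figure: Variance Hierarchy and Complexity

[Figure: variance hierarchy over the schemes IS, w-cutset IS, AND/OR Tree IS, AND/OR w-cutset Tree IS, AND/OR Graph IS, and AND/OR w-cutset Graph IS. Each scheme is annotated with a pair of complexity bounds; the pairs shown are: O(nN), O(1); O(nN), O(h); O(nN exp(w)), O(h + n exp(w)); O(nN exp(w)), O(exp(w)); O(nNt), O(nN); O(nNt exp(w)), O(nN + n exp(w)).]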

Page 180: Sampling Techniques for Probabilistic and Deterministic Graphical models Bozhena Bidyuk Vibhav Gogate Rina Dechter.

Experiments

• Benchmarks
  – Linkage analysis
  – Graph coloring
• Algorithms
  – OR tree sampling
  – AND/OR tree sampling
  – AND/OR graph sampling
  – w-cutset versions of the three schemes above

Page 181: Sampling Techniques for Probabilistic and Deterministic Graphical models Bozhena Bidyuk Vibhav Gogate Rina Dechter.

Results: Probability of Evidence, Linkage instances (UAI 2006 evaluation)

Time Bound: 1hr

Page 182: Sampling Techniques for Probabilistic and Deterministic Graphical models Bozhena Bidyuk Vibhav Gogate Rina Dechter.
Page 183: Sampling Techniques for Probabilistic and Deterministic Graphical models Bozhena Bidyuk Vibhav Gogate Rina Dechter.

Results: Probability of Evidence, Linkage instances (UAI 2008 evaluation)

Time Bound: 1hr

Page 184: Sampling Techniques for Probabilistic and Deterministic Graphical models Bozhena Bidyuk Vibhav Gogate Rina Dechter.
Page 185: Sampling Techniques for Probabilistic and Deterministic Graphical models Bozhena Bidyuk Vibhav Gogate Rina Dechter.

Results: Solution counting, Graph coloring instance

Time Bound: 1hr

Page 186: Sampling Techniques for Probabilistic and Deterministic Graphical models Bozhena Bidyuk Vibhav Gogate Rina Dechter.


Summary: AND/OR Importance sampling

• AND/OR sampling: A general scheme to exploit conditional independence in sampling

• Theoretical guarantees: lower sampling error than conventional sampling

• Variance reduction orthogonal to Rao-Blackwellised sampling.

• Better empirical performance than conventional sampling.