Sampling Methods - Machine Learningvda.univie.ac.at/Teaching/ML/15s/LectureNotes/11_sampling.pdf ·...

51
Sampling Rejection Sampling Importance Sampling Markov Chain Monte Carlo Sampling Methods Machine Learning Torsten Möller ©Möller/Mori 1

Transcript of Sampling Methods - Machine Learningvda.univie.ac.at/Teaching/ML/15s/LectureNotes/11_sampling.pdf ·...

Page 1: Sampling Methods - Machine Learningvda.univie.ac.at/Teaching/ML/15s/LectureNotes/11_sampling.pdf · Sampling Rejection Sampling Importance Sampling Markov Chain Monte Carlo Sampling

Sampling Rejection Sampling Importance Sampling Markov Chain Monte Carlo

Sampling MethodsMachine Learning

Torsten Möller

©Möller/Mori 1

Page 2: Sampling Methods - Machine Learningvda.univie.ac.at/Teaching/ML/15s/LectureNotes/11_sampling.pdf · Sampling Rejection Sampling Importance Sampling Markov Chain Monte Carlo Sampling

Sampling Rejection Sampling Importance Sampling Markov Chain Monte Carlo

Recall – Inference For General Graphs

• Junction tree algorithm is an exact inference method forarbitrary graphs

• A particular tree structure defined over cliques of variables• Inference ends up being exponential in maximum clique size• Therefore slow in many cases

• Sampling methods: represent desired distribution with a setof samples, as more samples are used, obtain moreaccurate representation

©Möller/Mori 2

Page 3: Sampling Methods - Machine Learningvda.univie.ac.at/Teaching/ML/15s/LectureNotes/11_sampling.pdf · Sampling Rejection Sampling Importance Sampling Markov Chain Monte Carlo Sampling

Sampling Rejection Sampling Importance Sampling Markov Chain Monte Carlo

Outline

Sampling

Rejection Sampling

Importance Sampling

Markov Chain Monte Carlo

©Möller/Mori 3

Page 4: Sampling Methods - Machine Learningvda.univie.ac.at/Teaching/ML/15s/LectureNotes/11_sampling.pdf · Sampling Rejection Sampling Importance Sampling Markov Chain Monte Carlo Sampling

Sampling Rejection Sampling Importance Sampling Markov Chain Monte Carlo

Outline

Sampling

Rejection Sampling

Importance Sampling

Markov Chain Monte Carlo

©Möller/Mori 4

Page 5: Sampling Methods - Machine Learningvda.univie.ac.at/Teaching/ML/15s/LectureNotes/11_sampling.pdf · Sampling Rejection Sampling Importance Sampling Markov Chain Monte Carlo Sampling

Sampling Rejection Sampling Importance Sampling Markov Chain Monte Carlo

Sampling• The fundamental problem we address in this lecture is how

to obtain samples from a probability distribution p(z)• This could be a conditional distribution p(z|e)

• We often wish to evaluate expectations such as

E[f ] =∫

f(z)p(z)dz

• e.g. mean when f(z) = z

• For complicated p(z), this is difficult to do exactly,approximate as

f̂ =1

L

L∑l=1

f(z(l))

where {z(l)|l = 1, . . . , L} are independent samples from p(z)

©Möller/Mori 5

Page 6: Sampling Methods - Machine Learningvda.univie.ac.at/Teaching/ML/15s/LectureNotes/11_sampling.pdf · Sampling Rejection Sampling Importance Sampling Markov Chain Monte Carlo Sampling

Sampling Rejection Sampling Importance Sampling Markov Chain Monte Carlo

Sampling

p(z) f(z)

z

• Approximate

f̂ =1

L

L∑l=1

f(z(l))

where {z(l)|l = 1, . . . , L} are independent samples from p(z)

©Möller/Mori 6

Page 7: Sampling Methods - Machine Learningvda.univie.ac.at/Teaching/ML/15s/LectureNotes/11_sampling.pdf · Sampling Rejection Sampling Importance Sampling Markov Chain Monte Carlo Sampling

Sampling Rejection Sampling Importance Sampling Markov Chain Monte Carlo

Bayesian Networks - Generating Fair Samples

Cloudy

RainSprinkler

WetGrass

C

TF

.80

.20

P(R|C)C

TF

.10

.50

P(S|C)

S R

T TT FF TF F

.90

.90

.99

P(W|S,R)

P(C).50

.01

• How can we generate a fair set of samples from this BN?from Russell and Norvig, AIMA ©Möller/Mori 7

Page 8: Sampling Methods - Machine Learningvda.univie.ac.at/Teaching/ML/15s/LectureNotes/11_sampling.pdf · Sampling Rejection Sampling Importance Sampling Markov Chain Monte Carlo Sampling

Sampling Rejection Sampling Importance Sampling Markov Chain Monte Carlo

Sampling from Bayesian Networks

• Sampling from discrete Bayesian networks with noobservations is straight-forward, using ancestral sampling

• Bayesian network specifies factorization of joint distribution

P (z1, . . . , zn) =

n∏i=1

P (zi|pa(zi))

• Sample in-order, sample parents before children• Possible because graph is a DAG

• Choose value for zi from p(zi|pa(zi))

©Möller/Mori 8

Page 9: Sampling Methods - Machine Learningvda.univie.ac.at/Teaching/ML/15s/LectureNotes/11_sampling.pdf · Sampling Rejection Sampling Importance Sampling Markov Chain Monte Carlo Sampling

Sampling Rejection Sampling Importance Sampling Markov Chain Monte Carlo

Sampling From Empty Network – Example

Cloudy

RainSprinkler

WetGrass

C

TF

.80

.20

P(R|C)C

TF

.10

.50

P(S|C)

S R

T TT FF TF F

.90

.90

.99

P(W|S,R)

P(C).50

.01

from Russell and Norvig, AIMA©Möller/Mori 9

Page 10: Sampling Methods - Machine Learningvda.univie.ac.at/Teaching/ML/15s/LectureNotes/11_sampling.pdf · Sampling Rejection Sampling Importance Sampling Markov Chain Monte Carlo Sampling

Sampling Rejection Sampling Importance Sampling Markov Chain Monte Carlo

Sampling From Empty Network – Example

Cloudy

RainSprinkler

WetGrass

C

TF

.80

.20

P(R|C)C

TF

.10

.50

P(S|C)

S R

T TT FF TF F

.90

.90

.99

P(W|S,R)

P(C).50

.01

from Russell and Norvig, AIMA©Möller/Mori 10

Page 11: Sampling Methods - Machine Learningvda.univie.ac.at/Teaching/ML/15s/LectureNotes/11_sampling.pdf · Sampling Rejection Sampling Importance Sampling Markov Chain Monte Carlo Sampling

Sampling Rejection Sampling Importance Sampling Markov Chain Monte Carlo

Sampling From Empty Network – Example

Cloudy

RainSprinkler

WetGrass

C

TF

.80

.20

P(R|C)C

TF

.10

.50

P(S|C)

S R

T TT FF TF F

.90

.90

.99

P(W|S,R)

P(C).50

.01

from Russell and Norvig, AIMA©Möller/Mori 11

Page 12: Sampling Methods - Machine Learningvda.univie.ac.at/Teaching/ML/15s/LectureNotes/11_sampling.pdf · Sampling Rejection Sampling Importance Sampling Markov Chain Monte Carlo Sampling

Sampling Rejection Sampling Importance Sampling Markov Chain Monte Carlo

Sampling From Empty Network – Example

Cloudy

RainSprinkler

WetGrass

C

TF

.80

.20

P(R|C)C

TF

.10

.50

P(S|C)

S R

T TT FF TF F

.90

.90

.99

P(W|S,R)

P(C).50

.01

from Russell and Norvig, AIMA©Möller/Mori 12

Page 13: Sampling Methods - Machine Learningvda.univie.ac.at/Teaching/ML/15s/LectureNotes/11_sampling.pdf · Sampling Rejection Sampling Importance Sampling Markov Chain Monte Carlo Sampling

Sampling Rejection Sampling Importance Sampling Markov Chain Monte Carlo

Sampling From Empty Network – Example

Cloudy

RainSprinkler

WetGrass

C

TF

.80

.20

P(R|C)C

TF

.10

.50

P(S|C)

S R

T TT FF TF F

.90

.90

.99

P(W|S,R)

P(C).50

.01

from Russell and Norvig, AIMA©Möller/Mori 13

Page 14: Sampling Methods - Machine Learningvda.univie.ac.at/Teaching/ML/15s/LectureNotes/11_sampling.pdf · Sampling Rejection Sampling Importance Sampling Markov Chain Monte Carlo Sampling

Sampling Rejection Sampling Importance Sampling Markov Chain Monte Carlo

Sampling From Empty Network – Example

Cloudy

RainSprinkler

WetGrass

C

TF

.80

.20

P(R|C)C

TF

.10

.50

P(S|C)

S R

T TT FF TF F

.90

.90

.99

P(W|S,R)

P(C).50

.01

from Russell and Norvig, AIMA©Möller/Mori 14

Page 15: Sampling Methods - Machine Learningvda.univie.ac.at/Teaching/ML/15s/LectureNotes/11_sampling.pdf · Sampling Rejection Sampling Importance Sampling Markov Chain Monte Carlo Sampling

Sampling Rejection Sampling Importance Sampling Markov Chain Monte Carlo

Sampling From Empty Network – Example

Cloudy

RainSprinkler

WetGrass

C

TF

.80

.20

P(R|C)C

TF

.10

.50

P(S|C)

S R

T TT FF TF F

.90

.90

.99

P(W|S,R)

P(C).50

.01

from Russell and Norvig, AIMA©Möller/Mori 15

Page 16: Sampling Methods - Machine Learningvda.univie.ac.at/Teaching/ML/15s/LectureNotes/11_sampling.pdf · Sampling Rejection Sampling Importance Sampling Markov Chain Monte Carlo Sampling

Sampling Rejection Sampling Importance Sampling Markov Chain Monte Carlo

Ancestral Sampling

• This sampling procedure is fair, the fraction of samples with aparticular value tends towards the joint probability of thatvalue

• Define SPS(z1, . . . , zn) to be the probability of generating theevent (z1, . . . , zn)

• This is equal to p(z1, . . . , zn) due to the semantics of theBayesian network

• Define NPS(z1, . . . , zn) to be the number of times wegenerate the event (z1, . . . , zn)

limN→∞

NPS(z1, . . . , zn)

N= SPS(z1, . . . , zn) = p(z1, . . . , zn)

©Möller/Mori 16

Page 17: Sampling Methods - Machine Learningvda.univie.ac.at/Teaching/ML/15s/LectureNotes/11_sampling.pdf · Sampling Rejection Sampling Importance Sampling Markov Chain Monte Carlo Sampling

Sampling Rejection Sampling Importance Sampling Markov Chain Monte Carlo

Sampling Marginals

Cloudy

RainSprinkler

WetGrass

C

TF

.80

.20

P(R|C)C

TF

.10

.50

P(S|C)

S R

T TT FF TF F

.90

.90

.99

P(W|S,R)

P(C).50

.01

• Note that this procedure can be appliedto generate samples for marginals as well

• Simply discard portions of sample whichare not needed

• e.g. For marginal p(rain), sample(cloudy = t, sprinkler = f, rain =t, wg = t) just becomes (rain = t)

• Still a fair sampling procedure

©Möller/Mori 17

Page 18: Sampling Methods - Machine Learningvda.univie.ac.at/Teaching/ML/15s/LectureNotes/11_sampling.pdf · Sampling Rejection Sampling Importance Sampling Markov Chain Monte Carlo Sampling

Sampling Rejection Sampling Importance Sampling Markov Chain Monte Carlo

Sampling with Evidence

• What if we observe some values and want samples fromp(z|e)?

• Naive method, logic sampling:• Generate N samples from p(z) using ancestral sampling• Discard those samples that do not have correct evidence

values• e.g. For p(rain|cloudy = t, spr = t, wg = t), sample(cloudy = t, spr = f, rain = t, wg = t) discarded

• Generates fair samples, but wastes time• Many samples will be discarded for low p(e)

©Möller/Mori 18

Page 19: Sampling Methods - Machine Learningvda.univie.ac.at/Teaching/ML/15s/LectureNotes/11_sampling.pdf · Sampling Rejection Sampling Importance Sampling Markov Chain Monte Carlo Sampling

Sampling Rejection Sampling Importance Sampling Markov Chain Monte Carlo

Other Problems

• Continuous variables?• Gaussian okay, Box-Muller and other methods• More complex distributions?

• Undirected graphs (MRFs)?

©Möller/Mori 19

Page 20: Sampling Methods - Machine Learningvda.univie.ac.at/Teaching/ML/15s/LectureNotes/11_sampling.pdf · Sampling Rejection Sampling Importance Sampling Markov Chain Monte Carlo Sampling

Sampling Rejection Sampling Importance Sampling Markov Chain Monte Carlo

Outline

Sampling

Rejection Sampling

Importance Sampling

Markov Chain Monte Carlo

©Möller/Mori 20

Page 21: Sampling Methods - Machine Learningvda.univie.ac.at/Teaching/ML/15s/LectureNotes/11_sampling.pdf · Sampling Rejection Sampling Importance Sampling Markov Chain Monte Carlo Sampling

Sampling Rejection Sampling Importance Sampling Markov Chain Monte Carlo

Rejection Samplingp(z) f(z)

z

• Consider the case of an arbitrary, continuous p(z)

• How can we draw samples from it?• Assume we can evaluate p(z), up to some normalization

constantp(z) =

1

Zpp̃(z)

where p̃(z) can be efficiently evaluated (e.g. MRF)

©Möller/Mori 21

Page 22: Sampling Methods - Machine Learningvda.univie.ac.at/Teaching/ML/15s/LectureNotes/11_sampling.pdf · Sampling Rejection Sampling Importance Sampling Markov Chain Monte Carlo Sampling

Sampling Rejection Sampling Importance Sampling Markov Chain Monte Carlo

Proposal Distribution

z0 z

u0

kq(z0) kq(z)

p̃(z)

• Let’s also assume that we have some simpler distributionq(z) called a proposal distribution from which we can easilydraw samples

• e.g. q(z) is a Gaussian• We can then draw samples from q(z) and use these• But these wouldn’t be fair samples from p(z)?!

©Möller/Mori 22

Page 23: Sampling Methods - Machine Learningvda.univie.ac.at/Teaching/ML/15s/LectureNotes/11_sampling.pdf · Sampling Rejection Sampling Importance Sampling Markov Chain Monte Carlo Sampling

Sampling Rejection Sampling Importance Sampling Markov Chain Monte Carlo

Comparison Function and Rejection

z0 z

u0

kq(z0) kq(z)

p̃(z)

• Introduce constant k such that kq(z) ≥ p̃(z) for all z• Rejection sampling procedure:

• Generate z0 from q(z)• Generate u0 from [0, kq(z0)] uniformly• If u0 > p̃(z) reject sample z0, otherwise keep it

• Original samples are uniform in grey region• Kept samples uniform in white region – hence samples fromp(z)

©Möller/Mori 23

Page 24: Sampling Methods - Machine Learningvda.univie.ac.at/Teaching/ML/15s/LectureNotes/11_sampling.pdf · Sampling Rejection Sampling Importance Sampling Markov Chain Monte Carlo Sampling

Sampling Rejection Sampling Importance Sampling Markov Chain Monte Carlo

Rejection Sampling Analysis

• How likely are we to keep samples?• Probability a sample is accepted is:

p(accept) =

∫{p̃(z)/kq(z)}q(z)dz

=1

k

∫p̃(z)dz

• Smaller k is better subject to kq(z) ≥ p̃(z) for all z• If q(z) is similar to p̃(z), this is easier

• In high-dim spaces, acceptance ratio falls off exponentially• Finding a suitable k challenging

©Möller/Mori 24

Page 25: Sampling Methods - Machine Learningvda.univie.ac.at/Teaching/ML/15s/LectureNotes/11_sampling.pdf · Sampling Rejection Sampling Importance Sampling Markov Chain Monte Carlo Sampling

Sampling Rejection Sampling Importance Sampling Markov Chain Monte Carlo

Outline

Sampling

Rejection Sampling

Importance Sampling

Markov Chain Monte Carlo

©Möller/Mori 25

Page 26: Sampling Methods - Machine Learningvda.univie.ac.at/Teaching/ML/15s/LectureNotes/11_sampling.pdf · Sampling Rejection Sampling Importance Sampling Markov Chain Monte Carlo Sampling

Sampling Rejection Sampling Importance Sampling Markov Chain Monte Carlo

Discretization

• Importance sampling is a sampling technique for computingexpectations:

E[f ] =∫

f(z)p(z)dz

• Could approximate using discretization over a uniform grid:

E[f ] ≈L∑l=1

f(z(l))p(z(l))

• c.f. Riemannian sum• Much wasted computation, exponential scaling in dimension• Instead, again use a proposal distribution instead of a

uniform grid

©Möller/Mori 26

Page 27: Sampling Methods - Machine Learningvda.univie.ac.at/Teaching/ML/15s/LectureNotes/11_sampling.pdf · Sampling Rejection Sampling Importance Sampling Markov Chain Monte Carlo Sampling

Sampling Rejection Sampling Importance Sampling Markov Chain Monte Carlo

Importance samplingp(z) f(z)

z

q(z)

• Approximate expectation

E[f ] =

∫f(z)p(z)dz =

∫f(z)

p(z)

q(z)q(z)dz

≈ 1

L

L∑l=1

f(z(l))p(z(l))

q(z(l))

• Quantities p(z(l))/q(z(l)) are known as importance weights• Correct for use of wrong distribution q(z) in sampling

©Möller/Mori 27

Page 28: Sampling Methods - Machine Learningvda.univie.ac.at/Teaching/ML/15s/LectureNotes/11_sampling.pdf · Sampling Rejection Sampling Importance Sampling Markov Chain Monte Carlo Sampling

Sampling Rejection Sampling Importance Sampling Markov Chain Monte Carlo

Likelihood Weighted Sampling• Consider the case where we have evidence e and again

desire an expectation over p(x|e)• If we have a Bayesian network, we can use a particular type

of importance sampling called likelihood weighted sampling:• Perform ancestral sampling• If a variable zi is in the evidence set, set its value rather than

sampling• Importance weights are: ??

p(z(l))

q(z(l))= ?

p(z(l))

q(z(l))=

p(x, e)

p(e)

1∏zi ̸∈e p(zi|pai)

∝∏zi∈e

p(zi|pai)

©Möller/Mori 28

Page 29: Sampling Methods - Machine Learningvda.univie.ac.at/Teaching/ML/15s/LectureNotes/11_sampling.pdf · Sampling Rejection Sampling Importance Sampling Markov Chain Monte Carlo Sampling

Sampling Rejection Sampling Importance Sampling Markov Chain Monte Carlo

Likelihood Weighted Sampling• Consider the case where we have evidence e and again

desire an expectation over p(x|e)• If we have a Bayesian network, we can use a particular type

of importance sampling called likelihood weighted sampling:• Perform ancestral sampling• If a variable zi is in the evidence set, set its value rather than

sampling• Importance weights are:

??

p(z(l))

q(z(l))= ?

p(z(l))

q(z(l))=

p(x, e)

p(e)

1∏zi ̸∈e p(zi|pai)

∝∏zi∈e

p(zi|pai)

©Möller/Mori 29

Page 30: Sampling Methods - Machine Learningvda.univie.ac.at/Teaching/ML/15s/LectureNotes/11_sampling.pdf · Sampling Rejection Sampling Importance Sampling Markov Chain Monte Carlo Sampling

Sampling Rejection Sampling Importance Sampling Markov Chain Monte Carlo

Likelihood Weighted Sampling• Consider the case where we have evidence e and again

desire an expectation over p(x|e)• If we have a Bayesian network, we can use a particular type

of importance sampling called likelihood weighted sampling:• Perform ancestral sampling• If a variable zi is in the evidence set, set its value rather than

sampling• Importance weights are:

??

p(z(l))

q(z(l))= ?

p(z(l))

q(z(l))=

p(x, e)

p(e)

1∏zi ̸∈e p(zi|pai)

∝∏zi∈e

p(zi|pai)

©Möller/Mori 30

Page 31: Sampling Methods - Machine Learningvda.univie.ac.at/Teaching/ML/15s/LectureNotes/11_sampling.pdf · Sampling Rejection Sampling Importance Sampling Markov Chain Monte Carlo Sampling

Sampling Rejection Sampling Importance Sampling Markov Chain Monte Carlo

Likelihood Weighted Sampling Example

Cloudy

RainSprinkler

WetGrass

C

TF

.80

.20

P(R|C)C

TF

.10

.50

P(S|C)

S R

T TT FF TF F

.90

.90

.99

P(W|S,R)

P(C).50

.01

w = 1.0from Russell and Norvig, AIMA

©Möller/Mori 31

Page 32: Sampling Methods - Machine Learningvda.univie.ac.at/Teaching/ML/15s/LectureNotes/11_sampling.pdf · Sampling Rejection Sampling Importance Sampling Markov Chain Monte Carlo Sampling

Sampling Rejection Sampling Importance Sampling Markov Chain Monte Carlo

Likelihood Weighted Sampling Example

Cloudy

RainSprinkler

WetGrass

C

TF

.80

.20

P(R|C)C

TF

.10

.50

P(S|C)

S R

T TT FF TF F

.90

.90

.99

P(W|S,R)

P(C).50

.01

w = 1.0from Russell and Norvig, AIMA

©Möller/Mori 32

Page 33: Sampling Methods - Machine Learningvda.univie.ac.at/Teaching/ML/15s/LectureNotes/11_sampling.pdf · Sampling Rejection Sampling Importance Sampling Markov Chain Monte Carlo Sampling

Sampling Rejection Sampling Importance Sampling Markov Chain Monte Carlo

Likelihood Weighted Sampling Example

Cloudy

RainSprinkler

WetGrass

C

TF

.80

.20

P(R|C)C

TF

.10

.50

P(S|C)

S R

T TT FF TF F

.90

.90

.99

P(W|S,R)

P(C).50

.01

w = 1.0× 0.1from Russell and Norvig, AIMA

©Möller/Mori 33

Page 34: Sampling Methods - Machine Learningvda.univie.ac.at/Teaching/ML/15s/LectureNotes/11_sampling.pdf · Sampling Rejection Sampling Importance Sampling Markov Chain Monte Carlo Sampling

Sampling Rejection Sampling Importance Sampling Markov Chain Monte Carlo

Likelihood Weighted Sampling Example

Cloudy

RainSprinkler

WetGrass

C

TF

.80

.20

P(R|C)C

TF

.10

.50

P(S|C)

S R

T TT FF TF F

.90

.90

.99

P(W|S,R)

P(C).50

.01

w = 1.0× 0.1from Russell and Norvig, AIMA

©Möller/Mori 34

Page 35: Sampling Methods - Machine Learningvda.univie.ac.at/Teaching/ML/15s/LectureNotes/11_sampling.pdf · Sampling Rejection Sampling Importance Sampling Markov Chain Monte Carlo Sampling

Sampling Rejection Sampling Importance Sampling Markov Chain Monte Carlo

Likelihood Weighted Sampling Example

Cloudy

RainSprinkler

WetGrass

C

TF

.80

.20

P(R|C)C

TF

.10

.50

P(S|C)

S R

T TT FF TF F

.90

.90

.99

P(W|S,R)

P(C).50

.01

w = 1.0× 0.1× 0.99 = 0.099from Russell and Norvig, AIMA

©Möller/Mori 35

Page 36: Sampling Methods - Machine Learningvda.univie.ac.at/Teaching/ML/15s/LectureNotes/11_sampling.pdf · Sampling Rejection Sampling Importance Sampling Markov Chain Monte Carlo Sampling

Sampling Rejection Sampling Importance Sampling Markov Chain Monte Carlo

Sampling Importance Resampling

• Note that importance sampling, e.g. likelihood weightedsampling, gives approximation to expectation, not samples

• But samples can be obtained using these ideas• Sampling-importance-resampling uses a proposal

distribution q(z) to generate samples• Unlike rejection sampling, no parameter k is needed

©Möller/Mori 36

Page 37: Sampling Methods - Machine Learningvda.univie.ac.at/Teaching/ML/15s/LectureNotes/11_sampling.pdf · Sampling Rejection Sampling Importance Sampling Markov Chain Monte Carlo Sampling

Sampling Rejection Sampling Importance Sampling Markov Chain Monte Carlo

SIR - Algorithm

• Sampling-importance-resampling algorithm has two stages• Sampling:

• Draw samples z(1), . . . , z(L) from proposal distribution q(z)

• Importance resampling:• Put weights on samples

wl =p̃(z(l))/q(z(l))∑

m p̃(z(m))/q(z(m))

• Draw samples from the discrete set z(1), . . . , z(L) accordingto weights wl (uniform distribution)

• This two stage process is correct in the limit as L → ∞

©Möller/Mori 37

Page 38: Sampling Methods - Machine Learningvda.univie.ac.at/Teaching/ML/15s/LectureNotes/11_sampling.pdf · Sampling Rejection Sampling Importance Sampling Markov Chain Monte Carlo Sampling

Sampling Rejection Sampling Importance Sampling Markov Chain Monte Carlo

Outline

Sampling

Rejection Sampling

Importance Sampling

Markov Chain Monte Carlo

©Möller/Mori 38

Page 39: Sampling Methods - Machine Learningvda.univie.ac.at/Teaching/ML/15s/LectureNotes/11_sampling.pdf · Sampling Rejection Sampling Importance Sampling Markov Chain Monte Carlo Sampling

Sampling Rejection Sampling Importance Sampling Markov Chain Monte Carlo

Markov Chain Monte Carlo

• Markov chain Monte Carlo (MCMC) methods also use aproposal distribution to generate samples from anotherdistribution

• Unlike the previous methods, we keep track of the samplesgenerated z(1), . . . , z(τ)

• The proposal distribution depends on the current state:q(z|z(τ))

• Intuitively, walking around in state space, each step dependsonly on the current state

©Möller/Mori 39

Page 40: Sampling Methods - Machine Learningvda.univie.ac.at/Teaching/ML/15s/LectureNotes/11_sampling.pdf · Sampling Rejection Sampling Importance Sampling Markov Chain Monte Carlo Sampling

Sampling Rejection Sampling Importance Sampling Markov Chain Monte Carlo

Metropolis Algorithm

• Simple algorithm for walking around in state space:• Draw sample z∗ ∼ q(z|z(τ))• Accept sample with probability

A(z∗,z(τ)) = min(1,

p̃(z∗)

p̃(z(τ))

)• If accepted z(τ+1) = z∗, else z(τ+1) = z(τ)

• Note that if z∗ is better than z(τ), it is always accepted• Every iteration produces a sample

• Though sometimes it’s the same as previous• Contrast with rejection sampling

• The basic Metropolis algorithm assumes the proposaldistribution is symmetric q(zA|zB) = q(zB|zA)

©Möller/Mori 40

Page 41: Sampling Methods - Machine Learningvda.univie.ac.at/Teaching/ML/15s/LectureNotes/11_sampling.pdf · Sampling Rejection Sampling Importance Sampling Markov Chain Monte Carlo Sampling

Sampling Rejection Sampling Importance Sampling Markov Chain Monte Carlo

Metropolis Algorithm

• Simple algorithm for walking around in state space:• Draw sample z∗ ∼ q(z|z(τ))• Accept sample with probability

A(z∗,z(τ)) = min(1,

p̃(z∗)

p̃(z(τ))

)• If accepted z(τ+1) = z∗, else z(τ+1) = z(τ)

• Note that if z∗ is better than z(τ), it is always accepted• Every iteration produces a sample

• Though sometimes it’s the same as previous• Contrast with rejection sampling

• The basic Metropolis algorithm assumes the proposaldistribution is symmetric q(zA|zB) = q(zB|zA)

©Möller/Mori 41

Page 42: Sampling Methods - Machine Learningvda.univie.ac.at/Teaching/ML/15s/LectureNotes/11_sampling.pdf · Sampling Rejection Sampling Importance Sampling Markov Chain Monte Carlo Sampling

Sampling Rejection Sampling Importance Sampling Markov Chain Monte Carlo

Metropolis Algorithm

• Simple algorithm for walking around in state space:• Draw sample z∗ ∼ q(z|z(τ))• Accept sample with probability

A(z∗,z(τ)) = min(1,

p̃(z∗)

p̃(z(τ))

)• If accepted z(τ+1) = z∗, else z(τ+1) = z(τ)

• Note that if z∗ is better than z(τ), it is always accepted• Every iteration produces a sample

• Though sometimes it’s the same as previous• Contrast with rejection sampling

• The basic Metropolis algorithm assumes the proposaldistribution is symmetric q(zA|zB) = q(zB|zA)

©Möller/Mori 42

Page 43: Sampling Methods - Machine Learningvda.univie.ac.at/Teaching/ML/15s/LectureNotes/11_sampling.pdf · Sampling Rejection Sampling Importance Sampling Markov Chain Monte Carlo Sampling

Sampling Rejection Sampling Importance Sampling Markov Chain Monte Carlo

Metropolis Algorithm

• Simple algorithm for walking around in state space:• Draw sample z∗ ∼ q(z|z(τ))• Accept sample with probability

A(z∗,z(τ)) = min(1,

p̃(z∗)

p̃(z(τ))

)• If accepted z(τ+1) = z∗, else z(τ+1) = z(τ)

• Note that if z∗ is better than z(τ), it is always accepted• Every iteration produces a sample

• Though sometimes it’s the same as previous• Contrast with rejection sampling

• The basic Metropolis algorithm assumes the proposaldistribution is symmetric q(zA|zB) = q(zB|zA)

©Möller/Mori 43

Page 44: Sampling Methods - Machine Learningvda.univie.ac.at/Teaching/ML/15s/LectureNotes/11_sampling.pdf · Sampling Rejection Sampling Importance Sampling Markov Chain Monte Carlo Sampling

Sampling Rejection Sampling Importance Sampling Markov Chain Monte Carlo

Metropolis Example

0 0.5 1 1.5 2 2.5 30

0.5

1

1.5

2

2.5

3

• p(z) is anisotropic Gaussian, proposal distribution q(z) isisotropic Gaussian

• Red lines show rejected moves, green lines show acceptedmoves

• As τ → ∞, distribution of z(τ) tends to p(z)• True if q(zA|zB) > 0• In practice, burn-in the chain, collect samples after some

iterations• Only keep every M th sample©Möller/Mori 44

Page 45: Sampling Methods - Machine Learningvda.univie.ac.at/Teaching/ML/15s/LectureNotes/11_sampling.pdf · Sampling Rejection Sampling Importance Sampling Markov Chain Monte Carlo Sampling

Sampling Rejection Sampling Importance Sampling Markov Chain Monte Carlo

Metropolis Example - Graphical Model

Cloudy

RainSprinkler

WetGrass

C

TF

.80

.20

P(R|C)C

TF

.10

.50

P(S|C)

S R

T TT FF TF F

.90

.90

.99

P(W|S,R)

P(C).50

.01

• Consider running Metropolis algorithm to draw samples fromp(cloudy, rain|spr = t, wg = t)

• Define q(z|zτ ) to be uniformly pick from cloudy, rain,uniformly reset its value

©Möller/Mori 45

Page 46: Sampling Methods - Machine Learningvda.univie.ac.at/Teaching/ML/15s/LectureNotes/11_sampling.pdf · Sampling Rejection Sampling Importance Sampling Markov Chain Monte Carlo Sampling

Sampling Rejection Sampling Importance Sampling Markov Chain Monte Carlo

Metropolis ExampleCloudy

RainSprinkler

WetGrass

Cloudy

RainSprinkler

WetGrass

Cloudy

RainSprinkler

WetGrass

Cloudy

RainSprinkler

WetGrass

• Walk around in this state space, keep track of how manytimes each state occurs

©Möller/Mori 46

Page 47: Sampling Methods - Machine Learningvda.univie.ac.at/Teaching/ML/15s/LectureNotes/11_sampling.pdf · Sampling Rejection Sampling Importance Sampling Markov Chain Monte Carlo Sampling

Sampling Rejection Sampling Importance Sampling Markov Chain Monte Carlo

Metropolis-Hastings Algorithm

• A generalization of the previous algorithm for asymmetricproposal distributions known as the Metropolis-Hastingsalgorithm

• Accept a step with probability

A(z∗, z(τ)) = min(1,

p̃(z∗)q(z(τ)|z∗)

p̃(z(τ))q(z∗|z(τ))

)

• A sufficient condition for this algorithm to produce thecorrect distribution is detailed balance

©Möller/Mori 47

Page 48: Sampling Methods - Machine Learningvda.univie.ac.at/Teaching/ML/15s/LectureNotes/11_sampling.pdf · Sampling Rejection Sampling Importance Sampling Markov Chain Monte Carlo Sampling

Sampling Rejection Sampling Importance Sampling Markov Chain Monte Carlo

Gibbs Sampling

• A simple coordinate-wise MCMC method• Given distribution p(z) = p(z1, . . . , zM ), sample each

variable (either in pre-defined or random order)• Sample z

(τ+1)1 ∼ p(z1|z(τ)2 , z

(τ)3 , . . . , z

(τ)M )

• Sample z(τ+1)2 ∼ p(z2|z(τ+1)

1 , z(τ)3 , . . . , z

(τ)M )

• . . .• Sample z

(τ+1)M ∼ p(zM |z(τ+1)

1 , z(τ+1)2 , . . . , z

(τ+1)M−1 )

• These are easy if Markov blanket is small, e.g. in MRF withsmall cliques, and forms amenable to sampling

©Möller/Mori 48

Page 49: Sampling Methods - Machine Learningvda.univie.ac.at/Teaching/ML/15s/LectureNotes/11_sampling.pdf · Sampling Rejection Sampling Importance Sampling Markov Chain Monte Carlo Sampling

Sampling Rejection Sampling Importance Sampling Markov Chain Monte Carlo

Gibbs Sampling - Example

z1

z2

L

l

©Möller/Mori 49

Page 50: Sampling Methods - Machine Learningvda.univie.ac.at/Teaching/ML/15s/LectureNotes/11_sampling.pdf · Sampling Rejection Sampling Importance Sampling Markov Chain Monte Carlo Sampling

Sampling Rejection Sampling Importance Sampling Markov Chain Monte Carlo

Gibbs Sampling Example - Graphical Model

Cloudy

RainSprinkler

WetGrass

C

TF

.80

.20

P(R|C)C

TF

.10

.50

P(S|C)

S R

T TT FF TF F

.90

.90

.99

P(W|S,R)

P(C).50

.01

• Consider running Gibbs sampling onp(cloudy, rain|spr = t, wg = t)

• q(z|zτ ): pick from cloudy, rain, reset its value according top(cloudy|rain, spr, wg) (or p(rain|cloudy, spr, wg))

• This is often easy – only need to look at Markov blanket©Möller/Mori 50

Page 51: Sampling Methods - Machine Learningvda.univie.ac.at/Teaching/ML/15s/LectureNotes/11_sampling.pdf · Sampling Rejection Sampling Importance Sampling Markov Chain Monte Carlo Sampling

Sampling Rejection Sampling Importance Sampling Markov Chain Monte Carlo

Conclusion

• Readings: Ch. 11.1-11.3 (we skipped much of it)• Sampling methods use proposal distributions to obtain

samples from complicated distributions• Different methods, different methods of correcting for

proposal distribution not matching desired distribution• In practice, effectiveness relies on having good proposal

distribution, which matches desired distribution well

©Möller/Mori 51