Markov Chain Monte Carlo for Machine Learning

Sara Beery, Natalie Bernat, and Eric Zhan

Advanced Topics in Machine Learning
California Institute of Technology

April 20th, 2017

1 / 50


Table of Contents

1 MCMC Motivation

2 Monte Carlo Principle and Sampling Methods

3 MCMC Algorithms

4 Applications

2 / 50


History of Monte Carlo methods

• Enrico Fermi used to calculate incredibly accurate predictions using statistical sampling methods when he had insomnia, in order to impress his friends.

• Stan Ulam used sampling to approximate the probability that a solitaire layout would be successful after the combinatorics proved too difficult.

• Both of these examples get at the heart of Monte Carlo methods: selecting a statistical sample to approximate a hard combinatorial problem by a much simpler problem.

• These ideas have been extended to fields across science, spurred on by the development of computers.

3 / 50



Table of Contents

1 MCMC Motivation

2 Monte Carlo Principle and Sampling Methods

3 MCMC Algorithms

4 Applications

4 / 50


Bayesian Inference and Learning

We often need to solve the following intractable problems in Bayesian statistics, given unknown variables x ∈ X and data y ∈ Y:

• Normalization: we need to compute the normalizing factor ∫_X p(y | x′) p(x′) dx′ in Bayes' theorem in order to compute the posterior p(x | y) given p(x) and p(y | x)

• Marginalization: computing the marginal posterior p(x | y) = ∫_Z p(x, z | y) dz given the joint posterior of (x, z) ∈ X × Z

• Expectation: obtaining the expectation E_{p(x|y)}(f(x)) = ∫_X f(x) p(x | y) dx for some function f : X → R^{n_f}, such as the conditional mean or the conditional covariance

5 / 50



Other problems that can be approximated with MCMC

• Statistical mechanics: computing the partition function of a system can involve a large sum over possible configurations

• Optimization: finding the optimal solution over a large feasible set

• Penalized likelihood model selection: avoiding wasting computing resources on computing maximum likelihood estimates over a large initial set of models

6 / 50


Table of Contents

1 MCMC Motivation

2 Monte Carlo Principle and Sampling Methods

3 MCMC Algorithms

4 Applications

7 / 50


Introduction to Monte Carlo

We first draw N i.i.d. samples {x^(i)}_{i=1}^N from a target density p(x) on a high-dimensional space X. We can use these N samples to estimate the target density p(x) using a point-mass function:

    p_N(x) = (1/N) ∑_{i=1}^N δ_{x^(i)}(x)    (1)

where δ_{x^(i)}(x) is the Dirac delta mass located at x^(i).

8 / 50


Introduction to Monte Carlo

This allows us to approximate integrals or large sums I(f) with tractable, smaller sums I_N(f). These converge by the strong law of large numbers:

    I_N(f) = (1/N) ∑_{i=1}^N f(x^(i))  →(a.s., N → ∞)  I(f) = ∫_X f(x) p(x) dx    (2)
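The estimator in (2) can be sanity-checked in a few lines of Python; this is a minimal sketch, and the target (Uniform[0, 1]) and test function (x²) are illustrative choices, not taken from the slides:

```python
import random

def monte_carlo_estimate(f, sample_p, n, seed=0):
    """I_N(f) = (1/N) * sum_i f(x^(i)), with x^(i) drawn i.i.d. from p."""
    rng = random.Random(seed)
    return sum(f(sample_p(rng)) for _ in range(n)) / n

# Example: E[x^2] for x ~ Uniform[0, 1] is exactly 1/3.
estimate = monte_carlo_estimate(lambda x: x * x, lambda rng: rng.random(), n=100_000)
```

With 100,000 samples the estimate lands within a few thousandths of 1/3, consistent with the O(1/√N) error predicted by the CLT on the next slide.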

9 / 50


Introduction to Monte Carlo

• Note that I_N(f) is unbiased.

• If the variance of f(x) satisfies σ²_f = E_{p(x)}(f²(x)) − I²(f) < ∞, then var(I_N(f)) = σ²_f / N.

• Bounded variance of f also guarantees convergence in distribution of the error via the central limit theorem:

    √N (I_N(f) − I(f))  ⇒(N → ∞)  N(0, σ²_f)    (3)

10 / 50


Inverse Transform Sampling

How do we sample from “easy-to-sample” distributions?

Suppose we want to sample from a distribution with density f and CDF F.

• Sample u ∼ Uniform[0, 1]

• Compute x = F⁻¹(u) (this gives us the largest x such that P(X ≤ x) ≤ u)

• Then x is distributed according to f
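As a concrete sketch of these steps, consider the exponential distribution Exp(λ), whose inverse CDF has a closed form; the distribution choice and helper name are illustrative, not from the slides:

```python
import math
import random

def sample_exponential(lam, n, seed=0):
    """Inverse transform sampling for Exp(lam):
    F(x) = 1 - exp(-lam * x), so F^{-1}(u) = -ln(1 - u) / lam."""
    rng = random.Random(seed)
    # u ~ Uniform[0, 1), pushed through the inverse CDF
    return [-math.log(1.0 - rng.random()) / lam for _ in range(n)]

xs = sample_exponential(lam=2.0, n=100_000)
mean = sum(xs) / len(xs)  # Exp(2) has mean 1/lam = 0.5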

11 / 50


Inverse Transform Sampling

Example: N (0, 1)

But for more complicated distributions, we may not have an explicit formula for F⁻¹(u). In these cases we can consider other sampling methods.

12 / 50


Rejection Sampling

Main idea:

• It can be difficult to sample from complicated distributions

• Instead, we sample from a simple distribution that bounds our complicated distribution from above, and reject any sample that does not fall under the complicated distribution

• If we take enough samples, this gives us a good approximation under the Monte Carlo principle
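The procedure above can be sketched generically; this is a minimal illustration, assuming an envelope constant M with M·q(x) ≥ p̃(x) everywhere, and the Beta(2, 2) target below is my example, not one from the slides:

```python
import random

def rejection_sample(p_tilde, q_sample, q_pdf, M, n, seed=0):
    """Draw n samples from the density proportional to p_tilde,
    using an envelope M * q(x) >= p_tilde(x) for all x."""
    rng = random.Random(seed)
    out = []
    while len(out) < n:
        x = q_sample(rng)          # propose from the simple distribution q
        u = rng.random()
        if u < p_tilde(x) / (M * q_pdf(x)):
            out.append(x)          # keep only samples under the target curve
    return out

# Target: Beta(2,2) density 6x(1-x) on [0,1]; proposal: Uniform[0,1]; M = 1.5
# (the target's maximum), so the envelope just covers the target.
samples = rejection_sample(lambda x: 6 * x * (1 - x),
                           lambda rng: rng.random(),
                           lambda x: 1.0, M=1.5, n=50_000)
mean = sum(samples) / len(samples)  # Beta(2,2) has mean 0.5
```

Note the acceptance rate here is 1/M = 2/3; the next slides show why this rate collapses in high dimensions.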

13 / 50


Rejection Sampling: Sampling from a density

This is how we sample if we can compute the inverse CDF

14 / 50


Rejection Sampling: Sampling from a density

And this is a good way to approximate if you can’t

15 / 50


Rejection Sampling: Sampling from a density

16 / 50


Rejection Sampling: Drawbacks

Consider approximating the volume of the unit sphere by sampling uniformly from the enclosing hypercube and rejecting any sample not inside the sphere. In low dimensions this is fine, but as the number of dimensions gets large, the volume of the unit sphere goes to zero relative to the volume of the hypercube, resulting in an intractable number of rejected samples.

Rejection sampling is impractical in high dimensions!

17 / 50


Importance Sampling

• Importance sampling is used to estimate properties of a particular distribution of interest. Essentially, we transform a difficult integral into an expectation over a simpler proposal distribution.

• It relies on drawing samples from a proposal distribution in order to develop the approximation

• Convergence time depends heavily on an appropriate choice of proposal distribution

18 / 50


Importance Sampling: visual example

19 / 50


Importance Sampling

If we choose a proposal distribution q(x) such that supp(p(x)) ⊂ supp(q(x)), then we can write

    I(f) = ∫ f(x) w(x) q(x) dx    (4)

where w(x) = p(x)/q(x) is known as the importance weight.

20 / 50


Importance Sampling

If we can simulate N i.i.d. samples {x^(i)}_{i=1}^N from q(x) and evaluate w(x^(i)), then we can approximate I(f) as

    I_N(f) = (1/N) ∑_{i=1}^N f(x^(i)) w(x^(i))    (5)

which converges to I(f) by the strong law of large numbers.
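The estimator (5) is short to implement; this sketch estimates E_{N(0,1)}[x²] = 1 using a wider Gaussian proposal, where the concrete densities and helper names are my illustrative choices, not from the slides:

```python
import math
import random

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2)."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def importance_estimate(f, p_pdf, q_pdf, q_sample, n, seed=0):
    """I_N(f) = (1/N) * sum_i f(x^(i)) * w(x^(i)), with x^(i) ~ q and w = p/q."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = q_sample(rng)
        total += f(x) * p_pdf(x) / q_pdf(x)   # f(x) weighted by importance weight
    return total / n

# E_{N(0,1)}[x^2] = 1, estimated with samples from the wider proposal q = N(0, 2^2);
# the wide proposal covers the support of p, keeping the weights bounded.
estimate = importance_estimate(
    f=lambda x: x * x,
    p_pdf=lambda x: normal_pdf(x, 0.0, 1.0),
    q_pdf=lambda x: normal_pdf(x, 0.0, 2.0),
    q_sample=lambda rng: rng.gauss(0.0, 2.0),
    n=200_000,
)
```

Swapping in a narrower proposal than p would make the weights heavy-tailed and the estimate far noisier, which is the choice-of-q issue the next slide discusses.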

21 / 50


Importance Sampling

• Choosing an appropriate q(x) is very important. In importance sampling, we can choose an optimal q by finding one that minimizes the variance of I_N(f)

• By sampling from areas of high “importance” we can achieve very high sampling efficiency

• It can be hard to find a candidate q in high dimensions. One strategy is to use adaptive importance sampling, which parametrizes q.

22 / 50


Table of Contents

1 MCMC Motivation

2 Monte Carlo Principle and Sampling Methods

3 MCMC Algorithms

4 Applications

23 / 50


Markov Chain Monte Carlo

Main idea:

• Given a target distribution p, construct a Markov chain over the sample space χ with invariant distribution equal to p.

• Perform a random walk, collecting a sample at each iteration.

• Use the sample histogram to approximate the target distribution p.

• This is called the Metropolis-Hastings algorithm.

Two important things to note for MCMC:

• We assume we can evaluate p(x) for any x ∈ χ.

• We only need to know p(x) up to a normalizing constant.

24 / 50



Review: Markov chains

Suppose a stochastic process x^(i) can only take n values in χ = {x₁, x₂, ..., xₙ}.

Then x^(i) is a Markov chain if Pr(x^(i) | x^(i−1), ..., x^(1)) = T(x^(i) | x^(i−1)).

The transition matrix T is n × n with row sums equal to 1 (aka a stochastic matrix).

• T_ij = T(x_j | x_i), the “transition probability from state x_i to state x_j”

A row vector π is a stationary distribution if πT = π.

A row vector π is a (unique) invariant/limiting distribution if lim_{n→∞} μT^n = π for all initial distributions μ.

An invariant distribution is also a stationary distribution.
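These definitions are easy to check numerically; the sketch below iterates μT^n for a small chain until it settles, then verifies stationarity. The 3-state transition matrix is an arbitrary illustration, not from the slides:

```python
def row_vec_times_matrix(mu, T):
    """(mu T)_j = sum_i mu_i * T_ij, for a row vector mu and square matrix T."""
    n = len(T)
    return [sum(mu[i] * T[i][j] for i in range(n)) for j in range(n)]

# An irreducible, aperiodic 3-state chain (each row sums to 1).
T = [[0.9, 0.1, 0.0],
     [0.2, 0.6, 0.2],
     [0.0, 0.3, 0.7]]

mu = [1.0, 0.0, 0.0]                 # arbitrary initial distribution
for _ in range(500):                 # mu T^n converges to the invariant pi
    mu = row_vec_times_matrix(mu, T)

pi = mu
pi_T = row_vec_times_matrix(pi, T)   # stationarity check: pi T == pi (up to rounding)
```

Starting from any other initial μ yields the same limit, which is exactly the invariant-distribution property stated above.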

25 / 50


Review: Markov chains

Theorem: A Markov chain has an invariant distribution if it is:

• Irreducible: there is a positive probability of getting from any state to any other state.

• Aperiodic: for every state x_i, the gcd of the lengths of all paths from that state to itself is 1: gcd{n > 0 : Pr(x^(n) = x_i | x^(0) = x_i) > 0} = 1

If we construct an irreducible and aperiodic Markov chain T such that pT = p (i.e. p is a stationary distribution), then p is necessarily the invariant distribution.

This is the key behind the Metropolis-Hastings algorithm.

26 / 50



Reversibility / Detailed Balance Condition

How can we ensure that p is a stationary distribution of T?

Check whether T satisfies the reversibility / detailed balance condition:

    p(x^(i+1)) T(x^(i) | x^(i+1)) = p(x^(i)) T(x^(i+1) | x^(i))    (6)

Summing over x^(i) on both sides gives us:

    p(x^(i+1)) = ∑_{x^(i)} p(x^(i)) T(x^(i+1) | x^(i))    (7)

This is equivalent to pT = p.

27 / 50



Discrete vs. Continuous Markov chains

The transition matrix T becomes an integral/transition kernel K.

The detailed balance condition is the same:

    p(x^(i+1)) K(x^(i) | x^(i+1)) = p(x^(i)) K(x^(i+1) | x^(i))    (8)

Integral instead of summation:

    p(x^(i+1)) = ∫_χ p(x^(i)) K(x^(i+1) | x^(i)) dx^(i)    (9)

28 / 50


Metropolis-Hastings Algorithm

Initialize x^(0).

For i from 0 to N − 1:

• Sample u ∼ Uniform[0, 1]

• Sample x* ∼ q(x* | x^(i))

• If u < A(x^(i), x*) = min{1, [p(x*) q(x^(i) | x*)] / [p(x^(i)) q(x* | x^(i))]}:
    x^(i+1) = x*
  Else:
    x^(i+1) = x^(i)

q is called the proposal distribution and should be easy to sample.
A(x^(i), x*) is called the acceptance probability of moving from x^(i) to x*.
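The loop above translates almost line-for-line into code. This is a sketch, not a definitive implementation: it works with log densities for numerical stability (so the acceptance ratio never requires the normalizing constant), and the bimodal target and N(x, 100) random-walk proposal mirror the example slide that follows:

```python
import math
import random

def metropolis_hastings(log_p, q_sample, log_q, x0, n, seed=0):
    """Metropolis-Hastings sampler; log_p may omit the normalizing constant."""
    rng = random.Random(seed)
    x, samples = x0, []
    for _ in range(n):
        x_star = q_sample(rng, x)
        # log of A = min{1, p(x*) q(x|x*) / (p(x) q(x*|x))}
        log_A = (log_p(x_star) + log_q(x, x_star)) - (log_p(x) + log_q(x_star, x))
        if math.log(rng.random()) < min(0.0, log_A):
            x = x_star                 # accept the proposed move
        samples.append(x)              # on rejection, the current state repeats
    return samples

# Bimodal target p(x) ∝ 0.1 exp(-0.2 x^2) + 0.7 exp(-0.2 (x - 10)^2);
# the tiny floor guards log() against floating-point underflow far from the modes.
log_p = lambda x: math.log(0.1 * math.exp(-0.2 * x * x)
                           + 0.7 * math.exp(-0.2 * (x - 10.0) ** 2) + 1e-300)
# Gaussian random-walk proposal N(x, 100) is symmetric, so the q terms cancel.
samples = metropolis_hastings(log_p,
                              q_sample=lambda rng, x: rng.gauss(x, 10.0),
                              log_q=lambda a, b: 0.0,
                              x0=0.0, n=20_000)
```

With a symmetric proposal this reduces to the plain Metropolis algorithm; the chain visits both modes near 0 and 10, and a histogram of `samples` approximates p.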

29 / 50


Metropolis-Hastings Algorithm

30 / 50


Example of MH algorithm

p(x) ∝ 0.1 exp(−0.2x²) + 0.7 exp(−0.2(x − 10)²), a bimodal distribution
q(x* | x^(i)) = N(x^(i), 100)

31 / 50


Verify: detailed balance condition

The transition kernel for the MH algorithm is:

    K_MH(x^(i+1) | x^(i)) = q(x^(i+1) | x^(i)) · A(x^(i), x^(i+1)) + δ_{x^(i)}(x^(i+1)) · r(x^(i))    (10)

where r(x^(i)) = ∫_χ q(x* | x^(i)) (1 − A(x^(i), x*)) dx* is the “rejection” term.

We can check that K_MH satisfies the detailed balance condition:

    p(x^(i+1)) K_MH(x^(i) | x^(i+1)) = p(x^(i)) K_MH(x^(i+1) | x^(i))    (11)

32 / 50



Verify: irreducibility and aperiodicity

Irreducibility holds as long as the support of q includes the support of p.

Aperiodicity holds by construction. By allowing for rejection, we are creatingself-loops in our Markov chain.

33 / 50


Disadvantages of MH algorithm

Samples are not independent

• Nearby samples are correlated.
• We can use “thinning” by keeping only every d-th sample, but this is often unnecessary.
• Just take more samples.
• Theoretically, the sample histogram is an unbiased estimator for p.

34 / 50


Disadvantages of MH algorithm

Initial samples are biased

• The starting state is not chosen according to p, meaning the random walk might start in a low-density region of p.
• We can employ a burn-in period, where the first few samples are discarded.
• How many samples to discard depends on the mixing time of the Markov chain.

35 / 50

Page 45: Markov Chain Monte Carlo for Machine Learning · Markov Chain Monte Carlo for Machine Learning Sara Beery, Natalie Bernat, and Eric Zhan Advanced Topics in Machine Learning California


Convergence of Random Walk

How many time-steps do we need to converge to p?

• Theoretically, infinite.

• But being “close” is often a good enough approximation.

Total Variation norm:

• Finite case: δ(q, p) = (1/2) ∑_{x ∈ χ} |q(x) − p(x)|

• General case: δ(q, p) = sup_{A ⊆ χ} |q(A) − p(A)|

ε-mixing time: τ_x(ε) = min{ t : δ(K_x^{(t′)}, p) ≤ ε, ∀ t′ > t }

Mixing times of a Markov chain can be bounded by its second eigenvalue and its conductance (a measure of connectivity).

CS139 Advanced Algorithms covers this in great detail.

36 / 50
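The finite-case Total Variation formula and the role of the second eigenvalue can be checked numerically. A small Python sketch (the 3-state transition matrix is an illustrative example, not from the slides):

```python
import numpy as np

def tv(q, p):
    """Total variation distance, finite case: (1/2) * sum |q(x) - p(x)|."""
    return 0.5 * np.abs(np.asarray(q) - np.asarray(p)).sum()

# A small transition matrix K (rows sum to 1); example values only.
K = np.array([[0.5, 0.4, 0.1],
              [0.2, 0.6, 0.2],
              [0.1, 0.4, 0.5]])

# Stationary distribution p: left eigenvector of K with eigenvalue 1.
vals, vecs = np.linalg.eig(K.T)
p = np.real(vecs[:, np.argmax(np.real(vals))])
p /= p.sum()

# delta(K_x^(t), p) for a chain started at state 0, as t grows.
q = np.array([1.0, 0.0, 0.0])
for t in range(1, 6):
    q = q @ K
    print(t, tv(q, p))  # decays geometrically, at the rate of |lambda_2|
```

For this K the second eigenvalue is 0.4, so the printed distances shrink by roughly that factor each step, which is the bound the slide alludes to.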


Page 48: Markov Chain Monte Carlo for Machine Learning · Markov Chain Monte Carlo for Machine Learning Sara Beery, Natalie Bernat, and Eric Zhan Advanced Topics in Machine Learning California


Other MCMC algorithms

Many MCMC algorithms are just special cases of the MH algorithm.

Independent sampler

• Assume q(x* | x^(i)) = q(x*), independent of the current state

• A(x^(i), x*) = min{ 1, [p(x*) q(x^(i))] / [p(x^(i)) q(x*)] } = min{ 1, w(x*) / w(x^(i)) }

Metropolis algorithm

• Assume q(x* | x^(i)) = q(x^(i) | x*), symmetric random walk

• A(x^(i), x*) = min{ 1, p(x*) / p(x^(i)) }

37 / 50
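The independent-sampler acceptance ratio w(x*)/w(x^(i)) can be seen in a short Python sketch. The target p = N(0,1) and proposal q = N(0,4) are illustrative choices, not from the slides; only the ratio of (unnormalized) densities matters.

```python
import numpy as np

# Independence sampler: proposals q(x*) ignore the current state, and the
# acceptance ratio reduces to w(x*)/w(x_i), where w(x) = p(x)/q(x).
rng = np.random.default_rng(1)

def log_w(x):
    # log w(x) = log p(x) - log q(x), up to an additive constant
    return -0.5 * x**2 + 0.5 * (x / 2.0)**2

x, lw = 0.0, log_w(0.0)
samples = []
for _ in range(20000):
    x_star = 2.0 * rng.normal()              # draw from q, independent of x
    lw_star = log_w(x_star)
    if np.log(rng.random()) < lw_star - lw:  # A = min(1, w(x*)/w(x))
        x, lw = x_star, lw_star
    samples.append(x)

print(np.mean(samples), np.std(samples))  # roughly 0 and 1
```

Because q is wider than p here, w(x) is bounded and the sampler mixes well; a proposal narrower than the target would make w unbounded and mixing poor.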

Page 49: Markov Chain Monte Carlo for Machine Learning · Markov Chain Monte Carlo for Machine Learning Sara Beery, Natalie Bernat, and Eric Zhan Advanced Topics in Machine Learning California


Other MCMC algorithms

Simulated Annealing

Monte Carlo EM

Hybrid Monte Carlo

Slice Sampler

Reversible jump MCMC

Gibbs Sampler

38 / 50

Page 50: Markov Chain Monte Carlo for Machine Learning · Markov Chain Monte Carlo for Machine Learning Sara Beery, Natalie Bernat, and Eric Zhan Advanced Topics in Machine Learning California


Gibbs Sampler

Assume each x is an n-dimensional vector and we know the full conditional probabilities: p(x_j | x_1, ..., x_{j−1}, x_{j+1}, ..., x_n) = p(x_j | x_{−j}). Furthermore, assume the full conditional probabilities are easy to sample from.

Then we can choose proposal distribution for each j :

q(x* | x^(i)) = p(x*_j | x^(i)_{−j})  if x*_{−j} = x^(i)_{−j} (meaning all other coordinates match), and 0 otherwise.

With acceptance probability:

A(x^(i), x*) = min{ 1, [p(x*) p(x^(i)_j | x^(i)_{−j})] / [p(x^(i)) p(x*_j | x^(i)_{−j})] } = min{ 1, p(x*_{−j}) / p(x^(i)_{−j}) } = 1

39 / 50
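Since every Gibbs proposal is accepted (A = 1), the sampler is just a loop over coordinate-wise conditional draws. A minimal Python sketch for a bivariate Gaussian with correlation ρ, where both full conditionals are 1-D Gaussians (the specific target is an illustrative choice, not from the slides):

```python
import numpy as np

# Gibbs sampler for a bivariate Gaussian with unit variances and
# correlation rho: p(x0 | x1) = N(rho * x1, 1 - rho^2), and symmetrically
# for p(x1 | x0). No accept/reject step is needed.
rho = 0.8
rng = np.random.default_rng(2)
x = np.zeros(2)
samples = []
for _ in range(20000):
    x[0] = rho * x[1] + np.sqrt(1 - rho**2) * rng.normal()  # draw x0 | x1
    x[1] = rho * x[0] + np.sqrt(1 - rho**2) * rng.normal()  # draw x1 | x0
    samples.append(x.copy())
samples = np.array(samples)

print(np.corrcoef(samples.T)[0, 1])  # roughly rho = 0.8
```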

Page 51: Markov Chain Monte Carlo for Machine Learning · Markov Chain Monte Carlo for Machine Learning Sara Beery, Natalie Bernat, and Eric Zhan Advanced Topics in Machine Learning California


Gibbs Sampler

At each iteration, we are changing one coordinate of our current state. We can choose which coordinate to change randomly, or we can go in order.

For directed graphical models (aka Bayesian networks), we can write the full conditional probability of x_j in terms of its Markov blanket:

p(x_j | x_{−j}) = p(x_j | x_{parent(j)}) ∏_{k ∈ children(j)} p(x_k | x_{parent(k)})  (12)

Recall: the Markov blanket MB(x_j) of x_j is the minimal set of nodes such that x_j is independent from the rest of the graph if MB(x_j) is observed. For directed graphical models, MB(x_j) = {parents(x_j), children(x_j), parents(children(x_j))}.

40 / 50


Page 53: Markov Chain Monte Carlo for Machine Learning · Markov Chain Monte Carlo for Machine Learning Sara Beery, Natalie Bernat, and Eric Zhan Advanced Topics in Machine Learning California


Table of Contents

1 MCMC Motivation

2 Monte Carlo Principle and Sampling Methods

3 MCMC Algorithms

4 Applications

41 / 50

Page 54: Markov Chain Monte Carlo for Machine Learning · Markov Chain Monte Carlo for Machine Learning Sara Beery, Natalie Bernat, and Eric Zhan Advanced Topics in Machine Learning California


Sequential MC and Particle Filters

Sequential MC methods provide online approximations of probability distributions for inherently sequential data

• Assumptions:
  • Initial distribution: p(x_0)
  • Dynamic model: p(x_t | x_{0:t−1}, y_{1:t−1}) (for t ≥ 1)
  • Measurement model: p(y_t | x_{0:t}, y_{1:t−1}) (for t ≥ 1)

• Aim: estimate the posterior p(x_{0:t} | y_{1:t}), including the filtering distribution p(x_t | y_{1:t}), and sample N particles {x^(i)_{0:t}}_{i=1}^N from that distribution

• Method: generate a weighted set of paths {x̃^(i)_{0:t}}_{i=1}^N, using an importance sampling framework with an appropriate proposal distribution q(x̃_{0:t} | y_{1:t})

42 / 50
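The assumptions above fit in a short bootstrap particle filter sketch. This assumes a simple 1-D linear-Gaussian model of my own choosing (x_t = a·x_{t−1} + noise, y_t = x_t + noise), not the models from the slides; the dynamic model doubles as the proposal, so each importance weight is just the measurement likelihood.

```python
import numpy as np

rng = np.random.default_rng(3)
a, q_std, r_std, N, T = 0.9, 0.5, 0.5, 1000, 50  # assumed model parameters

# Simulate a ground-truth trajectory and noisy observations.
x_true = np.zeros(T)
for t in range(1, T):
    x_true[t] = a * x_true[t - 1] + q_std * rng.normal()
y = x_true + r_std * rng.normal(size=T)

particles = rng.normal(size=N)   # draws from the initial distribution p(x0)
estimates = []
for t in range(T):
    # Propose from the dynamic model p(x_t | x_{t-1}).
    particles = a * particles + q_std * rng.normal(size=N)
    # Weight by the measurement likelihood p(y_t | x_t).
    logw = -0.5 * ((y[t] - particles) / r_std) ** 2
    w = np.exp(logw - logw.max())
    w /= w.sum()
    # Filtering-mean estimate E[x_t | y_{1:t}], then resample.
    estimates.append(np.sum(w * particles))
    particles = particles[rng.choice(N, size=N, p=w)]

print(np.mean(np.abs(np.array(estimates) - x_true)))  # small tracking error
```

Resampling at every step keeps the particle set from degenerating to a few high-weight paths, at the cost of some added variance.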


Page 58: Markov Chain Monte Carlo for Machine Learning · Markov Chain Monte Carlo for Machine Learning Sara Beery, Natalie Bernat, and Eric Zhan Advanced Topics in Machine Learning California


Sequential MC and Particle Filters

Here, q(x̃_{0:t} | y_{1:t}) = p(x_{0:t−1} | y_{1:t−1}) q(x̃_t | x_{0:t−1}, y_{1:t})

43 / 50

Page 59: Markov Chain Monte Carlo for Machine Learning · Markov Chain Monte Carlo for Machine Learning Sara Beery, Natalie Bernat, and Eric Zhan Advanced Topics in Machine Learning California


Vision Application

Tracking Multiple Interacting Targets with an MCMC-based Particle Filter

• Aim: Generate a sequence of states X_{it}, for the i-th ant at frame t

• Idea: Use an independent particle filter for each ant

44 / 50

Page 60: Markov Chain Monte Carlo for Machine Learning · Markov Chain Monte Carlo for Machine Learning Sara Beery, Natalie Bernat, and Eric Zhan Advanced Topics in Machine Learning California


Vision Application

Tracking Multiple Interacting Targets with an MCMC-based Particle Filter

• Particle Filter Assumptions:
  • Motion Model: Normally distributed, centered at the location in the previous frame X_{t−1}
  • Measurement Model: Image-comparison function ({how ant-like} − {how background-like})

45 / 50

Page 61: Markov Chain Monte Carlo for Machine Learning · Markov Chain Monte Carlo for Machine Learning Sara Beery, Natalie Bernat, and Eric Zhan Advanced Topics in Machine Learning California


Vision Application

Independent Particle Filters

• P(X_{it} | Z^{t−1}) ≈ {X^(r)_{i,t−1}, π^(r)_{i,t−1}}_{r=1}^N

• Proposal distribution: q(X_{it}) = ∑_r π^(r)_{i,t−1} P(X_{i,t} | X^(r)_{i,t−1})

• where π^(r)_{i,t−1} = P(Z_{it} | X^(r)_{it})

46 / 50


Page 64: Markov Chain Monte Carlo for Machine Learning · Markov Chain Monte Carlo for Machine Learning Sara Beery, Natalie Bernat, and Eric Zhan Advanced Topics in Machine Learning California


Vision Application

Markov Random Field (MRF) Model & the Joint MRF Particle Filter

• Modify the motion model to include pairwise interactions:
  • P(X_t | X_{t−1}) ∝ ∏_i P(X_{it} | X_{i,t−1}) ∏_{ij∈E} ψ(X_{it}, X_{jt})

• Proposal distribution (same as before!):
  • q(X_t) = ∑_r π^(r)_{t−1} ∏_i P(X_{i,t} | X^(r)_{i,t−1})
  • but now π^(r)_{t−1} = ∏_{i=1}^n P(Z_{it} | X^(r)_{it}) ∏_{ij∈E} ψ(X^(r)_{i,t}, X^(r)_{j,t})

47 / 50
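The joint MRF weight above multiplies per-target likelihoods with pairwise potentials ψ. A toy Python sketch of that weight computation, with an illustrative interaction potential of my own choosing (the function name `mrf_weight`, the Gaussian-shaped ψ, and all numeric values are hypothetical, not from the paper):

```python
import numpy as np

def mrf_weight(likelihoods, positions, edges, scale=1.0):
    """Joint particle weight: product of per-target measurement likelihoods
    P(Z_it | X_it) times pairwise potentials psi over interacting pairs.

    likelihoods: per-target likelihood values
    positions:   per-target 2-D locations
    edges:       index pairs (i, j) of interacting targets
    """
    w = np.prod(likelihoods)
    for i, j in edges:
        d = np.linalg.norm(positions[i] - positions[j])
        w *= 1.0 - np.exp(-d**2 / scale)  # psi -> 0 as two targets overlap
    return w

pos = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]])  # targets 0,1 nearly overlap
lik = np.array([0.9, 0.8, 0.7])
print(mrf_weight(lik, pos, edges=[(0, 1), (1, 2)]))
# the close pair (0, 1) shrinks the weight; the distant pair (1, 2) barely does
```

This is why the joint filter scales badly: the weight couples all targets, so the particle space is the product of all their state spaces.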

Page 65: Markov Chain Monte Carlo for Machine Learning · Markov Chain Monte Carlo for Machine Learning Sara Beery, Natalie Bernat, and Eric Zhan Advanced Topics in Machine Learning California


Vision Application

MCMC-based Particle Filter

• Because of pairwise interactions, the Joint MRF Particle Filter scales badly with the number of targets (ants)

• Intractable instance of importance sampling... smells like a good time to use an MCMC algorithm!

48 / 50

Page 66: Markov Chain Monte Carlo for Machine Learning · Markov Chain Monte Carlo for Machine Learning Sara Beery, Natalie Bernat, and Eric Zhan Advanced Topics in Machine Learning California


Vision Application

Tracking Multiple Interacting Targets with an MCMC-based Particle Filter

• Use Metropolis-Hastings instead of importance sampling

• Proposal Density: Simply the Motion Model

49 / 50

Page 67: Markov Chain Monte Carlo for Machine Learning · Markov Chain Monte Carlo for Machine Learning Sara Beery, Natalie Bernat, and Eric Zhan Advanced Topics in Machine Learning California


References

• http://www.cs.ubc.ca/~arnaud/andrieu_defreitas_doucet_jordan_intromontecarlomachinelearning.pdf

• https://smartech.gatech.edu/bitstream/handle/1853/3246/03-35.pdf?sequence=1&isAllowed=y

• https://media.nips.cc/Conferences/2015/tutorialslides/nips-2015-monte-carlo-tutorial.pdf

• https://en.wikipedia.org/wiki/Inverse_transform_sampling

• http://www.mdpi.com/1996-1073/8/6/5538/htm

50 / 50