MCMC - Markov Chain Monte Carlo: One of the top ten algorithms of the 20th century
Markov Chain Monte Carlo (MCMC)
Presented by:
Monzur Morshed, Habibur Rahman
TigerHATS (www.tigerhats.org)
The international research group dedicated to theories, simulation and modeling, new approaches, applications, experiences, development, evaluation, education, and human, cultural and industrial technology
TigerHATS - Information is power
Markov Chain Monte Carlo (MCMC)

• Markov Chain Monte Carlo = Markov chain process + Monte Carlo integration
• MCMC is a random sampling method
• The Markov Chain Monte Carlo (MCMC) method is considered to be one of the top ten algorithms of the 20th century
• The goal of MCMC is to sample x with a probability proportional to the distribution function π(x)
Markov Chain Monte Carlo
Markov Chain Monte Carlo methods generate a Markov chain of points that converges to a distribution of interest.
“Monte Carlo” : The methods employ randomness.
Markov Chain Monte Carlo (MCMC)
The basic idea of MCMC is:
• Construct a Markov chain such that:
  • the state space is the set of parameters, and
  • the stationary distribution is the posterior probability distribution of the parameters
• Simulate the chain
• Treat the realization as a sample from the posterior probability distribution

MCMC = sampling + continued search
Markov Chain Monte Carlo (MCMC)
• What is a Markov chain?
• A Markov chain is a mathematical model for a stochastic system that generates random variables X1, X2, …, Xt, where the distribution of the next random variable depends only on the current one:

p(x_t | x_1, x_2, …, x_{t-1}) = p(x_t | x_{t-1})

• The entire chain converges to a stationary probability distribution.
MCMC is a general-purpose technique for generating fair samples from a probability distribution in a high-dimensional space, using random numbers (dice) drawn from a uniform probability in a certain range.

(Hidden) Markov chain states: x_{t-1} → x_t → x_{t+1}, with x_t ~ p(x)
Independent trials of dice: z_{t-1}, z_t, z_{t+1}, with z_t ~ unif[a, b]
Monte Carlo Methods

Stochastic (non-deterministic) techniques, based on the use of random numbers and probability statistics to investigate problems.

For a large system, randomly sampled configurations (data) are used to describe the whole system.

"Hit and miss" integration is the simplest type.
The Monte Carlo principle
p(x): a target density defined over a high-dimensional space (e.g. the space of all possible configurations of a system under study).

The idea of Monte Carlo techniques is to draw a set of i.i.d. samples {x^(1), …, x^(N)} from p in order to approximate p with the empirical distribution

p_N(x) = (1/N) Σ_{i=1}^{N} δ_{x^(i)}(x)

Using these samples we can approximate integrals I(f) (or very large sums) with tractable sums that converge, as the number of samples grows, to I(f):

I_N(f) = (1/N) Σ_{i=1}^{N} f(x^(i)) ≈ I(f) = ∫ f(x) p(x) dx,   with I_N(f) → I(f) as N → ∞
iid: Independent and identically distributed random variables
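The convergence I_N(f) → I(f) above can be sketched in a few lines of Python (the target density, test function, and sample size are illustrative choices, not from the slides):

```python
import random

def mc_expectation(f, sampler, n=100_000):
    """Approximate I(f) = integral of f(x) p(x) dx by the sample mean (1/N) sum f(x_i)."""
    return sum(f(sampler()) for _ in range(n)) / n

random.seed(0)
# Under a standard normal, E[x^2] is the variance, i.e. exactly 1.
estimate = mc_expectation(lambda x: x * x, lambda: random.gauss(0.0, 1.0))
```

The sampler never needs the normalizing constant of p explicitly; it only needs a way to draw from p.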
Monte Carlo principle
• Given a very large set X and a distribution p(x) over it
• We draw an i.i.d. set of N samples x^(1), …, x^(N)
• We can then approximate the distribution using these samples:

p_N(x) = (1/N) Σ_{i=1}^{N} 1(x^(i) = x),   and p_N(x) → p(x) as N → ∞
How to build the Markov chain
Surprisingly, there are many ways to construct a Markov chain with stationary distribution π.
Perhaps the simplest is the Metropolis-Hastings algorithm.
Markov Chain Monte Carlo
• Draw random numbers from the posterior distribution
• Each number depends on the previous one
• Start from an arbitrary value
• The simulation "finds" the posterior distribution and provides random numbers from it
• Advantage: very complex models can be analyzed
• Disadvantage: the length of the searching phase is difficult to identify
How MCMC works
Key idea is to construct a discrete time Markov chain X1, X2, X3, … on state space S whose stationary distribution is π.
If P(y, dx) is the transition kernel of the chain, this means that

π(dx) = ∫_S π(dy) P(y, dx)
How MCMC works (2)
Subject to some technical conditions, the distribution of X_n → π as n → ∞.

Thus, to obtain samples from π, we simulate the chain and sample from it after a "long time".
Markov Process: Simple Example

Suppose that an orange juice company controls 20% of the OJ market.

Suppose they hire a market research company to predict the effect of an aggressive ad campaign.

Suppose they conclude:
• Someone using Brand A will stay with Brand A with 90% probability
• Someone NOT using Brand A will switch to Brand A with 70% probability
Markov Process

Consumers buy orange juice (OJ) once a week.
• A = uses Brand A
• A′ = uses another brand

Transition diagram: A stays A with probability 0.9 and moves to A′ with probability 0.1; A′ moves to A with probability 0.7 and stays A′ with probability 0.3.

The transition matrix:

P = [ 0.9  0.1 ]
    [ 0.7  0.3 ]
Markov Chain

Initial state distribution matrix S0 = [0.20 0.80]

What is the probability of using Brand A after 1 week?

S1 = S0 · P = [0.20 0.80] · [ 0.9 0.1 ; 0.7 0.3 ] = [0.74 0.26]   (first state matrix)

So after 1 week, Brand A controls 74% of the OJ market.
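The one-week update can be reproduced in a few lines of Python (plain lists, no libraries; iterating 50 steps toward the stationary distribution π = (0.875, 0.125) is an added illustration, not from the slides):

```python
# Transition matrix over the states (A, A'); row = current state, column = next.
P = [[0.9, 0.1],
     [0.7, 0.3]]
s0 = [0.20, 0.80]  # initial market shares S0

def step(s, P):
    """One week of the chain: s_{t+1} = s_t . P (row vector times matrix)."""
    return [sum(s[i] * P[i][j] for i in range(2)) for j in range(2)]

s1 = step(s0, P)     # first state matrix S1 = [0.74, 0.26]
s = s0
for _ in range(50):  # repeated steps approach the stationary distribution
    s = step(s, P)
```

Solving πP = π by hand gives π_A = 0.7 / (0.1 + 0.7) = 0.875, which the iterated vector approaches regardless of the starting distribution.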
MCMC algorithms
• Metropolis-Hastings algorithm
• Metropolis algorithm
  • Mixtures and blocks
• Gibbs sampling
• Rejection sampling
• Random sampling
• Sequential Monte Carlo
Metropolis-Hastings

Metropolis-Hastings is an MCMC method that can sample from any distribution P, using a proposal distribution Q(x′; x).

• Initialize with a random x.
• Generate a new proposal position x′ according to Q(x′; x).
• Compute α = min( P(x′) / P(x), 1 ) and accept the change with probability α.
  (For an asymmetric proposal Q, the ratio is multiplied by Q(x; x′) / Q(x′; x).)
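The steps above can be sketched as a minimal Python sampler. It uses a symmetric Gaussian proposal, so the Q terms cancel; the target density, step scale, and burn-in length are illustrative assumptions, not from the slides:

```python
import math
import random

def metropolis(log_p, x0, n, scale=1.0, seed=0):
    """Metropolis sampler with a symmetric Gaussian proposal Q(x'; x),
    so the acceptance ratio reduces to P(x') / P(x)."""
    rng = random.Random(seed)
    x, samples = x0, []
    for _ in range(n):
        x_new = x + rng.gauss(0.0, scale)             # propose x' ~ Q(x'; x)
        alpha = min(1.0, math.exp(log_p(x_new) - log_p(x)))
        if rng.random() < alpha:                      # accept with probability alpha
            x = x_new
        samples.append(x)                             # on rejection, repeat x
    return samples

# Target: a standard normal, known only up to its normalizing constant.
samples = metropolis(lambda x: -0.5 * x * x, x0=0.0, n=50_000)
kept = samples[5_000:]                                # discard the burn-in phase
mean = sum(kept) / len(kept)
var = sum((x - mean) ** 2 for x in kept) / len(kept)
```

Working with log P avoids overflow and lets the normalizing constant of P drop out of the ratio, which is exactly why MCMC handles unnormalized posteriors well.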
Gibbs Sampling
Gibbs sampling is a special case of Metropolis-Hastings sampling in which the proposed step is always accepted.

In Gibbs sampling only one parameter is changed at a time, drawn from its full conditional distribution.

This makes Gibbs sampling particularly useful for multivariate distributions.
The Gibbs Sampler
Geman and Geman 1984, Gelfand and Smith 1990
Consider a random vector X = (X1, X2, …, Xk) with distribution π(X).

Suppose that the full set of conditional distributions π(x_i | x_{-i}) is available, where x_{-i} = (x_1, …, x_{i-1}, x_{i+1}, …, x_k).
The Gibbs Sampler
Further suppose that these conditional distributions can be sampled from.

Start at some value x^(0) = (x1^(0), x2^(0), …, xk^(0)).

The algorithm:

Sample x1^(1) from π(x1 | x2^(0), x3^(0), …, xk^(0))
Sample x2^(1) from π(x2 | x1^(1), x3^(0), …, xk^(0))
Sample x3^(1) from π(x3 | x1^(1), x2^(1), x4^(0), …, xk^(0))
The Gibbs Sampler
Cycle through the components again:

Sample xk^(1) from π(xk | x1^(1), x2^(1), …, x(k-1)^(1))

At time n, update the i-th component by drawing a value xi^(n) from

π(xi | x1^(n), …, x(i-1)^(n), x(i+1)^(n-1), …, xk^(n-1))
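The component-wise updates can be sketched for a case where both full conditionals are known in closed form. This assumes a zero-mean, unit-variance bivariate normal target with correlation ρ = 0.8; all constants are illustrative choices, not from the slides:

```python
import math
import random

def gibbs_bivariate_normal(rho, n, seed=0):
    """Gibbs sampler for a zero-mean, unit-variance bivariate normal with
    correlation rho; each full conditional is N(rho * other, 1 - rho^2)."""
    rng = random.Random(seed)
    sd = math.sqrt(1.0 - rho * rho)
    x1 = x2 = 0.0
    draws = []
    for _ in range(n):
        x1 = rng.gauss(rho * x2, sd)   # x1 ~ pi(x1 | x2), using the previous x2
        x2 = rng.gauss(rho * x1, sd)   # x2 ~ pi(x2 | x1), using the fresh x1
        draws.append((x1, x2))
    return draws

draws = gibbs_bivariate_normal(rho=0.8, n=50_000)[5_000:]   # drop burn-in
corr_hat = sum(a * b for a, b in draws) / len(draws)        # estimates E[x1 x2] = rho
```

Note that each update immediately uses the freshest value of the other component, exactly as in the cycling scheme above.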
Example: Random Walker (Sample)
A drunken walker walks in discrete steps. In each step, he walks to the right with probability ½ and to the left with probability ½. He doesn't remember his previous steps.
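A minimal simulation of this walker (the step count and seed are arbitrary choices):

```python
import random

def random_walk(steps, seed=0):
    """Each step is +1 or -1 with probability 1/2; the next position depends
    only on the current one (the Markov property)."""
    rng = random.Random(seed)
    pos, path = 0, [0]
    for _ in range(steps):
        pos += 1 if rng.random() < 0.5 else -1
        path.append(pos)
    return path

path = random_walk(1_000)
```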
Rejection Sampling Method
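The slide's figure is not reproduced here. As a sketch of the standard method: propose x from an envelope distribution q and accept it with probability p(x) / (M q(x)), where M q(x) ≥ p(x) everywhere. The target and envelope below are illustrative choices:

```python
import math
import random

def rejection_sample(p, q_sample, q_pdf, M, n, seed=0):
    """Draw n samples from density p using an envelope M * q(x) >= p(x):
    propose x ~ q and accept it with probability p(x) / (M * q(x))."""
    rng = random.Random(seed)
    out = []
    while len(out) < n:
        x = q_sample(rng)
        if rng.random() < p(x) / (M * q_pdf(x)):
            out.append(x)
    return out

# Illustrative target: a standard normal restricted to [-3, 3];
# envelope: the uniform density on [-3, 3], scaled so M * q covers p.
p = lambda x: math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
q_pdf = lambda x: 1.0 / 6.0
M = p(0.0) / q_pdf(0.0)                 # p peaks at 0, so M * q(x) >= p(x)
samples = rejection_sample(p, lambda r: r.uniform(-3.0, 3.0), q_pdf, M, 10_000)
mean = sum(samples) / len(samples)
```

Unlike MCMC, accepted draws are independent, but the acceptance rate (1/M here) collapses in high dimensions, which is one motivation for the Markov chain methods above.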
Bayes’ Theorem (Rule, Law)
Bayes’ Theorem: Let events A1,…,Ak form a partition of the space S such that Pr(Aj) > 0 for all j and let B be any event such that Pr(B) > 0. Then for i = 1,..,k:
Pr( Ai | B ) = Pr( Ai ) Pr( B | Ai ) / Σk Pr( Ak ) Pr( B | Ak )

Proof:

Pr( Ai | B ) = Pr( Ai ∩ B ) / Pr( B ) = Pr( Ai ) Pr( B | Ai ) / Σk Pr( Ak ) Pr( B | Ak )
Bayes’ Theorem is just a simple rule for computing the conditional probability of the events Ai given B from the conditional probability of B given each event Ai and the unconditional probability of each Ai.
Interpretation of Bayes’ Theorem
Pr( Ai | B ) = Pr( Ai ) Pr( B | Ai ) / Σk Pr( Ak ) Pr( B | Ak )
Pr(Ai) = Prior distribution for the Ai. It summarizes your beliefs about the probability of event Ai before Ai or B are observed.
Pr( B | Ai ) = The conditional probability of B given Ai. It summarizes the likelihood of event B given Ai.
∑k Pr( Ak ) Pr( B | Ak ) = The normalizing constant. This is equal to the sum of the quantities in the numerator for all events Ak. Thus, P( Ai | B ) represents the likelihood of event Ai relative to all other elements of the partition of the sample space.
Pr( Ai | B ) = The posterior distribution of Ai given B. It represents the probability of event Ai after B has been observed.
Example of Bayes’ Theorem
What is the probability in a survey that someone is black given that they respond that they are black when asked?
- Suppose that 10% of the population is black, so Pr(B) = 0.10
- Suppose that 95% of blacks respond Yes when asked if they are black, so Pr( Y1 | B ) = 0.95
- Suppose that 5% of non-blacks respond Yes when asked if they are black, so Pr( Y1 | B^C ) = 0.05

Pr( B | Y1 ) = Pr(B) Pr(Y1 | B) / [ Pr(B) Pr(Y1 | B) + Pr(B^C) Pr(Y1 | B^C) ]
             = (0.1)(0.95) / [ (0.1)(0.95) + (0.9)(0.05) ]
             = 0.095 / 0.14 ≈ 0.68
We reach the surprising conclusion that even if 95% of black and non-black respondents correctly classify themselves according to race, the probability that someone is black given that they say they are black is less than 0.70
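The arithmetic above, checked in Python (variable names are ours):

```python
# Posterior Pr(B | Y1) via Bayes' theorem with the survey numbers above.
pr_black = 0.10            # Pr(B)
pr_yes_given_black = 0.95  # Pr(Y1 | B)
pr_yes_given_other = 0.05  # Pr(Y1 | B^c)

pr_yes = (pr_black * pr_yes_given_black
          + (1.0 - pr_black) * pr_yes_given_other)    # normalizing constant Pr(Y1)
posterior = pr_black * pr_yes_given_black / pr_yes    # Pr(B | Y1) ~= 0.679
```

The small prior Pr(B) = 0.10 dominates the high response accuracy, which is what drags the posterior below 0.70.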
Applications
• Computer vision: object tracking demo [Blake & Isard]
• Speech & audio enhancement
• Web statistics estimation
• Regression & classification
• Bayesian networks
• Genetics & molecular biology
• Robotics, etc.
Conclusion
• The Markov Chain Monte Carlo methods cover a variety of different fields and applications.
• There are great opportunities for combining existing sub-optimal algorithms with MCMC in many machine learning problems.
• Some areas already benefiting from sampling methods include:
  • Tracking, restoration, segmentation
  • Probabilistic graphical models
  • Classification
  • Data association for localization
  • Classical mixture models
TigerHATS (www.tigerhats.org)
Thank you
Bangladeshi Scientists and Researchers Network
https://www.facebook.com/groups/BDSRNet