Ch. 14: Markov Chain Monte Carlo Methods based on Stephen Marsland, Machine Learning: An Algorithmic...
-
Upload
mildred-gardner -
Category
Documents
-
view
216 -
download
0
Transcript of Ch. 14: Markov Chain Monte Carlo Methods based on Stephen Marsland, Machine Learning: An Algorithmic...
![Page 1: Ch. 14: Markov Chain Monte Carlo Methods based on Stephen Marsland, Machine Learning: An Algorithmic Perspective. CRC 2009.; C, Andrieu, N, de Freitas,](https://reader030.fdocuments.in/reader030/viewer/2022012919/5697bfbe1a28abf838ca253f/html5/thumbnails/1.jpg)
Ch. 14: Markov Chain Monte Carlo Methods
based on Stephen Marsland, Machine Learning: An Algorithmic Perspective. CRC 2009.;C, Andrieu, N, de Freitas, A, Doucet, M, I. Jordan. An Introduction to MCMC for Machine Learning. Machine Learning, Vol. 50, No. 1. pp. 5-43, 2003,and Wikipedia
Longin Jan LateckiTemple University
![Page 2: Ch. 14: Markov Chain Monte Carlo Methods based on Stephen Marsland, Machine Learning: An Algorithmic Perspective. CRC 2009.; C, Andrieu, N, de Freitas,](https://reader030.fdocuments.in/reader030/viewer/2022012919/5697bfbe1a28abf838ca253f/html5/thumbnails/2.jpg)
Random Numbers
• Pseudo-random numbers from uniform distribution:xn+1 = (axn + c) mod m,a, c, m are parameters,e.g., m=232, a=1,664,525, and c=1,013,904,223
![Page 3: Ch. 14: Markov Chain Monte Carlo Methods based on Stephen Marsland, Machine Learning: An Algorithmic Perspective. CRC 2009.; C, Andrieu, N, de Freitas,](https://reader030.fdocuments.in/reader030/viewer/2022012919/5697bfbe1a28abf838ca253f/html5/thumbnails/3.jpg)
Monte Carlo Method
While convalescing from an illness in 1946, Stanislaw Ulam was playing solitaire. It, then, occurred to him to try to compute the chances that a particular solitaire laid out with 52 cards would come out successfully (Eckhard, 1987). After attempting exhaustive combinatorial calculations, he decided to go for the more practical approach of laying out several solitaires at random and then observing and counting the number of successful plays. This idea of selecting a statistical sample to approximate a hard combinatorial problem by a much simpler problem is at the heart of modern Monte Carlo simulation. Stanislaw Ulam soon realized that computers could be used in this fashion to answer questionsof neutron diffusion and mathematical physics.
From An Introduction to MCMC for Machine Learning by: C, Andrieu, N, de Freitas, A, Doucet, M, I. JordanMachine Learning, Vol. 50, No. 1. pp. 5-43, 2003.
![Page 4: Ch. 14: Markov Chain Monte Carlo Methods based on Stephen Marsland, Machine Learning: An Algorithmic Perspective. CRC 2009.; C, Andrieu, N, de Freitas,](https://reader030.fdocuments.in/reader030/viewer/2022012919/5697bfbe1a28abf838ca253f/html5/thumbnails/4.jpg)
![Page 5: Ch. 14: Markov Chain Monte Carlo Methods based on Stephen Marsland, Machine Learning: An Algorithmic Perspective. CRC 2009.; C, Andrieu, N, de Freitas,](https://reader030.fdocuments.in/reader030/viewer/2022012919/5697bfbe1a28abf838ca253f/html5/thumbnails/5.jpg)
Random Samples and Statistical Models
• Random Sample: A random sample is a collection of random variables X1, X2,…, XN, that that have the same probability distribution and are mutually independent– If F is a distribution function of each random
variable Xi in a random sample, we speak of a random sample from F. Similarly we speak of a random sample from a density f, e.g., a random sample from an N(µ, σ2) distribution, etc.
![Page 6: Ch. 14: Markov Chain Monte Carlo Methods based on Stephen Marsland, Machine Learning: An Algorithmic Perspective. CRC 2009.; C, Andrieu, N, de Freitas,](https://reader030.fdocuments.in/reader030/viewer/2022012919/5697bfbe1a28abf838ca253f/html5/thumbnails/6.jpg)
• Statistical Model for repeated measurementsA dataset consisting of values x(1), x(2),…, x(N) of
repeated measurements of the same quantity is modeled as the realization of a random sample
X1, X2,…, XN.
The model may include a partial specification of the probability distribution of each Xi.
![Page 7: Ch. 14: Markov Chain Monte Carlo Methods based on Stephen Marsland, Machine Learning: An Algorithmic Perspective. CRC 2009.; C, Andrieu, N, de Freitas,](https://reader030.fdocuments.in/reader030/viewer/2022012919/5697bfbe1a28abf838ca253f/html5/thumbnails/7.jpg)
Distribution features and sample statistics
• Empirical Distribution Function
• Law of Large Numbers– lim N->∞ P(|FN(a) – F(a)| > ε) = 0
• This implies that for most realizations– FN(a) ≈ F(a)
N
axaF
i
N
]),((#)(
)(
![Page 8: Ch. 14: Markov Chain Monte Carlo Methods based on Stephen Marsland, Machine Learning: An Algorithmic Perspective. CRC 2009.; C, Andrieu, N, de Freitas,](https://reader030.fdocuments.in/reader030/viewer/2022012919/5697bfbe1a28abf838ca253f/html5/thumbnails/8.jpg)
• The histogram
i.e., the height of histogram on (x-h, x+h] ≈ f(x)We recall that the probability density function (pdf) f of a
random variable X is a derivative of the cumulative distribution function (cdf) F(t)=Pr(X<t):
In particular, we have
N
hxhxxxf
i ]),((#)(
)(
x
duufxF )()()()( xFdx
dxf
x
duufxd
dxF
xd
dxf )(
)()(
)()(
![Page 9: Ch. 14: Markov Chain Monte Carlo Methods based on Stephen Marsland, Machine Learning: An Algorithmic Perspective. CRC 2009.; C, Andrieu, N, de Freitas,](https://reader030.fdocuments.in/reader030/viewer/2022012919/5697bfbe1a28abf838ca253f/html5/thumbnails/9.jpg)
Monte Carlo Principle 1• The idea of Monte Carlo simulation is to draw an independent and identically distributed (i.i.d.) set of samples
{x(1), x(2),…, x(N)} from a target density p(x) defined on a high-dimensional space X. These N samples can be used to approximate the target density with the following empirical point-mass function
The convergence is almost sure (a.s.) and denotes the delta-Dirac mass located at x(i).Think about pN(x) as a histogram or kernel density estimate.
)()(1
)(1
)( xpxN
xp NN
ixN i
)()( xix
![Page 10: Ch. 14: Markov Chain Monte Carlo Methods based on Stephen Marsland, Machine Learning: An Algorithmic Perspective. CRC 2009.; C, Andrieu, N, de Freitas,](https://reader030.fdocuments.in/reader030/viewer/2022012919/5697bfbe1a28abf838ca253f/html5/thumbnails/10.jpg)
)()()(1
)( .).(
1
)()( xpxxw
Nxp saN
N
ix
iN i
Monte Carlo = Sample based pdf approximation
Regions of high density have many samples and high weights of samples.
Discrete approximation of a continuous pdf:
![Page 11: Ch. 14: Markov Chain Monte Carlo Methods based on Stephen Marsland, Machine Learning: An Algorithmic Perspective. CRC 2009.; C, Andrieu, N, de Freitas,](https://reader030.fdocuments.in/reader030/viewer/2022012919/5697bfbe1a28abf838ca253f/html5/thumbnails/11.jpg)
Samples for a Gaussian Matlab demo
![Page 12: Ch. 14: Markov Chain Monte Carlo Methods based on Stephen Marsland, Machine Learning: An Algorithmic Perspective. CRC 2009.; C, Andrieu, N, de Freitas,](https://reader030.fdocuments.in/reader030/viewer/2022012919/5697bfbe1a28abf838ca253f/html5/thumbnails/12.jpg)
Samples for exponential pdf Matlab demo
![Page 13: Ch. 14: Markov Chain Monte Carlo Methods based on Stephen Marsland, Machine Learning: An Algorithmic Perspective. CRC 2009.; C, Andrieu, N, de Freitas,](https://reader030.fdocuments.in/reader030/viewer/2022012919/5697bfbe1a28abf838ca253f/html5/thumbnails/13.jpg)
Delta-Dirac mass
• As a probability measure, the delta measure is characterized by its cumulative distribution function, which is the Heaviside function (or unit step function)
Plot of H0(z) and plot of Hc(0)
)(zc
)()( zHdz
dz cc
and 1)(
dzzc
cz
czzc 0)(
cz
czzHc 0
1)(
0 0
1 1
![Page 14: Ch. 14: Markov Chain Monte Carlo Methods based on Stephen Marsland, Machine Learning: An Algorithmic Perspective. CRC 2009.; C, Andrieu, N, de Freitas,](https://reader030.fdocuments.in/reader030/viewer/2022012919/5697bfbe1a28abf838ca253f/html5/thumbnails/14.jpg)
Delta-Dirac function
• The Dirac delta function as the limit (in the sense of distributions) of the sequence of Gaussians
)(lim)( 00 xx aa
![Page 15: Ch. 14: Markov Chain Monte Carlo Methods based on Stephen Marsland, Machine Learning: An Algorithmic Perspective. CRC 2009.; C, Andrieu, N, de Freitas,](https://reader030.fdocuments.in/reader030/viewer/2022012919/5697bfbe1a28abf838ca253f/html5/thumbnails/15.jpg)
Kernel Density Estimate• Given is set of iid samples {x(1), x(2),…, x(N)} from a density p(x)
The convergence is almost sure.
)()(1
)(1
)( xpxN
xp NN
ixN i
)()(1
)( 0,)(
1
xpxxKN
xf NiN
iN
![Page 16: Ch. 14: Markov Chain Monte Carlo Methods based on Stephen Marsland, Machine Learning: An Algorithmic Perspective. CRC 2009.; C, Andrieu, N, de Freitas,](https://reader030.fdocuments.in/reader030/viewer/2022012919/5697bfbe1a28abf838ca253f/html5/thumbnails/16.jpg)
Monte Carlo Principle 2• We can approximate the integrals, e.g., expectations (or very large sums) E( f ) with tractable sums EN( f ):
The convergence is almost sure (a.s.)
• The N samples can also be used to obtain a maximum of the objective function p(x)
x
NiN
iN
X
NiN
iN
xpxffExfN
fE
dxxpxffExfN
fE
)()()()(1
)(
)()()()(1
)(
)(
1
)(
1
![Page 17: Ch. 14: Markov Chain Monte Carlo Methods based on Stephen Marsland, Machine Learning: An Algorithmic Perspective. CRC 2009.; C, Andrieu, N, de Freitas,](https://reader030.fdocuments.in/reader030/viewer/2022012919/5697bfbe1a28abf838ca253f/html5/thumbnails/17.jpg)
Theorem 1Let x(i) be iid samples from p(x), then
Proof. By the strong law of large numbers, we have
We observe that µ is the mean or expected value of function f(x) with respect to the probability density function, while µ N is the mean of f values at the N sample points.
This proves the theorem.
X
saNiN
iNN dxxpxffExf
NfE )()()()(
1)( .).()(
1
X
saNiN
iN dxxpxffExf
NfE )()()()(
1)( .).()(
1
![Page 18: Ch. 14: Markov Chain Monte Carlo Methods based on Stephen Marsland, Machine Learning: An Algorithmic Perspective. CRC 2009.; C, Andrieu, N, de Freitas,](https://reader030.fdocuments.in/reader030/viewer/2022012919/5697bfbe1a28abf838ca253f/html5/thumbnails/18.jpg)
Example• It is possible to express all probabilities, integrals, and summations as expectations as we illustrate now.• Consider a problem (which is completely deterministic) of integrating a function f(x) from a to b (as in high-school calculus). This can be expressed as an expectation with
respect to a uniformly distributed, continuous random variable U(a,b) between a and b. U has density function pU(x) = 1/(b - a), so if we rewrite the integral we get
• Let x(i) be iid samples from U, then
b
a
b
a
saNiN
i
xfdxab
xfabxfN
ab)(
1)()()( .).()(
1
b
a
b
a
dxab
xfabxf1
)()()(
![Page 19: Ch. 14: Markov Chain Monte Carlo Methods based on Stephen Marsland, Machine Learning: An Algorithmic Perspective. CRC 2009.; C, Andrieu, N, de Freitas,](https://reader030.fdocuments.in/reader030/viewer/2022012919/5697bfbe1a28abf838ca253f/html5/thumbnails/19.jpg)
Theorem 2• Let x(i) be iid samples from p(x), then
Proof. By Theorem 1, we have
We set X=R and f(x)=Hx(t) for some t, and obtain the cdf of p(x):
Consequently,
and
X
saNiN
iNN dxxpxffExf
NfE )()()()(
1)( .).()(
1
)()(1
)( .).(
1
)( xpxN
xp saNN
ixN i
t
R xXdxxpdxxptHdxxpxf )()()()()(
t
saNN
ix
dxxptHN
i )()(1 .).(
1
)(
)()()(1
)(1 .).(
11
)()( xpdxxpt
tN
tHNt
tsaN
N
ix
N
ix ii
![Page 20: Ch. 14: Markov Chain Monte Carlo Methods based on Stephen Marsland, Machine Learning: An Algorithmic Perspective. CRC 2009.; C, Andrieu, N, de Freitas,](https://reader030.fdocuments.in/reader030/viewer/2022012919/5697bfbe1a28abf838ca253f/html5/thumbnails/20.jpg)
Importance Sampling• Sometimes it is hard or impossible to draw samples from p(x).• We have a proposal distribution q(x) such that its support includes the support of p(x), i.e., p(x) > 0 ⇒ q(x) > 0 and it is easy to
draw samples form q(x).• We draw an iid set of samples {x(1), x(2),…, x(N)} from q(x) and define importance weights as
Then samples
{(x(1), w(x(1))), {(x(2), w(x(2))),…, {(x(N), w(x(N))), }
are weighted samples from p(x).
)(
)()(
)(
)()(
i
ii
xq
xpxw
![Page 21: Ch. 14: Markov Chain Monte Carlo Methods based on Stephen Marsland, Machine Learning: An Algorithmic Perspective. CRC 2009.; C, Andrieu, N, de Freitas,](https://reader030.fdocuments.in/reader030/viewer/2022012919/5697bfbe1a28abf838ca253f/html5/thumbnails/21.jpg)
Theorem 3: Importance Sampling• We have a proposal distribution q(x) such that its support includes the support of p(x), i.e., p(x) >
0 ⇒ q(x) > 0. We draw an iid set of samples {x(1), x(2),…, x(N)} from q(x) and define importance weights as
Then
Proof: By Theorem 1 applied to q(x) and f(x)=h(x)w(x):)()()(
1)( .).(
1
)()( xpxxw
Nxp saN
N
ix
iN i
)(
)()(
xq
xpxw
XX
saNiiN
i
dxxpxhdxxqxq
xpxhxwxh
N)()()(
)(
)()()()(
1 .).()()(
1
We set X=R and and h(x)=Hx(t) and obtain the cdf of p(x):
Finally, we have
t
R xXdxxpdxxptHdxxpxh )()()()()(
)()()()(1
)()(1 .).()(
1
)(
1
)()( xpdxxpt
xwtN
xwtHNt
tsaNi
N
ix
iN
ix ii
![Page 22: Ch. 14: Markov Chain Monte Carlo Methods based on Stephen Marsland, Machine Learning: An Algorithmic Perspective. CRC 2009.; C, Andrieu, N, de Freitas,](https://reader030.fdocuments.in/reader030/viewer/2022012919/5697bfbe1a28abf838ca253f/html5/thumbnails/22.jpg)
Importance Sampling 2• When the normalizing constant of p(x) is unknown, it is still possible to apply the importance sampling method:• We draw an iid set of samples {x(1), x(2),…, x(N)} from q(x)
and define importance weights as
Then samples are weighted samples from p(x).
Before
now
N
i
i
ii
xw
xwxw
1
)(
)()(
)(
)()(~
Ni
ii xwx 1)()( ))}(~,{(
X
saNiiN
i
dxxqxwxhxwxhN
)()()()()(1 .).()()(
1
X
XsaN
iN
i
iiN
iN
i
ii
dxxqxw
dxxqxwxh
xwN
xwxhN
xwxhN )()(
)()()(
)(1
)()(1
)(~)(1 .).(
)(
1
)()(
1
1
)()(
![Page 23: Ch. 14: Markov Chain Monte Carlo Methods based on Stephen Marsland, Machine Learning: An Algorithmic Perspective. CRC 2009.; C, Andrieu, N, de Freitas,](https://reader030.fdocuments.in/reader030/viewer/2022012919/5697bfbe1a28abf838ca253f/html5/thumbnails/23.jpg)
Sampling Importance Resampling (SIR)• We draw an iid set of samples {x(1), x(2),…, x(N)} from q(x) and compute the importance weights as
• Normalize the weights as
• Resample from the distribution with probabilities given by the weights and again normalize the weights.
We obtain weighted samples from p(x):
N
i
i
ii
xw
xwxw
1
)(
)()(
)(
)()(~
Ni
ii xwx 1)()( ))}(~,{(
)(
)()(
)(
)()(
i
ii
xq
xpxw
Ninewnew
ii
xwx 1))}(~,{()()(
![Page 24: Ch. 14: Markov Chain Monte Carlo Methods based on Stephen Marsland, Machine Learning: An Algorithmic Perspective. CRC 2009.; C, Andrieu, N, de Freitas,](https://reader030.fdocuments.in/reader030/viewer/2022012919/5697bfbe1a28abf838ca253f/html5/thumbnails/24.jpg)
)()()(1
)( .).(
1
)()( xpxxw
Nxp saN
N
ix
iN i
Sample based pdf representation
Regions of high density have many samples and high weights of samples.
Discrete approximation of a continuous pdf:
![Page 25: Ch. 14: Markov Chain Monte Carlo Methods based on Stephen Marsland, Machine Learning: An Algorithmic Perspective. CRC 2009.; C, Andrieu, N, de Freitas,](https://reader030.fdocuments.in/reader030/viewer/2022012919/5697bfbe1a28abf838ca253f/html5/thumbnails/25.jpg)
Rejection Sampling 1Sometimes it is hard or impossible to draw samples from p(x).We can sample from a distribution p(x), which is known up to a proportionality constant, by sampling from another easy-to-sample proposal distribution q(x) that satisfies p(x) ≤ Mq(x), M < ∞, using the accept/reject procedure.The accepted x(i ) can be shown to be sampled with probability p(x).
![Page 26: Ch. 14: Markov Chain Monte Carlo Methods based on Stephen Marsland, Machine Learning: An Algorithmic Perspective. CRC 2009.; C, Andrieu, N, de Freitas,](https://reader030.fdocuments.in/reader030/viewer/2022012919/5697bfbe1a28abf838ca253f/html5/thumbnails/26.jpg)
Rejection Sampling 2
Rejection sampling suffers from severe limitations. It is not alwayspossible to bound p(x)/q(x) with a reasonable constant M over the whole space X. If M is too large, the acceptance probability will be too small. This makes the method impractical in high-dimensional scenarios.
![Page 27: Ch. 14: Markov Chain Monte Carlo Methods based on Stephen Marsland, Machine Learning: An Algorithmic Perspective. CRC 2009.; C, Andrieu, N, de Freitas,](https://reader030.fdocuments.in/reader030/viewer/2022012919/5697bfbe1a28abf838ca253f/html5/thumbnails/27.jpg)
Theorem 4: Rejection Sampling (taken from pdf)
![Page 28: Ch. 14: Markov Chain Monte Carlo Methods based on Stephen Marsland, Machine Learning: An Algorithmic Perspective. CRC 2009.; C, Andrieu, N, de Freitas,](https://reader030.fdocuments.in/reader030/viewer/2022012919/5697bfbe1a28abf838ca253f/html5/thumbnails/28.jpg)
Demo AssignmentSuppose that we only have uniform generators at hand and we need samples drawn from a Gaussian distribution, i.e., p(x) = G(0, 1). Use the uniform distribution q(x) = U(-5, 5), to draw samples form G(0, 1).
1.Use rejection sampling to get samples from p(x).2.Use importance sampling to get samples from p(x).
In both cases plot the histogram of samples overlaid on the plot of the original Gaussian. What is the percentage of samples you had to reject with the rejection sampler?
![Page 29: Ch. 14: Markov Chain Monte Carlo Methods based on Stephen Marsland, Machine Learning: An Algorithmic Perspective. CRC 2009.; C, Andrieu, N, de Freitas,](https://reader030.fdocuments.in/reader030/viewer/2022012919/5697bfbe1a28abf838ca253f/html5/thumbnails/29.jpg)
Markov Chain Monte Carlo (MCMC)
MCMC is a strategy for generating samples x(i ) while exploring the state space X using a Markov chain mechanism. This mechanism is constructed so that the chain spends more time in the most important regions. In particular, it is constructed so that the samples x(i ) mimic samples drawn from the target distribution p(x). We recall that we use MCMC when we cannot draw samples from p(x) directly, but can evaluate p(x) up to a normalizing constant.
It is intuitive to introduce Markov chains on finite state spaces, where x(i ) can only take s discrete values x(i ) X = {x∈ 1, x2, . . . , xs}. The stochastic process x(i ) is called a Markov chain if
![Page 30: Ch. 14: Markov Chain Monte Carlo Methods based on Stephen Marsland, Machine Learning: An Algorithmic Perspective. CRC 2009.; C, Andrieu, N, de Freitas,](https://reader030.fdocuments.in/reader030/viewer/2022012919/5697bfbe1a28abf838ca253f/html5/thumbnails/30.jpg)
Metropolis-Hastings algorithm
We obtain a set of samples {x(1), x(2),…, x(N)} from p(x),as we show on the next set of slides.
![Page 31: Ch. 14: Markov Chain Monte Carlo Methods based on Stephen Marsland, Machine Learning: An Algorithmic Perspective. CRC 2009.; C, Andrieu, N, de Freitas,](https://reader030.fdocuments.in/reader030/viewer/2022012919/5697bfbe1a28abf838ca253f/html5/thumbnails/31.jpg)
Metropolis-Hastings Example. Bimodal target distribution p(x) 0.3 exp(−0.2x∝ 2) + 0.7 exp(−0.2(x − 10)2)Proposal q(x | x(i )) = G(x(i ), 100)
![Page 32: Ch. 14: Markov Chain Monte Carlo Methods based on Stephen Marsland, Machine Learning: An Algorithmic Perspective. CRC 2009.; C, Andrieu, N, de Freitas,](https://reader030.fdocuments.in/reader030/viewer/2022012919/5697bfbe1a28abf838ca253f/html5/thumbnails/32.jpg)
Simulated annealing for global optimization
This technique involves simulating a non-homogeneous Markov chain whose invariant distribution at iteration i is no longer equal to p(x), but to
Our goal is to find a global maximum
![Page 33: Ch. 14: Markov Chain Monte Carlo Methods based on Stephen Marsland, Machine Learning: An Algorithmic Perspective. CRC 2009.; C, Andrieu, N, de Freitas,](https://reader030.fdocuments.in/reader030/viewer/2022012919/5697bfbe1a28abf838ca253f/html5/thumbnails/33.jpg)
Simulated annealing Example. Bimodal target distribution p(x) 0.3 exp(−0.2x∝ 2) + 0.7 exp(−0.2(x − 10)2)Proposal q(x | x(i )) = G(x(i ), 100)
![Page 34: Ch. 14: Markov Chain Monte Carlo Methods based on Stephen Marsland, Machine Learning: An Algorithmic Perspective. CRC 2009.; C, Andrieu, N, de Freitas,](https://reader030.fdocuments.in/reader030/viewer/2022012919/5697bfbe1a28abf838ca253f/html5/thumbnails/34.jpg)
Gibbs SamplerGibbs sampler can be viewed as a special case of the Metropolis-Hastings algorithm. Suppose we have an n-dimensional vector x=(x1, . . . , xj, . . . , xn). and the expressions for the full conditionalsp(x j | x−j) = p(x j | x1, . . . , xj−1, x j+1, . . . , xn).In this case, it is often advantageous to use the proposal distribution q defined for j = 1, . . . , n as
The corresponding acceptance probability is:
)(
)(
)(
),()|(
)(
)(
)(
)()()()(
ij
i
ij
ij
iji
jij xp
xp
xp
xxpxxp
)(
)()|(
*
***
jjj xp
xpxxp
Thus we always accept the new sample.
![Page 35: Ch. 14: Markov Chain Monte Carlo Methods based on Stephen Marsland, Machine Learning: An Algorithmic Perspective. CRC 2009.; C, Andrieu, N, de Freitas,](https://reader030.fdocuments.in/reader030/viewer/2022012919/5697bfbe1a28abf838ca253f/html5/thumbnails/35.jpg)
Gibbs Sampler
![Page 36: Ch. 14: Markov Chain Monte Carlo Methods based on Stephen Marsland, Machine Learning: An Algorithmic Perspective. CRC 2009.; C, Andrieu, N, de Freitas,](https://reader030.fdocuments.in/reader030/viewer/2022012919/5697bfbe1a28abf838ca253f/html5/thumbnails/36.jpg)
Gibbs Sampling
x1
x2
),,,|(~ )()(3
)(21
)1(1
tK
ttt xxxxx
),,,|(~ )()(3
)1(12
)1(2
tK
ttt xxxxπx
),,|(~ )1(1
)1(1
)1(
tK
tK
tK xxxx
Slide by Ce Liu