Markov Chain Monte Carlo Methods


Olivier Cappé

LTCI, Telecom ParisTech & CNRS, http://perso.telecom-paristech.fr/~cappe

November 30, 2010


Textbook: Monte Carlo Statistical Methods by Christian P. Robert and George Casella

Slides: Adapted from (mostly) http://www.ceremade.dauphine.fr/~xian/coursBC.pdf

available at http://perso.telecom-paristech.fr/~cappe/cours/mido-tsi.pdf

Other suggested reading...


Outline

Motivations, Random Variable Generation Chapters 1 & 2

Monte Carlo Integration Chapter 3

Notions on Markov Chains Chapter 6

The Metropolis-Hastings Algorithm Chapter 7

The Gibbs Sampler Chapters 8–10

Case Study (with a glimpse of Chapter 11∗)

Corresponding homework assignments

Date:        Oct. 27     Nov. 3    Nov. 10    Nov. 24    Dec. 1
Assignment:  I (§1 & 2)  II (§3)   III (§6)   IV (§7)    V (§8–10)


Motivation and leading example
    Introduction
    Likelihood methods
    Missing variable models
    Bayesian Methods
    Bayesian troubles

Random variable generation

Monte Carlo Integration

Notions on Markov Chains for MCMC

The Metropolis-Hastings Algorithm

The Gibbs Sampler

Case Study


Latent structures make life harder!

Even simple models may lead to computational complications, as in latent variable models

f(x|θ) = ∫ f*(x, x*|θ) dx*

If (x, x*) observed, fine! If only x observed, trouble!


Example (Mixture models)

Models of mixtures of distributions:

X ∼ fj with probability pj ,

for j = 1, 2, . . . , k, with overall density

X ∼ p1f1(x) + · · ·+ pkfk(x) .

For a sample of independent random variables (X1, · · · , Xn), sample density

∏_{i=1}^n {p1 f1(xi) + · · · + pk fk(xi)} .

Expanding this product involves k^n elementary terms: prohibitive to compute in large samples.


Figure: likelihood surface of the 0.3 N(µ1, 1) + 0.7 N(µ2, 1) model, as a function of (µ1, µ2)


Maximum likelihood methods


◦ For an iid sample X1, . . . , Xn from a population with density f(x|θ1, . . . , θk), the likelihood function is

L(x|θ) = L(x1, . . . , xn|θ1, . . . , θk) = ∏_{i=1}^n f(xi|θ1, . . . , θk).

The maximum likelihood estimator is

θ̂n = arg max_θ L(x|θ)

◦ Global justifications from asymptotics

◦ Computational difficulty depends on structure, e.g. latent variables


Example (Mixtures again)

For a mixture of two normal distributions,

pN (µ, τ2) + (1− p)N (θ, σ2) ,

likelihood proportional to

∏_{i=1}^n [ p τ⁻¹ ϕ((xi − µ)/τ) + (1 − p) σ⁻¹ ϕ((xi − θ)/σ) ]

containing 2^n terms.


Standard maximization techniques often fail to find the global maximum because of multimodality or undesirable behavior (usually at the frontier of the domain) of the likelihood function.

Example

In the special case

f(x|µ, σ) = (1 − ε) exp{(−1/2)x²} + (ε/σ) exp{(−1/(2σ²))(x − µ)²}     (1)

with ε > 0 known, whatever n, the likelihood is unbounded:

lim_{σ→0} L(x1, . . . , xn | µ = x1, σ) = ∞


The special case of missing variable models

Consider again a latent variable representation

g(x|θ) = ∫_Z f(x, z|θ) dz

Define the completed (but unobserved) likelihood

Lc(x, z|θ) = f(x, z|θ)

Useful for optimisation algorithms


The EM Algorithm


Algorithm (Expectation–Maximisation)

Iterate (in m)

1. (E step) Compute

Q(θ; θ^(m), x) = E[ log L^c(x, Z|θ) | θ^(m), x ] ,

2. (M step) Maximise Q(θ; θ^(m), x) in θ and take

θ^(m+1) = arg max_θ Q(θ; θ^(m), x).

until a fixed point [of Q] is reached
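As a concrete illustration, here is a minimal R sketch of these two steps for the two-component normal mixture p N(µ1, 1) + (1 − p) N(µ2, 1) used in the figures; the function name em_mixture, the defaults and the commented synthetic data are illustrative choices, not part of the course material.

  # Minimal EM sketch (R) for p N(mu1,1) + (1-p) N(mu2,1), p known, unit variances.
  em_mixture <- function(x, p = 0.3, mu = c(0, 1), n_iter = 50) {
    for (m in 1:n_iter) {
      # E step: posterior probability that each x_i comes from component 1
      d1 <- p * dnorm(x, mu[1], 1)
      d2 <- (1 - p) * dnorm(x, mu[2], 1)
      w  <- d1 / (d1 + d2)
      # M step: weighted means maximise Q(theta; theta^(m), x)
      mu <- c(sum(w * x) / sum(w), sum((1 - w) * x) / sum(1 - w))
    }
    mu
  }
  # x <- c(rnorm(30, 0), rnorm(70, 2.5))   # synthetic data
  # em_mixture(x, p = 0.3)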


Figure: sample from the mixture model


Figure: likelihood of .7 N(µ1, 1) + .3 N(µ2, 1) in (µ1, µ2), with EM steps overlaid


The Bayesian Perspective

In the Bayesian paradigm, the information brought by the data x, realization of

X ∼ f(x|θ),

is combined with prior information specified by a prior distribution with density

π(θ)


Central tool

Summary in a probability distribution, π(θ|x), called the posterior distribution.
Derived from the joint distribution f(x|θ)π(θ), according to

π(θ|x) = f(x|θ) π(θ) / ∫ f(x|θ) π(θ) dθ     [Bayes Theorem]

where

Z(x) = ∫ f(x|θ) π(θ) dθ

is the marginal density of X also called the (Bayesian) evidence


Central tool... central to Bayesian inference

Posterior defined up to a constant as

π(θ|x) ∝ f(x|θ)π(θ)

• Operates conditional upon the observations

• Integrates simultaneously prior information and information brought by x

• Avoids averaging over the unobserved values of x

• Coherent updating of the information available on θ, independent of the order in which i.i.d. observations are collected

• Provides a complete inferential scope and a unique motor of inference


Conjugate bonanza...

Example (Binomial)

For an observation X ∼ B(n, p), the so-called conjugate prior is the family of beta Be(a, b) distributions. The classical Bayes estimator δπ is the posterior mean

δπ(x) = [Γ(a + b + n)/(Γ(a + x) Γ(n − x + b))] ∫₀¹ p · p^{x+a−1} (1 − p)^{n−x+b−1} dp = (x + a)/(a + b + n).


Conjugate Prior

Conjugacy

Given a likelihood function L(y|θ), the family Π of priors π0 on Θ is conjugate if the posterior π(θ|y) also belongs to Π

In this case, posterior inference is tractable and reduces to updating the hyperparameters∗ of the prior

∗The hyperparameters are parameters of the priors; they are most often not treated as random variables


Discrete/Multinomial & Dirichlet

If the observations consist of positive counts Y1, . . . , Yd modelled by a Multinomial distribution

L(y|θ, n) = (n! / ∏_{i=1}^d yi!) ∏_{i=1}^d θi^{yi}

The conjugate family is the D(α1, . . . , αd) distribution

π(θ|α) = [Γ(∑_{i=1}^d αi) / ∏_{i=1}^d Γ(αi)] ∏_{i=1}^d θi^{αi−1}

defined on the probability simplex (θi ≥ 0, ∑_{i=1}^d θi = 1), where Γ is the gamma function Γ(α) = ∫₀^∞ t^{α−1} e^{−t} dt (Γ(k) = (k − 1)! for integers k)


Figure: Dirichlet: 1D marginals


Figure: Dirichlet: 3D examples (projected on two dimensions)


Multinomial Posterior

Posterior

π(θ|y) = D(y1 + α1, . . . , yd + αd)

Posterior Mean†

( (yi + αi) / ∑_{j=1}^d (yj + αj) )_{1≤i≤d}

MAP

( (yi + αi − 1) / ∑_{j=1}^d (yj + αj − 1) )_{1≤i≤d}    if yi + αi > 1 for i = 1, . . . , d

Evidence

Z(y) = [ Γ(∑_{i=1}^d αi) ∏_{i=1}^d Γ(yi + αi) ] / [ ∏_{i=1}^d Γ(αi) Γ(∑_{i=1}^d (yi + αi)) ]

†Also known as Laplace smoothing when αi = 1
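The updates above amount to a few lines of code; the sketch below (R, with illustrative names) computes the posterior mean, MAP and log-evidence for given counts y and hyperparameters α.

  # Dirichlet-multinomial conjugate update (sketch)
  dirichlet_posterior <- function(y, alpha) {
    a_post <- y + alpha                       # posterior is D(y + alpha)
    list(
      mean = a_post / sum(a_post),            # posterior mean (Laplace smoothing if alpha = 1)
      map  = (a_post - 1) / sum(a_post - 1),  # MAP, valid when all y_i + alpha_i > 1
      log_evidence = lgamma(sum(alpha)) + sum(lgamma(a_post)) -
                     sum(lgamma(alpha)) - lgamma(sum(a_post))
    )
  }
  # dirichlet_posterior(y = c(3, 0, 7), alpha = rep(1, 3))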


Conjugate Priors for the Normal I

Conjugate Prior for the Normal Mean

For the N(y|µ, w) distribution with iid observations y1, . . . , yn, the conjugate prior for the mean µ is Gaussian N(µ|m0, v0):

π(µ|y1:n) ∝ exp[−(µ − m0)²/(2v0)] ∏_{k=1}^n exp[−(yk − µ)²/(2w)]
          ∝ exp{ −(1/2) [ µ² (1/v0 + n/w) − 2µ (m0/v0 + sn/w) ] }
          = N( µ | (sn + m0 w/v0)/(n + w/v0), w/(n + w/v0) )

where sn = ∑_{k=1}^n yk (and y1:n denotes the collection y1, . . . , yn)
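A quick sketch (R) of this conjugate update, with illustrative names; m0 and v0 are the prior mean and variance, and w is the known observation variance.

  normal_mean_posterior <- function(y, m0, v0, w) {
    n  <- length(y)
    sn <- sum(y)
    list(mean = (sn + m0 * w / v0) / (n + w / v0),   # posterior mean
         var  = w / (n + w / v0))                    # posterior variance
  }
  # normal_mean_posterior(y = rnorm(20, 1.5, 1), m0 = 0, v0 = 10, w = 1)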


Conjugate Priors for the Normal II

Conjugate Priors for the Normal Variance

If w is to be estimated and µ is known, the conjugate prior for w is the inverse Gamma distribution IG(w|α0, β0):

π0(w|α0, β0) = (β0^α0 / Γ(α0)) w^{−(α0+1)} e^{−β0/w}

and

π(w|y1:n) ∝ w^{−(α0+1)} e^{−β0/w} ∏_{k=1}^n (1/√w) exp[−(yk − µ)²/(2w)]
          = w^{−(n/2+α0+1)} exp[ −(sn^(2)/2 + β0)/w ]

where sn^(2) = ∑_{k=1}^n (yk − µ)².


The Gamma, Chi-Square and Inverses

The Gamma Distribution∗

∗A different convention is to use Gam*(a, b), where b = 1/β is the scale parameter

Ga(θ|α, β) = (β^α / Γ(α)) θ^{α−1} e^{−βθ}

where α is the shape and β the inverse scale parameter (E(θ) = α/β, Var(θ) = α/β²)

• θ ∼ IG(θ|α, β): 1/θ ∼ Ga(θ|α, β)
• θ ∼ χ²(θ|ν): θ ∼ Ga(θ|ν/2, 1/2)


Figure: Gamma pdf (k = α, θ = 1/β)


Conjugate Priors for the Normal IV

Example (Normal)

In the normal N(µ, w) case, with both µ and w unknown, conjugate prior on θ = (µ, w) of the form

(w)^{−λw} exp −{ λµ(µ − ξ)² + α }/w

since

π((µ, w)|x1, . . . , xn) ∝ (w)^{−λw} exp −{ λµ(µ − ξ)² + α }/w
    × (w)^{−n} exp −{ n(µ − x̄)² + s²x }/w
    ∝ (w)^{−(λw+n)} exp −{ (λµ + n)(µ − ξx)² + α + s²x + [nλµ/(n + λµ)] (x̄ − ξ)² }/w


Conjugate Priors for the Normal III

Conjugate Priors are However Available Only in Simple Cases

In the previous example the conjugate prior when both µ and w are unknown is not particularly useful.

• Hence, it is very common to resort to independent marginally conjugate priors: e.g., in the Gaussian case, take N(µ|m0, v0) IG(w|α0, β0) as prior; then π(µ|w, y) is Gaussian, π(w|µ, y) is inverse-gamma, but π(µ, w|y) does not belong to a known family‡

• There nonetheless exist some important multivariate extensions: the Bayesian normal linear model, the inverse-Wishart distribution for covariance matrices

‡Although closed-form expressions for π(µ|y) and π(w|y) are available


...and conjugate curse

Conjugate priors are very limited in scope

In addition, the use of conjugate priors only for computational reasons

• implies a restriction on the modeling of the available prior information

• may be detrimental to the usefulness of the Bayesian approach

• gives an impression of subjective manipulation of the prior information disconnected from reality.


A typology of Bayes computational problems

(i). latent variable models in general

(ii). use of a complex parameter space, as for instance in constrained parameter sets like those resulting from imposing stationarity constraints in dynamic models;

(iii). use of a complex sampling model with an intractable likelihood, as for instance in some graphical models;

(iv). use of a huge dataset;

(v). use of a complex prior distribution (which may be the posterior distribution associated with an earlier sample);


Motivation and leading example

Random variable generation
    Basic methods
    Uniform pseudo-random generator
    Beyond Uniform distributions
    Transformation methods
    Accept-Reject Methods
    Fundamental theorem of simulation
    Log-concave densities

Monte Carlo Integration

Notions on Markov Chains for MCMC

The Metropolis-Hastings Algorithm

The Gibbs Sampler

Case Study


• Rely on the possibility of producing (computer-wise) an endless flow of random variables (usually iid) from well-known distributions

• Given a uniform random number generator, illustration of methods that produce random variables from both standard and nonstandard distributions


The inverse transform method

For a function F on R, the generalized inverse of F, F−, is defined by

F−(u) = inf {x; F(x) ≥ u} .

Definition (Probability Integral Transform)

If U ∼ U[0,1], then the random variable F−(U) has the distribution F.


The inverse transform method (2)

To generate a random variable X ∼ F , simply generate

U ∼ U[0,1]

and then make the transform

x = F−(u)
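A small R illustration of the generalized inverse at work for a discrete distribution (the probabilities p are an arbitrary choice for the example):

  p    <- c(0.2, 0.5, 0.3)            # P(X = 1), P(X = 2), P(X = 3)
  Fcum <- cumsum(p)                   # cdf at 1, 2, 3
  u    <- runif(1000)
  x    <- sapply(u, function(v) min(which(Fcum >= v)))   # F^-(u) = inf{k : F(k) >= u}
  # table(x) / 1000                   # empirical frequencies close to p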


Desiderata and limitations

• Production of a deterministic sequence of values in [0, 1] which imitates a sequence of iid uniform random variables U[0,1].

• Can’t use the physical imitation of a “random draw” [no guarantee of uniformity, no reproducibility]

• Random sequence in the sense: Having generated (X1, · · · , Xn), knowledge of Xn [or of (X1, · · · , Xn)] imparts no discernible knowledge of the value of Xn+1.

• Deterministic: Given the initial value X0, the sample (X1, · · · , Xn) is always the same

• Validity of a random number generator is based on a single sample X1, · · · , Xn when n tends to +∞, not on replications

(X11, · · · , X1n), (X21, · · · , X2n), . . . (Xk1, · · · , Xkn)

where n is fixed and k tends to infinity.


Uniform pseudo-random generator

Algorithm starting from an initial value 0 ≤ u0 ≤ 1 and a transformation D, which produces a sequence

(ui) = (D^i(u0))

in [0, 1]. For all n,

(u1, · · · , un)

reproduces the behavior of an iid U[0,1] sample (V1, · · · , Vn) when compared through usual tests


Uniform pseudo-random generator (2)

• Validity means the sequence U1, · · · , Un leads to accept the hypothesis

H : U1, · · · , Un are iid U[0,1].

• The set of tests used is generally of some consequence

◦ Kolmogorov–Smirnov and other nonparametric tests
◦ Time series methods, for correlation between Ui and (Ui−1, · · · , Ui−k)
◦ Marsaglia’s battery of tests called Die Hard (!)


Usual generators

In R and S-plus, procedure runif()

The Uniform Distribution

Description: ‘runif’ generates random deviates.

Example: u <- runif(20)

‘.Random.seed’ is an integer vector, containing the random number generator state for random number generation in R. It can be saved and restored, but should not be altered by users.


Figure: a uniform sample produced by runif() (sequence of values and histogram on [0, 1])


Usual generators (2)

In C, procedure rand() or random()

SYNOPSIS
    #include <stdlib.h>
    long int random(void);

DESCRIPTION
    The random() function uses a non-linear additive feedback random number generator employing a default table of size 31 long integers to return successive pseudo-random numbers in the range from 0 to RAND_MAX. The period of this random generator is very large, approximately 16*((2**31)-1).

RETURN VALUE
    random() returns a value between 0 and RAND_MAX.


Usual generators (3)

In Matlab and Octave, procedure rand()

RAND Uniformly distributed pseudorandom numbers. R = RAND(M,N) returns an M-by-N matrix containing pseudorandom values drawn from the standard uniform distribution on the open interval (0,1).

The sequence of numbers produced by RAND is determined by the internal state of the uniform pseudorandom number generator that underlies RAND, RANDI, and RANDN.


Usual generators (4)

In Scilab, procedure rand()

rand(): with no arguments gives a scalar whose value changes each time it is referenced. By default, random numbers are uniformly distributed in the interval (0,1). rand(’normal’) switches to a normal distribution with mean 0 and variance 1.

EXAMPLE: x = rand(10,10,’uniform’)


Beyond Uniform generators

• Generation of any sequence of random variables can be formally implemented through a uniform generator

◦ Distributions with explicit F− (for instance, exponential and Weibull distributions): use the probability integral transform

◦ Case specific methods rely on properties of the distribution (for instance, normal distribution, Poisson distribution)

◦ More generic methods (for instance, accept-reject)

• Simulation of the standard distributions is accomplished quite efficiently by many numerical and statistical programming packages.


Transformation methods

Case where a distribution F is linked in a simple way to another distribution that is easy to simulate.

Example (Exponential variables)

If U ∼ U[0,1], the random variable

X = − log U/λ

has distribution

P(X ≤ x) = P(− log U ≤ λx) = P(U ≥ e^{−λx}) = 1 − e^{−λx},

the exponential distribution Exp(λ).


Other random variables that can be generated starting from an exponential include

Y = −2 ∑_{j=1}^ν log(Uj) ∼ χ²_{2ν}

Y = −(1/β) ∑_{j=1}^a log(Uj) ∼ Ga(a, β)

Y = ∑_{j=1}^a log(Uj) / ∑_{j=1}^{a+b} log(Uj) ∼ Be(a, b)


Points to note

◦ Transformation quite simple to use

◦ There are more efficient algorithms for gamma and beta random variables

◦ Cannot generate gamma random variables with a non-integer shape parameter

◦ For instance, cannot get a χ²1 variable, which would get us a N(0, 1) variable.


Box-Muller Algorithm

Example (Normal variables)

If r, θ polar coordinates of (X1, X2), then,

r² = X1² + X2² ∼ χ²2 = Exp(1/2) and θ ∼ U[0, 2π]

Consequence: If U1, U2 iid U[0,1],

X1 = √(−2 log(U1)) cos(2πU2)

X2 = √(−2 log(U1)) sin(2πU2)

are iid N(0, 1).


Box-Muller Algorithm (2)

1. Generate U1, U2 iid U[0,1] ;

2. Define

x1 = √(−2 log(u1)) cos(2πu2) ,

x2 = √(−2 log(u1)) sin(2πu2) ;

3. Take x1 and x2 as two independent draws from N (0, 1).
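A direct R transcription of these three steps (the function name box_muller is ours):

  box_muller <- function(n) {
    u1 <- runif(n); u2 <- runif(n)
    r  <- sqrt(-2 * log(u1))
    c(r * cos(2 * pi * u2), r * sin(2 * pi * u2))   # 2n iid N(0,1) draws
  }
  # z <- box_muller(5000); qqnorm(z)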


Box-Muller Algorithm (3)

• Unlike algorithms based on the CLT, this algorithm is exact

• Get two normals for the price of two uniforms

• Drawback (in speed): calculating log, cos and sin.



More transforms


Example (Poisson generation)

Poisson–exponential connection: If N ∼ P(λ) and Xi ∼ Exp(λ), i ∈ N*,

Pλ(N = k) = Pλ(X1 + · · · + Xk ≤ 1 < X1 + · · · + Xk+1) .


More Poisson


• A Poisson P(λ) variable can be simulated by generating Exp(λ) variables till their sum exceeds 1.

• This method is simple, but is really practical only for smaller values of λ.

• On average, the number of exponential variables required is λ.

• Other approaches are more suitable for large λ’s.
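For completeness, a sketch (R) of the Poisson–exponential connection above; rpois_exp is an illustrative name and, as noted, this is only sensible for moderate λ.

  rpois_exp <- function(lambda) {
    k <- 0; s <- rexp(1, rate = lambda)     # s = X1
    while (s <= 1) {                         # stop once X1 + ... + X_{k+1} > 1
      k <- k + 1
      s <- s + rexp(1, rate = lambda)
    }
    k
  }
  # mean(replicate(10000, rpois_exp(3)))     # close to 3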


Negative extension

• A generator of Poisson random variables can produce negative binomial random variables since

Y ∼ Ga(n, (1 − p)/p),   X|y ∼ P(y)

implies

X ∼ Neg(n, p)


Mixture representation

• The representation of the negative binomial is a particular case of a mixture distribution

• The principle of a mixture representation is to represent a density f as the marginal of another distribution, for example

f(x) = ∑_{i∈Y} pi fi(x) ,

• If the component distributions fi(x) can be easily generated, X can be obtained by first choosing fi with probability pi and then generating an observation from fi.


Partitioned sampling

Special case of mixture sampling when

fi(x) = f(x) I_{Ai}(x) / ∫_{Ai} f(x) dx

and

pi = P(X ∈ Ai)

for a partition (Ai)i


Accept-Reject algorithm

• Many distributions from which it is difficult, or even impossible, to directly simulate.

• Another class of methods that only require us to know the functional form of the density f of interest up to a multiplicative constant.

• The key to this method is to use a simpler (simulation-wise) density g, the instrumental density, from which the simulation from the target density f is actually done.


Fundamental theorem of simulation

Lemma

Simulating

X ∼ f(x)

is equivalent to simulating

(X, U) ∼ U{(x, u) : 0 < u < f(x)}

[Figure: the set {(x, u) : 0 < u < f(x)} under the graph of f(x)]


The Accept-Reject algorithm

Given a density of interest f, find a density g and a constant M such that

f(x) ≤ M g(x)

on the support of f .

Accept-Reject Algorithm

1. Generate X ∼ g, U ∼ U[0,1] ;

2. Accept Y = X if U ≤ f(X)/Mg(X) ;

3. Return to 1. otherwise.
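A generic R sketch of the algorithm; f, g, rg (a sampler from g) and M must be supplied by the user, and the commented usage line applies it to the normal-target/Cauchy-proposal pair analysed on the following slides.

  accept_reject <- function(n, f, g, rg, M) {
    out <- numeric(0)
    while (length(out) < n) {
      x <- rg(1)
      u <- runif(1)
      if (u <= f(x) / (M * g(x))) out <- c(out, x)   # accept with prob f/(M g)
    }
    out
  }
  # x <- accept_reject(1000, dnorm, dcauchy, function(n) rcauchy(n), sqrt(2*pi/exp(1)))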


Validation of the Accept-Reject method

Guarantee:

This algorithm produces a variable Y distributed according to f


Two interesting properties

◦ First, it provides a generic method to simulate from any density f that is known up to a multiplicative factor. Property particularly important in Bayesian calculations where the posterior distribution

π(θ|x) ∝ π(θ) f(x|θ) .

is specified up to a normalizing constant

◦ Second, the probability of acceptance in the algorithm is 1/M, i.e., the expected number of trials until a variable is accepted is M


More interesting properties

◦ In cases where f and g are both probability densities, the constant M is necessarily larger than 1.

◦ The size of M, and thus the efficiency of the algorithm, are functions of how closely g can imitate f, especially in the tails

◦ For f/g to remain bounded, it is necessary for g to have tails thicker than those of f. It is therefore impossible to use the A-R algorithm to simulate a Cauchy distribution f using a normal distribution g, however the reverse works quite well.


No Cauchy!

Example (Normal from a Cauchy)

Take

f(x) = (1/√(2π)) exp(−x²/2)

and

g(x) = (1/π) · 1/(1 + x²) ,

densities of the normal and Cauchy distributions. Then

f(x)/g(x) = √(π/2) (1 + x²) e^{−x²/2} ≤ √(2π/e) = 1.52

attained at x = ±1.


Example (Normal from a Cauchy (2))

So probability of acceptance

1/1.52 = 0.66,

and, on the average, one out of every three simulated Cauchy variables is rejected.


No Double!

Example (Normal/Double Exponential)

Generate a N(0, 1) by using a double-exponential distribution with density

g(x|α) = (α/2) exp(−α|x|)

Then

f(x)/g(x|α) ≤ √(2/π) α^{−1} e^{α²/2}

and the minimum of this bound (in α) is attained for

α⋆ = 1


Example (Normal/Double Exponential (2))

Probability of acceptance √(π/(2e)) = .76

To produce one normal random variable requires on the average 1/.76 ≈ 1.3 uniform variables.


Example (Gamma generation)

Illustrates a real advantage of the Accept-Reject algorithm: the gamma distribution Ga(α, β) can be represented as the sum of α exponential random variables only if α is an integer


Example (Gamma generation (2))

Can use the Accept-Reject algorithm with instrumental distribution

Ga(a, b), with a = [α], α ≥ 0.

(Without loss of generality, β = 1.) Up to a normalizing constant,

f/gb = b^{−a} x^{α−a} exp{−(1 − b)x} ≤ b^{−a} ( (α − a)/((1 − b)e) )^{α−a}    for b ≤ 1.

This bound is minimised (in b) at b = a/α.


Truncated Normal simulation

Example (Truncated Normal distributions)

Constraint x ≥ µ̲ produces a density proportional to

e^{−(x−µ)²/(2σ²)} I_{x≥µ̲}

for a bound µ̲ large compared with µ. There exist alternatives far superior to the naïve method of generating N(µ, σ²) draws until exceeding µ̲, which requires an average number of

1/Φ((µ − µ̲)/σ)

simulations from N(µ, σ²) for a single acceptance.


Example (Truncated Normal distributions (2))

Instrumental distribution: translated exponential distribution, E(α, µ̲), with density

gα(z) = α e^{−α(z−µ̲)} I_{z≥µ̲} .

The ratio f/gα is bounded by

f/gα ≤ (1/α) exp(α²/2 − αµ̲)   if α > µ̲ ,
f/gα ≤ (1/α) exp(−µ̲²/2)       otherwise.


Log-concave densities (1)

Densities f whose logarithm is concave, for instance Bayesian posterior distributions such that

log π(θ|x) = log π(θ) + log f(x|θ) + c

is concave


Log-concave densities (2)

Take Sn = {xi, i = 0, 1, . . . , n + 1} ⊂ supp(f)

such that h(xi) = log f(xi) known up to the same constant.

By concavity of h, the line Li,i+1 through (xi, h(xi)) and (xi+1, h(xi+1)) is

• below h in [xi, xi+1] and

• above this graph outside this interval

[Figure: log f(x) with the chord L2,3(x) drawn between x2 and x3]


Log-concave densities (3)

For x ∈ [xi, xi+1], if

h̄n(x) = min{Li−1,i(x), Li+1,i+2(x)} and h̲n(x) = Li,i+1(x) ,

the envelopes are

h̲n(x) ≤ h(x) ≤ h̄n(x)

uniformly on the support of f, with

h̲n(x) = −∞ and h̄n(x) = min(L0,1(x), Ln,n+1(x))

on [x0, xn+1]^c.


Log-concave densities (4)

Therefore, if

f̲n(x) = exp h̲n(x) and f̄n(x) = exp h̄n(x)

then

f̲n(x) ≤ f(x) ≤ f̄n(x) = ϖn gn(x) ,

where ϖn is the normalizing constant of f̄n


ARS Algorithm

1. Initialize n and Sn.

2. Generate X ∼ gn(x), U ∼ U[0,1].

3. If U ≤ f̲n(X)/(ϖn gn(X)), accept X;

otherwise, if U ≤ f(X)/(ϖn gn(X)), accept X


Example (Northern Pintail ducks)

Ducks captured at time i with probability pi, with both pi and the size N of the population unknown. Dataset:

(n1, . . . , n11) = (32, 20, 8, 5, 1, 2, 0, 2, 1, 1, 0)

Number of recoveries over the years 1957–1968 of 1612 Northern Pintail ducks banded in 1956


Example (Northern Pintail ducks (2))

Corresponding conditional likelihood

L(n1, . . . , nI | N, p1, . . . , pI) ∝ ∏_{i=1}^I pi^{ni} (1 − pi)^{N−ni} ,

where I is the number of captures, ni the number of captured animals during the ith capture, and r the total number of different captured animals.


Example (Northern Pintail ducks (3))

Prior selection: If

N ∼ P(λ)

and

αi = log( pi / (1 − pi) ) ∼ N(µi, σ²),

[Normal logistic]


Example (Northern Pintail ducks (4))

Posterior distribution

π(α, N | n1, . . . , nI) ∝ [N!/(N − r)!] (λ^N / N!) ∏_{i=1}^I (1 + e^{αi})^{−N} ∏_{i=1}^I exp{ αi ni − (1/(2σ²)) (αi − µi)² }


Example (Northern Pintail ducks (5))

For the conditional posterior distribution

π(αi | N, n1, . . . , nI) ∝ exp{ αi ni − (1/(2σ²)) (αi − µi)² } / (1 + e^{αi})^N ,

the ARS algorithm can be implemented since

αi ni − (1/(2σ²)) (αi − µi)² − N log(1 + e^{αi})

is concave in αi.


Figure: posterior distributions of the capture log-odds ratios for the years 1957–1965 (nine panels, one per year, each over the range −10 to −3).


Figure: true distribution versus histogram of the simulated sample (year 1960)


Motivation and leading example

Random variable generation

Monte Carlo Integration
    Introduction
    Monte Carlo Integration
    Importance Sampling
    Self-Normalised Importance Sampling
    Acceleration Methods
    Application to Bayesian Model Choice

Notions on Markov Chains for MCMC

The Metropolis-Hastings Algorithm

The Gibbs Sampler

Case Study


Quick reminder

Two major classes of numerical problems that arise in statistical inference

◦ Optimization - generally associated with the likelihood approach

◦ Integration - generally associated with the Bayesian approach


Example (Bayesian decision theory)

Bayes estimators are not always posterior expectations, but rather solutions of the minimization problem

min_δ ∫_Θ L(θ, δ) π(θ) f(x|θ) dθ .

Proper loss: For L(θ, δ) = (θ − δ)², the Bayes estimator is the posterior mean

Absolute error loss: For L(θ, δ) = |θ − δ|, the Bayes estimator is the posterior median

Without loss function: Use the maximum a posteriori (MAP) estimator

arg max_θ f(x|θ) π(θ)


Monte Carlo integration

Theme: Generic problem of evaluating the integral

I = Ef[h(X)] = ∫_X h(x) f(x) dx

where X is uni- or multidimensional, f is a closed form, partly closed form, or implicit density, and h is a function


Monte Carlo integration (2)

Monte Carlo solution: First use a sample (X1, . . . , Xm) from the density f to approximate the integral I by the empirical average

h̄m = (1/m) ∑_{j=1}^m h(Xj)

which converges

h̄m −→ Ef[h(X)]

by the Strong Law of Large Numbers


Monte Carlo precision

Estimate the variance with

vm = (1/(m − 1)) ∑_{j=1}^m [h(Xj) − h̄m]²,

and for m large,

√m (h̄m − Ef[h(X)]) / √vm ∼ N(0, 1),

(assuming Ef[h²(X)] < ∞).

Note: This can lead to the construction of a convergence test and of confidence bounds on the approximation of Ef[h(X)].
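A toy R sketch of the estimator, its variance estimate and the resulting confidence bounds, here for the arbitrary choices h(x) = exp(−x) and f = N(0, 1):

  m  <- 10000
  x  <- rnorm(m)
  h  <- exp(-x)
  hm <- mean(h)                              # empirical average
  vm <- var(h)                               # (1/(m-1)) sum (h - hm)^2
  ci <- hm + c(-1, 1) * 1.96 * sqrt(vm / m)  # approximate 95% bounds
  # Exact value for this toy case is exp(1/2).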


Example (Cauchy prior/normal sample)

For estimating a normal mean, a robust prior is a Cauchy prior

X ∼ N (θ, 1), θ ∼ C(0, 1).

Under squared error loss, posterior mean

δπ(x) = ∫_{−∞}^{∞} [θ/(1 + θ²)] e^{−(x−θ)²/2} dθ / ∫_{−∞}^{∞} [1/(1 + θ²)] e^{−(x−θ)²/2} dθ


Example (Cauchy prior/normal sample (2))

Form of δπ suggests simulating iid variables

θ1, · · · , θm ∼ N (x, 1)

and calculating

δπm(x) = ∑_{i=1}^m [θi/(1 + θi²)] / ∑_{i=1}^m [1/(1 + θi²)] .

The Law of Large Numbers implies

δπm(x) −→ δπ(x) as m −→∞.
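In R, the estimator δπm takes one line per sum (the function name delta_pi is ours):

  delta_pi <- function(x, m = 1000) {
    theta <- rnorm(m, mean = x)                              # theta_i ~ N(x, 1)
    sum(theta / (1 + theta^2)) / sum(1 / (1 + theta^2))      # ratio of weighted sums
  }
  # delta_pi(10)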


Figure: range of the estimators δπm over 100 runs and x = 10, as a function of the number of iterations


Importance sampling

Paradox

Simulation from f (the true density) is not necessarily optimal

Alternative to direct sampling from f is importance sampling, based on the alternative representation

Ef[h(X)] = ∫_X [h(x) f(x)/g(x)] g(x) dx .

which allows us to use other distributions than f


Importance sampling (2)

Importance sampling algorithm

To evaluate

Ef[h(X)] = ∫_X h(x) f(x) dx ,

do

1. Generate a sample X1, . . . , Xm from a distribution g

2. Use the weighted approximation

(1/m) ∑_{j=1}^m [f(Xj)/g(Xj)] h(Xj)
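A toy R sketch of this scheme, with f = N(0, 1) as target, the heavier-tailed t5 density as instrumental g, and h(x) = x² (all illustrative choices):

  m <- 10000
  x <- rt(m, df = 5)                     # X_j ~ g
  w <- dnorm(x) / dt(x, df = 5)          # importance weights f/g
  is_estimate <- mean(w * x^2)           # approximates E_f[X^2] = 1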


Consistency of importance sampling

Convergence of the estimator

(1/m) ∑_{j=1}^m [f(Xj)/g(Xj)] h(Xj) −→ ∫_X h(x) f(x) dx

almost surely, for any choice of the distribution g [as long as supp(g) ⊃ supp(f)]


Important details

◦ Instrumental distribution g chosen from distributions easy to simulate

◦ The same sample (generated from g) can be used repeatedly, not only for different functions h, but also for different densities f

◦ Even dependent proposals can be used, as seen later


Although g can be any density, some choices are better than others:

◦ Finite variance only when

Ef[ h²(X) f(X)/g(X) ] = ∫_X h²(x) f²(x)/g(x) dx < ∞ .

◦ Instrumental distributions with tails lighter than those of f (that is, with sup f/g = ∞) not appropriate.

◦ If sup f/g = ∞, the weights f(xj)/g(xj) vary widely, giving too much importance to a few values xj.

◦ If sup f/g = M < ∞, the accept-reject algorithm can be used as well to simulate f directly.


Example (Counter-Example: Cauchy target)

Case of the Cauchy distribution C(0, 1) when the importance function is Gaussian N(0, 1). Ratio of the densities

ω(x) = p⋆(x)/p0(x) = √(2π) exp(x²/2) / [π (1 + x²)]

is very badly behaved: e.g.,

∫_{−∞}^{∞} ω(x)² p0(x) dx = ∞ .

Poor performance of the associated importance sampling estimator


Figure: range and average of 500 replications of the IS estimate of E[exp(−X)] over 10,000 iterations


Optimal importance function

The choice of g that minimizes§ the variance of the importance sampling estimator is

g*(x) = |h(x)| f(x) / ∫_X |h(x′)| f(x′) dx′ .

Rather formal optimality result since the optimal choice of g*(x) requires the knowledge of I, the integral of interest!

§This choice ensures that, for a positive function h, h̄m = I, even for a single draw!


Self-Normalized Importance Sampling (Hammersley & Handscomb, 1964)

In self-normalized (or Bayesian) importance sampling, expectations under the target pdf f

Ef[h(X)]

are estimated as

h(g)m = ∑_{i=1}^m Wi h(Xi) = [ (1/m) ∑_{i=1}^m ωi h(Xi) ] / [ (1/m) ∑_{j=1}^m ωj ] ,

where

• Xi ∼ iid g, where g is an instrumental pdf
• ωi = (f/g)(Xi)
• Wi = ωi / ∑_{j=1}^m ωj are the normalized weights.

This form of importance sampling does not necessitate that f be properly normalized.
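A sketch (R) of the self-normalised estimator for the same toy target/proposal pair as before, where the target is only known up to its normalizing constant; the last line anticipates the effective sample size diagnostic introduced below.

  m <- 10000
  x <- rt(m, df = 5)
  w <- exp(-x^2 / 2) / dt(x, df = 5)   # unnormalised N(0,1) target over t_5 proposal
  W <- w / sum(w)                      # normalised weights W_i
  h_snis <- sum(W * x^2)               # self-normalised estimate of E_f[X^2]
  ess    <- 1 / sum(W^2)               # effective sample size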


Performance of self-normalized IS

Assuming that Ef[(f/g)(X)(1 + h²(X))] < ∞, h(g)m is consistent and asymptotically normal, with asymptotic variance given by

υ(g) = Ef[ (f/g)(X) (h(X) − Ef(h))² ] .

The asymptotic variance can be estimated from the IS sample by

υ(g)m = m ∑_{i=1}^m (Wi)² {h(Xi) − h(g)m}² ,

where Wi = ωi / ∑_{j=1}^m ωj are the normalized weights.


Elements of proof

Consistency: If f̄ = f/c, where c = ∫ f(x) dx, is a pdf,

Eg[ωi h(Xi)] = Eg[ (f/g)(X) h(X) ] = c Ef̄[h(X)] .

CLT:

√m (h(g)m − Ef̄(h)) = [ (1/√m) ∑_{i=1}^m ωi (h(Xi) − Ef̄(h)) ] / [ (1/m) ∑_{j=1}^m ωj ]

and varg[ωi (h(Xi) − Ef̄(h))] = c² Ef̄[ (f̄/g)(X) (h(X) − Ef̄(h))² ].

Empirical Estimate:

υ(g)m(h) = ∑_{i=1}^m Wi [ (f/g)(Xi) / ((1/m) ∑_{j=1}^m (f/g)(Xj)) ] {h(Xi) − h(g)m}²

Page 102: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

Monte Carlo Integration

Self-Normalised Importance Sampling

Empirical Diagnostic Tool for IS

Effective Sample Size

ESS(g)_m = [ ∑_{i=1}^m (Wi)² ]^{−1}

1. 1 ≤ ESS(g)_m ≤ m.

2. m/ESS(g)_m is an estimate of Ef[(f/g)(X)], which is the maximal IS asymptotic variance for functions h such that |h(x) − Ef(h)| ≤ 1 (note: these functions have maximal Monte Carlo variance of 1 under f).

3. m/ESS(g)_m − 1 is an estimator of the χ² divergence

∫ (f(x) − g(x))² / g(x) dx

(√(m/ESS(g)_m − 1) is the empirical coefficient of variation of the weights).
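A small R helper (illustrative, not part of the course code) computing this diagnostic from unnormalized weights:

## effective sample size from unnormalized importance weights
ess <- function(w) {
  W <- w / sum(w)        # normalized weights W_i
  1 / sum(W^2)           # ESS = [ sum_i W_i^2 ]^{-1}
}
## with the weights w of the previous sketch: ess(w) and ess(w)/length(w)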

Page 103: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

Monte Carlo Integration

Self-Normalised Importance Sampling

Optimal IS Instrumental Density

When considering large classes of functions h, IS generally appears to perform worse than Monte Carlo under f (e.g., for functions such that |h(x) − Ef(h)| ≤ 1, the maximal Monte Carlo asymptotic variance is 1 whereas the maximal IS asymptotic variance is Ef[(f/g)(X)], which is strictly larger than 1 by Jensen's inequality, unless g = f).

However, for a fixed function h, the optimal IS density is not f but

|h(x) − Ef(h)| f(x) / ∫ |h(x′) − Ef(h)| f(x′) dx′ ,

as the Cauchy-Schwarz inequality implies that

Ef[ (f/g)(X) (h(X) − Ef(h))² ] · Ef[ (g/f)(X) ] ≥ Ef²[ |h(X) − Ef(h)| ] ,

where the second factor on the left equals 1.

Page 104: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

Monte Carlo Integration

Self-Normalised Importance Sampling

Example (Bayesian Posterior)

Consider a simple model in which

θ ∼ N(1, 2²) ,
X|θ ∼ N(θ², σ²) ,

and apply IS to compute expectations under the posterior for X = 2, using the prior as instrumental pdf.

Page 105: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

Monte Carlo Integration

Self-Normalised Importance Sampling

Figure: instrumental (prior) and target (posterior) densities for σ = 2 and σ = 0.5, with the associated performance measures:

σ = 2:    lim ESS(g)_m / m ≈ 0.68,   υ(g)[h(θ) = θ] = 2.0,   υ(f) = 1.4
σ = 0.5:  lim ESS(g)_m / m ≈ 0.19,   υ(g)[h(θ) = θ] = 9.0,   υ(f) = 1.7

Page 106: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

Monte Carlo Integration

Self-Normalised Importance Sampling

IS suffers from the curse of dimensionality

As the dimension increases, the discrepancy between the importance and target distributions worsens.

To illustrate this issue, consider the simple model where both the target and instrumental are d-dimensional product distributions:

f_d(x_{1:d}) = ∏_{i=1}^d f(xi) ,   g_d(x_{1:d}) = ∏_{i=1}^d g(xi) .

Then, the unnormalized importance weights

ω = ∏_{i=1}^d (f/g)(Xi)

are a product of iid terms of expectation 1.
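A short R illustration of this degeneracy (assumed setup: N(0, 1) target coordinates against N(0, 1.5²) instrumental coordinates, purely for illustration):

## ESS/m of product importance weights as the dimension d grows
set.seed(1)
m <- 1000
ess.frac <- sapply(c(2, 5, 10, 20, 50), function(d) {
  x <- matrix(rnorm(m * d, 0, 1.5), m, d)              # X_i ~ g (product)
  logw <- apply(x, 1, function(xi)
    sum(dnorm(xi, log = TRUE) - dnorm(xi, 0, 1.5, log = TRUE)))
  W <- exp(logw - max(logw)); W <- W / sum(W)
  1 / (sum(W^2) * m)                                   # ESS / m
})
round(ess.frac, 3)                                     # decreases quickly with d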

Page 107: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

Monte Carlo Integration

Self-Normalised Importance Sampling

IS suffers from curse of dimensionality (Contd.)

For a function of interest h that only depends on the first coordinate x1, the asymptotic variance of the self-normalized IS approximation to Ef(h) is given by

υ_d(h) = ∫ · · · ∫ ( ∏_{i=1}^d (f/g)(xi) )² (h(x1) − Ef(h))² ∏_{i=1}^d g(xi) dx1 · · · dxd

       = ( ∫ (f/g)(x) f(x) dx )^{d−1} ∫ (f/g)(x1) (h(x1) − Ef(h))² f(x1) dx1 ,

where the first factor, ∫ (f/g)(x) f(x) dx, is strictly larger than 1.

In practice, this situation can usually be detected by monitoring the effective sample size criterion, which becomes abnormally small.

Page 108: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

Monte Carlo Integration

Self-Normalised Importance Sampling

Importance Sampling Scaling Example

Figure: Importance sampling (m = 1000 samples) for a multivariate normal distribution truncated at ±4σ with uniform proposals (2D marginal for increasing values of d = 2, 5, 10, 20; the radius of each point increases with its importance weight; the circle shows the 99% region for the 2D marginal).

Here Ef[ f(X)/g(X) ] = 2.26

Page 109: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

Monte Carlo Integration

Self-Normalised Importance Sampling

too volatile!

Example (Stochastic volatility model)

yk = β exp (xk/2) εk , εk ∼ N (0, 1)

with AR(1) log-variance process (or volatility)

xk+1 = ϕxk + σuk , uk ∼ N (0, 1)

Page 110: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

Monte Carlo Integration

Self-Normalised Importance Sampling

Evolution of IBM stocks (corrected from trend and log-ratio-ed)


Page 111: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

Monte Carlo Integration

Self-Normalised Importance Sampling

Example (Stochastic volatility model (2))

The observed likelihood is unavailable in closed form. The joint posterior distribution of the hidden state sequence {Xk}_{1≤k≤n} can be evaluated explicitly, as proportional to

∏_{k=2}^n exp[ −{ σ^{−2} (xk − ϕ xk−1)² + β^{−2} exp(−xk) yk² + xk } / 2 ] ,   (2)

up to a normalizing constant.

Page 112: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

Monte Carlo Integration

Self-Normalised Importance Sampling

Computational problems

Example (Stochastic volatility model (3))

Direct simulation from this distribution is impossible because of

(a) the dependence among the Xk's,

(b) the dimension of the sequence {Xk}_{1≤k≤n}, and

(c) the exponential term exp(−xk) yk² within (2).

Page 113: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

Monte Carlo Integration

Self-Normalised Importance Sampling

Importance sampling

Example (Stochastic volatility model (4))

We illustrate below the behavior of (self-normalized) importance sampling on trajectories simulated under the model with ϕ = 0.98, σ = 0.17 and β = 0.64, when proposing whole trajectories {Xk}_{1≤k≤n} from the AR(1) prior, using m = 1000 simulations.

The importance weight associated with x1, . . . , xn reduces to

∏_{k=1}^n exp[ −{ β^{−2} exp(−xk) yk² + xk } / 2 ] .
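A compact R sketch of this experiment (ϕ, σ and β as on the slide; the rest is an illustrative implementation, not the course code):

## IS for the stochastic volatility model, proposing whole trajectories
## from the AR(1) prior, and monitoring the ESS as observations accumulate
set.seed(1)
phi <- 0.98; sigma <- 0.17; beta <- 0.64; n <- 50; m <- 1000
x <- numeric(n); x[1] <- rnorm(1, 0, sigma / sqrt(1 - phi^2))
for (k in 2:n) x[k] <- phi * x[k - 1] + sigma * rnorm(1)
y <- beta * exp(x / 2) * rnorm(n)                      # simulated observations
X <- matrix(0, m, n); X[, 1] <- rnorm(m, 0, sigma / sqrt(1 - phi^2))
for (k in 2:n) X[, k] <- phi * X[, k - 1] + sigma * rnorm(m)
logw <- numeric(m); ess <- numeric(n)
for (k in 1:n) {
  logw <- logw - (beta^-2 * exp(-X[, k]) * y[k]^2 + X[, k]) / 2
  W <- exp(logw - max(logw)); W <- W / sum(W)
  ess[k] <- 1 / sum(W^2)                               # ESS after k observations
}
round(ess[c(1, 5, 10, 25, 50)])                        # collapses as k grows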

Page 114: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

Monte Carlo Integration

Self-Normalised Importance Sampling

Typical Evolution of ESS for a Single Observation Trajectory

Figure: ESS (500 independent replications of the particles) as a function of the time index.

Page 115: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

Monte Carlo Integration

Self-Normalised Importance Sampling

Evolution of ESS When Averaging Over Trajectories

Figure: ESS (500 independent simulations) as a function of the time index.

Page 116: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

Monte Carlo Integration

Self-Normalised Importance Sampling

Typical Evolution of the Weight Distribution

Figure: Histograms of the base-10 logarithm of the normalized importance weights for subtrajectories of length (from top to bottom) 1, 10 and 100 samples.

Page 117: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

Monte Carlo Integration

Acceleration Methods

Correlated simulations

Negative correlation reduces variance

Special technique — but efficient when it applies

Two samples (X1, . . . , Xm) and (Y1, . . . , Ym) from f to estimate

I = ∫_R h(x) f(x) dx

by

Î1 = (1/m) ∑_{i=1}^m h(Xi)   and   Î2 = (1/m) ∑_{i=1}^m h(Yi) ,

each with mean I and variance σ².

Page 118: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

Monte Carlo Integration

Acceleration Methods

Variance reduction

Variance of the average:

var( (Î1 + Î2)/2 ) = σ²/2 + (1/2) cov(Î1, Î2) .

If the two samples are negatively correlated,

cov(Î1, Î2) ≤ 0 ,

they improve on two independent samples of the same size.

Page 119: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

Monte Carlo Integration

Acceleration Methods

Antithetic variables

◦ If f is symmetric about µ, take Yi = 2µ − Xi

◦ If Xi = F^{−1}(Ui), take Yi = F^{−1}(1 − Ui)

◦ If (Ai)_i is a partition of X, partitioned sampling by sampling the Xj's in each Ai (requires knowing Pr(Ai))
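A small R sketch of the second construction, for a uniform variable where F^{−1} is the identity (the integrand h is an arbitrary illustration):

## antithetic variables via U and 1-U, same total budget as 2m independent draws
set.seed(1)
m <- 1e4
h <- function(u) exp(u)                      # E[h(U)] = e - 1
u <- runif(m)
plain      <- mean(h(runif(2 * m)))          # 2m independent draws
antithetic <- mean((h(u) + h(1 - u)) / 2)    # m antithetic pairs
c(plain = plain, antithetic = antithetic, truth = exp(1) - 1)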

Page 120: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

Monte Carlo Integration

Acceleration Methods

Control variates

out of control!

For

I = ∫ h(x) f(x) dx

unknown and

I0 = ∫ h0(x) f(x) dx

known, I0 is estimated by Î0 and I is estimated by Î.

Page 121: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

Monte Carlo Integration

Acceleration Methods

Control variates (2)

Combined estimator:

Î∗ = Î + β (Î0 − I0) .

Î∗ is unbiased for I and

var(Î∗) = var(Î) + β² var(Î0) + 2β cov(Î, Î0) .

Page 122: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

Monte Carlo Integration

Acceleration Methods

Optimal control

Optimal choice of β:

β⋆ = − cov(Î, Î0) / var(Î0) ,

with

var(Î⋆) = (1 − ρ²) var(Î) ,

where ρ is the correlation between Î and Î0.

Usual solution: the regression coefficient of the h(xi)'s on the h0(xi)'s.

Page 123: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

Monte Carlo Integration

Acceleration Methods

Example (Quantile Approximation)

Evaluate

ϱ = P(X > a) = ∫_a^∞ f(x) dx

by

ϱ̂ = (1/n) ∑_{i=1}^n I(Xi > a) ,

with Xi iid f, assuming that the median µ (P(X > µ) = 1/2) is known.

Page 124: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

Monte Carlo Integration

Acceleration Methods

Example (Quantile Approximation (2))

Control variate:

ϱ̃ = (1/n) ∑_{i=1}^n I(Xi > a) + β ( (1/n) ∑_{i=1}^n I(Xi > µ) − P(X > µ) )

improves upon ϱ̂ if

β < 0   and   |β| < 2 cov(ϱ̂, ϱ̂0) / var(ϱ̂0) = 2 P(X > a) / P(X > µ) .
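A short R illustration of this correction (assumed setup: standard normal X with known median 0 and a = 1.5; β is taken as the empirical optimum):

## control variate for rho = P(X > a) using the known median of X
set.seed(1)
n <- 1e4; a <- 1.5
x <- rnorm(n)
rho.hat  <- mean(x > a)                                  # plain MC estimate
rho0.hat <- mean(x > 0)                                  # control, P(X > 0) = 1/2
beta <- -cov(as.numeric(x > a), as.numeric(x > 0)) / var(as.numeric(x > 0))
rho.cv <- rho.hat + beta * (rho0.hat - 0.5)              # control-variate estimate
c(plain = rho.hat, control = rho.cv, truth = pnorm(a, lower.tail = FALSE))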

Page 125: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

Monte Carlo Integration

Acceleration Methods

Integration by conditioning

Use Rao-Blackwell Theorem

var(E[δ(X)|Y]) ≤ var(δ(X))

Page 126: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

Monte Carlo Integration

Acceleration Methods

Consequence

If Î is an unbiased estimator of I = Ef[h(X)], with X simulated from a joint density f̃(x, y), where

∫ f̃(x, y) dy = f(x) ,

the estimator

Î∗ = Ef̃[ Î | Y1, . . . , Yn ]

dominates Î(X1, . . . , Xn) variance-wise (and is unbiased).

Page 127: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

Monte Carlo Integration

Acceleration Methods


Example (Student’s t expectation)

For

E[h(X)] = E[exp(−X²)]   with   X ∼ T(ν, 0, σ²) ,

the Student's t variable can be simulated as

X | y ∼ N(µ, σ² y)   and   Y^{−1} ∼ χ²_ν .

Page 128: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

Monte Carlo Integration

Acceleration Methods

Example (Student’s t expectation (2))

The empirical average

(1/m) ∑_{j=1}^m exp(−Xj²)

can be improved upon using the joint sample

((X1, Y1), . . . , (Xm, Ym)) ,

since

(1/m) ∑_{j=1}^m E[ exp(−X²) | Yj ] = (1/m) ∑_{j=1}^m 1/√(2σ²Yj + 1)

is the corresponding conditional expectation.

In this example, the precision is ten times better.
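An R sketch of the comparison, using the mixture representation above (illustrative values ν = 4.6, µ = 0, σ = 1 as in the figure that follows):

## Rao-Blackwellization of E[exp(-X^2)] via the scale mixture representation
set.seed(1)
nu <- 4.6; sigma <- 1; m <- 1e4
Y <- 1 / rchisq(m, df = nu)                  # Y^{-1} ~ chi^2_nu
X <- rnorm(m, 0, sigma * sqrt(Y))            # X | Y ~ N(0, sigma^2 Y)
basic <- mean(exp(-X^2))                     # empirical average
rb    <- mean(1 / sqrt(2 * sigma^2 * Y + 1)) # conditional expectation version
c(empirical = basic, conditional = rb)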

Page 129: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

Monte Carlo Integration

Acceleration Methods

Figure: Estimators of E[exp(−X²)] over 10,000 iterations: empirical average (full) and conditional expectation (dotted), for (ν, µ, σ) = (4.6, 0, 1).

Page 130: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

Monte Carlo Integration

Application to Bayesian Model Choice

Bayesian model choice


Probabilise the entire model/parameter space:

I allocate probabilities pi to all models Mi

I define priors πi(θi) for each parameter space Θi

I compute

π(Mi|x) = pi ∫_{Θi} fi(x|θi) πi(θi) dθi / ∑_j pj ∫_{Θj} fj(x|θj) πj(θj) dθj

I take the largest π(Mi|x) to determine the “best” model.

Page 131: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

Monte Carlo Integration

Application to Bayesian Model Choice

Bayes factor

Definition (Bayes factors)

For testing hypotheses H0 : θ ∈ Θ0 vs. Ha : θ ∉ Θ0, under the prior

π(Θ0) π0(θ) + π(Θ0^c) π1(θ) ,

the central quantity is

B01 = [ π(Θ0|x) / π(Θ0^c|x) ] / [ π(Θ0) / π(Θ0^c) ] = ∫_{Θ0} f(x|θ) π0(θ) dθ / ∫_{Θ0^c} f(x|θ) π1(θ) dθ

[Jeffreys, 1939]

Page 132: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

Monte Carlo Integration

Application to Bayesian Model Choice

Evidence

A similar quantity, the evidence

Ek =∫

Θk

πk(θk)Lk(θk) dθk,

a.k.a. the marginal likelihood.[Jeffreys, 1939]

Page 133: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

Monte Carlo Integration

Application to Bayesian Model Choice

Bayes factor approximation

When approximating the Bayes factor

B01 = ∫_{Θ0} f0(x|θ0) π0(θ0) dθ0 / ∫_{Θ1} f1(x|θ1) π1(θ1) dθ1 ,

use importance functions ϖ0 and ϖ1 and

B01 ≈ [ n0^{−1} ∑_{i=1}^{n0} f0(x|θ0^i) π0(θ0^i) / ϖ0(θ0^i) ] / [ n1^{−1} ∑_{i=1}^{n1} f1(x|θ1^i) π1(θ1^i) / ϖ1(θ1^i) ]

Page 134: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

Monte Carlo Integration

Application to Bayesian Model Choice

Diabetes in Pima Indian women

Example (R benchmark)

“A population of women who were at least 21 years old, of Pima Indian heritage and living near Phoenix (AZ), was tested for diabetes according to WHO criteria. The data were collected by the US National Institute of Diabetes and Digestive and Kidney Diseases.”

200 Pima Indian women with observed variables:

I plasma glucose concentration in an oral glucose tolerance test

I diastolic blood pressure

I diabetes pedigree function

I presence/absence of diabetes

Page 135: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

Monte Carlo Integration

Application to Bayesian Model Choice

Probit modelling on Pima Indian women

Probability of diabetes as a function of the above variables:

P(y = 1|x) = Φ(x1β1 + x2β2 + x3β3) .

Test of H0 : β3 = 0 for the 200 observations of Pima.tr, based on a g-prior modelling:

β ∼ N3( 0, n (X^T X)^{−1} )

Page 136: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

Monte Carlo Integration

Application to Bayesian Model Choice

Importance sampling for the Pima Indian dataset

Use of an importance function inspired by the MLE estimate distribution,

β ∼ N(β̂, Σ̂) .

R importance sampling code (probitlpost is the course's log-posterior function; model2 and X2, the fit and design matrix without the third covariate, are assumed to be defined like model1 and X1; rmvnorm/dmvnorm are from the mvtnorm package):

library(mvtnorm)
model1=summary(glm(y~-1+X1,family=binomial(link="probit")))
model2=summary(glm(y~-1+X2,family=binomial(link="probit")))
# draws from the MLE-based importance functions (inflated covariance)
is1=rmvnorm(Niter,mean=model1$coeff[,1],sigma=2*model1$cov.unscaled)
is2=rmvnorm(Niter,mean=model2$coeff[,1],sigma=2*model2$cov.unscaled)
# ratio of the two evidence approximations (log-posterior minus log-proposal)
bfis=mean(exp(probitlpost(is1,y,X1)-dmvnorm(is1,mean=model1$coeff[,1],
  sigma=2*model1$cov.unscaled,log=TRUE)))/
  mean(exp(probitlpost(is2,y,X2)-dmvnorm(is2,mean=model2$coeff[,1],
  sigma=2*model2$cov.unscaled,log=TRUE)))

Page 137: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

Monte Carlo Integration

Application to Bayesian Model Choice

Diabetes in Pima Indian women

Comparison of the variation of the Bayes factor approximations based on 100 replicas of 20,000 simulations each, from the prior and from the above MLE importance sampler.

Figure: boxplots of the approximations (basic Monte Carlo vs. importance sampling), values ranging roughly from 2 to 5.

Page 138: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

Monte Carlo Integration

Application to Bayesian Model Choice

Bridge sampling

Special case: If

π1(θ|x) ∝ π̃1(θ|x) ,   π2(θ|x) ∝ π̃2(θ|x)

live on the same space (Θ1 = Θ2), then

B12 ≈ (1/n) ∑_{i=1}^n π̃1(θi|x) / π̃2(θi|x) ,   θi ∼ π2(θ|x)

[Gelman & Meng, 1998; Chen, Shao & Ibrahim, 2000]

Requires simulating from π2(θ|x), which will be done (approximately) using MCMC.
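A toy R sketch of this special-case estimator (two Gaussian pseudo-posteriors known up to constants, so that the true B12 equals 1; purely illustrative):

## bridge sampling, special case: average the ratio of unnormalized posteriors
set.seed(1)
pi1.t <- function(th) exp(-th^2 / 2)             # unnormalized pi1
pi2.t <- function(th) exp(-(th - 0.5)^2 / 2)     # unnormalized pi2
th2 <- rnorm(1e5, 0.5, 1)                        # draws from (normalized) pi2
mean(pi1.t(th2) / pi2.t(th2))                    # estimate of B12, close to 1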

Page 139: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

Monte Carlo Integration

Application to Bayesian Model Choice

Bridge sampling variance

The bridge sampling estimator does poorly if

var(B12) / B12² = (1/n) Eπ2[ (π1(θ)/π2(θ))² − 1 ]

               = (1/n) Eπ2[ ( (π1(θ) − π2(θ)) / π2(θ) )² ] = (1/n) ∫ (π1(θ) − π2(θ))² / π2(θ) dθ

is large, i.e. if the posteriors π1 and π2 have little overlap...

Page 140: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

Monte Carlo Integration

Application to Bayesian Model Choice

(Further) bridge sampling

General identity:

B12 = ∫ π̃1(θ|x) α(θ) π2(θ|x) dθ / ∫ π̃2(θ|x) α(θ) π1(θ|x) dθ ,   ∀ α(·) ,

estimated by

B12 ≈ [ (1/n2) ∑_{i=1}^{n2} π̃1(θ2i|x) α(θ2i) ] / [ (1/n1) ∑_{i=1}^{n1} π̃2(θ1i|x) α(θ1i) ] ,   θji ∼ πj(θ|x) .

Page 141: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

Monte Carlo Integration

Application to Bayesian Model Choice

Optimal bridge sampling

The optimal choice of the auxiliary function is

α⋆(θ) = (n1 + n2) / ( n1 π1(θ|x) + n2 π2(θ|x) ) ,

leading to

B12 ≈ [ (1/n2) ∑_{i=1}^{n2} π̃1(θ2i|x) / ( n1 π1(θ2i|x) + n2 π2(θ2i|x) ) ] / [ (1/n1) ∑_{i=1}^{n1} π̃2(θ1i|x) / ( n1 π1(θ1i|x) + n2 π2(θ1i|x) ) ]

Page 142: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

Monte Carlo Integration

Application to Bayesian Model Choice

Optimal bridge sampling (2)

Reason:

Var(B12) / B12² ≈ (1/(n1 n2)) { ∫ π1(θ) π2(θ) [n1 π1(θ) + n2 π2(θ)] α(θ)² dθ / ( ∫ π1(θ) π2(θ) α(θ) dθ )² − 1 }

(by the δ method).

Drawback: dependence on the unknown normalising constants, solved iteratively.

Page 143: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

Monte Carlo Integration

Application to Bayesian Model Choice

Extension to varying dimensions

When dim(Θ1) ≠ dim(Θ2), e.g. θ2 = (θ1, ψ), introduce a pseudo-posterior density ω(ψ|θ1, x), augmenting π1(θ1|x) into the joint distribution

π1(θ1|x) × ω(ψ|θ1, x)

on Θ2, so that

B12 = ∫ π̃1(θ1|x) ω(ψ|θ1, x) α(θ1, ψ) π2(θ1, ψ|x) dθ1 dψ / ∫ π̃2(θ1, ψ|x) α(θ1, ψ) π1(θ1|x) ω(ψ|θ1, x) dθ1 dψ

    = Eπ2[ π̃1(θ1) ω(ψ|θ1) / π̃2(θ1, ψ) ] = Eϕ[ π̃1(θ1) ω(ψ|θ1) / ϕ(θ1, ψ) ] / Eϕ[ π̃2(θ1, ψ) / ϕ(θ1, ψ) ]

for any conditional density ω(ψ|θ1) and any joint density ϕ.

Page 144: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

Monte Carlo Integration

Application to Bayesian Model Choice

Illustration for the Pima Indian dataset

Use of the MLE-induced conditional of β3 given (β1, β2) as a pseudo-posterior, and a mixture of both MLE approximations on β3, in the bridge sampling estimate.

R bridge sampling code (hmprobit, probitlpost and meanw are the course's probit posterior sampler, log-posterior and conditional-mean functions; ginv is from MASS, dmvnorm from mvtnorm):

cova=model2$cov.unscaled
expecta=model2$coeff[,1]
# conditional variance of beta3 given (beta1,beta2) in the MLE approximation
covw=cova[3,3]-t(cova[1:2,3])%*%ginv(cova[1:2,1:2])%*%cova[1:2,3]
# posterior samples under each model
probit1=hmprobit(Niter,y,X1)
probit2=hmprobit(Niter,y,X2)
# pseudo-posterior draws of beta3 given the model-1 samples
pseudo=rnorm(Niter,meanw(probit1),sqrt(covw))
probit1p=cbind(probit1,pseudo)
# bridge sampling estimate of the Bayes factor
bfbs=mean(exp(probitlpost(probit2[,1:2],y,X1)+dnorm(probit2[,3],meanw(probit2[,1:2]),
  sqrt(covw),log=T))/(dmvnorm(probit2,expecta,cova)+dnorm(probit2[,3],expecta[3],
  cova[3,3])))/
  mean(exp(probitlpost(probit1p,y,X2))/(dmvnorm(probit1p,expecta,cova)+
  dnorm(pseudo,expecta[3],cova[3,3])))

Page 145: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

Monte Carlo Integration

Application to Bayesian Model Choice

Diabetes in Pima Indian women (cont'd)

Comparison of the variation of the Bayes factor approximations based on 100 × 20,000 simulations from the prior (MC), the above bridge sampler and the above importance sampler.

Page 146: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

Monte Carlo Integration

Application to Bayesian Model Choice

The original harmonic mean estimator

When θk^(t) ∼ πk(θ|x),

(1/T) ∑_{t=1}^T 1 / L(θk^(t))

is an unbiased estimator of 1/Zk.
[Newton & Raftery, 1994]

Highly dangerous: most often leads to an infinite variance!!! (as it results from an application of IS targeting the prior and using the posterior as instrumental density)

Page 147: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

Monte Carlo Integration

Application to Bayesian Model Choice

Approximating Zk from a posterior sample

Use of the [harmonic mean] identity

Eπk[ ϕ(θk) / ( πk(θk) Lk(θk) ) | x ] = ∫ [ ϕ(θk) / ( πk(θk) Lk(θk) ) ] [ πk(θk) Lk(θk) / Zk ] dθk = 1/Zk ,

no matter what the proposal ϕ(·) is.
[Gelfand & Dey, 1994; Bartolucci et al., 2006]

Page 148: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

Monte Carlo Integration

Application to Bayesian Model Choice

Comparison with regular importance sampling

Harmonic mean: a constraint opposed to the usual importance sampling constraints: ϕ(θ) must have lighter (rather than fatter) tails than πk(θk)Lk(θk) for the approximation

Z1k = 1 / [ (1/T) ∑_{t=1}^T ϕ(θk^(t)) / ( πk(θk^(t)) Lk(θk^(t)) ) ]

to have a finite variance.

E.g., use finite-support kernels (like Epanechnikov's kernel) for ϕ.
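A toy R sketch of this stabilized estimator on a conjugate normal model where Zk is known (the truncated-normal ϕ is one illustrative light-tailed choice):

## Gelfand-Dey estimate of the evidence Z from posterior draws
## toy model: theta ~ N(0,1), x | theta ~ N(theta,1), one observation x = 1
set.seed(1)
x <- 1; T <- 1e4
post <- rnorm(T, x / 2, sqrt(1 / 2))            # exact posterior draws
logpiL <- function(th) dnorm(th, log = TRUE) + dnorm(x, th, log = TRUE)
a <- quantile(post, c(0.05, 0.95))              # central region of the draws
phi <- function(th) dnorm(th, mean(post), sd(post)) *
  (th > a[1] & th < a[2]) / diff(pnorm(a, mean(post), sd(post)))
Zhat <- 1 / mean(phi(post) / exp(logpiL(post)))
c(estimate = Zhat, truth = dnorm(x, 0, sqrt(2)))  # marginal likelihood of x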

Page 149: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

Monte Carlo Integration

Application to Bayesian Model Choice

Comparison with regular importance sampling (cont’d)

Compare Z1k with a standard importance sampling approximation

Z2k = (1/T) ∑_{t=1}^T πk(θk^(t)) Lk(θk^(t)) / ϕ(θk^(t)) ,

where the θk^(t)'s are generated from the density ϕ(·) (with fatter tails, like a t's).

Page 150: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

Monte Carlo Integration

Application to Bayesian Model Choice

HPD indicator as ϕ

Use the convex hull of the MCMC simulations corresponding to the 10% HPD region (easily derived!) and take ϕ as the indicator

ϕ(θ) = (10/T) ∑_{t∈HPD} I_{ d(θ, θ^(t)) ≤ ε }

Page 151: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

Monte Carlo Integration

Application to Bayesian Model Choice

Diabetes in Pima Indian women (cont'd)

Comparison of the variation of the Bayes factor approximations based on 100 replicas of 20,000 simulations each, for the above harmonic mean sampler and importance samplers.

Figure: boxplots of the approximations (harmonic mean vs. importance sampling), values ranging roughly from 3.102 to 3.116.

Page 152: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

Notions on Markov Chains for MCMC

Notions on Markov Chains

Notions on Markov Chains for MCMC
 Basics
 Invariant Measures
 Ergodicity: (1) The Discrete Case
 Ergodicity: (2) The Continuous Case
 Limit Theorems

Page 153: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

Notions on Markov Chains for MCMC

Basics

Basics

A Markov chain is a (discrete-time) stochastic process which is such that the future evolution of the process is independent of its past given only its current value.

The distribution of the chain is defined through its transition kernel, a function K defined on X × B(X) such that

I ∀x ∈ X, K(x, ·) is a probability measure;

I ∀A ∈ B(X), K(·, A) is measurable.

Page 154: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

Notions on Markov Chains for MCMC

Basics


• When X is a discrete (finite or countable) set, the transition kernel may be represented as a transition matrix K = (Kxy) whose elements satisfy

Kxy = P(Xn = y | Xn−1 = x) ,   x, y ∈ X .

Since, for all x ∈ X, K(x, ·) is a probability, we must have

Kxy ≥ 0   and   K(x, X) = ∑_{y∈X} Kxy = 1 .

The matrix K is said to be a stochastic matrix.

Page 155: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

Notions on Markov Chains for MCMC

Basics

• In the continuous case, if all probabilities K(x, ·) are absolutely continuous wrt. Lebesgue measure, the kernel can be represented by a transition density function k(x, x′):

P(Xn+1 ∈ A | Xn) = ∫_A k(Xn, x′) dx′ .

The transition density function satisfies k(x, x′) ≥ 0 and ∫_X k(x, x′) dx′ = 1.

Page 156: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

Notions on Markov Chains for MCMC

Basics

Kh

For functions h such that ∫_X K(x, dy) h(y) < ∞ for all x, define the function Kh by

Kh(x) = K(x, h) = ∫_X K(x, dy) h(y) ,

so that Kh(Xn) = E( h(Xn+1) | Xn ).

µK

For probability measures µ, define the measure µK by

µK(A) = ∫_X µ(dx) K(x, A) .

µK corresponds to the law of Xn+1 when the chain is started at time n with distribution Xn ∼ µ.

Page 157: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

Notions on Markov Chains for MCMC

Basics

Composition of Kernels

More generally, if K1 and K2 are two transition kernels,

K1K2(x, A) = ∫_{x′∈X} K1(x, dx′) K2(x′, A)

defines the law of Xn+2 when the chain is started at Xn = x and two successive transitions according to, respectively, K1 and K2 are performed.

Page 158: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

Notions on Markov Chains for MCMC

Basics


On a finite state-space X = {s1, . . . , sk},

I A function h is uniquely defined by the (column) vector h = (h(s1), . . . , h(sk))^T, and

(Kh)_s = ∑_{s′∈X} K_{ss′} h_{s′}

can be interpreted as the s-th component of the product of the transition matrix K and of the vector h.

I A probability distribution on X is defined as a (row) vector µ = (µ(s1), . . . , µ(sk)), and the probability distribution µK is defined, for each s ∈ X, as

(µK)_s = ∑_{s′∈X} µ_{s′} K_{s′s} ,

that is, the s-th component of the product of the (row) vector µ and of the transition matrix K.

Page 159: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

Notions on Markov Chains for MCMC

Basics

Markov Chain

Definition (Markov Chain)

(Xn)_{n≥0} forms a Markov chain if there exist a transition kernel K and an initial probability µ such that, for any n,

Pµ(X0 ∈ A0, X1 ∈ A1, . . . , Xn ∈ An) = ∫_{x0∈A0} · · · ∫_{xn−1∈An−1} µ(dx0) K(x0, dx1) · · · K(xn−1, An) .

In particular,

Pµ(Xn ∈ A) = ∫_X · · · ∫_X µ(dx0) K(x0, dx1) · · · K(xn−1, A) ,

which is usually denoted more concisely by Pµ(Xn ∈ A) = µK^n(A).

Page 160: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

Notions on Markov Chains for MCMC

Basics

Example (Normal random walk)

Xn = Xn−1 + τ εn ,

where (εn)_{n≥1} is an iid sequence of normal variables independent of X0. Then k(x, x′) = N(x′; x, τ²I) and

Pµ(Xn ∈ A) = ∫_{x0∈X} ∫_{xn∈A} µ(dx0) N(xn; x0, n τ²I) dxn ,

where N(xn; x0, nτ²I) is the n-step transition density k^n(x0, xn).

Page 161: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

Notions on Markov Chains for MCMC

Basics

Figure: a trajectory of length 100 of the random walk in R² with τ = 1.

Page 162: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

Notions on Markov Chains for MCMC

Basics

The Strong Markov Property

Let τ denote a random stopping time such that the event {τ = n} is σ(X0, . . . , Xn)-measurable for all n. Assuming that τ is Px-a.s. finite, we have

Px(Xτ+1 ∈ A1, . . . , Xτ+n ∈ An | X0, . . . , Xτ) = ∫_{x1∈A1} · · · ∫_{xn∈An} K(Xτ, dx1) · · · K(xn−1, dxn) .

Typical stopping times of interest for Markov chains include the return times

τC = inf{ n ≥ 1 : Xn ∈ C } .

Page 163: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

Notions on Markov Chains for MCMC

Invariant Measures

Invariant Measures

Definition (Invariant Measure)

A measure π is invariant for the transition kernel K if πK = π, that is,

π(A) = ∫_X π(dx) K(x, A) ,   ∀A ∈ B(X) .

I If π is an invariant measure, so is cπ for any scale factor c.

I There can be several invariant measures (that differ by more than just a scale).

The latter can be avoided by assuming that the chain is irreducible.

Page 164: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

Notions on Markov Chains for MCMC

Invariant Measures

Stability Properties and Invariance

Assuming that π is unique (up to the normalization issue), there are two very different cases:

Null chains: π is not a finite measure.

Positive chains: π is a probability measure; in this case π is also called the stationary distribution, since X0 ∼ π implies that Xn ∼ π for every n. More generally, the process is then stationary in the sense that

Pπ(Xk ∈ A0, Xk+1 ∈ A1, . . . , Xk+n ∈ An) = ∫_{x0∈A0} · · · ∫_{xn∈An} π(dx0) K(x0, dx1) · · · K(xn−1, dxn) ,

for any lag k ≥ 0.¶

¶This restriction can be removed, but the corresponding construction is omitted here.

Page 165: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

Notions on Markov Chains for MCMC

Invariant Measures

Stationary Distribution in MCMC

In MCMC, we construct kernels K that admit the target πas invariant measure

Hence, if we can show that π is the unique invariant measure, wewill always deal with positive chains.

Page 166: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

Notions on Markov Chains for MCMC

Invariant Measures

Reversible Chains

π-Reversibility

K is said to be π-reversible if π(dx)K(x, dx′) is a symmetric measure on X², that is,

∫_{x∈A} ∫_{x′∈A′} π(dx) K(x, dx′) = ∫_{x′∈A′} ∫_{x∈A} π(dx′) K(x′, dx) .

π-reversibility implies that K is π-invariant, as

∫_{x∈X} ∫_{x′∈A} π(dx) K(x, dx′) = ∫_{x′∈A} ∫_{x∈X} π(dx′) K(x′, dx) = ∫_{x′∈A} π(dx′) K(x′, X) = π(A) ,

using K(x′, X) = 1.

Page 167: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

Notions on Markov Chains for MCMC

Invariant Measures

π-Reversibility and Time Reversal

If K is π-reversible, the stationary chain may be run backwards in time to obtain a backward chain that has the same distribution as the (usual) forward chain:

Pπ(Xn−k ∈ Ak, . . . , Xn−1 ∈ A1, Xn ∈ A0) = ∫_{xn∈A0} · · · ∫_{xn−k∈Ak} π(dxn) K(xn, dxn−1) · · · K(xn−k+1, dxn−k) .

Page 168: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

Notions on Markov Chains for MCMC

Invariant Measures

π-Reversible Chains in MCMC

MCMC algorithms are all built from elementary transitionkernels that are π-reversible

It is indeed very easy to check that a kernel is π-reversible (and itdoes not require that the normalizing constant of π be known)

Page 169: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

Notions on Markov Chains for MCMC

Invariant Measures

Ergodicity

For MCMC, a minimum requirement is that the chain (Xn) be ergodic in the sense that

Px(Xn ∈ A) → π(A)   for all x ,

ensuring that, for large enough simulation runs, we are approximately simulating from π, for any choice of the starting point X0.

Page 170: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

Notions on Markov Chains for MCMC

Invariant Measures

Total Variation Norm

Distances between probability measures will be assessed through the following criterion.‖

Total Variation Norm

For probability measures µ1 and µ2,

‖µ1 − µ2‖TV = sup_A |µ1(A) − µ2(A)| .

Other equivalent definitions:

(i) For a signed measure ξ on X, ‖ξ‖TV = (1/2){ ξ+(X) + ξ−(X) }

(ii) ‖ξ‖TV = (1/2) sup{ |ξ(h)| : ‖h‖∞ = 1 }

‖Alternative definition: ‖µ1 − µ2‖TV = 2 sup_A |µ1(A) − µ2(A)|.

Page 171: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

Notions on Markov Chains for MCMC

Invariant Measures

A First Step Towards Ergodicity

Proposition (6.52)

If K is π-invariant (with π a probability),

‖µK^n − π‖TV

is non-increasing in n, for any choice of the initial distribution µ.

A weak statement that can be made stronger under restrictive assumptions: if there exists a probability ν such that K(x, A) ≥ ε ν(A) for all x, with ε > 0, the chain is uniformly ergodic in the sense that

‖µK^n − π‖TV ≤ (1 − ε)^n ‖µ − π‖TV .

Page 172: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

Notions on Markov Chains for MCMC

Invariant Measures

Simple counter-examples show that additional conditions are required to ensure ergodicity:

Recurrence: infinitely many visits to all sets A such that π(A) > 0, from any starting point. Counter-example:

K = ( 1   0     0
      0   1−α   α
      0   α     1−α )

Aperiodicity: the chain does not cycle through disconnected regions of the state-space. Counter-example:

K = ( 0  1
      1  0 )

Page 173: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

Notions on Markov Chains for MCMC

Ergodicity: (1) The Discrete Case

Irreducibility

Definition (Irreducibility)

In the discrete case, the chain is irreducible if all states communicate, in the sense that

Px(τy < ∞) > 0 ,   ∀x, y ∈ X ,

τy being the first (strictly positive) time when y is visited (also called the return time).

This is equivalent to requiring that for all x, y ∈ X, there exists n such that K^n(x, y) > 0.

Page 174: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

Notions on Markov Chains for MCMC

Ergodicity: (1) The Discrete Case

Recurrence and Transience

Denote by ηy the total number of visits of the chain to state y; then Ex[ηy] = ∑_{n≥0} K^n(x, y).

The Recurrence/Transience Dichotomy

An irreducible Markov chain is either

I transient, with

Px(τy < ∞) < 1 ,   Ex[ηy] < ∞   for all x, y ∈ X ,

I or recurrent, with

Px(τy < ∞) = 1 ,   Ex[ηy] = ∞   for all x, y ∈ X .

Page 175: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

Notions on Markov Chains for MCMC

Ergodicity: (1) The Discrete Case

Recurrent Chain

Kac's Theorem

A chain on a discrete state-space X that is both π-invariant (with π a probability such that π(x) > 0 for all x) and irreducible

I is recurrent,

I and satisfies

Ex[τx] = 1 / π(x) .

Page 176: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

Notions on Markov Chains for MCMC

Ergodicity: (1) The Discrete Case

Aperiodicity

The chain is said to be aperiodic if, for all states x, the greatestcommon divisor of the set {n ≥ 1 : Kn(x, x) > 0} is equal to 1.

Strong Aperiodicity (Sufficient condition for aperiodicity)

If the chain is irreducible, the existence of a state x such that

K(x, x) > 0

implies that the chain is strongly aperiodic.

Page 177: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

Notions on Markov Chains for MCMC

Ergodicity: (1) The Discrete Case

Ergodic Theorem

An irreducible and aperiodic chain on a discrete state-space X with invariant probability π satisfies

‖Px(Xn ∈ ·) − π‖TV → 0   for all x ∈ X .

Page 178: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

Notions on Markov Chains for MCMC

Ergodicity: (2) The Continuous Case

The General Case

For general chains the situation is more complex

I First, individual points of the state-space will generally not be reachable and we must adapt our definitions of irreducibility, recurrence and aperiodicity to deal with measurable sets rather than points of the state-space.

I Second, we need a strengthened notion of recurrence.

Page 179: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

Notions on Markov Chains for MCMC

Ergodicity: (2) The Continuous Case

Phi-Irreducibility

The chain is said to be ϕ-irreducible for some measure ϕ if,

I for all x ∈ X,

I for all A ∈ B(X) such that ϕ(A) > 0,

there exists n ≥ 1 such that

K^n(x, A) > 0 .   (3)

If the chain is π-invariant, with π a probability measure (i.e., as in MCMC), one should choose ϕ = π, as π is a maximal irreducibility measure (it dominates all measures ϕ such that (3) holds).

Page 180: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

Notions on Markov Chains for MCMC

Ergodicity: (2) The Continuous Case

Harris Recurrence

Unfortunately (and in contrast to the discrete case), phi-irreducibility in itself is not sufficient to ensure that Px(τA < ∞) = 1, hence we will assume a stronger form of recurrence.

Definition (Harris Recurrence)

A phi-irreducible chain is said to be Harris recurrent if for any x ∈ X and all sets A such that ϕ(A) > 0,

Px(τA < ∞) = 1 .

Px(τA < ∞) = 1 for all x implies that Ex[ηA] = ∞, but the converse is no longer true (in contrast with the discrete case).

Page 181: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

Notions on Markov Chains for MCMC

Ergodicity: (2) The Continuous Case

Small Sets

To ensure (strong) aperiodicity, a sufficient condition is the existence of a ν-small set C such that ν(C) > 0.

Definition (Small Set)

If there exist C ∈ B(X), with ϕ(C) > 0, a measure ν and ε > 0 such that, for all x ∈ C and all A ∈ B(X),

K(x, A) ≥ ε ν(A) ,

then C is called a ν-small set.

Small sets play a role somewhat similar to points (or atoms) in the discrete case.

Page 182: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

Notions on Markov Chains for MCMC

Ergodicity: (2) The Continuous Case

General Ergodicity

Theorem (6.51)

If the chain (Xn) is positive Harris recurrent and aperiodic, then

lim_{n→∞} ‖K^n(x, ·) − π‖TV = 0

for all x ∈ X.

Convergence in total variation implies

lim_{n→∞} Ex[h(Xn)] = Eπ[h]

for every bounded function h (and, hence, convergence in distribution).

Page 183: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

Notions on Markov Chains for MCMC

Limit Theorems

Limit Theorems for MCMC

I The ergodic theorem (and particularly quantitative refinements of it, not discussed here) is important to guarantee that the algorithm forgets its initial (arbitrary) state.

I However, the usual practice in MCMC is not only to exploit a single sampled point Xn for n sufficiently large, but to rely on empirical trajectory-wise averages

(1/n) ∑_{k=1}^n h(Xk)

to estimate the expectation of functions of interest h under π.

I Hence, we also need LLNs and CLTs.

Classical LLNs and CLTs are not directly applicable due to:

◦ the Markovian dependence structure

◦ the non-stationarity of the chain

Page 184: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

Notions on Markov Chains for MCMC

Limit Theorems

LLN for Markov Chains

Theorem (6.63)

If the Markov chain (Xn) is Harris recurrent with stationary distribution π, then for any function h such that Eπ|h| < ∞,

lim_{n→∞} (1/n) ∑_{k=1}^n h(Xk) = Eπ[h] .

Page 185: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

Notions on Markov Chains for MCMC

Limit Theorems

CLT for Markov Chains

Theorem (6.65)

For a Harris recurrent π-reversible Markov chain (Xn), let h̄ = h − Eπ[h] and assume that

0 < γ²_h = Eπ[ h̄²(X0) ] + 2 ∑_{k=1}^∞ Eπ[ h̄(X0) h̄(Xk) ] < ∞ .

Then,

(1/√n) ∑_{k=1}^n ( h(Xk) − Eπ[h] ) →L N(0, γ²_h) .

[Kipnis & Varadhan, 1986]

Page 186: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

The Metropolis-Hastings Algorithm

The Metropolis-Hastings Algorithm

Motivation and leading example

Random variable generation

Monte Carlo Integration

Notions on Markov Chains for MCMC

The Metropolis-Hastings Algorithm
 Monte Carlo Methods based on Markov Chains
 The Metropolis-Hastings Algorithm
 The Independent Metropolis-Hastings Algorithm
 The Random Walk Metropolis-Hastings Algorithm
 Extensions

The Gibbs Sampler

Case Study

Page 187: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

The Metropolis-Hastings Algorithm

Monte Carlo Methods based on Markov Chains

Running Monte Carlo via Markov Chains

It is not necessary to use an iid sample from the distribution f to approximate the integral

I = ∫ h(x) f(x) dx .

One can approximate I from a simulation of an ergodic Markov chain with stationary distribution f.

Page 188: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

The Metropolis-Hastings Algorithm

Monte Carlo Methods based on Markov Chains

Running Monte Carlo via Markov Chains (2)

Idea

For an arbitrary starting value x(0), an ergodic chain (X(t)) is generated using a transition kernel with stationary distribution f.

I Ensures the convergence in distribution of (X(t)) to a random variable from f.

I For a “large enough” T0, X(T0) can be considered as distributed from f.

I Produces a dependent sample X(t), X(t+1), . . ., which can be used to approximate integrals under f.

Problem: how can one build a Markov chain with a given stationary distribution?

Page 189: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

The Metropolis-Hastings Algorithm

The Metropolis-Hastings Algorithm

The Metropolis-Hastings Algorithm

BasicsThe algorithm requires the knowledge of the objective (target)density

f

and uses a conditional density

q(y|x)

called the instrumental (or proposal) distribution

Page 190: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

The Metropolis-Hastings Algorithm

The Metropolis-Hastings Algorithm

The MH algorithm

Algorithm (Metropolis-Hastings)

Given X(t),

1. Generate Yt ∼ q(y|X(t)).

2. Take

X(t+1) = Yt    with prob. ρ(X(t), Yt),
X(t+1) = X(t)  with prob. 1 − ρ(X(t), Yt),

where

ρ(x, y) = min{ [f(y)/f(x)] [q(x|y)/q(y|x)] , 1 } .
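A generic R sketch of this algorithm (the unnormalized target and the Gaussian proposal below are illustrative choices, not part of the course):

## generic Metropolis-Hastings step, with the acceptance probability of the slide
set.seed(1)
f.tilde <- function(x) exp(-x^2 / 2) * (1 + sin(3 * x)^2)  # unnormalized target
q.dens  <- function(y, x) dnorm(y, x, 1)                   # proposal density q(y|x)
q.draw  <- function(x) rnorm(1, x, 1)
T <- 1e4; x <- numeric(T); x[1] <- 0
for (t in 1:(T - 1)) {
  y   <- q.draw(x[t])
  rho <- min(f.tilde(y) * q.dens(x[t], y) / (f.tilde(x[t]) * q.dens(y, x[t])), 1)
  x[t + 1] <- if (runif(1) < rho) y else x[t]               # accept or stay
}
mean(x)                                                     # ergodic average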

Page 191: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

The Metropolis-Hastings Algorithm

The Metropolis-Hastings Algorithm

Features

I Does not require the knowledge of the normalizing constants of either f or q(·|x)

I Never moves to values for which f(y) = 0

I The chain (X(t)) may take the same value several times in a row, even though f is a density wrt Lebesgue measure

I The sequence (Yt) is usually not a Markov chain

Page 192: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

The Metropolis-Hastings Algorithm

The Metropolis-Hastings Algorithm

Convergence properties

1. The MH Markov chain is reversible, with invariant/stationary density f, since it satisfies the detailed balance condition

f(y) dy K(y, dx) = f(x) dx K(x, dy) .

2. As f is a probability measure, the chain is positive recurrent.

3. If

P[ f(Yt) q(X(t)|Yt) / ( f(X(t)) q(Yt|X(t)) ) ≥ 1 ] < 1 ,

that is, if the event {X(t+1) = X(t)} is possible, then the chain is aperiodic.

Page 193: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

The Metropolis-Hastings Algorithm

The Metropolis-Hastings Algorithm

Convergence properties (2)

4. If

q(y|x) > 0 for every (x, y) ,

the chain is f-irreducible.

5. For MH, f-irreducibility implies Harris recurrence.

6. Thus, for MH chains satisfying 3 and 4:

(i) for h with Ef|h(X)| < ∞,

lim_{T→∞} (1/T) ∑_{t=1}^T h(X(t)) = ∫ h(x) f(x) dx   Px(0)-a.s. ;

(ii) and

lim_{n→∞} ‖K^n(x(0), ·) − f‖TV = 0

for every initial starting point x(0).

Page 194: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

The Metropolis-Hastings Algorithm

The Independent Metropolis-Hastings Algorithm

The Independent Case

The instrumental distribution q does not depend on X(t), and is denoted g by analogy with Accept-Reject.

Algorithm (Independent Metropolis-Hastings)

Given X(t),

a. Generate Yt ∼ g(y).

b. Take

X(t+1) = Yt    with prob. min{ f(Yt) g(X(t)) / ( f(X(t)) g(Yt) ) , 1 } ,
X(t+1) = X(t)  otherwise.

Page 195: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

The Metropolis-Hastings Algorithm

The Independent Metropolis-Hastings Algorithm

Properties

The resulting sample is not iid, but there exist strong convergence properties:

Theorem (Ergodicity)

The algorithm produces a uniformly ergodic chain if there exists a constant M such that

f(x) ≤ M g(x) .

In this case,

‖K^n(x, ·) − f‖TV ≤ (1 − 1/M)^n .

[Mengersen & Tweedie, 1996]

Page 196: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

The Metropolis-Hastings Algorithm

The Independent Metropolis-Hastings Algorithm

Example (Noisy AR(1))

Hidden Markov chain from a regular AR(1) model,

xt+1 = ϕ xt + εt+1 ,   εt ∼ N(0, τ²) ,

and observables

yt | xt ∼ N(xt², σ²) .

The distribution of xt given xt−1, xt+1 and yt is proportional to

exp −(1/2τ²) { (xt − ϕ xt−1)² + (xt+1 − ϕ xt)² + (τ²/σ²)(yt − xt²)² } .

Page 197: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

The Metropolis-Hastings Algorithm

The Independent Metropolis-Hastings Algorithm

Example (Noisy AR(1) contd.)

Use for proposal the N(µt, ωt²) distribution, with

µt = ϕ (xt−1 + xt+1) / (1 + ϕ²)   and   ωt² = τ² / (1 + ϕ²) .

The ratio

π(x)/q_ind(x) ∝ exp −(yt − xt²)² / (2σ²)

is bounded.

Page 198: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

The Metropolis-Hastings Algorithm

The Independent Metropolis-Hastings Algorithm

Figure: (top) last 500 samples of the chain (X(t)) run for 10,000 iterations; (bottom) histogram of the chain, compared with the target distribution.

Page 199: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

The Metropolis-Hastings Algorithm

The Independent Metropolis-Hastings Algorithm

Example (Cauchy by normal)

Given a Cauchy C(0, 1) target, consider a normal N(0, 1) proposal.
The Metropolis–Hastings acceptance ratio is

[ π(ξ′)/ν(ξ′) ] / [ π(ξ)/ν(ξ) ] = exp[ {(ξ′)² − ξ²}/2 ] · (1 + ξ²) / (1 + (ξ′)²) .

Poor performance: the proposal distribution has lighter tails than the target Cauchy, and convergence to the stationary distribution is not even geometric!

[Mengersen & Tweedie, 1996]

Page 200: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

The Metropolis-Hastings Algorithm

The Independent Metropolis-Hastings Algorithm

Histogram of the Markov chain (ξ^(t))_{1≤t≤5000} against the target C(0, 1) density.

Range and average of 1000 parallel runs (over 5000 iterations) when initialized with a normal N(0, 100²) distribution.

Page 201: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

The Metropolis-Hastings Algorithm

The Independent Metropolis-Hastings Algorithm

Performance of the Algorithm

Average Acceptance Rate

If f(x) ≤ M g(x), the average acceptance rate (under stationarity) satisfies

ρ = E_{f⊗g}[ min{ f(Y) g(X) / (f(X) g(Y)) , 1 } ]
  = 2 P_{f⊗g}( f(Y)/g(Y) ≥ f(X)/g(X) )
  ≥ (2/M) P_{f⊗f}( f(Y)/g(Y) ≥ f(X)/g(X) ) = 1/M .

Hence, maximizing the acceptance rate is related to the optimization of the speed of convergence in the ergodic theorem.

When different proposals g are available, the one that leads to the highest acceptance rate should be used.

Page 202: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

The Metropolis-Hastings Algorithm

The Independent Metropolis-Hastings Algorithm

Practical approach

Choose a parameterized instrumental distribution g(·|θ) and adjust the corresponding parameters θ based on the empirical estimate of the average acceptance rate computed on pilot runs∗∗

ρ̂(θ) = (2/m) ∑_{t=1}^{m} I{ f(Y_t) g(X^(t)|θ) > f(X^(t)) g(Y_t|θ) } .

∗∗ Trying to modify θ continuously as the algorithm runs would result in an adaptive MCMC approach, whose convergence is harder to establish (see later).

Page 203: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

The Metropolis-Hastings Algorithm

The Random Walk Metropolis-Hastings Algorithm

Random walk Metropolis–Hastings

Use of a local perturbation as proposal

Y_t = X^(t) + ε_t,

where ε_t ∼ g is independent of X^(t).

The instrumental density is now of the form g(y − x) and the Markov chain is a random walk if we take g to be symmetric: g(x) = g(−x).

Page 204: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

The Metropolis-Hastings Algorithm

The Random Walk Metropolis-Hastings Algorithm

Algorithm (Random walk Metropolis-Hastings)

Given X^(t)

1. Generate Y_t ∼ g(y − X^(t))
2. Take

X^(t+1) = Y_t    with prob. min{ 1, f(Y_t) / f(X^(t)) },
          X^(t)   otherwise.
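A minimal R sketch of this random walk step, using the uniform U[−δ, δ] increment of the next example and a standard normal target; the function name rw_mh and its arguments are illustrative, not from the slides.

## Random walk MH with uniform increments and a N(0,1) target (illustrative sketch).
rw_mh <- function(T, x0, delta, logf = function(x) dnorm(x, log = TRUE)) {
  x <- numeric(T + 1)
  x[1] <- x0
  for (t in 1:T) {
    y <- x[t] + runif(1, -delta, delta)            # Y_t = X^(t) + eps_t
    logratio <- logf(y) - logf(x[t])               # symmetric proposal cancels out
    x[t + 1] <- if (log(runif(1)) < logratio) y else x[t]
  }
  x
}

chain <- rw_mh(15000, 0, delta = 0.5)
mean(chain)   # should be close to 0 for a well-mixing chain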

Page 205: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

The Metropolis-Hastings Algorithm

The Random Walk Metropolis-Hastings Algorithm

Example (Random walk and normal target)

Generate N(0, 1) based on the uniform proposal on [−δ, δ]
[Hastings (1970)]

The probability of acceptance is then

ρ(x^(t), y_t) = exp{ ((x^(t))² − y_t²)/2 } ∧ 1 .

Page 206: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

The Metropolis-Hastings Algorithm

The Random Walk Metropolis-Hastings Algorithm

Three samples based on U[−δ, δ] with (a) δ = 0.1, (b) δ = 0.5 and (c) δ = 1.0, superimposed with the convergence of the means (15,000 simulations).

Page 207: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

The Metropolis-Hastings Algorithm

The Random Walk Metropolis-Hastings Algorithm

Example (Mixture models (again!))

π(θ|x) ∝ ∏_{j=1}^{n} ( ∑_{ℓ=1}^{k} p_ℓ f(x_j | µ_ℓ, σ_ℓ) ) π(θ)

Metropolis-Hastings proposal:

θ^(t+1) = θ^(t) + ω ε^(t)   if u^(t) < ρ^(t)
          θ^(t)              otherwise

where

ρ^(t) = π(θ^(t) + ω ε^(t) | x) / π(θ^(t) | x) ∧ 1

and ω is scaled for a good acceptance rate

Page 208: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

The Metropolis-Hastings Algorithm

The Random Walk Metropolis-Hastings Algorithm

Random walk sampling (50,000 iterations): pairwise plots of (p, θ), (τ, θ) and (p, τ), with marginal histograms of θ, τ and p.

Case of a 3 component normal mixture
[Celeux & al., 2000]

Page 209: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

The Metropolis-Hastings Algorithm

The Random Walk Metropolis-Hastings Algorithm

Convergence properties

Uniform ergodicity is prohibited by the random walk structure.
At best, geometric ergodicity:

Theorem (Sufficient ergodicity)

For a symmetric density f, log-concave in the tails, and a positive and symmetric density g, the chain (X^(t)) is geometrically ergodic.

[Mengersen & Tweedie, 1996]

(no tail effect)

Page 210: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

The Metropolis-Hastings Algorithm

The Random Walk Metropolis-Hastings Algorithm

Example (Cauchy by normal continued)

Again, a Cauchy C(0, 1) target, now with a Gaussian random walk proposal, ξ′ ∼ N(ξ, σ²), with acceptance probability

(1 + ξ²) / (1 + (ξ′)²) ∧ 1 .

The overall fit of the Cauchy density by the histogram is satisfactory, but exploration of the tails is poor: the 99% quantile of C(0, 1) is close to 32, yet no simulation exceeds 14 out of 10,000!

[Roberts & Tweedie, 2004]

Page 211: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

The Metropolis-Hastings Algorithm

The Random Walk Metropolis-Hastings Algorithm

Again, lack of geometric ergodicity!
[Mengersen & Tweedie, 1996]

Slow convergence is shown by the non-stable range after 10,000 iterations.

Histogram of the first 10,000 steps of a random walk Metropolis–Hastings algorithm using a N(ξ, 1) proposal.

Page 212: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

The Metropolis-Hastings Algorithm

The Random Walk Metropolis-Hastings Algorithm

Range of 500 parallel runs (over 10,000 iterations) for the same setup.

Page 213: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

The Metropolis-Hastings Algorithm

The Random Walk Metropolis-Hastings Algorithm

Further convergence properties

Under the assumptions

▶ (A1) f is super-exponential, i.e. it is positive with positive continuous first derivative such that lim_{|x|→∞} n(x)′ ∇log f(x) = −∞, where n(x) := x/|x|.
In words: exponential decay of f in every direction, with rate tending to ∞.

▶ (A2) lim sup_{|x|→∞} n(x)′ m(x) < 0, where m(x) = ∇f(x)/|∇f(x)|.
In words: non-degeneracy of the contour manifold C_f(y) = {y : f(y) = f(x)}.

the random walk MH kernel Q is geometrically ergodic, and V(x) ∝ f(x)^{−1/2} verifies the drift condition.

[Jarner & Hansen, 2000]

Page 214: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

The Metropolis-Hastings Algorithm

The Random Walk Metropolis-Hastings Algorithm

How to Choose the Scale of the Proposal in RW-MH?
Try it yourself at http://www.lbreyer.com/classic.html

From (Roberts & Rosenthal, 2001)

Page 215: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

The Metropolis-Hastings Algorithm

The Random Walk Metropolis-Hastings Algorithm

Performance Tradeoff

A high acceptance rate does not necessarily indicate that the algorithm is exploring f correctly: it may simply mean that the random walk is moving too slowly on the surface of f.

If x^(t) and y_t are close, i.e. f(x^(t)) ≃ f(y_t), then y_t is accepted with probability

min( f(y_t) / f(x^(t)) , 1 ) ≃ 1 .

For multimodal densities with well separated modes, the negative effect of such limited moves on the surface of f shows clearly.

Page 216: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

The Metropolis-Hastings Algorithm

The Random Walk Metropolis-Hastings Algorithm

Performance Tradeoff (2)

If the average acceptance rate is very low, the algorithm is often proposing values of Y_t for which f(Y_t) is small compared to f(X^(t)): the random walk tries to move too quickly and the resulting MH sampler is often “stuck”.

Page 217: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

The Metropolis-Hastings Algorithm

The Random Walk Metropolis-Hastings Algorithm

Example (Noisy AR(1) continued)

For a Gaussian random walk with scale ω small enough, the random walk never jumps to the other mode. But if the scale ω is sufficiently large, the Markov chain explores both modes and gives a satisfactory approximation of the target distribution.

Page 218: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

The Metropolis-Hastings Algorithm

The Random Walk Metropolis-Hastings Algorithm

Markov chain based on a random walk with scale ω = .1.

Page 219: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

The Metropolis-Hastings Algorithm

The Random Walk Metropolis-Hastings Algorithm

Markov chain based on a random walk with scale ω = .5.

Page 220: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

The Metropolis-Hastings Algorithm

The Random Walk Metropolis-Hastings Algorithm

How Does the Method Scale in Large Dimension?

The Scaling Construction (Roberts et al., 1997-2001)

Assume a d-dimensional target of product form f^(d)(x_1, . . . , x_d) = ∏_{k=1}^{d} f(x_k) and an isotropic (d-dimensional) Gaussian proposal with variance σ²(d). Now let the dimension d tend to infinity:

▶ the acceptance rate does not degenerate only if σ²(d) = O(1/d)

And for a coordinate projection:

▶ the optimal choice of σ²(d) is 2.38²/d × var_π(X), which corresponds to an acceptance rate of about 23%

▶ the asymptotic variance of the ergodic average is α d × var_π(X) (for a Gaussian target, α ≈ 3.7)

Page 221: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

The Metropolis-Hastings Algorithm

The Random Walk Metropolis-Hastings Algorithm

RW-MH Scaling Example

First 1,000 samples of the RW-MH algorithm started from the mode, for a multivariate Gaussian target with optimal proposal scaling (arbitrary 2D marginal, for increasing values of d = 2, 5, 10, 20).

Page 222: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

The Metropolis-Hastings Algorithm

The Random Walk Metropolis-Hastings Algorithm

The Geometry of the Target is Also Very Important

2D Gaussian RW-MH (Gaussian target) with acceptance rate 50% (left: σ²_prop = 0.04 I_2; right: σ²_prop = 1.4 Σ_π).

Number of iterations: 1, 2, 3, 4, 5, 10, 25, 50, 100, 500.

Page 223: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

The Metropolis-Hastings Algorithm

The Random Walk Metropolis-Hastings Algorithm

Rule of thumb

Aim at an average acceptance rate of 25% when using the random-walk MH algorithm.

This rule is to be taken with a pinch of salt!

Page 224: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

The Metropolis-Hastings Algorithm

Extensions

Extensions

There are many other families of MH algorithms

◦ Adaptive Rejection Metropolis Sampling

◦ Reversible Jump (later!)

◦ Langevin algorithms

to name just a few...

Page 225: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

The Metropolis-Hastings Algorithm

Extensions

Langevin Algorithms

Proposal based on the Langevin diffusion L_t, defined by the stochastic differential equation

dL_t = dB_t + (1/2) ∇log f(L_t) dt,

where B_t is the standard Brownian motion

Theorem

The Langevin diffusion is the only non-explosive diffusion which is reversible with respect to f.

Page 226: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

The Metropolis-Hastings Algorithm

Extensions

Discretization

Instead, consider the discretized sequence

x^(t+1) = x^(t) + (σ²/2) ∇log f(x^(t)) + σ ε_t,    ε_t ∼ N_p(0, I_p)

where σ² corresponds to the discretization step.

Unfortunately, the discretized chain may be transient, for instance when

lim_{x→±∞} | σ² ∇log f(x) |x|^{−1} | > 1 .

Page 227: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

The Metropolis-Hastings Algorithm

Extensions

MH correction

Accept the new value Y_t with probability

f(Y_t)/f(X^(t)) · exp{ −‖X^(t) − Y_t − (σ²/2) ∇log f(Y_t)‖² / (2σ²) } / exp{ −‖Y_t − X^(t) − (σ²/2) ∇log f(X^(t))‖² / (2σ²) } ∧ 1 .

Choice of the scale factor σ
The scaling approach suggests an acceptance rate of 58% to achieve optimal convergence rates.

[Roberts & Rosenthal, 1998]
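Putting the discretized Langevin proposal and the MH correction together gives the Metropolis-adjusted Langevin algorithm; below is a one-dimensional R sketch under the stated assumptions, with a user-supplied log-density and its derivative (the names mala, logf, gradlogf are illustrative).

## One-dimensional Metropolis-adjusted Langevin sketch (illustrative).
mala <- function(T, x0, sigma, logf, gradlogf) {
  x <- numeric(T + 1); x[1] <- x0
  logq <- function(to, from)                       # log proposal density q(to | from)
    dnorm(to, mean = from + sigma^2 / 2 * gradlogf(from), sd = sigma, log = TRUE)
  for (t in 1:T) {
    y <- x[t] + sigma^2 / 2 * gradlogf(x[t]) + sigma * rnorm(1)
    logratio <- logf(y) - logf(x[t]) + logq(x[t], y) - logq(y, x[t])
    x[t + 1] <- if (log(runif(1)) < logratio) y else x[t]
  }
  x
}

## Example: N(0,1) target; sigma would be tuned on pilot runs
chain <- mala(10000, 0, sigma = 1.5,
              logf = function(x) -x^2 / 2, gradlogf = function(x) -x)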

Page 228: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

The Gibbs Sampler

The Gibbs Sampler

The Gibbs Sampler
The Hammersley-Clifford theorem
General Principles
Convergence
Data Augmentation
The Slice Sampler
Rao-Blackwellization
Marginalized Gibbs Sampling

Page 229: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

The Gibbs Sampler

The Hammersley-Clifford theorem

Hammersley-Clifford theorem

An illustration that the conditionals determine the joint distribution

Theorem

If the joint density g(y_1, y_2) has conditional distributions g_1(y_1|y_2) and g_2(y_2|y_1), then

g(y_1, y_2) = g_2(y_2|y_1) / ∫ g_2(v|y_1)/g_1(y_1|v) dv .

[Hammersley & Clifford, circa 1970]

Page 230: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

The Gibbs Sampler

The Hammersley-Clifford theorem

Positivity condition

g^(i)(y_i) > 0 for every i = 1, . . . , p implies that g(y_1, . . . , y_p) > 0, where g^(i) denotes the marginal distribution of Y_i.

This implies that the support of g is the Cartesian product of the supports of the marginals and that the conditionals do not restrict the possible range of the Y_i.

Page 231: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

The Gibbs Sampler

The Hammersley-Clifford theorem

General HC decomposition

Under the positivity condition, the joint distribution g satisfies

g(y_1, . . . , y_p) ∝ ∏_{j=1}^{p} g_{ℓ_j}(y_{ℓ_j} | y_{ℓ_1}, . . . , y_{ℓ_{j−1}}, y′_{ℓ_{j+1}}, . . . , y′_{ℓ_p}) / g_{ℓ_j}(y′_{ℓ_j} | y_{ℓ_1}, . . . , y_{ℓ_{j−1}}, y′_{ℓ_{j+1}}, . . . , y′_{ℓ_p})

for every permutation ℓ of {1, 2, . . . , p} and every y′ ∈ Y.

Page 232: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

The Gibbs Sampler

General Principles

General Principles

A very specific simulation algorithm based on the structure of the target distribution f:

1. Uses the conditional densities f_1, . . . , f_p from f
2. Start with the random variable X = (x_1, . . . , x_p)
3. Simulate from the conditional densities,

X_i | x_1, x_2, . . . , x_{i−1}, x_{i+1}, . . . , x_p ∼ f_i(x_i | x_1, x_2, . . . , x_{i−1}, x_{i+1}, . . . , x_p)

for i = 1, 2, . . . , p.

Page 233: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

The Gibbs Sampler

General Principles

Algorithm (Gibbs sampler)

Given X^(t) = (x_1^(t), . . . , x_p^(t)), generate

1. X_1^(t+1) ∼ f_1(x_1 | x_2^(t), . . . , x_p^(t));
2. X_2^(t+1) ∼ f_2(x_2 | x_1^(t+1), x_3^(t), . . . , x_p^(t)),
. . .
p. X_p^(t+1) ∼ f_p(x_p | x_1^(t+1), . . . , x_{p−1}^(t+1))

Page 234: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

The Gibbs Sampler

General Principles

Properties

The full conditional densities f_1, . . . , f_p are the only densities used for simulation. Thus, even in a high dimensional problem, all of the simulations may be univariate.

The Gibbs sampler is not reversible with respect to f. However, each of its p coordinate-wise updates is.

Page 235: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

The Gibbs Sampler

General Principles

Random Scan Gibbs sampler

Modification of the above Gibbs sampler where, with probability 1/p, the i-th component is drawn from f_i(x_i|X_{−i}), i.e. the components are chosen at random.

Motivation

The Random Scan Gibbs sampler is f-reversible.

Page 236: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

The Gibbs Sampler

General Principles

Example (Bivariate Gibbs sampler)

For a target of the form

(X, Y) ∼ f(x, y)

Set X^(0) = x^(0)

For t = 1, 2, . . . , generate

Y^(t) ∼ f_{Y|X}(·|X^(t−1))
X^(t) ∼ f_{X|Y}(·|Y^(t))

where f_{Y|X} and f_{X|Y} are the conditional distributions
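A bivariate normal with correlation ρ has exactly this structure (its full conditionals appear again in the Rao-Blackwellization example later). A minimal R sketch, with illustrative names:

## Two-stage Gibbs sampler for a bivariate normal with correlation rho (sketch).
gibbs_binorm <- function(T, rho, x0 = 0) {
  x <- y <- numeric(T)
  xc <- x0
  for (t in 1:T) {
    y[t] <- rnorm(1, rho * xc,   sqrt(1 - rho^2))   # Y | x ~ N(rho x, 1 - rho^2)
    x[t] <- rnorm(1, rho * y[t], sqrt(1 - rho^2))   # X | y ~ N(rho y, 1 - rho^2)
    xc <- x[t]
  }
  cbind(x, y)
}

sims <- gibbs_binorm(10000, rho = 0.9)
colMeans(sims)   # both close to 0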

Page 237: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

The Gibbs Sampler

General Principles

A Very Simple Example: Independent N(µ, σ²) Observations

When Y_1, . . . , Y_n iid∼ N(y|µ, σ²) with both µ and σ unknown, the posterior in (µ, σ²) does not belong to a standard family (even when choosing conjugate priors)

But...

µ | Y_1:n, σ² ∼ N( µ | (1/n) ∑_{i=1}^{n} Y_i , σ²/n )

σ² | Y_1:n, µ ∼ IG( σ² | n/2 − 1 , (1/2) ∑_{i=1}^{n} (Y_i − µ)² )

assuming constant (improper) priors on both µ and σ²

▶ Hence we may use the Gibbs sampler for simulating from the posterior of (µ, σ²)

Page 238: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

The Gibbs Sampler

General Principles

R Gibbs Sampler for the Gaussian posterior (Y is the vector of observations):

n <- length(Y); S <- sum(Y)
mu <- S / n                                      # initialize at the sample mean
for (i in 1:500) {
  S2     <- sum((Y - mu)^2)
  sigma2 <- 1 / rgamma(1, n/2 - 1, rate = S2/2)  # sigma^2 | mu, Y ~ IG(n/2 - 1, S2/2)
  mu     <- S/n + sqrt(sigma2/n) * rnorm(1)      # mu | sigma^2, Y ~ N(Ybar, sigma^2/n)
}

Page 239: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

The Gibbs Sampler

General Principles

Example of results with, left, n = 10 observations and, right, n = 100 observations from the N(0, 1) distribution (samples of (µ, σ²); number of iterations 1, 2, 3, 4, 5, 10, 25, 50, 100, 500).

Page 240: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

The Gibbs Sampler

General Principles

Limitations of the Gibbs sampler

The Gibbs sampler

1. limits the choice of instrumental distributions

2. requires some knowledge of f

3. does not apply to problems where the number of parameters varies.

Page 241: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

The Gibbs Sampler

Convergence

Properties of the Gibbs sampler

Theorem (Convergence)

Under the [Positivity condition]

▶ f^(i)(y_i) > 0 for every i = 1, . . . , p implies that f(y_1, . . . , y_p) > 0, where f^(i) denotes the marginal distribution of Y_i under f,

the chain is irreducible and positive Harris recurrent.

Page 242: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

The Gibbs Sampler

Convergence

Properties of the Gibbs sampler (2)

Consequences

(i) If ∫ h(y) f(y) dy < ∞, then

lim_{T→∞} (1/T) ∑_{t=1}^{T} h(Y^(t)) = ∫ h(y) f(y) dy    a.s.

(ii) If, in addition, (Y^(t)) is aperiodic, then

lim_{n→∞} ‖K^n(y^(0), ·) − f‖_TV = 0

for every initial starting point y^(0).

Page 243: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

The Gibbs Sampler

Data Augmentation

Latent variables are back

The Gibbs sampler is particularly adapted to cases where the target probability is defined on an augmented space.

A density g is a completion of f if

∫_Z g(x, z) dz = f(x)

Note

The variable z may be meaningless for the problem

Page 244: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

The Gibbs Sampler

Data Augmentation

Purpose

g should have full conditionals that are easy to simulate for a Gibbs sampler to be implemented with g rather than f

For p > 1, write y = (x, z) and denote the conditional densities of g(y) = g(y_1, . . . , y_p) by

Y_1 | y_2, . . . , y_p ∼ g_1(y_1 | y_2, . . . , y_p),
Y_2 | y_1, y_3, . . . , y_p ∼ g_2(y_2 | y_1, y_3, . . . , y_p),
. . . ,
Y_p | y_1, . . . , y_{p−1} ∼ g_p(y_p | y_1, . . . , y_{p−1}).

Page 245: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

The Gibbs Sampler

Data Augmentation

The move from Y^(t) to Y^(t+1) is defined as follows:

Algorithm (Completion Gibbs sampler)

Given Y^(t) = (y_1^(t), . . . , y_p^(t)), simulate

1. Y_1^(t+1) ∼ g_1(y_1 | y_2^(t), . . . , y_p^(t)),
2. Y_2^(t+1) ∼ g_2(y_2 | y_1^(t+1), y_3^(t), . . . , y_p^(t)),
. . .
p. Y_p^(t+1) ∼ g_p(y_p | y_1^(t+1), . . . , y_{p−1}^(t+1)).

Page 246: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

The Gibbs Sampler

Data Augmentation

Example (Mixtures all over again)

Hierarchical missing data structure: If

X_1, . . . , X_n ∼ ∑_{i=1}^{k} p_i f(x|θ_i),

then

X | Z ∼ f(x|θ_Z),    Z ∼ p_1 I(z = 1) + . . . + p_k I(z = k),

where Z is the component indicator associated with observation x

Page 247: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

The Gibbs Sampler

Data Augmentation

Example (Mixtures (2))

Conditionally on (Z_1, . . . , Z_n) = (z_1, . . . , z_n):

π(p_1, . . . , p_k, θ_1, . . . , θ_k | x_1, . . . , x_n, z_1, . . . , z_n)
  ∝ p_1^{α_1+n_1−1} · · · p_k^{α_k+n_k−1}
  × π(θ_1 | y_1 + n_1 x̄_1, λ_1 + n_1) · · · π(θ_k | y_k + n_k x̄_k, λ_k + n_k),

with

n_i = ∑_j I(z_j = i)    and    x̄_i = ∑_{j: z_j=i} x_j / n_i .

Page 248: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

The Gibbs Sampler

Data Augmentation

Algorithm (Mixture Gibbs sampler)

1. Simulate

θ_i ∼ π(θ_i | y_i + n_i x̄_i, λ_i + n_i)    (i = 1, . . . , k)
(p_1, . . . , p_k) ∼ D(α_1 + n_1, . . . , α_k + n_k)

2. Simulate (j = 1, . . . , n)

Z_j | x_j, p_1, . . . , p_k, θ_1, . . . , θ_k ∼ ∑_{i=1}^{k} p_{ij} I(z_j = i)

with (i = 1, . . . , k)

p_{ij} ∝ p_i f(x_j | θ_i)

and update n_i and x̄_i (i = 1, . . . , k).

Page 249: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

The Gibbs Sampler

The Slice Sampler

Recall the Fundamental Observation that Underpins the Accept-Reject Method

Lemma

Simulating

X ∼ f(x)

is equivalent to simulating

(X, U) ∼ U{ (x, u) : 0 < u < f(x) }

Page 250: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

The Gibbs Sampler

The Slice Sampler

Slice sampler as generic Gibbs

Algorithm (Slice sampler)

Simulate

1. U^(t+1) ∼ U[0, f(X^(t))] ;
2. X^(t+1) ∼ U_{A^(t+1)} , with

A^(t+1) = { x : f(x) ≥ U^(t+1) } .
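As an illustration, a minimal R sketch of the slice sampler for a target whose slice A^(t+1) is an explicit interval, here the (unnormalized) standard normal density exp(−x²/2); the function name slice_normal is illustrative.

## Slice sampler for f(x) ∝ exp(-x^2/2): the slice is an explicit interval.
slice_normal <- function(T, x0 = 0) {
  x <- numeric(T + 1); x[1] <- x0
  for (t in 1:T) {
    u <- runif(1, 0, exp(-x[t]^2 / 2))     # U^(t+1) ~ U[0, f(X^(t))]
    bound <- sqrt(-2 * log(u))             # A^(t+1) = [-bound, bound]
    x[t + 1] <- runif(1, -bound, bound)    # X^(t+1) ~ U_A
  }
  x
}

chain <- slice_normal(5000)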

Page 251: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

The Gibbs Sampler

The Slice Sampler

Example of results with a truncated N(−3, 1) distribution; number of iterations 2, 3, 4, 5, 10, 50, 100.

Page 252: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

The Gibbs Sampler

The Slice Sampler

Example (Stochastic volatility conditional distribution)

In the stochastic volatility model, the single-site conditional distribution is of the form

π(x) ∝ exp −{ σ²(x − µ)² + β² exp(−x) y² + x } / 2 ,

which is simplified here into exp −{ x² + α exp(−x) } / 2.

Slice sampling then means simulation from a uniform distribution on

A = { x : exp −{ x² + α exp(−x) } / 2 ≥ u } = { x : x² + α exp(−x) ≤ ω }

if we set ω = −2 log u.

Note: the inversion of x² + α exp(−x) = ω needs to be done numerically (e.g., using Newton's algorithm).

Page 253: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

The Gibbs Sampler

The Slice Sampler

Autocorrelation of the chain (lags 0–100) and histogram of a Markov chain produced by a slice sampler, with the target distribution in overlay.

Page 254: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

The Gibbs Sampler

The Slice Sampler

Convergence of the Slice Sampler

Theorem (Uniform ergodicity)

If f is bounded and supp f is bounded, the simple slice sampler is uniformly ergodic.

[Mira & Tierney, 1997]

Page 255: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

The Gibbs Sampler

The Slice Sampler

Good slices

The slice sampler usually enjoys very good theoretical properties (geometric and even uniform ergodicity) in one-dimensional cases.

As the dimension increases, the practical determination of the set A^(t+1) may get increasingly complex (and the performance guarantees of the method are usually not as strong).

Page 256: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

The Gibbs Sampler

Rao-Blackwellization

Rao-Blackwellization

If (Y_1, Y_2, . . . , Y_p)^(t), t = 1, 2, . . . , T is the output from a Gibbs sampler, then

δ_0 = (1/T) ∑_{t=1}^{T} h(Y_1^(t)) → ∫ h(y_1) f^(1)(y_1) dy_1 .

Rao-Blackwellization replaces δ_0 with its conditional expectation

δ_rb = (1/T) ∑_{t=1}^{T} E_f[ h(Y_1) | Y_2^(t), . . . , Y_p^(t) ] .

Page 257: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

The Gibbs Sampler

Rao-Blackwellization

Rao-Blackwellization (2)

Then

◦ Both estimators converge to E_f[h(Y_1)]
◦ The Rao-Blackwellized estimator is unbiased,
◦ and

var( E_f[ h(Y_1) | Y_2^(t), . . . , Y_p^(t) ] ) ≤ var( h(Y_1) ) .

Page 258: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

The Gibbs Sampler

Rao-Blackwellization

Examples of Rao-Blackwellization

Example

Bivariate normal Gibbs sampler:

X | y ∼ N(ρy, 1 − ρ²)
Y | x ∼ N(ρx, 1 − ρ²).

Then

δ_0 = (1/T) ∑_{i=1}^{T} X^(i)    and    δ_1 = (1/T) ∑_{i=1}^{T} E[X^(i)|Y^(i)] = (1/T) ∑_{i=1}^{T} ρ Y^(i)

both estimate E[X], and σ²_{δ_0} / σ²_{δ_1} = 1/ρ² > 1.
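A small R sketch illustrating the two estimators on the bivariate normal Gibbs sampler of this example; the variance ratio 1/ρ² is the theoretical comparison, the empirical values from a single run are only indicative (names are illustrative).

## Naive vs Rao-Blackwellized estimators of E[X] = 0 (illustrative sketch).
rho <- 0.7; T <- 10000
x <- y <- numeric(T); xc <- 0
for (t in 1:T) {
  y[t] <- rnorm(1, rho * xc,   sqrt(1 - rho^2))
  x[t] <- rnorm(1, rho * y[t], sqrt(1 - rho^2))
  xc <- x[t]
}
delta0 <- mean(x)            # naive average of the X draws
delta1 <- mean(rho * y)      # Rao-Blackwellized: E[X | Y] = rho * Y
c(delta0, delta1)            # both estimate E[X] = 0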

Page 259: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

The Gibbs Sampler

Rao-Blackwellization

Examples of Rao-Blackwellization (2)

Example (Grouped counting data)

360 consecutive records of the number of passages per unit time

Number of passages        0     1     2     3     4 or more
Number of observations    139   128   55    25    13

Page 260: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

The Gibbs Sampler

Rao-Blackwellization

Example (Grouped counting data (2))

Particularity: Observations with 4 passages or more are grouped.

If observations are Poisson P(λ), the likelihood is

ℓ(x_1, . . . , x_5 | λ) ∝ e^{−347λ} λ^{128+55×2+25×3} ( 1 − e^{−λ} ∑_{i=0}^{3} λ^i / i! )^{13} ,

which can be difficult to work with.

Idea: With a prior π(λ) = 1/λ, complete the vector (y_1, . . . , y_13) of the 13 observations that are at least 4.

Page 261: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

The Gibbs Sampler

Rao-Blackwellization

Algorithm (Poisson-Gamma Gibbs)

a. Simulate Y_i^(t) ∼ P(λ^(t−1)) I_{y≥4},    i = 1, . . . , 13
b. Simulate

λ^(t) ∼ Ga( 313 + ∑_{i=1}^{13} y_i^(t) , 360 ) .

The Bayes estimator

δ^π = 1/(360 T) ∑_{t=1}^{T} ( 313 + ∑_{i=1}^{13} y_i^(t) )

converges quite rapidly (running mean of δ^π over 500 iterations, and histogram of the simulated λ).
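A minimal R sketch of the Poisson–Gamma Gibbs sampler above; the truncated Poisson draw is done by simple rejection, which is crude but adequate here, and the function name rtpois is illustrative.

## Gibbs sampler for the grouped counting data (sketch).
## 313 = 128*1 + 55*2 + 25*3 and 360 = total number of records.
rtpois <- function(lambda) {                  # Poisson(lambda) truncated to {4, 5, ...}
  repeat { y <- rpois(1, lambda); if (y >= 4) return(y) }
}
T <- 500; lambda <- rep(1, T + 1)
for (t in 1:T) {
  y <- replicate(13, rtpois(lambda[t]))       # step a: complete the 13 grouped counts
  lambda[t + 1] <- rgamma(1, shape = 313 + sum(y), rate = 360)   # step b
}
mean(lambda[-(1:100)])                        # posterior mean estimate, after burn-in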

Page 262: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

The Gibbs Sampler

Rao-Blackwellization

Rao-Blackwellization in Poisson-Gamma Gibbs

Naive estimate

δ_0 = (1/T) ∑_{t=1}^{T} λ^(t)

and Rao-Blackwellized version

δ^π = (1/T) ∑_{t=1}^{T} E[ λ^(t) | x_1, x_2, . . . , x_5, y_1^(t), y_2^(t), . . . , y_13^(t) ]
    = 1/(360 T) ∑_{t=1}^{T} ( 313 + ∑_{i=1}^{13} y_i^(t) ) .

Page 263: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

The Gibbs Sampler

Rao-Blackwellization

Rao-Blackwellization for “Non-Parametric” Estimation

Another substantial benefit of Rao-Blackwellization is in the approximation of the densities of the different components of y, without resorting to nonparametric density estimation methods.

Lemma

The estimator

(1/T) ∑_{t=1}^{T} f_i( y_i | Y_j^(t), j ≠ i ) −→ f^(i)(y_i) .

Page 264: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

The Gibbs Sampler

Marginalized Gibbs Sampling

Another option is to take advantage of Rao-Blackwellization during the simulations themselves,

resulting in marginalized (or collapsed) Gibbs samplers.

In a Bayesian latent variable model, a typical scheme for the Gibbs sampler is to alternate

x | y, θ
θ | x, y

where y denotes the observations, x the latent variables and θ the parameters.

For an exponential family complete-data model with conjugate prior, the pdf f(θ|x, y) is available in closed form, thus allowing for Rao-Blackwellization.

Page 265: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

The Gibbs Sampler

Marginalized Gibbs Sampling

Exponential Family Distributions

ℓ(y|θ) = h(y) exp [ ⟨s(y), ψ(θ)⟩ − B(θ) ]

where s(y) are the sufficient statistics.

If ψ is an invertible mapping, it is possible through the reparameterization η = ψ(θ) to rewrite the above in canonical (or natural) form

ℓ(y|η) = h(y) exp [ ⟨s(y), η⟩ − A(η) ]

where A is called the log-partition function.

Page 266: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

The Gibbs Sampler

Marginalized Gibbs Sampling

Conjugacy in Exponential Families

The conjugate distribution for

ℓ(y|θ) = h(y) exp [ ⟨s(y), ψ(θ)⟩ − B(θ) ]

is

π(θ|µ_0, λ_0) = Z^{−1}(µ_0, λ_0) exp [ ⟨µ_0, ψ(θ)⟩ − λ_0 B(θ) ]

where µ_0 has the same dimension as s(y) and λ_0 ∈ R^+.

Page 267: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

The Gibbs Sampler

Marginalized Gibbs Sampling

The Bayesian Posterior

Assume a complete-data exponential family model for (y_i, x_i) and a conjugate prior.

After seeing n independent observations y = (y_1, . . . , y_n), the joint posterior is

f(x, θ|y) ∝ h(x, y) exp [ ⟨s(x, y), ψ(θ)⟩ − n B(θ) ] Z^{−1}(µ_0, λ_0) exp [ ⟨µ_0, ψ(θ)⟩ − λ_0 B(θ) ]

Page 268: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

The Gibbs Sampler

Marginalized Gibbs Sampling

Hence, the full conditional is given by

f(θ|x, y) = Z^{−1}( s(x, y) + µ_0 , n + λ_0 ) exp [ ⟨s(x, y) + µ_0, ψ(θ)⟩ − (n + λ_0) B(θ) ]

Thus a Rao-Blackwellized estimator of the posterior mean of θ can be computed as

(1/m) ∑_{j=1}^{m} E( θ | X^(j), Y )

Page 269: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

The Gibbs Sampler

Marginalized Gibbs Sampling

But as the normalizing constant Z is known explicitly, it is also possible to integrate out θ to obtain a closed-form expression of the marginal posterior f(x|y):

π(x|y) ∝ h(x, y) Z( s(x, y) + µ_0 , n + λ_0 ) / Z(µ_0, λ_0)

The resulting marginal posterior is usually complex but amenable to single-site Gibbs sampling on the components of x = (x_1, . . . , x_n), especially when these are discrete variables.

This is the preferred sampling method for the Latent Dirichlet Allocation (LDA) model and related models.

Page 270: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

The Gibbs Sampler

Marginalized Gibbs Sampling

Mixture of Gaussian Example

Assume that we observe Y_1, . . . , Y_n from the Gaussian mixture model ∑_{i=1}^{r} α_i N(y|θ_i, v_i). To make derivations simpler, α and v are treated as fixed parameters and we use an improper flat prior on the vector of means θ.

π(x_k, θ | y_k) ∝ exp [ ∑_{i=1}^{r} ( log α_i − (y_k − θ_i)² / (2 v_i) ) I{x_k = i} ]

and hence

π(x_1:n, θ | y_1:n) ∝ exp [ ∑_{i=1}^{r} ( n_i log α_i − θ_i² n_i / (2 v_i) + θ_i s_i / v_i ) ]

where

n_i = ∑_{k=1}^{n} I{x_k = i}
s_i = ∑_{k=1}^{n} y_k I{x_k = i}

Page 271: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

The Gibbs Sampler

Marginalized Gibbs Sampling

Upon completing the square,

f(x_1:n, θ | y_1:n) ∝ exp [ ∑_{i=1}^{r} ( n_i log α_i + s_i² / (2 n_i v_i) ) ] exp [ ∑_{i=1}^{r} − 1/(2 v_i/n_i) ( θ_i − s_i/n_i )² ]

and

f(x_1:n | y_1:n) ∝ ∏_{i=1}^{r} α_i^{n_i} √(v_i/n_i) exp [ s_i² / (2 n_i v_i) ]

Finally,

P_f( X_k = i | x_{−k}, y_1:n ) ∝ α_i^{n_{i,−k}+1} √( v_i / (n_{i,−k} + 1) ) exp [ (s_{i,−k} + y_k)² / (2 (n_{i,−k} + 1) v_i) ]

where

n_{i,−k} = ∑_{j≠k} I{x_j = i} = n_i − I{x_k = i}
s_{i,−k} = ∑_{j≠k} y_j I{x_j = i} = s_i − y_k I{x_k = i}

Page 272: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

The Gibbs Sampler

Marginalized Gibbs Sampling

To run the collapsed (or marginalized) single-site Gibbs sampler in this example:

▶ Repeatedly simulate from the conditionals f(x_k | x_{−k}, y_1:n)
▶ keeping track of the accumulated component statistics (n_i, s_i)_{1≤i≤r}

The idea can be extended to mixture models with an unknown number of components.

Page 273: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

Case Study

In this last course, we consider a very simple example and discuss the application of the methods discussed so far.

Case Study
Independent Sampler
(Almost) Random-Walk Sampler
Gibbs Sampling
(Baby) Reversible Jump

Page 274: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

Case Study

Contaminated Counts

Assume we have observed a vector y_1, . . . , y_n of count data which we want to model as a mixture of Poisson distributions

L(y_i | α, λ) = (1 − α) (λ_0^{y_i} / y_i!) e^{−λ_0} + α (λ^{y_i} / y_i!) e^{−λ}

where λ_0 is known.

In the following, we assume w.l.o.g. that λ_0 = 1 and that the prior on (α, λ) is chosen as

π(α, λ) = e^{−λ}

Page 275: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

Case Study

Independent Sampler

Independent Sampler

▶ (α∗, λ∗) ∼ U[0, 1] ⊗ Exp ;
▶ ℓ∗ = ∑_{i=1}^{n} log{ (1 − α∗) + α∗ λ∗^{y_i} e^{−(λ∗−1)} } ;
▶ u ∼ U[0, 1] ;
▶ If u < exp(ℓ∗ − ℓ) ,
    α ← α∗ ; λ ← λ∗ ; ℓ ← ℓ∗ ;
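A minimal R sketch of this independent sampler for the contaminated-counts posterior; since the proposal equals the prior U[0, 1] ⊗ Exp, only the log-likelihood ℓ is needed in the acceptance ratio. The names indep_contam and loglik are illustrative.

## Independent MH sampler for the contaminated-counts posterior (sketch),
## with lambda0 = 1 and prior pi(alpha, lambda) = exp(-lambda).
loglik <- function(alpha, lambda, y)
  sum(log((1 - alpha) + alpha * lambda^y * exp(-(lambda - 1))))

indep_contam <- function(T, y) {
  alpha <- runif(1); lambda <- rexp(1); l <- loglik(alpha, lambda, y)
  out <- matrix(NA, T, 2, dimnames = list(NULL, c("alpha", "lambda")))
  for (t in 1:T) {
    a_star <- runif(1); lam_star <- rexp(1)        # proposal = prior
    l_star <- loglik(a_star, lam_star, y)
    if (runif(1) < exp(l_star - l)) {
      alpha <- a_star; lambda <- lam_star; l <- l_star
    }
    out[t, ] <- c(alpha, lambda)
  }
  out
}

sims <- indep_contam(10000, y = c(1, 1, 0, 3, 0, 5))   # small-scale example
colMeans(sims)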

Page 276: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

Case Study

Independent Sampler

A Small-Scale Example: y = (1, 1, 0, 3, 0, 5)

Trace and histogram of λ (mean 1.8, std. 1.1) and of α (mean 0.45, std. 0.27), with autocorrelations; acceptance rate 46% for both α and λ.

Page 277: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

Case Study

Independent Sampler

A Larger Example: y consisting of 100 values (1/6 0's, 1/3 1's, 1/3 2's, 1/6 3's)

Trace and histogram of λ (mean 1.6, std. 0.18) and of α (mean 0.8, std. 0.16), with autocorrelations; acceptance rate 4.8% for both α and λ.

Page 278: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

Case Study

(Almost) Random-Walk Sampler

Random-Walk Sampler

▶ v ∼ N(0, I_2) ;
▶ (α∗, λ∗) ← (α, λ) + s v ;
▶ If (α∗, λ∗) ∈ [0, 1] × R^+ ,
    ▶ ℓ∗ = −λ∗ + ∑_{i=1}^{n} log{ (1 − α∗) + α∗ λ∗^{y_i} e^{−(λ∗−1)} } ;
    ▶ u ∼ U[0, 1] ;
    ▶ If u < exp(ℓ∗ − ℓ) · [Φ((1−α)/s) − Φ(−α/s)] / [Φ((1−α∗)/s) − Φ(−α∗/s)] · [1 − Φ(−λ/s)] / [1 − Φ(−λ∗/s)] ,
        α ← α∗ ; λ ← λ∗ ; ℓ ← ℓ∗ ;

Φ denotes the Gaussian cdf.

Page 279: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

Case Study

(Almost) Random-Walk Sampler

6 Observation Case

Trace and histogram of λ (mean 1.7, std. 1.1) and of α (mean 0.45, std. 0.27), with autocorrelations; acceptance rate 22% for both α and λ.

Figure: With s tuned to 1

Page 280: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

Case Study

(Almost) Random-Walk Sampler

100 Observation Case

Trace and histogram of λ (mean 1.6, std. 0.16) and of α (mean 0.84, std. 0.14), with autocorrelations; acceptance rate 21% for both α and λ.

Figure: With s tuned to 0.3

Page 281: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

Case Study

Gibbs Sampling

Gibbs Sampling With Data Augmentation

▶ For i = 1, . . . , n ,
    u_i ∼ U[0, 1] ;
    x_i ← I{ u_i < α λ^{y_i} e^{−(λ−1)} / [ (1 − α) + α λ^{y_i} e^{−(λ−1)} ] } ;
▶ c = ∑_{i=1}^{n} I(x_i = 1) ;
▶ s = ∑_{i=1}^{n} y_i I(x_i = 1) ;
▶ α ∼ Beta(c + 1, n − c + 1) ;
▶ λ ∼ Gamma(s + 1, c + 1) ;
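A minimal R sketch of this data-augmentation Gibbs sampler (λ_0 = 1, prior π(α, λ) = e^{−λ}); the function name gibbs_contam and the initial values are illustrative.

## Gibbs sampler with data augmentation for the contaminated-counts model (sketch).
gibbs_contam <- function(T, y) {
  n <- length(y); alpha <- 0.5; lambda <- 1
  out <- matrix(NA, T, 2, dimnames = list(NULL, c("alpha", "lambda")))
  for (t in 1:T) {
    w <- alpha * lambda^y * exp(-(lambda - 1))
    x <- runif(n) < w / ((1 - alpha) + w)                 # component indicators
    cc <- sum(x); s <- sum(y[x])
    alpha  <- rbeta(1, cc + 1, n - cc + 1)                # alpha | x, y
    lambda <- rgamma(1, shape = s + 1, rate = cc + 1)     # lambda | x, y
    out[t, ] <- c(alpha, lambda)
  }
  out
}

sims <- gibbs_contam(10000, y = c(1, 1, 0, 3, 0, 5))
colMeans(sims)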

Page 282: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

Case Study

Gibbs Sampling

6 Observation Case

Trace and histogram of λ (mean 1.6, std. 0.49) and of α (mean 0.87, std. 0.11), with autocorrelations; acceptance rate 100% for both α and λ.

Rao-Blackwellization could be used for estimating the marginal distributions.

Page 283: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

Case Study

Gibbs Sampling

100 Observation Case

Trace and histogram of λ (mean 1.5, std. 0.12) and of α (mean 0.99, std. 0.0097), with autocorrelations; acceptance rate 100% for both α and λ.

Page 284: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

Case Study

Gibbs Sampling

Collapsed Gibbs Sampler

▶ For i = 1, . . . , n ,
    c_{−i} = c − I(x_i = 1) ;
    s_{−i} = s − y_i I(x_i = 1) ;
    u_i ∼ U[0, 1] ;
    x_i ← I{ u_i < A_1 / (A_1 + A_0) } ,

    where A_1 = (c_{−i} + 1) e · (s_{−i} + y_i)! / s_{−i}! · ( (c_{−i} + 1)/(c_{−i} + 2) )^{s_{−i}+1} · ( 1/(c_{−i} + 2) )^{y_i}
    and A_0 = (n − c_{−i}) ;

    c ← c_{−i} + I(x_i = 1) ;
    s ← s_{−i} + y_i I(x_i = 1) ;
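A minimal R sketch of one sweep of this collapsed sampler, computing A_1 and A_0 on the log scale for numerical stability; y, x, c and s are assumed to hold the data, current indicators and their accumulated statistics (λ_0 = 1), and the function name collapsed_sweep is illustrative.

## One sweep of the collapsed Gibbs sampler for the contaminated-counts model (sketch).
collapsed_sweep <- function(y, x, c, s) {
  n <- length(y)
  for (i in 1:n) {
    cm <- c - (x[i] == 1); sm <- s - y[i] * (x[i] == 1)
    logA1 <- log(cm + 1) + 1 + lgamma(sm + y[i] + 1) - lgamma(sm + 1) +
             (sm + 1) * log((cm + 1) / (cm + 2)) - y[i] * log(cm + 2)
    logA0 <- log(n - cm)
    x[i] <- (runif(1) < 1 / (1 + exp(logA0 - logA1)))   # P(x_i = 1) = A1/(A1 + A0)
    c <- cm + (x[i] == 1); s <- sm + y[i] * (x[i] == 1)
  }
  list(x = x, c = c, s = s)
}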

Page 285: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

Case Study

(Baby) Reversible Jump

Gibbs Output on a 100 Observation Poisson Dataset Simulated from λ_0 = 1

Trace and histogram of λ (mean 0.97, std. 0.098) and of α (mean 0.99, std. 0.0097), with autocorrelations; acceptance rate 100% for both α and λ.

Page 286: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

Case Study

(Baby) Reversible Jump

We reformulate the model to allow for model comparison by

▶ Giving a prior π_1 to the model M_1, which postulates that

L(y_i | α, λ) = (1 − α) (λ_0^{y_i} / y_i!) e^{−λ_0} + α (λ^{y_i} / y_i!) e^{−λ}

▶ Giving a prior π_0 = (1 − π_1) to the baseline model M_0

L_0(y_i) = (λ_0^{y_i} / y_i!) e^{−λ_0}

Under M_1, we keep the same prior for α and λ.

We further introduce a latent variable k ∈ {0, 1} which indicates the (unobservable) model in (M_0, M_1) that is assumed to generate the data.

Page 287: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

Case Study

(Baby) Reversible Jump

The Posterior

The posterior f is now defined on {0} ∪ ({1} × [0, 1] × R^+) and is such that

f(k, α, λ | y) ∝ π_0 δ_0(k) L_0(y) + π_1 δ_1(k) L(y | α, λ) (1/λ_0) e^{−λ/λ_0}

The Bayes factor for model M_1 compared to model M_0 is given by

[ ∫_{α∈[0,1]} ∫_{λ∈R^+} L(y | α, λ) (1/λ_0) e^{−λ/λ_0} dα dλ ] / L_0(y)

Instead of trying to compute the Bayes factor directly, the Reversible Jump Metropolis-Hastings algorithm is a sampler that targets f on {0} ∪ ({1} × [0, 1] × R^+).

Page 288: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

Case Study

(Baby) Reversible Jump

▶ If k = 0 ,
    # Birth move
    ▶ (α∗, λ∗) ∼ U[0, 1] ⊗ Exp(λ_0) ;
    ▶ u ∼ U[0, 1] ;
    ▶ If u < p π_1 L(y | α∗, λ∗) / ( π_0 L_0(y) ) ,
        k ← 1 ; α ← α∗ ; λ ← λ∗ ;
▶ Else if k = 1 ,
    ▶ u ∼ U[0, 1] ;
    ▶ If u < p ,
        # Death move
        ▶ v ∼ U[0, 1] ;
        ▶ If v < π_0 L_0(y) / ( p π_1 L(y | α, λ) ) ,
            k ← 0 ;
    ▶ Else ,
        # Fixed dimension move (. . . )

Page 289: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

Case Study

(Baby) Reversible Jump

In the following, we assume that λ_0 = 1 and π_1 = 1/2 and consider the same datasets as before.

For the 100 Observation Poisson case (trace of k over 10,000 iterations):

The estimated posterior probability for M_0 is 0.79

Page 290: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

Case Study

(Baby) Reversible Jump

100 Observation Poisson Case
Results conditional on k = 1

Trace and histogram of λ (mean 1.1, std. 0.43) and of α (mean 0.49, std. 0.37), with autocorrelations.

Page 291: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

Case Study

(Baby) Reversible Jump

For the other 100 observation dataset, the estimated posterior probability in favor of M_1 is 1.

Page 292: Markov Chain Monte Carlo Methodscappe/fr/Enseignement/mido-tsi.pdfMarkov Chain Monte Carlo Methods Motivation and leading example Bayesian troubles Conjugate Prior Conjugacy Given

Markov Chain Monte Carlo Methods

Case Study

(Baby) Reversible Jump

In the case of y = (1, 1, 0, 3, 0, 5), the posterior probability in favor of M_1 is estimated at 0.56.

Trace and histogram of λ (mean 1.7, std. 0.97) and of α (mean 0.59, std. 0.29), with autocorrelations.

Figure: Results conditional on k = 1