Transcript of LTCC: Advanced Computational Methods in Statistics (slides1.pdf, 58 pages)

Page 1

LTCC: Advanced Computational Methods in Statistics

Introduction to some aspects of Monte Carlo

N. Kantas

Notes at http://wwwf.imperial.ac.uk/~nkantas/notes4ltcc.pdf

Slides at http://wwwf.imperial.ac.uk/~nkantas/slides1.pdf

Page 2

Aims for course

- Aims:
  - Overview of some simulation based methods
  - Understand the basic principles behind Monte Carlo methods
  - Motivate further study for theory & applications
- Light assessment:
  - via coursework

Page 3

Outline for course

1. Introduction to simulation
   - basics of Monte Carlo
   - variance reduction methods
   - rejection sampling
2. Importance Sampling
   - some basics, asymptotic variance
   - sequential importance sampling
3. Markov Chain Monte Carlo (MCMC)
   - Metropolis-Hastings, Gibbs sampling
   - some basics on theory and practice
4. Sequential Monte Carlo (SMC)
   - particle filtering for state space models
   - sampling for fixed dimensional state spaces
   - particle MCMC

Page 4

Outline for lecture 1

- What is Monte Carlo?
  - using sampling from complex high dimensional distributions to compute integrals
  - some example problems
- Some basic approaches
  - perfect (or naive) Monte Carlo
  - variance reduction
    - control or antithetic variables, conditioning, importance sampling
- Importance sampling
  - examples as variance reduction
  - sequential importance sampling

Page 5

Introduction to Monte Carlo

- Consider an arbitrary distribution on $\mathsf{X}$ with a density $\pi$ w.r.t. $dx$, such that
  $$\pi(x) = \frac{\gamma(x)}{Z}$$
  where $Z$ is unknown
- Let $\varphi : \mathsf{X} \to \mathbb{R}^{n_x}$, with $\|\varphi\|_\infty = \sup_x |\varphi(x)| < +\infty$
- We want to compute
  $$\pi(\varphi) = \mathbb{E}_\pi[\varphi(X)] = \langle \pi, \varphi \rangle = \int_{\mathsf{X}} \varphi(x)\,\pi(dx) \qquad (1)$$

Page 6

Example 1: Bayesian Statistics

- Bayesian Statistics (assume densities exist)
  $$p(x|y) \propto p(y|x)\,p(x)$$
- $p(x)$ is a known proper prior
- $p(y|x)$ is the likelihood
- $\pi$ is the posterior
- Here the evidence
  $$Z = p(y) = \int p(y|x)\,p(x)\,dx$$
  is very useful to compare models, but is unknown
- Need to approximate both $\pi$ and $Z$
- Simple conjugate example: $X \sim \mathcal{IG}(a, b)$, $Y \sim \mathcal{N}(0, X)$, with $a, b$ known (here the posterior is $\mathcal{IG}(a + \tfrac{1}{2},\, b + \tfrac{y^2}{2})$).

Page 7

Example 2: rare events estimation

- For a distribution $p$, compute the probability of a small/rare tail
  $$p(A) = \int_A p(dx)$$
- Define $\pi(dx) \propto \mathbf{1}_A(x)\,p(dx)$
- The indicator function
  $$\mathbf{1}_A(x) = \begin{cases} 1 & \text{if } x \in A \\ 0 & \text{if } x \notin A \end{cases}$$
  acts as likelihood, $p$ as prior
- Normalising const. $Z = \int_{\mathsf{X}} \mathbf{1}_A(x)\,p(dx) = \int_A p(dx) = p(A)$
- Examples:
  - compute the tail of a distribution
  - simple random walk confinement: $X_n = X_{n-1} + W_n$, $W_n$ i.i.d. noise, $X = (X_1, \ldots, X_n)$, $A = (-\epsilon, \epsilon)^n$

Page 8

Example 3: stochastic filtering

- Continuous spaces and $X_0 \sim \eta_\theta(\cdot)$, $X_n \sim f_\theta(\cdot|x_{n-1})$, $Y_n \sim g_\theta(\cdot|x_n)$
- What is the hidden signal $X_0, X_1, \ldots$? Can perform Bayesian inference using $\Pi_n(\cdot) = \mathbb{P}[X_{0:n} \in \cdot \,|\, Y_{0:n}]$ and the marginal likelihood $Z = \mathbb{P}(Y_1, \ldots, Y_n)$.

Page 9

Example 4: self avoiding walk

- Have you played the vintage snake game?
- Given $X_0$, let $X_n \in \mathbb{Z}^2$ and consider a standard RW
  $$p(X_n = x \,|\, X_{n-1} = y) = \begin{cases} \frac{1}{8} & \text{if } \|x - y\| = 1 \\ 0 & \text{otherwise} \end{cases}$$
- Simulate from
  $$\pi(X_1, \ldots, X_n) \propto \mathbf{1}_{x_n \neq x_{n-1} \neq \cdots \neq x_0}(X_1, \ldots, X_n)\,p(X_1, \ldots, X_n)$$
  and compute $Z = \mathbb{P}(x_n \neq x_{n-1} \neq \cdots \neq x_0)$
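
A naive Monte Carlo estimate of $Z$ here is easy to sketch (a toy illustration only, assuming the 8-neighbour move set above; the walk length and sample size are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
# 8-neighbour moves on Z^2, each with probability 1/8 (as in the slide)
moves = [(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1) if (dx, dy) != (0, 0)]

def walk_is_self_avoiding(n_steps):
    """Simulate the RW started at the origin; True if no site is revisited."""
    x = (0, 0)
    visited = {x}
    for _ in range(n_steps):
        dx, dy = moves[rng.integers(8)]
        x = (x[0] + dx, x[1] + dy)
        if x in visited:
            return False
        visited.add(x)
    return True

N, n_steps = 100_000, 10
Z_hat = np.mean([walk_is_self_avoiding(n_steps) for _ in range(N)])
print(f"naive MC estimate of Z for {n_steps} steps: {Z_hat:.4f}")
```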

Page 10

Introduction to Monte Carlo

- The problem is essentially a numerical integration problem
- One could use deterministic numerical integration methods to approximate $\pi(\varphi)$
  - there are some pros and cons (next slide)
- A different direction is to use simulation
  - take advantage of the greater computational power now available.
- What is Monte Carlo?
  - Sampling from complex high dimensional distributions to compute integrals
- In examples 1-4 we focus on cases $\pi(x) \propto G(x)\,p(x)$


Page 12

On deterministic integration methods

- Quadrature, cubature, sigma points, ...
- Underlying principles:
  - use polynomial or other approximations that interpolate the function to be integrated at certain points
  - place points to minimise errors
  - exploit symmetries to reduce the number of points used
- Interesting numerical analysis for errors
  - exploit smoothness or other properties of the functions & densities to be integrated
  - errors depend on integrand properties

BUT

- hard to be useful in dimensions higher than 2-3
- still very useful for simple integrals
- often might need to re-apply for a different $\varphi$

Page 13

Roadmap

- Perfect Monte Carlo
  - variance reduction
- Rejection Sampling
- Importance Sampling

Page 14

Perfect Monte Carlo

- IF we can obtain i.i.d. samples $X^i \sim \pi$
- by the (strong) Law of Large Numbers (LLN) one can use the sample average
  $$\widehat{\pi}(\varphi) = \frac{1}{N} \sum_{i=1}^N \varphi(X^i) \xrightarrow[N \to \infty]{} \pi(\varphi)$$
- In a way one can view the samples as forming an approximation of $\pi$
  $$\widehat{\pi} = \frac{1}{N} \sum_{i=1}^N \delta_{X^i}, \qquad \widehat{\pi}(\varphi) = \int_{\mathsf{X}} \varphi(x)\,\widehat{\pi}(dx) = \frac{1}{N} \sum_{i=1}^N \varphi(X^i). \qquad (2)$$
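
As a quick illustration of (2), a minimal sketch, assuming a target we can sample from directly ($\pi = \mathcal{N}(0,1)$) and $\varphi(x) = x^2$, so that $\pi(\varphi) = 1$ exactly:

```python
import numpy as np

rng = np.random.default_rng(1)

# Perfect Monte Carlo: i.i.d. samples X^i ~ pi, estimate pi(phi) by the
# sample average. Here pi = N(0,1) and phi(x) = x^2, so pi(phi) = 1.
N = 100_000
X = rng.standard_normal(N)
pi_phi_hat = (X**2).mean()
print(f"estimate {pi_phi_hat:.4f}, true value 1.0")
```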

Page 15

Perfect Monte Carlo

- Variance (non-asymptotic) is given by
  $$\mathrm{Var}[\widehat{\pi}(\varphi)] = \frac{1}{N}\,\mathrm{Var}\big[\varphi(X^i)\big] = \frac{1}{N} \left( \int_{\mathsf{X}} \varphi^2(x)\,\pi(dx) - \pi(\varphi)^2 \right)$$
- Note the rate of decrease w.r.t. $N$ does not depend on the size of $\mathsf{X}$
  - (dimensionality is still important, as the integrals and $\pi$ can depend implicitly on dimension)
- Problems:
  - often cannot sample from $\pi$
  - even if this is possible, the relative variance can still be very high:
    - when $\varphi = \mathbf{1}_A$ where $A$ is a tail with very low probability
    - high dimensions

Page 16

Rare events example using Perfect Monte Carlo

- Consider a continuous distribution $P$ with density $p(x)$
- We are interested in computing $p^* = \mathbb{P}(X \geq \lambda) \approx 10^{-9}$
- Naive Monte Carlo setting:
  - For $i = 1:N$ sample i.i.d. $x^i \sim p(\cdot)$, then compute
    $$\widehat{p}^* = \frac{1}{N} \sum_{i=1}^N \mathbf{1}_{x \geq \lambda}(x^i)$$
- $\widehat{p}^*$ is consistent, and a CLT holds: $\sqrt{N}(\widehat{p}^* - p^*) \Rightarrow \mathcal{N}(0, \mathrm{Var}_P[\mathbf{1}_{x \geq \lambda}])$

Page 17

Rare events example using Perfect Monte Carlo

- Variance of the estimator:
  $$\sigma^2_{\widehat{p}^*} = \frac{\mathrm{Var}_p[\mathbf{1}_{x \geq \lambda}]}{N} = \frac{p^* - p^{*2}}{N}$$
- Relative error:
  $$\mathrm{RE} = \sqrt{\mathrm{Var}\!\left[\frac{\widehat{p}^*}{p^*}\right]} \approx \frac{1}{\sqrt{p^* N}}$$
- So we would like at least $N \sim 10^{11}$ to get decent estimators - prohibitively long simulation times
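
To see the scaling concretely, a minimal sketch with a less extreme tail so that naive Monte Carlo is still feasible (assuming $p = \mathcal{N}(0,1)$ and $\lambda = 3$):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)

# Naive MC for p* = P(X >= lam), X ~ N(0,1). With lam = 3, p* ~ 1.35e-3;
# the relative error behaves like 1/sqrt(p* N).
lam, N = 3.0, 100_000
p_hat = np.mean(rng.standard_normal(N) >= lam)
re_theory = 1.0 / np.sqrt(norm.sf(lam) * N)
print(f"p_hat = {p_hat:.2e}, truth = {norm.sf(lam):.2e}, predicted RE ~ {re_theory:.2f}")
```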

Page 18

On unbiasedness and variance reduction

- Note that when we sample i.i.d. $X^i \sim \pi$, Monte Carlo estimates are unbiased
  $$\mathbb{E}_\pi\!\left[ \sum_i \varphi\big(X^i\big) \right] = \sum_{i=1}^N \mathbb{E}_\pi\big[\varphi\big(X^i\big)\big] = N\,\mathbb{E}_\pi[\varphi(X)]$$
- Example: $\mathbb{E}_\pi[\sum_i \mathbf{1}_{X^i < c}] = \sum_{i=1}^N \mathbb{E}_\pi[\mathbf{1}_{X^i < c}] = N\,\mathbb{P}(X^i < c)$
- In fact a single sample from $\pi$ is an unbiased estimate for $\mathbb{E}_\pi[X]$
  - for $N > 1$ use the sample average $\frac{1}{N}\sum_{i=1}^N X^i$ to estimate $\mathbb{E}_\pi[X]$
  - the variance of the estimator decreases with rate $1/N$

Page 19

Control variates

- When estimating $\mathbb{E}_\pi[\varphi(X)]$ there are ways to reduce the variance
  - control variates or antithetic variables
  - conditioning or Rao-Blackwellisation
  - Importance Sampling
  - ...

Page 20

Control variates

- Let $X^1$ be an unbiased estimate for $\mathbb{E}_\pi[\varphi(X)]$. For any $Z$ such that $\mathbb{E}_\pi[Z] = 0$ and a constant $\lambda$, $X^1 + \lambda Z$ is also an unbiased estimator
  $$\mathbb{E}_\pi\big[X^1 + \lambda Z\big] = \mathbb{E}_\pi\big[X^1\big] + \lambda\,\mathbb{E}_\pi[Z] = \mathbb{E}_\pi[\varphi(X)]$$
  and
  $$\mathrm{Var}_\pi\big[X^1 + \lambda Z\big] = \mathrm{Var}_\pi\big[X^1\big] + \lambda^2\,\mathrm{Var}_\pi[Z] + 2\lambda\,\mathrm{Cov}_\pi\big[X^1, Z\big]$$

Page 21

Control variates

- In theory one can minimise the variance w.r.t. $\lambda$,
  $$\lambda = -\frac{\mathrm{Cov}_\pi\big[X^1, Z\big]}{\mathrm{Var}_\pi[Z]},$$
  and actually get a zero variance estimator when $Z$ is perfectly correlated with $X^1$!
- In practice it is difficult to achieve this, i.e. to find such $\lambda, Z$
  - but one can choose $Z$ and tune $\lambda$ numerically and get good variance reduction
- Similar ideas appear in antithetic variates or Multi-level Monte Carlo
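
A minimal sketch of the idea, assuming we estimate $\mathbb{E}[e^X]$ for $X \sim \mathcal{U}(0,1)$ (true value $e - 1$) with the zero-mean control variate $Z = X - 1/2$ and $\lambda$ tuned from the samples:

```python
import numpy as np

rng = np.random.default_rng(3)

# Control variates: estimate E[exp(X)], X ~ U(0,1) (truth e - 1).
# Z = X - 1/2 has mean zero; lam is tuned numerically from the samples.
N = 100_000
X = rng.uniform(size=N)
f, Z = np.exp(X), X - 0.5
lam = -np.cov(f, Z)[0, 1] / np.var(Z)     # estimated optimal coefficient
plain, cv = f.mean(), (f + lam * Z).mean()
print(f"plain MC: {plain:.5f}, control variate: {cv:.5f}, truth: {np.e - 1:.5f}")
print(f"variance ratio ~ {np.var(f + lam * Z) / np.var(f):.3f}")
```

Tuning $\lambda$ on the same samples introduces a small bias; a pilot run can be used to estimate $\lambda$ separately.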

Page 22

Rao-Blackwell conditioning

- Consider a bivariate distribution $\pi(x, y) = \pi(x|y)\,p(y)$, i.e.
  $$\int \pi(x, dy) = \pi(x),$$
  and assume one can simulate from $\pi(x|y)$ and $p(y)$.
- Then $\mathbb{E}[\varphi(X)|Y]$ is an unbiased estimator for $\mathbb{E}_\pi[\varphi(X)]$
  $$\mathbb{E}_\pi[\varphi(X)] = \mathbb{E}_p\big[\mathbb{E}[\varphi(X)|Y]\big]$$
- In addition, we have the variance conditioning identity
  $$\mathrm{Var}_\pi[\varphi(X)] \geq \mathrm{Var}_p\big[\mathbb{E}[\varphi(X)|Y]\big]$$
- Procedure: use perfect Monte Carlo from $p(y)$ and then from $\pi(x|y)$
- conditioning can improve on the variance.
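
A minimal sketch, assuming $Y \sim \mathrm{Exp}(1)$, $X|Y \sim \mathcal{N}(Y, 1)$ and $\varphi(x) = x$, so that $\mathbb{E}[\varphi(X)|Y] = Y$ is available in closed form:

```python
import numpy as np

rng = np.random.default_rng(4)

# Rao-Blackwellisation: pi(x, y) with Y ~ Exp(1) and X|Y ~ N(Y, 1).
# For phi(x) = x, E[phi(X)|Y] = Y exactly, so we can average Y instead of X.
N = 50_000
Y = rng.exponential(size=N)
X = Y + rng.standard_normal(N)        # plain estimator averages phi(X) = X
print(f"plain: {X.mean():.4f}, conditioned: {Y.mean():.4f}, truth: 1.0")
# Var[X] = Var[Y] + 1 > Var[Y]: conditioning strictly reduces the variance here.
```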

Page 23

Discussion on Perfect Monte Carlo so far

- Very often perfect Monte Carlo is not possible except for simple distributions
  - e.g. see Examples 1-4
- Note that even when it is possible to get direct samples from $\pi$, strange test functions $\varphi$ can result in estimates with very high Monte Carlo variance
  - e.g. the rare event example above for $\varphi = \mathbf{1}_A$
- The variance of the estimators is a measure of efficiency
  - in some cases indirect sampling can be better

Page 24

Discussion on Perfect Monte Carlo so far

- There are indirect ways of sampling perfectly from $\pi$ using
  - rejection sampling
  - importance sampling
  - Markov chains
  - particle systems & methods

Page 25

Rejection Sampling

- Let $\pi \ll q$: $\pi(x) > 0 \Rightarrow q(x) > 0$,
  - i.e. $q$ has heavier tails
  - (can be phrased as an absolute continuity requirement if densities do not exist!)
- Then assume you know $M$ such that for all $x$:
  $$w(x) = \frac{\gamma(x)}{q(x)} < M$$
- Accept-Reject Procedure:
  - For $i = 1, \ldots, L$
    - Sample $X^i \sim q$
    - Sample $U^i \sim \mathcal{U}[0, 1)$
    - Accept the sample, $Y = X^i$, if $U^i < \frac{w(X^i)}{M}$

Page 26

Rejection Sampling

- The procedure generates samples from $\pi$!
- Conditioning argument:
  $$\mathbb{P}[Y \in A] = \mathbb{P}\!\left[X^i \in A \,\Big|\, U^i < \frac{w(X^i)}{M}\right] = \frac{\mathbb{P}\!\left[X^i \in A,\; U^i < \frac{w(X^i)}{M}\right]}{\mathbb{P}\!\left[U^i < \frac{w(X^i)}{M}\right]}$$
  $$= \frac{\int_A \int_0^1 q(x)\,\mathbf{1}_{u < w(x)/M}\,du\,dx}{\int_{\mathsf{X}} q(x)\left(\int_0^{w(x)/M} du\right) dx} = \frac{\int_A q(x)\frac{w(x)}{M}\,dx}{\int_{\mathsf{X}} q(x)\frac{w(x)}{M}\,dx} = \frac{\frac{1}{M}\int_A \gamma(x)\,dx}{\frac{1}{M}\int_{\mathsf{X}} \gamma(x)\,dx} = \pi(A)$$

Page 27

Rejection Sampling

- The issue is that
  $$\mathbb{P}\!\left[U^i < \frac{w(X^i)}{M}\right] = \mathbb{E}_q\!\left[\mathbb{P}\!\left[U^i < \frac{w(X^i)}{M} \,\Big|\, X^i\right]\right] = \cdots = \frac{Z}{M}$$
  so the method might not be very efficient if $M$ is large!
- So in practice we need $M \approx Z$, i.e. $\gamma \approx q$ up to scaling, which is not easy
- There are also more advanced rejection methods
  - envelopes, adaptive accept-reject, ...

Page 28

Popular Monte Carlo Methods

- Importance Sampling:
  - Sample from a proposal $q$, weight samples according to $\frac{d\pi}{dq}$
- Markov Chain Monte Carlo (MCMC):
  - Run an ergodic Markov chain with invariant distribution $\pi$
  - (Metropolis and Ulam 49, Hastings 70, Geman and Geman 84, Gelfand and Smith 90)
  - many approaches:
    - Metropolis-Hastings, Gibbs Sampling, Metropolis within Gibbs, Hybrid (or Hamiltonian) Monte Carlo, Simulated Annealing, ...
  - very interesting theory related to Markov Processes (by G.O. Roberts, J. Rosenthal, L. Tierney and many others)

Page 29

Popular Monte Carlo Methods

- Sequential Monte Carlo (SMC):
  - Propagate a swarm of samples through $\left\{\pi_n(x_{0:n}) = \frac{\gamma_n(x_{0:n})}{Z_n}\right\}_{n \leq T}$, such that $x = x_{0:T}$ and $\pi_T = \pi$. (Hetherington 84, Gordon, Salmond & Smith 93, Liu 98, Doucet 98, Del Moral 04)
- Hybrid approaches also possible:
  - SMC within MCMC (Particle MCMC, Andrieu et al 10)
  - MCMC within SMC (Chopin 01, Gilks and Berzuini)
- very interesting theory related to interacting particle systems, mean field interpretations (e.g. Del Moral, Crisan, Douc, Moulines and many others)

Page 30

Importance Sampling (IS)

- Let $\pi \ll q$
- Then
  $$\pi(x) = \frac{w(x)\,q(x)}{\int w(x)\,q(x)\,dx} \quad \text{with} \quad w(x) = \frac{\gamma(x)}{q(x)}$$
- $q$: importance distribution
- $w$: un-normalised importance weights
- (Recall $\pi(x) = \frac{\gamma(x)}{Z}$)

Page 31

Importance Sampling: known normalising constant

- When $Z$ is known, the following Monte Carlo approximation can be used
  $$\widehat{\pi}(dx) = \frac{1}{N} \sum_{i=1}^N W^i\,\delta_{X^i}(dx) \quad \text{where} \quad W^i = \frac{w(X^i)}{Z}$$
- Note
  $$\mathbb{E}_q\!\left[\sum_{i=1}^N W^i\right] = \mathbb{E}_q\!\left[\sum_{i=1}^N \frac{\gamma(X^i)}{Z\,q(X^i)}\right] = \sum_{i=1}^N \frac{\int \gamma(x)\,dx}{Z} = N$$

Page 32

Importance Sampling: estimating the normalising constant

- In other words
  $$\mathbb{E}_q\!\left[\frac{1}{N} \sum_{i=1}^N w(X^i)\right] = Z$$
- So
  $$\frac{1}{N} \sum_{i=1}^N w(X^i)$$
  is an unbiased estimator of $Z$.

Page 33

Importance Sampling: self normalising case

- When $Z$ is unknown (the most interesting case), the following Monte Carlo approximation can be used
  $$\widehat{\pi}(dx) = \sum_{i=1}^N W^i\,\delta_{X^i}(dx) \quad \text{where} \quad W^i = \frac{w(X^i)}{\sum_{i'=1}^N w(X^{i'})}$$
  such that $\sum_{i=1}^N W^i = 1$, so for the integral:
  $$\widehat{\pi}(\varphi) = \sum_{i=1}^N W^i\,\varphi(X^i) \qquad (3)$$
- $Z$ can be estimated as before!
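
A minimal sketch of self-normalised IS, assuming the unnormalised target $\gamma(x) = e^{-x^2/2}$ (so $\pi = \mathcal{N}(0,1)$ and $Z = \sqrt{2\pi}$), proposal $q = \mathcal{N}(0, 2^2)$ and $\varphi(x) = x^2$:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(6)

# Self-normalised IS: gamma(x) = exp(-x^2/2), i.e. pi = N(0,1), Z = sqrt(2*pi);
# proposal q = N(0, 2^2) has heavier tails than pi.
N = 100_000
X = rng.normal(0.0, 2.0, size=N)
w = np.exp(-X**2 / 2) / norm.pdf(X, 0.0, 2.0)   # unnormalised weights gamma/q
W = w / w.sum()                                 # self-normalised weights, sum to 1
print(f"pi(phi) estimate: {np.sum(W * X**2):.4f} (truth 1.0)")
print(f"Z estimate: {w.mean():.4f} (truth {np.sqrt(2*np.pi):.4f})")
```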

Page 34

IS in a very simple example

- How important is absolute continuity? Consider $p = \mathcal{N}(0, 1)$, $q = \mathcal{N}(0, 2)$

[Figure: two panels of density ratios. Left: $p/q(x)$ vs $x$, $X \sim q$ (q acts as proposal to p); right: $q/p(x)$ vs $x$, $X \sim p$ (p acts as proposal to q)]

Page 35

Rare events example: IS as variance reduction method

- We defined the rare event of an i.i.d. process as the event $\{x \in A\}$, e.g. $A = \{x : x \geq \lambda\}$.
- In principle, if we could sample from
  $$q(x) \propto p(x)\,\mathbf{1}_A(x),$$
  and weight using $w(x) = \frac{p(x)}{q(x)} = \frac{\int p(x')\,\mathbf{1}_A(x')\,dx'}{\mathbf{1}_A(x)}$
- the Monte Carlo estimator has variance
  $$\frac{1}{N}\left[\mathbb{E}_p[w\,\mathbf{1}_A] - \mathbb{E}_p[\mathbf{1}_A]^2\right] = 0.$$
- Unrealistic to do perfectly in most cases, but the principle holds
  - we can do better than perfect Monte Carlo
- Aim is to find $q$ with $w(x) < 1$ and more "mass" in the rare region $A$.
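
In practice one instead picks a proposal with more mass in $A$; a minimal sketch, assuming $p = \mathcal{N}(0,1)$, $A = \{x \geq 4\}$ and the hand-picked (not optimal) shifted proposal $q = \mathcal{N}(4, 1)$:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(7)

# IS for p* = P(X >= 4), X ~ N(0,1) (~3.17e-5): shift the proposal into the
# rare region, q = N(4,1), and weight by w = p/q. Naive MC would need huge N.
lam, N = 4.0, 100_000
X = rng.normal(lam, 1.0, size=N)              # samples from q
w = norm.pdf(X) / norm.pdf(X, lam, 1.0)       # importance weights p/q
p_hat = np.mean(w * (X >= lam))
print(f"IS estimate: {p_hat:.3e}, truth: {norm.sf(lam):.3e}")
```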

Page 36

Importance Sampling: some asymptotics

- Asymptotically consistent as $N \to \infty$. Asymptotic bias:
  $$\mathbb{E}\left[\widehat{\pi}(\varphi) - \pi(\varphi)\right] = -\frac{1}{N} \int_{\mathsf{X}} \frac{\pi^2(x)}{q(x)} \left(\varphi(x) - \pi(\varphi)\right) dx$$
- A Central Limit Theorem (CLT) holds:
  $$\sqrt{N}\left(\widehat{\pi}(\varphi) - \pi(\varphi)\right) \Rightarrow \mathcal{N}\left(0, \sigma^2_{IS}\right)$$
  where
  $$\sigma^2_{IS} = \int_{\mathsf{X}} \frac{\pi^2(x)}{q(x)} \left(\varphi(x) - \pi(\varphi)\right)^2 dx$$

Page 37

Importance Sampling: choosing proposals

- The asymptotic variance of the estimator of $\pi(\varphi)$ is minimised by
  $$q(x) = \frac{|\varphi(x)|\,\pi(x)}{\int_{\mathsf{X}} |\varphi(x)|\,\pi(x)\,dx},$$
  but this is not very easy to use in practice!
  - think of the tails & the rare events example
- We are typically interested in the expectations of several test functions (e.g. moments or simple functions).

Page 38

Comments

- Results involving $\varphi$ are useful for understanding what types of functions will lead to good estimators
- But they are not easily usable in practice, except when interested in specific test functions
  - e.g. the rare events case
- In a Bayesian inference context we are typically interested in the expectations of several test functions
  - e.g. different moments, or simple functions for histograms.

Page 39

Importance Sampling: normalising constant

- Estimate the normalising constant $Z$ by
  $$\widehat{Z} = \frac{1}{N} \sum_{i=1}^N \frac{\gamma(X^i)}{q(X^i)} = \int \frac{\gamma(x)}{q(x)}\,\widehat{q}(dx), \quad \text{with} \quad \widehat{q}(dx) = \frac{1}{N} \sum_{i=1}^N \delta_{X^i}(dx)$$
- Variance:
  $$\mathrm{Var}\left[\widehat{Z}\right] = \frac{Z^2}{N} \left( \int \frac{\pi^2(x)}{q(x)}\,dx - 1 \right)$$

Page 40

Choosing importance proposals

- We can attempt to select $q$ to minimise either
  - the variance of the importance weights, or
  - the relative variance of $\widehat{Z}$
  - in both cases $q$ should be close to $\pi$.
- So one could construct $q$ similar or close to $\pi$
- Can use other methods/approximations:
  - Laplace principle, Gaussian, saddlepoint approximations etc.

Page 41

The effective sample size (ESS)

- We can rescale the variance of the importance weights to give a number in $[1, N]$ representing the effective number of samples
  $$\mathrm{ESS} = \frac{N}{1 + \mathrm{Var}_q[w(X)]}$$
- The higher the ESS the better
- $\mathrm{ESS}/N$ can be interpreted as the approximate ratio of the Monte Carlo variances of perfect Monte Carlo & IS
- Can be monitored using Monte Carlo approximations:
  $$\mathrm{ESS} = \frac{1}{\sum_{i=1}^N (W^i)^2} = \frac{\left(\sum_{i=1}^N w(X^i)\right)^2}{\sum_{i=1}^N w(X^i)^2}$$
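
A minimal sketch of monitoring the ESS, assuming a $\mathcal{N}(0,1)$ target and increasingly mismatched Gaussian proposals:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(8)

def ess(w):
    """Effective sample size from unnormalised importance weights w."""
    return w.sum() ** 2 / np.sum(w ** 2)

# ESS drops as the proposal moves away from the target pi = N(0,1).
N = 10_000
for q_mean in (0.0, 1.0, 2.0):
    X = rng.normal(q_mean, 1.0, size=N)
    w = norm.pdf(X) / norm.pdf(X, q_mean, 1.0)
    print(f"proposal N({q_mean},1): ESS = {ess(w):.0f} out of {N}")
```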

Page 42

Discussion on IS

- It is crucial to find a good $q$
  - this cannot be easily automated and requires good understanding of the problem
- The approach will degenerate for high dimensional $x$
  - the dissimilarity between $\pi$ and $q$ usually increases with dimension (in extreme cases they can even become singular)
  - this results in very low weights and high weight variance

Page 43

More advanced Importance Sampling

- Adaptive IS
  - iteratively use already obtained samples to improve the construction of $q$
  - e.g. after a few steps change $q$ to minimise some distance with a smoothed version of $\pi$ (Oh & Berger 93)
  - interesting combinations with MCMC possible, population Monte Carlo (Iba 01, Douc et al. 07, Cappe et al. 12)
- Sequential IS
  - for a high dimensional target distribution, work dimension by dimension
  - tempering also possible (Neal 01, Jarzynski 97)

Page 44

Sequential Importance Sampling (SIS)

- Let's say we are interested in doing IS for
  $$\pi(x_{0:T}) = \frac{\gamma(x_{0:T})}{Z},$$
- Can we perform IS recursively?
  - i.e. apply IS sequentially → Sequential IS
- Define a sequence of distributions $\left\{\pi_n(x_{0:n}) = \frac{\gamma_n(x_{0:n})}{Z_n}\right\}_{n \leq T}$, with $\pi = \pi_T$.
- Perform IS at each $n$ using samples (aka particles) from previous steps

Page 45

Sequential Importance Sampling (SIS)

- For the target density assume a product factorisation holds
  $$\gamma_n(x_{0:n}) = \gamma_{n-1}(x_{0:n-1})\,\gamma_n(x_n|x_{0:n-1}).$$
- We want to use previous samples/particles to generate samples for the sequence
- Construct the proposal or instrumental density as
  $$q(x_{0:n}) = q(x_{0:n-1})\,q_n(x_n|x_{0:n-1}).$$
- Then obtain a recursive expression for the IS weight
  $$w(x_{0:n}) = w(x_{0:n-1})\,\frac{\gamma_n(x_n|x_{0:n-1})}{q_n(x_n|x_{0:n-1})}.$$

Page 46

General SIS

At each $n \geq 0$ we have available $\{X^i_{0:n-1}, W^i_{n-1}\}_{i=1}^N$.

1. Sampling:
   - For $i = 1, \ldots, N$,
     - sample particles as $X^i_n \sim q_n\big(\cdot|X^i_{0:n-1}\big)$,
     - augment the path of the state as $X^i_{0:n} = [X^i_{0:n-1}, X^i_n]$.
2. Compute weights:
   - For $i = 1, \ldots, N$, compute the weight
     $$\widetilde{W}^i_n = W^i_{n-1}\,\frac{\gamma_n(X^i_{0:n})}{\gamma_{n-1}(X^i_{0:n-1})\,q\big(X^i_n|X^i_{0:n-1}\big)} = W^i_{n-1}\,\frac{\gamma_n(X^i_n|X^i_{0:n-1})}{q\big(X^i_n|X^i_{0:n-1}\big)},$$
   - normalise the weights $W^i_n = \frac{\widetilde{W}^i_n}{\sum_{j=1}^N \widetilde{W}^j_n}$.
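
A minimal sketch of this recursion on a toy state space model (the model, its parameters, and the "observations" here are stand-ins for illustration, not from the slides); with the bootstrap choice $q_n = f$ the incremental weight reduces to the likelihood term:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(9)

# Toy SIS run: X_0 ~ N(0,1), X_n = 0.8 X_{n-1} + V_n, Y_n = X_n + W_n,
# V_n, W_n ~ N(0,1). With q_n = f(x_n | x_{n-1}), the incremental weight
# gamma_n / (gamma_{n-1} q_n) reduces to g(y_n | x_n).
T, N = 20, 1_000
y = rng.standard_normal(T)            # stand-in observations for the demo
X = np.zeros((N, T))
logW = np.zeros(N)
Z_hat = 1.0
for n in range(T):
    x_prev = X[:, n - 1] if n > 0 else np.zeros(N)
    X[:, n] = 0.8 * x_prev + rng.standard_normal(N)   # sample X_n^i ~ q_n = f
    incr = norm.pdf(y[n], loc=X[:, n], scale=1.0)     # w_n^i = g(y_n | X_n^i)
    Z_hat *= incr.mean()              # product estimator of Z (cf. later slides)
    logW += np.log(incr + 1e-300)     # accumulate the weight recursion in logs
W = np.exp(logW - logW.max())
W /= W.sum()                          # normalised weights W_n^i
print(f"Z_hat = {Z_hat:.3e}, ESS = {1.0 / np.sum(W**2):.1f} out of {N}")
```

Running this typically shows the ESS collapsing well below $N$ even for $T = 20$: the weight degeneracy discussed on the later slides.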

Page 47

SIS approximations

At time $n$, the approximations of $\pi$ and $Z$ after the sampling step are
$$\widehat{\pi}(dx_{0:n}) = \sum_{i=1}^N W^i_n\,\delta_{X^i_{0:n}}(dx_{0:n}), \qquad (4)$$
$$\widehat{Z}_n = \frac{1}{N} \sum_{i=1}^N w\big(X^i_{0:n}\big). \qquad (5)$$

(Note the change of notation for the weights: the subscript is now time.)

Page 48

Particle approximations with SIS

- Let also
  - $\varphi : \mathsf{X}^n \to \mathbb{R}$ be a bounded measurable test function
  - the integral of interest be
    $$I_n = \int \varphi(x_{0:n})\,\pi(x_{0:n})\,dx_{0:n}$$
  - and its particle approximation
    $$\widehat{I}_n = \int \varphi(x_{0:n})\,\widehat{\pi}(dx_{0:n}) = \sum_{i=1}^N W^i_n\,\varphi\big(X^i_{0:n}\big)$$

Page 49

Some Asymptotics with N

- Similar to standard importance sampling:
  - the basic difference is that we are computing the weight recursively.
- Asymptotically consistent as $N \to \infty$. Asymptotic bias:
  $$\mathbb{E}\left[\widehat{I}_n - I_n\right] = -\frac{1}{N} \int_{\mathsf{X}^n} \frac{\pi(x_{0:n})^2}{q(x_{0:n})} \left(\varphi(x_{0:n}) - I_n\right) dx_{0:n}$$
- A Central Limit Theorem (CLT) holds:
  $$\sqrt{N}\left(\widehat{I}_n - I_n\right) \Rightarrow \mathcal{N}\left(0, \sigma^2_{IS}\right)$$
  where
  $$\sigma^2_{IS} = \int_{\mathsf{X}^n} \frac{\pi(x_{0:n})^2}{q(x_{0:n})} \left(\varphi(x_{0:n}) - I_n\right)^2 dx_{0:n}$$

Page 50

Normalising constant

- When $Z_n$ is estimated as in (5) above, the relative variance is as in standard IS:
  $$\frac{\mathrm{Var}\left[\widehat{Z}_n\right]}{Z_n^2} = \frac{1}{N} \left( \int \frac{\pi(x_{0:n})^2}{q(x_{0:n})}\,dx_{0:n} - 1 \right)$$
- So far it is not clear how we can exploit more of the sequential structure.

Page 51

Normalising constant

- Let's write the conditional distribution
  $$\pi_n(x_n \,|\, x_{0:n-1}) = \frac{\gamma_n(x_{0:n})\,Z_{n-1}}{\gamma_{n-1}(x_{0:n-1})\,Z_n} \propto \gamma_n(x_n \,|\, x_{0:n-1})$$
- To estimate $\frac{Z_n}{Z_{n-1}}$, use standard IS with proposal $q_n(x_n|x_{0:n-1})$, so
  $$\widehat{\frac{Z_n}{Z_{n-1}}} = \frac{1}{N} \sum_{i=1}^N w_n\big(X^i_{0:n}\big), \quad \text{where} \quad w_n(x_{0:n}) = \frac{\gamma_n(x_n|x_{0:n-1})}{q(x_n|x_{0:n-1})}$$

Page 52

On unbiasedness of $\widehat{Z}$

- Note that one could either approximate $Z$ (with standard IS, weighting the final sample):
  $$\widehat{Z} = \frac{1}{N} \sum_{i=1}^N w\big(X^i_{0:n}\big)$$
  (recall $w(x_{0:n}) = \prod_{k=0}^n w_k(x_{0:k})$), or as
  $$\widehat{Z} = \prod_{k=0}^n \widehat{\frac{Z_k}{Z_{k-1}}} = \frac{1}{N^{n+1}} \prod_{k=0}^n \sum_{i=1}^N w_k\big(X^i_{0:k}\big)$$
  with
  $$w_k\big(x^i_{0:k}\big) = \frac{\gamma_k(x^i_k|x^i_{0:k-1})}{q\big(x^i_k|x^i_{0:k-1}\big)}$$

Page 53

On unbiasedness of $\widehat{Z}$

- Assuming $Z_0 = 1$ (for simplicity),
  $$\mathbb{E}_q\left[\widehat{Z}\right] = \int \prod_{k=0}^n \left( \frac{1}{N} \sum_{i=1}^N w_k\big(x^i_{0:k}\big) \right) \left[ \prod_{k=0}^n \prod_{i=1}^N q_k(x^i_k|x^i_{0:k-1})\,dx^i_k \right]$$
  $$= \prod_{k=0}^n \int \left( \frac{1}{N} \sum_{i=1}^N w_k\big(x^i_{0:k}\big) \right) \prod_{i=1}^N q_k(x^i_k|x^i_{0:k-1})\,dx^i_k$$
  $$= \prod_{k=0}^n \frac{1}{N} \sum_{i=1}^N \int w_k\big(x^i_{0:k}\big)\,q_k(x^i_k|x^i_{0:k-1})\,dx^i_k \cdot \prod_{j \neq i} \int q_k(x^j_k|x^j_{0:k-1})\,dx^j_k$$
  $$= \prod_{k=0}^n \frac{1}{N} \sum_{i=1}^N \int w_k\big(x^1_{0:k}\big)\,q_k(x^1_k|x^1_{0:k-1})\,dx^1_k$$
  $$= \prod_{k=0}^n \frac{1}{N} \cdot N \int \gamma_k(x_k|x_{0:k-1})\,dx_k = \prod_{k=0}^n \frac{Z_k}{Z_{k-1}} = Z$$

Page 54

Choosing importance proposals

- The intuition for choosing $q_n$ is the same as before
  - can attempt to minimise the relative variance of the normalising constant $Z_n$
  - or equivalently minimise the variance of the importance weights.
- This means again $q_n$ should be very similar or close to $\pi_n$
- Can use other approximations available, e.g. Laplace, saddlepoint, etc.
- Can also monitor the ESS as $n$ progresses:
  $$\mathrm{ESS}_n = \frac{1}{\sum_{i=1}^N (W^i_n)^2}$$

Page 55

Discussion on SIS

- The approach can be useful for low/moderate $n$ and low dimensional $x_n$'s
- Eventually, as $n$ increases, the method will degenerate:
  - low weights will remain low for each particle
  - mass concentrates on few or one particle
  - the weight variance eventually explodes
- Particle filtering addresses this by using resampling to stabilise the weights

Page 56

Other problems where sequential sampling can be useful

- Optimisation:
  - define a sequence of targets
    $$\pi_n(x) \propto \widetilde{\pi}(x)^{\beta_n} \quad \text{where } \beta_n > \beta_{n-1}.$$
  - As $\beta_n \to \infty$, $\pi_n$ concentrates around the set of maximisers of $\widetilde{\pi}$
- Rare Events:
  - compute the probability of a small/rare tail $\pi(A)$.
  - Define a sequence of targets
    $$\pi_n(x) \propto \mathbf{1}_{A_n}(x)\,\pi(x)$$
    where $A = A_T \subset A_{T-1} \subset \cdots \subset A_0$. The normalising constant this time approximates $\pi(A)$
- Note that this time the sequence of densities is defined on a static (non-increasing) state space, so this is slightly different from the presentation so far
  - SMC samplers (Del Moral, Doucet and Jasra 06).

Page 57

Reading List

- Liu (2001) Monte Carlo Strategies in Scientific Computing, Springer.
  - Sections 1.1, 2.1-2.3, 2.5, 2.6.1, 2.6.3
- Robert and Casella (1999) Monte Carlo Statistical Methods, Springer.
  - Sections 1, 3.1-3.3, 3.7

Page 58

Homework 1

- For the following scalar model
  $$X_n = \rho X_{n-1} + \sigma V_n, \qquad Y_n = \beta \exp\!\left(\frac{X_n}{2}\right) W_n,$$
  where $W_n, V_n \overset{iid}{\sim} \mathcal{N}(0,1)$, $X_0 \sim \mathcal{N}\!\left(0, \frac{\sigma^2}{1-\rho^2}\right)$, $\rho = 0.8$, $\sigma = 1$, $\beta = 0.1$, $n = 0, 1, \ldots, 50$: code a SIS procedure to approximate $p(x_{0:50}|y_{0:50})$ and estimate $p(y_{0:50})$. Compute estimates for the first two moments of this posterior for the vector $x_{0:50} = (x_0, x_1, \ldots, x_{50})$.
- Perform multiple runs of this SIS algorithm with different random seeds and plot the Monte Carlo variance of the above estimates as a function of $n$.
- Using multivariate versions of $W_n, V_n$, investigate the effect of dimensionality on the variance of the weights and the normalising constant.
- Relate the output figures and results to any theoretical results mentioned in the slides and lectures.
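
To get started, a sketch of simulating data from the model above with the stated parameter values (the SIS procedure itself is the exercise and is deliberately not shown):

```python
import numpy as np

rng = np.random.default_rng(10)

# Simulate data from the homework's stochastic volatility model
# (the SIS procedure itself is the exercise and is not shown here).
rho, sigma, beta, T = 0.8, 1.0, 0.1, 50
x = np.zeros(T + 1)
x[0] = rng.normal(0.0, sigma / np.sqrt(1 - rho**2))
for n in range(1, T + 1):
    x[n] = rho * x[n - 1] + sigma * rng.standard_normal()
y = beta * np.exp(x / 2) * rng.standard_normal(T + 1)
print(y[:5])
```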