LTCC: Advanced Computational Methods in Statistics
Introduction to some aspects of Monte Carlo
N. Kantas
Notes at http://wwwf.imperial.ac.uk/~nkantas/notes4ltcc.pdf
Slides at http://wwwf.imperial.ac.uk/~nkantas/slides1.pdf
Aims for course
- Aims:
  - Overview of some simulation based methods
  - Understand the basic principles behind Monte Carlo methods
  - Motivate further study for theory & applications
- Light assessment:
  - via coursework
Outline for course
1. Introduction to simulation
  - basics of Monte Carlo
  - variance reduction methods
  - rejection sampling
2. Importance Sampling
  - some basics, asymptotic variance
  - sequential importance sampling
3. Markov Chain Monte Carlo (MCMC)
  - Metropolis-Hastings, Gibbs sampling
  - some basics on theory and practice
4. Sequential Monte Carlo (SMC)
  - particle filtering for state space models
  - sampling for fixed dimensional state spaces
  - particle MCMC
Outline for lecture 1
- What is Monte Carlo?
  - using sampling from complex high dimensional distributions to compute integrals
  - some example problems
- Some basic approaches
  - perfect (or naive) Monte Carlo
  - variance reduction
    - control or antithetic variables, conditioning, importance sampling
- Importance sampling
  - examples as variance reduction
  - sequential Importance Sampling
Introduction to Monte Carlo
- Consider an arbitrary distribution on X with a density π w.r.t. dx, such that

  π(x) = γ(x)/Z

  where Z is unknown.
- Let φ : X → R, with ‖φ‖ = sup|φ| < +∞. We want to compute

  π(φ) = E_π[φ(X)] = ⟨π, φ⟩ = ∫_X φ(x) π(dx)   (1)
Example 1: Bayesian Statistics
- Bayesian Statistics (assume densities exist)

  p(x|y) ∝ p(y|x) p(x)

  - p(x) is a known proper prior
  - p(y|x) is the likelihood
  - π is the posterior
- Here the evidence

  Z = p(y) = ∫ p(y|x) p(x) dx

  is very useful to compare models, but is unknown
- Need to approximate both π and Z
- Simple conjugate example: X ∼ IG(a, b), Y ∼ N(0, X), a, b known.
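The conjugate example above can be checked numerically (a minimal sketch; the specific values of a, b, y below are illustrative assumptions): with prior X ∼ IG(a, b) and likelihood Y | X ∼ N(0, X), the posterior is again inverse-gamma, X | y ∼ IG(a + 1/2, b + y²/2), so a plain sampling estimate of the posterior mean can be compared to the closed form.

```python
import numpy as np

rng = np.random.default_rng(0)

# Prior X ~ IG(a, b), likelihood Y | X ~ N(0, X); conjugacy gives the
# posterior X | y ~ IG(a + 1/2, b + y^2/2).
a, b, y = 3.0, 2.0, 1.5
a_post = a + 0.5
b_post = b + y**2 / 2

# If G ~ Gamma(shape, rate) then 1/G ~ IG(shape, rate); numpy's gamma takes
# a scale parameter, i.e. scale = 1/rate.
N = 200_000
samples = 1.0 / rng.gamma(shape=a_post, scale=1.0 / b_post, size=N)

mc_mean = samples.mean()
exact_mean = b_post / (a_post - 1.0)   # mean of IG(alpha, beta) is beta/(alpha - 1)
print(mc_mean, exact_mean)
```

The sample average agrees with the analytic posterior mean up to Monte Carlo error, which is the basic principle exploited throughout the course.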
Example 2: rare events estimation
- For a distribution p, compute the probability of a small/rare tail

  p(A) = ∫_A p(dx)

- Define

  π(dx) ∝ 1_A(x) p(dx)

- The indicator function

  1_A(x) = 1 if x ∈ A, 0 if x ∉ A

  acts as likelihood, p as prior
- Normalising const. Z = ∫_X 1_A(x) p(dx) = ∫_A p(dx) = p(A)
- Examples:
  - compute the tail of a distribution
  - simple random walk confinement: X_n = X_{n−1} + W_n, W_n iid noise, X = (X_1, ..., X_n), A = (−ε, ε)^n
Example 3: stochastic filtering

- continuous spaces and X_0 ∼ η_θ(·), X_n ∼ f_θ(·|x_{n−1}), Y_n ∼ g_θ(·|x_n)
- What is the hidden signal X_0, X_1, ...? Can perform Bayesian inference using Π_n(·) = P[X_0:n ∈ · | Y_0:n] and the marginal likelihood Z = P(Y_1, ..., Y_n).
Example 4: self avoiding walk
- Have you played the vintage snake game?
- Given X_0, let X_n ∈ Z² and consider a standard RW

  p(X_n = x | X_{n−1} = y) = 1/8 if |x − y| = 1, 0 otherwise

- Simulate from

  π(X_1, ..., X_n) ∝ 1_{x_n ≠ x_{n−1} ≠ ... ≠ x_0}(X_1, ..., X_n) p(X_1, ..., X_n)

  and compute Z = P(x_n ≠ x_{n−1} ≠ ... ≠ x_0)
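For short walks Z can be estimated naively by simulating walks and checking self-avoidance (a sketch; reading the garbled transition kernel as moves to one of the 8 neighbouring sites with probability 1/8 each is an assumption):

```python
import numpy as np

rng = np.random.default_rng(1)

# Estimate Z = P(walk of n steps on Z^2 is self-avoiding) by naive Monte Carlo.
# Moves: the 8 neighbouring sites, each with probability 1/8 (assumed kernel).
MOVES = [(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1) if (dx, dy) != (0, 0)]

def is_self_avoiding(n, rng):
    """Simulate n steps from the origin; True if no site is revisited."""
    x, y = 0, 0
    visited = {(x, y)}
    for _ in range(n):
        dx, dy = MOVES[rng.integers(8)]
        x, y = x + dx, y + dy
        if (x, y) in visited:
            return False
        visited.add((x, y))
    return True

n_steps, N = 10, 50_000
Z_hat = np.mean([is_self_avoiding(n_steps, rng) for _ in range(N)])
print(Z_hat)
```

For large n this naive approach wastes almost all samples, which is precisely what motivates the sequential importance sampling treatment later in the lecture.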
Introduction to Monte Carlo
- The problem is essentially a numerical integration problem
- One could use deterministic numerical integration methods to approximate π(φ)
  - there are some pros and cons (next slide)
- A different direction is to use simulation
  - take advantage of more computational power available
- What is Monte Carlo?
  - Sampling from complex high dimensional distributions to compute integrals
  - In examples 1-4 we focus on cases π(x) ∝ G(x) p(x)
On deterministic integration methods

- Quadrature, cubature, sigma points...
- Underlying principles:
  - use polynomial or other approximations that interpolate the function to be integrated at certain points
  - place points to minimise errors
  - exploit symmetries to reduce the number of points used
- Interesting numerical analysis for errors
  - exploit smoothness or other properties of the functions & densities to be integrated
  - errors depend on integrand properties

BUT

- hard to be useful in dimensions higher than 2-3
- still very useful for simple integrals
- often might need to re-apply for different φ
Roadmap
- Perfect Monte Carlo
  - variance reduction
- Rejection Sampling
- Importance Sampling
Perfect Monte Carlo

- IF we can obtain i.i.d. samples X^i ∼ π
- by the (strong) Law of Large Numbers (LLN) one can use the sample average

  π̂(φ) = (1/N) Σ_{i=1}^N φ(X^i) → π(φ) as N → ∞

- In a way one can view the samples as forming an approximation of π

  π̂ = (1/N) Σ_{i=1}^N δ_{X^i},    π̂(φ) = ∫_X φ(x) π̂(dx) = (1/N) Σ_{i=1}^N φ(X^i).   (2)
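The estimator (2) can be sketched in a few lines (the choice π = N(0, 1) and φ(x) = x², with exact value E[X²] = 1, is an illustrative assumption):

```python
import numpy as np

rng = np.random.default_rng(2)

# Perfect Monte Carlo: iid samples from pi = N(0,1), sample-average estimator
# pi_hat(phi) for phi(x) = x^2, whose exact value is E[X^2] = 1.
def perfect_mc(phi, sampler, N):
    """Estimate pi(phi) by (1/N) * sum_i phi(X^i) with X^i iid from pi."""
    x = sampler(N)
    return phi(x).mean()

N = 100_000
est = perfect_mc(lambda x: x**2, lambda n: rng.standard_normal(n), N)
print(est)   # close to 1 for large N, by the strong LLN
```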
Perfect Monte Carlo
- The (non-asymptotic) variance is given by

  Var[π̂(φ)] = (1/N) Var[φ(X^i)] = (1/N) ( ∫_X φ²(x) π(dx) − π(φ)² )

- Note the rate of decrease w.r.t. N does not depend on the size of X
  - (dimensionality is still important, as the integrals and π can depend implicitly on dimension)
- Problems:
  - often cannot sample from π
  - even if this is possible, the relative variance can still be very high:
    - when φ = 1_A where A is a tail with very low probability
    - high dimensions
Rare events example using Perfect Monte Carlo
- Consider a continuous distribution P with density p(x)
- We are interested in computing p* = P(X ≥ γ) ≈ 10^{−9}
- Naive Monte Carlo setting:
  - For i = 1 : N sample i.i.d. x^i ∼ p(·), then compute

    p̂* = (1/N) Σ_{i=1}^N 1_{x ≥ γ}(x^i)

  - p̂* is consistent, and a CLT holds: √N (p̂* − p*) → N(0, Var_P[1_{X ≥ γ}])
Rare events example using Perfect Monte Carlo
- Variance of the estimator:

  σ²_{p̂*} = Var_p[1_{X ≥ γ}] / N = (p* − p*²) / N

- Relative error:

  RE = √( Var[ p̂*/p* ] ) ≈ 1 / √(p* N)

- So we would like at least N ∼ 10^{11} to get decent estimators - prohibitively long simulation times
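The relative-error formula can be seen numerically (a sketch; a tail that is merely small, p* = P(X ≥ 3) ≈ 1.35 × 10⁻³ for X ∼ N(0, 1), stands in for the 10⁻⁹ event so the experiment still runs):

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(3)

gamma = 3.0
p_star = 0.5 * (1.0 - erf(gamma / sqrt(2.0)))   # exact tail P(X >= 3)

N = 100_000
x = rng.standard_normal(N)
p_hat = np.mean(x >= gamma)

# Predicted relative error 1/sqrt(p* N): ~0.09 here already; it is ~1 when
# N = 1/p*, so p* = 1e-9 would demand far more than 1e9 samples.
rel_err_pred = 1.0 / np.sqrt(p_star * N)
print(p_hat, p_star, rel_err_pred)
```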
On unbiasedness and variance reduction
- Note that when we sample i.i.d. X^i ∼ π, Monte Carlo estimates are unbiased

  E_π[ Σ_i φ(X^i) ] = Σ_{i=1}^N E_π[φ(X^i)] = N E_π[φ(X)]

- Example: E_π[ Σ_i 1_{X^i < c} ] = Σ_{i=1}^N E_π[1_{X^i < c}] = N P(X < c)
- In fact a single sample from π is an unbiased estimate for E_π[X]
  - for N > 1 use the sample average (1/N) Σ_{i=1}^N X^i to estimate E_π[X]
  - the variance of the estimator decreases at rate 1/N
Control variates
- When estimating E_π[φ(X)] there are ways to reduce the variance
  - control variates or antithetic variables
  - conditioning or Rao-Blackwellisation
  - Importance Sampling
  - ...
Control variates
- Let X¹ be an unbiased estimate for E_π[φ(X)]. For any Z such that E_π[Z] = 0 and a constant λ, X¹ + λZ is also an unbiased estimator:

  E_π[X¹ + λZ] = E_π[X¹] + λ E_π[Z] = E_π[φ(X)]

  and

  Var_π[X¹ + λZ] = Var_π[X¹] + λ² Var_π[Z] + 2λ Cov_π[X¹, Z]
Control variates
- In theory one can minimise the variance w.r.t. λ,

  λ = − Cov_π[X¹, Z] / Var_π[Z]

  - this even gives a zero variance estimator when Z is perfectly (anti-)correlated with X¹!
- In practice it is difficult to achieve this, i.e. to find such λ, Z
  - but one can choose Z and tune λ numerically and get good variance reduction
- Similar ideas appear in antithetic variates or Multi-level Monte Carlo
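A classic sketch of the tuning in practice (the target E[e^X] for X ∼ U(0, 1), with control variate Z = X − 1/2 of known mean 0, is an illustrative assumption):

```python
import numpy as np

rng = np.random.default_rng(4)

# Control variates: estimate E[e^X], X ~ U(0,1) (exact value e - 1), using
# Z = X - 1/2 with E[Z] = 0. lambda is tuned from the samples via the
# optimal formula lambda = -Cov(phi(X), Z) / Var(Z).
N = 100_000
x = rng.uniform(size=N)
phi = np.exp(x)
z = x - 0.5

lam = -np.cov(phi, z)[0, 1] / np.var(z)
plain = phi                # naive Monte Carlo terms
cv = phi + lam * z         # control-variate terms, same mean

print(plain.mean(), cv.mean())   # both close to e - 1
print(plain.var(), cv.var())     # cv variance is much smaller
```

Since e^X and X are very highly correlated on (0, 1), the residual variance drops by roughly two orders of magnitude for the same N.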
Rao Blackwell conditioning
- Consider a bivariate distribution π(x, y) = π(x|y) p(y), i.e.

  ∫ π(x, dy) = π(x),

  and assume one can simulate from π(x|y) and p(y).
- Then E[φ(X)|Y] is an unbiased estimator for E_π[φ(X)]:

  E_π[φ(X)] = E_p[ E[φ(X)|Y] ]

- In addition, we have the variance conditioning identity

  Var_π[φ(X)] ≥ Var_p[ E[φ(X)|Y] ]

- Procedure: use perfect Monte Carlo from p(y) and then from π(x|y)
  - conditioning can improve on the variance.
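A minimal sketch of the conditioning identity (the model Y ∼ U(0, 1), X | Y ∼ N(Y, 1), with φ(x) = x so that E[φ(X)|Y] = Y is available in closed form, is an illustrative assumption):

```python
import numpy as np

rng = np.random.default_rng(5)

# Rao-Blackwellisation: pi(x, y) with Y ~ U(0,1), X | Y ~ N(Y, 1).
# For phi(x) = x the conditional expectation E[phi(X) | Y] = Y is exact,
# so we may average Y^i instead of X^i.
N = 100_000
y = rng.uniform(size=N)
x = y + rng.standard_normal(N)   # X^i ~ pi(x | y^i)

naive = x                        # plain Monte Carlo terms
rb = y                           # Rao-Blackwellised terms E[phi(X) | Y^i]

print(naive.mean(), rb.mean())   # both estimate E[X] = 1/2
print(naive.var(), rb.var())     # Var(X) = 1 + 1/12 vs Var(Y) = 1/12
```

Integrating out X analytically removes the conditional noise entirely, so the term-wise variance falls from 1 + 1/12 to 1/12.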
Discussion on Perfect Monte Carlo so far
- Very often perfect Monte Carlo is not possible except for simple distributions
  - e.g. see Examples 1-4
- Note that even when it is possible to get direct samples from π, awkward test functions φ can result in estimates with very high Monte Carlo variance
  - e.g. the rare event example above for φ = 1_A
- The variance of the estimators is a measure of efficiency
  - in some cases indirect sampling can be better
Discussion on Perfect Monte Carlo so far
- There are indirect ways for sampling perfectly from π using
  - rejection sampling
  - Importance sampling
  - Markov Chains
  - particle systems & methods
Rejection Sampling
- Let π ≪ q: π(x) > 0 ⇒ q(x) > 0
  - i.e. q has heavier tails
  - (can be phrased as an absolute continuity requirement if densities do not exist!)
- Then assume you know M such that for all x:

  w(x) = γ(x)/q(x) < M

- Accept-Reject Procedure:
  - For i = 1, ..., L
    - Sample X^i ∼ q
    - Sample U^i ∼ U[0, 1)
    - Accept the sample, Y = X^i, if U^i < w(X^i)/M
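The procedure in code (a sketch; the unnormalised target γ(x) = x(1 − x) on [0, 1], i.e. π = Beta(2, 2) with Z = 1/6, proposal q = U(0, 1) and bound M = 0.3 are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(6)

# Rejection sampling: gamma(x) = x(1-x) on [0,1] (pi = Beta(2,2), Z = 1/6),
# q = U(0,1), and M = 0.3 bounds w(x) = gamma(x)/q(x) = x(1-x) <= 1/4 < M.
def gamma_(x):
    return x * (1.0 - x)

M, L = 0.3, 100_000
x = rng.uniform(size=L)           # X^i ~ q
u = rng.uniform(size=L)           # U^i ~ U[0,1)
accepted = x[u < gamma_(x) / M]   # keep X^i when U^i < w(X^i)/M

# Acceptance rate estimates Z/M (here (1/6)/0.3); the accepted samples are
# exact draws from Beta(2,2), whose mean is 1/2.
print(len(accepted) / L, accepted.mean())
```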
Rejection Sampling
- The procedure generates samples from π!
- Conditioning argument:

  P[Y ∈ A] = P[ X^i ∈ A | U^i < w(X^i)/M ]
           = P[ X^i ∈ A, U^i < w(X^i)/M ] / P[ U^i < w(X^i)/M ]
           = ∫_A q(x) ( ∫_0^1 1_{u < w(x)/M} du ) dx / ∫_X q(x) ( ∫_0^{w(x)/M} du ) dx
           = ∫_A q(x) w(x)/M dx / ∫_X q(x) w(x)/M dx
           = (1/M) ∫_A γ(x) dx / ( (1/M) ∫_X γ(x) dx )
           = π(A)
Rejection Sampling
- The issue is that

  P[ U^i < w(X^i)/M ] = E_q[ P( U^i < w(X^i)/M | X^i ) ] = ... = Z/M

  so the method might not be very efficient if M is high!
- So in practice we need M ≈ Z, i.e. γ ≈ q, which is not easy
- There are also more advanced rejection methods
  - envelopes, adaptive accept-reject, ...
Popular Monte Carlo Methods
- Importance Sampling:
  - Sample from a proposal q, weight samples according to dπ/dq
- Markov Chain Monte Carlo (MCMC):
  - Run an ergodic Markov chain with invariant distribution π
  - (Metropolis and Ulam 49, Hastings 71, Geman and Geman 84, Gelfand and Smith 90)
  - many approaches: Metropolis-Hastings, Gibbs Sampling, Metropolis within Gibbs, Hybrid (or Hamiltonian) Monte Carlo, Simulated Annealing, ...
  - very interesting theory related to Markov Processes (by G.O. Roberts, J. Rosenthal, L. Tierney and many others)
Popular Monte Carlo Methods
- Sequential Monte Carlo (SMC):
  - Propagate a swarm of samples through {π_n(x_0:n) = γ_n(x_0:n)/Z_n}_{n ≤ T}, such that x = x_0:T and π_T = π. (Hetherington 84, Gordon, Salmond & Smith 93, Liu 98, Doucet 98, Del Moral 04)
- Hybrid approaches also possible:
  - SMC within MCMC (Particle MCMC, Andrieu et al 10)
  - MCMC within SMC (Chopin 01, Gilks and Berzuini 01)
- very interesting theory related to interacting particle systems and mean field interactions (e.g. Del Moral, Crisan, Douc, Moulines and many others)
Importance Sampling (IS)
- Let π ≪ q
- Then

  π(x) = w(x) q(x) / ∫ w(x) q(x) dx

  with

  w(x) = γ(x)/q(x)

- q is the importance distribution
- w are the un-normalised importance weights
- (Recall π(x) = γ(x)/Z)
Importance Sampling: known normalising constant
- When Z is known, the following Monte Carlo approximation can be used

  π̂(dx) = (1/N) Σ_{i=1}^N W^i δ_{X^i}(dx)

  where W^i = w(X^i)/Z
- Note

  E_q[ Σ_{i=1}^N W^i ] = E_q[ Σ_{i=1}^N γ(X^i)/(Z q(X^i)) ] = Σ_{i=1}^N ( ∫ γ(x) dx ) / Z = N
Importance Sampling: estimating normalising constant
- In other words

  E_q[ (1/N) Σ_{i=1}^N w(X^i) ] = Z

- So (1/N) Σ_{i=1}^N w(X^i) is an unbiased estimator of Z.
Importance Sampling: self normalising case
- When Z is unknown (the most interesting case), the following Monte Carlo approximation can be used

  π̂(dx) = Σ_{i=1}^N W^i δ_{X^i}(dx)

  where

  W^i = w(X^i) / Σ_{i'=1}^N w(X^{i'})

  such that Σ_{i=1}^N W^i = 1, so for the integral:

  π̂(φ) = Σ_{i=1}^N W^i φ(X^i)   (3)

- Z can be estimated as before!
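The self-normalised estimator (3) in code (a sketch; the target known only through γ(x) = exp(−x²/2), i.e. π = N(0, 1) with Z = √(2π), and the proposal q = N(1, 2²) are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(7)

# Self-normalised IS: gamma(x) = exp(-x^2/2), so pi = N(0,1), Z = sqrt(2*pi);
# proposal q = N(1, 2^2), heavier-tailed and covering the target.
N = 200_000
mu_q, sigma_q = 1.0, 2.0
x = rng.normal(mu_q, sigma_q, size=N)

gamma_x = np.exp(-x**2 / 2.0)
q_x = np.exp(-(x - mu_q)**2 / (2 * sigma_q**2)) / (sigma_q * np.sqrt(2 * np.pi))
w = gamma_x / q_x            # un-normalised weights

W = w / w.sum()              # self-normalised weights, sum to 1
est_mean = np.sum(W * x)     # pi_hat(phi) for phi(x) = x, exact value 0
Z_hat = w.mean()             # unbiased estimate of Z = sqrt(2*pi)
print(est_mean, Z_hat)
```

Note the two estimators in one pass: the weighted average approximates π(φ) without knowing Z, while the plain average of the un-normalised weights estimates Z itself.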
IS in a very simple example

- How important is absolute continuity? Consider p = N(0, 1), q = N(0, 2)
Figure: Left: q acts as proposal to p (p/q(x) vs x, X ∼ q); right: p acts as proposal to q (q/p(x) vs x, X ∼ p)
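The asymmetry in the figure can be reproduced numerically (a sketch; reading "N(0, 2)" as standard deviation 2 is an assumption). With the heavier-tailed q proposing for p the weights p/q are bounded; in the reverse direction the weights q/p grow like exp(3x²/8) and have infinite variance, so a few samples dominate:

```python
import numpy as np

rng = np.random.default_rng(8)

def normal_pdf(x, mu, sigma):
    return np.exp(-(x - mu)**2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))

N = 100_000
xq = rng.normal(0, 2, size=N)                      # X ~ q, proposing for p
w_good = normal_pdf(xq, 0, 1) / normal_pdf(xq, 0, 2)

xp = rng.normal(0, 1, size=N)                      # X ~ p, proposing for q
w_bad = normal_pdf(xp, 0, 2) / normal_pdf(xp, 0, 1)

# Fraction of total weight carried by the single largest weight:
print(w_good.max() / w_good.sum(), w_bad.max() / w_bad.sum())
```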
Rare events example: IS as variance reduction method
- We defined the rare event of an i.i.d. process as the event {x ∈ A}, e.g. A = {x : x ≥ γ}.
- In principle, if we could sample from

  q(x) ∝ p(x) 1_A(x),

  and weight using

  w(x) = p(x)/q(x) = ( ∫ p(x') 1_A(x') dx' ) / 1_A(x)

  (= p(A) for x ∈ A)
- then the Monte Carlo estimator has variance

  (1/N) ( E_p[w 1_A] − E_p[1_A]² ) = 0.

- Unrealistic to do perfectly in most cases, but the principle holds
  - one can do better than perfect Monte Carlo
- The aim is to find q with w(x) < 1 and more "mass" in the rare region A.
Importance Sampling: some asymptotics
- Asymptotically consistent as N → ∞. Asymptotic bias:

  E[π̂(φ) − π(φ)] = −(1/N) ∫_X (π²(x)/q(x)) (φ(x) − π(φ)) dx

- A Central Limit Theorem (CLT) holds:

  √N (π̂(φ) − π(φ)) ⇒ N(0, σ²_IS)

  where

  σ²_IS = ∫_X (π²(x)/q(x)) (φ(x) − π(φ))² dx
Importance Sampling: choosing proposals
- The asymptotic variance of the estimator π̂(φ) is minimised by

  q(x) = |φ(x)| π(x) / ∫_X |φ(x)| π(x) dx,

  but this is not very easy to use in practice!
  - think of the tails & the Rare events example
- We are typically interested in the expectations of several test functions (e.g. moments or simple functions).
Comments
- Results for a given φ are useful for understanding what types of functions will lead to good estimators
- But they are not easily usable in practice, except when interested in specific test functions
  - e.g. the rare events case
- In a Bayesian inference context we are typically interested in the expectations of several test functions
  - e.g. different moments, or simple functions for histograms.
Importance Sampling: normalising constant
- Estimate the normalising constant Z:

  Ẑ = (1/N) Σ_{i=1}^N γ(X^i)/q(X^i) = ∫ (γ(x)/q(x)) q̂(dx), with q̂(dx) = (1/N) Σ_{i=1}^N δ_{X^i}(dx)

- Variance:

  Var[Ẑ] = (Z²/N) ( ∫ (π²(x)/q(x)) dx − 1 )
Choosing importance proposals
- We can attempt to select q which minimises either
  - the variance of the importance weights, or
  - the relative variance of Ẑ
  - in both cases q should be close to π.
- So one could construct q similar or close to π
- Can use other methods/approximations:
  - Laplace principle, Gaussian, Saddlepoint approximations etc.
The effective sample size (ESS)
- We can rescale the variance of the importance weights to give a number in [1, N] representing the effective number of samples:

  ESS = N / (1 + Var_q[w(X)])

- The higher the ESS the better
- ESS/N can be interpreted as the approximate ratio of the Monte Carlo variances of perfect Monte Carlo & IS
- Can be monitored using Monte Carlo approximations:

  ESS = 1 / Σ_{i=1}^N (W^i)² = ( Σ_{i=1}^N w(X^i) )² / Σ_{i=1}^N w(X^i)²
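Monitoring the ESS in practice (a sketch; the target π = N(0, 1) and the two Gaussian proposals, one close to and one far from the target, are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(9)

# ESS = 1 / sum_i (W^i)^2 = (sum w)^2 / sum w^2, from un-normalised weights.
def ess(w):
    W = w / w.sum()
    return 1.0 / np.sum(W**2)

def normal_pdf(x, mu, sigma):
    return np.exp(-(x - mu)**2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))

N = 50_000
x_good = rng.normal(0, 1.5, size=N)   # proposal close to the target
x_poor = rng.normal(3, 1.5, size=N)   # proposal far from the target

w_good = normal_pdf(x_good, 0, 1) / normal_pdf(x_good, 0, 1.5)
w_poor = normal_pdf(x_poor, 0, 1) / normal_pdf(x_poor, 3, 1.5)

print(ess(w_good) / N, ess(w_poor) / N)   # close to 1 vs much smaller
```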
Discussion on IS
- It is crucial to find a good q
  - this cannot be easily automated and requires good understanding of the problem
- The approach will degenerate for high dimensional x
  - the dissimilarity between π and q usually increases with dimension (in extreme cases they can even become singular)
  - this results in very low weights and high weight variance
More advanced Importance Sampling
- Adaptive IS
  - iteratively use already obtained samples to improve the construction of q
  - e.g. after a few steps change q to minimise some distance to a smoothed version of π (Oh & Berger 93)
  - interesting combinations with MCMC possible, population Monte Carlo (Iba 01, Douc et al. 07, Cappe et al. 12)
- Sequential IS
  - for a high dimensional target distribution, work dimension by dimension
  - tempering also possible (Neal 01, Jarzynski 97)
Sequential Importance Sampling (SIS)
- Let's say we are interested in doing IS for

  π(x_0:T) = γ(x_0:T)/Z

- Can we perform IS recursively?
  - aka apply IS sequentially → Sequential IS
- Define a sequence of distributions {π_n(x_0:n) = γ_n(x_0:n)/Z_n}_{n ≤ T}, with π = π_T.
- Perform IS at each n using samples (aka particles) from previous steps
Sequential Importance Sampling (SIS)
- For the target density assume a product factorisation holds:

  γ(x_0:n) = γ(x_0:n−1) γ_n(x_n | x_0:n−1).

- We want to use previous samples/particles to generate samples for the sequence
- Construct the proposal or instrumental density as

  q(x_0:n) = q(x_0:n−1) q_n(x_n | x_0:n−1).

- Then obtain a recursive expression for the IS weight:

  w(x_0:n) = w(x_0:n−1) γ_n(x_n | x_0:n−1) / q_n(x_n | x_0:n−1).
General SIS
At each n ≥ 0 we have available {X^i_0:n−1, W^i_n−1}_{i=1}^N.

1. Sampling
  - For i = 1, ..., N,
    - sample particles as X^i_n ∼ q_n(· | X^i_0:n−1),
    - augment the path of the state as X^i_0:n = [X^i_0:n−1, X^i_n].

2. Compute weights:
  - For i = 1, ..., N, compute the weight

    W̃^i_n = W^i_n−1 γ_n(X^i_0:n) / ( γ_n−1(X^i_0:n−1) q_n(X^i_n | X^i_0:n−1) ) = W^i_n−1 γ_n(X^i_n | X^i_0:n−1) / q_n(X^i_n | X^i_0:n−1),

  - Normalise the weights: W^i_n = W̃^i_n / Σ_{j=1}^N W̃^j_n.
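The recursion above can be sketched on a toy state space model (an assumption chosen for illustration: X_0 ∼ N(0, 1), X_n = 0.8 X_{n−1} + V_n, Y_n = X_n + W_n with V_n, W_n ∼ N(0, 1), and the transition prior as proposal q_n, so the incremental weight is the likelihood g(y_n | x_n)):

```python
import numpy as np

rng = np.random.default_rng(10)

def normal_pdf(x, mu, sigma):
    return np.exp(-(x - mu)**2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))

T, N, rho = 20, 5_000, 0.8

# Simulate a data record y_0:T from the model itself.
x_true = np.zeros(T + 1)
y = np.zeros(T + 1)
x_true[0] = rng.standard_normal()
y[0] = x_true[0] + rng.standard_normal()
for n in range(1, T + 1):
    x_true[n] = rho * x_true[n - 1] + rng.standard_normal()
    y[n] = x_true[n] + rng.standard_normal()

# Plain SIS (no resampling): since q_n is the transition density, only the
# current state of each path is needed to update the weights recursively.
particles = rng.standard_normal(N)                 # X^i_0 ~ N(0,1)
logw = np.log(normal_pdf(y[0], particles, 1.0))    # weight with g(y_0 | x_0)
ess_trace = []
for n in range(1, T + 1):
    particles = rho * particles + rng.standard_normal(N)   # X^i_n ~ q_n = f
    logw += np.log(normal_pdf(y[n], particles, 1.0))       # recursive weight
    W = np.exp(logw - logw.max())
    W /= W.sum()
    ess_trace.append(1.0 / np.sum(W**2))

print(ess_trace[0], ess_trace[-1])   # ESS decays as n grows: weight degeneracy
```

Watching the ESS trace shrink with n is exactly the degeneracy phenomenon discussed at the end of the lecture, which resampling (particle filtering) is designed to fix.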
SIS approximations
At time n, the approximations of π and Z after the sampling step are

  π̂(dx_0:n) = Σ_{i=1}^N W^i_n δ_{X^i_0:n}(dx_0:n),   (4)

  Ẑ_n = (1/N) Σ_{i=1}^N w(X^i_0:n).   (5)

(Note the change of notation for weights; the subscript is time now.)
Particle approximations with SIS
- Let also
  - φ : X^n → R be a bounded measurable test function
  - the integral of interest be

    I_n = ∫ φ(x_0:n) π(x_0:n) dx_0:n

  - and its particle approximation

    Î_n = ∫ φ(x_0:n) π̂(dx_0:n) = Σ_{i=1}^N W^i_n φ(X^i_0:n)
Some Asymptotics with N
- Similar to standard importance sampling:
  - the basic difference is that we are computing the weight recursively.
- Asymptotically consistent as N → ∞. Asymptotic bias:

  E[Î_n − I_n] = −(1/N) ∫_{X^n} ( (π(x_0:n))² / q(x_0:n) ) (φ(x_0:n) − I_n) dx_0:n

- A Central Limit Theorem (CLT) holds:

  √N (Î_n − I_n) ⇒ N(0, σ²_IS)

  where

  σ²_IS = ∫_{X^n} ( (π(x_0:n))² / q(x_0:n) ) (φ(x_0:n) − I_n)² dx_0:n
Normalising constant
- When estimated as in (5) above, the relative variance is as in standard IS:

  Var[Ẑ_n] / Z_n² = (1/N) ( ∫ ( (π(x_0:n))² / q(x_0:n) ) dx_0:n − 1 )

- So far it is not clear how we can exploit more of the sequential structure.
Normalising constant
- Let's write the conditional distribution

  π_n(x_n | x_0:n−1) = γ_n(x_0:n) Z_n−1 / ( γ_n−1(x_0:n−1) Z_n ) ∝ γ_n(x_n | x_0:n−1)

- To estimate Z_n/Z_n−1, use standard IS with proposal q_n(x_n | x_0:n−1), so the estimate is

  (1/N) Σ_{i=1}^N w_n(X^i_0:n),

  where w_n(x_0:n) = γ_n(x_n | x_0:n−1) / q_n(x_n | x_0:n−1)
On unbiasedness of Ẑ

- Note that one could either approximate Z with standard IS, weighting the final sample:

  Ẑ = (1/N) Σ_{i=1}^N w(X^i_0:n)

  (recall w(x_0:n) = Π_{k=0}^n w_k(x_0:k)), or as the product of the per-step ratio estimates:

  Ẑ = Π_{k=0}^n [ (1/N) Σ_{i=1}^N w_k(X^i_0:k) ] = (1/N^{n+1}) Π_{k=0}^n Σ_{i=1}^N w_k(X^i_0:k)

  with

  w_k(x^i_0:k) = γ_k(x^i_k | x^i_0:k−1) / q_k(x^i_k | x^i_0:k−1)
On unbiasedness of Ẑ

- Assuming Z_0 = 1 (for simplicity),

  E_q[Ẑ] = ∫ Π_{k=0}^n ( (1/N) Σ_{i=1}^N w_k(x^i_0:k) ) [ Π_{k=0}^n Π_{i=1}^N q_k(x^i_k | x^i_0:k−1) dx^i_k ]

  = Π_{k=0}^n ∫ ( (1/N) Σ_{i=1}^N w_k(x^i_0:k) ) Π_{i=1}^N q_k(x^i_k | x^i_0:k−1) dx^i_k

  = Π_{k=0}^n (1/N) Σ_{i=1}^N ∫ w_k(x^i_0:k) q_k(x^i_k | x^i_0:k−1) dx^i_k · Π_{j≠i} ∫ q_k(x^j_k | x^j_0:k−1) dx^j_k

  = Π_{k=0}^n (1/N) Σ_{i=1}^N ∫ w_k(x^1_0:k) q_k(x^1_k | x^1_0:k−1) dx^1_k

  = Π_{k=0}^n (1/N) · N ∫ γ_k(x_k | x_0:k−1) dx_k

  = Π_{k=0}^n Z_k/Z_k−1 = Z
Choosing importance proposals
- The intuition for choosing q_n is the same as before
  - can attempt to minimise the relative variance of the normalising constant Z_n
  - or equivalently minimise the variance of the importance weights
- This means again q_n should be very similar or close to π_n
  - can use other approximations available, e.g. Laplace, Saddlepoint, etc.
- Can also monitor the ESS as n progresses:

  ESS = 1 / Σ_{i=1}^N (W^i_n)²
Discussion on SIS
- The approach can be useful for low/moderate n and low dimensional x_n's
- Eventually, as n increases, the method will degenerate:
  - low weights will remain low for each particle
  - mass concentrates on few or one particle
  - the weight variance eventually explodes
- Particle filtering addresses this by using resampling to stabilise the weights
Other problems where sequential sampling can be useful

- Optimisation:
  - define a sequence of targets

    π_n(x) ∝ π̃(x)^{β_n}

    where β_n > β_n−1.
  - As β_n → ∞, π_n concentrates around the set of maximisers of π̃
- Rare Events:
  - compute the probability of a small/rare tail π(A).
  - Define a sequence of targets

    π_n(x) ∝ 1_{A_n}(x) π(x)

    where A = A_T ⊂ A_T−1 ⊂ ... ⊂ A_0. The normalising constant this time approximates π(A)
- Note this time the sequence of densities is defined on a static (non-increasing) state space, so slightly different from the presentation so far
- SMC samplers (Del Moral, Doucet and Jasra 06).
Reading List
- Liu (2001) Monte Carlo Strategies in Scientific Computing, Springer.
  - Sections 1.1, 2.1-2.3, 2.5, 2.6.1, 2.6.3
- Robert and Casella (1999) Monte Carlo Statistical Methods, Springer.
  - Sections 1, 3.1-3.3, 3.7
Homework 1
- For the following scalar model

  X_n = ρ X_n−1 + σ V_n,    Y_n = β exp(X_n / 2) W_n,

  where W_n, V_n iid ∼ N(0, 1), X_0 ∼ N(0, σ²/(1 − ρ²)), ρ = 0.8, σ = 1, β = 0.1, n = 0, 1, ..., 50. Code a SIS procedure to approximate p(x_0:50 | y_0:50) and estimate p(y_0:50). Compute estimates for the first two moments of this posterior for the vector x_0:50 = (x_0, x_1, ..., x_50).
- Perform multiple runs of this SIS algorithm with different random seeds and plot the Monte Carlo variance of the above estimates as a function of n.
- Using multivariate versions of W_n, V_n investigate the effect of dimensionality on the variance of the weights and the normalising constant.
- Relate the output figures and results with any theoretical results mentioned in the slides and lectures.