Statistical Computing, University of Notre Dame, Notre Dame, IN, USA (Fall 2018, N. Zabaras)
Introduction to Monte Carlo Methods
Prof. Nicholas Zabaras
Center for Informatics and Computational Science
https://cics.nd.edu/
University of Notre Dame
Notre Dame, Indiana, USA
Email: [email protected]
URL: https://www.zabaras.com/
October 4, 2018
Contents

Review of the Bayesian inference framework; introducing Monte Carlo simulation; review of the Law of Large Numbers and the Central Limit Theorem; indicator functions; error approximations

Examples showing convergence of the MC simulator; generalization of the MC estimator in high dimensions

Sample representation of the Monte Carlo estimator

Deterministic vs. MC integration; using MC for computing integrals, expectations, and Bayes factors

Following closely:
C. Robert and G. Casella, Monte Carlo Statistical Methods (Ch. 1, 2, 3.1 & 3.2)
J. S. Liu, Monte Carlo Strategies in Scientific Computing (Chapters 1 & 2)
J-M Marin and C. P. Robert, Bayesian Core (Chapter 2)
A. Doucet, Statistical Computing & Monte Carlo Methods (course notes, 2007)
The Bayesian Model

Let us revisit our Bayesian model:

$$\pi(\theta\,|\,x)=\frac{\pi(\theta)\,f(x\,|\,\theta)}{\int \pi(\theta)\,f(x\,|\,\theta)\,d\theta}$$

In most problems of interest there is no analytical closed form for the posterior (using conjugate priors is one exception).

Note also that the denominator in Bayes' rule above implies the calculation of the high-dimensional integral $\int \pi(\theta)\,f(x\,|\,\theta)\,d\theta$.
The Bayesian Model

Consider the posterior model

$$\pi(\theta\,|\,x)=\frac{\pi(\theta)\,f(x\,|\,\theta)}{\int \pi(\theta)\,f(x\,|\,\theta)\,d\theta}$$

Typical point estimates based on the posterior include the following:

$$\mathbb{E}[\theta\,|\,x]=\int \theta\,\pi(\theta\,|\,x)\,d\theta,\qquad \mathrm{Var}[\theta\,|\,x]=\int \left|\theta-\mathbb{E}[\theta\,|\,x]\right|^2\pi(\theta\,|\,x)\,d\theta$$

We are often interested in the mode of the posterior:

$$\theta_{MAP}=\arg\max_\theta \pi(\theta\,|\,x)$$

If $\theta=(\theta_1,\theta_2)$ and $\theta_2$ is a nuisance parameter, marginal distributions are also often needed:

$$\pi(\theta_1\,|\,x)=\int \pi(\theta_1,\theta_2\,|\,x)\,d\theta_2$$
The Bayesian Model

Consider the posterior model

$$\pi(\theta\,|\,x)=\frac{\pi(\theta)\,f(x\,|\,\theta)}{\int \pi(\theta)\,f(x\,|\,\theta)\,d\theta}$$

The predictive distribution of $Y$, $Y\sim g(y\,|\,x)$, and its mean $\mathbb{E}[Y\,|\,x]$, are:

$$g(y\,|\,x)=\int f(y\,|\,\theta)\,\pi(\theta\,|\,x)\,d\theta,\qquad
\mathbb{E}[Y\,|\,x]=\iint y\,f(y\,|\,\theta)\,\pi(\theta\,|\,x)\,d\theta\,dy$$

Similarly, for model selection, we have seen that the posterior takes the form:

$$\pi(k,\theta_k\,|\,x)=\frac{p(k)\,\pi_k(\theta_k)\,f_k(x\,|\,k,\theta_k)}{\sum_k p(k)\int \pi_k(\theta_k)\,f_k(x\,|\,k,\theta_k)\,d\theta_k}$$
Monte Carlo Simulation

All of the calculations reviewed earlier require computing high-dimensional integrals. Monte Carlo simulation provides the means for effective calculation of these integrals and for resolving many more issues.

Some historical (early) references on Monte Carlo methods:
N. Metropolis and S. Ulam, The Monte Carlo Method, J. Amer. Statist. Assoc., Vol. 44, pp. 335-341 (1949)
N. Metropolis, The Beginning of the Monte Carlo Method, Los Alamos Science Special Issue (1987)
Introducing Monte Carlo Simulation

Consider a 2 × 2 square S as shown in the figure below, and let D be the circle of radius 1 inscribed in the square S.

[Figure: the square S with the inscribed circle D]
Introducing Monte Carlo Simulation

Let us consider dropping darts uniformly on the square S. This means that the probability of the dart falling in a subdomain $A$ of S is proportional to the area of $A$.

Let $D=(x,y)$ define a random variable on S that represents the location of the drop of the dart. We have:

$$P(D\in A)=\frac{\int_A dx\,dy}{\int_S dx\,dy}$$

Assume $N$ independent drops of the dart on the square S, i.e. $D_1, D_2, \ldots, D_N$.
Introducing Monte Carlo Simulation

A common-sense estimate of the above probability would be

$$P(D\in A)=\frac{\int_A dx\,dy}{\int_S dx\,dy}\approx \frac{\text{number of times the dart falls in }A}{N}$$

where $N$ is the total number of dart drops. Can we give a statistical justification of this result?
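The dart-drop estimate is easy to simulate. The Python sketch below is illustrative only (the lecture's own implementations are in C++ and MatLab, and the function name here is ours): it drops $N$ uniform darts on the 2 × 2 square and counts the fraction that lands inside the inscribed circle, so that 4 times that fraction estimates $\pi$.

```python
import random

def estimate_pi(n, seed=0):
    """Drop n darts uniformly on [-1,1]^2 and count hits inside the unit disc.

    The hit fraction estimates P(D in disc) = pi/4, so 4 * fraction estimates pi.
    """
    rng = random.Random(seed)
    hits = sum(
        1 for _ in range(n)
        if rng.uniform(-1.0, 1.0) ** 2 + rng.uniform(-1.0, 1.0) ** 2 <= 1.0
    )
    return 4.0 * hits / n

print(estimate_pi(100_000))  # close to pi = 3.14159...
```

With $N=10^5$ darts, the standard deviation of the estimate is about $4\sqrt{0.1685/N}\approx 0.005$, so the printed value is typically within a few hundredths of $\pi$.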
Indicator Functions

Let us introduce the indicator function for the event $A$, i.e. let

$$\chi_A(x,y)=\begin{cases}1 & \text{if the drop point of the dart }d=(x,y)\in A\\ 0 & \text{otherwise}\end{cases}$$

Let us compute the probability for $D\in A$:

$$P(D\in A)=\frac{\int_A dx\,dy}{\int_S dx\,dy}=\frac{\int_S \chi_A(x,y)\,dx\,dy}{4}=\frac{1}{4}\int_S \chi_A(x,y)\,dx\,dy$$

The above result comes immediately by noticing that

$$\int_S \chi_A\,dx\,dy=\int_A \chi_A\,dx\,dy+\int_{S\setminus A}\chi_A\,dx\,dy=\int_A 1\,dx\,dy+\int_{S\setminus A}0\,dx\,dy=\int_A dx\,dy$$
Indicator Functions

The probability density associated to $D$ is 1/4; this is the density of the uniform distribution on S, denoted $\mathcal{U}_S$, i.e. $D\sim\mathcal{U}_S$.

Let us introduce the random variable $V(\theta):=\chi_A(D(\theta)):=\chi_A(X,Y)$, where $X,Y$ are the random variables representing the Cartesian coordinates of a uniformly distributed point $D=(x,y)$ on S, where a dart falls.

With this notation,

$$P(d\in A)=\frac{1}{4}\int_S \chi_A(x,y)\,dx\,dy=\mathbb{E}_{\mathcal{U}_S}(V)$$
Strong Law of Large Numbers

Let $X_i$ for $i=1,2,\ldots,N$ be independent and identically distributed (i.i.d.) random variables with mean $\mathbb{E}(X_i)=\mu$ and variance $\mathbb{V}(X_i)=\sigma^2<\infty$. The strong LLN states the following for the sample mean $\bar X_N$:

$$\lim_{N\to\infty}\bar X_N=\mu\quad\text{almost surely}$$
Weak Law of Large Numbers

Let $X_i$ for $i=1,2,\ldots,N$ be independent and identically distributed (i.i.d.) random variables with mean $\mathbb{E}(X_i)=\mu$ and variance $\mathbb{V}(X_i)=\sigma^2<\infty$. The weak LLN states that the sample mean

$$\bar X_N=\frac{1}{N}\sum_{i=1}^N X_i$$

is a random variable that converges (in probability) to the true mean as $N\to\infty$, i.e. $\bar X_N\to\mu$. More formally:

$$\lim_{N\to\infty}\Pr\left(\left|\bar X_N-\mu\right|\geq\varepsilon\right)=0\quad\forall\,\varepsilon>0$$

Note that

$$\mathbb{E}[\bar X_N]=\frac{1}{N}\sum_{i=1}^N\mathbb{E}[X_i]=\mu\qquad\text{and}\qquad
\mathrm{var}[\bar X_N]=\frac{1}{N^2}\sum_{i=1}^N\mathrm{var}[X_i]=\frac{\sigma^2}{N}$$
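The statement is easy to check numerically. In this small illustrative Python sketch (the function name is ours), the sample mean of $U(0,1)$ draws, for which $\mu=1/2$ and $\sigma^2=1/12$, approaches $\mu$ as $N$ grows:

```python
import random

def sample_mean(n, seed=0):
    """Sample mean of n i.i.d. U(0,1) draws; the LLN says it tends to mu = 0.5."""
    rng = random.Random(seed)
    return sum(rng.random() for _ in range(n)) / n

# var[X_bar_N] = sigma^2 / N = 1/(12 N), so deviations shrink like 1/sqrt(N)
for n in (10, 1_000, 100_000):
    print(n, sample_mean(n, seed=n))
```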
The Central Limit Theorem

Let $X_i$ for $i=1,2,\ldots,N$ be independent and identically distributed (i.i.d.) random variables, each with expectation $\mu$ and variance $\sigma^2$. Then, for $\bar X_N=\frac{1}{N}\sum_{i=1}^N X_i$:

$$\mathbb{E}[\bar X_N]=\frac{1}{N}\sum_{i=1}^N\mathbb{E}(X_i)=\mu,\qquad
\mathrm{Var}[\bar X_N]=\frac{1}{N^2}\sum_{i=1}^N\mathrm{Var}(X_i)=\frac{\sigma^2}{N}$$

Under some weak conditions (such as finite variance), the sample mean has a limiting normal distribution:

$$\lim_{N\to\infty}\frac{\bar X_N-\mu}{\sqrt{\sigma^2/N}}\sim\mathcal{N}(0,1),\qquad
\text{i.e. as }N\to\infty,\ \ \bar X_N\sim\mathcal{N}\!\left(\mu,\frac{\sigma^2}{N}\right)$$
Back to Indicator Functions

Let $X_i$ for $i=1,2,\ldots,N$ be an indicator function for an event $F$, i.e. let

$$X_i=\begin{cases}1 & \text{if the outcome of experiment }i\text{ is }F\\ 0 & \text{otherwise}\end{cases}$$

Then, as a result of the law of large numbers,

$$\bar X_N=\frac{1}{N}\sum_{i=1}^N X_i\ \to\ \Pr(F)\quad\text{as }N\to\infty$$

Note that herein we used

$$\mathbb{E}(X_i)=1\cdot\Pr(F)+0\cdot\Pr(\bar F)=\Pr(F)$$
Returning to Our Dart Drop Experiment

Let us introduce the random variables $V_i:=V(D_i):=\chi_A(D_i)$, $i=1,2,\ldots,N$, associated to the drops $D_i$, $i=1,2,\ldots,N$, and consider the empirical average of the i.i.d. $V_i$:

$$S_N=\frac{1}{N}\sum_{i=1}^N V_i=\frac{\text{number of drops that fell in }A}{N}$$

The law of large numbers (since $D\sim\mathcal{U}_S$) yields

$$\lim_{N\to\infty}S_N=\mathbb{E}_{\mathcal{U}_S}(V)\quad\text{almost surely}$$

where we already proved that $\Pr(D\in A)=\mathbb{E}_{\mathcal{U}_S}(V)$. As $N\to\infty$, this justifies our intuitive result introduced earlier.
Mean Square Error

$S_N$ is an unbiased estimator of $\mathbb{E}_{\mathcal{U}_S}(V)$. For $A=D$ (the inscribed disc),

$$P(d\in D)=\int_D \frac{dx\,dy}{4}=\frac{\pi}{4}$$

$S_N$ is a random variable. To characterize the precision of the estimator, we can use the mean square error:

$$\mathbb{E}\left[\left(S_N-\frac{\pi}{4}\right)^2\right]=\mathrm{Var}(S_N)$$

This means that

$$\mathrm{Var}(S_N)=\frac{1}{N^2}\sum_{i=1}^N\mathrm{Var}(V_i)=\frac{\mathrm{Var}(V)}{N}\quad(\text{recall the }V_i\text{'s are i.i.d.})$$

For our problem,

$$\mathrm{Var}(V)=\mathbb{E}(V^2)-\mathbb{E}^2(V)=P(D\in D)-P^2(D\in D)=\frac{\pi}{4}\left(1-\frac{\pi}{4}\right)\approx 0.1685$$

Thus the mean square error between $S_N$ and $\pi/4$ decreases as $1/N$.
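The $1/N$ decay of the mean square error can be observed directly. The following illustrative Python sketch (function names are ours) repeats the dart experiment over many independent runs and compares the empirical MSE of $S_N$ with $\mathrm{Var}(V)/N=\frac{\pi}{4}\left(1-\frac{\pi}{4}\right)/N$:

```python
import math, random

P = math.pi / 4.0        # true P(D in disc)
VAR_V = P * (1.0 - P)    # Var(V) = p(1 - p), about 0.1685

def mse(n, runs=200, seed=0):
    """Empirical mean square error of S_N over independent runs of n darts each."""
    rng = random.Random(seed)
    err2 = 0.0
    for _ in range(runs):
        hits = sum(
            1 for _ in range(n)
            if rng.uniform(-1.0, 1.0) ** 2 + rng.uniform(-1.0, 1.0) ** 2 <= 1.0
        )
        err2 += (hits / n - P) ** 2
    return err2 / runs

for n in (100, 400, 1600):
    print(n, mse(n), VAR_V / n)  # the two columns should track each other
```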
Properties of the Estimator

Applying the central limit theorem (since $\mathrm{Var}(V)<\infty$):

$$\sqrt{N}\left(S_N-\frac{\pi}{4}\right)\ \xrightarrow{d}\ \mathcal{N}\left(0,\mathrm{Var}(V)\right)$$

The probability of the error being larger than $2\sqrt{\mathrm{var}(V)/N}$ is given as:

$$\Pr\left(\left|S_N-\frac{\pi}{4}\right|\geq 2\sqrt{\frac{\mathrm{var}(V)}{N}}\right)\approx 2\left(1-\Phi(2)\right)\approx 0.0456$$

where $\Phi(x)$ is the CDF of $\mathcal{N}(0,1)$.

One can similarly show that for any $c>0$,

$$\Pr\left(\left|S_N-\frac{\pi}{4}\right|\geq c\sqrt{\frac{\mathrm{var}(V)}{N}}\right)\approx \mathrm{erfc}\!\left(\frac{c}{\sqrt{2}}\right)\leq C\,e^{-c^2/2}$$

where $\mathrm{erfc}(x)=1-\mathrm{erf}(x)$ and, for large $x$, $\mathrm{erfc}(x)\sim \dfrac{e^{-x^2}}{x\sqrt{\pi}}$ (see http://en.wikipedia.org/wiki/Standard_normal_table and http://mathworld.wolfram.com/Erf.html).
Coefficient of Variation

The coefficient of variation for our Monte Carlo estimator is given as:

$$\mathrm{COV}(S_N)=\frac{\sqrt{\mathrm{Var}(S_N)}}{\mathbb{E}(S_N)}=\frac{\sqrt{\frac{\pi}{4}\left(1-\frac{\pi}{4}\right)/N}}{\pi/4}=\frac{1}{\sqrt{N}}\sqrt{\frac{1-\pi/4}{\pi/4}}\approx\frac{0.5227}{\sqrt{N}}$$

For different COVs, the number of samples needed is given as:

COV = 10%: N = 28
COV = 5%: N = 110
COV = 1%: N = 2733

With all of the above error estimates, we conclude that the approximation error varies as $O(1/\sqrt{N})$.
Properties of the Estimator

An alternative non-asymptotic error estimate, using a Bernstein-type (Hoeffding) inequality, is:

$$P\left(\left|S_N-\frac{\pi}{4}\right|\geq\sqrt{\frac{\log(2/\delta)}{2N}}\right)\leq\delta\qquad\text{for }\delta\in(0,1],\ N\geq 1$$

Using this result, we can show that:

$$\Pr\left(\left|S_N-\frac{\pi}{4}\right|\geq\sqrt{\frac{\log 40}{2N}}\right)\leq 0.05$$

Alternatively, for any $\varepsilon>0$ and $N\geq 1$, the inequality above can be written as:

$$P\left(\left|S_N-\frac{\pi}{4}\right|\geq\frac{\varepsilon}{\sqrt{N}}\right)\leq 2e^{-2\varepsilon^2}$$

These expressions for the approximation error show that it is inversely proportional to $\sqrt{N}$.

(From Statistical Computing & Monte Carlo Methods, A. Doucet: http://people.cs.ubc.ca/~arnaud/stat535/slides7.pdf; see also http://en.wikipedia.org/wiki/Bernstein_inequalities_(probability_theory).)
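The non-asymptotic bound is easy to check empirically. In this illustrative Python sketch (names are ours), we estimate the frequency of the event $|S_N-\pi/4|\geq\sqrt{\log 40/(2N)}$ over many runs; by the bound above it cannot exceed 0.05, and in practice it is far smaller, since the Hoeffding bound is conservative:

```python
import math, random

def exceed_freq(n, eps, trials=2000, seed=0):
    """Fraction of runs of n darts with |S_N - pi/4| >= eps."""
    rng = random.Random(seed)
    p = math.pi / 4.0
    count = 0
    for _ in range(trials):
        hits = sum(
            1 for _ in range(n)
            if rng.uniform(-1.0, 1.0) ** 2 + rng.uniform(-1.0, 1.0) ** 2 <= 1.0
        )
        if abs(hits / n - p) >= eps:
            count += 1
    return count / trials

n = 500
eps = math.sqrt(math.log(40.0) / (2 * n))  # Hoeffding level delta = 0.05
freq = exceed_freq(n, eps)
print(freq, "<=", 0.05)
```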
Convergence of the Simulator

Convergence of $4S_N$ as a function of $N$ (one realization). See here for a C++ implementation (Meanerror.plt): https://www.dropbox.com/s/8xnqtvao8ua4a8u/Dart_experiment_simulator.rar?dl=0
Convergence of the Simulator

Convergence of $4S_N$ as a function of $N$ (one hundred realizations). See here for a C++ implementation (Meanerror.plt): https://www.dropbox.com/s/8xnqtvao8ua4a8u/Dart_experiment_simulator.rar?dl=0
Convergence of the Simulator

Square root of the empirical mean square error of $4S_N$ across 100 realizations as a function of $N$, together with $4\sqrt{\mathrm{Var}(V)/N}$ (dotted). See here for a C++ implementation (Var_error.plt): https://www.dropbox.com/s/8xnqtvao8ua4a8u/Dart_experiment_simulator.rar?dl=0
Convergence of the Estimator

Monte Carlo estimator of the area of a unit circle: (a) circles: MC mean averaged over the S = 100 runs; (b) theoretical error bars using the estimator discussed earlier; (c) average of the MC numerical error bars (from the S = 100 MC runs for each N). A MatLab implementation is given here: https://www.dropbox.com/s/6vl1rfq457k78xi/AreaCircle.rar?dl=0

$$\text{MC estimator of the area }A\text{ of the circle (run }i)=\frac{\text{area of square}}{N}\times(\text{number of samples falling in the circle})=\frac{4}{N}\sum_{j=1}^N \chi_D(D_j)\approx \pi r^2$$

$$\text{Mean area estimator at each }N\text{ (circles)}=\frac{1}{S}\sum_{i=1}^S \text{MC area estimator of the circle (run }i)$$

$$\text{Sampled area error bars (colored area)}=\frac{1}{S}\sum_{i=1}^S \text{numerical area error bars (run }i)$$

$$\text{Numerical area error bars (run }i)=4\sqrt{\frac{1}{N}\left[\frac{1}{N}\sum_{j=1}^N \chi_D(D_j)-\left(\frac{1}{N}\sum_{j=1}^N \chi_D(D_j)\right)^2\right]}$$

$$\text{Theoretical area error bars (solid lines)}=4\sqrt{\frac{1}{N}\,\frac{\pi}{4}\left(1-\frac{\pi}{4}\right)}\qquad(\text{exact std of our MC estimator},\ r=1)$$
Generalization of the Dart Drop Experiment

Consider now the case where $\theta\in\mathbb{R}^n$, $n\geq 1$. We assume a hypercube $S^n_\theta$ and the inscribed hyperball $D^n_\theta$ in $\mathbb{R}^n$.

We can build the same estimator as in 2D. Our indicator function now becomes $\chi_{D^n_\theta}(D)$.

All of our earlier derivations are still applicable. The rate of convergence of the estimator in the mean square sense is independent of $n$ and equal to $1/\sqrt{N}$.

This is not the case using a deterministic method on a grid of regularly spaced points, where the convergence rate is typically of the form $1/N^{r/n}$, where $r$ is related to the smoothness of the contours of $D^n_\theta$. Monte Carlo methods are thus attractive when $n$ is large.
Generalization of the Dart Drop Experiment

Assume you are interested in computing the volume of the hypersphere of radius $R=1$ in $n$ dimensions:

$$\mathrm{vol}(D^n)=\frac{\pi^{n/2}}{\Gamma\!\left(\frac{n}{2}+1\right)}\ \to\ 0\quad\text{as }n\to\infty$$

We want to do so using samples from the hypercube $S^n=[-1,1]^n$ of volume $2^n$.

Using $N$ samples, the variance of our MC estimator is

$$\mathrm{Var}(\bar X_N)=\frac{p_n(1-p_n)}{N}\approx\frac{p_n}{N}\ \text{as }n\to\infty,\qquad p_n=\frac{\mathrm{vol}(D^n)}{\mathrm{vol}(S^n)}$$

where $X_i\sim\mathcal{B}ernoulli(p_n)$ i.i.d.
Monte Carlo and the Curse of Dimensionality

The coefficient of variation is then:

$$\mathrm{COV}=\frac{\sqrt{\mathrm{Var}(\bar X_N)}}{\mathbb{E}(\bar X_N)}=\frac{\sqrt{p_n(1-p_n)/N}}{p_n}\approx\frac{1}{\sqrt{Np_n}}\quad\text{as }n\to\infty$$

To obtain a reasonable relative error ($\mathrm{COV}\approx 0.1$), we would need $N\approx 100\,p_n^{-1}$.

For $n=20$:

$$N=100\,p_{20}^{-1}=100\cdot\frac{2^{20}\,\Gamma(11)}{\pi^{10}}=100\cdot\frac{2^{20}\cdot 10!}{\pi^{10}}\approx 4.06\times 10^9$$

Clearly a plain Monte Carlo approach is not as helpful in 20 and higher dimensions: you are trying, in a huge volume of the hypercube, to hit the infinitesimal hypersphere!
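The numbers above can be reproduced with a few lines of Python (an illustrative sketch; the function names are ours), using the closed form $p_n=\pi^{n/2}/\left(\Gamma(n/2+1)\,2^n\right)$ and $N\approx 1/(\mathrm{COV}^2\,p_n)$:

```python
import math

def ball_fraction(n):
    """p_n = vol(unit n-ball) / vol([-1,1]^n) = pi^(n/2) / (Gamma(n/2 + 1) * 2^n)."""
    return math.pi ** (n / 2.0) / (math.gamma(n / 2.0 + 1.0) * 2.0 ** n)

def samples_needed(n, cov=0.1):
    """N such that COV ~ 1/sqrt(N p_n) <= cov, i.e. N >= 1/(cov^2 p_n)."""
    return math.ceil(1.0 / (cov ** 2 * ball_fraction(n)))

for n in (2, 5, 10, 20):
    print(n, ball_fraction(n), samples_needed(n))
# in n = 20 dimensions, about 4.06e9 samples are needed for a 10% COV
```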
Generalization of the Dart Drop Experiment

To obtain a fixed precision in the variance, a number of samples that is exponential in the dimension is needed! So a plain Monte Carlo method still suffers from the curse of dimensionality; MC is not appropriate for rare-event modeling.

Indeed, with $\mu_n:=\mathbb{E}(X_i)=p_n$ and $\mathrm{Var}(X_i)\approx p_n$ for small $p_n$, keeping the relative variance

$$\frac{\mathrm{Var}(\bar X_N)}{\mu_n^2}\approx\frac{1}{Np_n}$$

fixed requires $N\propto 1/p_n$, which grows exponentially with $n$.
Generalization

Assume i.i.d. samples $\theta^{(i)}\sim\pi$, $i=1,2,\ldots,N$.

Now consider any set $A$ and assume that we are interested in $\pi(A):=P(\theta\in A)$ for $\theta\sim\pi$.

We naturally choose the following estimator:

$$\pi(A)\approx\frac{\text{number of samples in }A}{\text{total number of samples}}$$

which by the law of large numbers is a consistent estimator of $\pi(A)$, since

$$\lim_{N\to\infty}\frac{1}{N}\sum_{i=1}^N \chi_A(\theta^{(i)})=\mathbb{E}_\pi(\chi_A(\theta))=\pi(A)$$
Generalization

Now we generalize the idea to tackle the generic problem of estimating

$$\mathbb{E}_\pi(f(\theta))=\int f(\theta)\,\pi(\theta)\,d\theta$$

where $f:\mathbb{R}^{n_\theta}\to\mathbb{R}^{n_f}$ and $\pi$ is a probability distribution on $\mathbb{R}^{n_\theta}$. We assume that $\mathbb{E}_\pi(|f(\theta)|)<\infty$ and that $\mathbb{E}_\pi(f(\theta))$ cannot be calculated analytically.
Generalization

To evaluate $\mathbb{E}_\pi(f(\theta))$, we consider the unbiased estimator

$$S_N(f)=\frac{1}{N}\sum_{i=1}^N f(\theta^{(i)})$$

From the law of large numbers, $S_N(f)$ will converge:

$$\lim_{N\to\infty}\frac{1}{N}\sum_{i=1}^N f(\theta^{(i)})=\mathbb{E}_\pi(f(\theta))\quad(\text{a.s.})$$

A good measure of the approximation is the variance of $S_N(f)$:

$$\mathrm{Var}(S_N(f))=\frac{1}{N^2}\sum_{i=1}^N\mathrm{Var}(f(\theta^{(i)}))=\frac{\mathrm{Var}(f(\theta))}{N}$$

From the central limit theorem (if $\mathrm{Var}\,f(\theta)<\infty$), we have:

$$\sqrt{N}\left(S_N(f)-\mathbb{E}_\pi f(\theta)\right)\ \xrightarrow{d}\ \mathcal{N}\left(0,\mathrm{Var}\,f(\theta)\right),\quad\text{or}\quad
S_N(f)-\mathbb{E}_\pi f(\theta)\ \xrightarrow{d}\ \mathcal{N}\!\left(0,\frac{\mathrm{Var}\,f(\theta)}{N}\right)$$
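The estimator and its CLT-based error bar fit in a few lines. An illustrative Python sketch (the function names and the test integrand, $\mathbb{E}[\theta^2]=1$ for $\theta\sim\mathcal{N}(0,1)$, are our choices):

```python
import math, random

def mc_expectation(f, sampler, n):
    """S_N(f) = (1/N) sum f(theta_i), plus a CLT-based standard error sqrt(var/N)."""
    vals = [f(sampler()) for _ in range(n)]
    mean = sum(vals) / n
    var = sum((v - mean) ** 2 for v in vals) / (n - 1)
    return mean, math.sqrt(var / n)

rng = random.Random(0)
# E[theta^2] = 1 for theta ~ N(0, 1)
est, se = mc_expectation(lambda t: t * t, lambda: rng.gauss(0.0, 1.0), 50_000)
print(est, "+/-", 1.96 * se)  # approximate 95% confidence interval
```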
Generalization

The rate of convergence is independent of the dimension of $\theta$. Integration in complex domains is now not a problem. The method is easy to implement and rather general. You will need:
to be able to evaluate $f(\theta^{(i)})$, and
to be able to produce samples distributed according to $\pi$.
Sample Representation of the MC Estimator

Let us introduce the Dirac delta function $\delta_{\theta_0}(\theta)$ for $\theta_0\in\mathbb{R}^{n_\theta}$, defined for any $f:\mathbb{R}^{n_\theta}\to\mathbb{R}^{n_f}$ as follows:

$$\int f(\theta)\,\delta_{\theta_0}(\theta)\,d\theta=f(\theta_0)$$

Note that this implies in particular that for $A\subset\mathbb{R}^{n_\theta}$:

$$\int_A \delta_{\theta_0}(\theta)\,d\theta=\chi_A(\theta_0)$$

For $\theta^{(i)}\sim\pi$, $i=1,\ldots,N$, we can introduce the following mixture of Dirac deltas:

$$\hat\pi_N(\theta):=\frac{1}{N}\sum_{i=1}^N \delta_{\theta^{(i)}}(\theta)$$

which is the empirical measure, and consider for any $A$:

$$\hat\pi_N(A):=\int_A \hat\pi_N(\theta)\,d\theta=\sum_{i=1}^N\int_A \frac{1}{N}\delta_{\theta^{(i)}}(\theta)\,d\theta=\frac{1}{N}\sum_{i=1}^N \chi_A(\theta^{(i)})=S_N(A)\qquad\left(\frac{\#\text{ of samples in }A}{N}\right)$$
Sample Representation of $\pi(\theta)$

The concentration of the points $\theta^{(i)}$ in a given region of the space represents $\pi(\theta)$. This is in contrast with parametric statistics, where one starts with samples and then introduces a distribution with an algebraic representation of the underlying population.

Note that each sample has a weight of $1/N$, but it is also possible to consider weighted sample representations of $\pi(\theta)$.
Sample Representation of $\pi(\theta)$

The sample representation of a Gaussian distribution is shown below. See here for a MatLab implementation: https://www.dropbox.com/s/tak5ny5evo3jqeh/GaussianSampleInterpretation.m?dl=0
Deterministic vs. Monte Carlo Integration
Sample Representation of the MC Estimator

Now consider the problem of estimating $\mathbb{E}_\pi(f(\theta))$. We simply replace $\pi(\theta)$ with its sample representation $\hat\pi_N(\theta)$ and obtain:

$$\mathbb{E}_\pi(f)\approx\int f(\theta)\,\hat\pi_N(\theta)\,d\theta=\sum_{i=1}^N\int f(\theta)\,\frac{1}{N}\delta_{\theta^{(i)}}(\theta)\,d\theta=\frac{1}{N}\sum_{i=1}^N f(\theta^{(i)})$$

This is precisely $S_N(f)$, the Monte Carlo estimator suggested earlier.

Clearly, based on $\hat\pi_N(\theta)$, we can easily estimate $\mathbb{E}_\pi(f)$ for any $f$. For example, the variance of $\theta$ is approximated as:

$$\mathrm{Var}_\pi(\theta)=\mathbb{E}_\pi(\theta^2)-\mathbb{E}_\pi^2(\theta)\approx\frac{1}{N}\sum_{i=1}^N\left(\theta^{(i)}\right)^2-\left(\frac{1}{N}\sum_{i=1}^N\theta^{(i)}\right)^2$$
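Everything on this slide amounts to replacing integrals against $\pi$ by averages over the samples. An illustrative Python sketch (the target $\mathcal{N}(0,1)$ and the set $A=(1,\infty)$ are our choices):

```python
import random

rng = random.Random(0)
samples = [rng.gauss(0.0, 1.0) for _ in range(100_000)]  # theta_i ~ N(0, 1)
N = len(samples)

mean = sum(samples) / N                             # E_pi(theta), true value 0
var = sum(t * t for t in samples) / N - mean ** 2   # Var_pi(theta), true value 1
p_A = sum(1 for t in samples if t > 1.0) / N        # pi(A) for A = (1, inf)

print(mean, var, p_A)  # near 0, 1 and P(Z > 1) ~ 0.1587
```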
From the Algebraic to the Sample Representation

Similarly, if $\theta=(\theta_1,\theta_2)$, we have:

$$\hat\pi_N(\theta_1,\theta_2)=\frac{1}{N}\sum_{i=1}^N\delta_{(\theta_1^{(i)},\theta_2^{(i)})}(\theta_1,\theta_2)$$

The marginal distribution is given as:

$$\hat\pi_N(\theta_1)=\frac{1}{N}\sum_{i=1}^N\delta_{\theta_1^{(i)}}(\theta_1)$$

If we want to estimate $\theta_{MAP}=\arg\max_\theta\pi(\theta)$ and $\pi(\theta)$ is only known up to a normalizing constant, then a reasonable estimate is the following:

$$\hat\theta_{MAP}=\arg\max_{\theta^{(i)}}\pi(\theta^{(i)})$$
Sampling from an Arbitrary Distribution

We can now see that if we can sample easily from an arbitrary distribution, then we can easily compute any quantities of interest. But how do we sample from an arbitrary distribution? We will discuss MCMC and other methods in follow-up lectures.
Monte Carlo Integration

We can use the same estimator for performing integration. Consider computing the following:

$$I=\int_D f(x)\,dx$$

We can write this integration as an expectation, where the mean is with respect to the uniform distribution on $D$:

$$I=\int_D f(x)\,dx=|D|\,\mathbb{E}_\pi[f(x)],\qquad \pi(x)=\begin{cases}1/|D| & \text{if }x\in D\\ 0 & \text{otherwise}\end{cases}$$

To compute the integral $I$, we thus need to draw samples $x_i$ from this distribution. Our estimator will then be as follows:

$$\hat I=\frac{|D|}{n}\sum_{i=1}^n f(x_i)$$

The estimator converges as $O(1/\sqrt{N})$.
N
Statistical Computing, University of Notre Dame, Notre Dame, IN, USA (Fall 2018, N. Zabaras)
Monte Carlo Integration
41
We can extend this to high (d) dimensions:
The estimator converges as regardless of the dimensionality ๐.
For a Riemann integration of this, the rate of convergence is better
where a grid of equally spaced points is used (e.g. for ๐ท = [0,1], ฮ๐ฅ =1/๐).
However, in let us say ten dimensions, ๐ท = [0,1]10, you will need O (๐10) grid points to achieve the same rate of convergence as in 1D!
Generally, the rate of convergence of the Riemann approximation of
the above integral is where ๐ is the total number of grid points and r depends on the smoothness of the domain ๐ท.
( )D
I f d x x
1( )
N
1( )N
1( )N
J. S. Liu, MC Strategies in Scientific Computing (Chapter 2, Introduction)
/(1 )r dN
http://books.google.com/books?id=R8E-yHaKCGUC&printsec=frontcover&dq=monte+carlo+strategies+in+scientific+computing&hl=en&ei=afBiTOWwNoT7lwfi1-nMCw&sa=X&oi=book_result&ct=result&resnum=1&ved=0CDUQ6AEwAA
Monte Carlo for Integration

[Figure: Riemann integration on a regular grid vs. MC integration with samples drawn from $\pi(x)$]

In computing $I=\int_D f(x)\,dx$ with $D=[0,1]$ by Riemann integration, one e.g. constructs a grid with $\Delta x=1/N$, computes $f(x_i)=f(i\Delta x)$, and, with approximation error $O(1/N)$, evaluates:

$$I_N=\sum_{i=1}^N f(x_i)\,\Delta x=\frac{1}{N}\sum_{i=1}^N f(x_i)$$

In $d$ dimensions, we need $N^d$ evaluations to keep an error $O(1/N)$.

Clearly the MC estimator is slower in convergence, but it converges independently of the dimension $d$. It requires no grid of points. But you need to be able to sample from $\pi(x)$.
Monte Carlo Integration

Consider calculating, for any function $f(x)$, the following integral:

$$E(f)=\int_0^1 f(x)\,dx$$

We can write this integral as:

$$E(f)=\int_{-\infty}^{+\infty} f(x)\,U_{[0,1]}(x)\,dx=\mathbb{E}[f(X)],\qquad X\sim U[0,1]$$

The Monte Carlo estimator of the integral is now:

$$\hat E_N(f)=\frac{1}{N}\sum_{i=1}^N f(X_i),\qquad X_i\overset{i.i.d.}{\sim}U[0,1]$$
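The estimator is a one-liner. This illustrative Python sketch (a MatLab version is linked on the next slide; the function name here is ours) uses the three test integrands from the next slide, with exact values $1/2$, $1/3$ and $0$:

```python
import math, random

def mc_integral(f, n, seed=0):
    """Estimate int_0^1 f(x) dx as (1/N) sum f(X_i) with X_i ~ U[0,1]."""
    rng = random.Random(seed)
    return sum(f(rng.random()) for _ in range(n)) / n

n = 200_000
print(mc_integral(lambda x: x, n))                      # ~ 0.5
print(mc_integral(lambda x: x * x, n))                  # ~ 1/3
print(mc_integral(lambda x: math.cos(math.pi * x), n))  # ~ 0
```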
Monte Carlo Integration

[Figure: empirical average vs. theoretical mean as a function of $N$ for $f(x)=x$ ($E(f)=0.5$), $f(x)=x^2$ ($E(f)=0.333$), and $f(x)=\cos(\pi x)$ ($E(f)=0$)]

MatLab implementation: https://www.dropbox.com/s/kp98e697plbkkia/MonteCarloEx1.m?dl=0
Monte Carlo Integration: Variance

From the earlier discussed properties of the MC estimator:

$$\mathbb{E}[\hat E_N]=E(f),\qquad \mathrm{Var}(\hat E_N)=\frac{1}{N}\mathrm{Var}(f(X))=\frac{\sigma^2(f)}{N},\qquad X\sim\mathcal{U}[0,1]$$

where:

$$\sigma^2(f)=\mathrm{Var}(f(X))=\int_{[0,1]}\left(f(x)-E(f)\right)^2 dx$$

When the variance is unknown, we can use its MC estimator instead:

$$\hat\sigma_N^2(f)=\frac{1}{N-1}\sum_{i=1}^N\left(f(X_i)-\hat E_N(f)\right)^2,\qquad \hat E_N(f)=\frac{1}{N}\sum_{i=1}^N f(X_i),\qquad X_i\overset{i.i.d.}{\sim}\mathcal{U}[0,1]$$
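The sample-variance estimator $\hat\sigma_N^2(f)$ can be checked against the exact variances $1/12$, $4/45$ and $1/2$ used on the next slide. An illustrative Python sketch (names are ours):

```python
import math, random

def mc_var(f, n, seed=0):
    """sigma_hat_N^2(f) = (1/(N-1)) sum (f(X_i) - E_hat_N(f))^2, X_i ~ U[0,1]."""
    rng = random.Random(seed)
    vals = [f(rng.random()) for _ in range(n)]
    mean = sum(vals) / n
    return sum((v - mean) ** 2 for v in vals) / (n - 1)

for f, exact in (
    (lambda x: x, 1.0 / 12.0),
    (lambda x: x * x, 4.0 / 45.0),
    (lambda x: math.cos(math.pi * x), 0.5),
):
    print(mc_var(f, 100_000), exact)
```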
Monte Carlo Integration: Empirical Variance

[Figure: empirical vs. theoretical variance as a function of $N$ for $f(x)=x$ ($\mathrm{Var}(f(X))=1/12$), $f(x)=x^2$ ($\mathrm{Var}(f(X))=4/45$), and $f(x)=\cos(\pi x)$ ($\mathrm{Var}(f(X))=1/2$)]

$$\hat\sigma_N^2(f)=\frac{1}{N-1}\sum_{i=1}^N\left(f(X_i)-\hat E_N(f)\right)^2,\qquad X_i\overset{i.i.d.}{\sim}\mathcal{U}[0,1]$$

MatLab implementation: https://www.dropbox.com/s/kp98e697plbkkia/MonteCarloEx1.m?dl=0
Optimal Number of MC Samples

We can compute the optimal number of MC samples for a desired accuracy of the estimator:

$$\Pr\left(\left|\hat E_N(f)-E(f)\right|\leq\varepsilon\right)=1-\alpha\ \Longleftrightarrow\
\Pr\left(\frac{\left|\hat E_N(f)-E(f)\right|}{\sigma(f)/\sqrt{N}}\leq\frac{\varepsilon\sqrt{N}}{\sigma(f)}\right)=1-\alpha$$

From the asymptotic property $\sqrt{N}\left(\hat E_N-E(f)\right)\xrightarrow{d}\mathcal{N}(0,\sigma^2(f))$ we can estimate:

$$N=\frac{z_{1-\alpha/2}^2\,\sigma^2(f)}{\varepsilon^2}$$

where:

$$z_{1-\alpha/2}=\Phi^{-1}\!\left(1-\frac{\alpha}{2}\right),\qquad \Phi\ \text{the CDF of }\mathcal{N}(0,1),\qquad \alpha\in[0,1]$$

Approximating the variance $\sigma^2(f)$ by $\hat\sigma_N^2(f)$, the needed number of samples $N$ should satisfy:

$$\hat\sigma_N^2(f)\leq\frac{N\varepsilon^2}{z_{1-\alpha/2}^2}$$
Optimal Number of MC Samples

The condition $\hat\sigma_N^2(f)\leq N\varepsilon^2/z_{1-\alpha/2}^2$ can be satisfied iteratively:

1. Start with $n_1$ MC samples $X_1,\ldots,X_{n_1}\sim U[0,1]$.
2. If $\hat\sigma_{n_1}^2(f)\leq n_1\varepsilon^2/z_{1-\alpha/2}^2$, then stop; otherwise
3. Evaluate $k_1=\left\lfloor z_{1-\alpha/2}^2\,\hat\sigma_{n_1}^2(f)/\varepsilon^2\right\rfloor-n_1$ and generate $k_1$ additional samples $X_{n_1+1},\ldots,X_{n_1+k_1}\sim U[0,1]$, where $\lfloor x\rfloor$ indicates the integer part of $x$.
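The three steps above can be sketched as follows (illustrative Python; $z_{1-\alpha/2}=1.96$ for $\alpha=0.05$ is hard-coded, and the function names are ours):

```python
import random

Z = 1.96  # z_{1-alpha/2} for alpha = 0.05

def mc_until_precision(f, sampler, eps, n1=1000):
    """Grow the sample until sigma_hat_N^2(f) <= N eps^2 / Z^2."""
    vals = [f(sampler()) for _ in range(n1)]
    while True:
        n = len(vals)
        mean = sum(vals) / n
        var = sum((v - mean) ** 2 for v in vals) / (n - 1)
        if var <= n * eps ** 2 / Z ** 2:       # desired precision reached
            return mean, n
        k = int(Z ** 2 * var / eps ** 2) - n   # additional samples needed
        vals.extend(f(sampler()) for _ in range(max(k, 1)))

rng = random.Random(0)
est, n_used = mc_until_precision(lambda x: x * x, rng.random, eps=0.005)
print(est, n_used)  # est near 1/3; n_used of order Z^2 (4/45) / eps^2 ~ 1.4e4
```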
Computing Expectations

Expectations with respect to any distribution also involve high-dimensional integration:

$$I=\int_D f(\mathbf{x})\,\pi(\mathbf{x})\,d\mathbf{x}$$

If we can draw samples $x_i$ from $\pi(x)$, then we can use the estimator

$$\hat I=\frac{1}{N}\sum_{i=1}^N f(x_i),\qquad x_i\sim\pi(x),\qquad
\mathrm{COV}(\hat I)=\frac{\sqrt{\mathrm{Var}_\pi(f(x))/N}}{\mathbb{E}_\pi(f(x))}$$
Bayes Factor Approximation

For the CMBdata (https://www.dropbox.com/s/7w9peyrlpwcdv4m/CMBdata?dl=0), consider two subsamples, $x_1,\ldots,x_n$ and $y_1,\ldots,y_n$. We consider that both come from normal distributions, $x_1,\ldots,x_n\sim\mathcal{N}(\mu_x,\sigma^2)$ and $y_1,\ldots,y_n\sim\mathcal{N}(\mu_y,\sigma^2)$.

We want to decide if both means are the same, i.e. test $H_0:\mu_x=\mu_y$.

We assume that the variance (error) $\sigma^2$ is the same for both models and use a prior $\pi(\sigma^2)\propto\sigma^{-2}$.

The Bayes factor can then be computed as follows:

$$B_{10}^\pi=\frac{\displaystyle\iiint L_1(\mu_x,\mu_y,\sigma^2\,|\,\mathcal{D})\,\pi(\mu_x,\mu_y,\sigma^2)\,d\mu_x\,d\mu_y\,d\sigma^2}{\displaystyle\iint L_0(\mu,\sigma^2\,|\,\mathcal{D})\,\pi(\mu,\sigma^2)\,d\mu\,d\sigma^2}$$

Note that the Bayes factor does not depend on the normalizing constant of $\pi(\sigma^2)$, and thus the use of the improper prior $\pi(\sigma^2)=\sigma^{-2}$ is fine.

J-M Marin and C. P. Robert, Bayesian Core, Chapter 2
Bayes Factor Approximation

We introduce a new parametrization by writing:

$$\mu_x=\mu+\xi,\qquad \mu_y=\mu-\xi$$

with the prior $\xi\sim\mathcal{N}(0,1)$. This allows the same prior $\pi(\mu,\sigma^2)$ to be used in both $H_0$ and the alternative $H_1:\mu_x\neq\mu_y$. We choose the improper prior $\pi(\mu,\sigma^2)=\sigma^{-2}$.

The Bayes factor can now be re-written as:

$$B_{10}^\pi=\frac{\displaystyle\iiint (2\pi\sigma^2)^{-n}\exp\!\left(-\frac{n\left[(\bar x-\mu-\xi)^2+s_x^2\right]}{2\sigma^2}\right)\exp\!\left(-\frac{n\left[(\bar y-\mu+\xi)^2+s_y^2\right]}{2\sigma^2}\right)\frac{e^{-\xi^2/2}}{\sqrt{2\pi}}\,d\xi\,d\mu\,\frac{d\sigma^2}{\sigma^2}}{\displaystyle\iint (2\pi\sigma^2)^{-n}\exp\!\left(-\frac{n\left[(\bar x-\mu)^2+s_x^2\right]}{2\sigma^2}\right)\exp\!\left(-\frac{n\left[(\bar y-\mu)^2+s_y^2\right]}{2\sigma^2}\right)d\mu\,\frac{d\sigma^2}{\sigma^2}}$$

where $s_x^2=\frac{1}{n}\sum_i(x_i-\bar x)^2$ and $s_y^2=\frac{1}{n}\sum_i(y_i-\bar y)^2$.
Bayes Factor Approximation

To simplify the Bayes factor, let us define:

$$S^2=s_x^2+s_y^2,\qquad s_x^2=\frac{1}{n}\sum_{i=1}^n(x_i-\bar x)^2,\qquad s_y^2=\frac{1}{n}\sum_{i=1}^n(y_i-\bar y)^2$$

We can then write:

$$B_{10}^\pi=\frac{\displaystyle\iiint \sigma^{-2n-2}\exp\!\left(-\frac{n\left[(\bar x-\mu-\xi)^2+(\bar y-\mu+\xi)^2+S^2\right]}{2\sigma^2}\right)\frac{e^{-\xi^2/2}}{\sqrt{2\pi}}\,d\xi\,d\mu\,d\sigma^2}{\displaystyle\iint \sigma^{-2n-2}\exp\!\left(-\frac{n\left[(\bar x-\mu)^2+(\bar y-\mu)^2+S^2\right]}{2\sigma^2}\right)d\mu\,d\sigma^2}$$

Performing the integrals in $\sigma^2$ (using $\int_0^\infty \sigma^{-2n-2}\,e^{-c/(2\sigma^2)}\,d\sigma^2\propto c^{-n}$):

$$B_{10}^\pi=\frac{\displaystyle\iint \left[(\bar x-\mu-\xi)^2+(\bar y-\mu+\xi)^2+S^2\right]^{-n}\frac{e^{-\xi^2/2}}{\sqrt{2\pi}}\,d\xi\,d\mu}{\displaystyle\int \left[(\bar x-\mu)^2+(\bar y-\mu)^2+S^2\right]^{-n}d\mu}$$

Let us now integrate in $\mu$, starting with the denominator. Note that:

$$(\bar x-\mu)^2+(\bar y-\mu)^2=2\left(\mu-\frac{\bar x+\bar y}{2}\right)^2+\frac{(\bar x-\bar y)^2}{2}$$
Bayes Factor Approximation

So the denominator takes the form:

$$\int\left[2\left(\mu-\frac{\bar x+\bar y}{2}\right)^2+\frac{(\bar x-\bar y)^2}{2}+S^2\right]^{-n}d\mu$$

With the definition $\nu=2n-1$ and the normalizing constant of the t-distribution (http://en.wikipedia.org/wiki/Student's_t-distribution), the integral over $\mu$ simplifies, since

$$\int\left[2(\mu-m)^2+a^2\right]^{-n}d\mu\propto\left(a^2\right)^{-(2n-1)/2}$$

with a proportionality factor common to the numerator and denominator of $B_{10}^\pi$. Applying the same identity in the numerator (with $\bar x-\xi$ and $\bar y+\xi$ in place of $\bar x$ and $\bar y$) gives:

$$B_{10}^\pi=\frac{\displaystyle\int\left[\frac{(\bar x-\bar y-2\xi)^2}{2}+S^2\right]^{-(2n-1)/2}\frac{e^{-\xi^2/2}}{\sqrt{2\pi}}\,d\xi}{\left[\dfrac{(\bar x-\bar y)^2}{2}+S^2\right]^{-(2n-1)/2}}$$
Bayes Factor Approximation

Thus, for the normal case $x_1,\ldots,x_n\sim\mathcal{N}(\mu+\xi,\sigma^2)$ and $y_1,\ldots,y_n\sim\mathcal{N}(\mu-\xi,\sigma^2)$, with $H_0:\xi=0$, under the prior $\pi(\mu,\sigma^2)=\sigma^{-2}$ and $\xi\sim\mathcal{N}(0,1)$:

$$B_{01}^\pi=\frac{\left[(\bar x-\bar y)^2+2S^2\right]^{-n+1/2}}{\displaystyle\int\left[(\bar x-\bar y-2\xi)^2+2S^2\right]^{-n+1/2}\frac{e^{-\xi^2/2}}{\sqrt{2\pi}}\,d\xi}$$

For the CMBdata, we simulate $\xi_1,\ldots,\xi_{1000}\sim\mathcal{N}(0,1)$ and approximate $B_{01}^\pi$ with (for one simulation that we ran):

$$\hat B_{01}^\pi=\frac{\left[(\bar x-\bar y)^2+2S^2\right]^{-n+1/2}}{\dfrac{1}{1000}\displaystyle\sum_{i=1}^{1000}\left[(\bar x-\bar y-2\xi_i)^2+2S^2\right]^{-n+1/2}}=43.3309$$

when $\bar x=0.0888$, $\bar y=0.1078$, $S^2=0.00875$, $n=100$. Thus $H_0$ is much more likely with the data available.
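The Monte Carlo approximation of $B_{01}^\pi$ is a short computation. The Python sketch below is illustrative only: it implements the formula as reconstructed here, and since the $\xi_i$ are random its value varies from simulation to simulation, so it will not reproduce 43.3309 exactly.

```python
import random

def bayes_factor_01(xbar, ybar, S2, n, n_sim=1000, seed=0):
    """MC approximation of B01 with xi_i ~ N(0,1); B01 = 1 / (mean of the ratios)."""
    rng = random.Random(seed)
    expo = -n + 0.5
    d2 = (xbar - ybar) ** 2 + 2.0 * S2
    ratios = [
        (((xbar - ybar - 2.0 * rng.gauss(0.0, 1.0)) ** 2 + 2.0 * S2) / d2) ** expo
        for _ in range(n_sim)
    ]
    return 1.0 / (sum(ratios) / n_sim)

b01 = bayes_factor_01(0.0888, 0.1078, 0.00875, 100)
print(b01)  # well above 1, so H0 is favored
```

Dividing each term by the numerator before averaging, as done here, keeps the intermediate values in a numerically safe range for the $(\cdot)^{-n+1/2}$ powers.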
Precision Evaluation in the Bayes Factor

For an estimator $\bar I_N=\frac{1}{N}\sum_{j=1}^N h(x_j)$ of $I=\int h(\mathbf{x})\,\pi(\mathbf{x})\,d\mathbf{x}$, with $x_j\sim\pi(x)$, the variance can be computed (in terms of the empirical mean) as

$$v_N=\frac{1}{N}\cdot\frac{1}{N-1}\sum_{j=1}^N\left(h(x_j)-\bar I_N\right)^2$$

and, for $N$ large,

$$\frac{\bar I_N-\mathbb{E}_\pi h(x)}{\sqrt{v_N}}\sim\mathcal{N}(0,1)$$

We can thus find the variability in the Bayes factor estimation, and construct a convergence test and confidence bounds on the approximation of $B_{01}^\pi$.

[Figure: histogram of 1000 realizations of the approximation of $B_{01}^\pi$, each based on 1000 simulations]

MatLab implementation: https://www.dropbox.com/s/noh9vipbkl9pfrk/PrecisionEvaluation.m?dl=0
Example (Cauchy-Normal)

For estimating a normal mean, a robust prior is a Cauchy prior:

$$x\sim\mathcal{N}(\theta,1),\qquad \theta\sim\mathcal{C}(0,1)$$

Under squared error loss, the posterior mean is given as:

$$\delta^\pi(x)=\frac{\displaystyle\int_{-\infty}^{\infty}\frac{\theta}{1+\theta^2}\,e^{-(x-\theta)^2/2}\,d\theta}{\displaystyle\int_{-\infty}^{\infty}\frac{1}{1+\theta^2}\,e^{-(x-\theta)^2/2}\,d\theta}$$

The form of $\delta^\pi$ suggests simulating i.i.d. variables $\theta_1,\ldots,\theta_N\sim\mathcal{N}(x,1)$ and calculating

$$\hat\delta_N^\pi(x)=\frac{\displaystyle\sum_{i=1}^N \frac{\theta_i}{1+\theta_i^2}}{\displaystyle\sum_{i=1}^N \frac{1}{1+\theta_i^2}}$$

The LLN implies $\hat\delta_N^\pi(x)\to\delta^\pi(x)$ as $N\to\infty$.

MatLab implementation: https://www.dropbox.com/s/xxfqlloltlqchek/CauchyPrior.m?dl=0
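The estimator $\hat\delta_N^\pi$ is straightforward to implement. An illustrative Python sketch (a MatLab version is linked above; the function name here is ours):

```python
import random

def posterior_mean(x, n=100_000, seed=0):
    """MC estimate of the posterior mean for x ~ N(theta, 1), theta ~ Cauchy(0,1).

    Samples theta_i ~ N(x, 1) and weights them by the Cauchy prior 1/(1+theta^2).
    """
    rng = random.Random(seed)
    num = den = 0.0
    for _ in range(n):
        t = rng.gauss(x, 1.0)
        w = 1.0 / (1.0 + t * t)  # Cauchy prior density up to a constant
        num += t * w
        den += w
    return num / den

print(posterior_mean(10.0))  # slightly shrunk from x = 10 toward the prior median 0
```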