Bayesian Methods with Monte Carlo Markov Chains III
Henry Horng-Shing Lu
Institute of Statistics
National Chiao Tung University
[email protected]
http://tigpbp.iis.sinica.edu.tw/courses.htm
Part 8 More Examples of Gibbs Sampling
An Example with Three Random Variables (1)
To sample (X, Y, N) distributed as follows:
(X, Y, N) \sim f(x, y, n) = c \binom{n}{x} y^{x+\alpha-1} (1-y)^{n-x+\beta-1} \frac{e^{-\lambda} \lambda^{n}}{n!},
where x = 0, 1, 2, \ldots, n; 0 \le y \le 1; n = 0, 1, 2, \ldots;
\alpha, \beta, \lambda are known, and c is a normalizing constant.
An Example with Three Random Variables (2)
One can see that
f(x \mid y, n) \propto \binom{n}{x} y^{x} (1-y)^{n-x} \sim \text{Binomial}(n, y),
f(y \mid x, n) \propto y^{x+\alpha-1} (1-y)^{n-x+\beta-1} \sim \text{Beta}(x+\alpha,\ n-x+\beta),
f(n \mid x, y) \propto \frac{e^{-(1-y)\lambda} \left((1-y)\lambda\right)^{n-x}}{(n-x)!}, \quad \text{i.e., } N - x \sim \text{Poisson}((1-y)\lambda).
An Example with Three Random Variables (3)
Gibbs sampling algorithm:
1. Initial setting: t = 0,
   y_0 \sim \text{Unif}[0, 1], or an arbitrary value in [0, 1];
   n_0 \sim \text{Discrete Unif} over [1, \infty) (a finite range in practice), or an arbitrary integer value in [1, \infty);
   x_0 \sim \text{Bin}(n_0, y_0).
2. Sample a value (y_{t+1}, n_{t+1}, x_{t+1}) from the full conditionals:
   y_{t+1} \sim \text{Beta}(x_t + \alpha,\ n_t - x_t + \beta),
   n_{t+1} - x_t \sim \text{Poisson}((1 - y_{t+1}) \lambda),
   x_{t+1} \sim \text{Bin}(n_{t+1}, y_{t+1}).
3. t = t + 1; repeat Step 2 until convergence.
An Example with Three Random Variables by R
10000 samples with α=2, β=7 and λ=16
An Example with Three Random Variables by C (1)
10000 samples with α=2, β=7 and λ=16
An Example with Three Random Variables by C (2)
An Example with Three Random Variables by C (3)
Example 1 in Genetics (1)
Two linked loci with alleles A and a, and B and b. A, B: dominant; a, b: recessive.
A double heterozygote AaBb will produce gametes of four types: AB, Ab, aB, ab.
F (Female): 1−r′, r′ (female recombination fraction)
M (Male): 1−r, r (male recombination fraction)
[Figure: gamete formation in the double heterozygote AaBb]
Example 1 in Genetics (2)
r and r′ are the recombination rates for male and female.
Suppose the parental origin of these heterozygotes is from the mating of AABB × aabb. The problem is to estimate r and r′ from the offspring of selfed heterozygotes.
Fisher, R. A. and Balmukand, B. (1928). The estimation of linkage from the offspring of selfed heterozygotes. Journal of Genetics, 20, 79–92.
http://en.wikipedia.org/wiki/Genetics
http://www2.isye.gatech.edu/~brani/isyebayes/bank/handout12.pdf
Example 1 in Genetics (3)

|                     | MALE AB (1−r)/2    | MALE ab (1−r)/2    | MALE aB r/2     | MALE Ab r/2     |
| FEMALE AB (1−r′)/2  | AABB (1−r)(1−r′)/4 | aABb (1−r)(1−r′)/4 | aABB r(1−r′)/4  | AABb r(1−r′)/4  |
| FEMALE ab (1−r′)/2  | AaBb (1−r)(1−r′)/4 | aabb (1−r)(1−r′)/4 | aaBb r(1−r′)/4  | Aabb r(1−r′)/4  |
| FEMALE aB r′/2      | AaBB (1−r)r′/4     | aabB (1−r)r′/4     | aaBB rr′/4      | AabB rr′/4      |
| FEMALE Ab r′/2      | AABb (1−r)r′/4     | aAbb (1−r)r′/4     | aABb rr′/4      | AAbb rr′/4      |
Example 1 in Genetics (4)
Four distinct phenotypes: A*B*, A*b*, a*B* and a*b*.
A*: the dominant phenotype from (Aa, AA, aA).
a*: the recessive phenotype from aa.
B*: the dominant phenotype from (Bb, BB, bB).
b*: the recessive phenotype from bb.
A*B*: 9 gametic combinations.
A*b*: 3 gametic combinations.
a*B*: 3 gametic combinations.
a*b*: 1 gametic combination.
Total: 16 combinations.
Example 1 in Genetics (5)
Let \theta = (1-r)(1-r'). Then
P(A^*B^*) = \frac{2+\theta}{4},
P(A^*b^*) = P(a^*B^*) = \frac{1-\theta}{4},
P(a^*b^*) = \frac{\theta}{4}.
Example 1 in Genetics (6)
Hence, a random sample of size n from the offspring of selfed heterozygotes will follow a multinomial distribution:
\text{Multinomial}\left(n;\ \frac{2+\theta}{4}, \frac{1-\theta}{4}, \frac{1-\theta}{4}, \frac{\theta}{4}\right).
We know that \theta = (1-r)(1-r'), 0 \le r \le 1/2, and 0 \le r' \le 1/2.
So 1/4 \le \theta \le 1.
Example 1 in Genetics (7)
Suppose that we observe the data y = (y_1, y_2, y_3, y_4) = (125, 18, 20, 24), which is a random sample from
\text{Multinomial}\left(n;\ \frac{2+\theta}{4}, \frac{1-\theta}{4}, \frac{1-\theta}{4}, \frac{\theta}{4}\right).
Then the probability mass function is
f(y \mid \theta) = \frac{n!}{y_1!\, y_2!\, y_3!\, y_4!} \left(\frac{2+\theta}{4}\right)^{y_1} \left(\frac{1-\theta}{4}\right)^{y_2+y_3} \left(\frac{\theta}{4}\right)^{y_4}.
Example 1 in Genetics (8)
How to estimate \theta?
MME (shown last week): http://en.wikipedia.org/wiki/Method_of_moments_%28statistics%29
MLE (shown last week): http://en.wikipedia.org/wiki/Maximum_likelihood
Bayesian method: http://en.wikipedia.org/wiki/Bayesian_method
Example 1 in Genetics (9)
As the value of \theta is between 1/4 and 1, we can assume that the prior distribution of \theta is Uniform(1/4, 1).
The posterior distribution is
f(\theta \mid y) = \frac{f(y \mid \theta)\, f(\theta)}{\int f(y \mid \theta)\, f(\theta)\, d\theta}.
The integration in the denominator, \int f(y \mid \theta)\, f(\theta)\, d\theta, does not have a closed form.
Example 1 in Genetics (10)
We will consider the mean of the posterior distribution (the posterior mean),
E(\theta \mid y) = \int \theta\, f(\theta \mid y)\, d\theta.
The Monte Carlo Markov Chain method is a good way to estimate E(\theta \mid y) even if \int f(y \mid \theta)\, f(\theta)\, d\theta and the posterior mean do not have closed forms.
Example 1 by R
Direct numerical integration when \theta \sim U(1/4, 1).
We can assume other prior distributions to compare the resulting posterior means: Beta(1,1), Beta(2,2), Beta(2,3), Beta(3,2), Beta(0.5,0.5), Beta(10^{-5},10^{-5}).
Example 1 by C/C++
Replace with other prior distributions, such as Beta(1,1), …, Beta(10^{-5},10^{-5}).
Beta Prior
Comparison for Example 1 (1)

| Method              | Estimate | Method                       | Estimate    |
| MME                 | 0.683616 | Bayesian Beta(2,3)           | 0.564731    |
| MLE                 | 0.663165 | Bayesian Beta(3,2)           | 0.577575    |
| Bayesian U(1/4,1)   | 0.573931 | Bayesian Beta(1/2,1/2)       | 0.574928    |
| Bayesian Beta(1,1)  | 0.573918 | Bayesian Beta(10^-5,10^-5)   | 0.588925    |
| Bayesian Beta(2,2)  | 0.572103 | Bayesian Beta(10^-7,10^-7)   | shown below |
Comparison for Example 1 (2)

| Method                     | Estimate  | Method                       | Estimate       |
| Bayesian Beta(10,10)       | 0.559905  | Bayesian Beta(10^-7,10^-7)   | 0.193891       |
| Bayesian Beta(10^2,10^2)   | 0.520366  | Bayesian Beta(10^-7,10^-7)   | 0.400567       |
| Bayesian Beta(10^4,10^4)   | 0.500273  | Bayesian Beta(10^-7,10^-7)   | 0.737646       |
| Bayesian Beta(10^5,10^5)   | 0.500027  | Bayesian Beta(10^-7,10^-7)   | 0.641388       |
| Bayesian Beta(10^n,10^n)   | → 0.5     | Bayesian Beta(10^-7,10^-7)   | not stationary |
Part 9 Gibbs Sampling Strategy
Sampling Strategy (1)
Strategy I:
Run one chain for a long time. After some "burn-in" period, sample points every fixed number of steps.
The code example of Gibbs sampling in the previous lecture uses sampling strategy I.
http://www.cs.technion.ac.il/~cs236372/tirgul09.ps
[Figure: after burn-in, N samples taken from one chain]
Sampling Strategy (2)
Strategy II:
Run the chain N times, each run for M steps. Each run starts from a different state point. Return the last state of each run.
[Figure: after burn-in, N samples taken from the last state of each chain]
Sampling Strategy (3)
Strategy II by R:
Sampling Strategy (4)
Strategy II by C/C++:
Strategy Comparison
Strategy I:
Performs "burn-in" only once, saving time. Samples might be correlated (although only weakly).
Strategy II:
Better chance of "covering" the space of points, especially if the chain is slow to reach stationarity.
Must perform "burn-in" steps for each chain, which costs more time.
Hybrid Strategies (1)
Run several chains and sample a few samples from each. This combines the benefits of both strategies.
[Figure: after burn-in, N samples taken from each chain]
Hybrid Strategies (2)
Hybrid Strategy by R:
Hybrid Strategies (3)
Hybrid Strategy by C/C++:
Part 10 Metropolis-Hastings Algorithm
Metropolis-Hastings Algorithm (1)
Another kind of MCMC method.
The Metropolis-Hastings algorithm can draw samples from any probability distribution π(x), requiring only that a function proportional to the density can be calculated at x.
Process in three steps:
Set up a Markov chain;
Run the chain until stationary;
Estimate with Monte Carlo methods.
http://en.wikipedia.org/wiki/Metropolis-Hastings_algorithm
Metropolis-Hastings Algorithm (2)
Let \pi = (\pi_1, \ldots, \pi_n) be a probability density (or mass) function (pdf or pmf). f(\cdot) is any function and we want to estimate
I = E_{\pi}(f) = \sum_{i=1}^{n} f(i)\, \pi_i.
Construct P = \{P_{ij}\}, the transition matrix of an irreducible Markov chain with states 1, 2, \ldots, n, where
P_{ij} = \Pr\{X_{t+1} = j \mid X_t = i\}, \quad X_t \in \{1, 2, \ldots, n\},
and \pi is its unique stationary distribution.
Metropolis-Hastings Algorithm (3)
Run this Markov chain for times t = 1, \ldots, N and calculate the Monte Carlo sum
\hat{I} = \frac{1}{N} \sum_{t=1}^{N} f(X_t),
then \hat{I} \to I as N \to \infty.
Sheldon M. Ross (1997). Proposition 4.3. Introduction to Probability Models. 7th ed.
http://nlp.stanford.edu/local/talks/mcmc_2004_07_01.ppt
Metropolis-Hastings Algorithm (4)
In order to perform this method for a given distribution \pi, we must construct a Markov chain transition matrix P with \pi as its stationary distribution, i.e. \pi P = \pi.
Consider a matrix P made to satisfy the reversibility condition that for all i and j,
\pi_i P_{ij} = \pi_j P_{ji}.
This property ensures that
\sum_i \pi_i P_{ij} = \pi_j \quad \text{for all } j,
and hence \pi is a stationary distribution for P.
Metropolis-Hastings Algorithm (5)
Let a proposal Q = \{Q_{ij}\} be irreducible, where Q_{ij} = \Pr(X_{t+1} = j \mid X_t = i), and the range of Q is equal to the range of \pi.
But \pi does not have to be a stationary distribution of Q.
Process: tweak Q_{ij} to yield \pi.
[Figure: states from Q_{ij} (not \pi) are tweaked into states from P_{ij} (\pi)]
Metropolis-Hastings Algorithm (6)
We assume that P_{ij} has the form
P_{ij} = Q_{ij}\, \alpha(i, j), \quad i \neq j,
P_{ii} = 1 - \sum_{j \neq i} P_{ij},
where \alpha(i, j) is called the acceptance probability, i.e. given X_t = i,
take X_{t+1} = j with probability \alpha(i, j),
take X_{t+1} = i with probability 1 - \alpha(i, j).
Metropolis-Hastings Algorithm (7)
WLOG, for some (i, j), \pi_i Q_{ij} > \pi_j Q_{ji}. In order to achieve the equality
\pi_i Q_{ij}\, \alpha(i, j) = \pi_j Q_{ji}\, \alpha(j, i), \quad (*)
one can introduce a probability \alpha(i, j) < 1 on the left-hand side and set \alpha(j, i) = 1 on the right-hand side.
Metropolis-Hastings Algorithm (8)
Then
\pi_i Q_{ij}\, \alpha(i, j) = \pi_j Q_{ji}\, \alpha(j, i) = \pi_j Q_{ji},
so \alpha(i, j) = \frac{\pi_j Q_{ji}}{\pi_i Q_{ij}}.
These arguments imply that the acceptance probability \alpha(i, j) must be
\alpha(i, j) = \min\left(1,\ \frac{\pi_j Q_{ji}}{\pi_i Q_{ij}}\right).
Metropolis-Hastings Algorithm (9)
M-H Algorithm:
Step 1: Choose an irreducible Markov chain transition matrix Q with transition probabilities Q_{ij}.
Step 2: Let t = 0 and initialize X_0 from the states of Q.
Step 3 (Proposal Step): Given X_t = i, sample Y = j from Q_{iY}.
Metropolis-Hastings Algorithm (10)
M-H Algorithm (cont.):
Step 4 (Acceptance Step): Generate a random number U from Uniform(0, 1).
If U \le \alpha(i, j), set X_{t+1} = Y = j;
else X_{t+1} = X_t = i.
Step 5: t = t + 1; repeat Steps 3–5 until convergence.
Metropolis-Hastings Algorithm (11)
An example of Steps 3–5:
1. Sample Y_1 from Q_{X_0, \cdot} and compute
\alpha(X_0, Y_1) = \min\left(1,\ \frac{\pi(Y_1)\, Q_{Y_1, X_0}}{\pi(X_0)\, Q_{X_0, Y_1}}\right);
Y_1 is accepted, so X_1 = Y_1.
2. Sample Y_2 from Q_{X_1, \cdot} and compute
\alpha(X_1, Y_2) = \min\left(1,\ \frac{\pi(Y_2)\, Q_{Y_2, X_1}}{\pi(X_1)\, Q_{X_1, Y_2}}\right);
Y_2 is not accepted, so X_2 = X_1 = Y_1.
3. Sample Y_3 from Q_{X_2, \cdot} and compute
\alpha(X_2, Y_3) = \min\left(1,\ \frac{\pi(Y_3)\, Q_{Y_3, X_2}}{\pi(X_2)\, Q_{X_2, Y_3}}\right);
Y_3 is accepted, so X_3 = Y_3; and so on.
[Figure: proposals Y_1, Y_2, Y_3, ..., Y_N from Q_{ij} are tweaked into the chain X_1 = Y_1, X_2 = Y_1, X_3 = Y_3, ..., X_N from P_{ij}]
Metropolis-Hastings Algorithm (12)
We may define a "rejection rate" as the proportion of times t for which X_{t+1} = X_t. Clearly, in choosing Q, high rejection rates are to be avoided.
Example: If a proposal Y falls far out in the tail of \pi relative to the current state X_t, then \pi(Y)/\pi(X_t) will be small, and it is likely that X_{t+1} = X_t. More iterations of Steps 3–5 are then needed.
[Figure: a proposal Y in the tail of \pi, far from the current state X_t]
Example (1)
Simulate a bivariate normal distribution:
X = \begin{pmatrix} X_1 \\ X_2 \end{pmatrix} \sim N(\mu, \Sigma), \quad \mu = \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \quad \Sigma = \begin{pmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{21} & \sigma_{22} \end{pmatrix},
i.e. \pi(x) = \frac{\exp\left(-\frac{1}{2}(x-\mu)^T \Sigma^{-1} (x-\mu)\right)}{2\pi\, |\Sigma|^{1/2}}.
Example (2)
Metropolis-Hastings algorithm:
1. Initialize X_0 (an arbitrary starting point), and set i = 0.
2. Generate U_1 \sim \text{Unif}(-1, 1) and U_2 \sim \text{Unif}(-1, 1) such that U_1 and U_2 are independent; then set U_i = \begin{pmatrix} U_1 \\ U_2 \end{pmatrix}.
3. Y_i = X_i + U_i.
4. X_{i+1} = Y_i with probability \alpha(X_i, Y_i) = \min\left\{1,\ \frac{\pi(Y_i)}{\pi(X_i)}\right\};
   X_{i+1} = X_i with probability 1 - \alpha(X_i, Y_i).
5. i = i + 1; repeat Steps 2–4 until convergence.
Example of M-H Algorithm by R
Example of M-H Algorithm by C (1)
Example of M-H Algorithm by C (2)
Example of M-H Algorithm by C (3)
A Figure to Check Simulation Results
[Figure: scatter plot of X2 against X1. Black points are simulated samples; colored points are the probability density.]
Exercises
Write your own programs similar to the examples presented in this talk, including Example 1 in Genetics and the other examples.
Write programs for the examples mentioned at the reference web pages.
Write programs for other examples that you know.