Bayesian Methods with Monte Carlo Markov Chains III
Henry Horng-Shing Lu
Institute of Statistics
National Chiao Tung University
[email protected]
http://tigpbp.iis.sinica.edu.tw/courses.htm
Part 8 More Examples of Gibbs Sampling
An Example with Three Random Variables (1)
To sample (X, Y, N) distributed as follows:
(X, Y, N) \sim f(x, y, n) = c \binom{n}{x} y^{x+\alpha-1} (1-y)^{n-x+\beta-1} \frac{e^{-\lambda} \lambda^{n}}{n!},
where x = 0, 1, 2, \ldots, n; 0 \le y \le 1; n = 0, 1, 2, \ldots;
\alpha, \beta, \lambda are known, and c is a normalizing constant.
An Example with Three Random Variables (2)
One can see that
f(x \mid y, n) \propto \binom{n}{x} y^{x} (1-y)^{n-x} \sim \text{Binomial}(n, y),
f(y \mid x, n) \propto y^{x+\alpha-1} (1-y)^{n-x+\beta-1} \sim \text{Beta}(x+\alpha,\ n-x+\beta),
f(n \mid x, y) \propto \frac{e^{-(1-y)\lambda} \left((1-y)\lambda\right)^{n-x}}{(n-x)!}, \quad \text{i.e., } N - x \sim \text{Poisson}((1-y)\lambda).
An Example with Three Random Variables (3)
Gibbs sampling algorithm:
1. Initial setting: t = 0,
   y_0 \sim \text{Unif}[0, 1], or an arbitrary value in [0, 1];
   n_0 \sim \text{Discrete Unif} over [1, \infty) (a finite range in practice), or an arbitrary integer value in [1, \infty);
   x_0 \sim \text{Bin}(n_0, y_0).
2. Sample a value (y_{t+1}, n_{t+1}, x_{t+1}) from the full conditionals:
   y_{t+1} \sim \text{Beta}(x_t + \alpha,\ n_t - x_t + \beta),
   n_{t+1} - x_t \sim \text{Poisson}((1 - y_{t+1}) \lambda),
   x_{t+1} \sim \text{Bin}(n_{t+1}, y_{t+1}).
3. t = t + 1; repeat Step 2 until convergence.
An Example with Three Random Variables by R
10000 samples with α=2, β=7 and λ=16
An Example with Three Random Variables by C (1)
10000 samples with α=2, β=7 and λ=16
An Example with Three Random Variables by C (2)
An Example with Three Random Variables by C (3)
Example 1 in Genetics (1)
Two linked loci with alleles A and a, and B and b. A, B: dominant; a, b: recessive.
A double heterozygote AaBb will produce gametes of four types: AB, Ab, aB, ab.
F (Female): 1−r′, r′ (female recombination fraction)
M (Male): 1−r, r (male recombination fraction)
[Figure: gamete formation in the double heterozygote AaBb]
Example 1 in Genetics (2)
r and r′ are the recombination rates for male and female.
Suppose the parental origin of these heterozygotes is from the mating of AABB × aabb. The problem is to estimate r and r′ from the offspring of selfed heterozygotes.
Fisher, R. A. and Balmukand, B. (1928). The estimation of linkage from the offspring of selfed heterozygotes. Journal of Genetics, 20, 79–92.
http://en.wikipedia.org/wiki/Genetics
http://www2.isye.gatech.edu/~brani/isyebayes/bank/handout12.pdf
Example 1 in Genetics (3)

|                     | MALE AB (1−r)/2    | MALE ab (1−r)/2    | MALE aB r/2     | MALE Ab r/2     |
| FEMALE AB (1−r′)/2  | AABB (1−r)(1−r′)/4 | aABb (1−r)(1−r′)/4 | aABB r(1−r′)/4  | AABb r(1−r′)/4  |
| FEMALE ab (1−r′)/2  | AaBb (1−r)(1−r′)/4 | aabb (1−r)(1−r′)/4 | aaBb r(1−r′)/4  | Aabb r(1−r′)/4  |
| FEMALE aB r′/2      | AaBB (1−r)r′/4     | aabB (1−r)r′/4     | aaBB rr′/4      | AabB rr′/4      |
| FEMALE Ab r′/2      | AABb (1−r)r′/4     | aAbb (1−r)r′/4     | aABb rr′/4      | AAbb rr′/4      |
Example 1 in Genetics (4)
Four distinct phenotypes: A*B*, A*b*, a*B* and a*b*.
A*: the dominant phenotype from (Aa, AA, aA).
a*: the recessive phenotype from aa.
B*: the dominant phenotype from (Bb, BB, bB).
b*: the recessive phenotype from bb.
A*B*: 9 gametic combinations.
A*b*: 3 gametic combinations.
a*B*: 3 gametic combinations.
a*b*: 1 gametic combination.
Total: 16 combinations.
Example 1 in Genetics (5)
Let \theta = (1-r)(1-r'). Then
P(A^*B^*) = \frac{2+\theta}{4},
P(A^*b^*) = P(a^*B^*) = \frac{1-\theta}{4},
P(a^*b^*) = \frac{\theta}{4}.
Example 1 in Genetics (6)
Hence, a random sample of size n from the offspring of selfed heterozygotes will follow a multinomial distribution:
\text{Multinomial}\left(n;\ \frac{2+\theta}{4}, \frac{1-\theta}{4}, \frac{1-\theta}{4}, \frac{\theta}{4}\right).
We know that \theta = (1-r)(1-r'), 0 \le r \le 1/2, and 0 \le r' \le 1/2.
So 1/4 \le \theta \le 1.
Example 1 in Genetics (7)
Suppose that we observe the data y = (y_1, y_2, y_3, y_4) = (125, 18, 20, 24), which is a random sample from
\text{Multinomial}\left(n;\ \frac{2+\theta}{4}, \frac{1-\theta}{4}, \frac{1-\theta}{4}, \frac{\theta}{4}\right).
Then the probability mass function is
f(y \mid \theta) = \frac{n!}{y_1!\, y_2!\, y_3!\, y_4!} \left(\frac{2+\theta}{4}\right)^{y_1} \left(\frac{1-\theta}{4}\right)^{y_2+y_3} \left(\frac{\theta}{4}\right)^{y_4}.
Example 1 in Genetics (8)
How to estimate \theta?
MME (shown last week): http://en.wikipedia.org/wiki/Method_of_moments_%28statistics%29
MLE (shown last week): http://en.wikipedia.org/wiki/Maximum_likelihood
Bayesian method: http://en.wikipedia.org/wiki/Bayesian_method
Example 1 in Genetics (9)
As the value of \theta is between 1/4 and 1, we can assume that the prior distribution of \theta is Uniform(1/4, 1).
The posterior distribution is
f(\theta \mid y) = \frac{f(y \mid \theta)\, f(\theta)}{\int f(y \mid \theta)\, f(\theta)\, d\theta}.
The integration in the denominator, \int f(y \mid \theta)\, f(\theta)\, d\theta, does not have a closed form.
Example 1 in Genetics (10)
We will consider the mean of the posterior distribution (the posterior mean),
E(\theta \mid y) = \int \theta\, f(\theta \mid y)\, d\theta.
The Monte Carlo Markov Chain method is a good way to estimate E(\theta \mid y) even if \int f(y \mid \theta)\, f(\theta)\, d\theta and the posterior mean do not have closed forms.
Example 1 by R
Direct numerical integration when \theta \sim U(1/4, 1).
We can assume other prior distributions to compare the resulting posterior means: Beta(1,1), Beta(2,2), Beta(2,3), Beta(3,2), Beta(0.5,0.5), Beta(10^{-5},10^{-5}).
Example 1 by C/C++
Replace with other prior distributions, such as Beta(1,1), …, Beta(10^{-5},10^{-5}).
Beta Prior
Comparison for Example 1 (1)

| Method              | Estimate | Method                       | Estimate    |
| MME                 | 0.683616 | Bayesian Beta(2,3)           | 0.564731    |
| MLE                 | 0.663165 | Bayesian Beta(3,2)           | 0.577575    |
| Bayesian U(1/4,1)   | 0.573931 | Bayesian Beta(1/2,1/2)       | 0.574928    |
| Bayesian Beta(1,1)  | 0.573918 | Bayesian Beta(10^-5,10^-5)   | 0.588925    |
| Bayesian Beta(2,2)  | 0.572103 | Bayesian Beta(10^-7,10^-7)   | shown below |
Comparison for Example 1 (2)

| Method                     | Estimate  | Method                       | Estimate       |
| Bayesian Beta(10,10)       | 0.559905  | Bayesian Beta(10^-7,10^-7)   | 0.193891       |
| Bayesian Beta(10^2,10^2)   | 0.520366  | Bayesian Beta(10^-7,10^-7)   | 0.400567       |
| Bayesian Beta(10^4,10^4)   | 0.500273  | Bayesian Beta(10^-7,10^-7)   | 0.737646       |
| Bayesian Beta(10^5,10^5)   | 0.500027  | Bayesian Beta(10^-7,10^-7)   | 0.641388       |
| Bayesian Beta(10^n,10^n)   | → 0.5     | Bayesian Beta(10^-7,10^-7)   | not stationary |
Part 9 Gibbs Sampling Strategy
Sampling Strategy (1)
Strategy I:
Run one chain for a long time. After some "burn-in" period, sample points every fixed number of steps.
The code example of Gibbs sampling in the previous lecture uses sampling strategy I.
http://www.cs.technion.ac.il/~cs236372/tirgul09.ps
[Figure: after burn-in, N samples taken from one chain]
Sampling Strategy (2)
Strategy II:
Run the chain N times, each run for M steps. Each run starts from a different state point. Return the last state of each run.
[Figure: after burn-in, N samples taken from the last state of each chain]
Sampling Strategy (3)
Strategy II by R:
Sampling Strategy (4)
Strategy II by C/C++:
Strategy Comparison
Strategy I:
Performs "burn-in" only once, saving time. Samples might be correlated (although only weakly).
Strategy II:
Better chance of "covering" the space of points, especially if the chain is slow to reach stationarity.
Must perform "burn-in" steps for each chain, which costs more time.
Hybrid Strategies (1)
Run several chains and sample a few samples from each. This combines the benefits of both strategies.
[Figure: after burn-in, N samples taken from each chain]
Hybrid Strategies (2)
Hybrid Strategy by R:
Hybrid Strategies (3)
Hybrid Strategy by C/C++:
Part 10 Metropolis-Hastings Algorithm
Metropolis-Hastings Algorithm (1)
Another kind of MCMC method.
The Metropolis-Hastings algorithm can draw samples from any probability distribution π(x), requiring only that a function proportional to the density can be calculated at x.
Process in three steps:
Set up a Markov chain;
Run the chain until stationary;
Estimate with Monte Carlo methods.
http://en.wikipedia.org/wiki/Metropolis-Hastings_algorithm
Metropolis-Hastings Algorithm (2)
Let \pi = (\pi_1, \ldots, \pi_n) be a probability density (or mass) function (pdf or pmf). f(\cdot) is any function and we want to estimate
I = E_{\pi}(f) = \sum_{i=1}^{n} f(i)\, \pi_i.
Construct P = \{P_{ij}\}, the transition matrix of an irreducible Markov chain with states 1, 2, \ldots, n, where
P_{ij} = \Pr\{X_{t+1} = j \mid X_t = i\}, \quad X_t \in \{1, 2, \ldots, n\},
and \pi is its unique stationary distribution.
Metropolis-Hastings Algorithm (3)
Run this Markov chain for times t = 1, \ldots, N and calculate the Monte Carlo sum
\hat{I} = \frac{1}{N} \sum_{t=1}^{N} f(X_t),
then \hat{I} \to I as N \to \infty.
Sheldon M. Ross (1997). Proposition 4.3. Introduction to Probability Models. 7th ed.
http://nlp.stanford.edu/local/talks/mcmc_2004_07_01.ppt
Metropolis-Hastings Algorithm (4)
In order to perform this method for a given distribution \pi, we must construct a Markov chain transition matrix P with \pi as its stationary distribution, i.e. \pi P = \pi.
Consider a matrix P made to satisfy the reversibility condition that for all i and j,
\pi_i P_{ij} = \pi_j P_{ji}.
This property ensures that
\sum_i \pi_i P_{ij} = \pi_j \quad \text{for all } j,
and hence \pi is a stationary distribution for P.
Metropolis-Hastings Algorithm (5)
Let a proposal Q = \{Q_{ij}\} be irreducible, where Q_{ij} = \Pr(X_{t+1} = j \mid X_t = i), and the range of Q is equal to the range of \pi.
But \pi does not have to be a stationary distribution of Q.
Process: tweak Q_{ij} to yield \pi.
[Figure: states from Q_{ij} (not \pi) are tweaked into states from P_{ij} (\pi)]
Metropolis-Hastings Algorithm (6)
We assume that P_{ij} has the form
P_{ij} = Q_{ij}\, \alpha(i, j), \quad i \neq j,
P_{ii} = 1 - \sum_{j \neq i} P_{ij},
where \alpha(i, j) is called the acceptance probability, i.e. given X_t = i,
take X_{t+1} = j with probability \alpha(i, j),
take X_{t+1} = i with probability 1 - \alpha(i, j).
Metropolis-Hastings Algorithm (7)
WLOG, for some (i, j), \pi_i Q_{ij} > \pi_j Q_{ji}. In order to achieve the equality
\pi_i Q_{ij}\, \alpha(i, j) = \pi_j Q_{ji}\, \alpha(j, i), \quad (*)
one can introduce a probability \alpha(i, j) < 1 on the left-hand side and set \alpha(j, i) = 1 on the right-hand side.
Metropolis-Hastings Algorithm (8)
Then
\pi_i Q_{ij}\, \alpha(i, j) = \pi_j Q_{ji}\, \alpha(j, i) = \pi_j Q_{ji},
so \alpha(i, j) = \frac{\pi_j Q_{ji}}{\pi_i Q_{ij}}.
These arguments imply that the acceptance probability \alpha(i, j) must be
\alpha(i, j) = \min\left(1,\ \frac{\pi_j Q_{ji}}{\pi_i Q_{ij}}\right).
Metropolis-Hastings Algorithm (9)
M-H Algorithm:
Step 1: Choose an irreducible Markov chain transition matrix Q with transition probabilities Q_{ij}.
Step 2: Let t = 0 and initialize X_0 from the states of Q.
Step 3 (Proposal Step): Given X_t = i, sample Y = j from Q_{iY}.
Metropolis-Hastings Algorithm (10)
M-H Algorithm (cont.):
Step 4 (Acceptance Step): Generate a random number U from Uniform(0, 1).
If U \le \alpha(i, j), set X_{t+1} = Y = j;
else X_{t+1} = X_t = i.
Step 5: t = t + 1; repeat Steps 3–5 until convergence.
Metropolis-Hastings Algorithm (11)
An example of Steps 3–5:
1. Sample Y_1 from Q_{X_0, \cdot} and compute
\alpha(X_0, Y_1) = \min\left(1,\ \frac{\pi(Y_1)\, Q_{Y_1, X_0}}{\pi(X_0)\, Q_{X_0, Y_1}}\right);
Y_1 is accepted, so X_1 = Y_1.
2. Sample Y_2 from Q_{X_1, \cdot} and compute
\alpha(X_1, Y_2) = \min\left(1,\ \frac{\pi(Y_2)\, Q_{Y_2, X_1}}{\pi(X_1)\, Q_{X_1, Y_2}}\right);
Y_2 is not accepted, so X_2 = X_1 = Y_1.
3. Sample Y_3 from Q_{X_2, \cdot} and compute
\alpha(X_2, Y_3) = \min\left(1,\ \frac{\pi(Y_3)\, Q_{Y_3, X_2}}{\pi(X_2)\, Q_{X_2, Y_3}}\right);
Y_3 is accepted, so X_3 = Y_3; and so on.
[Figure: proposals Y_1, Y_2, Y_3, ..., Y_N from Q_{ij} are tweaked into the chain X_1 = Y_1, X_2 = Y_1, X_3 = Y_3, ..., X_N from P_{ij}]
Metropolis-Hastings Algorithm (12)
We may define a "rejection rate" as the proportion of times t for which X_{t+1} = X_t. Clearly, in choosing Q, high rejection rates are to be avoided.
Example: If a proposal Y falls far out in the tail of \pi relative to the current state X_t, then \pi(Y)/\pi(X_t) will be small, and it is likely that X_{t+1} = X_t. More iterations of Steps 3–5 are then needed.
[Figure: a proposal Y in the tail of \pi, far from the current state X_t]
Example (1)
Simulate a bivariate normal distribution:
X = \begin{pmatrix} X_1 \\ X_2 \end{pmatrix} \sim N(\mu, \Sigma), \quad \mu = \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \quad \Sigma = \begin{pmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{21} & \sigma_{22} \end{pmatrix},
i.e. \pi(x) = \frac{\exp\left(-\frac{1}{2}(x-\mu)^T \Sigma^{-1} (x-\mu)\right)}{2\pi\, |\Sigma|^{1/2}}.
Example (2)
Metropolis-Hastings algorithm:
1. Initialize X_0 (an arbitrary starting point), and set i = 0.
2. Generate U_1 \sim \text{Unif}(-1, 1) and U_2 \sim \text{Unif}(-1, 1) such that U_1 and U_2 are independent; then set U_i = \begin{pmatrix} U_1 \\ U_2 \end{pmatrix}.
3. Y_i = X_i + U_i.
4. X_{i+1} = Y_i with probability \alpha(X_i, Y_i) = \min\left\{1,\ \frac{\pi(Y_i)}{\pi(X_i)}\right\};
   X_{i+1} = X_i with probability 1 - \alpha(X_i, Y_i).
5. i = i + 1; repeat Steps 2–4 until convergence.
Example of M-H Algorithm by R
Example of M-H Algorithm by C (1)
Example of M-H Algorithm by C (2)
Example of M-H Algorithm by C (3)
A Figure to Check Simulation Results
[Figure: scatter plot of X2 against X1. Black points are simulated samples; colored points are the probability density.]
Exercises
Write your own programs similar to the examples presented in this talk, including Example 1 in Genetics and the other examples.
Write programs for the examples mentioned at the reference web pages.
Write programs for other examples that you know.