MCMC: Metropolis Algorithm

  • MCMC: Metropolis Algorithm

  • Idea

    ● Random samples from the posterior
    ● Approximate the PDF with the histogram
    ● Performs Monte Carlo integration
    ● Allows all quantities of interest to be calculated from the sample (mean, quantiles, variance, etc.)

                 TRUE     Sample
      mean       5.000    5.000
      median     5.000    5.004
      var        9.000    9.006
      Lower CI  -0.880   -0.881
      Upper CI  10.880   10.872
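    The table is just sample statistics applied to posterior draws. A minimal Python sketch, assuming the TRUE column comes from a N(5, 9) distribution (mean 5, variance 9, 95% interval 5 ± 1.96·3):

      import numpy as np

      # Monte Carlo integration from samples: every quantity of interest is
      # an ordinary sample statistic of the posterior draws. The "posterior"
      # here is a known N(5, 9) so estimates can be checked against TRUE.
      rng = np.random.default_rng(42)
      samples = rng.normal(loc=5.0, scale=3.0, size=100_000)

      print("mean    ", samples.mean())                        # ~5.000
      print("median  ", np.median(samples))                    # ~5.000
      print("var     ", samples.var())                         # ~9.000
      print("95% CI  ", np.quantile(samples, [0.025, 0.975]))  # ~(-0.88, 10.88)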

  • Outline

    ● Different numerical techniques for sampling from the posterior
      – Rejection Sampling
      – Inverse Distribution Sampling
      – Markov Chain Monte Carlo (MCMC)
        ● Metropolis
        ● Metropolis-Hastings
        ● Gibbs sampling
    ● Sampling conditionals vs. the full model
    ● Flexibility to specify complex models

  • Markov Chain Monte Carlo

    1) Start from some initial parameter value

    2) Evaluate the unnormalized posterior

    3) Propose a new parameter value

    4) Evaluate the new unnormalized posterior

    5) Decide whether or not to accept the new value

    6) Repeat 3-5

  • Metropolis Algorithm

    ● Most popular form of MCMC
    ● Can be applied to almost any problem
    ● Implementation requires little additional thought beyond writing the model
    ● Evaluation/tuning does require the most skill & experience
    ● Indirect method
      – Requires a second distribution to propose steps

  • Metropolis Algorithm (a runnable sketch follows this list)

    1) Start from some initial parameter value θc

    2) Evaluate the unnormalized posterior p(θc)

    3) Propose a new parameter value θ*: a random draw from a “jump” distribution centered on the current parameter value

    4) Evaluate the new unnormalized posterior p(θ*)

    5) Decide whether or not to accept the new value: accept with probability

       a = min(1, p(θ*) / p(θc))

    6) Repeat 3-5
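    A minimal Python sketch of these six steps (the names metropolis and log_post are illustrative, not from the slides; the ratio is computed on the log scale to avoid numerical underflow):

      import numpy as np

      def metropolis(log_post, theta0, jump_sd, n_iter, seed=None):
          # Random-walk Metropolis with a symmetric Normal jump distribution.
          # log_post must return the log of the unnormalized posterior.
          rng = np.random.default_rng(seed)
          chain = np.empty(n_iter)
          theta_c = theta0            # 1) initial parameter value
          lp_c = log_post(theta_c)    # 2) evaluate unnormalized posterior
          for g in range(n_iter):
              theta_star = rng.normal(theta_c, jump_sd)   # 3) propose
              lp_star = log_post(theta_star)              # 4) evaluate
              # 5) accept with probability min(1, p(θ*)/p(θc))
              if np.log(rng.uniform()) < lp_star - lp_c:
                  theta_c, lp_c = theta_star, lp_star
              chain[g] = theta_c                          # 6) repeat 3-5
          return chain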

  • Jump distribution

    ● For Metropolis, the jump distribution J must be SYMMETRIC:

      J(θ* | θc) = J(θc | θ*)

    ● The most common jump distribution is the Normal:

      J(θ* | θc) = N(θ* | θc, υ)

    ● The user must set the variance υ of the jump
      – Trial-and-error (a tuning sketch follows)
      – Tune to get an acceptance rate of 30-70%
      – Low acceptance = decrease variance (smaller steps)
      – High acceptance = increase variance (bigger steps)
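    One way to automate the trial-and-error; a sketch in which run_chain is a hypothetical callback that runs a short pilot chain (e.g. the metropolis function above) at a given jump standard deviation:

      import numpy as np

      def acceptance_rate(chain):
          # With continuous proposals, consecutive identical values mean a
          # rejection, so the fraction of moves estimates the acceptance rate.
          return np.mean(chain[1:] != chain[:-1])

      def tune_jump_sd(run_chain, jump_sd, lo=0.3, hi=0.7, max_rounds=20):
          # Trial-and-error tuning toward the 30-70% window: shrink the jump
          # when acceptance is low, grow it when acceptance is high.
          for _ in range(max_rounds):
              rate = acceptance_rate(run_chain(jump_sd))
              if rate < lo:
                  jump_sd /= 2    # low acceptance -> smaller steps
              elif rate > hi:
                  jump_sd *= 2    # high acceptance -> bigger steps
              else:
                  break
          return jump_sd

    For example, tune_jump_sd(lambda sd: metropolis(log_post, 0.0, sd, 500), 1.0) returns a jump sd whose pilot acceptance rate falls inside the window.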

  • Example

    ● Normal with known variance, unknown mean (sketched in code below)
      – Prior: N(53, 10000)
      – Data: y = 43
      – Known variance: 100
      – Initial conditions: 3 chains starting at -100, 0, 100
      – Jump distribution: Normal
      – Jump variance: 3, 10, 30

  • [Trace plots for the three jump variances; acceptance = 12%, 70%, 90%]
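    A self-contained sketch that runs the example's three chains at each jump variance (chain length and seed are illustrative, so the exact acceptance rates will differ from the slide's):

      import numpy as np

      # Slide's settings: prior N(53, 10000), datum y = 43, known variance 100.
      def log_post(mu, y=43.0, s2=100.0, mu0=53.0, v0=10000.0):
          return -0.5 * (y - mu) ** 2 / s2 - 0.5 * (mu - mu0) ** 2 / v0

      rng = np.random.default_rng(1)
      for jump_var in (3.0, 10.0, 30.0):
          rates = []
          for start in (-100.0, 0.0, 100.0):   # three over-dispersed chains
              mu_c, lp_c, n_acc = start, log_post(start), 0
              for _ in range(5000):
                  mu_star = rng.normal(mu_c, np.sqrt(jump_var))
                  lp_star = log_post(mu_star)
                  if np.log(rng.uniform()) < lp_star - lp_c:
                      mu_c, lp_c, n_acc = mu_star, lp_star, n_acc + 1
              rates.append(n_acc / 5000)
          print(f"jump variance {jump_var:4.0f}: acceptance ~ {np.mean(rates):.0%}")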

  • Multivariate example

    ● Bivariate Normal target:

      θ ~ N₂( [0, 0], [[1, 0.5], [0.5, 1]] )

    ● Option 1: Draw from the joint distribution

      J = N₂( [θ₁*, θ₂*] | [θ₁c, θ₂c], V )

    ● Option 2: Draw each parameter iteratively

      J₁ = N(θ₁* | θ₁c, V₁),   J₂ = N(θ₂* | θ₂c, V₂)

    (both options are sketched in code below)
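    A sketch of both options on this target (jump scales and chain length are illustrative):

      import numpy as np

      # Bivariate Normal target: mean (0, 0), unit variances, correlation 0.5.
      Prec = np.linalg.inv(np.array([[1.0, 0.5], [0.5, 1.0]]))

      def log_post(theta):
          return -0.5 * theta @ Prec @ theta

      rng = np.random.default_rng(0)
      jump_sd = 1.0
      theta_joint = np.zeros(2)   # Option 1: propose both parameters at once
      theta_iter = np.zeros(2)    # Option 2: propose one parameter at a time
      for _ in range(20_000):
          # Option 1: a single accept/reject for the whole vector
          prop = theta_joint + rng.normal(0.0, jump_sd, size=2)
          if np.log(rng.uniform()) < log_post(prop) - log_post(theta_joint):
              theta_joint = prop
          # Option 2: update θ1, then θ2, each conditional on the other
          for k in range(2):
              prop = theta_iter.copy()
              prop[k] += rng.normal(0.0, jump_sd)
              if np.log(rng.uniform()) < log_post(prop) - log_post(theta_iter):
                  theta_iter = prop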

  • [Four slides of plots contrasting the Joint and Iterative samplers]

  • Metropolis-Hastings

    ● Generalization of Metropolis
    ● Allows for an asymmetric jump distribution
      – Most commonly arises due to bounds on parameter values / non-normal jump distributions
    ● Acceptance criterion (sketched in code below):

      a = [ p(θ*) / J(θ* | θc) ] / [ p(θc) / J(θc | θ*) ]
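    A sketch of this correction for a positive parameter with an asymmetric log-normal jump; the Gamma(2, 1) stand-in target and all names are illustrative, not from the slides:

      import numpy as np
      from scipy.stats import lognorm

      def log_post(theta):
          # Unnormalized Gamma(2, 1) stand-in posterior; theta > 0.
          return np.log(theta) - theta

      def log_J(to, frm, tau=0.5):
          # Log-normal jump centered (on the log scale) at the current value:
          # asymmetric, so J(to | frm) != J(frm | to).
          return lognorm.logpdf(to, s=tau, scale=frm)

      rng = np.random.default_rng(0)
      theta_c = 1.0
      chain = np.empty(10_000)
      for g in range(10_000):
          theta_star = rng.lognormal(np.log(theta_c), 0.5)   # propose
          # a = [p(θ*)/J(θ*|θc)] / [p(θc)/J(θc|θ*)], on the log scale
          log_a = (log_post(theta_star) - log_J(theta_star, theta_c)) \
                - (log_post(theta_c) - log_J(theta_c, theta_star))
          if np.log(rng.uniform()) < log_a:
              theta_c = theta_star
          chain[g] = theta_c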

  • Building Complex Models: Conditional Sampling

    ● Consider a more complex model with parameters b, μ, σ:

      p(b, μ, σ | y) ∝ p(y | b) p(b | μ, σ) p(μ) p(σ)

    ● When sampling each parameter iteratively, we only need the distributions that contain that parameter:

      p(b | μ, σ, y) ∝ p(y | b) p(b | μ, σ)

      p(μ | b, σ, y) ∝ p(b | μ, σ) p(μ)

      p(σ | b, μ, y) ∝ p(b | μ, σ) p(σ)

  • In one MCMC step (iteration g → g+1; a sketch follows)

    p(b^(g+1) | μ^(g), σ^(g), y) ∝ p(y | b) p(b | μ^(g), σ^(g))

    p(μ^(g+1) | b^(g+1), σ^(g), y) ∝ p(b^(g+1) | μ, σ^(g)) p(μ)

    p(σ^(g+1) | b^(g+1), μ^(g+1), y) ∝ p(b^(g+1) | μ^(g+1), σ) p(σ)
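    A Metropolis-within-Gibbs sketch of one such sweep; the toy model (y ~ N(b, 1), b ~ N(μ, σ²), vague priors) is an assumption for illustration, and each update evaluates only the factors that contain its parameter:

      import numpy as np
      from scipy.stats import norm

      # Toy model (an assumption, not from the slides):
      #   y ~ N(b, 1),  b ~ N(mu, sigma^2),  mu ~ N(0, 10^2),  log(sigma) ~ N(0, 1)
      y = 2.0
      rng = np.random.default_rng(0)

      def step(value, log_cond, jump_sd=0.5):
          # One Metropolis update of a single parameter, given the log of
          # only those posterior factors that contain it.
          prop = value + rng.normal(0, jump_sd)
          accept = np.log(rng.uniform()) < log_cond(prop) - log_cond(value)
          return prop if accept else value

      b, mu, log_sigma = 0.0, 0.0, 0.0
      for g in range(5000):
          sigma = np.exp(log_sigma)
          # p(b | mu, sigma, y) ∝ p(y|b) p(b|mu,sigma)
          b = step(b, lambda v: norm.logpdf(y, v, 1) + norm.logpdf(v, mu, sigma))
          # p(mu | b, sigma, y) ∝ p(b|mu,sigma) p(mu)
          mu = step(mu, lambda v: norm.logpdf(b, v, sigma) + norm.logpdf(v, 0, 10))
          # p(sigma | b, mu, y) ∝ p(b|mu,sigma) p(sigma), updated on the log scale
          log_sigma = step(log_sigma,
                           lambda v: norm.logpdf(b, mu, np.exp(v)) + norm.logpdf(v, 0, 1))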

  • Gibbs Sampling

    ● If we can solve a conditional distribution analytically
      – We can sample from that conditional directly
      – Even if we can't solve the full joint distribution analytically

    ● Example

      p(μ, σ² | y) ∝ N(y | μ, σ²) N(μ | μ₀, V₀) IG(σ² | α, β)

      p(μ^(g+1) | σ²^(g), y) ∝ N(y | μ, σ²^(g)) N(μ | μ₀, V₀)

        = N( (n ȳ / σ²^(g) + μ₀ / V₀) / (n / σ²^(g) + 1 / V₀) , 1 / (n / σ²^(g) + 1 / V₀) )

      p(σ²^(g+1) | μ^(g+1), y) ∝ N(y | μ^(g+1), σ²) IG(σ² | α, β)

        = IG( α + n/2 , β + ½ Σᵢ (yᵢ − μ^(g+1))² )
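    A sketch of this two-block Gibbs sampler; the simulated data and the hyperparameter values (μ₀, V₀, α, β) are illustrative. The IG draw is taken as the reciprocal of a Gamma variate:

      import numpy as np

      # Conjugate Normal model: y_i ~ N(mu, sigma2), mu ~ N(mu0, V0),
      # sigma2 ~ IG(alpha, beta). Both conditionals are sampled exactly.
      rng = np.random.default_rng(0)
      y = rng.normal(5.0, 3.0, size=50)        # simulated data (illustrative)
      n, ybar = len(y), y.mean()
      mu0, V0, alpha, beta = 0.0, 100.0, 0.1, 0.1

      mu, sigma2 = 0.0, 1.0
      draws = np.empty((5000, 2))
      for g in range(5000):
          # mu | sigma2, y ~ N((n*ybar/sigma2 + mu0/V0)/prec, 1/prec),
          # with prec = n/sigma2 + 1/V0
          prec = n / sigma2 + 1 / V0
          mu = rng.normal((n * ybar / sigma2 + mu0 / V0) / prec, np.sqrt(1 / prec))
          # sigma2 | mu, y ~ IG(alpha + n/2, beta + 0.5 * sum((y_i - mu)^2)),
          # drawn as 1 / Gamma(shape, scale = 1/rate)
          rate = beta + 0.5 * np.sum((y - mu) ** 2)
          sigma2 = 1 / rng.gamma(alpha + n / 2, 1 / rate)
          draws[g] = mu, sigma2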

  • Trade-offs of Gibbs Sampling

    ● 100% acceptance rate
    ● Quicker mixing, quicker convergence
    ● No need to specify a jump distribution
    ● No need to tune a jump variance
    ● Requires that we know the analytical solution for the conditional
      – Conjugate distributions
    ● Can mix different samplers within one MCMC
