STA 216, Generalized Linear Models, Lecture 6
September 13, 2007

Outline
- Introduction to Bayes Inference for GLMs
  - Description of Posterior
  - Asymptotic Approximations
- Introduction to MCMC Algorithms
  - Gibbs sampling & Metropolis-Hastings
  - Convergence & Mixing
  - Inference from MCMC samples
  - Illustration


Bayesian Inference via the Posterior Distribution
- Recall that Bayesian inference is based on the posterior distribution

  π(θ | y) = π(θ) L(y | θ) / ∫ π(θ) L(y | θ) dθ = π(θ) L(y | θ) / L(y),

- π(θ) = prior distribution for the parameter θ
- L(y | θ) = likelihood of the data y given θ
- L(y) = marginal likelihood (integrating over the prior)
- Good news: we have the numerator in this expression
- Bad news: the denominator is typically not available (may involve a high-dimensional integral); a small numerical sketch follows
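To make the numerator/denominator distinction concrete, here is a minimal numerical sketch (not from the lecture) for a toy intercept-only probit model, Pr(y_i = 1) = Φ(θ) with a N(0, 2^2) prior on θ: the unnormalized posterior π(θ) L(y | θ) is easy to evaluate anywhere, and in one dimension the normalizing constant can be approximated on a grid. With a p-dimensional β that same integral is no longer practical, which is what motivates MCMC.

```python
import numpy as np
from scipy.stats import norm

# Toy intercept-only probit model: Pr(y_i = 1) = Phi(theta), prior theta ~ N(0, 2^2).
rng = np.random.default_rng(0)
y = rng.binomial(1, 0.3, size=50)

def log_numerator(theta, y):
    """log of pi(theta) * L(y | theta), which is always available."""
    log_prior = norm.logpdf(theta, loc=0.0, scale=2.0)
    p = norm.cdf(theta)
    return log_prior + np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

# The denominator L(y) is the integral of the numerator over theta; a grid works
# in one dimension, but this is exactly the piece that becomes intractable for a
# high-dimensional parameter.
grid = np.linspace(-4, 4, 2001)
log_num = np.array([log_numerator(t, y) for t in grid])
num = np.exp(log_num - log_num.max())       # rescaled numerator for numerical stability
posterior = num / np.trapz(num, grid)       # grid approximation of pi(theta | y)
print("posterior mean of theta:", np.trapz(grid * posterior, grid))
```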


Conjugate Priors
- For conjugate priors, the posterior distribution of θ is available analytically
- Example: L(y | θ) = ∏_{i=1}^{n} N(y_i; x_i'β, τ^{-1}) (normal linear regression)
- The conjugate prior is normal-gamma:

  π(β, τ) = N_p(β; β_0, τ^{-1} Σ_0) G(τ; a, b),

  where N_p(·) denotes the p-variate normal & G(·) denotes the gamma
- For this prior, the posterior is also normal-gamma (update formulas sketched below)
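The slide states that the posterior is again normal-gamma but does not give the update formulas. The sketch below assumes one standard parameterization, y ~ N(Xβ, τ^{-1} I), β | τ ~ N_p(β_0, τ^{-1} Σ_0), τ ~ G(a, b); the algebra is the textbook conjugate update, not something taken from the lecture.

```python
import numpy as np

def normal_gamma_posterior(X, y, beta0, Sigma0, a, b):
    """Conjugate update for y ~ N(X beta, tau^{-1} I), beta | tau ~ N(beta0, tau^{-1} Sigma0),
    tau ~ Gamma(a, b).  Returns the posterior normal-gamma parameters."""
    n = len(y)
    Lam0 = np.linalg.inv(Sigma0)                 # prior precision of beta (unscaled by tau)
    Lam_n = Lam0 + X.T @ X                       # posterior precision of beta (unscaled by tau)
    Sigma_n = np.linalg.inv(Lam_n)
    beta_n = Sigma_n @ (Lam0 @ beta0 + X.T @ y)  # posterior mean of beta
    a_n = a + n / 2.0
    b_n = b + 0.5 * (y @ y + beta0 @ Lam0 @ beta0 - beta_n @ Lam_n @ beta_n)
    # beta | tau, y ~ N(beta_n, tau^{-1} Sigma_n) and tau | y ~ Gamma(a_n, b_n)
    return beta_n, Sigma_n, a_n, b_n
```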


Non-Conjugate Priors
- Conjugate priors are not available for generalized linear models (GLMs) other than the normal linear model
- One can potentially rely on an asymptotic normal approximation
- As n → ∞, the posterior distribution is normal, centered on the MLE


Asymptotic Approximation with Informative Priors
- Suppose we have a N(β_0, Σ_0) prior for β.
- The asymptotic normal approximation to the posterior is

  π(β | y, X) ∝ exp{ -(1/2)(β - β_0)' Σ_0^{-1} (β - β_0) } × exp{ -(1/2)(β - β̂)' I(β̂) (β - β̂) } ∝ N(β; β̃, Σ̃_β)

- Approximate posterior mean & variance (see the sketch below):

  β̃ = Σ̃_β ( Σ_0^{-1} β_0 + I(β̂) β̂ ),   Σ̃_β = ( Σ_0^{-1} + I(β̂) )^{-1}
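A minimal sketch of these two formulas, assuming you already have the MLE β̂ and the information matrix I(β̂) from a standard ML fit; the function and variable names are illustrative only.

```python
import numpy as np

def normal_posterior_approx(beta_hat, info_hat, beta0, Sigma0):
    """Combine a N(beta0, Sigma0) prior with the asymptotic N(beta_hat, info_hat^{-1})
    likelihood approximation, using the formulas on the slide."""
    prior_prec = np.linalg.inv(Sigma0)
    Sigma_tilde = np.linalg.inv(prior_prec + info_hat)                 # approximate posterior covariance
    beta_tilde = Sigma_tilde @ (prior_prec @ beta0 + info_hat @ beta_hat)  # approximate posterior mean
    return beta_tilde, Sigma_tilde
```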


Comments on Asymptotic Approximation
- Even for moderate sample sizes, the asymptotic approximation may be inaccurate
- In logistic regression with rare outcomes or rare binary exposures, the posterior can be highly skewed
- Appealing to avoid any reliance on large-sample assumptions and base inferences on the exact posterior


MCMC - Basic Idea
- Markov chain Monte Carlo (MCMC) provides an approach for generating samples from the posterior distribution
- Note that this does not give us an approximation to π(θ | y) directly
- However, from these samples we can obtain summaries of the posterior distribution for θ
- Summaries of exact posterior distributions of g(θ), for any functional g(·), can also be obtained.


How does MCMC work?
- Let θ^t = (θ^t_1, ..., θ^t_p) denote the value of the p × 1 vector of parameters at iteration t.
- θ^0 = initial value used to start the chain (results shouldn't be sensitive to it)
- MCMC generates θ^t from a distribution that depends on the data & potentially on θ^{t-1}, but not on θ^1, ..., θ^{t-2}.
- This results in a Markov chain with stationary distribution π(θ | y), under some conditions on the sampling distribution


Different flavors of MCMC
- The most commonly used MCMC algorithms are:
  - Metropolis sampling (Metropolis et al., 1953)
  - Metropolis-Hastings (MH) (Hastings, 1970)
  - Gibbs sampling (Geman & Geman, 1984; Gelfand & Smith, 1990)
- Easy overview of Gibbs: Casella & George (1992, The American Statistician, 46, 167-174)
- Easy overview of MH: Chib & Greenberg (1995, The American Statistician)


Gibbs Sampling
- Start with an initial value θ^0 = (θ^0_1, ..., θ^0_p)
- For iterations t = 1, ..., T:
  1. Sample θ^t_1 from the conditional posterior distribution π(θ_1 | θ_2 = θ^{t-1}_2, ..., θ_p = θ^{t-1}_p, y)
  2. Sample θ^t_2 from the conditional posterior distribution π(θ_2 | θ_1 = θ^t_1, θ_3 = θ^{t-1}_3, ..., θ_p = θ^{t-1}_p, y)
  3. Similarly, sample θ^t_3, ..., θ^t_p from the conditional posterior distributions given the current values of the other parameters.
- (A generic sketch of this loop appears below.)
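To make the structure of the loop concrete, here is a minimal, self-contained sketch (not from the lecture) for a toy target whose full conditionals are known exactly: a bivariate normal with correlation rho, where each conditional is a univariate normal.

```python
import numpy as np

def gibbs_bivariate_normal(rho, T=5000, theta0=(0.0, 0.0), seed=0):
    """Gibbs sampler for (theta1, theta2) ~ N(0, [[1, rho], [rho, 1]]).
    Each full conditional is N(rho * other, 1 - rho**2)."""
    rng = np.random.default_rng(seed)
    theta1, theta2 = theta0
    draws = np.empty((T, 2))
    sd = np.sqrt(1.0 - rho ** 2)
    for t in range(T):
        # Step 1: draw theta1 from its full conditional given the current theta2
        theta1 = rng.normal(rho * theta2, sd)
        # Step 2: draw theta2 from its full conditional given the new theta1
        theta2 = rng.normal(rho * theta1, sd)
        draws[t] = (theta1, theta2)
    return draws

samples = gibbs_bivariate_normal(rho=0.9)
print(samples[1000:].mean(axis=0), np.corrcoef(samples[1000:].T)[0, 1])
```

The same structure carries over to real models: only the two sampling lines change, into draws from the model's actual full conditional posteriors given the data.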


Gibbs Sampling (continued)
- Under mild regularity conditions, the samples converge to the stationary distribution π(θ | y)
- At the start of the sampling, the samples are not from the posterior distribution π(θ | y).
- It is necessary to discard the initial samples as a burn-in to allow convergence
- In simple models such as GLMs, convergence typically occurs quickly & a burn-in of 100 iterations should be sufficient (to be conservative, SAS uses 2,000 as the default)


Example - DDE & Preterm Birth
- Scientific interest: association between DDE exposure & preterm birth, adjusting for possible confounding variables
- Data from the US Collaborative Perinatal Project (CPP): n = 2380 children, of which 361 were born preterm
- Analysis: Bayesian analysis using a probit model

Probit Model
- y_i = 1 if preterm birth and y_i = 0 if full-term birth

  Pr(y_i = 1 | x_i, β) = Φ(x_i'β),

- x_i = (1, dde_i, x_i3, ..., x_i7)'
- x_i3, ..., x_i7 = possible confounders (black race, etc.)
- β_1 = intercept
- β_2 = slope (for dde)


Prior, Likelihood & Posterior
- Prior: π(β) = N(β_0, Σ_β)
- Likelihood:

  π(y | β, X) = ∏_{i=1}^{n} Φ(x_i'β)^{y_i} {1 − Φ(x_i'β)}^{1−y_i}

- Posterior: π(β | y, X) ∝ π(β) π(y | β, X).
- No closed form is available for the normalizing constant (the unnormalized log posterior is sketched below)
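For MCMC, all that is needed is this unnormalized posterior, i.e. the product of the prior and the likelihood above, conveniently evaluated on the log scale. A minimal sketch (variable names are illustrative, not from the lecture):

```python
import numpy as np
from scipy.stats import norm, multivariate_normal

def log_post_probit(beta, X, y, beta0, Sigma0):
    """Unnormalized log posterior for the probit model:
    log N(beta; beta0, Sigma0) prior + Bernoulli log likelihood with Pr(y=1) = Phi(x'beta)."""
    p = norm.cdf(X @ beta)
    p = np.clip(p, 1e-12, 1 - 1e-12)               # guard against log(0)
    log_lik = np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))
    log_prior = multivariate_normal.logpdf(beta, mean=beta0, cov=Sigma0)
    return log_prior + log_lik                      # normalizing constant L(y) is never needed
```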

Maximum Likelihood Results

  Parameter   MLE        SE        Z stat    p-value
  β1         -1.08068    0.04355   -24.816   < 2e-16
  β2          0.17536    0.02909     6.028   1.67e-09
  β3         -0.12817    0.03528    -3.633   0.000280
  β4          0.11097    0.03366     3.297   0.000978
  β5         -0.01705    0.03405    -0.501   0.616659
  β6         -0.08216    0.03576    -2.298   0.021571
  β7          0.05462    0.06473     0.844   0.398721

β2 = dde slope (highly significant increasing trend)


Bayesian Analysis - Prior Elicitation
- Ideally, read the literature on preterm birth → β_0 = best guess of β
- Should be possible (in particular) for the confounding coefficients
- Σ_0 expresses uncertainty - place high probability in a plausible range
- Much better than flat priors, which can yield implausible estimates!
- As a default, shrinkage-type prior we use N(0, 4 × I_{7×7})


Gibbs Sampling
- We choose β^0 = 0 as the starting value
- The MLEs or the asymptotic approximation to the posterior mean may provide a better default choice
- Results should not depend on the starting values, though for poor starting values you may need a longer burn-in
- For typical GLMs, such as probit models, convergence is rapid
- For illustration, we collected 1,000 iterations (one common sampler for this model is sketched below)
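The lecture does not spell out which full conditionals its probit Gibbs sampler uses. One standard choice is the data-augmentation scheme of Albert & Chib (1993), in which latent variables z_i ~ N(x_i'β, 1), truncated to match y_i, are introduced so that β has a multivariate normal full conditional; the sketch below assumes that scheme and uses illustrative names.

```python
import numpy as np
from scipy.stats import truncnorm

def probit_gibbs(X, y, beta0, Sigma0, T=1000, seed=0):
    """Data-augmentation Gibbs sampler for probit regression with a N(beta0, Sigma0) prior."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    prior_prec = np.linalg.inv(Sigma0)
    V = np.linalg.inv(prior_prec + X.T @ X)        # covariance of beta given the latent z
    V_chol = np.linalg.cholesky(V)
    beta = np.zeros(p)                             # starting value beta^0 = 0, as on the slide
    draws = np.empty((T, p))
    for t in range(T):
        # 1. Draw z_i ~ N(x_i'beta, 1) truncated to (0, inf) if y_i = 1 and to (-inf, 0) if y_i = 0.
        eta = X @ beta
        lower = np.where(y == 1, -eta, -np.inf)    # truncnorm bounds are standardized about loc=eta
        upper = np.where(y == 1, np.inf, -eta)
        z = truncnorm.rvs(lower, upper, loc=eta, scale=1.0, random_state=rng)
        # 2. Draw beta from its multivariate normal full conditional given z.
        mean = V @ (prior_prec @ beta0 + X.T @ z)
        beta = mean + V_chol @ rng.standard_normal(p)
        draws[t] = beta
    return draws
```

Each iteration needs only n univariate truncated-normal draws and one multivariate normal draw, which is consistent with the rapid convergence noted on the slide.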

Example - probit binary regression model

Posterior Summaries

  Parameter   Mean    Median   SD     95% credible interval
  β1         -1.08    -1.08    0.04   (-1.16, -1.01)
  β2          0.17     0.17    0.03   (0.12, 0.23)
  β3         -0.13    -0.13    0.04   (-0.2, -0.05)
  β4          0.11     0.11    0.03   (0.05, 0.18)
  β5         -0.02    -0.02    0.03   (-0.08, 0.05)
  β6         -0.08    -0.08    0.04   (-0.15, -0.02)
  β7          0.05     0.06    0.06   (-0.07, 0.18)

(Computation of these summaries from the MCMC output is sketched below.)
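Summaries like those in the table come directly from the retained draws. A minimal sketch, assuming `draws` is an (iterations × parameters) array from a sampler such as the one sketched earlier:

```python
import numpy as np

def summarize(draws, burn_in=100):
    """Posterior mean, median, SD and 95% credible interval for each column of draws."""
    kept = draws[burn_in:]
    mean = kept.mean(axis=0)
    median = np.median(kept, axis=0)
    sd = kept.std(axis=0, ddof=1)
    lo, hi = np.percentile(kept, [2.5, 97.5], axis=0)   # equal-tailed 95% credible interval
    for j in range(kept.shape[1]):
        print(f"beta{j+1}: {mean[j]:6.2f} {median[j]:6.2f} {sd[j]:5.2f} ({lo[j]:.2f}, {hi[j]:.2f})")
```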

Estimated Posterior Density


Inferences on Functionals
- Often, it is not the regression parameter which is of primary interest.
- One may want to estimate functionals, such as the mean at different values of a predictor.
- By applying the function to every iteration of the MCMC algorithm after burn-in, one can obtain samples from the marginal posterior density of the unknown of interest (see the sketch below).
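For the DDE example, one such functional is the dose-response curve Pr(y = 1 | dde). The sketch below applies g(β) = Φ(β_1 + β_2 · dde) to every retained draw and summarizes pointwise; the dde grid and the choice to hold the confounders at zero reference values are illustrative assumptions, not from the lecture.

```python
import numpy as np
from scipy.stats import norm

def dose_response(draws, dde_grid, burn_in=100):
    """Posterior mean and 95% interval of Phi(beta1 + beta2 * dde) over a grid of dde values,
    holding the remaining covariates at 0 (their reference value in this sketch)."""
    kept = draws[burn_in:]                                                 # (S, 7) retained draws
    # x'beta with x = (1, dde, 0, ..., 0): only the intercept and dde slope enter
    eta = kept[:, 0][:, None] + kept[:, 1][:, None] * dde_grid[None, :]    # (S, len(grid))
    prob = norm.cdf(eta)                                                   # g(beta), draw by draw
    mean = prob.mean(axis=0)
    lo, hi = np.percentile(prob, [2.5, 97.5], axis=0)
    return mean, lo, hi

# Example usage: dose_response(draws, np.linspace(0, 3, 50))
```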

Estimated Dose Response Function


Metropolis-Hastings Sampling
- Gibbs sampling requires sampling from the conditional posterior distributions
- Metropolis-Hastings is an alternative that avoids this restriction
- Again, start with an initial value θ^0 and sequentially update the parameters θ_1, ..., θ_p


Metropolis-Hastings (continued)
- To draw θ^t_j:
  1. Sample a candidate θ̃^t_j ~ q_j(· | θ^{t-1}_j)
  2. Let θ^t_j = θ̃^t_j with probability

     min{ 1, [ π(θ̃^t_j) L(y | θ_j = θ̃^t_j, −) q_j(θ^{t-1}_j | θ̃^t_j) ] / [ π(θ^{t-1}_j) L(y | θ_j = θ^{t-1}_j, −) q_j(θ̃^t_j | θ^{t-1}_j) ] },

     where L(y | θ_j = θ̃^t_j, −) is the likelihood given θ_j = θ̃^t_j and the current values of the other parameters
  3. Otherwise let θ^t_j = θ^{t-1}_j.
- (A random-walk version of this update is sketched below.)
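A minimal sketch of this update for the probit posterior defined earlier, using the symmetric random-walk proposal N(θ^{t-1}_j, κ) discussed on the next slide, so the q_j terms cancel from the ratio. The step size `kappa` and the reuse of `log_post_probit` from the earlier sketch are illustrative assumptions.

```python
import numpy as np

def metropolis_probit(X, y, beta0, Sigma0, kappa=0.05, T=5000, seed=0):
    """One-parameter-at-a-time random-walk Metropolis for the probit posterior.
    Uses log_post_probit() from the earlier sketch as the unnormalized log target."""
    rng = np.random.default_rng(seed)
    p = X.shape[1]
    beta = np.zeros(p)
    log_post = log_post_probit(beta, X, y, beta0, Sigma0)
    draws = np.empty((T, p))
    for t in range(T):
        for j in range(p):
            prop = beta.copy()
            prop[j] = beta[j] + np.sqrt(kappa) * rng.standard_normal()   # N(beta_j, kappa) proposal
            log_post_prop = log_post_probit(prop, X, y, beta0, Sigma0)
            # Symmetric proposal: accept with probability min{1, target(prop) / target(current)}
            if np.log(rng.uniform()) < log_post_prop - log_post:
                beta, log_post = prop, log_post_prop
        draws[t] = beta
    return draws
```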


Comments on Metropolis-Hastings
- Performance is sensitive to the proposal distributions q_j(· | θ^{t-1}_j)
- The most common proposal is N(θ^{t-1}_j, κ), which is centered on the previous value
- This results in a Metropolis random walk
- Inefficient if κ is too small or too large


Adaptive Rejection Sampling (ARS)
- ARS (Gilks & Wild, 1992): an approach to implementing Gibbs sampling for log-concave conditional distributions
- Uses sequentially defined envelopes around the target density, leading to some additional computational expense
- Log-concavity holds for most GLMs and typical priors
- When it is violated, adaptive rejection Metropolis sampling (ARMS) (Gilks et al., 1995) is used.


SAS Implementation
- BGENMOD, BLIFEREG, BPHREG all rely on ARS (when possible) or ARMS
- Hence, SAS uses Gibbs sampling for posterior computation
- Important to diagnose convergence & mixing whenever using MCMC!!


Some Terminology

I Convergence: initial drift in the samples towards a stationary distribution
I Burn-in: samples at the start of the chain that are discarded to allow for convergence
I Slow mixing: tendency for high autocorrelation in the samples
I Thinning: practice of keeping only every kth iteration to reduce autocorrelation
I Trace plot: plot of the sampled values of a parameter vs. iteration number (illustrated in the sketch below)
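
As a small illustration of these terms, the snippet below discards a burn-in, thins the remainder, and draws a trace plot for a single parameter. It uses a synthetic autocorrelated chain as a stand-in for real MCMC output (Python with NumPy/Matplotlib; all names and settings are illustrative).

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)

# Synthetic "MCMC output": an autocorrelated chain standing in for draws of one parameter.
T = 12_000
chain = np.empty(T)
chain[0] = 5.0                                   # deliberately poor starting value
for t in range(1, T):
    chain[t] = 0.9 * chain[t - 1] + rng.normal(scale=0.5)

B = 2_000                                        # burn-in: discard the initial drift
k = 5                                            # thinning interval
kept = chain[B:]                                 # samples used for inference
thinned = kept[::k]                              # keep only every k-th draw

# Trace plot: sampled value vs. iteration number.
fig, ax = plt.subplots(figsize=(8, 3))
ax.plot(np.arange(T), chain, lw=0.5)
ax.axvline(B, color="red", ls="--", label="end of burn-in")
ax.set_xlabel("iteration")
ax.set_ylabel("sampled value")
ax.legend()
fig.tight_layout()
plt.show()

print(f"{kept.size} post-burn-in draws, {thinned.size} after thinning by {k}")
```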

Example - trace plot with poor mixing

[trace plot figure not reproduced in this transcript]

Poor mixing Gibbs sampler

I Exhibits "snaking" behavior in the trace plot, with cyclic local trends in the mean
I Poor mixing in the Gibbs sampler is caused by high posterior correlation among the parameters (see the sketch below)
I Decreases efficiency: many more samples must be collected to keep the Monte Carlo error in posterior summaries low
I For a very poorly mixing chain, millions of iterations may even be needed
I Routinely examine trace plots!
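
A toy illustration of this point: a Gibbs sampler for a bivariate normal with correlation 0.99 updates one coordinate at a time from its full conditional, so the chain moves in tiny steps along the ridge, producing the snaking trace and high autocorrelation described above. This is a Python sketch with an illustrative target, not one of the samplers SAS runs.

```python
import numpy as np

rng = np.random.default_rng(2)
rho = 0.99                       # high posterior correlation between the two parameters
T = 5_000
theta = np.zeros((T, 2))

for t in range(1, T):
    t1_prev, t2_prev = theta[t - 1]
    # Full conditionals of a standard bivariate normal with correlation rho:
    t1 = rng.normal(rho * t2_prev, np.sqrt(1 - rho ** 2))
    t2 = rng.normal(rho * t1, np.sqrt(1 - rho ** 2))
    theta[t] = (t1, t2)

def lag1_autocorr(x):
    x = x - x.mean()
    return np.dot(x[:-1], x[1:]) / np.dot(x, x)

print("lag-1 autocorrelation of theta_1:", lag1_autocorr(theta[:, 0]))
# With rho = 0.99 this is close to 0.98; with rho = 0 it would be near 0,
# which is the difference between the "poor mixing" and "good mixing" trace plots.
```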

Example - trace plot with good mixing

[trace plot figure not reproduced in this transcript]

Convergence diagnostics

I Diagnostics are available to help decide on the number of burn-in & collected samples
I Note: there are no definitive tests of convergence & you should check convergence of all parameters
I With experience, visual inspection of trace plots is perhaps the most useful approach
I There are also a number of useful automated tests

Convergence diagnostics in SAS

I Gelman-Rubin: uses parallel chains with dispersed initial values to test convergence (a sketch of the statistic follows below)
I Geweke: applies a test of stationarity to a single chain
I Heidelberger-Welch (stationarity): alternative to Geweke
I Heidelberger-Welch (halfwidth): is the # of samples adequate for estimation of the posterior mean?
I Raftery-Lewis: # of samples needed for a desired accuracy in estimating percentiles
I Autocorrelation: high values indicate slow mixing
I Effective sample size: a low value relative to the actual # of samples indicates slow mixing
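
As referenced in the first bullet, a minimal version of the Gelman-Rubin potential scale reduction factor for one parameter, computed from several parallel chains. This is a Python sketch of the textbook formula; SAS reports the statistic automatically, so the snippet only shows what goes into the number.

```python
import numpy as np

def gelman_rubin(chains):
    """Potential scale reduction factor for one parameter.

    chains: array of shape (m, n) -- m parallel chains of n post-burn-in draws,
    started from dispersed initial values.
    """
    chains = np.asarray(chains, dtype=float)
    m, n = chains.shape
    chain_means = chains.mean(axis=1)
    W = chains.var(axis=1, ddof=1).mean()          # within-chain variance
    B = n * chain_means.var(ddof=1)                # between-chain variance
    var_hat = (n - 1) / n * W + B / n              # pooled posterior-variance estimate
    return np.sqrt(var_hat / W)                    # values near 1 suggest convergence

# Quick check on chains that really do share one stationary distribution:
rng = np.random.default_rng(3)
fake_chains = rng.normal(size=(4, 5_000))
print(gelman_rubin(fake_chains))                   # close to 1.0
```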

Practical advice on convergence diagnosis

I The Gelman-Rubin approach is quite appealing in its use of multiple chains
I Geweke & Heidelberger-Welch sometimes reject even when the trace plots look good
I They can be overly sensitive to minor departures from stationarity that do not impact inferences
I Sometimes this can be solved with more iterations
I Otherwise, you may want to try multiple chains
I For the models considered in SAS, chains tend to be very well behaved when the MLE exists or the priors are informative

How to summarize results from the MCMC chain?

I Posterior mean: θ̂ = (1/(T − B)) ∑_{t=B+1}^{T} θ^{(t)}, with B = # of burn-in samples and T = total # of samples
I The posterior mean is the most commonly used point estimate and provides an alternative to the MLE (note: the posterior mode is difficult to estimate accurately from MCMC)
I The posterior median (50th percentile of {θ^{(t)}}_{t=B+1}^{T}) provides an alternative point estimate
I The posterior standard deviation is calculated as the square root of

    v̂ar(θ_j | y) = (1/(T − B − 1)) ∑_{t=B+1}^{T} (θ_j^{(t)} − θ̂_j)²

  (computing these summaries from saved draws is sketched below)
I As n increases, we obtain π(θ_j | y) ≈ N(θ_j; θ̂_j, v̂ar(θ_j | y))
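
These summaries are simple sample statistics of the saved draws. A short Python sketch mirroring the formulas above; the `theta` array standing in for saved draws of one parameter is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(4)
theta = rng.normal(loc=2.4, scale=0.6, size=12_000)   # stand-in for saved draws of one parameter

B = 2_000                                             # burn-in samples to discard
kept = theta[B:]                                      # theta^(t), t = B+1, ..., T

post_mean = kept.mean()                               # (1/(T-B)) * sum of theta^(t)
post_median = np.percentile(kept, 50)                 # 50th percentile of the draws
post_sd = kept.std(ddof=1)                            # sqrt of (1/(T-B-1)) * sum (theta^(t) - mean)^2

print(f"mean = {post_mean:.3f}, median = {post_median:.3f}, sd = {post_sd:.3f}")
```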

Interval estimates

I As a Bayesian alternative to the confidence interval, one can use a credible interval
I The 100(1 − α)% credible interval ranges from the α/2 to the 1 − α/2 percentiles of {θ^{(t)}}_{t=B+1}^{T}
I A highest posterior density (HPD) interval can also be calculated: the smallest interval containing the parameter with posterior probability 1 − α (both are sketched below)
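
Both intervals are easy to read off the draws: the equal-tail interval is a pair of percentiles, and the HPD interval can be approximated by scanning all intervals that contain a fraction 1 − α of the sorted draws and keeping the shortest. The sketch below uses a hypothetical vector of post-burn-in draws; the HPD routine is one common sample-based approximation, not the exact procedure SAS uses.

```python
import numpy as np

rng = np.random.default_rng(5)
kept = rng.normal(loc=2.4, scale=0.6, size=10_000)     # post-burn-in draws (stand-in)
alpha = 0.05

# Equal-tail 95% credible interval: alpha/2 and 1 - alpha/2 percentiles.
ci = np.percentile(kept, [100 * alpha / 2, 100 * (1 - alpha / 2)])

def hpd_interval(draws, alpha=0.05):
    """Shortest interval containing a fraction 1 - alpha of the draws."""
    sorted_draws = np.sort(draws)
    n = len(sorted_draws)
    k = int(np.floor((1 - alpha) * n))                 # number of draws inside the interval
    widths = sorted_draws[k:] - sorted_draws[: n - k]  # width of every candidate interval
    i = np.argmin(widths)
    return sorted_draws[i], sorted_draws[i + k]

print("95% CI :", ci)
print("95% HPD:", hpd_interval(kept, alpha))
```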

Posterior probabilities

I Often interest focuses on the weight of evidence for H1 : θ_j > 0
I One can use the estimated posterior probability

    P̂r(θ_j > 0 | data) = (1/(T − B)) ∑_{t=B+1}^{T} 1(θ_j^{(t)} > 0),

  with 1(θ_j^{(t)} > 0) = 1 if θ_j^{(t)} > 0 and 0 otherwise
I A high value (e.g., greater than 0.95) suggests strong evidence in favor of H1 (see the one-line computation below)
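
As flagged above, the estimate is just the proportion of post-burn-in draws that land above zero; a one-line Python computation on a hypothetical array of draws:

```python
import numpy as np

rng = np.random.default_rng(6)
theta_j = rng.normal(loc=2.4, scale=0.6, size=10_000)   # post-burn-in draws of theta_j (stand-in)

prob_positive = (theta_j > 0).mean()                     # mean of the indicator 1(theta_j^(t) > 0)
print(f"Pr(theta_j > 0 | data) ~= {prob_positive:.3f}")
```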

Marginal posterior density estimation

I Summary statistics such as the mean, median, standard deviation, etc. provide an incomplete picture
I Since we have many samples from the posterior, we can accurately estimate the exact posterior density
I This can be done by applying a kernel-smoothed density estimate to the samples {θ_j^{(t)}}_{t=B+1}^{T} (sketched below)
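
A kernel-smoothed estimate of a marginal posterior, here via SciPy's Gaussian KDE applied to a hypothetical vector of post-burn-in draws. This is only a sketch of the idea; SAS produces the corresponding density plots itself.

```python
import numpy as np
from scipy.stats import gaussian_kde
import matplotlib.pyplot as plt

rng = np.random.default_rng(7)
draws = rng.normal(loc=3.1, scale=0.27, size=10_000)   # {theta_j^(t)} after burn-in (stand-in)

kde = gaussian_kde(draws)                              # kernel-smoothed density estimate
grid = np.linspace(draws.min(), draws.max(), 400)

plt.plot(grid, kde(grid))
plt.xlabel(r"$\theta_j$")
plt.ylabel("estimated posterior density")
plt.tight_layout()
plt.show()
```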

Illustration - linear regression

I Lewis & Taylor (1967) - study of weight (y_i) in 237 students
I The model is as follows:

    y_i = β0 + β1 x_{1i} + β2 x_{2i} + β3 x_{3i} + ε_i,   i = 1, . . . , 237,

  where
  x_{1i} = height in feet − 5 feet
  x_{2i} = age in years − 16
  x_{3i} = 1 for males, 0 for females
I Implemented in SAS Proc BGENMOD - 2,000 burn-in & 10,000 collected samples (a conjugate Gibbs sketch of the same model appears below)
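
To show what the posterior computation involves, here is a conjugate Gibbs sampler for the same normal linear model, alternating between the full conditional of the coefficients and of the error precision. This is a Python sketch on synthetic data with the same covariate structure; it is not the ARS-based sampler SAS actually runs, the prior settings (β ~ N(0, 100 I), τ ~ Gamma(0.01, 0.01)) are my assumptions, and the Lewis & Taylor data themselves are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(8)

# --- synthetic data with the same structure as the illustration (made-up values) ---
n = 237
height = rng.uniform(-0.5, 1.0, n)              # height in feet minus 5
age = rng.uniform(-3.0, 3.0, n)                 # age in years minus 16
male = rng.integers(0, 2, n).astype(float)      # 1 for males, 0 for females
X = np.column_stack([np.ones(n), height, age, male])
beta_true = np.array([96.0, 3.0, 2.4, -0.3])
y = X @ beta_true + rng.normal(scale=12.0, size=n)

# --- assumed priors: beta ~ N(0, 100 I), tau ~ Gamma(0.01, 0.01) ---
P0 = np.eye(4) / 100.0                          # prior precision of beta
a0, b0 = 0.01, 0.01                             # shape and rate for the precision tau

# --- Gibbs sampler: 2,000 burn-in + 10,000 collected, as on the slide ---
B, T = 2_000, 12_000
beta, tau = np.zeros(4), 1.0
keep_beta, keep_tau = np.empty((T, 4)), np.empty(T)
XtX, Xty = X.T @ X, X.T @ y

for t in range(T):
    # beta | tau, y  ~  N(m, V) with V = (P0 + tau X'X)^(-1), m = V (tau X'y)
    V = np.linalg.inv(P0 + tau * XtX)
    m = V @ (tau * Xty)
    beta = rng.multivariate_normal(m, V)
    # tau | beta, y  ~  Gamma(a0 + n/2, rate = b0 + 0.5 * ||y - X beta||^2)
    resid = y - X @ beta
    tau = rng.gamma(a0 + n / 2.0, 1.0 / (b0 + 0.5 * resid @ resid))
    keep_beta[t], keep_tau[t] = beta, tau

print("posterior means of beta:", keep_beta[B:].mean(axis=0).round(3))
print("posterior mean of tau  :", keep_tau[B:].mean().round(5))
```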

Output and diagnostics - intercept (β0), height (β1), age (β2), male (β3)

[per-parameter diagnostic plots not reproduced in this transcript]

Mixing - Autocorrelation in MCMC samples

Parameter    Lag 1      Lag 5      Lag 10     Lag 50
Intercept    0.5489     0.0114    -0.0107     0.0009
height       0.5166    -0.0124     0.0112     0.0042
age          0.4634    -0.0068    -0.0038     0.0032
male         0.5613     0.0294    -0.0170     0.0017
Precision   -0.0039    -0.0088    -0.0042     0.0018

Conclusion: very good mixing (the lag-k computation is sketched below)
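
The lag-k autocorrelations in the table are simple sample statistics of each parameter's chain; a Python sketch of the computation on a hypothetical chain:

```python
import numpy as np

def autocorr(chain, lag):
    """Sample lag-k autocorrelation of a single parameter's MCMC draws."""
    x = np.asarray(chain, dtype=float)
    x = x - x.mean()
    return np.dot(x[:-lag], x[lag:]) / np.dot(x, x)

rng = np.random.default_rng(9)
chain = rng.normal(size=10_000)                                 # stand-in for one parameter's draws
print([round(autocorr(chain, k), 4) for k in (1, 5, 10, 50)])   # near zero: good mixing
```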

Tests of convergence

                Gelman-Rubin            Geweke
Parameter    Estimate    97.5%       z         Pr > |z|
Intercept    1.0000      1.0002      0.5871    0.5572
height       1.0004      1.0013      1.7153    0.0863
age          1.0003      1.0012     -1.3831    0.1666
male         1.0001      1.0005     -1.2658    0.2056
Precision    1.0003      1.0010      2.4947    0.0126

Gelman-Rubin: values ≈ 1 suggest convergence
Geweke: convergence suggested except for the precision
Heidelberger-Welch: all parameters passed

Output and diagnostics - precision (τ)

[diagnostic plots not reproduced in this transcript]

Number of samples sufficient?

I Raftery-Lewis: 3,746 samples needed for +/- 0.005 accuracy in estimating the 0.025 quantile, so 10,000 is a sufficient number
I Heidelberger-Welch: 10,000 samples sufficient for accurate estimation of the mean, except for the male coefficient
I Effective sample size: ranged from 3,033.5 to 3,740.2 for the regression coefficients
I That is, the 10,000 Gibbs samples contain about as much information as 3,033.5-3,740.2 independent draws would (a sketch of the effective sample size computation follows below)
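
One common way to estimate the effective sample size is N / (1 + 2 Σ_k ρ_k), summing the lag-k autocorrelations until they die out. The Python sketch below uses a crude cut-off rule for illustration; different packages (and SAS) use their own, more careful truncation rules.

```python
import numpy as np

def effective_sample_size(chain, max_lag=200):
    """Rough ESS estimate: N / (1 + 2 * sum of positive lag-k autocorrelations)."""
    x = np.asarray(chain, dtype=float)
    x = x - x.mean()
    n = x.size
    denom = np.dot(x, x)
    rho_sum = 0.0
    for k in range(1, max_lag + 1):
        rho = np.dot(x[:-k], x[k:]) / denom
        if rho < 0.0:                 # crude cut-off once autocorrelation dies out
            break
        rho_sum += rho
    return n / (1.0 + 2.0 * rho_sum)

# AR(1)-style chain with lag-1 autocorrelation near 0.5, similar to the table above.
rng = np.random.default_rng(10)
chain = np.zeros(10_000)
for t in range(1, chain.size):
    chain[t] = 0.5 * chain[t - 1] + rng.normal()
print(round(effective_sample_size(chain)))   # roughly a third of 10,000
```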

Posterior summaries

10,000 samples
Parameter    Mean      SD         95% CI                95% HPD
Intercept    96.155    1.138      [93.906, 98.352]      [93.866, 98.294]
height       3.103     0.272      [2.576, 3.642]        [2.550, 3.611]
age          2.390     0.566      [1.272, 3.492]        [1.282, 3.498]
male         -0.280    1.601      [-3.3601, 2.948]      [-3.344, 2.961]
precision    0.0071    0.00066    [0.0058, 0.0084]      [0.0058, 0.0084]

50,000 samples
Intercept    96.207    1.145      [93.968, 98.457]      [93.997, 98.482]
height       3.107     0.267      [2.581, 3.627]        [2.574, 3.619]
age          2.375     0.562      [1.265, 3.467]        [1.268, 3.470]
male         -0.353    1.605      [-3.495, 2.825]       [-3.451, 2.863]
precision    0.0071    0.00065    [0.0059, 0.0084]      [0.0058, 0.0084]

Convergence Diagnostics (50,000 samples)

I Gelman-Rubin: maximum 97.5% upper bound of 1.0001
I Geweke: minimum p-value of 0.4716
I Heidelberger-Welch: passed for all parameters
I Conclusion: for the longer chain, no evidence of a lack of convergence

Discussion

I The overall picture suggests convergence, good mixing, and a sufficient number of collected samples
I Don't take rejection by one convergence test too seriously if the trace plot looks good
I Such a rejection motivates collecting additional samples to make sure inferences do not change