Bayesian Statistics: Model Uncertainty & Missing Data

David Dunson
National Institute of Environmental Health Sciences, NIH
March 1, 2007

Outline

Introduction to Bayesian Statistics
- Basic Definitions
- Posterior Computation via MCMC
- Epidemiologic Application

Model Uncertainty
- Formulation of Problem
- Variable Selection & Stochastic Search
- Epidemiologic Application

Missing Data
- General Formulation
- Posterior Computation

Concluding Remarks

Illustration: Patient Diagnoses

- For the past two weeks, Sue has been feeling weak and has had nausea.
- Although she suspects a stomach virus, she visits the doctor because the symptoms have been persisting.
- The doctor also suspects a virus, but collects blood samples and orders several tests to verify that there aren't more serious problems.
- The tests come back, and Sue has an abnormally low white cell count.

Updating Subjective Probabilities

- Formalizing this problem statistically, let D = (D_1, ..., D_K)', with D_k = 1 if disease k is the cause of Sue's symptoms.
- D_1 = 1 if Sue has a virus or bacterial infection, D_2 = 1 if cancer, D_3 = 1 if a parasitic infection of type 1, etc.
- During the first few days of her illness, Sue estimated her probability of a virus or bacterial infection as Pr(D_1 = 1) = π_1(0) > 0.99.
- After two weeks, her estimated probability gradually decreased to π_1(2) = 0.95.
- With the abnormally low white cell count test, this probability decreased further to π_1(3) = 0.90.

Updating Subjective Probabilities

- We can formalize the process by which Sue's value of π_1 changes using Bayes' rule:

  \pi_1(t) = \frac{\pi_1(0)\, L(\text{data}_t \mid D_1 = 1)}{\pi_1(0)\, L(\text{data}_t \mid D_1 = 1) + \{1 - \pi_1(0)\}\, L(\text{data}_t \mid D_1 = 0)}

- t = 0 corresponds to the time of symptom onset
- π_1(0) is Sue's guess at the probability of D_1 = 1 at t = 0
- L(data_t | D = d) = likelihood of the data at time t given D = d
- "data" = ongoing symptoms, test results, input from Sue's physician, reading on the web, etc.
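The Bayes' rule update above can be run numerically. A minimal Python sketch; the likelihood values below are hypothetical numbers chosen only to illustrate how persisting symptoms and a worrisome test result pull the probability down:

```python
# Bayes' rule update of Sue's probability that D1 = 1 (virus/bacterial infection).
def update(prior, lik_d1, lik_not_d1):
    """Posterior Pr(D1 = 1 | data_t) from the prior and the two likelihood values."""
    num = prior * lik_d1
    return num / (num + (1 - prior) * lik_not_d1)

# Hypothetical likelihoods: the accumulating data are somewhat less likely under
# a simple virus than under the alternatives, so the probability drifts downward.
p = 0.99                  # pi_1(0), Sue's initial guess
p = update(p, 0.6, 0.8)   # after two weeks of persisting symptoms
p = update(p, 0.3, 0.5)   # after the low white-cell-count result
print(round(p, 3))
```

When the two likelihood values are equal the data carry no information and the prior is returned unchanged; the asymmetry is what drives the update.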

Learning from Sue

- For Sue, this updating process continued until an astute physician diagnosed a parasitic infection after several months.
- Most medical diagnoses proceed in a similar manner, with the patient's & physician's probabilities updated through Bayesian learning.
- Scientific research evolves in a similar manner, with prior insights updated as new data become available.
- Bayesian statistics seeks to formalize the process of learning through the accrual of evidence from different sources.

Illustration: Normal Linear Regression

- Suppose we collect data consisting of a response, y_i, and predictors x_i = (x_{i1}, ..., x_{ip})', for subjects i = 1, ..., n.
- For example, y_i may be birth weight, with x_i factors potentially predictive of birth weight.
- Assuming y_i is normally distributed conditionally on x_i, we have the likelihood function:

  L(\mathbf{y}; \theta, \mathbf{X}) = \prod_{i=1}^{n} (2\pi\sigma^2)^{-1/2} \exp\left\{ -\frac{1}{2\sigma^2} (y_i - \mathbf{x}_i'\boldsymbol\beta)^2 \right\},

  with θ = (β, σ²).

Classical Model Fitting and Inferences

- Often, interest focuses on inferences on the regression coefficients β.
- To estimate β, the standard approach is maximum likelihood estimation, which results in the least squares estimator:

  \hat{\boldsymbol\beta} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y},

  with X = (x_1, ..., x_n)' and y = (y_1, ..., y_n)'.
- One can obtain confidence intervals for β_j and test whether β_j = 0 using standard approaches.
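As a quick sketch of the classical computation, on simulated data (the sample size, coefficients, and noise level here are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 3
# Design matrix: intercept column plus two simulated predictors.
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
beta_true = np.array([1.0, 2.0, -0.5])
y = X @ beta_true + rng.normal(scale=0.5, size=n)

# Under the normal model, the MLE is least squares: beta_hat = (X'X)^{-1} X'y.
# lstsq solves the normal equations in a numerically stable way.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.round(beta_hat, 2))  # close to beta_true
```

With n = 200 and moderate noise the estimates land near the generating coefficients, which is the behavior the confidence intervals and tests mentioned above quantify.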

Prior Knowledge

- In most applications, prior knowledge is available about θ before observing the data in the current study.
- For example, in investigating factors predictive of birth weight, one can rely on a rich literature from previous studies.
- The classical paradigm relies entirely upon the current data → often not enough information to obtain accurate estimates of all parameters.

Bayesian Paradigm

- Bayesians instead choose a prior distribution to quantify the state of knowledge about θ before observing the current data.
- The prior distribution, π(θ), effectively treats the parameters as random variables.
- Inferences are then based on the posterior distribution, which updates the prior with the likelihood from the current study:

  \pi(\theta \mid \mathbf{y}, \mathbf{X}) = \frac{\pi(\theta)\, L(\mathbf{y}; \theta, \mathbf{X})}{\int \pi(\theta)\, L(\mathbf{y}; \theta, \mathbf{X})\, d\theta},

  with L(\mathbf{y}; \mathbf{X}) = \int \pi(\theta)\, L(\mathbf{y}; \theta, \mathbf{X})\, d\theta known as the marginal likelihood.

Posterior Calculations for Linear Regression

- For normal linear regression models and conjugate priors, we can calculate the posterior distribution analytically.
- Suppose π(β) = N_p(β; β_0, Σ_β), with β_0 the prior mean and Σ_β the prior covariance.
- Then, the conditional posterior of β given y, X, σ² is:

  \pi(\boldsymbol\beta \mid \mathbf{y}, \mathbf{X}, \sigma^2) = N(\hat{\boldsymbol\beta}, \hat{\Sigma}_\beta),
  \quad \hat{\boldsymbol\beta} = \hat{\Sigma}_\beta \left( \Sigma_\beta^{-1} \boldsymbol\beta_0 + \sigma^{-2} \mathbf{X}'\mathbf{y} \right),
  \quad \hat{\Sigma}_\beta = \left( \Sigma_\beta^{-1} + \sigma^{-2} \mathbf{X}'\mathbf{X} \right)^{-1}.

  Note: β̂ → MLE as n increases and as the prior variance increases.
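These closed-form updates translate directly into code. A sketch under assumed values (simulated data, σ² treated as known, and a deliberately vague prior), showing the posterior mean approaching the least squares estimate as the note above describes:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 100, 2
X = np.column_stack([np.ones(n), rng.normal(size=n)])  # intercept + one predictor
sigma2 = 1.0                                           # treated as known here
y = X @ np.array([0.5, 1.5]) + rng.normal(scale=np.sqrt(sigma2), size=n)

beta0 = np.zeros(p)             # prior mean
Sigma_beta = 100.0 * np.eye(p)  # large prior variance -> weak prior

# Conditional posterior of beta given y, X, sigma^2:
#   Sigma_hat = (Sigma_beta^{-1} + X'X / sigma^2)^{-1}
#   beta_hat  = Sigma_hat (Sigma_beta^{-1} beta0 + X'y / sigma^2)
prior_prec = np.linalg.inv(Sigma_beta)
Sigma_hat = np.linalg.inv(prior_prec + X.T @ X / sigma2)
beta_hat = Sigma_hat @ (prior_prec @ beta0 + X.T @ y / sigma2)

ols, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.round(beta_hat, 3), np.round(ols, 3))  # nearly identical under a vague prior
```

Shrinking Sigma_beta toward zero instead pulls beta_hat toward the prior mean beta0, which is the mechanism behind the shrinkage priors discussed later.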

Prior Elicitation - Different Schools of Thought

- Subjective Bayes: informative priors should be used that accurately describe your prior uncertainty in the parameters.
- Objective Bayes: non-informative or default priors should be used to obtain statistical procedures with good properties.
- Pragmatic Bayes: the Bayes machinery is very useful for addressing complex problems & (hopefully) results are robust to the priors.

Shrinkage Priors

- Although non-informative priors are widely used, shrinkage priors often have better performance (e.g., lower mean square error) (MacLehose et al., 2007, Epidemiology).
- By choosing a prior centered at zero for the coefficients, one tends to obtain more stable estimates, limiting over-fitting and multicollinearity problems.
- There is a rich theoretical literature providing motivation for shrinkage.
- From an applied perspective, it is a good idea to choose priors that assign low probability outside of a plausible range for the parameters.
- MLEs have problems when there is limited information about certain parameters.

What about more Complex Settings?

- Bayes & frequentist inferences under a single model tend to be similar in simple settings (e.g., linear regression with a modest number of predictors & ample sample size).
- The advantages of Bayes are more apparent in complex settings: model uncertainty, missing data, large numbers of parameters, etc.
- Outside of simple models, posterior computation typically relies on Markov chain Monte Carlo (MCMC) algorithms.

MCMC - The Basic Idea

- MCMC algorithms rely on randomly sampling the model parameters in a special way, so that the samples converge in distribution to a target distribution, which is the true joint posterior distribution.
- Flavors of MCMC:
  - Gibbs Sampling (Gelfand and Smith, 1990): sequentially samples from the full conditional posterior distributions of each of the parameters.
  - Metropolis-Hastings (Hastings, 1970): samples a candidate for a parameter from a proposal density and accepts this candidate with a specific probability.
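As a concrete illustration of the second flavor, here is a minimal random-walk Metropolis-Hastings sketch for a single parameter. The target is a toy normal-mean posterior, not any model from these slides, and the data values are made up:

```python
import math
import random

random.seed(0)
data = [1.2, 0.7, 1.9, 1.1, 0.4, 1.6]  # toy observations, modeled as N(mu, 1)

def log_post(mu):
    # N(0, 10^2) prior on mu plus the normal log likelihood (constants dropped)
    return -mu**2 / (2 * 10**2) - sum((x - mu)**2 for x in data) / 2

mu, samples = 0.0, []
for _ in range(5000):
    cand = mu + random.gauss(0, 0.5)  # candidate from a random-walk proposal
    # Accept with probability min(1, posterior ratio); the symmetric proposal
    # density cancels from the Metropolis-Hastings acceptance ratio.
    if math.log(random.random()) < log_post(cand) - log_post(mu):
        mu = cand
    samples.append(mu)

post_mean = sum(samples[1000:]) / len(samples[1000:])  # discard burn-in
print(round(post_mean, 2))  # close to the sample mean of the data
```

Because the prior is nearly flat relative to the likelihood, the chain's long-run average sits near the sample mean; only the acceptance step, not the proposal scale, determines the target distribution.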

Application to Study of DDE & Preterm Birth

- Scientific Interest: Association between DDE exposure & preterm birth, adjusting for possible confounding variables.
- Data from the US Collaborative Perinatal Project (CPP): n = 2380 children, of whom 361 were born preterm.
- Analysis: Bayesian analysis using a probit model.

OutlineIntroduction to Bayesian Statistics

Model UncertaintyMissing Data

Concluding Remarks

Basic DefinitionsPosterior Computation via MCMCEpidemiologic Application

Probit Model for Risk of Preterm Birth

I Let yi = 1 if preterm birth and yi = 0 if full-term birth

David Dunson Bayesian Statistics: Model Uncertainty & Missing Data

Probit Model for Risk of Preterm Birth

I Let yi = 1 if preterm birth and yi = 0 if full-term birth
I Probit model:

Pr(yi = 1 | xi, β) = Φ(x′iβ),

where Φ(·) is the standard normal distribution function
I xi = (1, ddei, xi3, . . . , xi7)
I xi3, . . . , xi7 represent possible confounders
I β1 = intercept
I β2 = dde slope
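As a minimal sketch of the model above (the coefficients and covariate value are made up for illustration), the probit probability is just the standard normal CDF evaluated at the linear predictor:

```python
import numpy as np
from scipy.stats import norm

# Hypothetical coefficients: intercept beta1 and dde slope beta2
beta = np.array([-1.1, 0.2])
x = np.array([1.0, 1.5])  # design vector (1, dde_i)

# Pr(y_i = 1 | x_i, beta) = Phi(x_i' beta)
p = norm.cdf(x @ beta)
print(round(p, 3))
```

A positive β2 therefore shifts the linear predictor upward and raises the predicted probability of preterm birth.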

Bayesian Analysis: Prior, Likelihood & Posterior

I Prior: π(β) = N(β0, Σβ)
I Likelihood:

L(y; β, X) = ∏_{i=1}^n Φ(x′iβ)^{yi} {1 − Φ(x′iβ)}^{1−yi}

I Posterior: π(β | y, X) ∝ π(β) L(y; β, X).
I No closed form is available for the normalizing constant

Posterior Computation using Data Augmentation

I The full conditional posterior distributions needed for Gibbs sampling are not automatically available
I However, we can rely on a very useful data augmentation trick proposed by Albert and Chib (1993):
I Augment the observed data {yi, xi} with latent zi.
I The probit model can then be expressed in hierarchical form as follows:

yi = 1(zi > 0)
zi ∼ N(x′iβ, 1)

I Marginalizing out zi, we obtain Pr(yi = 1 | xi, β) = Φ(x′iβ).

Gibbs Sampling Steps

I Gibbs sampling relies on alternately sampling from the full conditional posterior distributions of the unknown parameters
I After data augmentation, the unknowns include the latent data {zi} and the regression parameters β
I Full conditional posterior distributions:

1. π(zi | y, X, β) = N(x′iβ, 1) truncated below by zero if yi = 1 and above by zero if yi = 0.
2. π(β | z, y, X) = Np(β̂, Σ̂β), with Σ̂β = (Σβ⁻¹ + X′X)⁻¹ and β̂ = Σ̂β(Σβ⁻¹β0 + X′z).
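The two full conditional steps above can be sketched as a short sampler. This is an illustrative implementation of the Albert and Chib (1993) scheme, not the code used for the talk's analysis; the function name and arguments are my own:

```python
import numpy as np
from scipy.stats import truncnorm

def probit_gibbs(y, X, beta0, Sigma_beta, n_iter=500, seed=0):
    """Data-augmentation Gibbs sampler for probit regression
    (Albert & Chib, 1993). Returns an array of beta draws."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    Sigma_inv = np.linalg.inv(Sigma_beta)
    # The conditional posterior covariance of beta given z does not depend on z
    V = np.linalg.inv(Sigma_inv + X.T @ X)
    beta = np.zeros(p)
    draws = np.empty((n_iter, p))
    for t in range(n_iter):
        mu = X @ beta
        # Step 1: z_i ~ N(x_i'beta, 1) truncated to (0, inf) if y_i = 1
        # and to (-inf, 0) if y_i = 0; truncnorm takes standardized bounds
        lo = np.where(y == 1, -mu, -np.inf)
        hi = np.where(y == 1, np.inf, -mu)
        z = mu + truncnorm.rvs(lo, hi, random_state=rng)
        # Step 2: beta ~ N_p(beta_hat, V), beta_hat = V (Sigma^-1 beta0 + X'z)
        beta_hat = V @ (Sigma_inv @ beta0 + X.T @ z)
        beta = rng.multivariate_normal(beta_hat, V)
        draws[t] = beta
    return draws
```

Both conditionals are conjugate, so every iteration is an exact draw; no Metropolis accept/reject step is needed.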

Gibbs Sampling Implementation

I To implement Gibbs sampling, we simply iteratively sample from these full conditional posterior distributions a large number of times
I An initial burn-in is discarded to allow convergence to the stationary distribution
I Inferences can be based on posterior summaries calculated using the draws from the joint posterior distribution.
I WinBUGS provides an easy-to-use & free software package for implementing Gibbs sampling for complex models.

Returning to the DDE and Premature Birth Application

I We chose a normal prior, π(β) = N7(β; 0, 4 × I7×7) (motivated by shrinkage considerations)
I Choosing β = 0 as the starting value, we ran the Gibbs sampler for 1,000 iterations, discarding the first 100 as a burn-in
I In general, more samples should be taken for complex models

Gibbs Sampling Trace Plots

Convergence and Mixing Issues

I Whenever MCMC algorithms are used, trace plots of the different parameters should be carefully examined.
I In complex models, convergence and mixing are often of concern.
I Slow convergence - the chain takes a long time to converge to the stationary distribution, so that a long burn-in is needed.
I Slow mixing - high autocorrelation in the samples even after convergence, so that a very large number of samples is needed to reduce Monte Carlo error.
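To make slow mixing concrete, here is a small sketch (illustrative, not part of the talk) that estimates lag-1 autocorrelation and a crude effective sample size from a chain of draws, using an AR(1) approximation:

```python
import numpy as np

def lag1_autocorr(chain):
    """Sample lag-1 autocorrelation of a 1-D MCMC chain."""
    c = chain - chain.mean()
    return float(c[:-1] @ c[1:] / (c @ c))

def crude_ess(chain):
    """Crude effective sample size under an AR(1) approximation:
    ESS ~= N * (1 - rho) / (1 + rho)."""
    rho = lag1_autocorr(chain)
    return len(chain) * (1 - rho) / (1 + rho)

rng = np.random.default_rng(1)
# An AR(1) series with high autocorrelation mimics a slowly mixing sampler
x = np.empty(5000)
x[0] = 0.0
for t in range(1, 5000):
    x[t] = 0.95 * x[t - 1] + rng.normal()
print(lag1_autocorr(x), crude_ess(x))  # ESS is far below the 5000 raw draws
```

A chain like this delivers far fewer effectively independent draws than its raw length, which is exactly why slowly mixing samplers need many more iterations to control Monte Carlo error.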

Estimated Posterior Densities

Posterior Summaries of Regression Parameters

Parameter   Mean    Median   SD     95% credible interval
β1         -1.08    -1.08    0.04   (-1.16, -1.01)
β2          0.17     0.17    0.03   (0.12, 0.23)
β3         -0.13    -0.13    0.04   (-0.2, -0.05)
β4          0.11     0.11    0.03   (0.05, 0.18)
β5         -0.02    -0.02    0.03   (-0.08, 0.05)
β6         -0.08    -0.08    0.04   (-0.15, -0.02)
β7          0.05     0.06    0.06   (-0.07, 0.18)

Maximum Likelihood Results

Parameter   MLE     SE     Z stat   p-value
β1         -1.08    0.04   -24.8    < 2e-16
β2          0.18    0.03    6.03    1.67e-09
β3         -0.13    0.04   -3.63    0.0003
β4          0.11    0.03    3.30    0.001
β5         -0.02    0.03   -0.501   0.617
β6         -0.08    0.04   -2.30    0.022
β7          0.05    0.06    0.844   0.399

β2 = dde slope (highly significant increasing trend)

Fitting Bayesian GLMs in SAS

I We repeated our Bayesian analysis using BGENMOD (a new SAS proc for Bayesian analysis of GLMs).
I Very simple to implement in a few lines of code, and it gave identical results to our R implementation (different MCMC implementation)
I Automatically outputs posterior summaries, trace plots, convergence diagnostics, etc.
I UNC Bayes in SAS Conference, May 17-18 (www.sph.unc.edu/bios for details)

Model Uncertainty: Formulation of Problem / Variable Selection & Stochastic Search / Epidemiologic Application

Suppose the true model is unknown

I In the DDE application, we assumed that we knew in advance that the probit model with pre-specified predictors was appropriate.
I There is typically substantial uncertainty in the model & it is more realistic to suppose that there is a list of a priori plausible models.
I Typical strategy: sequentially change the model until a good fit is produced, and then base inferences/predictions on the final selected model.
I This strategy is flawed in ignoring uncertainty in the model selection process - it leads to major bias in many cases.

Bayes Model Uncertainty

I Let M ∈ M denote a model index, with M a list of possible models.
I To allow for model uncertainty, Bayesians first choose:

1. A prior probability for each model: Pr(M = m) = πm, m ∈ M.
2. Priors for the coefficients within each model: π(θm), m ∈ M.

I Given data y, the posterior probability of model M = m is

π̂m = Pr(M = m | y) = πm Lm(y) / Σ_{l ∈ M} πl Ll(y),

where Lm(y) = ∫ L(y | M = m, θm) π(θm) dθm is the marginal likelihood for model M = m
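Given the marginal likelihoods, the posterior model probabilities follow directly from the formula above. A small sketch with made-up log marginal likelihoods for three models (working in log space avoids underflow, since marginal likelihoods are typically tiny):

```python
import numpy as np
from scipy.special import logsumexp

# Illustrative numbers: log marginal likelihoods log L_m(y) for three models
log_ml = np.array([-104.2, -101.7, -103.5])
prior = np.array([1 / 3, 1 / 3, 1 / 3])  # pi_m: equal prior model probabilities

# pi_hat_m = pi_m L_m(y) / sum_l pi_l L_l(y), computed stably in log space
log_post = np.log(prior) + log_ml
post = np.exp(log_post - logsumexp(log_post))
print(post.round(3))

# The Bayes factor for model 2 over model 1 is the ratio of marginal likelihoods
bf_21 = np.exp(log_ml[1] - log_ml[0])
```

With equal prior probabilities, the ranking of the posterior probabilities reproduces the ranking of the marginal likelihoods, as noted on the next slide.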

Some Comments

I In the absence of prior knowledge about which models in the list are more plausible, one often lets πm = 1/#M, with #M the number of models.
I The highest posterior probability model is then the model with the highest marginal likelihood.
I Unlike the maximized likelihood, the marginal likelihood has an implicit penalty for model complexity.
I This penalty is due to the integration across the prior, which is higher dimensional in larger models.

Impact of Prior on Coefficients

I The prior, π(θm), on the coefficients within each model plays an important role.
I As the variances of the priors on the coefficients within each model increase, the penalty for model complexity also increases.
I Hence, for higher-variance priors, one tends to favor smaller models.
I Using the BIC criterion is approximately equivalent to assuming a unit information prior, which is quite vague, so that BIC favors small models.
I By estimating the variance, one can obtain a data-adaptive penalty (George & Foster, 2000)

Bayes Factors

I The Bayes factor (BF) can be used as a summary of the weight of evidence in the data in favor of model m1 over model m2.
I The BF for model m1 over m2 is defined as the ratio of posterior to prior odds, which is simply

BF12 = L1(y) / L2(y),

a ratio of marginal likelihoods.
I Values of BF12 > 1 suggest that model m1 is preferred, with the weight of evidence in favor of m1 increasing as BF12 increases.

Bayesian Model Averaging (BMA)

I Posterior model probabilities can be used for model selection and inferences.
I When the focus is on prediction, BMA is preferred to model selection (Madigan and Raftery, 1994)
I To predict yn+1 given xn+1, BMA relies on:

f(yn+1 | xn+1) = Σ_{m ∈ M} π̂m ∫ L(yn+1 | M = m, θm) π(θm | M = m, y, X) dθm.
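In practice the BMA predictive above is just a mixture of the model-specific predictions weighted by the posterior model probabilities. A sketch with hypothetical numbers (the within-model integrals are assumed to have already been computed, e.g. by MCMC):

```python
import numpy as np

# Illustrative posterior model probabilities pi_hat_m for three candidate models
post_model = np.array([0.07, 0.80, 0.13])

# Hypothetical per-model predictive probabilities Pr(y_{n+1} = 1 | x_{n+1}, M = m),
# each already averaged over the within-model posterior of theta_m
pred_by_model = np.array([0.22, 0.31, 0.26])

# BMA predictive: mixture of model-specific predictions weighted by pi_hat_m
bma_pred = float(post_model @ pred_by_model)
print(bma_pred)
```

Because the prediction averages over models rather than conditioning on a single selected one, it propagates model uncertainty into the predictive distribution.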

Bayes Model Uncertainty - Practical Issues

I Computation of the posterior model probabilities requires calculation of the marginal likelihoods, Lm(y).

I These marginal likelihoods are not automatically produced by typical MCMC algorithms.

I Routine implementations rely on the Laplace approximation (Tierney and Kadane, 1986; Raftery, 1996).

I In large model spaces, it is not feasible to do the calculations for all the models, so search algorithms are used.

I Refer to Hoeting et al. (1999) for a tutorial on BMA.
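The marginal likelihood Lm(y) rarely has a closed form; the Laplace approximation replaces the integrand with a Gaussian centred at the posterior mode. A minimal one-parameter sketch, not the lecture's code — `log_post` below is a hypothetical unnormalized log posterior, and the check uses a case where the exact answer is known:

```python
import math

def laplace_log_marginal(log_post, mode, h=1e-4):
    """Laplace approximation to log m(y) = log of the integral of
    exp(log_post(theta)) over theta, for a one-parameter model.
    log_post is the unnormalized log posterior with mode at `mode`."""
    # Second derivative at the mode via central differences.
    d2 = (log_post(mode + h) - 2 * log_post(mode) + log_post(mode - h)) / h**2
    # log m(y) ~ log_post(mode) + 0.5*log(2*pi) - 0.5*log(-d2)
    return log_post(mode) + 0.5 * math.log(2 * math.pi) - 0.5 * math.log(-d2)

# Sanity check: if exp(log_post) equals c * N(theta; 0, 1), the integral
# is exactly c, so the approximation should return log(c).
log_c = 2.0
f = lambda t: log_c - 0.5 * math.log(2 * math.pi) - 0.5 * t**2
approx = laplace_log_marginal(f, mode=0.0)
```

For a Gaussian integrand the approximation is exact up to numerical differentiation error, which is why the check recovers log(c).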

Bayesian Variable Selection

I Suppose we start with a vector of p candidate predictors, xi = (xi1, . . . , xip).

I A very common type of model uncertainty corresponds to uncertainty in which predictors to include in the model.

I In this case, we end up with a list of 2^p different models, corresponding to each of the p candidate predictors being excluded or not.
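The model space grows quickly with p; a quick illustrative check (not from the slides) enumerating all subsets for p = 7, the number of predictors in the DDE application:

```python
from itertools import product

p = 7  # number of candidate predictors
# Each model is a vector of inclusion indicators (1 = predictor included).
models = list(product([0, 1], repeat=p))

print(len(models))  # 2**7 = 128 candidate models
```

At p = 30 the same enumeration would already exceed a billion models, which is why stochastic search is needed.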

Stochastic Search Variable Selection (SSVS)

I George and McCulloch (1993, 1997) proposed a Gibbs sampling approach for the variable selection problem.

I Similar approaches have been very widely used in applications.

I The SSVS idea will be illustrated through a return to the DDE and preterm birth application.

Bayes Variable Selection in Probit Regression

I Earlier we focused on the model, Pr(yi = 1 | xi, β) = Φ(x′iβ), with yi an indicator of premature delivery.

I Previously, we chose a N7(0, 4I) prior for β, assuming all 7 predictors were included.

I To account for uncertainty in subset selection, choose a mixture prior:

π(β) = ∏_{j=1}^{p} { δ0(βj) p0j + (1 − p0j) N(βj; 0, c_j²) },

where p0j is the prior probability of excluding the jth predictor by setting its coefficient to 0.
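A draw from this mixture prior is easy to simulate: each βj is set exactly to 0 with probability p0j, and is otherwise drawn from the N(0, c_j²) slab. A small sketch with hypothetical values of p0j and c_j (not the lecture's code):

```python
import random

def draw_beta(p0, c, rng):
    """One draw from the spike-and-slab prior: beta_j = 0 with
    probability p0[j], otherwise beta_j ~ N(0, c[j]**2)."""
    return [0.0 if rng.random() < p0j else rng.gauss(0.0, cj)
            for p0j, cj in zip(p0, c)]

rng = random.Random(1)
p0 = [0.5] * 7  # hypothetical prior exclusion probabilities
c = [2.0] * 7   # hypothetical slab standard deviations

draws = [draw_beta(p0, c, rng) for _ in range(2000)]
# Across many draws, about half of all coordinates are exactly zero,
# matching the prior exclusion probability of 0.5.
zero_frac = sum(b == 0.0 for d in draws for b in d) / (2000 * 7)
```

The point mass at zero is what lets the posterior place positive probability on exact exclusion of a predictor, rather than merely shrinking its coefficient.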

SSVS in Probit Regression

I The data augmentation Gibbs sampler described earlier can be easily adapted.

I Sample from the conditional posterior distributions of βj, for j = 1, . . . , p:

π(βj | β(−j), z, y, X) = p̂j δ0(βj) + (1 − p̂j) N(βj; Ej, Vj),

where Vj = (c_j^{−2} + X′X)^{−1}, Ej = Vj X′z, and

p̂j = p0j / [ p0j + (1 − p0j) N(0; 0, c_j²) / N(0; Ej, Vj) ]

is the conditional probability that βj = 0 (i.e., we exclude the jth predictor).
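This two-component update can be sketched for a single coefficient. Here Ej and Vj are taken as given rather than computed from the data, and the two Gaussian densities are evaluated at zero to form p̂j; a sketch with hypothetical numbers, not the lecture's code:

```python
import math
import random

def normal_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)

def update_beta_j(p0j, c2j, Ej, Vj, rng):
    """One SSVS Gibbs update for beta_j: compute the conditional
    exclusion probability p_hat, then draw from the spike or the slab."""
    ratio = normal_pdf(0.0, 0.0, c2j) / normal_pdf(0.0, Ej, Vj)
    p_hat = p0j / (p0j + (1.0 - p0j) * ratio)
    if rng.random() < p_hat:
        return 0.0, p_hat                       # spike: exclude predictor j
    return rng.gauss(Ej, math.sqrt(Vj)), p_hat  # slab: include predictor j

# When the slab conditional is centred well away from zero, the density
# N(0; Ej, Vj) is tiny, so exclusion becomes essentially impossible:
beta_j, p_hat = update_beta_j(0.5, 4.0, Ej=1.0, Vj=0.01, rng=random.Random(0))
```

Iterating this update over j = 1, . . . , p, together with the data-augmentation step for z, yields the SSVS chain over models.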

SSVS - Comments

I After convergence, SSVS generates samples of models, corresponding to subsets of the set of p candidate predictors, from the posterior distribution.

I Based on a large number of SSVS iterations, we can estimate posterior probabilities for each of the models.

I For example, the full model may appear in 10% of the samples collected after convergence, so that model would be assigned a posterior probability of 0.10.

I To summarize, one can present a table of the top 10 or 100 models.

I It is potentially more useful to calculate marginal inclusion probabilities.
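Both summaries fall directly out of the sampled inclusion indicators. A toy sketch with made-up indicator samples for p = 3 (real runs would use thousands of post-convergence iterations):

```python
from collections import Counter

# Hypothetical inclusion-indicator samples: one tuple per Gibbs
# iteration, one 0/1 entry per candidate predictor.
samples = [
    (1, 1, 0), (1, 1, 0), (1, 1, 1), (1, 0, 0), (1, 1, 0),
]

# Posterior model probability: the visit frequency of each distinct model.
model_probs = {m: n / len(samples) for m, n in Counter(samples).items()}

# Marginal inclusion probability of predictor j: the column mean.
inclusion = [sum(s[j] for s in samples) / len(samples) for j in range(3)]

print(model_probs[(1, 1, 0)])  # 0.6
print(inclusion)               # [1.0, 0.8, 0.2]
```

Marginal inclusion probabilities average over all visited models, so they remain stable even when, as in the DDE application, no single model dominates.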


Samples from Posterior - DDE application (normal prior)



Samples from Posterior - DDE application (mixture prior)


SSVS - Comments

I Samples concentrate at 0 for the regression coefficients of the less important predictors.

I Such samples correspond to models with that predictor excluded.

I Even though the prior probabilities of exclusion are the same, the posterior probabilities vary greatly across the different predictors.

Posterior Summaries - Normal prior analysis

Parameter   Mean    Median   SD     95% credible interval
β1         -1.08    -1.08    0.04   (-1.16, -1.01)
β2          0.17     0.17    0.03   (0.12, 0.23)
β3         -0.13    -0.13    0.04   (-0.20, -0.05)
β4          0.11     0.11    0.03   (0.05, 0.18)
β5         -0.02    -0.02    0.03   (-0.08, 0.05)
β6         -0.08    -0.08    0.04   (-0.15, -0.02)
β7          0.05     0.06    0.06   (-0.07, 0.18)

Posterior Summaries - Mixture prior analysis

Parameter   Mean    Median   SD     95% CI            Pr(βj = 0 | data)
β1         -1.05    -1.05    0.03   (-1.12, -0.99)    0.00
β2          0.18     0.18    0.03   (0.12, 0.23)      0.00
β3         -0.08    -0.09    0.06   (-0.19, 0.00)     0.36
β4          0.05     0.00    0.06   (0.00, 0.16)      0.50
β5          0.00     0.00    0.01   (0.00, 0.00)      0.98
β6         -0.02     0.00    0.04   (-0.13, 0.00)     0.72
β7          0.01     0.00    0.02   (0.00, 0.10)      0.93

Posterior Probabilities of Visited Models

Rank   π̂m       Model Indicator
1      0.2498   1 1 0 0 0 0 0
2      0.2259   1 1 1 1 0 0 0
3      0.1970   1 1 1 1 0 1 0
4      0.1399   1 1 1 0 0 0 0
5      0.0364   1 1 0 0 0 1 0
6      0.0304   1 1 0 1 0 0 0
7      0.0274   1 1 1 0 0 1 0
8      0.0207   1 1 0 0 0 0 1
9      0.0177   1 1 1 1 0 0 1
10     0.0122   1 1 1 0 0 0 1

Some Comments on DDE Application Results

I In 4,000 Gibbs iterations, only 26/128 = 20.3% of the models were visited.

I There wasn't a single dominant model, but none of the models excluded the intercept or the DDE slope.

I None of the better models included the 3rd & 5th of the 5 possible confounders.

General Comments on Bayes Model Uncertainty

I SSVS provides a very useful approach.

I For large numbers of candidate predictors, shotgun stochastic search provides an alternative (Hans et al., 2007).

I SSVS has also been adapted to select predictors with random effects (Cai and Dunson, 2006).

I For routine implementation, one can rely on Laplace approximations to the marginal likelihoods.

Missing Data Introduction

I Many (if not most) studies are faced with problems with missing data.

I Bayesian methods provide a natural framework for accounting for missing data without the need to rely on ad hoc imputation.

I Focus: missing predictors in regression models.

Missing Predictors in Regression

I Suppose we are interested in the general linear model:

yi = x′iβ + εi,   εi ∼ N(0, σ²),

with xi = (xi1, . . . , xip)′ a vector of predictors that may have missing values.

I Assume that the missing predictors are missing at random (MAR), so that missingness is conditionally independent of the unmeasured value given the observed data.

I To accommodate missing predictors, we need to specify a joint distribution for xi (typically chosen as normal or as a sequence of conditional GLMs).

I Then, the missing values are simply additional unknowns to be updated in the MCMC algorithm.

Gibbs Sampler

I When the predictors have a normal likelihood and we have a linear regression model, missing predictors can be accommodated in a simple Gibbs sampler:

1. Starting with an initial value for β, σ², sample the missing predictor values from their normal full conditionals.

2. Given the imputed data, sample β and σ² from their full conditional posterior distributions.

I This algorithm is easily adapted for non-normal likelihoods for yi and xi, and the resulting Gibbs sampler can be implemented in WinBUGS.
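Step 1 can be made concrete for a single normally distributed predictor: with xi ∼ N(μx, τ²) a priori and yi | xi ∼ N(β0 + β1 xi, σ²), the full conditional of a missing xi is again normal. A sketch with the parameters held fixed (in the full sampler they would be re-sampled in step 2; all numbers here are hypothetical):

```python
import math
import random

def impute_x(y_i, beta0, beta1, sigma2, mu_x, tau2, rng):
    """Draw a missing predictor x_i from its normal full conditional
    under x_i ~ N(mu_x, tau2) and y_i | x_i ~ N(beta0 + beta1*x_i, sigma2)."""
    prec = 1.0 / tau2 + beta1 ** 2 / sigma2                    # conditional precision
    mean = (mu_x / tau2 + beta1 * (y_i - beta0) / sigma2) / prec
    return rng.gauss(mean, math.sqrt(1.0 / prec))

rng = random.Random(2)
# With y_i = 3, beta1 = 1, beta0 = 0 and unit variances, the conditional
# mean is (0 + 3)/2 = 1.5: the imputation is pulled toward the outcome.
draws = [impute_x(3.0, 0.0, 1.0, 1.0, 0.0, 1.0, rng) for _ in range(4000)]
mean_draw = sum(draws) / len(draws)
```

Because the imputation conditions on yi, the uncertainty about the missing value propagates correctly into the posterior for β and σ², unlike a single plug-in imputation.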

Some Comments

I The Bayesian approach is often used to obtain multiple imputed data sets, which are then combined using frequentist methods.

I This approach avoids ad hoc imputation methods, such as bootstrapping, which often carry implicit missing completely at random assumptions.

I We have assumed that missingness is non-informative.

I Shared random effects models can be used to account for informative missingness and censoring.

Summary

I Very brief introduction to Bayesian statistics.

I Emphasis on regression models, variable selection & missing predictors.

I Ideas related to model uncertainty and missing data can be generalized to much broader settings.

I Many of the MCMC algorithms are easy to program, but there are also a number of packages available (WinBUGS, R functions for Bayes model averaging in GLMs, etc.).