Bayesian Statistics: Model Uncertainty & Missing
Data
David Dunson
National Institute of Environmental Health Sciences, NIH
March 1, 2007
Outline

- Introduction to Bayesian Statistics
  - Basic Definitions
  - Posterior Computation via MCMC
  - Epidemiologic Application
- Model Uncertainty
  - Formulation of Problem
  - Variable Selection & Stochastic Search
  - Epidemiologic Application
- Missing Data
  - General Formulation
  - Posterior Computation
- Concluding Remarks
Illustration: Patient Diagnoses

- For the past two weeks, Sue has been feeling weak and has had nausea.
- Although she suspects a stomach virus, she visits the doctor because the symptoms have been persisting.
- The doctor also suspects a virus, but collects blood samples and orders several tests to verify that there aren't more serious problems.
- The tests come back and Sue has an abnormally low white cell count.
Updating Subjective Probabilities

- Formalizing this problem statistically, let D = (D1, . . . , DK)′, with Dk = 1 if disease k is the cause of Sue's symptoms.
- D1 = 1 if Sue has a virus or bacterial infection, D2 = 1 if cancer, D3 = 1 if a parasitic infection of type 1, etc.
- During the first few days of her illness, Sue estimated her probability of a virus or bacterial infection as Pr(D1 = 1) = π1(0) > 0.99.
- After two weeks, her estimated probability gradually decreased to π1(2) = 0.95.
- With the abnormally low white cell count test, this probability decreased further to π1(3) = 0.90.
Updating Subjective Probabilities

- We can formalize the process by which Sue's value of π1 changes using Bayes rule:

\[
\pi_1(t) = \frac{\pi_1(0)\, L(\mathrm{data}_t \mid D_1 = 1)}{\pi_1(0)\, L(\mathrm{data}_t \mid D_1 = 1) + \{1 - \pi_1(0)\}\, L(\mathrm{data}_t \mid D_1 = 0)},
\]

- t = 0 corresponds to the time of symptom onset
- π1(0) is Sue's guess at the probability of D1 = 1 at t = 0
- L(data_t | D = d) = likelihood of the data at time t given D = d
- "data" = ongoing symptoms, test results, input by Sue's physician, reading on the web, etc.
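The update above is a one-line computation. In the sketch below, the likelihood values are hypothetical, chosen only to illustrate the mechanics: persisting symptoms are taken to be less probable under a routine virus (D1 = 1) than under the alternatives, so the posterior probability drops.

```python
def bayes_update(prior, lik_d1, lik_d0):
    """Posterior Pr(D1 = 1) via Bayes rule.

    prior  : Pr(D1 = 1) before seeing the data
    lik_d1 : L(data | D1 = 1)
    lik_d0 : L(data | D1 = 0)
    """
    num = prior * lik_d1
    return num / (num + (1 - prior) * lik_d0)

# Hypothetical likelihoods for illustration only.
pi_0 = 0.99
pi_t = bayes_update(pi_0, lik_d1=0.4, lik_d0=0.8)
```

Because the data are less likely under D1 = 1 than D1 = 0, `pi_t` falls below the prior of 0.99, mirroring Sue's gradually decreasing probability.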
Learning from Sue

- For Sue this updating process continued until an astute physician diagnosed a parasitic infection after several months.
- Most medical diagnoses proceed in a similar manner, with the patient's and physician's probabilities updated through Bayesian learning.
- Scientific research evolves in a similar manner, with prior insights updated as new data become available.
- Bayesian statistics seeks to formalize the process of learning through the accrual of evidence from different sources.
Illustration: Normal Linear Regression

- Suppose we collect data consisting of a response, yi, and predictors xi = (xi1, . . . , xip)′, for subjects i = 1, . . . , n.
- For example, yi may be birth weight, with xi factors potentially predictive of birth weight.
- Assuming yi is normally distributed conditionally on xi, we have the likelihood function:

\[
L(y; \theta, X) = \prod_{i=1}^{n} (2\pi\sigma^2)^{-1/2} \exp\left\{ -\frac{1}{2\sigma^2} (y_i - x_i'\beta)^2 \right\},
\]

with θ = (β, σ2).
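In practice one usually works with the log of this likelihood. A dependency-light sketch (simulated data; all settings illustrative) evaluating it at given parameter values:

```python
import math
import random

def normal_loglik(y, X, beta, sigma2):
    """Log of L(y; theta, X) for the normal linear model, theta = (beta, sigma2)."""
    ll = -0.5 * len(y) * math.log(2 * math.pi * sigma2)
    for yi, xi in zip(y, X):
        resid = yi - sum(b * x for b, x in zip(beta, xi))
        ll -= resid ** 2 / (2 * sigma2)
    return ll

rng = random.Random(1)
beta_true = [1.0, 2.0]
X = [[1.0, rng.gauss(0, 1)] for _ in range(50)]
y = [sum(b * x for b, x in zip(beta_true, xi)) + rng.gauss(0, 0.5) for xi in X]
ll_true = normal_loglik(y, X, beta_true, 0.25)
```

As expected, the log likelihood is higher at the data-generating coefficients than at, say, β = (0, 0).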
Classical Model Fitting and Inferences

- Often, interest focuses on inferences on the regression coefficients β.
- To estimate β, the standard approach is maximum likelihood estimation, which results in the least squares estimator:

\[
\hat{\beta} = (X'X)^{-1} X'y,
\]

with X = (x1, . . . , xn)′ and y = (y1, . . . , yn)′.
- One can obtain confidence intervals for βj and test whether βj = 0 using standard approaches.
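In matrix form the estimator is a one-liner. A sketch with simulated data (all names and settings illustrative); solving the normal equations is numerically preferable to explicitly forming the inverse:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
# Design matrix: intercept column plus two simulated predictors.
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
beta_true = np.array([1.0, 0.5, -2.0])
y = X @ beta_true + rng.normal(scale=0.3, size=n)

# beta_hat = (X'X)^{-1} X'y, computed by solving X'X beta = X'y.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
```

The result agrees with numpy's built-in least squares routine, `np.linalg.lstsq`.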
Prior Knowledge

- In most applications, prior knowledge is available about θ before observing the data in the current study.
- For example, in investigating factors predictive of birth weight, one can rely on a rich literature from previous studies.
- The classical paradigm relies entirely upon the current data, which often contain too little information to obtain accurate estimates of all parameters.
Bayesian Paradigm

- Bayesians instead choose a prior distribution to quantify the state of knowledge about θ before observing the current data.
- The prior distribution, π(θ), effectively treats the parameters as random variables.
- Inferences are then based on the posterior distribution, updating the prior with the likelihood from the current study:

\[
\pi(\theta \mid y, X) = \frac{\pi(\theta)\, L(y; \theta, X)}{\int \pi(\theta)\, L(y; \theta, X)\, d\theta},
\]

with L(y; X) = ∫ π(θ) L(y; θ, X) dθ known as the marginal likelihood.
Posterior Calculations for Linear Regression

- For normal linear regression models and conjugate priors, we can calculate the posterior distribution analytically.
- Suppose π(β) = Np(β; β0, Σβ), with β0 the prior mean and Σβ the prior covariance.
- Then, the conditional posterior of β given y, X, σ2 is:

\[
\pi(\beta \mid y, X, \sigma^2) = N(\hat{\beta}, \hat{\Sigma}_\beta), \qquad
\hat{\beta} = \hat{\Sigma}_\beta \left( \Sigma_\beta^{-1} \beta_0 + \sigma^{-2} X'y \right), \qquad
\hat{\Sigma}_\beta = \left( \Sigma_\beta^{-1} + \sigma^{-2} X'X \right)^{-1}
\]

Note: β̂ → MLE as n increases and the prior variance increases.
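These formulas translate directly into a pair of matrix computations. A sketch with simulated data (variable names and settings illustrative), which also checks the note above: under a very vague prior the posterior mean essentially matches least squares.

```python
import numpy as np

def posterior_beta(X, y, sigma2, beta0, Sigma_beta):
    """Posterior mean and covariance of beta under the prior N(beta0, Sigma_beta),
    following the conjugate-update formulas above."""
    prior_prec = np.linalg.inv(Sigma_beta)
    Sigma_hat = np.linalg.inv(prior_prec + X.T @ X / sigma2)
    beta_hat = Sigma_hat @ (prior_prec @ beta0 + X.T @ y / sigma2)
    return beta_hat, Sigma_hat

rng = np.random.default_rng(1)
n = 100
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, -0.5]) + rng.normal(scale=0.5, size=n)

# Very vague prior: prior precision is negligible next to the data precision.
b_post, _ = posterior_beta(X, y, 0.25, np.zeros(2), 1e6 * np.eye(2))
b_ols = np.linalg.solve(X.T @ X, X.T @ y)
```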
Prior Elicitation - Different Schools of Thought

- Subjective Bayes: informative priors should be used that accurately describe your prior uncertainty in the parameters.
- Objective Bayes: non-informative or default priors should be used to obtain statistical procedures with good properties.
- Pragmatic Bayes: the Bayes machinery is very useful for addressing complex problems & (hopefully) results are robust to priors.
Shrinkage Priors

- Although non-informative priors are widely used, shrinkage priors often have better performance (e.g., lower mean square error) (MacLehose et al., 2007, Epidemiology).
- By choosing a prior centered at zero for the coefficients, one tends to obtain more stable estimates, limiting over-fitting and multicollinearity problems.
- There is a rich theoretical literature providing motivation for shrinkage.
- From an applied perspective, it is a good idea to choose priors that assign low probability outside of a plausible range for the parameters.
- MLEs have problems when there is limited information about certain parameters.
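A zero-centered normal prior with fixed σ2 yields the familiar ridge-type estimate. A minimal sketch (simulated, nearly collinear predictors; all settings illustrative) of the stabilizing effect relative to least squares:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 60
z = rng.normal(size=n)
# Two nearly collinear predictors: least squares is unstable here.
X = np.column_stack([z, z + rng.normal(scale=0.05, size=n)])
y = X @ np.array([1.0, 1.0]) + rng.normal(size=n)

sigma2, tau2 = 1.0, 1.0  # error variance and prior variance (illustrative)
# Posterior mean under beta ~ N(0, tau2 I): ridge with penalty sigma2/tau2.
b_shrink = np.linalg.solve(X.T @ X + (sigma2 / tau2) * np.eye(2), X.T @ y)
b_ols = np.linalg.solve(X.T @ X, X.T @ y)
```

The shrinkage estimate always has smaller norm than least squares, which is exactly the stabilization described above.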
What about more Complex Settings?

- Bayes & frequentist inferences under a single model tend to be similar in simple settings (e.g., linear regression with modest numbers of predictors & ample sample size).
- Advantages of Bayes are more apparent in complex settings: model uncertainty, missing data, large numbers of parameters, etc.
- Outside of simple models, posterior computation typically relies on Markov chain Monte Carlo (MCMC) algorithms.
MCMC - The Basic Idea

- MCMC algorithms rely on randomly sampling the model parameters in a special way so that the samples converge in distribution to a target distribution, which is the true joint posterior distribution.
- Flavors of MCMC:
  - Gibbs sampling (Gelfand and Smith, 1990): sequentially samples from the full conditional posterior distributions of each of the parameters.
  - Metropolis-Hastings (Hastings, 1970): samples a candidate for a parameter from a proposal density and accepts this candidate with a specific probability.
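The slides do not spell out either sampler, so as an illustration of the Metropolis-Hastings recipe, here is a random-walk sampler for a normal mean μ, under an assumed N(0, 10²) prior and unit error variance (all settings illustrative):

```python
import math
import random

def log_post(mu, data, prior_sd=10.0):
    """Unnormalized log posterior: N(0, prior_sd^2) prior, N(mu, 1) likelihood."""
    lp = -0.5 * (mu / prior_sd) ** 2
    return lp - 0.5 * sum((x - mu) ** 2 for x in data)

def metropolis(data, n_iter=5000, step=0.5, seed=0):
    """Random-walk Metropolis-Hastings for the mean mu."""
    rng = random.Random(seed)
    mu, samples = 0.0, []
    for _ in range(n_iter):
        cand = mu + rng.gauss(0, step)  # propose a candidate move
        # Accept with probability min(1, posterior ratio); symmetric proposal,
        # so the proposal densities cancel.
        if math.log(rng.random()) < log_post(cand, data) - log_post(mu, data):
            mu = cand
        samples.append(mu)
    return samples

data_rng = random.Random(42)
data = [data_rng.gauss(2.0, 1.0) for _ in range(100)]
samples = metropolis(data)
post_mean = sum(samples[1000:]) / len(samples[1000:])  # discard burn-in
```

After burn-in, the sample average approximates the posterior mean, which for this nearly flat prior sits close to the sample mean of the data.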
Application to Study of DDE & Preterm Birth

- Scientific Interest: Association between DDE exposure & preterm birth, adjusting for possible confounding variables.
- Data from the US Collaborative Perinatal Project (CPP): n = 2380 children, of whom 361 were born preterm.
- Analysis: Bayesian analysis using a probit model.
Probit Model for Risk of Preterm Birth

- Let yi = 1 if preterm birth and yi = 0 if full-term birth.
- Probit Model:

\[
\Pr(y_i = 1 \mid x_i, \beta) = \Phi(x_i'\beta),
\]

where Φ(·) is the standard normal distribution function.
- xi = (1, ddei, xi3, . . . , xi7)′
- xi3, . . . , xi7 represent possible confounders
- β1 = intercept
- β2 = dde slope
Bayesian Analysis: Prior, Likelihood & Posterior

- Prior: π(β) = N(β0, Σβ)
- Likelihood:

    L(y; β, X) = ∏i=1..n Φ(xi′β)^yi {1 − Φ(xi′β)}^(1−yi)

- Posterior:

    π(β | y, X) ∝ π(β) L(y; β, X).

- No closed form is available for the normalizing constant
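The prior-times-likelihood form of the unnormalized posterior can be written directly as a log-posterior function. A Python sketch under a simplifying assumption (independent normal prior components with common variance, rather than the general N(β0, Σβ)); names are illustrative:

```python
from math import erf, sqrt, log

def normal_cdf(t):
    """Standard normal distribution function Phi(t)."""
    return 0.5 * (1.0 + erf(t / sqrt(2.0)))

def log_posterior(beta, y, X, beta0, prior_var):
    """log pi(beta | y, X) up to the unknown normalizing constant:
    log of an independent-normal prior plus the probit log likelihood.
    NOTE: for |x'beta| beyond roughly 6 the CDF rounds to 0 or 1 and
    log() fails; serious implementations use log-scale CDFs."""
    lp = sum(-(b - b0) ** 2 / (2.0 * prior_var) for b, b0 in zip(beta, beta0))
    for yi, xi in zip(y, X):
        p = normal_cdf(sum(xj * bj for xj, bj in zip(xi, beta)))
        lp += yi * log(p) + (1 - yi) * log(1.0 - p)
    return lp
```

Only the missing normalizing constant prevents this from being the posterior density itself, which is exactly why MCMC is used next.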
Posterior Computation using Data Augmentation

- The full conditional posterior distributions needed for Gibbs sampling are not automatically available
- However, we can rely on a very useful data augmentation trick proposed by Albert and Chib (1993):
- Augment the observed data {yi, xi} with latent zi.
- The probit model can then be expressed in hierarchical form as follows:

    yi = 1(zi > 0)
    zi ∼ N(xi′β, 1)

- Marginalizing out zi, we obtain Pr(yi = 1 | xi, β) = Φ(xi′β).
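The marginalization claim is easy to check by simulation: drawing zi ~ N(μ, 1) and recording 1(zi > 0) should reproduce Φ(μ). A small Monte Carlo sketch (μ = 0.3 is an arbitrary illustrative value for xi′β):

```python
import random
from math import erf, sqrt

random.seed(1)

def normal_cdf(t):
    """Standard normal distribution function Phi(t)."""
    return 0.5 * (1.0 + erf(t / sqrt(2.0)))

mu = 0.3        # illustrative value of x_i' beta
n = 200_000
# Draw z_i ~ N(mu, 1) and record y_i = 1(z_i > 0)
hits = sum(1 for _ in range(n) if random.gauss(mu, 1.0) > 0.0)
mc_prob = hits / n   # Monte Carlo estimate of Pr(y_i = 1)
```

With 200,000 draws the estimate agrees with Φ(0.3) to within Monte Carlo error of about 0.001.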
Gibbs Sampling Steps

- Gibbs sampling relies on alternately sampling from the full conditional posterior distributions of the unknown parameters
- After data augmentation, the unknowns include the latent data {zi} and the regression parameters β
- Full conditional posterior distributions:
  1. π(zi | y, X, β) = N(xi′β, 1), truncated below by zero if yi = 1 and above by zero if yi = 0.
  2. π(β | z, y, X) = Np(β̂, Σ̂β), with Σ̂β = (Σβ⁻¹ + X′X)⁻¹ and β̂ = Σ̂β(Σβ⁻¹β0 + X′z).
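The two full conditionals can be sketched for the simplest case of a single predictor with a N(0, v) prior, so that no matrix algebra is needed. This is illustrative code, not the lecture's implementation; the truncated normal draws use the inverse-CDF method:

```python
import random
from statistics import NormalDist

std = NormalDist()  # standard normal: .cdf and .inv_cdf

def draw_truncated_normal(mean, lower=None, upper=None, rng=random):
    """Draw z ~ N(mean, 1) restricted to (lower, upper) via inverse CDF."""
    a = std.cdf(lower - mean) if lower is not None else 0.0
    b = std.cdf(upper - mean) if upper is not None else 1.0
    u = min(max(rng.uniform(a, b), 1e-12), 1.0 - 1e-12)  # keep inv_cdf in (0,1)
    return mean + std.inv_cdf(u)

def gibbs_step(beta, y, x, prior_var, rng=random):
    """One scan of the two full conditionals for p = 1, prior beta ~ N(0, prior_var)."""
    # Step 1: z_i | beta, y_i ~ N(x_i * beta, 1), truncated by the sign of y_i
    z = [draw_truncated_normal(xi * beta, lower=0.0, rng=rng) if yi == 1
         else draw_truncated_normal(xi * beta, upper=0.0, rng=rng)
         for yi, xi in zip(y, x)]
    # Step 2: beta | z ~ N(beta_hat, v_hat), v_hat = (1/prior_var + x'x)^-1
    v_hat = 1.0 / (1.0 / prior_var + sum(xi * xi for xi in x))
    beta_hat = v_hat * sum(xi * zi for xi, zi in zip(x, z))
    return rng.gauss(beta_hat, v_hat ** 0.5)
```

For p > 1 the second step becomes the multivariate normal update with Σ̂β and β̂ given above, but the structure of the scan is identical.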
Gibbs Sampling Implementation

- To implement Gibbs sampling, we simply iterate sampling from these full conditional posterior distributions a large number of times
- An initial burn-in is discarded to allow convergence to the stationary distribution
- Inferences can be based on posterior summaries calculated using the draws from the joint posterior distribution.
- WinBUGS provides an easy-to-use, free software package for implementing Gibbs sampling for complex models.
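Given the stored draws for one parameter, the usual posterior summaries (mean, median, equal-tail 95% credible interval) after discarding the burn-in can be computed as follows; a minimal sketch with illustrative names:

```python
def posterior_summary(draws, burn_in=100):
    """Posterior mean, median, and equal-tail 95% credible interval from
    MCMC draws of a single parameter, after discarding the burn-in."""
    kept = sorted(draws[burn_in:])
    n = len(kept)
    mean = sum(kept) / n
    median = 0.5 * (kept[(n - 1) // 2] + kept[n // 2])
    lo = kept[int(0.025 * (n - 1))]   # 2.5th percentile of retained draws
    hi = kept[int(0.975 * (n - 1))]   # 97.5th percentile of retained draws
    return {"mean": mean, "median": median, "ci95": (lo, hi)}
```

Applying this to each coefficient's chain yields exactly the kind of table reported for the DDE analysis below.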
Returning to the DDE and Premature Birth Application

- We chose a normal prior, π(β) = N7(β; 0, 4 × I7×7) (motivated by shrinkage considerations)
- Choosing β = 0 as the starting value, we ran the Gibbs sampler for 1,000 iterations, discarding the first 100 as a burn-in
- In general, more samples should be taken for complex models
Gibbs Sampling Trace Plots [figure]
Convergence and Mixing Issues

- Whenever MCMC algorithms are used, trace plots of the different parameters should be carefully examined.
- In complex models, convergence and mixing are often of concern.
- Slow convergence: the chain takes a long time to converge to the stationary distribution, so a long burn-in is needed.
- Slow mixing: high autocorrelation in the samples even after convergence, so a very large number of samples is needed to reduce Monte Carlo error.
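Slow mixing can be quantified by the lag-k autocorrelation of the chain: values that stay near 1 at large lags mean each draw carries little new information. A minimal sketch:

```python
def autocorrelation(draws, lag):
    """Sample lag-k autocorrelation of an MCMC chain."""
    n = len(draws)
    mean = sum(draws) / n
    var = sum((d - mean) ** 2 for d in draws) / n
    cov = sum((draws[t] - mean) * (draws[t + lag] - mean)
              for t in range(n - lag)) / n
    return cov / var
```

In practice one inspects this over a range of lags (or the effective sample size derived from it) alongside the trace plots.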
Estimated Posterior Densities [figure]
Posterior Summaries of Regression Parameters

Parameter   Mean    Median   SD     95% credible interval
β1         -1.08   -1.08    0.04    (-1.16, -1.01)
β2          0.17    0.17    0.03    ( 0.12,  0.23)
β3         -0.13   -0.13    0.04    (-0.20, -0.05)
β4          0.11    0.11    0.03    ( 0.05,  0.18)
β5         -0.02   -0.02    0.03    (-0.08,  0.05)
β6         -0.08   -0.08    0.04    (-0.15, -0.02)
β7          0.05    0.06    0.06    (-0.07,  0.18)
Maximum Likelihood Results

Parameter   MLE     SE     Z stat    p-value
β1         -1.08   0.04   -24.8      < 2e-16
β2          0.18   0.03     6.03     1.67e-09
β3         -0.13   0.04    -3.63     0.0003
β4          0.11   0.03     3.30     0.001
β5         -0.02   0.03    -0.501    0.617
β6         -0.08   0.04    -2.30     0.022
β7          0.05   0.06     0.844    0.399

β2 = dde slope (highly significant increasing trend)
Fitting Bayesian GLMs in SAS

- We repeated our Bayesian analysis using BGENMOD (a new SAS proc for Bayesian analysis of GLMs).
- Very simple to implement in a few lines of code, and it gave identical results to our R implementation (despite a different MCMC implementation)
- Automatically outputs posterior summaries, trace plots, convergence diagnostics, etc.
- UNC Bayes in SAS Conference, May 17-18 (www.sph.unc.edu/bios for details)
Suppose the true model is unknown

- In the DDE application, we assumed that we knew in advance that the probit model with pre-specified predictors was appropriate.
- There is typically substantial uncertainty in the model & it is more realistic to suppose that there is a list of a priori plausible models.
- Typical strategy: sequentially change the model until a good fit is produced, then base inferences/predictions on the final selected model.
- This strategy is flawed in ignoring uncertainty in the model selection process, which leads to major bias in many cases.
Bayes Model Uncertainty

- Let M ∈ M denote a model index, with M a list of possible models.
- To allow for model uncertainty, Bayesians first choose:
  1. A prior probability for each model: Pr(M = m) = πm, m ∈ M.
  2. Priors for the coefficients within each model, π(θm), m ∈ M.
- Given data y, the posterior probability of model M = m is

    π̂m = Pr(M = m | y) = πm Lm(y) / Σl∈M πl Ll(y),

  where Lm(y) = ∫ L(y | M = m, θm) π(θm) dθm is the marginal likelihood for model M = m
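Given marginal likelihoods for the candidate models, the posterior model probabilities are just this normalized product; they are most stably computed on the log scale. A sketch (function and argument names are illustrative):

```python
from math import exp

def posterior_model_probs(log_marglik, prior_probs):
    """pi_hat_m = pi_m L_m(y) / sum_l pi_l L_l(y), computed from log
    marginal likelihoods; subtracting the max avoids over/underflow."""
    m = max(log_marglik)
    weights = [p * exp(l - m) for l, p in zip(log_marglik, prior_probs)]
    total = sum(weights)
    return [w / total for w in weights]
```

The max-subtraction trick matters because realistic log marginal likelihoods are large negative numbers whose direct exponentials underflow to zero.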
Some Comments

- In the absence of prior knowledge about which models in the list are more plausible, one often lets πm = 1/#M, with #M the number of models.
- The highest posterior probability model is then the model with the highest marginal likelihood.
- Unlike the maximized likelihood, the marginal likelihood has an implicit penalty for model complexity.
- This penalty is due to the integration across the prior, which is higher dimensional in larger models.
Impact of Prior on Coefficients

- The prior, π(θm), on the coefficients within each model plays an important role.
- As the variances of the priors on the coefficients within each model increase, the penalty for model complexity also increases.
- Hence, for higher-variance priors, one tends to favor smaller models.
- Using the BIC criterion is approximately equivalent to assuming a unit information prior, which is quite vague, so BIC favors small models.
- By estimating the variance, one can obtain a data-adaptive penalty (George & Foster, 2000)
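The BIC connection can be made concrete: -BIC/2 approximates the log marginal likelihood under a unit information prior, so adding a parameter at equal fit costs log n on the -2 log scale. A sketch:

```python
from math import log

def bic(log_lik_max, n_params, n_obs):
    """BIC = -2 log L(theta_hat) + p log n.
    -BIC/2 approximates the log marginal likelihood, so comparing BIC
    values approximates comparing posterior model probabilities
    (under equal prior model probabilities)."""
    return -2.0 * log_lik_max + n_params * log(n_obs)
```

A model with one extra parameter must raise the maximized log likelihood by at least (log n)/2 before its BIC improves.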
Bayes Factors

- The Bayes factor (BF) can be used as a summary of the weight of evidence in the data in favor of model m1 over model m2.
- The BF for model m1 over m2 is defined as the ratio of posterior to prior odds, which is simply

    BF12 = L1(y) / L2(y),

  a ratio of marginal likelihoods.
- Values of BF12 > 1 suggest that model m1 is preferred, with the weight of evidence in favor of m1 increasing as BF12 increases.
Bayesian Model Averaging (BMA)

- Posterior model probabilities can be used for model selection and inferences.
- When the focus is on prediction, BMA is preferred to model selection (Madigan and Raftery, 1994)
- To predict yn+1 given xn+1, BMA relies on:

    f(yn+1 | xn+1) = Σm∈M π̂m ∫ L(yn+1 | M = m, θm) π(θm | M = m, y, X) dθm.
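Once the per-model posterior predictive densities have been evaluated at the new point, the BMA predictive density is just their probability-weighted mixture. A sketch assuming those densities are already in hand (names are illustrative):

```python
def bma_predictive(post_model_probs, per_model_preds):
    """f(y_new | x_new) = sum_m pi_hat_m * f_m(y_new | x_new), where
    f_m is the posterior predictive density under model m and
    pi_hat_m its posterior model probability."""
    return sum(p * f for p, f in zip(post_model_probs, per_model_preds))
```

Selecting a single model corresponds to forcing one weight to 1, which discards the model uncertainty that BMA propagates into the prediction.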
Bayes Model Uncertainty - Practical Issues

- Computation of the posterior model probabilities requires calculation of the marginal likelihoods, Lm(y)
- These marginal likelihoods are not automatically produced by typical MCMC algorithms
- Routine implementations rely on the Laplace approximation (Tierney and Kadane, 1986; Raftery, 1996)
- In large model spaces, it is not feasible to do the calculations for all models, so search algorithms are used.
- Refer to Hoeting et al. (1999) for a tutorial on BMA
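For a scalar parameter, the Laplace approximation replaces the integrand exp{h(θ)} (h = log prior + log likelihood) with a Gaussian centered at the posterior mode. A sketch using a finite-difference second derivative; for an exactly Gaussian integrand the approximation is exact, which the assertions below exploit:

```python
from math import log, pi

def laplace_log_marginal(log_joint, theta_hat, eps=1e-4):
    """Laplace approximation to log INT exp(h(theta)) d theta for a scalar
    parameter: h(theta_hat) + (1/2) log(2 pi) - (1/2) log(-h''(theta_hat)),
    where theta_hat is the mode of h and h'' is estimated by a central
    finite difference."""
    h = log_joint
    h2 = (h(theta_hat + eps) - 2.0 * h(theta_hat) + h(theta_hat - eps)) / eps ** 2
    return h(theta_hat) + 0.5 * log(2.0 * pi) - 0.5 * log(-h2)
```

In p dimensions the correction term becomes (p/2) log(2π) minus half the log determinant of the negative Hessian at the mode.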
Bayesian Variable Selection
I Suppose we start with a vector of p candidate predictors, xi = (xi1, . . . , xip)′
I A very common type of model uncertainty corresponds to uncertainty in which predictors to include in the model.
I In this case, we end up with a list of 2^p different models, corresponding to each of the p candidate predictors being excluded or not.
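To make the 2^p model space concrete, here is a small illustrative sketch (not from the original slides): each model is encoded as a vector of 0/1 inclusion indicators, matching the model-indicator notation used in the DDE results below.

```python
from itertools import product

def enumerate_models(p):
    """Enumerate all 2^p variable-inclusion indicators for p candidate predictors."""
    # Each model is a tuple of 0/1 flags: flag j says whether predictor j is included.
    return list(product([0, 1], repeat=p))

models = enumerate_models(3)
print(len(models))              # 8 models for p = 3
print(models[0], models[-1])    # (0, 0, 0) is the null model, (1, 1, 1) the full model
```

For the DDE application below, p = 7 candidate predictors gives 2^7 = 128 models.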
Stochastic Search Variable Selection (SSVS)
I George and McCulloch (1993, 1997) proposed a Gibbs sampling approach for the variable selection problem.
I Similar approaches have been very widely used in applications.
I The SSVS idea will be illustrated by returning to the DDE and preterm birth application
Bayes Variable Selection in Probit Regression
I Earlier we focused on the model, Pr(yi = 1 | xi, β) = Φ(x′i β), with yi an indicator of premature delivery.
I Previously, we chose a N7(0, 4I) prior for β, assuming all 7 predictors were included.
I To account for uncertainty in subset selection, choose a mixture prior:

π(β) = ∏_{j=1}^{p} { δ0(βj) p0j + (1 − p0j) N(βj; 0, c_j^2) },

where p0j is the prior probability of excluding the jth predictor by setting its coefficient to 0
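This mixture (spike-and-slab) prior is easy to simulate from. The sketch below is illustrative (the function name is ours, not from the talk); it draws a coefficient from the point mass at 0 with probability p0 and from the N(0, c^2) slab otherwise:

```python
import random

def sample_spike_slab(p0, c, rng):
    """One draw from the mixture prior p0 * delta_0 + (1 - p0) * N(0, c^2)."""
    if rng.random() < p0:
        return 0.0               # spike: coefficient set exactly to zero
    return rng.gauss(0.0, c)     # slab: diffuse normal prior on the coefficient

rng = random.Random(1)
draws = [sample_spike_slab(p0=0.5, c=2.0, rng=rng) for _ in range(10000)]
frac_zero = sum(d == 0.0 for d in draws) / len(draws)
print(frac_zero)  # close to the prior exclusion probability p0 = 0.5
```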
SSVS in Probit Regression
I The data augmentation Gibbs sampler described earlier can be easily adapted.
I Sample from the conditional posterior distributions of βj, for j = 1, . . . , p:

π(βj | β(−j), z, y, X) = p̂j δ0(βj) + (1 − p̂j) N(βj; Ej, Vj),

where Vj = (c_j^−2 + X′X)^−1, Ej = Vj X′z, and

p̂j = p0j / { p0j + (1 − p0j) N(0; 0, c_j^2) / N(0; Ej, Vj) }

is the conditional probability of βj = 0 (i.e., we exclude the jth predictor)
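As an illustrative check (not from the talk, and assuming scalar Ej and Vj), the exclusion probability p̂j can be computed directly from the formula above:

```python
import math

def normal_pdf(x, mean, var):
    """Density of N(mean, var) evaluated at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)

def exclusion_prob(p0, c, Ej, Vj):
    """p-hat_j = p0 / ( p0 + (1 - p0) * N(0; 0, c^2) / N(0; Ej, Vj) )."""
    ratio = normal_pdf(0.0, 0.0, c ** 2) / normal_pdf(0.0, Ej, Vj)
    return p0 / (p0 + (1 - p0) * ratio)

# If the slab's full conditional N(Ej, Vj) sits far from zero, exclusion is unlikely:
print(exclusion_prob(p0=0.5, c=2.0, Ej=3.0, Vj=0.1))   # near 0
# If Ej is near zero, the spike is favoured:
print(exclusion_prob(p0=0.5, c=2.0, Ej=0.0, Vj=0.1))   # above 0.5
```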
SSVS - Comments
I After convergence, SSVS generates samples of models, corresponding to subsets of the set of p candidate predictors, from the posterior distribution.
I Based on a large number of SSVS iterations, we can estimate posterior probabilities for each of the models.
I For example, if the full model appears in 10% of the samples collected after convergence, that model is assigned posterior probability 0.10.
I To summarize, one can present a table of the top 10 or 100 models.
I It is potentially more useful to calculate marginal inclusion probabilities.
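The model-probability and marginal-inclusion summaries described above can be sketched as follows (illustrative code; the four toy draws are made up, not DDE output):

```python
from collections import Counter

def summarize_ssvs(indicator_draws):
    """Estimate posterior model probabilities and marginal inclusion
    probabilities from post-convergence draws of 0/1 inclusion indicators."""
    n = len(indicator_draws)
    # Posterior probability of a model = fraction of draws that visited it.
    model_probs = {m: k / n for m, k in Counter(indicator_draws).items()}
    # Marginal inclusion probability of predictor j = fraction of draws including it.
    p = len(indicator_draws[0])
    marginal = [sum(d[j] for d in indicator_draws) / n for j in range(p)]
    return model_probs, marginal

draws = [(1, 1, 0), (1, 1, 0), (1, 1, 1), (1, 0, 0)]  # four toy Gibbs draws
model_probs, marginal = summarize_ssvs(draws)
print(model_probs[(1, 1, 0)])  # 0.5: this model appeared in half the draws
print(marginal)                # [1.0, 0.75, 0.25]
```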
Samples from Posterior - DDE application (normal prior) [figure]
Samples from Posterior - DDE application (mixture prior) [figure]
SSVS - Comments
I Samples concentrate at 0 for the regression coefficients of the less important predictors.
I Such samples correspond to models with that predictor excluded.
I Even though the prior probabilities of exclusion are the same, the posterior probabilities vary greatly across predictors.
Posterior Summaries - Normal prior analysis
Parameter   Mean    Median   SD     95% credible interval
β1         -1.08   -1.08    0.04   (-1.16, -1.01)
β2          0.17    0.17    0.03   (0.12, 0.23)
β3         -0.13   -0.13    0.04   (-0.20, -0.05)
β4          0.11    0.11    0.03   (0.05, 0.18)
β5         -0.02   -0.02    0.03   (-0.08, 0.05)
β6         -0.08   -0.08    0.04   (-0.15, -0.02)
β7          0.05    0.06    0.06   (-0.07, 0.18)
Posterior Summaries - Mixture prior analysis
Parameter   Mean    Median   SD     95% CI            Pr(βj = 0 | data)
β1         -1.05   -1.05    0.03   (-1.12, -0.99)    0.00
β2          0.18    0.18    0.03   (0.12, 0.23)      0.00
β3         -0.08   -0.09    0.06   (-0.19, 0.00)     0.36
β4          0.05    0.00    0.06   (0.00, 0.16)      0.50
β5          0.00    0.00    0.01   (0.00, 0.00)      0.98
β6         -0.02    0.00    0.04   (-0.13, 0.00)     0.72
β7          0.01    0.00    0.02   (0.00, 0.10)      0.93
Posterior Probabilities of Visited Models
Rank   π̂m       Model Indicator
 1     0.2498   1 1 0 0 0 0 0
 2     0.2259   1 1 1 1 0 0 0
 3     0.1970   1 1 1 1 0 1 0
 4     0.1399   1 1 1 0 0 0 0
 5     0.0364   1 1 0 0 0 1 0
 6     0.0304   1 1 0 1 0 0 0
 7     0.0274   1 1 1 0 0 1 0
 8     0.0207   1 1 0 0 0 0 1
 9     0.0177   1 1 1 1 0 0 1
10     0.0122   1 1 1 0 0 0 1
Some Comments on DDE Application Results
I In 4,000 Gibbs iterations only 26/128 = 20.3% of the models were visited
I There wasn't a single dominant model, but none of the models excluded the intercept or DDE slope.
I All of the better models included the 3rd & 5th of the 5 possible confounders
General Comments on Bayes Model Uncertainty
I SSVS provides a very useful approach.
I For large numbers of candidate predictors, shotgun stochastic search provides an alternative (Hans et al., 2007).
I SSVS has also been adapted to select predictors with random effects (Cai and Dunson, 2006)
I For routine implementation, one can rely on Laplace approximations to the marginal likelihoods.
Missing Data Introduction
I Many (if not most) studies are faced with problems with missing data
I Bayesian methods provide a natural framework for accounting for missing data without needing to rely on ad hoc imputation
I Focus: missing predictors in regression models
Missing Predictors in Regression
I Suppose we are interested in the general linear model:

yi = x′i β + εi,   εi ∼ N(0, σ^2),

with xi = (xi1, . . . , xip)′ a vector of predictors that may have missing values
I Assume that the missing predictors are missing at random (MAR), so that missingness is conditionally independent of the unmeasured value given the observed data.
I To accommodate missing predictors, we need to specify a joint distribution for xi (typically chosen as normal or as a sequence of conditional GLMs).
I Then, the missing values are simply additional unknowns to be updated in the MCMC algorithm.
Gibbs Sampler
I When the predictors have a normal likelihood and we have a linear regression model, missing predictors can be accommodated in a simple Gibbs sampler:
1. Starting with initial values for β and σ^2, sample the missing predictor values from their normal full conditionals.
2. Given the imputed data, sample β and σ^2 from their full conditional posterior distributions.
I This algorithm is easily adapted for non-normal likelihoods for yi and xi, and the resulting Gibbs sampler can be implemented in WinBUGS.
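The two-step sampler above can be sketched for a toy case (illustrative code, not from the talk): a single predictor with known error variance, a known N(mu_x, tau2) predictor model, and a third of the x's made missing at random. A full sampler would also update sigma2, mu_x, and tau2.

```python
import random
import statistics

def gibbs_missing_x(y, x_obs, n_iter=2000, sigma2=1.0, mu_x=0.0, tau2=1.0, seed=0):
    """Toy Gibbs sampler for y_i = b0 + b1*x_i + N(0, sigma2), x_i ~ N(mu_x, tau2),
    where x_obs[i] is None when x_i is missing (assumed MAR)."""
    rng = random.Random(seed)
    n = len(y)
    x = [xi if xi is not None else mu_x for xi in x_obs]  # initialize missing x's
    b0, b1 = 0.0, 0.0
    keep = []
    for it in range(n_iter):
        # Step 1: impute each missing x_i from its normal full conditional,
        # combining the predictor model N(mu_x, tau2) with the regression likelihood.
        for i in range(n):
            if x_obs[i] is None:
                v = 1.0 / (1.0 / tau2 + b1 * b1 / sigma2)
                m = v * (mu_x / tau2 + b1 * (y[i] - b0) / sigma2)
                x[i] = rng.gauss(m, v ** 0.5)
        # Step 2: given the completed data, sample (b0, b1) under a flat prior:
        # b1 from its marginal (b0 integrated out), then b0 given b1.
        xbar = sum(x) / n
        sxx = sum((xi - xbar) ** 2 for xi in x)
        b1_hat = sum((xi - xbar) * yi for xi, yi in zip(x, y)) / sxx
        b1 = rng.gauss(b1_hat, (sigma2 / sxx) ** 0.5)
        b0 = rng.gauss(sum(yi - b1 * xi for yi, xi in zip(y, x)) / n, (sigma2 / n) ** 0.5)
        if it >= n_iter // 2:  # discard the first half as burn-in
            keep.append((b0, b1))
    return keep

# Simulate data with true b0 = 1, b1 = 2, then hide a third of the predictors.
data_rng = random.Random(42)
x_true = [data_rng.gauss(0.0, 1.0) for _ in range(150)]
y = [1.0 + 2.0 * xi + data_rng.gauss(0.0, 1.0) for xi in x_true]
x_obs = [xi if i % 3 else None for i, xi in enumerate(x_true)]
draws = gibbs_missing_x(y, x_obs)
print(round(statistics.mean(b1 for _, b1 in draws), 1))  # posterior mean of b1, near 2
```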
Some Comments
I The Bayesian approach is often used to obtain multiple imputed data sets, which are then combined using frequentist methods.
I This approach avoids ad hoc imputation methods, such as bootstrapping, which often have implicit missing-completely-at-random assumptions.
I We have assumed that missingness is non-informative.
I Shared random effects models can be used to account for informative missingness and censoring.
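One standard frequentist combining rule for the m imputed-data analyses mentioned above is Rubin's rules; the sketch below is illustrative (the function name is ours):

```python
def rubin_combine(estimates, variances):
    """Combine point estimates and variances from m imputed-data analyses
    via Rubin's rules: pooled estimate q_bar, total variance T = W + (1 + 1/m)*B."""
    m = len(estimates)
    q_bar = sum(estimates) / m
    w = sum(variances) / m                                   # within-imputation variance
    b = sum((q - q_bar) ** 2 for q in estimates) / (m - 1)   # between-imputation variance
    return q_bar, w + (1.0 + 1.0 / m) * b

# Toy example: five imputed-data estimates of one coefficient, each with variance 0.04.
q, t = rubin_combine([1.0, 1.2, 0.8, 1.1, 0.9], [0.04] * 5)
print(round(q, 2), round(t, 2))  # 1.0 0.07
```

Note that the total variance exceeds the average within-imputation variance, reflecting the extra uncertainty due to the missing data.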
Summary
I Very brief introduction to Bayesian statistics
I Emphasis on regression models, variable selection & missing predictors
I Ideas related to model uncertainty and missing data can be generalized to much broader settings
I Many of the MCMC algorithms are easy to program, but there are also a number of packages available (WinBUGS, R functions for Bayes model averaging in GLMs, etc.)