Bayesian Model Diagnostics and...
Transcript of Bayesian Model Diagnostics and...
Bayesian Model Diagnostics and Checking
Earvin Balderama
Quantitative Ecology LabDepartment of Forestry and Environmental Resources
North Carolina State University
April 12, 2013
1 / 34Bayesian Model Diagnostics and Checking
c© 2013 by E. Balderama
Introduction
MCMCMC
2 / 34Bayesian Model Diagnostics and Checking
c© 2013 by E. Balderama
Introduction
MCMCMC
Steps in (Bayesian) Modeling1 Model Creation2 Model Checking (Criticism)3 Model Comparison
3 / 34Bayesian Model Diagnostics and Checking
c© 2013 by E. Balderama
Introduction
Outline
1 Popular Bayesian Diagnostics2 Predictive Distributions3 Posterior Predictive Checks4 Model Comparison
Precision
Accuracy
Extreme Values
5 Conclusion
4 / 34Bayesian Model Diagnostics and Checking
c© 2013 by E. Balderama
Introduction
Modeling
Classical methods1 Standardized Pearson residuals2 p-values3 Likelihood ratio4 MLE
also apply in Bayesian Analysis;1 Posterior mean of the standardized residuals.2 Posterior probabilities3 Bayes factor4 Posterior mean
5 / 34Bayesian Model Diagnostics and Checking
c© 2013 by E. Balderama
Popular Model Fit Statistics
Bayes Factor
For determining which model fits the data better, the Bayes factor iscommonly used in a hypothesis test.
Bayes factorThe odds ratio of the data favoring Model 1 over Model 2 is given by
BF =f1(y)f2(y)
=
∫f (y|θ1)f (θ1)dθ1∫f (y|θ2)f (θ2)dθ2
(1)
More robust than frequentist hypothesis testing.
Difficult to compute, although easy to approximate with software.
Only defined for proper marginal density functions.
6 / 34Bayesian Model Diagnostics and Checking
c© 2013 by E. Balderama
Popular Model Fit Statistics
Bayes Factor
Computation is conditional that one of the models is true.Because of this, Gelman thinks Bayes factors are irrelevant.
Prefers looking at distance measures between data and model.
Many alternatives to choose from! One of which is...
7 / 34Bayesian Model Diagnostics and Checking
c© 2013 by E. Balderama
Popular Model Fit Statistics
DIC
Like many good measures of model fit and comparison, the DevianceInformation Criterion (DIC) includes
1 how well the model fits the data (goodness of fit) and2 the complexity of the model (effective number of parameters).
Deviance Information Criterion
DIC = D + pD
8 / 34Bayesian Model Diagnostics and Checking
c© 2013 by E. Balderama
Popular Model Fit Statistics
Deviance
Deviance
D = −2 log f (y|θ) (2)
Example: Deviance with Poisson likelihood
D = −2 log∏
i
[µyi
i e−µi
yi!
]= −2
∑i
[− µi + yi logµi − log(yi!)
]
9 / 34Bayesian Model Diagnostics and Checking
c© 2013 by E. Balderama
Popular Model Fit Statistics
DIC
Like many good measures of model fit and comparison, the DevianceInformation Criterion (DIC) includes
1 how well the model fits the data (goodness of fit) and2 the complexity of the model (effective number of parameters).
Deviance Information Criterion
DIC = D + pD (3)
10 / 34Bayesian Model Diagnostics and Checking
c© 2013 by E. Balderama
Popular Model Fit Statistics
DIC
Like many good measures of model fit and comparison, the DevianceInformation Criterion (DIC) includes
1 how well the model fits the data (goodness of fit) and2 the complexity of the model (effective number of parameters).
Deviance Information Criterion
DIC = D + pD (4)
where1 D = E(D) is the posterior mean of deviance.2 pD = D− D(θ) is “mean deviance - deviance at means.”
11 / 34Bayesian Model Diagnostics and Checking
c© 2013 by E. Balderama
Popular Model Fit Statistics
DIC
DIC can then be rewritten as
DIC = 2D− D(θ)
= D(θ) + 2pD
= −2 log f (y|θ) + 2pD
which is a generalization of
AIC = −2 log f (y|θMLE) + 2k (5)
DIC can be used to compare different models as well as differentmethods. Preferred models have low DIC values.
12 / 34Bayesian Model Diagnostics and Checking
c© 2013 by E. Balderama
Popular Model Fit Statistics
DIC
Requires joint posterior distribution to be approximatelymultivariate normal.Doesn’t work well with
highly non-linear models
mixture models with discrete parameters
models with missing data
If pD is negativelog-likelihood may be non-concave
prior may be misspecified
posterior mean may not be a good estimator
13 / 34Bayesian Model Diagnostics and Checking
c© 2013 by E. Balderama
Posterior Predictive Checks
Prior Predictive Distribution
One way to decide between competing models is to rank them basedon how “well” each model does in predicting future observations. InBayesian analyses, predictive distributions are used for this kind ofdecision.Before data is observed, what could we use for predictions?
Marginal likelihood
f (y) =∫
f (y|θ)f (θ)dθ (6)
The marginal likelihood is what one would expect data to look likeafter averaging over the prior distribution of θ , so it is called the priorpredictive distribution.
14 / 34Bayesian Model Diagnostics and Checking
c© 2013 by E. Balderama
Posterior Predictive Checks
Posterior Predictive Distribution
More interestingly, if a set of data y have already been observed, onecan predict future (or new or unobserved) y′ from the marginalposterior likelihood of y′, called the posterior predictive distribution.
Marginal posterior likelihood
f (y′|y) =∫
f (y′|θ)f (θ|y)dθ (7)
This distribution is what one would expect y′ to look like afterobserving y and averaging over the posterior distribution of θ given y .
15 / 34Bayesian Model Diagnostics and Checking
c© 2013 by E. Balderama
Posterior Predictive Checks
Posterior Predictive Distribution
Equivalently, y′ can be the missing values and treated as additionalparameters to be estimated in a Bayesian framework. Specifically, foreach month, we randomly give NA values to 10 observed sites tocreate a test set of n = 220 observations, {yi; i = 1, 2, . . . , n}. Then then posterior predictive distributions, {Pi; i = 1, 2, . . . , n}, can then beused to determine measures of overall model goodness-of-fit, as wellas predictive performance measures of each individual yi in the test set.
16 / 34Bayesian Model Diagnostics and Checking
c© 2013 by E. Balderama
Posterior Predictive Checks
Posterior Predictive Ordinate
The posterior predictive ordinate (PPO) is the density of the posteriorpredictive distribution evaluated at an observation yi. PPO can be usedto estimate the probability of observing yi in the future if after havingalready observed y.
PPOi = f (yi|y) =∫
f (yi|θ)f (θ|y)dθ (8)
We can estimate the ith posterior predictive ordinate by
PPOi =1S
S∑s=1
f (yi|θ(s)) (9)
17 / 34Bayesian Model Diagnostics and Checking
c© 2013 by E. Balderama
Posterior Predictive Checks
BUGS code
Example: Poisson count model
model{
#likelihoodfor(i in 1:N){
y[i] ~ dpois(mu[i])mu[i] <- ...
ppo.term[i] <- exp(-mu[i] + y[i]*log(mu[i]) - logfact(y[i]))
}
#priors...
}
18 / 34Bayesian Model Diagnostics and Checking
c© 2013 by E. Balderama
Posterior Predictive Checks
Conditional Predictive Ordinate
PPO is good for prediction, but violates the likelihood principle.
The conditional predictive ordinate (CPO) is based onleave-one-out-cross-validation.
CPO estimates the probability of observing yi in the future if afterhaving already observed y−i.
Low CPO values suggest possible outliers, high-leverage andinfluential observations.
19 / 34Bayesian Model Diagnostics and Checking
c© 2013 by E. Balderama
Posterior Predictive Checks
Conditional Predictive Ordinate
CPOi = f (yi|y−i) = · · · =[∫
1f (yi|θ)
f (θ|y)dθ]−1
(10)
Therefore, CPO can be estimated by taking the inverse of the posteriormean of the inverse density function value of yi (harmonic mean of thelikelihood of yi). Thus,
CPOi =
1S
S∑s=1
1
f(
yi|θ(s))−1
(11)
20 / 34Bayesian Model Diagnostics and Checking
c© 2013 by E. Balderama
Posterior Predictive Checks
Conditional Predictive Ordinate
ProofCPOi = f (yi|y−i) (12)
=
[f (y−i)
f (y)
]−1
(13)
=
[∫f (y−i|θ)f (θ)
f (y)dθ]−1
(14)
=
[∫1
f (yi|θ)f (y|θ)f (θ)
f (y)dθ]−1
(15)
=
[∫1
f (yi|θ)f (θ|y)dθ
]−1
(16)
=
[Eθ|y
(1
f (yi|θ)
)]−1
(17)
21 / 34Bayesian Model Diagnostics and Checking
c© 2013 by E. Balderama
Posterior Predictive Checks
Predictive Ordinates
Estimate of PPO
PPOi =1S
S∑s=1
f (yi|θ(s)) (18)
Estimate of CPO
CPOi =
1S
S∑s=1
1
f(
yi|θ(s))−1
(19)
22 / 34Bayesian Model Diagnostics and Checking
c© 2013 by E. Balderama
Posterior Predictive Checks
BUGS code
Example: Poisson count model
model{
#likelihoodfor(i in 1:N){
y[i] ~ dpois(mu[i])mu[i] <- ...
ppo.term[i] <- exp(-mu[i] + y[i]*log(mu[i]) - logfact(y[i]))icpo.term[i] <- 1/ppo.term[i]
}
#priors...
}
23 / 34Bayesian Model Diagnostics and Checking
c© 2013 by E. Balderama
Model Comparison
LPML
The sum of the log CPO’s and is an estimator for the log marginallikelihood.
The “best” model amongst competing models have the largestLPML.
Log-pseudo marginal likelihood
LPML =1n
n∑i=1
log(CPOi) (20)
24 / 34Bayesian Model Diagnostics and Checking
c© 2013 by E. Balderama
Model Comparison
Hypothesis Testing
A ratio of LPML’s is a surrogate for the Bayes factor.
Another overall measure for model comparison is the posterior Bayesfactor, which is simply the Bayes factor but using the posteriorpredictive distributions.
25 / 34Bayesian Model Diagnostics and Checking
c© 2013 by E. Balderama
Model Comparison
Measures of Predictive Precision
The mean absolute deviation (MAD) is the mean of the absolutevalues of the deviations between the actual observed value and themedian of its respective P .
Mean Absolute Deviation
MAD =1n
n∑i=1
∣∣∣yi − Pi
∣∣∣ (21)
Note: You can also use the median absolute deviation (also MAD!) fora more robust statistic.
26 / 34Bayesian Model Diagnostics and Checking
c© 2013 by E. Balderama
Model Comparison
Measures of Predictive Precision
MSE is the mean of the squared deviations (errors) between theactual observed value and the mean of its respective P .
The average standard deviation of the P’s can also be helpful.
Mean Squared Error
MSE =1n
n∑i=1
(yi − P i
)2(22)
Mean Standard Deviation
SD =1n
n∑i=1
σPi (23)
27 / 34Bayesian Model Diagnostics and Checking
c© 2013 by E. Balderama
Model Comparison
Measures of Predictive Accuracy
Coverage is the proportion of test set observations yi falling insidesome interval of their respective posterior predictive distributions Pi.
90% Coverage
C(90%) =1n
n∑i=1
[I(P(.05)
i < yi < P(.95)i
)](24)
where I is the indicator function and P(q) is the estimated qth quantile.
This shows how well the model does in creating posterior predictivedistributions that actually capture the true value.
28 / 34Bayesian Model Diagnostics and Checking
c© 2013 by E. Balderama
Model Comparison
Predictive Performance
Combining information from measures of precision (i.e Coverage) andmeasures of accuracy (i.e. SD) is important for model comparison.
A high coverage probability can simply be a result of high-varianceposterior predictive distributions.
29 / 34Bayesian Model Diagnostics and Checking
c© 2013 by E. Balderama
Model Comparison
Prediction of Extreme Values
The Brier score is the squared difference between the posteriorpredictive probability of exceeding a certain value and whether or notthe actual observation exceeds that value. This score can be used as ameasure of predictive accuracy of extreme values.
The Brier score for a test set observation, given a certain value c, canbe computed as
Brier Score
Brieri =[I(yi > c)− Pi(yi > c)
]2(25)
where I is the indicator function and P(y > c) is the posteriorpredictive probability that yi > c.
30 / 34Bayesian Model Diagnostics and Checking
c© 2013 by E. Balderama
Model Comparison
Prediction of Extreme Values
For each observation yi in the test set, the quantile score can becomputed as
Quantile Score
QSi = 2×[I(
yi < P(q)i
)− q]×[P(q)
i − yi
](26)
where I is the indicator function and P(q) is the estimated qth quantile.
31 / 34Bayesian Model Diagnostics and Checking
c© 2013 by E. Balderama
Model Comparison
Prediction of Extreme Values
The average of the quantile scores and the average of the Brier Scorescan be used for model comparison. They evaluate how well a modelcaptures extreme values.
Smaller scores are better.
Larger Brier scores suggest lack of predictive accuracy.
Larger quantile scores suggest that the observed value is very farfrom its estimated quantile value from P .
32 / 34Bayesian Model Diagnostics and Checking
c© 2013 by E. Balderama
Conclusion
Other Posterior Predictive Checks
1 Other Bayesian p-value tests.2 Gelman Chi-Square tests, and other Chi-Square tests.3 Quantile Ratio4 Predictive Concordance5 Bayesian Predictive Information Criterion (BPIC)6 L-criterion7 ...and many more...
33 / 34Bayesian Model Diagnostics and Checking
c© 2013 by E. Balderama
Conclusion
Summary
1 Separate your research into the three MC’s.2 List of model checks is not exhaustive!3 Choose some based on the focus of your research.4 Statistics only a guide.
34 / 34Bayesian Model Diagnostics and Checking
c© 2013 by E. Balderama