
Bayesian Model Diagnostics and Checking

Earvin Balderama

Quantitative Ecology Lab, Department of Forestry and Environmental Resources

North Carolina State University

April 12, 2013

© 2013 by E. Balderama


Introduction

MCMCMC

Steps in (Bayesian) Modeling:

1. Model Creation
2. Model Checking (Criticism)
3. Model Comparison


Outline

1. Popular Bayesian Diagnostics
2. Predictive Distributions
3. Posterior Predictive Checks
4. Model Comparison
   - Precision
   - Accuracy
   - Extreme Values
5. Conclusion


Modeling

Classical methods:

1. Standardized Pearson residuals
2. p-values
3. Likelihood ratio
4. MLE

also apply in Bayesian analysis:

1. Posterior mean of the standardized residuals
2. Posterior probabilities
3. Bayes factor
4. Posterior mean


Popular Model Fit Statistics

Bayes Factor

For determining which model fits the data better, the Bayes factor is commonly used in a hypothesis test.

Bayes factor

The odds ratio of the data favoring Model 1 over Model 2 is given by

BF = f_1(y) / f_2(y) = ∫ f(y|θ_1) f(θ_1) dθ_1 / ∫ f(y|θ_2) f(θ_2) dθ_2    (1)

- More robust than frequentist hypothesis testing.
- Difficult to compute, although easy to approximate with software.
- Only defined for proper marginal density functions.
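The integrals in (1) can be approximated by simple Monte Carlo, averaging the likelihood over draws from each model's prior. A minimal Python sketch, assuming (purely for illustration) a Poisson likelihood with two competing Gamma priors; this estimator is crude but mirrors the definition, and more stable methods (e.g., bridge sampling) are preferred in practice:

import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
y = np.array([3, 1, 4, 1, 5, 9, 2, 6])  # hypothetical toy count data

def log_marginal(y, a, b, S=100_000):
    """Crude Monte Carlo estimate of log f(y): average the likelihood
    over S draws from a Gamma(a, b) prior on a Poisson rate theta."""
    theta = rng.gamma(a, 1.0 / b, size=S)               # prior draws
    loglik = stats.poisson.logpmf(y[:, None], theta).sum(axis=0)
    return np.logaddexp.reduce(loglik) - np.log(S)      # log of the mean likelihood

# Bayes factor for Model 1 (diffuse prior) vs. Model 2 (tighter prior)
log_bf = log_marginal(y, 0.5, 0.1) - log_marginal(y, 10.0, 2.0)
print(np.exp(log_bf))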


The computation is conditional on one of the models being true. Because of this, Gelman considers Bayes factors irrelevant.

He prefers looking at distance measures between the data and the model.

There are many alternatives to choose from! One of which is...


DIC

Like many good measures of model fit and comparison, the Deviance Information Criterion (DIC) includes:

1. how well the model fits the data (goodness of fit), and
2. the complexity of the model (effective number of parameters).

Deviance Information Criterion

DIC = D̄ + p_D


Deviance

D = −2 log f(y|θ)    (2)

Example: Deviance with Poisson likelihood

D = −2 log ∏_i [ μ_i^{y_i} e^{−μ_i} / y_i! ] = −2 Σ_i [ −μ_i + y_i log μ_i − log(y_i!) ]
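This expansion translates directly into code. A minimal Python sketch (numpy and scipy assumed; poisson_deviance is a hypothetical helper name):

import numpy as np
from scipy.special import gammaln

def poisson_deviance(y, mu):
    """D = -2 log f(y|mu) for independent Poisson counts, using
    log(y_i!) = gammaln(y_i + 1) as in the expansion above."""
    return -2.0 * np.sum(-mu + y * np.log(mu) - gammaln(y + 1.0))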



DIC

Deviance Information Criterion

DIC = D̄ + p_D    (4)

where

1. D̄ = E(D) is the posterior mean of the deviance, and
2. p_D = D̄ − D(θ̄) is "mean deviance − deviance at the (posterior) means."


DIC can then be rewritten as

DIC = 2D̄ − D(θ̄)
    = D(θ̄) + 2p_D
    = −2 log f(y|θ̄) + 2p_D

which is a generalization of

AIC = −2 log f(y|θ̂_MLE) + 2k    (5)

DIC can be used to compare different models as well as different methods. Preferred models have low DIC values.
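Given MCMC output, DIC then takes only a few lines. A sketch assuming deviance evaluations are available, e.g., from the hypothetical poisson_deviance helper above:

import numpy as np

def dic(deviance_draws, deviance_at_mean):
    """DIC = Dbar + p_D, with p_D = Dbar - D(theta_bar).

    deviance_draws   : array of D(theta^(s)) over posterior draws
    deviance_at_mean : D evaluated at the posterior mean of theta
    Returns (DIC, p_D).
    """
    d_bar = float(np.mean(deviance_draws))
    p_d = d_bar - deviance_at_mean
    return d_bar + p_d, p_d

# e.g., draws = [poisson_deviance(y, mu_s) for mu_s in mu_draws]
# dic_val, p_d = dic(np.array(draws), poisson_deviance(y, mu_draws.mean(axis=0)))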


DIC

DIC requires the joint posterior distribution to be approximately multivariate normal. It doesn't work well with:

- highly non-linear models
- mixture models with discrete parameters
- models with missing data

If p_D is negative:

- the log-likelihood may be non-concave,
- the prior may be misspecified, or
- the posterior mean may not be a good estimator.


Posterior Predictive Checks

Prior Predictive Distribution

One way to decide between competing models is to rank them based on how "well" each model does in predicting future observations. In Bayesian analyses, predictive distributions are used for this kind of decision. Before data are observed, what could we use for predictions?

Marginal likelihood

f(y) = ∫ f(y|θ) f(θ) dθ    (6)

The marginal likelihood is what one would expect data to look like after averaging over the prior distribution of θ, so it is called the prior predictive distribution.
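Prior predictive samples are obtained by composition: draw θ from the prior, then y given θ. A minimal Python sketch for a hypothetical Poisson model with a Gamma prior (hyperparameters chosen only for illustration):

import numpy as np

rng = np.random.default_rng(0)

# Draw theta from a Gamma(a, b) prior, then y | theta from the Poisson
# likelihood; the y's are samples from f(y) = ∫ f(y|θ) f(θ) dθ.
a, b, S = 2.0, 0.5, 10_000
theta = rng.gamma(a, 1.0 / b, size=S)
y_prior_pred = rng.poisson(theta)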


Posterior Predictive Distribution

More interestingly, if a set of data y has already been observed, one can predict future (or new, or unobserved) y′ from the marginal posterior likelihood of y′, called the posterior predictive distribution.

Marginal posterior likelihood

f(y′|y) = ∫ f(y′|θ) f(θ|y) dθ    (7)

This distribution is what one would expect y′ to look like after observing y and averaging over the posterior distribution of θ given y.
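Sampling from (7) is again composition: push each posterior draw θ^(s) through the likelihood. A Python sketch in which placeholder gamma draws stand in for real MCMC output:

import numpy as np

rng = np.random.default_rng(1)

# One y' per posterior draw theta^(s); the collection is a sample from
# the posterior predictive f(y'|y).
theta_post = rng.gamma(50.0, 0.1, size=5000)   # placeholder posterior draws
y_new = rng.poisson(theta_post)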


Posterior Predictive Distribution

Equivalently, y′ can be treated as missing values, that is, as additional parameters to be estimated in the Bayesian framework. Specifically, for each month, we randomly assign NA values to 10 observed sites to create a test set of n = 220 observations, {y_i; i = 1, 2, . . . , n}. The n posterior predictive distributions, {P_i; i = 1, 2, . . . , n}, can then be used to determine measures of overall model goodness-of-fit, as well as predictive performance measures for each individual y_i in the test set.


Posterior Predictive Ordinate

The posterior predictive ordinate (PPO) is the density of the posterior predictive distribution evaluated at an observation y_i. PPO can be used to estimate the probability of observing y_i in the future after having already observed y.

PPO_i = f(y_i|y) = ∫ f(y_i|θ) f(θ|y) dθ    (8)

We can estimate the ith posterior predictive ordinate by

PPO_i = (1/S) Σ_{s=1}^{S} f(y_i | θ^(s))    (9)


BUGS code

Example: Poisson count model

model{
  # likelihood
  for(i in 1:N){
    y[i] ~ dpois(mu[i])
    mu[i] <- ...
    # Poisson density of y[i] given mu[i]; monitor ppo.term and average
    # its posterior draws to estimate PPO_i as in (9)
    ppo.term[i] <- exp(-mu[i] + y[i]*log(mu[i]) - logfact(y[i]))
  }
  # priors
  ...
}


Conditional Predictive Ordinate

- PPO is good for prediction, but violates the likelihood principle.
- The conditional predictive ordinate (CPO) is based on leave-one-out cross-validation.
- CPO estimates the probability of observing y_i in the future after having already observed y_{−i}.
- Low CPO values suggest possible outliers, high-leverage points, and influential observations.


Conditional Predictive Ordinate

CPO_i = f(y_i | y_{−i}) = · · · = [ ∫ (1 / f(y_i|θ)) f(θ|y) dθ ]^{−1}    (10)

Therefore, CPO can be estimated by taking the inverse of the posterior mean of the inverse density value of y_i, that is, the harmonic mean of the likelihood values of y_i across posterior draws. Thus,

CPO_i = [ (1/S) Σ_{s=1}^{S} 1 / f(y_i | θ^(s)) ]^{−1}    (11)


Conditional Predictive Ordinate

Proof:

CPO_i = f(y_i | y_{−i})    (12)
      = [ f(y_{−i}) / f(y) ]^{−1}    (13)
      = [ ∫ f(y_{−i}|θ) f(θ) / f(y) dθ ]^{−1}    (14)
      = [ ∫ (1 / f(y_i|θ)) f(y|θ) f(θ) / f(y) dθ ]^{−1}    (15)
      = [ ∫ (1 / f(y_i|θ)) f(θ|y) dθ ]^{−1}    (16)
      = [ E_{θ|y}( 1 / f(y_i|θ) ) ]^{−1}    (17)

Step (15) uses conditional independence of the observations given θ, so that f(y_{−i}|θ) = f(y|θ) / f(y_i|θ); step (16) uses Bayes' theorem, f(θ|y) = f(y|θ) f(θ) / f(y).


Predictive Ordinates

Estimate of PPO

PPO_i = (1/S) Σ_{s=1}^{S} f(y_i | θ^(s))    (18)

Estimate of CPO

CPO_i = [ (1/S) Σ_{s=1}^{S} 1 / f(y_i | θ^(s)) ]^{−1}    (19)
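Both estimates can also be formed outside of BUGS from likelihood evaluations at each posterior draw. A minimal Python sketch for a hypothetical Poisson model (numpy/scipy; ppo_cpo is an assumed name):

import numpy as np
from scipy import stats

def ppo_cpo(y, mu_draws):
    """Estimates (18) and (19) for a Poisson model.

    y        : (n,) observed counts
    mu_draws : (S, n) posterior draws of each mu_i
    Returns (ppo, cpo), each of shape (n,).
    """
    lik = stats.poisson.pmf(y[None, :], mu_draws)  # f(y_i | theta^(s)), shape (S, n)
    ppo = lik.mean(axis=0)                         # average density, eq. (18)
    cpo = 1.0 / (1.0 / lik).mean(axis=0)           # harmonic mean, eq. (19)
    return ppo, cpo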


BUGS code

Example: Poisson count model

model{
  # likelihood
  for(i in 1:N){
    y[i] ~ dpois(mu[i])
    mu[i] <- ...
    ppo.term[i] <- exp(-mu[i] + y[i]*log(mu[i]) - logfact(y[i]))
    # inverse likelihood term; the inverse of its posterior mean
    # estimates CPO_i as in (19)
    icpo.term[i] <- 1/ppo.term[i]
  }
  # priors
  ...
}


Model Comparison

LPML

The log-pseudo marginal likelihood (LPML) is the sum of the log CPOs (here scaled by 1/n) and is an estimator of the log marginal likelihood.

The "best" model amongst competing models has the largest LPML.

Log-pseudo marginal likelihood

LPML = (1/n) Σ_{i=1}^{n} log(CPO_i)    (20)
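A one-line sketch of (20), given the cpo vector from the earlier PPO/CPO sketch (lpml is an assumed name):

import numpy as np

def lpml(cpo):
    """Average of the log CPOs, matching (20); larger is better."""
    return float(np.mean(np.log(cpo)))

# ppo, cpo = ppo_cpo(y, mu_draws); print(lpml(cpo))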


Hypothesis Testing

A ratio of pseudo-marginal likelihoods (the exponentiated difference of LPMLs) is a surrogate for the Bayes factor.

Another overall measure for model comparison is the posterior Bayes factor, which is simply the Bayes factor computed using the posterior predictive distributions.


Measures of Predictive Precision

The mean absolute deviation (MAD) is the mean of the absolute values of the deviations between each observed value and the median of its posterior predictive distribution P_i.

Mean Absolute Deviation

MAD = (1/n) Σ_{i=1}^{n} | y_i − P̃_i |    (21)

where P̃_i is the median of P_i.

Note: You can also use the median absolute deviation (also MAD!) for a more robust statistic.
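A minimal Python sketch of (21), assuming pred_draws holds S posterior predictive draws for each of the n test observations (hypothetical names):

import numpy as np

def mad(y_test, pred_draws):
    """Mean absolute deviation from the posterior predictive medians.

    pred_draws : (S, n) draws from each P_i
    """
    return float(np.mean(np.abs(y_test - np.median(pred_draws, axis=0))))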


Measures of Predictive Precision

MSE is the mean of the squared deviations (errors) between each observed value and the mean of its posterior predictive distribution P_i.

The average standard deviation of the P_i's can also be helpful.

Mean Squared Error

MSE = (1/n) Σ_{i=1}^{n} ( y_i − P̄_i )²    (22)

Mean Standard Deviation

SD = (1/n) Σ_{i=1}^{n} σ_{P_i}    (23)

where P̄_i and σ_{P_i} are the mean and standard deviation of P_i.
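Sketches of (22) and (23) under the same assumed (S, n) layout of predictive draws:

import numpy as np

def mse(y_test, pred_draws):
    """Mean squared error against the posterior predictive means."""
    return float(np.mean((y_test - pred_draws.mean(axis=0)) ** 2))

def mean_sd(pred_draws):
    """Average standard deviation of the posterior predictive distributions."""
    return float(np.mean(pred_draws.std(axis=0)))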


Measures of Predictive Accuracy

Coverage is the proportion of test set observations y_i falling inside some interval of their respective posterior predictive distributions P_i.

90% Coverage

C(90%) = (1/n) Σ_{i=1}^{n} I( P_i^{(.05)} < y_i < P_i^{(.95)} )    (24)

where I is the indicator function and P_i^{(q)} is the estimated qth quantile of P_i.

This shows how well the model does in creating posterior predictive distributions that actually capture the true value.
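A sketch of (24), generalized to any central interval level (hypothetical names, same (S, n) layout of draws):

import numpy as np

def coverage(y_test, pred_draws, level=0.90):
    """Proportion of y_i inside the central `level` predictive interval."""
    lo, hi = np.quantile(pred_draws, [(1 - level) / 2, (1 + level) / 2], axis=0)
    return float(np.mean((lo < y_test) & (y_test < hi)))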


Predictive Performance

Combining information from measures of accuracy (e.g., coverage) and measures of precision (e.g., SD) is important for model comparison.

A high coverage probability can simply be a result of high-variance posterior predictive distributions.


Prediction of Extreme Values

The Brier score is the squared difference between the posterior predictive probability of exceeding a certain value and whether or not the actual observation exceeds that value. This score can be used as a measure of predictive accuracy for extreme values.

The Brier score for a test set observation, given a certain value c, can be computed as

Brier Score

Brier_i = [ I(y_i > c) − P_i(y_i > c) ]²    (25)

where I is the indicator function and P_i(y_i > c) is the posterior predictive probability that y_i > c.
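A sketch of (25); the threshold c would be chosen to define "extreme" (hypothetical names, draws laid out (S, n)):

import numpy as np

def brier_scores(y_test, pred_draws, c):
    """Per-observation Brier scores for exceedance of threshold c, eq. (25)."""
    p_exceed = (pred_draws > c).mean(axis=0)      # P_i(y_i > c)
    return ((y_test > c).astype(float) - p_exceed) ** 2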


Prediction of Extreme Values

For each observation y_i in the test set, the quantile score can be computed as

Quantile Score

QS_i = 2 × [ I( y_i < P_i^{(q)} ) − q ] × [ P_i^{(q)} − y_i ]    (26)

where I is the indicator function and P_i^{(q)} is the estimated qth quantile of P_i.
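A sketch of (26), under the same assumed layout of predictive draws:

import numpy as np

def quantile_scores(y_test, pred_draws, q=0.95):
    """Per-observation quantile scores, eq. (26); smaller is better."""
    pq = np.quantile(pred_draws, q, axis=0)       # estimated qth quantiles
    return 2.0 * ((y_test < pq).astype(float) - q) * (pq - y_test)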


Prediction of Extreme Values

The average of the quantile scores and the average of the Brier scores can be used for model comparison. They evaluate how well a model captures extreme values.

Smaller scores are better:

- Larger Brier scores suggest a lack of predictive accuracy.
- Larger quantile scores suggest that the observed value is very far from its estimated quantile value under P_i.


Conclusion

Other Posterior Predictive Checks

1. Other Bayesian p-value tests
2. Gelman chi-square tests, and other chi-square tests
3. Quantile Ratio
4. Predictive Concordance
5. Bayesian Predictive Information Criterion (BPIC)
6. L-criterion
7. ...and many more...


Summary

1. Separate your research into the three MC's.
2. This list of model checks is not exhaustive!
3. Choose some based on the focus of your research.
4. Statistics are only a guide.
