1 Bayesian data analysis 1 using Bugs 2 and R 3 1 Fit complicated models 2 clearly 3 display and...

1

Bayesian data analysis1

using Bugs2

and R3

1Fit complicated models

2clearly

3display and understand results

2

Bayesian data analysisusing BUGS and R

Prof. Andrew GelmanDept. of Statistics and Political

ScienceColumbia University

Joint Program in Survey Methodology, University of Maryland, College ParkJanuary 17-18, 2008

3

Introduction to the course

4

Goals Should be able to…

Fit multilevel models and display and understand the results

Use Bugs from R Theoretical

Connection between classical linear models and GLM, multilevel models, and Bayesian inference

Practical Dealing with data and graphing in R Reformatting models in Bugs to get them to

work

5

Realistic goals

You won’t be able to go out and fit Bayesian models

But . . . you’ll have a sense of what Bayesian data analysis “feels like”: Model building Model fitting Model checking

6

Overview What is Bayesian data analysis? Why Bayes? Why R and Bugs [schools example] Hierarchical linear regression [radon example] Bayesian data analysis: what it is and what it is not [several

examples] Model checking [several examples] Hierarchical Poisson regression [police stops example] Understanding the Gibbs sampler and Metropolis algorithm Troubleshooting: simulations don’t work, run too slowly, make no

sense Advanced computation [social networks example] Accounting for design of surveys and experiments [polling example] Prior distributions [logistic-regression separation example] If there is time: hierarchical logistic regression with varying

intercepts and slopes [red/blue states example] Loose ends

7

Structure of this course Main narrative on PowerPoint Formulas on blackboard On my computer and yours:

Bugs, R, and a text editor Data, Bugs models, and R scripts for the

examples We do computer demos together Pause to work on your own to play with

the models—we then discuss together Lecturing for motivating examples Interrupt to ask questions

8

Structure of this course Several examples. For each:

Set up the Bugs model Use R to set up the data and initial values, and to run Bugs Use R to understand the inferences Sometimes fit model directly in R Play with variations on your own computer

Understanding how Bugs works and how to work with Bugs

More examples Working around problems in Bugs Using the inferences to answer real questions

Not much theory. If you want more theory, let me know!

Emphasis on general methods for model construction, expansion, patching, and understanding

9

Introduction to Bayes

10

What is Bayesian data analysis? Why Bayes? Effective and flexible

Modular (“Tinkertoy”) approach Combines information from different

sources Estimate inferences without overfitting for

models with large number of parameters Examples from sample surveys and public

health include: Small-area estimation, longitudinal data

analysis, and multilevel regression

11

What are BUGS and R? Bugs

Fit Bayesian statistical models No programming required

R Open source language for statistical computing and

graphics Bugs from R

Offers flexibility in data manipulation before the analysis and display of inferences after

Avoids tedious issues of working with Bugs directly

[Open R: schools example]

12

Begin 8-schools example

13

Open R: schools example Open R:

setwd (“c:/bugs.course/schools”) Open RWinEdt

Open file “schools/schools.R” Open file “schools.bug” Click on Window, Tile Horizontally

Copy the commands from schools.R and paste them into the R window

14

8 schools example Motivates hierarchical (multilevel) modeling Easy to do using Bugs Background: educational testing

experiments Classical analyses (load data into R)

No pooling Complete pooling

Hierarchical analysis using Bugs Talk through the 8-schools model Fit the model from R Look at the inferences, compare to classical

15

Playing with the 8 schools

Try it with different n.chains, n.iter Different starting values Changing the data values y Changing the data variances

sigma.y Changing J=#schools Changing the model

16

End 8-schools example

17

A brief prehistory of Bayesian data analysis Bayes (1763)

Links statistics to probability Laplace (1800)

Normal distribution Many applications, including census [sampling models]

Gauss (1800) Least squares Applications to astronomy [measurement error models]

Keynes, von Neumann, Savage (1920’s-1950’s) Link Bayesian statistics to decision theory

Applied statisticians (1950’s-1970’s) Hierarchical linear models Applications to animal breeding, education [data in groups]

18

A brief history of Bayesian data analysis, Bugs, and R “Empirical Bayes” (1950’s-1970’s)

Estimate prior distributions from data Hierarchical Bayes (from 1970)

Include hyperparameters as part of the full model Markov chain simulation (from 1940’s [physics] and 1980’s

[statistics]) Computation with general probability models Iterative algorithms that give posterior simulations (not point

estimates) Bugs (from 1994)

Bayesian inference Using Gibbs Sampling (David Spiegelhalter et al.) Can fit general models with modular structure

S (from 1980’s) Statistical language with modeling, graphics, and programming S-Plus (commercial version) R (open-source) lme() and lmer() functions by Doug Bates for fitting hierarchical linear

and generalized linear models

19

My own experiences with Bayes, Bugs, and R Bayesian methods really work,

especially for models with lots of parameters

Use R for data manipulations and various preprogrammed models

Use Bugs (from R) as first try in fitting complex models

Use R to make graphs to check that fitted model makes sense

When Bugs doesn’t work, program in R directly

20

Bayesian regression using Bugs

21

R and Bugs for classical inference

Radon example Fitting a linear regression in Bugs Displaying the results in R Complete-pooling and no-pooling

models Model extensions[open R: radon example]

22

Begin radon example

23

Open R: radon example In RWinEdt

Open file “radon/radon_setup.R” We’ll talk through the code

In R: setwd (“c:/bugs.course/radon”) source (“radon_setup.R”) Regression output will appear on the

R console and graphics windows

24

Fitting a linear regression in R and Bugs

The Bugs model Setting up data and initial values in

R Running Bugs and checking

convergence Displaying the fitted line and

residuals in R

25

Radon example in R(complete pooling) In RWinEdt

Open file “radon/radon.classical1.R” In other window, open file

“radon.classical.bug” Copy the commands (one paragraph at

a time) from radon.classical1.R and paste them into the R window Fit the model Plot the data and estimates Plot the residuals (two versions) Label the plot

26

Simple extensions of the radon model

Separate variances for houses with and without basements

t instead of normal errors Fitting each model, then

understanding it

27

Radon regression with 2 variance parameters

Separate variances for houses with and without basements Bugs model Setting it up in R Using the posterior inferences to

compare the two variance parameters

28

Radon example in R (regression with 2 variance parameters)

In RWinEdt Open file “radon/radon.extend.1.R” Other window, open file

“radon.classical.2vars.bug” Copy from radon.extend.1.R and paste

into the R window Fit the model Display a posterior inference Compute a posterior probability STOP

29

Robust regression for radon data

t instead of normal errors Issues with the degrees-of-freedom

parameter Looking at the posterior simulations

and comparing to the prior distribution

Interpreting the scale parameter

30

Radon example in R (robust regression using the t model) In RWinEdt

Stay with “radon/radon.extend.1.R” Other window:

“radon.classical.t.bug” Copy rest of radon.extend.1.R and

paste into the R window Fit the t model Run again with n.iter=1000 iterations Make some plots

31

Fit your own model to the radon data

Alter the Bugs model Change the setup in R Run the model and check

convergence Display the posterior inferences in

R[Suggestions of alternative models?]

32

Fitting several regressions in R and Bugs Back to the radon example

Complete pooling [we just did] No pooling [do now: see next slide]

Bugs model is unchanged Fit separate model in each county and

save them as a list Displaying data and fitted lines from 11

counties

Next step: hierarchical regression

33

Radon example in R (no pooling) In RWinEdt

Open file “radon/radon.classical2.R” In other window, open file

“radon.classical.bug” Copy from radon.classical2.R and paste

into the R window Fit the model (looping through n.county=11

counties) Plot the data and estimates Plot the residuals (two versions) Label the plot

34

Hierarchical linear regression: models

[Show the models on blackboard] Generalizing the Bugs model

Varying intercepts Varying intercepts, varying slopes Adding predictors at the group level

Also write each model using algebra More than one way to write (or

program) each model

35

Hierarchical linear regression: fitting and understanding

[working on blackboard] Displaying results in R Comparing models

Interpreting coefficients and variance parameters

Going beyond R-squared

36

Hierarchical linear regression: varying intercepts Example: county-level variation for radon Recall classical models

Complete pooling No pooling (separate regressions)

Including county effects hierarchically Bugs model Written version More than one way to write the model

Fitting and understanding Interpreting the parameters Displaying results in R

37

Radon example in Bugs and R (varying intercepts) In RWinEdt

“radon/radon.multilevel.1.R” Other window: “radon.multilevel.1.bug”

Copy from radon.multilevel.1.R and paste into the R window Fit the model (debug=TRUE) Close the Bugs window Run for 100, then 1000 iterations Plot the data and estimates for 11 counties Plot all 11 regression lines at once

38

Fitting the radon model using lmer()

lmer(): an R function for multilevel (hierarchical) regression and glm

Not as flexible as Bugs Has problems when group-level

variance parameters are near zero (as in 8-schools example)

But . . . is easy to use and pretty fast (Similar routines available in Stata

and SAS)

39

Radon example using lmer() (varying intercepts) In RWinEdt

“radon/radon.multilevel.lmer.1.R” Copy from radon.multilevel.lmer.1.R and

paste into the R window Fit the model using lmer() Display the results Pull out the intercepts and slopes Plot the data and estimates for 11 counties Get posterior simulations and compare to Bugs

40

Hierarchical linear regression: varying intercepts, varying slopes

The model Bugs model Written version

Setup in R Running Bugs, checking

convergence Displaying the fitted model in R Understanding the new parameters

41

Radon example in R (varying intercepts and slopes)

In RWinEdt “radon/radon.multilevel.2.R” Other window: “radon.multilevel.2.bug”

Copy from radon.multilevel.2.R and paste into the R window Fit the model (debug=TRUE) Close the Bugs window Run for 500 iterations Plot the data and estimates for 11 counties Plot all 11 regression lines at once Plot slopes vs. intercepts

42

Radon example using lmer() (varying intercepts and slopes) In RWinEdt

“radon/radon.multilevel.lmer.2.R” Copy from radon.multilevel.lmer.2.R and

paste into the R window Fit the model using lmer() Display the results Play around with centering and rescaling of x Pull out the intercepts and slopes Plot the data and estimates for 11 counties Get posterior simulations and compare to

Bugs

43

Structure of lmer() Data are grouped, can have nonnested

groups (see book for examples) Coefficients can be “fixed” (not varying

by group) or “random” (varying by group) lmer (y ~ fixed + (random | group)) Also glm version: lmer (y ~ fixed + (random | group), family=binomial (link=“logistic”))

44

Simple lmer() examples From R window

?mcsamp Scroll to the bottom of the help window

and paste the example into the R window Simulate some fake multilevel data Fit and display a varying-intercept model Varying-intercept, varying-slope model Logistic model: varying intercept Logistic: varying-intercept, varying-slope Non-nested varying-intercept, varying-slope

model

45

Playing with hierarchical modeling for the radon example

Uranium level as a county-level predictor

Changing the sample sizes Perturbing the data Fitting nonlinear models

46

Radon example in R (adding a county-level predictor)

In RWinEdt radon/radon.multilevel.3.R Other window: radon.multilevel.3.bug

Copy from radon.multilevel.3.R and paste into the R window Fit the model Plot the data and estimates for 11 counties Plot all 11 regression lines at once Plot county parameters vs. county uranium

level

47

Varying intercepts and slopes with correlated group-level errors

The model Statistical notation Bugs model

Show on blackboard Correlation and group-level

predictors More in chapters 13 and 17 of

Gelman and Hill (2006)

48

End radon example

49

Bayesian data analysis: what it is and what it is not

50

Bayesian data analysis: what it is and what it is not Popular view of Bayesian statistics

Subjective probability Elicited prior distributions

Bayesian data analysis as we do it Hierarchical modeling Many applications

Conceptual framework Fit a probability model to data Check fit, ride the model as far as it will

take you Improve/expand/extend model

51

Overview of Bayesian data analysis Decision analysis for home radon Where did the “prior distribution” come

from? Quotes illustrating misconceptions of

Bayesian inference State-level opinions from national polls

(hierarchical modeling and validation) Serial dilution assay

(handling uncertainty in a nonlinear model) Simulation-based model checking Some open problems in BDA

52

Begin radon decision example

53

Decision analysis for home radon Radon gas

Causes 15,000 lung cancers per year in U.S. Comes from underground; home exposure

Radon webpage http://www.stat.columbia.edu/~radon/ Click for recommended decision

Bayesian inference Prior + data = posterior Where did the prior distribution come from?

54

Prior distribution for your home’s radon level Example of Bayesian data analysis Radon model

theta_i = log of radon level in house i in county j(i) linear regression model:

theta_i = a_j(i) + b_1*base_i + b_2*air_i + … + e_i linear regression model for the county levels a_j,

given geology and uranium level in the county, with county-level errors

Data model y_i = log of radon measurement in house i y_i = theta_i + Bias + error_i Bias depends on the measurement protocol error_i is not the same as e_i in radon model above

55

Radon data sources National radon survey

Accurate unbiased data—but sparse 5000 homes in 125 counties

State radon surveys Noisy biased data, but dense 80,000 homes in 3000 counties

Other info House level (basement status, etc.) County level (geologic type, uranium level)

56

Bayesian inference for home radon Set up and compute model

3000 + 19 + 50 parameters Inference using iterative simulation (Gibbs sampler)

Inference for quantities of interest Uncertainty dist for any particular house

(use as prior dist in the webpage) County-level estimates and national averages Potential $7 billion savings

Model checking Do inferences make sense? Compare replicated to actual data, cross-validation Dispersed model validation (“beta-testing”)

57

Bayesian inference for home radon

Allows estimation of over 3000 parameters

Summarizes uncertainties using probability

Combines data sources Model is testable (falsifiable)

58

End radon decision example

59

Pro-Bayesian quotes Hox (2002):

“In classical statistics, the population parameter has only one specific value, only we happen not to know it. In Bayesian statistics, we consider a probability distribution of possible values for the unknown population distribution.”

Somebody’s webpage:“To a true subjective Bayesian statistician, the prior distribution represents the degree of belief that the statistician or client has in the value of the unknown parameter . . . it is the responsibility of the statistician to elicit the true beliefs of the client.”

60

Why these views of Bayesian statistics are wrong!

Hox quote (distribution of parameter values) Our response: parameter values are “fixed

but unknown” in Bayesian inference also! Confusion between quantities of interest

and inferential summaries Anonymous quote

Our response: the statistical model is an assumption to be held if useful

Confusion between statistical analysis and decision making

61

Anti-Bayesian quotes Efron (1986):

“Bayesian theory requires a great deal of thought about the given situation to apply sensibly.”

Ehrenberg (1986):“Bayesianism assumes: (a) Either a weak or uniform prior, in which case why bother?, (b) Or a strong prior, in which case why collect new data?, (c) Or more realistically, something in between, in which case Bayesianism always seems to duck the issue.”

62

Why these views of Bayesian statistics are wrong!

Efron quote (difficulty of Bayes) Our response: demonstration that Bayes solves

many problems more easily that other methods Mistaken focus on the simplest problems

Ehrenberg quote (arbitrariness of prior dist) Our response: the “prior dist” represents the

information provided by a group-level analysis One “prior dist” serves many analyses

63

Begin pre-election polling example

64

State-level opinions from national polls Goal: state-level opinions

State polls: infrequent and low quality National polls: N is too small for individual states Also must adjust for survey nonresponse

Try solving a harder problem Estimate opinion in each of 3264 categories:

51 states 64 demographic categories (sex, ethnicity, age,

education) Logistic regression model Sum over 64 categories within each state Validate using pre-election polls

65

State-level opinions from national polls

Logistic regression model y_i = 1 if person i supports Bush, 0 otherwise logit (Pr(y_i=1)) = linear predictor given demographics

and state of person i Hierarchical model for 64 demographic predictors and

51 state predictors (including region and previous Republican vote share in the state)

State polls could be included also if we want Sum over 64 categories within each state

“Post-stratification” In state s, the estimated proportion of Bush supporters

is sum(j=1 to 64) N_j Pr(y=1 in category j) / sum(j=1 to 64) N_j

Also simple to adjust for turnout of likely voters

66

Compare to “no pooling” and “complete pooling” “No pooling”

Separate estimate within each state Treat the survey as 49 state polls Expect “overfitting”: too many parameters

“Complete pooling” Include demographics only Give up on estimating 51 state parameters

Competition Use pre-election polls and compare to election

outcome Estimated Bush support in U.S. Estimates in individual states

67

Election poll analysis and Bayesian inference

Where was the “prior distribution”? In logistic regression model, 51 state

effects, a_k a_k = b*presvote_k + c_region(k) + e_k Errors e_k have Normal (0, sigma^2)

distribution; sigma estimated from data Where was the “subjectivity”?

68

Election poll analysis: validation

No Complete pooling pooling

BayesAvg std errors ofstate estimates 5.1% 0.9% 3.1%

Avg of actual absolutestate errors 5.1% 5.9% 3.2%

70

End pre-election polling example

71

Begin assays example

72

Serial dilution assays

73

Dots are data, curves are fitted

74

Serial dilution assays: motivation for Bayesian inference Classical approach: read the estimate off the

calibration curve Difficulties

“above detection limit”: curve is too flat “below detection limit”: signal/noise ratio is too low For some samples, all the data are above or below

detection limits! Goal: downweight—but don’t discard—weak

data Maximum likelihood (weighted least squares) Bayes handles uncertainty in the parameters

of the calibration curve

75

Serial dilution: validation

76

End assays example

77

Summary Bayesian data analysis is about modeling Not an optimization problem; no “loss

function” Make a (necessarily) false set of assumptions,

then work them hard Complex models --> complex inferences -->

lots of places to check model fit Prior distributions are usually not “subjective”

and do not represent “belief” Models are applied repeatedly (“beta-testing”)

78

Model checking

79

Model checking Basic idea:

Display observed data (always a good idea anyway)

Simulate several replicated datasets from the estimated model

Display the replicated datasets and compare to the observed data

Comparison can be graphical or numerical Generalization of classical methods:

Hypothesis testing Exploratory data analysis

Crucial “safety valve” in Bayesian data analysis

80

Model checking: simple example

A normal distribution is fit to the following data:

81

20 replicated datasets under the model

82

Comparison using a numerical test statistic

83

Model checking: another example

Logistic regression model fit to data from psychology: a 15 x 23 array of responses for each of 6 persons

Next slide shows observed data (at left) and 7 replicated datasets (at right)

84

Observed and replicated datasets

85

Another example: checking the fit of prior distributions

Curves show priors for 2 sets of parameters; histograms show estimates for each set

86

Improve the model, try again! Curves show new priors; histograms

show new parameter estimates

87

Model checking and model comparison Generalizing classical methods

t tests chi-squared tests F-tests R2, deviance, AIC

Use estimation rather than testing where possible

Posterior predictive checks of model fit DIC for predictive model comparison

88

Model checking: posterior predictive tests Test statistic T(y) Replicated datasets y.repk, k=1,

…,n.sim Compare T(y) to the posterior

predictive distribution of T(y.repk) Discrepancy measure T(y,thetak)

Look at n.sim values of the difference, T(y,thetak) - T(y.repk,thetak)

Compare this distribution to 0

89

Model comparison: DIC (deviance information criterion) Generalization of “deviance” in classical

GLM p_D is effective number of parameters

8-schools example Radon example

DIC is estimated error of out-of-sample predictions

DIC = posterior mean of deviance + p_D Radon example

90

More on multilevel modeling

91

Including linear predictors in a hierarchical model Radon example

Airflow as a data-level predictor Avg airflow as a county-level predictor Uranium as a county-level predictor

How can we include group indicators and a group-level predictor? Example from Advanced Placement

study[blackboard]

92

Hierarchical generalized linear models Poisson

Example: Police stops Overdispersed

Example: Police stops Logistic

Example: pre-election polls Other details in this example

Other issues Multinomial logit, robust model

93

Begin stop-and-frisk example

94

Police stop/frisks: example of Poisson regression Background on NYPD stops and frisks Data on stops for violent crimes

#stops, population, #previous arrests Blacks (African-Americans), Hispanics

(Latinos), Whites (European-Americans) Confidentiality: you’re getting fake data

In R, compute averages for each group Variation among precincts? Poisson regression (count data)[open R: frisk example]

95

Open R: frisk example In R:

setwd (“c:/bugs.course/frisk”) In RWinEdt

Open file “frisk/frisk.0.R” Other window: “frisk.0.bug”

Copy from frisk.0.R and paste into the R window Read in and look at the data Work with a subset of the data Pre-process Fit the classical Poisson regression in Bugs Fit using “glm” in R, check that results are approx the same

96

Hierarchical Poisson regression for police stops Poisson regression

Previous arrests as offset Logarithmic link Include precinct predictors

Set up model in Bugs Shift batches of parameters to sum to zero

Compute the model Slow convergence Compare to aggregate results

97

Frisk example in R (multilevel Poisson regression) In RWinEdt

Open file “frisk/frisk.1.R” Other window: “frisk.1.bug” Discuss the parts of the model

Copy from frisk.1.R and paste into the R window Fit the model Compare to classical Poisson regression

98

Shifting of multilevel regression coefficients

Frisk example: 3 ethnicity coefficients 75 precinct coefficients

Subtract mean of each batch See Bugs model “frisk1.dat” Define new variables g.eth, g.precinct Add the means back to g.0

99

Overdispersed hierarchical Poisson regression Use simulations from previous model to test

for overdispersion Chi-squared discrepancy measure Posterior predictive check in R compare observed discrepancy to simulations from

replicated data Poisson regression with offset and log link Set up model in Bugs Faster convergence! Display the results in R Compare to non-overdispersed model

100

Frisk example in R (overdispersed model) In RWinEdt

Open file “frisk/frisk.2.R” Other window: “frisk.2.bug” New variance component for overdispersion

Copy from frisk.2.R and paste into the R window Fit the model Compare to non-overdispersed model:

Coefficient estimates Overdispersion parameter sigma.y

101

Playing with the frisk example

Using %black as a precinct-level predictor

Looking at other crime types Including previous arrests as a

predictor rather than as an offset

102

2-stage regression for the frisk example #previous arrests is a noisy measure of

crime Model 1 of stops given “crime rate” Model 2 of arrests given “crime rate” Each of the 2 models is multilevel,

overdisp. “Crime rate” is implicitly estimated by

the 2 models simultaneously Set up Bugs model, fit in R

103

Frisk example in R (2-stage model) In RWinEdt

Open file “frisk/frisk.3.R” Other window: “frisk.3.bug” Entire model is duplicated: once for stops,

once for DCJS arrests Copy from frisk.3.R and paste into the R

window Fit the model Compare to previous models

104

End stop-and-frisk example

105

Computation

106

Understanding the computational method used by Bugs Key advantages

Gives posterior simulations, not a point estimate

Works with (almost) any model Connections to earlier methods

Maximum likelihood Normal approximation for std errors Multiple imputation

Gibbs sampler Metropolis algorithm Difficulties

107

Understanding computation: relation to earlier methods

Maximum likelihood Hill-climbing algorithm

+/- std errors Normal approximation

Multiple imputation Consider parameters as “missing data”

[More on blackboard]

108

Understanding computation: relation to earlier methods

Bayes’ theorem

)|()()|(

)(

)|()(

)(

),()|(

)|()(),(

yppyp

yp

ypp

yp

ypyp

yppyp

}

Prior (population) distribution

Sampling (data) distribution}

109

Understanding the Gibbs sampler and Metropolis algorithm

Examples of Bugs chains: Run mcmc.display.R

Interpretation as random walks (generalizing maximum likelihood)

Interpretation as iterative imputations (generalizing multiple imputation)

[open R: mcmc displays]

110

Open R: mcmc displays Open R:

setwd (“c:/bugs.course/schools”) Open RWinEdt

Open file “schools/mcmc.display.R” Copy from mcmc.display.R and paste

into the R window Read in the function sim.display() Apply it to the schools example In a few minutes, apply it to a previous run

of the radon model

111


Gibbs sampler as iterative imputations Demonstration on the blackboard for

the 8-schools example Can program directly in R [Appendix C

of Bayesian Data Analysis, 2nd edition] Radon example (graph in R) Conditional updating and the modular

structure of Bugs models

112


Monitoring convergence Pictures of good and bad convergence

[blackboard] n.chains must be at least 2 Role of starting points R-hat

Less than 1.1 is good Effective sample size

At least 100 is good

113


More efficient Gibbs samplers More efficient Metropolis jumping

114

Special lecture on computation [social networks example]

115

Troubleshooting

116

Troubleshooting

Problems in Bugs Simple Bugs and R solutions Reparameterizing Changing the model slightly Stripping the model down and

rebuilding it Expanding the model

117

Troubleshooting: problems and quick solutions Problem: model does not compile Solutions:

bugs (…, debug=TRUE) Look at where cursor stops in Bugs model Typical problems:

Typo, syntax error, illegal use of a distribution or function Go into Bugs user manual

In Bugs, click on Help, then User manual Check Model Specification and Distributions

Go into Bugs examples In Bugs, click on Help, then Examples Vol I and II Find an example similar to your problem

118

Troubleshooting: problems and quick solutions Problem: model does not run Solutions:

bugs (…, debug=TRUE) Look at where cursor stops in Bugs Typical problems:

Variable in data/inits/params but not in model Variable in model but not in data or params Error in subscripting

May need to fix in Bugs model or R script

119

Troubleshooting: problems and quick solutions

Problem: Trapping in Bugs Solutions:

Give reasonable starting values for all parameters in the model

Make prior distributions more restrictive e.g., dnorm(0, .01) instead of

dnorm(0, .0001) Change the model slightly

We will discuss some examples

120

Troubleshooting: slow convergence Problem: R-hat is still more than 1.1 for

some parameters Solutions:

Look at the separate sequences to see what is moving slowly

Run Bugs a little longer, possibly using last.value as initial values

Reparameterize Change the model slightly

Run model on a subset of the data (it’ll go faster)

121

Troubleshooting: reparameterizing Nesting predictors Adding redundant additive parameters Adding redundant multiplicative

parameters Rescaling

Example: item response models (hierarchical logistic regressions) for test scores or court votes

There is no unique way to write a model

122

Troubleshooting: changing the model slightly

Normal/t, logit/probit, etc. Bounds on predictions

Logistic regression example Use of robust models Approximations to functions

Golf putting

[open R: logistic regression example]

123

Begin logistic regression example

124

Open R: logistic regression example Open R

setwd (“c:/bugs.course/logit”) Open RWinEdt

Open file “logit/logit.R” Other window: “logit.bug”

Copy from logit.R and paste into the R window Simulate fake dataset and display it Fit the logistic regression model in R, then

Bugs Compare the classical and Bayes estimates Superimpose fitted models on data

125

Logistic regression in R and Bugs:latent-variable parameterization

In RWinEdt Continue looking at “logit.R” Other window: “logit.latent.bug”

Copy from logit.R and paste into R Fit logistic regression in Bugs using the

latent-variable parameterization Compare the Bugs models, “logit.bug” and

“logit.latent.bug” Create an “outlier” in the data and fit

logistic regression again

126

Robust logistic regression in Bugs using the t distribution In RWinEdt

Continue looking at “logit.R” Other window: “logit.t.bug”, then

“logit.t.df.bug”, then “logit.t.df.adj.bug” Copy from logit.R and paste into R

Fit latent-variable model with t4 distribution instead of logistic

Fit latent-variable model with tdf distribution (df estimated from data)

Fix model to adjust for variance of t distribution

127

End logistic regression example

128

Troubleshooting: stripping the model down and rebuilding it

Remove interactions, predictors, hierarchical structures

Set hyperparameters to fixed values

Assign strong prior distributions to hyperparameters

129

Troubleshooting: expanding the model Computational problems often indicate

statistical problems Adding a variance component

Recall Poisson model for frisk data before and after we put in overdispersion

Adding a hierarchical structure Example from analysis of serial dilution data

Adding a predictor Example: including previous election

results in an analysis of state-level public opinion

130

Accounting for design of surveys, experiments, etc.

131

Relevance of data collection to statistical analysis General principle

Include all design info in the analysis Poststratification

State-level opinions from national polls

Other topics Experiments Cluster sampling Censored data

132

Begin pre-election polling example

133

State-level opinions from national polls Background on pre-election polls

Political science Survey methods Survey weighting

Classical analysis of a pre-election poll from 1988

Hierarchical models[Open R: election88 example]

134

Open R: election88 example Open RWinEdt

Open file “election88/election88_setup.R” In R:

setwd (“c:/bugs.course/election88”) source (“election88_setup.R”) Data are being loaded in and cleaned

135

Pre-election polls: demographic models Demographic models

Politically interesting Adjust for nonresponse bias

Bugs model Run from R, check convergence Get predictions and sum over states Compare to classical estimates Play with the model yourself

136

Election88 example in R(demographics only) Open RWinEdt

Open file “election88/demographics.R” Other window: “demographics.1.bug”, then

“demographics.2.bug” Copy from demographics.R and paste into R

window Set up and run Bugs model with debug=TRUE Close Bugs window Run Bugs for 100 iterations Run Bugs using model 2 Compute predicted values for each survey respondent Use survey weights to get a weighted avg for each state Compare to raw data and election outcomes

137

Pre-election polls: state models Hierarchical regressions for states

Like the 8-schools model but with binomial data model

Bugs model Run from R, check convergence Get predictions and sum over states

Model with states within regions Fit nested model, summarize results

Compare all the models fit so far

138

Election88 example in R(states only) Open RWinEdt

Open file “election88/states.R” Other window: “states.bug”

Copy from states.R and paste into R window Set up and run Bugs model Compute estimate for each state Compare to raw data, demographic

estimates, and election outcomes

139

Pre-election polls: models with demographics and states Crossed multilevel model with

demographic and state coefficients Set up model in Bugs Alternative parameterizations Compare all the models fit so far

We’ve used the computational strategy of “subsetting” or “data sampling”

140

Election88 example in R(demographics and states) Open RWinEdt

Open file “election88/all.1.R” Other window: “all.1.bug”

Copy from all.1.R and paste into the R window Set up and run Bugs model Read in Census data (demographics x state) Compute estimate for each state Compare to raw data, demographic

estimates, state estimates, and election outcomes

141

Playing with the pre-election polls

Including presvote as a state-level predictor

Putting Maryland in the northeast Other model alterations?

142

End pre-election polling example

143

Other design and data collection topics Cluster sampling

Similar to poststratification, but some clusters are empty (as with Alaska and Hawaii in the pre-election polls)

Experiments and observational studies Model should be conditional on all design

factors Blocking can be treated like clustering

Censored data Include in the likelihood (see chapter 18 of

Gelman and Hill, 2006)

144

Prior distributions

145

Prior distributions

Two traditional extremes: Noninformative priors Subjective priors

Problems with each approach New idea: weakly informative priors Illustration with a logistic regression

example

146

Logistic regression for voting Using National Election Study: For each year, predict vote choice

from sex, race, income Estimates can be unstable Bayes to the rescue Where does the prior distribution

come from? It really works!

147

Begin logistic-regression separation example

148

Logistic regression for voting

149

Separation in logit regression

In 1964, no blacks in the sample were voting for Goldwater

Maximum likelihood estimate is –infinity

It’s a sparse-data problem

150

Weakly informative priors for logistic regression coefficients Some prior info: logistic regression

coefs are almost always between -5 and 5: 5 on the logit scale takes you from .01

to .50 or from .50 to .99 Smoking and lung cancer

Independent Cauchy prior dists with center 0 and scale 2.5

Rescale each predictor so that a change of 1 unit is meaningful

bayesglm() function in R

151

Regularization in action!

152

End logistic-regression separation example

153

Weakly informative priors Scale predictors so that coefficients are

almost always less than 5 (and typically less than 1)

Now we know why dnorm (0, .0001) prior distributions in Bugs (that is, normal(0,1002)) were OK

Actually, dnorm (0, .01) (that is, normal(0,102)) is probably better

How much prior information to include???

154

Varying-intercept, varying-slope logistic regression

Red/blue states example [if there is time]

155

Commencement

156

Loose ends Slow convergence More complicated models Multivariate models, including imputation Huge datasets Suggested improvements in Bugs Indirect nature of Bugs modeling Programming in R [see Appendix C from

“Bayesian Data Analysis,” 2nd edition]

157

Some open problems in Bayesian data analysis

Complex, highly structured models (polling example) interactions between states and

demographic predictors, modeling changes over time Need reasonable classes of hierarchical models (with

“just enough” structure) Computation

Algorithms walk through high-dimensional parameter space

Key idea: link to computations of simpler models (“multiscale computation”, “simulated tempering”)

Model checking Automatic graphical displays Estimation of out-of-sample predictive error

158

Concluding discussion

What should you be able to do? Set up hierarchical models in Bugs Fit them and display/understand the

results using R Compare to estimates from simpler

models Use Bugs flexibly to explore models

What questions do you have?

159

Software resources Bugs

User manual (in Help menu) Examples volume 1 and 2 (in Help menu) Webpage (http://www.mrc-bsu.cam.ac.uk/bugs) has

pointers to many more examples and applications R

?command for quick help from the console Html help (in Help menu) has search function Complete manuals (in Help menu) Webpage (http://www.r-project.org) has pointers to more

Appendix C from “Bayesian Data Analysis,” 2nd edition, has more examples of Bugs and R programming for the 8-schools example

“Data Analysis Using Regression and Multilevel/Hierarchical Models” has lots of examples of Bugs and R.

160

References

General books on Bayesian data analysisBayesian Data Analysis, 2nd ed., Gelman, Carlin, Stern, Rubin

(2004)Bayes and Empirical Bayes Methods for Data Analysis, Carlin

and Louis (2000)

General books on multilevel modelingData Analysis Using Multilevel/Hierarchical Models, Gelman

and Hill (2007)Hierarchical Linear Models, Bryk and Raudenbush (2001)Multilevel Analysis, Snijders and Bosker (1999)

Books on RAn R and S Plus Companion to Applied Regression, Fox (2002)An Introduction to R, Venables and Smith (2002)

161

Collaborators

The examples were done in collaboration with Donald Rubin, Phillip Price, Jeffrey Fagan, Alex Kiss, Thomas Little, David Park, Joseph Bafumi, Boris Shor, Howard Wainer, Zaiying Huang, Ginger Chew, Aleks Jakulin, Masanao Yajima, Yu-Sung Su, Matt Salganik, and Tian Zheng

1 Bayesian data analysis 1 using Bugs 2 and R 3 1 Fit complicated models 2 clearly 3 display and...

Documents

Transcript of 1 Bayesian data analysis 1 using Bugs 2 and R 3 1 Fit complicated models 2 clearly 3 display and...