Mixed Models - WordPress.com · Mixed Models See the book Mixed Effects Models and Extensions in...

Mixed Models

See the book Mixed Effects Models and Extensions in Ecology by A.F. Zuur et al.

Steps in Analyzing Data

• Data Exploration

– Examine your data for outliers

  • Extreme outliers can sometimes be removed from the analysis

  • A transformation may reduce the impact of outliers

– Test for collinearity

  • Plot explanatory variables against each other

  • Calculate correlation coefficients

  • If two explanatory variables are very highly correlated (|r| > 0.8 or so), then you may want to remove one

  • In general, don’t use an explanatory variable that is a combination of other explanatory variables (e.g., tail length, snout-vent length and total length)
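The collinearity checks above can be sketched in R roughly as follows. This is only an illustration on simulated data: the variable names (svl, tail_len, total, mass) and the simulated values are assumptions, not data from the course. The last step shows what happens when one explanatory variable is an exact combination of others: lm() cannot estimate its coefficient and reports NA.

```r
# Simulated morphometric data; all variable names and values are hypothetical
set.seed(1)
n <- 50
svl      <- rnorm(n, mean = 60, sd = 5)   # snout-vent length
tail_len <- rnorm(n, mean = 40, sd = 4)   # tail length
total    <- svl + tail_len                # exact combination of the other two
mass     <- 0.2 * svl + rnorm(n)          # response variable
dat <- data.frame(svl, tail_len, total)

# Plot explanatory variables against each other
pairs(dat)

# Correlation coefficients between explanatory variables
r <- cor(dat)
print(round(r, 2))

# List the pairs whose absolute correlation exceeds 0.8 (may be empty)
high <- which(abs(r) > 0.8 & upper.tri(r), arr.ind = TRUE)
flagged <- data.frame(var1 = rownames(r)[high[, "row"]],
                      var2 = colnames(r)[high[, "col"]],
                      r    = r[high])
print(flagged)

# Including a variable that is a combination of the others:
# the model is rank deficient, so the last coefficient is NA
fit <- lm(mass ~ svl + tail_len + total)
print(coef(fit))
```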

Develop a Modeling Philosophy

• Seven possibilities according to Zuur et al.:

– Start with a model with no interactions – if there are patterns in the residuals, investigate why and add interaction terms to improve the model fit

– Use biological knowledge to choose interaction terms to include

– Apply data exploration to see which interactions might be important

– Identify the explanatory variables of most interest and include the interaction terms for these variables

– Only include the main terms and two-way interaction terms

– Only include higher order interactions (three-way and higher) if you have a good reason

– Include all interactions by default

*If you include interaction terms, you must include the main terms

Model Selection

• Not all explanatory variables and interactions will be significant. What do I do with the non-significant ones?

– Keep them all (Whitlock and Schluter; good for simple models)

– Drop them one by one based on hypothesis testing procedures (drop the least significant term, use anova to compare models)

– Drop them one by one and use a model selection criterion like AIC or BIC to choose the best model

– Specify a priori chosen models and compare these models with each other
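As a rough sketch of the "drop them one by one" options, the example below fits a linear model with an interaction to simulated data (the variables x1, x2 and y are made up for illustration), tests the droppable term with drop1(), and then compares the full and reduced models with anova() and AIC().

```r
# Simulated data: x2 and the interaction have no real effect (hypothetical example)
set.seed(2)
n  <- 100
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 1 + 2 * x1 + rnorm(n)

# Full model with main terms and the two-way interaction
full <- lm(y ~ x1 * x2)

# F-test for each term that can be dropped; with an interaction in the
# model, only the interaction is a candidate (main terms must stay)
d1 <- drop1(full, test = "F")
print(d1)

# Refit without the interaction and compare the two nested models
reduced <- update(full, . ~ . - x1:x2)
cmp <- anova(reduced, full)
print(cmp)

# The same comparison using a model selection criterion
print(AIC(full, reduced))
```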

AIC and BIC

• Akaike Information Criterion (AIC)

• Bayesian Information Criterion (BIC)

• These techniques are aimed at choosing the best model, even when models vary with respect to the number of parameters

• The AIC is more widely used, and we will use it exclusively in this course

• There are alternatives to the AIC, one of which is BIC but there are many others

Akaike Information Criterion (AIC)

• Derived from information theory

• AIC = -2*log(L) + 2K

• L is the likelihood, which equals the probability of the data given the model – this term will be related to how well the model fits

• K is the number of parameters

• AIC is -2 times the log likelihood plus a penalty of 2 per parameter (the penalty is needed because adding parameters allows a better fit, all else being equal)

• AIC is useful for comparing different models fitted to the same data, and the model with the smallest AIC is preferred
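To make the formula concrete, the following sketch computes AIC = -2*log(L) + 2K by hand for a simple linear regression on simulated data (the data are invented for illustration) and checks it against R's built-in AIC(). Note that for lm() the parameter count K includes the residual variance as well as the regression coefficients.

```r
# Simulated data for a simple linear regression (hypothetical values)
set.seed(3)
x <- runif(40)
y <- 2 + 3 * x + rnorm(40, sd = 0.5)
fit <- lm(y ~ x)

# Log likelihood and number of estimated parameters
ll <- logLik(fit)
K  <- attr(ll, "df")   # intercept, slope, and residual variance

# AIC from the formula, compared with the built-in function
aic_manual <- -2 * as.numeric(ll) + 2 * K
print(c(manual = aic_manual, builtin = AIC(fit)))
```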

Model Validation

• After you choose and fit your model, check that it fits correctly. For a linear regression:

– Plot residuals against fitted values to assess homogeneity

– Examine the histogram of residuals to check for normality

– Plot residuals against each explanatory variable – there should be no obvious patterns

– Plot residuals against explanatory variables you did not include in the model – if you see a pattern, then you may want to consider including this variable

– Look for unduly influential data points (outliers) and see how much they are influencing the results (by running the model with and without them)
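A minimal sketch of these validation steps for a linear regression, using simulated data. The variables x, z and y are hypothetical; z plays the role of an explanatory variable that was left out of the model. Cook's distance is used here as one possible way to pick out the most influential point; the slides do not prescribe a particular measure.

```r
# Simulated data; z is deliberately not included in the model
set.seed(4)
n <- 80
x <- runif(n, 0, 10)
z <- runif(n, 0, 10)
y <- 1 + 0.5 * x + rnorm(n)
fit <- lm(y ~ x)
res <- resid(fit)

op <- par(mfrow = c(2, 2))
# Residuals vs. fitted values: look for changes in spread (homogeneity)
plot(fitted(fit), res, xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)
# Histogram of residuals: look for departures from normality
hist(res, main = "Histogram of residuals", xlab = "Residuals")
# Residuals vs. an included and an excluded explanatory variable
plot(x, res, xlab = "x (in model)", ylab = "Residuals")
plot(z, res, xlab = "z (not in model)", ylab = "Residuals")
par(op)

# Refit without the most influential point and compare the coefficients
worst <- which.max(cooks.distance(fit))
fit_without <- lm(y ~ x, subset = -worst)
print(rbind(all_data = coef(fit), without_point = coef(fit_without)))
```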

Some Rules of Thumb Moving into Generalized Models

• Generalized models are designed to overcome some of the shortcomings of linear models – if your data do not have these shortcomings, you can stick with linear models

• Always try to use the simplest model that adequately fits the data in light of the biological question – more complex models will become more and more difficult to interpret

• A problem may have more than one “correct” solution

• We will be introduced to these models, but we will not be able to cover every possible approach in great detail

Problem – My Data Are Not Linear

[Figure: scatterplot of Bioluminescence (0 to 50) against Depth (1000 to 5000)]

Generalized Additive Models

• Linear Model:

$Y_i = \alpha + \beta X_i + \varepsilon_i$

• Generalized Additive Model:

$Y_i = \alpha + f(X_i) + \varepsilon_i$

$f(X_i)$ is a smoothing curve estimated by a LOESS (local regression) smoother or splines (piecewise polynomial functions), depending on the package
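The difference between the two model forms can be illustrated on simulated data. In this sketch the response follows a sine curve (an arbitrary choice for illustration), so a straight line fits poorly while a spline smoother from the mgcv package can follow the curve. The variable names and the simulated relationship are assumptions.

```r
library(mgcv)

# Simulated non-linear relationship (hypothetical data)
set.seed(5)
n <- 200
x <- runif(n, 0, 1)
y <- sin(2 * pi * x) + rnorm(n, sd = 0.3)

# Linear model: Y = alpha + beta * X + error
lin <- lm(y ~ x)

# Additive model: Y = alpha + f(X) + error, f estimated as a cubic regression spline
sm <- gam(y ~ s(x, bs = "cr"))

# Effective degrees of freedom used by the smoother
print(summary(sm)$edf)

# Compare the two fits
print(AIC(lin, sm))
```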

Fitting a GAM

[Figure: three panels plotting Bioluminescence against Depth (1000 to 5000)]

Code to Fit GAM

setwd("~/Rexamples/Week11")
ISIT <- read.csv("ISIT.csv")
op <- par(mfrow = c(2, 2), mar = c(5, 4, 1, 2))
# Keep only the observations from station 16
Sources16 <- ISIT$Sources[ISIT$Station == 16]
Depth16 <- ISIT$SampleDepth[ISIT$Station == 16]
plot(Depth16, Sources16, type = "p", xlab = "Depth", ylab = "Bioluminescence")
library(mgcv)
M3 <- gam(Sources16 ~ s(Depth16, fx = FALSE, k = -1, bs = "cr"))
plot(M3, se = TRUE, xlab = "Depth", ylab = "Bioluminescence")

# s() means to use a smoother
# fx=FALSE fits a penalized smoother, so the amount of smoothing is chosen by cross-validation
# k=-1 uses the default basis dimension for the smoother
# bs="cr" tells it to use a cubic regression spline

GAM graphs

[Figure: three panels plotting Bioluminescence against Depth (1000 to 5000)]

Graph with Error Bars

[Figure: three panels plotting Bioluminescence against Depth (1000 to 5000)]

M3pred <- predict(M3, se.fit = TRUE, type = "response")
plot(Depth16, Sources16, type = "p", xlab = "Depth", ylab = "Bioluminescence")
I1 <- order(Depth16)
lines(Depth16[I1], M3pred$fit[I1], lty = 1)
lines(Depth16[I1], M3pred$fit[I1] + 2 * M3pred$se.fit[I1], lty = 2)
lines(Depth16[I1], M3pred$fit[I1] - 2 * M3pred$se.fit[I1], lty = 2)

predict() produces predicted values from the model

I1 <- order() is used just to order the observations from smallest to largest so the line doesn’t zig-zag all over

From the predictions: fit gives the expected y for a given x, and se.fit gives the standard error for a given x. Here 2*se.fit on either side of the fit is used as an approximate 95% CI
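An alternative to reordering the observations is to predict on a regular grid of x values through the newdata argument of predict(). The sketch below does this on simulated data; the variable names depth and biolum and the simulated decay curve are assumptions, since the ISIT data are not reproduced here.

```r
library(mgcv)

# Simulated depth profile (hypothetical data, depths in arbitrary order)
set.seed(6)
depth  <- runif(60, 500, 5000)
biolum <- 40 * exp(-depth / 1500) + rnorm(60, sd = 2)
fit <- gam(biolum ~ s(depth, bs = "cr"))

# Predict on an evenly spaced, already sorted grid of depths
grid <- data.frame(depth = seq(min(depth), max(depth), length.out = 100))
pr <- predict(fit, newdata = grid, se.fit = TRUE, type = "response")

# Approximate 95% pointwise band: fit plus or minus 2 standard errors
upper <- pr$fit + 2 * pr$se.fit
lower <- pr$fit - 2 * pr$se.fit

plot(depth, biolum, xlab = "Depth", ylab = "Bioluminescence")
lines(grid$depth, pr$fit, lty = 1)
lines(grid$depth, upper, lty = 2)
lines(grid$depth, lower, lty = 2)
```

Because the grid is sorted, no call to order() is needed, and the curve is drawn at evenly spaced depths rather than only where observations happen to fall.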

Using GAM for Hypothesis Testing

• Multiple smoothers can be included in the same model

• Hybrid models, which combine smoothers with linear or categorical explanatory variables, can also be fitted

Hybrid GAM Example

• Example: bioluminescence data from two different locations

• Data: A measure of bioluminescence as a function of depth from each of two places

• The null hypothesis is that the relationship between depth and bioluminescence is the same in both places

Bioluminescence Example

[Figure: two panels of Sources (0 to 40) against Depth (500 to 3000), one for Station 8 and one for Station 13]

GAM Model

• The model:

– $\text{Sources}_i = \alpha + f(\text{Depth}_i) + \text{factor}(\text{Station}_i) + \varepsilon_i$

• Depth is fit as a smoothed function

• Station is fit as a factor

• The error is normally distributed, $\varepsilon_i \sim N(0, \sigma^2)$

GAM Code

library(mgcv)
M4 <- gam(So ~ s(De) + factor(ID), subset = I1)
summary(M4)
anova(M4)

Using a smoother for Depth (De)

ID (which is the name of the station/location) is a factor

summary() output

Family: gaussian
Link function: identity

Formula:
So ~ s(De) + factor(ID)

Parametric coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)    19.198      1.054  18.207  < 2e-16 ***
factor(ID)13  -12.296      1.397  -8.801 7.59e-13 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Approximate significance of smooth terms:
        edf Ref.df     F  p-value
s(De) 4.849  5.904 14.77 7.08e-12 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

R-sq.(adj) = 0.695   Deviance explained = 71.9%
GCV = 38.802   Scale est. = 35.259   n = 75

Factor is significant, but better to test this with anova()

Smoother is significant, and the effective degrees of freedom (edf) provide an indication of how much smoothing was imposed (an edf of 1 corresponds to a straight line)

About 72% of the variation is explained by the model

anova() output

Family: gaussian
Link function: identity

Formula:
So ~ s(De) + factor(ID)

Parametric Terms:
           df     F  p-value
factor(ID)  1 77.46 7.59e-13

Approximate significance of smooth terms:
        edf Ref.df     F  p-value
s(De) 4.849  5.904 14.77 7.08e-12

Here the output is the same as summary because the factor only has two levels. If the factor had more than two levels, anova() would test all levels simultaneously and give an overall p-value.
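The point about factors with more than two levels can be seen in a small simulated example. In the sketch below, the three stations "A", "B" and "C", the variable names and the simulated effects are all invented for illustration. summary() reports one row per coefficient (the intercept plus a contrast for each non-reference level), while anova() reports a single overall test for the factor.

```r
library(mgcv)

# Simulated data with a three-level factor (hypothetical stations A, B, C)
set.seed(7)
n <- 150
station <- factor(sample(c("A", "B", "C"), n, replace = TRUE))
depth   <- runif(n, 500, 3000)
shift   <- c(A = 0, B = -5, C = 3)[as.character(station)]
src     <- 30 * exp(-depth / 1000) + shift + rnorm(n, sd = 2)

m <- gam(src ~ s(depth) + station)

# One row per coefficient: intercept, stationB, stationC
ptab <- summary(m)$p.table
print(ptab)

# One row for the factor as a whole, tested on 2 degrees of freedom
atab <- anova(m)$pTerms.table
print(atab)
```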

Visualize the Results

[Figure: vis.gam() perspective plot of the linear predictor against De and ID]

> par(mar = c(2, 2, 2, 2))
> vis.gam(M4, theta = 120, color = "heat")

Note that the lines are parallel

Other Validation Steps

[Figure: four gam.check() panels: deviance residuals against theoretical quantiles; residuals vs. linear predictor; histogram of residuals; response vs. fitted values]

> gam.check(M4)

Left column: Checks for normality of residuals

Upper Right: Test for heterogeneity (heteroscedasticity)

Lower Right: Ideally should be a straight line

[Figures: vis.gam() plot of the linear predictor against De and ID; Sources against Depth (500 to 3000) for Station 8 and Station 13]

Interaction

• Add an interaction term by adding a second smoother for a subset of the data (say, only station 13)

• This second smoother will be compared with the smoother from the overall dataset

> M5 <- gam(So ~ s(De) + s(De, by = as.numeric(ID==13)) + factor(ID), subset = I1)

> anova(M5)

Interaction

Family: gaussian
Link function: identity

Formula:
So ~ s(De) + s(De, by = as.numeric(ID == 13)) + factor(ID)

Parametric Terms:
           df     F p-value
factor(ID)  1 2.374   0.129

Approximate significance of smooth terms:
                             edf Ref.df      F p-value
s(De)                      8.073  8.608 101.88  <2e-16
s(De):as.numeric(ID == 13) 7.196  8.163  52.93  <2e-16

The interaction is significant, so the relationship between depth and bioluminescence is different between these two stations.

gam.check(M5)

[Figure: four gam.check() panels for M5: residuals against theoretical quantiles; residuals vs. linear predictor; histogram of residuals; response vs. fitted values]

Which Model is Better?

• Examine model validation plots from gam.check(): M5 better

• summary() shows that the second model explains 96.8% of the deviance: M5 better

• Is the interaction significant? Yes: M5 better

• Formally compare the models using AIC

• Use an F-test to compare the models (if they are simple nested models)

Obtaining and Comparing AIC

• AIC(M4)

488.56

• AIC(M5)

345.26

• The model with the interaction term has a much smaller AIC, so it’s the preferred model

Using an F-test

• > anova(M4, M5, test = "F")

Analysis of Deviance Table

Model 1: So ~ s(De) + factor(ID)
Model 2: So ~ s(De) + s(De, by = as.numeric(ID == 13)) + factor(ID)
  Resid. Df Resid. Dev     Df Deviance      F    Pr(>F)
1    68.151    2402.90
2    58.231     272.94 9.9198     2130 45.809 < 2.2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
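The same kind of comparison can be tried on simulated data. Note that this sketch is not the model from the slides: instead of a numeric indicator in by=, it uses a factor in by=, which asks mgcv for a separate smoother per station. The two simulated stations, their curve shapes and all variable names are assumptions made for illustration.

```r
library(mgcv)

# Two simulated stations with differently shaped depth profiles (hypothetical)
set.seed(8)
n   <- 120
fid <- factor(rep(c("8", "13"), each = n / 2))
de  <- runif(n, 500, 3000)
mu  <- ifelse(fid == "8",
              40 * exp(-de / 800),                   # decaying profile
              15 * exp(-((de - 1500) / 500)^2))      # mid-depth peak
so  <- mu + rnorm(n, sd = 2)

# Common smoother for both stations, plus a station effect
m4 <- gam(so ~ s(de) + fid)

# Separate smoother for each station (factor 'by' variable), plus a station effect
m5 <- gam(so ~ s(de, by = fid) + fid)

# Smaller AIC is preferred
print(AIC(m4, m5))

# Approximate F-test comparing the two models
cmp <- anova(m4, m5, test = "F")
print(cmp)
```

With a factor by variable the smoothers are centered, which is why the station factor is also kept as a parametric term in both models.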

Summary

• GAMs can overcome a major limitation of linear models by fitting a non-linear function to the relationship between an explanatory and response variable

• The function is fit by using a smoothing algorithm. Many such algorithms exist, and we have used splines here. These fit piecewise polynomial functions and join them together smoothly

• The p-value is approximate for smoothing splines, so if the p-value is in the range of 0.01-0.10, it should be interpreted with caution.

• Some of the problems for linear regression are also problems for GAMs. The most important problems are non-independence of observations, heterogeneity, and nested data (so technically, the example is not entirely appropriately analyzed here).
