Page 1: An R and S-PLUS Companion to Applied Regression.homepage.stat.uiowa.edu/~rdecook/stat5201/handouts/... · An R and S-PLUS Companion to Applied Regression. Exercise trend over time

STAT:5201 Applied Statistics II

Random Coefficient Model (a.k.a. growth curve)

Adapted from John Fox's example in Appendix: Linear Mixed Models, found in An R and S-PLUS Companion to Applied Regression.

Exercise trend over time

The data consist of the exercise histories of 138 teenaged girls hospitalized for eating disorders and of a group of 93 control subjects. The variables are:

subject: a factor with subject id codes.

age: age in years.

exercise: hours per week of exercise.

group: a factor indicating whether the subject is a ‘patient’ or ‘control’.

> library(car)

> data(Blackmore)

> head(Blackmore)

subject age exercise group

1 100 8.00 2.71 patient

2 100 10.00 1.94 patient

3 100 12.00 2.36 patient

4 100 14.00 1.54 patient

5 100 15.92 8.63 patient

6 101 8.00 0.14 patient

> dim(Blackmore)

[1] 945 4

> length(unique(Blackmore$subject[Blackmore$group=="patient"]))

[1] 138

> length(unique(Blackmore$subject[Blackmore$group=="control"]))

[1] 93

Fox transformed the response variable for numerous reasons (described in the text) as log2(y + 5/60).

> Blackmore$log.exercise <- log(Blackmore$exercise + 5/60, 2)

> attach(Blackmore)
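The transformation and its inverse can be sketched in a few lines of R. This is only an illustration: the helper names to_log2 and from_log2 are hypothetical (they are not in the handout or in Fox's text), and the three exercise values are taken from rows of the Blackmore data shown in this handout.

```r
# Forward transform used in the handout: log base 2 of (hours + 5 minutes).
to_log2 <- function(hours) log(hours + 5/60, base = 2)

# Inverse transform (hypothetical helper, added for illustration only).
from_log2 <- function(z) 2^z - 5/60

hours <- c(0, 0.17, 0.89)   # exercise values that appear in the data
z     <- to_log2(hours)     # values on the log2 scale
back  <- from_log2(z)       # should recover the original hours

print(cbind(hours, z, back))
```

Adding 5/60 (five minutes) keeps the logarithm defined for subjects who report zero hours of exercise, which is why a value of 0 hours maps to log2(5/60) rather than to minus infinity.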

Investigating the data with plots (in R).

Use a random sample of 20 girls from each group for trend plotting. The ‘groupedData’ object from the nlme package is used to form the trellis plots.

> library(nlme)

> chosen.pat.IDs=sample(unique(subject[group=="patient"]), 20)

> chosen.pat.20=groupedData(log.exercise ~ age | subject,

data=Blackmore[is.element(subject,chosen.pat.IDs),])

> chosen.con.IDs=sample(unique(subject[group=="control"]), 20)

> chosen.con.20=groupedData(log.exercise ~ age | subject,

data=Blackmore[is.element(subject,chosen.con.IDs),])

> print(plot(chosen.con.20, main="Control Subjects",xlab="Age",ylab="log2 Exercise",

ylim=1.2*range(chosen.con.20$log.exercise, chosen.pat.20$log.exercise),

layout=c(5,4),aspect=1),position=c(0, 0, 0.5, 1), more=T)

> print(plot(chosen.pat.20, main="Patients",xlab="Age",ylab="log2 Exercise",

ylim=1.2*range(chosen.con.20$log.exercise, chosen.pat.20$log.exercise),

layout=c(5,4),aspect=1),position=c(0.5, 0, 1, 1))

[Figure: two trellis displays side by side, "Control Subjects" (left) and "Patients" (right). Each display has one panel per sampled subject (20 panels, labeled by subject ID), with Age on the horizontal axis and log2 Exercise on the vertical axis.]

The panels of a groupedData object are ordered automatically when plotted; by default nlme orders the subjects by the maximum of the response. The subjects with the highest exercise values are in the top row, and the subjects with the lowest exercise values are in the bottom row.

Investigating the data with plots (in SAS).

After creating subsets of the subjects from each group, called ‘control1’ and ‘patient1’ (8 subjects per group), I used PROC SGPANEL to plot the individual trajectories. Here I’ve asked for a linear regression line for each subject, but you can simply connect the observed points using the vline statement instead of the reg statement.

proc sgpanel data=control1;

title 'Control Subjects';

panelby subject/columns=4 rows=2;

reg x=age y=log_exercise;

rowaxis min=-4 max=4;

colaxis values=(8, 10, 12, 14, 16);

run;

proc sgpanel data=patient1;

<similar coding>

You can also plot the overlay of these individual lines using PROC SGPLOT...

proc sgplot data=control1;

title 'Subset of Control Subjects';

reg x=age y=log_exercise/group=subject;

run;

Investigating the subject-specific parameter estimates (in R).

Fox formally fits a linear regression to each subject (231 separately fit models) in order to investigate, from a graphical perspective, the variability of and correlation between the slope and intercept estimates. The predictor age is transformed to represent age after the start of the study, or age-8. He points out that the random coefficients model (fitted to all the data) fits a ‘unified’ model that treats the slopes and intercepts as random effects, and in that case the random effects u are estimated using BLUPs (best linear unbiased predictors). For a model with independent random subject effects (i.e. just a random intercept), the BLUPs are actually shrinkage estimators and fall between the individual observed values and the overall mean values. Formally, the BLUPs are computed as $u = G Z' \Sigma^{-1}(y - X\beta)$, where $\Sigma = \mathrm{var}(y) = Z G Z' + R$.
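To make the shrinkage claim concrete, here is a minimal R sketch for a single subject under a random-intercept-only model. The variance components and the residuals are made-up numbers (assumptions for illustration, not estimates from the Blackmore data). The sketch evaluates u = G Z' Sigma^{-1} (y - X beta) directly and compares it with the closed-form shrinkage factor n*sigma2_u / (n*sigma2_u + sigma2_e) applied to the subject's mean residual.

```r
# Hypothetical variance components (not estimated from the Blackmore data).
sigma2_u <- 2.0   # variance of the random intercept
sigma2_e <- 1.5   # residual variance

# Hypothetical residuals y - X beta for one subject with n observations.
r <- c(1.2, 0.4, 0.9, 1.5)
n <- length(r)

# Pieces of the general BLUP formula for this subject.
Z     <- matrix(1, nrow = n, ncol = 1)   # random-intercept design
G     <- matrix(sigma2_u, 1, 1)          # var(u)
R     <- sigma2_e * diag(n)              # var(residual errors)
Sigma <- Z %*% G %*% t(Z) + R            # var(y)

u_general <- drop(G %*% t(Z) %*% solve(Sigma, r))

# Closed form for a random intercept: shrink the mean residual toward 0.
shrink   <- n * sigma2_u / (n * sigma2_u + sigma2_e)
u_shrink <- shrink * mean(r)

print(c(general = u_general, shrinkage = u_shrink, raw_mean = mean(r)))
```

Because the shrinkage factor lies strictly between 0 and 1, the BLUP is always pulled from the subject's own mean residual toward zero, that is, toward the overall mean line. The pull is stronger when the subject has few observations or when the residual variance is large relative to the random-intercept variance.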

Before moving to a mixed model, we consider truly fitting a separate line to each subject (so not a random coefficients model). Again, the nlme package is utilized by employing the lmList function:

> pat.list=lmList(log.exercise ~ I(age - 8) | subject, subset = group=="patient",

data=Blackmore)

> con.list=lmList(log.exercise ~ I(age - 8) | subject, subset = group=="control",

data=Blackmore)

> pat.coef=coef(pat.list)

> con.coef=coef(con.list)

> par(mfrow=c(1,2))

> boxplot(pat.coef[,1], con.coef[,1], main="Intercepts", names=c("Patients","Controls"))

> boxplot(pat.coef[,2], con.coef[,2], main="Slope", names=c("Patients","Controls"))

[Figure: two sets of side-by-side boxplots for Patients and Controls. Left panel: "Intercepts". Right panel: "Slope".]

The intercept represents the level of exercise at the start of the study. As expected, there is a great deal of variation in both the intercepts and the slopes. The median intercepts are similar for patients and controls, but there is somewhat more variation among patients. The slopes are higher on average for patients than for controls, and the slopes tend to be positive (suggesting their exercise increases over time).

It makes sense to also plot the relationship between the estimated intercept and slope parameters. The dataEllipse function is in the car package.

> plot(c(-5,4),c(-1.2,1.2),xlab="intercept",ylab="slope",type="n",

main="(Individual) Estimates of slope and intercept")

> points(con.coef[,1],con.coef[,2],col=1)

> points(pat.coef[,1],pat.coef[,2],col=2)

> abline(v=0)

> abline(h=0)

> legend(-4.5,-.7,c("Controls","Patients"),col=c(1,2),pch=c(1,1))

> dataEllipse(con.coef[,1],con.coef[,2],levels=c(.5,.95),add=TRUE,

plot.points=FALSE,col=1)

> dataEllipse(pat.coef[,1],pat.coef[,2],levels=c(.5,.95),add=TRUE,

plot.points=FALSE,col=2)

[Figure: "(Individual) Estimates of slope and intercept". Scatterplot of slope (vertical axis) against intercept (horizontal axis) with 50% and 95% data ellipses for each group. Legend: Controls, Patients.]

Recall that the response is on the log scale, base 2, so y = 0 corresponds to roughly 1 hour of exercise a week (2^0 - 5/60 is about 0.92 hours; this is the scale for the intercept, but not the slope). It looks like the two groups have a reasonably similar correlation structure for the slope and intercept. It also looks like the patients have a shifted distribution such that they tend to have higher slopes.

Fitting the random coefficients model (in SAS)

This model allows for a random slope and intercept for each subject (which are allowed to be correlated). The population-level mean structure allows for separate lines for each treatment group (control and patient). The predictor age is transformed to represent age after the start of the study, or age-8.

data Blackmore; set Blackmore;

age_trans = age-8;

run;

proc mixed data=Blackmore covtest;

class subject group;

model log_exercise = group age_trans group*age_trans/solution ddfm=satterth;

random intercept age_trans/subject=subject type=un gcorr;

run;

The Mixed Procedure

                 Dimensions

Covariance Parameters           4
Columns in X                    6
Columns in Z Per Subject        2
Subjects                      231
Max Obs Per Subject             5

        Estimated G Correlation Matrix

Row   Effect      subject      Col1      Col2
  1   Intercept   100        1.0000   -0.2808
  2   age_trans   100       -0.2808    1.0000

            Covariance Parameter Estimates

                                  Standard       Z
Cov Parm   Subject    Estimate       Error   Value     Pr Z
UN(1,1)    subject      2.0839      0.2901    7.18   <.0001
UN(2,1)    subject    -0.06681     0.03698   -1.81   0.0708
UN(2,2)    subject     0.02716    0.007975    3.41   0.0003
Residual                1.5478     0.09743   15.89   <.0001

We see that the correlation between b0i and b1i is estimated to be negative (ρ = −0.2808) and marginally significant, with p = 0.0708 (from the test of the covariance parameter UN(2,1)).

                   Solution for Fixed Effects

                                         Standard
Effect            group      Estimate       Error     DF   t Value   Pr > |t|
Intercept                     -0.6300      0.1487    230     -4.24     <.0001
group             control      0.3540      0.2353    234      1.50     0.1338
group             patient           0           .      .         .          .
age_trans                      0.3039     0.02386    196     12.73     <.0001
age_trans*group   control     -0.2399     0.03941    221     -6.09     <.0001
age_trans*group   patient           0           .      .         .          .

          Type 3 Tests of Fixed Effects

                   Num    Den
Effect              DF     DF   F Value   Pr > F
group                1    234      2.26   0.1338
age_trans            1    221     87.16   <.0001
age_trans*group      1    221     37.05   <.0001

The groups do not have significantly different intercepts (average exercise values at start of study, at age 8), but they do have significantly different slopes, with the patient group having a higher slope than the control group.
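The group-specific population lines implied by the fixed-effects table can be assembled by hand. The R sketch below copies the four estimates from the SAS output; the variable names and the grid of ages are assumptions made for illustration. Patients are the reference level, so the control intercept and slope are obtained by adding the control offsets.

```r
# Fixed-effect estimates copied from the SAS "Solution for Fixed Effects" table.
b_int      <- -0.6300   # intercept for patients (age 8)
b_control  <-  0.3540   # intercept offset for controls
b_age      <-  0.3039   # slope per year for patients
b_age_ctrl <- -0.2399   # slope offset for controls

group_lines <- data.frame(
  group     = c("patient", "control"),
  intercept = c(b_int, b_int + b_control),
  slope     = c(b_age, b_age + b_age_ctrl),
  stringsAsFactors = FALSE
)

# Predicted log2 exercise on a hypothetical grid of ages.
ages <- seq(8, 16, by = 2)
pred_log2 <- sapply(seq_len(nrow(group_lines)), function(i)
  group_lines$intercept[i] + group_lines$slope[i] * (ages - 8))
colnames(pred_log2) <- group_lines$group
rownames(pred_log2) <- ages

# Back-transform to hours per week by inverting log2(y + 5/60).
pred_hours <- 2^pred_log2 - 5/60

print(group_lines)
print(round(pred_hours, 2))
```

On the log2 scale the control line has intercept -0.2760 and slope 0.0640. Since the response is log2(exercise + 5/60), a slope of 0.3039 for patients corresponds to multiplying (exercise + 5/60) by about 1.23 per year, versus about 1.05 per year for controls.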

I can capture the estimated BLUPs, or $u = G Z' \Sigma^{-1}(y - X\beta)$ where $\Sigma = \mathrm{var}(y) = Z G Z' + R$, using the ODS output and the solution option in the random statement:

ods output SolutionR=blups;

proc mixed data=Blackmore covtest;

class subject group;

model log_exercise = group age_trans group*age_trans/ddfm=satterth;

random intercept age_trans/subject=subject type=un gcorr solution;

run; /* Solution for the random effects are BLUPs*/

ods output close;

proc print data=blups (obs=10);

run;

                                        StdErr
Obs   Effect      subject   Estimate      Pred     DF   tValue    Probt
  1   Intercept   100         1.0095    0.7092    235     1.42   0.1560
  2   age_trans   100       -0.05272    0.1261   69.8    -0.42   0.6771
  3   Intercept   101        -2.1614    0.7094    256    -3.05   0.0026
  4   age_trans   101        0.01287    0.1221   79.5     0.11   0.9163
  5   Intercept   102         0.9339    0.7161    266     1.30   0.1933
  6   age_trans   102         0.1258    0.1353   53.1     0.93   0.3567
  7   Intercept   103         0.9283    0.7101    250     1.31   0.1923
  8   age_trans   103        0.02691    0.1413   44.5     0.19   0.8498
  9   Intercept   104         1.1407    0.7177    273     1.59   0.1131
 10   age_trans   104       -0.03742    0.1332   56.7    -0.28   0.7798

proc export data=blups

outfile="blups.csv"

dbms=csv replace;

run;

Below I’ve plotted the estimated BLUPs for the random slopes against the estimated slopes from the separately fit regression lines (both in absolute value).

[Figure: "Absolute values of slopes". Scatterplot of |BLUP of slope| (vertical axis) against |separately fit slope (individual regression)| (horizontal axis).]

Fitting the random coefficients model (in R)

Using the lme function in the nlme package, we see the same estimates for the covariance parameters:

> lme.1=lme(log.exercise~I(age-8)*group, random=~I(age-8)|subject, data=Blackmore)

> summary(lme.1)

Linear mixed-effects model fit by REML
 Data: Blackmore

Random effects:
 Formula: ~I(age - 8) | subject
 Structure: General positive-definite, Log-Cholesky parametrization
            StdDev    Corr
(Intercept) 1.4435580 (Intr)
I(age - 8)  0.1647954 -0.281
Residual    1.2440951

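$$\mathrm{Var}(\text{Intercept}) = 1.4435580^2 = 2.084, \qquad \mathrm{Var}(\text{slope}) = 0.1647954^2 = 0.027$$

$$\mathrm{Corr}(\text{Intercept}, \text{slope}) = \frac{-0.06682}{0.1647954 \times 1.4435580} = -0.281$$

$$\mathrm{Var}(\text{Residual}) = 1.2440951^2 = 1.548$$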
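The same comparison can be run in the other direction, starting from the SAS covariance parameter estimates and converting them to the standard deviations and correlation that lme reports. This is a small sketch using base R only; the matrix G is typed in by hand from the SAS table, and its row and column labels are added here for readability.

```r
# G matrix assembled from the SAS estimates UN(1,1), UN(2,1), UN(2,2).
G <- matrix(c( 2.0839,  -0.06681,
              -0.06681,  0.02716),
            nrow = 2, byrow = TRUE,
            dimnames = list(c("intercept", "slope"),
                            c("intercept", "slope")))
resid_var <- 1.5478                      # SAS "Residual" estimate

sd_random <- sqrt(diag(G))               # compare with the lme StdDev column
rho       <- cov2cor(G)[2, 1]            # compare with the lme Corr entry
sd_resid  <- sqrt(resid_var)             # compare with the lme Residual StdDev

print(round(c(sd_random, corr = rho, residual = sd_resid), 4))
```

Up to rounding of the printed SAS estimates, these values agree with the StdDev and Corr entries in the summary(lme.1) output and with the gcorr matrix from PROC MIXED.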

Reference: Pinheiro, J. C. & D. M. Bates. 2000. Mixed-Effects Models in S and S-PLUS. New York: Springer.

SAS Hint when using PROC IMPORT (Error in trying to import data)

proc import out=Blackmore

datafile="Blackmore.csv"

dbms=csv

replace;

run;

LOG BOX SHOWS ERROR READING IN DATA...

NOTE: Invalid data for subject in line 442 1-4.

RULE: ----+----1----+----2----+----3----+----4----+----5----+----6----+

442 207a,8,0.17,control,-1.980891177 32

subject=. age=8 exercise=0.17 group=control log_exercise=-1.980891177

_ERROR_=1 _N_=441

What’s happening in the data at line 442?

> Blackmore[438:443,]

subject   age exercise   group log.exercise
    206  8.00     0.89 control  -0.03899413
    206 10.00     0.89 control  -0.03899413
    206 12.08     1.03 control   0.15488560
   207a  8.00     0.17 control  -1.98089118   <--- The subject ID is suddenly
   207a 10.00     0.00 control  -3.58496250        no longer numeric
   207a 14.00     1.15 control   0.30256277

By default, PROC IMPORT uses a certain number of observations in each column to decide whether the variable is numeric or categorical. Here, by the time subject 207a is read in, SAS had decided the column is numeric, and then decides a value of ‘207a’ (character) is not an appropriate entry. Thus, it considered it incorrect and called it missing. The solution is to force it to look farther down the column to make this decision using ‘guessingrows=X’ for some large X.

proc import out=Blackmore

datafile="Blackmore.csv"

dbms=csv

replace;

guessingrows=1000;

run;

NOTE: WORK.Blackmore data set was successfully created.

NOTE: The data set WORK.Blackmore has 945 observations and 5 variables
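Before exporting a data set from R for use in SAS, it can help to check for ID codes that are not purely numeric, since those are the values that trigger the import problem described above. The following is a minimal R sketch on a hypothetical vector of subject codes that mimics the listing above (it is not the full Blackmore data).

```r
# Hypothetical subject codes mimicking the listing above.
ids <- c("206", "206", "206", "207a", "207a", "207a")

# Codes containing any non-digit character cannot be read as numeric.
is_bad      <- grepl("[^0-9]", ids)
non_numeric <- unique(ids[is_bad])

# Position of the first offending row; the rows scanned by PROC IMPORT
# must reach at least this far for the column to be typed as character.
first_bad <- which(is_bad)[1]

print(non_numeric)
print(first_bad)
```

With the real data, the same check could be applied to as.character(Blackmore$subject) to see how large guessingrows needs to be.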
