Generalized Linear Mixed Models (GLMM)
Ulrich Halekoh
Unit of Statistics and Decision Analysis
Faculty of Agricultural Sciences, University of Aarhus
April 3, 2007
Printed: April 3, 2007 File: lmer.tex
Contents

1 Examples for correlated data
  1.1 Bivariate measurements
  1.2 Repeated measurements - sub-sampling
      1.2.1 Compound symmetry
  1.3 Random coefficient models
2 Repeated measurements and random coefficient models
3 Two stage model formulation
4 Model fitting and estimation
5 Model comparison
6 Prediction and residuals
7 Fixed effect vs. random effect
8 Generalized linear mixed models
  8.1 Working example – respiratory illness
9 Correlated Pearson residuals
  9.1 Generalized linear model formulation
  9.2 Model fit and estimation
10 Model comparison
11 Prediction and residuals
12 Covariances and correlation*
  12.1 Rules for computing covariances
  12.2 Covariances of random vectors
      12.2.1 The covariance matrix
      12.2.2 From the covariance to the correlation matrix
13 Random coefficients and the positioning of the random intercept*
14 Additional R code
  14.1 Plot of Fig. 2
  14.2 Plot of Fig. 3
  14.3 Plot of Fig. 4
  14.4 Correlation matrix from Pearson residuals for Table 4
1 Examples for correlated data
In this section we look at some reasons why data are correlated and the
consequences for the variance of estimates based on correlated data.
1.1 Bivariate measurements
Example 1.1 Fig. 1 shows the measurements of the heights of fathers
against the heights of their sons.
data(fatherson, package = "dataRep")
plot(father ~ son, data = fatherson, xlab = "height son [inch]",
ylab = "height father [inch]")
[Scatter plot omitted; x-axis: height son [inch] (60–75), y-axis: height father [inch] (60–75).]
Figure 1: Height of fathers and sons.
The height of the son provides information about the height of the father.
The two measurements are not independent; they are said to be correlated. □
The example is a case where an observation $Y$ is a bivariate vector of two
measurement variables, $Y = (Y_1, Y_2)$. In the example $Y_1$ is the height of the
father and $Y_2$ the height of the son. Often one can assume that these two
variables are normally distributed, with means $\mu_k$ and variances $\sigma_k^2$:
$$Y_k \sim N(\mu_k, \sigma_k^2), \quad k = 1, 2.$$
The dependence between the two variables is expressed by their covariance
$\mathrm{Cov}(Y_1, Y_2) = \sigma_{12}$. The joint distribution of the two variables is a
bivariate normal distribution with mean vector $\mu = (\mu_1, \mu_2)$ and covariance
matrix $\Sigma$:
$$Y \sim N(\mu, \Sigma),$$
where
$$\Sigma = \begin{pmatrix} \sigma_1^2 & \sigma_{12} \\ \sigma_{21} & \sigma_2^2 \end{pmatrix}.$$
This matrix is symmetric because $\sigma_{12} = \sigma_{21}$. Dividing the covariance by the
square root of the product of the variances one obtains the correlation between
the two variables:
$$\mathrm{Corr}(Y_1, Y_2) = \frac{\mathrm{Cov}(Y_1, Y_2)}{\sqrt{\mathrm{Var}(Y_1)\,\mathrm{Var}(Y_2)}} = \frac{\sigma_{12}}{\sigma_1 \sigma_2}.$$
The correlation always lies in the interval $[-1, 1]$.
The correlation can be estimated by Pearson's correlation coefficient
$$r = \frac{\sum_i (y_{i1} - \bar{y}_{\cdot 1})(y_{i2} - \bar{y}_{\cdot 2})}{\sqrt{\sum_i (y_{i1} - \bar{y}_{\cdot 1})^2 \cdot \sum_i (y_{i2} - \bar{y}_{\cdot 2})^2}}.$$
Example 1.2 The correlation between the height measurements is
with(fatherson, cor(father, son))
[1] 0.501338 □
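As a cross-check of the formula above, a minimal sketch computing $r$ by hand from the sums (assuming the fatherson data frame loaded above):

# hand computation of Pearson's r for the father-son heights
x <- fatherson$father
y <- fatherson$son
sum((x - mean(x)) * (y - mean(y)))/sqrt(sum((x - mean(x))^2) *
    sum((y - mean(y))^2))    # should agree with cor(father, son)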
1.2 Repeated measurements - sub-sampling
Example 1.3 In an experiment the mean calcium content of the leaves of a
plant was to be determined. A random sample of four leaves was obtained, and
for each leaf four independent chemical analyses were made. The measurements
are given in Fig. 2 (Rcode 14.1).
[Dot plot omitted; x-axis: calcium (2.8–3.4), y-axis: leaf (1–4).]
Figure 2: Calcium content of leaves.
Interest lies in the following questions:
1. The typical calcium content of a leaf.
2. The variation of the calcium content between leaves (due, for example, to
different leaf sizes) (the leaf-to-leaf or between-leaf variability).
3. The measurement uncertainty of the chemical analysis (the within-leaf
variability).
There is an obvious leaf-to-leaf variability (the measurements of leaf 3 are
lower than all the others). The within-leaf variability seems to be equal
from leaf to leaf. □
The observations on the same leaf are correlated. An indication of this is given
by the empirical correlation between the observations.
The data are given in 'long' format, where each observation is given in one
row.
data(leafcalcium)
head(leafcalcium)
leaf nr ca
1 1 1 3.28
2 1 2 3.09
3 1 3 3.03
4 1 4 3.03
5 2 1 3.52
6 2 2 3.48
We collect the measurements for one leaf in one row by reshaping the data:
w <- reshape(leafcalcium, direction = "wide", v.name = "ca",
timevar = "nr", idvar = "leaf")
leaf ca.1 ca.2 ca.3 ca.4
1 1 3.28 3.09 3.03 3.03
5 2 3.52 3.48 3.38 3.38
9 3 2.88 2.80 2.81 2.76
13 4 3.34 3.38 3.23 3.26
The following table shows the correlation between the four measurements. It is
calculated as the Pearson correlation between the first through fourth
measurements. Note that any other ordering of the measurements per leaf would
be equally possible, so there is some arbitrariness in this calculation of the
correlation. The correlation between the measurements on a leaf is a result of
the common random individual level of the leaf for the four measurements per
leaf. We will show this by analyzing a model for these data.
cor(w[, -1])
ca.1 ca.2 ca.3 ca.4
ca.1 1.000000 0.952861 0.959217 0.964239
ca.2 0.952861 1.000000 0.993087 0.998671
ca.3 0.959217 0.993087 1.000000 0.996860
ca.4 0.964239 0.998671 0.996860 1.000000
We formulate a model for the data that will reflect the sources of variation.
We make the following definitions
• $y_{ij}$ denotes measurement $j = 1, \ldots, 4$ for leaf $i = 1, \ldots, 4$.
• $\beta$ represents the typical value of the calcium content.
• $L_i \sim N(0, \sigma_L^2)$ is the random deviation of a specific leaf from that typical
value. The variance $\sigma_L^2$ describes the leaf-to-leaf variation.
• $\varepsilon_{ij} \sim N(0, \sigma^2)$ denotes the chemical measurement error, and the variance
$\sigma^2$ the within-leaf variation (also called measurement variation or residual
error variation).
The observations $y_{ij}$ are assumed to follow the model
$$y_{ij} = \beta + L_i + \varepsilon_{ij}. \quad (1)$$
This model contains one fixed effect, the mean $\beta$, the random effect $L_i$,
and the residual measurement error $\varepsilon_{ij}$.
We assume that the $L_i$ are independent, the $\varepsilon_{ij}$ are independent, and
that the $L_i$ are independent of the $\varepsilon_{ij}$.
The common random effect $L_i$ is responsible for the observations on the same
leaf being correlated.
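As an illustration of this point, a small simulation sketch (the variances below are hypothetical, not the estimates for these data) shows how a shared leaf effect induces correlation between repeated measurements:

# simulate model (1): a common leaf effect L_i makes the four
# measurements on a leaf correlated (illustrative variances only)
set.seed(1)
n.leaf <- 10000
L <- rnorm(n.leaf, sd = sqrt(0.07))     # leaf effects, sigma_L^2 = 0.07
y <- sapply(1:4, function(j) 3.17 + L + rnorm(n.leaf, sd = sqrt(0.007)))
round(cor(y), 2)    # off-diagonals near 0.07/(0.07 + 0.007) = 0.91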
1.2.1 Compound symmetry
For this model the covariance between measurements on the same leaf is
given by
$$\mathrm{Cov}(y_{ij}, y_{ik}) = \mathrm{Cov}(\beta + L_i + \varepsilon_{ij},\, \beta + L_i + \varepsilon_{ik}) \quad (2)$$
$$= \mathrm{Cov}(L_i, L_i) + \mathrm{Cov}(L_i, \varepsilon_{ik}) + \mathrm{Cov}(\varepsilon_{ij}, L_i) + \mathrm{Cov}(\varepsilon_{ij}, \varepsilon_{ik}) \quad (3)$$
$$= \mathrm{Cov}(L_i, L_i) = \sigma_L^2 \quad (4)$$
and
$$\mathrm{Var}(y_{ij}) = \mathrm{Cov}(\beta + L_i + \varepsilon_{ij},\, \beta + L_i + \varepsilon_{ij}) \quad (5)$$
$$= \mathrm{Cov}(L_i, L_i) + \mathrm{Cov}(\varepsilon_{ij}, \varepsilon_{ij}) \quad (6)$$
$$= \sigma_L^2 + \sigma^2 \quad (7)$$
If we collect the four observations in the vector $y_i = (y_{i1}, \ldots, y_{i4})$, the
observations are from the multivariate normal distribution with mean vector
$\mu = (\mu_1, \mu_2, \mu_3, \mu_4)$ and covariance matrix
$$\mathrm{Cov}(y_i) = \Sigma = \begin{pmatrix}
\sigma_L^2 + \sigma^2 & \sigma_L^2 & \sigma_L^2 & \sigma_L^2 \\
\sigma_L^2 & \sigma_L^2 + \sigma^2 & \sigma_L^2 & \sigma_L^2 \\
\sigma_L^2 & \sigma_L^2 & \sigma_L^2 + \sigma^2 & \sigma_L^2 \\
\sigma_L^2 & \sigma_L^2 & \sigma_L^2 & \sigma_L^2 + \sigma^2
\end{pmatrix} \quad (8)$$
This is a matrix where all the off-diagonal elements are equal. Such a
covariance structure is called compound symmetry (or exchangeable). If we
collect all the observations into one large vector $y = (y_1, y_2, y_3, y_4)$, the
covariance matrix becomes a block-diagonal matrix, where the off-diagonal
blocks are matrices of zeros, indicating that observations from different
leaves are independent:
$$\mathrm{Cov}(y) = \begin{pmatrix}
\Sigma & 0 & 0 & 0 \\
0 & \Sigma & 0 & 0 \\
0 & 0 & \Sigma & 0 \\
0 & 0 & 0 & \Sigma
\end{pmatrix}. \quad (9)$$
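As a sketch, the compound-symmetry matrix of Eq. (8) and its correlation matrix can be constructed in R for given variance components (the values below are placeholders; estimates for these data appear in Section 4):

# compound-symmetry covariance matrix of Eq. (8) and the implied
# within-leaf correlation sigma_L^2/(sigma_L^2 + sigma^2)
sigma2.L <- 0.072   # between-leaf variance (placeholder value)
sigma2 <- 0.0066    # within-leaf (residual) variance (placeholder value)
Sigma <- matrix(sigma2.L, nrow = 4, ncol = 4) + diag(sigma2, 4)
cov2cor(Sigma)      # equal off-diagonal correlations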
1.3 Random coefficient models
Example 1.4 In Fig. 3 (Rcode 14.2) the development of 48 pigs over 9
successive weeks is shown; measurements on the same pig are connected by a line.
[Line plot omitted; x-axis: week (2–8), y-axis: weight (20–80).]
Figure 3: Growth of 48 pigs.
The interest of the study was the estimation of the typical growth rate of the
population of pigs. Three features are apparent:
• The growth curves appear to be linear in the observed period.
• Pigs which are largest at the beginning of the study are largest throughout.
This effect is known as ’tracking’.
• The variation between pigs in the first week is smaller than in week 9. This
is sometimes called the ’fan’ effect. It may be explained by individually
different growth rates. Even though pigs have similar weights at the start
of the study, pigs with a larger growth rate will end up with higher weights
than those with a low growth rate.
□
In the example there are repeated measurements for each pig over time. That
the observations from the same pig are correlated can be seen by calculating
the empirical correlation between the observations for all 48 pigs (Table 1).
data(pigweight, package = "dataRep")
cor(pigweight)
Table 1: Empirical correlation between the measurements from a pig.
week1 week2 week3 week4 week5 week6 week7 week8 week9
week1 1.00 0.92 0.80 0.80 0.75 0.71 0.66 0.63 0.56
week2 0.92 1.00 0.91 0.91 0.88 0.84 0.78 0.71 0.66
week3 0.80 0.91 1.00 0.96 0.93 0.91 0.84 0.82 0.77
week4 0.80 0.91 0.96 1.00 0.96 0.93 0.87 0.83 0.79
week5 0.75 0.88 0.93 0.96 1.00 0.92 0.85 0.81 0.79
week6 0.71 0.84 0.91 0.93 0.92 1.00 0.96 0.93 0.89
week7 0.66 0.78 0.84 0.87 0.85 0.96 1.00 0.96 0.92
week8 0.63 0.71 0.82 0.83 0.81 0.93 0.96 1.00 0.97
week9 0.56 0.66 0.77 0.79 0.79 0.89 0.92 0.97 1.00
A random effect for pig would account for the individual weight levels of the
pigs and describe to a certain extent the 'tracking' effect.
In addition to this effect, one should account for individual growth rates of the
pigs, which can be mathematically expressed by individual slopes.
We formulate the following model for the weight $y_{ij}$ of pig $i$ in week $t_j$:
$$y_{ij} = \mu + u_i + (\beta + b_i) t_j + \varepsilon_{ij}, \quad i = 1, \ldots, 48, \; j = 1, \ldots, 9 \quad (10)$$
The parameters and their meaning:
• $\mu$: the average weight of all pigs at time $t = 0$.
• $u_i \sim N(0, \sigma_u^2)$ describes the pig-individual random deviation from the
mean weight at time $t = 0$.
• $\beta$ is the mean growth rate across all pigs.
• $b_i \sim N(0, \sigma_b^2)$ are the pig-individual random deviations from the mean
slope $\beta$.
• $\varepsilon_{ij} \sim N(0, \sigma^2)$ is the residual error.
• $\mathrm{Cov}(u_i, b_i) = \sigma_{ub}$.
As before, we assume that both $u_i$ and $b_i$ are independent of the residual
error. But we cannot assume beforehand that $u_i$ and $b_i$ are independent of
each other. This is because a simple shift of the time variable (e.g. adding
some constant value to the week) will change the correlation of these random
effects (for an explanation see Section 13). An assumption of independence of
these two random effects is approximately justified if the x-value (here the
week variable) where there is least variation between the pigs is close to 0.
The correlation of the observations per pig can now be calculated by noting
that
$$\mathrm{Var}(y_{i,t}) = \sigma_u^2 + t^2 \sigma_b^2 + 2t\,\mathrm{Cov}(u_i, b_i) + \sigma^2 \quad (11)$$
$$\mathrm{Cov}(y_{i,1}, y_{i,t}) = \sigma_u^2 + t \sigma_b^2 + (t + 1)\,\mathrm{Cov}(u_i, b_i), \quad t = 2, \ldots, 9 \quad (12)$$
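These formulas can be evaluated directly; a sketch using rounded variance components of the kind obtained for these data in Example 4.3 (the values here are illustrative):

# model-based Corr(y_{i,1}, y_{i,t}) from Eqs. (11)-(12)
s2u <- 6.99; s2b <- 0.38; sub <- -0.104; s2 <- 1.60   # illustrative values
t <- 2:9
v1 <- s2u + 1^2 * s2b + 2 * 1 * sub + s2    # Var(y_{i,1}), Eq. (11) with t = 1
vt <- s2u + t^2 * s2b + 2 * t * sub + s2    # Var(y_{i,t})
c1t <- s2u + t * s2b + (t + 1) * sub        # Cov(y_{i,1}, y_{i,t}), Eq. (12)
round(c1t/sqrt(v1 * vt), 2)                 # decreasing in t, cf. Fig. 4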
In Fig. 4 the empirical correlations already given in the first line of Table 1
are compared to the model-based correlations (for the R code see Section 14.3).
It is apparent that, in contrast to compound symmetry, the correlation
decreases over time.
[Plot omitted; x-axis: week (2–8), y-axis: correlation (0.5–1.0); legend: empirical correlation, model based correlation.]
Figure 4: Empirical and model-based correlations $\mathrm{Corr}(y_{i1}, y_{i,\mathrm{week}})$ for measurements on the same pig $i$.
Example 1.5 Another example of a random coefficient model is given by an
experiment on the influence of two substances, thiouracil and thyroxin, on
the growth of rats in comparison to a control group.
Data preparation
data(ratsbodyweight, package = "dataRep")
rat <- reshape(ratsbodyweight, direction = "long", varying = list(paste("week",
c(0:4), sep = "")), v.name = "weight", idvar = "ratid",
timevar = "week", time = c(0:4))
rat <- transform(rat, treat = factor(treat))
Plot of the data
library(lattice)
print(xyplot(weight ~ week | treat, groups = ratid,
data = rat, type = "l"))
[Panel plot omitted; x-axis: week (0–4), y-axis: weight (50–150); panels: control, thiouracil, thyroxin.]
Figure 5: Growth of rats.
The main question in this experiment is whether the growth rate (= slope of
growth) is related to the treatment.
A model for the data assuming linear growth is
$$y_{ij} = \mu + u_i + \alpha_{\mathrm{treat}(i)} + \beta\,\mathrm{week}_j + \beta_{\mathrm{treat}(i)}\,\mathrm{week}_j + b_i\,\mathrm{week}_j + \varepsilon_{ij} \quad (13)$$
where $i$ indexes the rat, $j$ the week, and treat = control, thiouracil, thyroxin.
The random effects are the $u_i$, which add a random intercept to the
observations of each animal, and the $b_i$, which add a random slope for each rat.
□
2 Repeated measurements and random coefficient models
A linear model where some of the parameters are normally distributed random
variables is called a random coefficients model. It can be generally formulated
as
$$y_{ij} = \sum_{k=1}^{p} \beta_k x_{ijk} + \sum_{l=1}^{q} b_{il} z_{ijl} + \varepsilon_{ij} \quad (14)$$
Here $y_{ij}$ is the normally distributed response for subject (or individual) $i$
at time $t_j$ (of the repeated measurement on the individual). The covariates
split up into those for fixed and those for random effects:
• $x_{ijk}$ is the value of the $k$-th covariate with the fixed parameter $\beta_k$.
• $z_{ijl}$ is the covariate for the random variable $b_{il}$. The $b_i = (b_{i1}, \ldots, b_{iq})$
are assumed to be normally distributed with mean 0 and covariance matrix $\Sigma_b$:
$$b_i \sim N(0, \Sigma_b)$$
The errors $\varepsilon_{ij}$ are independently normally distributed:
$$\varepsilon_{ij} \sim N(0, \sigma^2)$$
In matrix notation the model can be written more concisely as
$$y_i = X_i \beta + Z_i b_i + \varepsilon_i \quad (15)$$
The $n_i$ observations on the $i = 1, \ldots, m$ subjects are correlated because of
the common random variables $b_i$:
$$\mathrm{Cov}(y_i) = Z_i \Sigma_b Z_i^\top + \sigma^2 I,$$
where $I$ is the $n_i \times n_i$ identity matrix.
A fundamental assumption in this model is that observations from different
subjects $i$ and $i'$ are independent, i.e.
$$\mathrm{Cov}(y_{ij}, y_{i'j'}) = 0 \quad \text{for } i \neq i'.$$
Example 2.1 The leaf example is an instance of a random-component model,
with the leaf as the random component. For these models the $z_{ijl}$ simplify to
$$z_{ijl} = z_{il} = 1.$$
In the leaf example, $i = 1, \ldots, 4$ (= number of leaves) and $j = 1, \ldots, 4$
(= number of chemical determinations); $Z_i$ is a vector with as many 1's as the
number of chemical determinations:
$$Z_i = (1, 1, 1, 1)^\top \quad (16)$$
□
Example 2.2 In the growth-of-rats example we have a random coefficient
model where the matrix $Z_i$ has two columns: the first represents the random
intercept and the second the week:
$$Z_i = \begin{pmatrix} 1 & 1 \\ 1 & 2 \\ 1 & 3 \\ 1 & 4 \end{pmatrix} \quad (17)$$
□
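The covariance formula $\mathrm{Cov}(y_i) = Z_i \Sigma_b Z_i^\top + \sigma^2 I$ from above can be checked numerically for such a $Z_i$; a sketch with illustrative variance components (not the fitted values for these data):

# Cov(y_i) for a random intercept and slope, with Z_i from Eq. (17)
Z <- cbind(1, 1:4)                                   # intercept and week columns
Sigma.b <- matrix(c(6, -0.1, -0.1, 0.4), nrow = 2)   # Cov of (u_i, b_i), illustrative
sigma2 <- 1.6                                        # residual variance, illustrative
Z %*% Sigma.b %*% t(Z) + sigma2 * diag(4)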
3 Two stage model formulation
A two stage formulation, where one first describes the distribution of yi given
the random parameters and then the distribution of the random parameters bi
is convenient for generalizations to models with other distributions for yi than
the normal distribution.
• 1st stage:
$$y_i \mid b_i \sim N(\mu_i, \sigma^2 I_{n_i \times n_i}), \quad (18)$$
$$\mu_i = X_i \beta + Z_i b_i. \quad (19)$$
• 2nd stage:
$$b_i \sim N(0, \Sigma). \quad (20)$$
Additionally, one assumes that the vectors $b_i$ are independent. The residuals
$\varepsilon_i$ have not disappeared: they are now included in the description of the
distribution of $y_i \mid b_i$ in Eq. (18). That the $\varepsilon_{ij}$ are independent is
expressed by using a scaled identity matrix as the covariance matrix of $y_i \mid b_i$.
4 Model fitting and estimation
The fitting of a model and the estimation of the model parameters is based on
the log-likelihood.
One drawback of the maximum-likelihood estimates of variance components is
that they are biased. A common example is the maximum likelihood estimate
of the variance of $n$ independent observations $y_i$,
$$\hat{\sigma}^2_{ML} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \bar{y})^2 \quad (21)$$
which has the divisor $n$. The commonly used estimator
$$\hat{\sigma}^2_{REML} = \frac{1}{n-1} \sum_{i=1}^{n} (y_i - \bar{y})^2 \quad (22)$$
is unbiased, i.e. $E(\hat{\sigma}^2_{REML}) = \sigma^2$.
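For concreteness, the two divisors can be compared numerically; a small sketch treating the calcium measurements simply as one numeric sample:

# divisor n (ML-type, biased) versus divisor n - 1 (unbiased)
y <- leafcalcium$ca
n <- length(y)
sum((y - mean(y))^2)/n    # ML-type estimate
var(y)                    # uses divisor n - 1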
A general procedure to obtain unbiased estimates of the variance components
is the restricted maximum likelihood (REML) method. This method proceeds
in three steps:
1. Transform the original observations such that the new data do not depend
on the fixed effects.
2. Maximize the likelihood in the variance components for these new data.
3. Get the fixed-effects parameters from the original likelihood by plugging
in the variance-component estimates.
The main argument in favor of REML estimation is that for balanced designs
its results are the same as those obtained from ANOVA, an alternative
classical estimation method for variance components. The unbiasedness aspect
is not so important, because even though an estimator may be unbiased for
$\sigma^2$, it will not be unbiased for the standard deviation $\sigma$.
Example 4.1 One function for fitting a linear mixed model is lmer from
the package lme4 on CRAN.
Model (1) for the leaf data is fitted by
library(lme4)
m.leaf <- lmer(ca ~ 1 + (1 | leaf), data = leafcalcium)
Linear mixed-effects model fit by REML
Formula: ca ~ 1 + (1 | leaf)
Data: leafcalcium
AIC BIC logLik MLdeviance REMLdeviance
-14.6 -13 9.28 -20.7 -18.6
Random effects:
Groups Name Variance Std.Dev.
leaf (Intercept) 0.072379 0.26903
Residual 0.006602 0.08125
number of obs: 16, groups: leaf, 4
Fixed effects:
Estimate Std. Error t value
(Intercept) 3.166 0.136 23.3
In the table below Fixed effects: the estimates for the fixed effects
parameters are given, similar to the table from the glm-function. You obtain
the coefficients with the extractor function fixef
fixef(m.leaf)
(Intercept)
3.16563
In the table below Random effects: the estimates of the variance
parameters are given. Here leaf refers to $\sigma_L^2$, the between-leaf variance,
and Residual to $\sigma^2$. Note that the values in the column Std.Dev.
(e.g. 0.269) are just the square roots of the estimates of the variance
parameters (e.g. 0.072) and not their standard errors.
The estimates of the variance parameters can be obtained by
VarCorr(m.leaf)
$leaf
1 x 1 Matrix of class "dpoMatrix"
(Intercept)
(Intercept) 0.0723791
attr(,"sc")
scale
0.0812534
where the sc attribute is the square root of the estimate of the residual
variance $\sigma^2$. □
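From these two estimates one can compute the within-leaf correlation implied by the compound-symmetry structure of Eq. (8); a sketch, assuming the VarCorr structure printed above:

# implied intra-leaf correlation sigma_L^2/(sigma_L^2 + sigma^2)
vc <- VarCorr(m.leaf)
sigma2.L <- as.numeric(vc$leaf[1, 1])
sigma2 <- as.numeric(attr(vc, "sc"))^2
sigma2.L/(sigma2.L + sigma2)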
None of the above tables provide a confidence interval for the parameters or a
test of the hypothesis that the parameter is equal to zero.
This was a deliberate decision of the package maintainer D. Bates, because in
more complicated random effects models the distribution of the corresponding
test statistics has not been clarified. One strategy is to assume that the
parameter estimates are approximately t-distributed with the number of
degrees of freedom equal to the residual degrees of freedom. For some balanced
designs this number of degrees of freedom is too large, and one obtains a too
liberal test (e.g. too small p-values). A different strategy is based on the
Bayesian approach in combination with simulation. For the fixed-effects
parameters and the variance parameters one assumes 'uninformative' priors.
Based on the data one can then simulate parameter values from the posterior
distribution of the parameters via a general method known as Markov chain
Monte Carlo (MCMC). This posterior distribution can be used to compute
quantities that are similar to confidence intervals and hypothesis tests.
Conceptually, these intervals and tests are different from those based on the
likelihood, but in many cases they agree closely, and in the present context
they may give a better picture of the uncertainty.
Confidence intervals and simple hypothesis tests are available from our
function coeftable.lmer in the package glmfun.
coeftable.lmer(m.leaf)
Estimate StdErr Wald95lower Wald95upper Pr(>|t|)
(Intercept) 3.1656 0.136 3.05305 3.2782 0
Example 4.2 For the rat-growth data, model (13) is fitted by:
M.rat <- lmer(weight ~ treat + week + treat:week + (1 +
week | ratid), data = rat)
library(glmfun)
coeftable.lmer(M.rat, nsim = 10000)
$fixed
Estimate MCMCmean Wald95lower Wald95upper
(Intercept) 52.8800 52.8909 51.136010 54.62399
treatthiouracil 4.8200 4.7989 2.353625 7.28637
treatthyroxin -0.7943 -0.7835 -3.512098 1.92353
week 26.4800 26.4834 25.429785 27.53022
treatthiouracil:week -9.4300 -9.4244 -10.915228 -7.94477
treatthyroxin:week 0.6629 0.6362 -0.973785 2.29950
HPD95lower HPD95upper Pr(>|t|) pMCMC
(Intercept) 48.4996 57.4297 0.0000 0.0001
treatthiouracil -1.4708 10.9219 0.1054 0.1218
treatthyroxin -7.8713 5.9849 0.8078 0.8236
week 23.8513 29.2751 0.0000 0.0001
treatthiouracil:week -13.2650 -5.5798 0.0000 0.0002
treatthyroxin:week -3.3931 4.9217 0.7360 0.7646
$random
MCMCmean HPD95lower HPD95upper
sigma 4.3452879 3.689706 5.075945
ratd.(In) 5.9515461 3.885008 9.244061
ratd.week 3.9753688 2.849381 5.613429
rtd.(I).wek -0.0988601 -0.562515 0.430946
In the $random component the row rtd.(I).wek displays the estimate of the
correlation between the random intercepts $u_i$ and the random slopes $b_i$
(see Eq. 13). The confidence intervals from the MCMC sample are a bit wider
than those based on the t-distribution. One reason is that the MCMC-based
intervals better reflect the uncertainty induced by estimating the variance
parameters.
The default estimation procedure is REML. Estimates using the maximum
likelihood method are obtained by setting the method='ML' argument:
M.rat.ML <- lmer(weight ~ treat + week + treat:week +
(1 + week | ratid), data = rat, method = "ML")
reml <- coeftable.lmer(M.rat)[, c(1, 2)]
ml <- coeftable.lmer(M.rat.ML)[, c(1, 2)]
tab <- cbind(reml, ml)
colnames(tab) <- paste(colnames(tab), rep(c("REML",
"ML"), each = 2), sep = "")
EstimateREML StdErrREML EstimateML StdErrML
(Intercept) 52.8800 2.0903 52.8800 1.9710
treatthiouracil 4.8200 2.9562 4.8200 2.7874
treatthyroxin -0.7943 3.2576 -0.7943 3.0715
week 26.4800 1.2588 26.4800 1.1867
treatthiouracil:week -9.4300 1.7802 -9.4300 1.6783
treatthyroxin:week 0.6629 1.9617 0.6629 1.8494
The parameter estimates are the same, but the ML-based standard errors are a
bit smaller than the REML-based ones (cf. Eqs. (21) and (22)). □
Example 4.3 The fit of the model for the pig weights is obtained as follows
(the data first have to be reshaped into 'long' format such that the weight
measurements form one column):
library(lme4)
pigweight <- get(load("data/pigweight.Rdata"))
pigweightL <- reshape(pigweight, direction = "long",
varying = list(colnames(pigweight)), v.name = "weight",
timevar = "week", idvar = "pigid")
pigweightL <- with(pigweightL, pigweightL[order(pigid,
week), ])
M.pig <- lmer(weight ~ 1 + week + (1 + week | pigid),
data = pigweightL)
and the table for the fixed effects is given by
coeftable.lmer(M.pig)[, 1:2]
Estimate StdErr
(Intercept) 19.3556 0.4039
week 6.2099 0.0920
The estimates for the variances are given by
VarCorr(M.pig)["pigid"]
$pigid
2 x 2 Matrix of class "dpoMatrix"
(Intercept) week
(Intercept) 6.989555 -0.103670
week -0.103670 0.379924
where the estimate of $\sigma_u^2$ is 6.99, that of $\sigma_b^2$ is 0.38, and their
covariance $\hat{\sigma}_{ub} = -0.104$. The residual variance $\sigma^2$ is obtained as
attr(VarCorr(M.pig), "sc")^2
scale
1.59679
□
5 Model comparison
The comparison of models may be performed by likelihood-ratio tests.
Twice the difference of the log-likelihoods of two nested models $M_0 \subset M_1$
is asymptotically $\chi^2$ distributed with $p_1 - p_0$ degrees of freedom, where
$p_1$ and $p_0$ are the numbers of model parameters. Here one also has to count
the variance-covariance parameters of the model.
The quality of the χ2 approximation of the distribution of the log likelihood
ratio statistic is different for tests of the fixed and the random components.
Example 5.1 (Fixed effect) In the rat growth example a reduced model of
Eq. (13) would be the model with no treatment effect, i.e.
$$E(y_{ij}) = \mu + u_i + \beta\,\mathrm{week}_j + b_i\,\mathrm{week}_j \quad (23)$$
The likelihood-ratio test between models (13) and (23) is performed with the
anova function. When comparing two models with different fixed effects, one
must fit the models with the 'ML', not the 'REML', method.
M.rat <- lmer(weight ~ treat + week + treat:week + (1 +
week | ratid), data = rat, method = "ML")
M.rat.reduc <- lmer(weight ~ week + (1 + week | ratid),
data = rat, method = "ML")
anova(M.rat, M.rat.reduc)
Data: rat
Models:
M.rat.reduc: weight ~ week + (1 + week | ratid)
M.rat: weight ~ treat + week + treat:week + (1 + week | ratid)
Df AIC BIC logLik Chisq Chi Df Pr(>Chisq)
M.rat.reduc 5 930.5 945.0 -460.2
M.rat 9 912.9 939.1 -447.5 25.54 4 3.92e-05 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
□
Example 5.2 (Random effect: removing a covariance) In the rat-growth
example the model for the random effects per rat $i$ was
$$(u_i, b_i) \sim N(0, \Sigma_{2,2})$$
with
$$\Sigma_{2,2} = \begin{pmatrix} \sigma_u^2 & \sigma_{u,b} \\ \sigma_{u,b} & \sigma_b^2 \end{pmatrix}$$
In the analysis (see Example 4.2) we saw that the confidence interval for the
covariance $\sigma_{ub}$ contained 0, and one may consider the simpler model of no
correlation,
$$H_0: \sigma_{ub} = 0.$$
We fit both models and compare them with a likelihood-ratio test:
M.rat <- lmer(weight ~ treat + week + treat:week + (1 +
week | ratid), data = rat)
M.rat.simple <- lmer(weight ~ treat + week + treat:week +
(1 | ratid) + (0 + week | ratid), data = rat)
(Note: by writing 1|ratid and 0+week|ratid separately we assume
independence between these two random effects. If the '0' were forgotten, R
would automatically add a '1' and one would obtain the larger model again.
Here one need not use an 'ML' fit, since the fixed effects of the two models
are identical.)
anova(M.rat, M.rat.simple)
Data: rat
Models:
M.rat.simple: weight ~ treat + week + treat:week + (1 | ratid) + (0 + week |
M.rat: ratid)
M.rat.simple: weight ~ treat + week + treat:week + (1 + week | ratid)
Df AIC BIC logLik Chisq Chi Df Pr(>Chisq)
M.rat.simple 8 911.4 934.6 -447.7
M.rat 9 913.2 939.3 -447.6 0.187 1 0.665
The large p-value corroborates our observation that the two random effects
may be treated as independent. □
Example 5.3 (Random effect: removing a variance) Assuming the simpler
model with covariance $\sigma_{ub} = 0$, we now consider the hypothesis
$$H_0: \sigma_b^2 = 0.$$
In contrast to the previous test on a covariance parameter, which may take
negative and positive values, the current test is on a parameter that can take
only non-negative values. Its value under the hypothesis lies on the boundary of the
parameter space. This causes problems for the χ2 approximation of the
distribution of the likelihood-ratio statistic. A recommendation is to adjust the
p-value by dividing it by 2.
M.rat.simple <- lmer(weight ~ treat + week + treat:week +
(1 | ratid) + (0 + week | ratid), data = rat)
M.rat.simple.RED <- lmer(weight ~ treat + week + treat:week +
(1 | ratid), data = rat)
anova(M.rat.simple, M.rat.simple.RED)
Data: rat
Models:
M.rat.simple.RED: weight ~ treat + week + treat:week + (1 | ratid)
M.rat.simple: weight ~ treat + week + treat:week + (1 | ratid) + (0 + week |
M.rat.simple.RED: ratid)
Df AIC BIC logLik Chisq Chi Df Pr(>Chisq)
M.rat.simple.RED 7 977.2 997.5 -481.6
M.rat.simple 8 911.4 934.6 -447.7 67.85 1 <2e-16
M.rat.simple.RED
M.rat.simple ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The p-value is so small that the adjustment has no effect on the
conclusion. □
The following general rules hold for testing in mixed linear models:
1. Testing for reduction in fixed effects:
• The χ2 approximation is liberal, i.e. p-values tend to be too small.
• The model must not be fitted with the REML method, because the
REML likelihood depends on the parameterization of the fixed effects.
2. Random effects:
• The χ2 approximation is conservative, i.e. p-values tend to be too
large.
• In testing that a variance parameter is equal to zero one should divide
the p-value by 2.
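As a sketch of rule 2, the halving can be applied by hand to a likelihood-ratio statistic (here the value 67.85 from Example 5.3):

# p-value of a chi-square LR statistic, halved for the boundary problem
chisq.stat <- 67.85    # LR statistic from Example 5.3
pchisq(chisq.stat, df = 1, lower.tail = FALSE)/2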
6 Prediction and residuals
The random parameters in a linear mixed model are realizations of random
variables and not parameters in the usual sense, like $\beta$ or the elements of
the covariance matrix. Nevertheless it is possible to obtain estimates for
them; one prefers to speak of prediction. The most common method for
calculating these is known as best linear unbiased prediction (BLUP).
Example 6.1 In the rat example there are two sets of random parameters:
• the random intercepts $u_i$,
• the random slopes $b_i$.
Both are available in the list returned by ranef:
pred <- ranef(M.rat)
pred <- pred[[1]]
Both can be plotted (Fig. 6) to check whether the assumption of their
normality is tenable. pred is a list with one element, which is selected by
pred[[1]].
par(mfrow = c(1, 3))
qqnorm(pred[, "(Intercept)"], main = "Random intercept")
qqline(pred[, "(Intercept)"])
qqnorm(pred$week, main = "Random slope")
qqline(pred$week)
qqnorm(residuals(M.rat), main = "Residuals")
qqline(residuals(M.rat))
[Three normal Q-Q plots omitted; panels: Random intercept, Random slope, Residuals; axes: Theoretical Quantiles vs. Sample Quantiles.]
Figure 6: Q-Q plots for the random intercepts and slopes.
The residuals play a similar role to the random parameters: they can be
considered random parameters on the lowest level. They should also be
checked for normality. □
Based on the predictions for the random parameters one can then obtain
predictions for the expected values of the observations themselves.
Example 6.2 In the rat growth example the predictions are obtained by
evaluating Eq. (13) at the values of the covariates and the random coefficients
using the fitted function.
fitted(M.rat)
□
7 Fixed effect vs. random effect
One alternative to fitting a model where the subject effect is random would be
to use it as a fixed factor. This approach has its pros and cons.
Disadvantages of fixed effects models:
• It is not possible to include covariates that are subject specific, because
their variation is contained in the subject-to-subject effect.
• The model does not allow modeling dependencies between measurements
within a subject, and therefore the variance for the comparison of means
between subjects is not properly inflated.
• Many clusters require the fit of a large number of parameters (one for
each cluster). The parameter estimates may become unstable. Often one is
not interested in the specific value for a cluster.
Arguments for a fixed effects approach:
1. With very few clusters, the assumption of normality of the cluster means
is often not tenable.
2. The interest lies more in the specific subjects than in the population of
subjects. One should add that if the experiment contains only a few
clusters, then by the very design of the experiment the experimenter was
obviously not interested in representing the population or the
subject-to-subject variation.
8 Generalized linear mixed models
In the preceding section the observations were assumed to be normally
distributed. We now generalize to observations that come from distributions of
generalized linear models.
8.1 Working example – respiratory illness
Example 8.1 The data are from a clinical trial of patients with respiratory
illness, where 111 patients from two different clinics were randomized to
receive either placebo or an active treatment. Patients were examined at
baseline and at four visits during treatment. At each examination, respiratory
status (categorized as 1 = good, 0 = poor) was determined.
• The recorded variables are:
Center (1,2), ID, Treatment (A=Active, P=Placebo), Gender
(M=Male,F=Female), Age (in years at baseline), Baseline Response.
• The response variables are:
Visit 1 Response, Visit 2 Response, Visit 3 Response, Visit 4 Response.
Data for 8 patients are shown in Table 2.
Table 2: Respiratory data for eight individuals. Measurements on the same
individual tend to be alike.
center id treat sex age baseline visit1 visit2 visit3 visit4
1 1 1 P M 46 0 0 0 0 0
2 1 2 P M 28 0 0 0 0 0
3 1 3 A M 23 1 1 1 1 1
4 1 4 P M 44 1 1 1 1 0
5 1 5 P F 13 1 1 1 1 1
6 1 6 A M 34 0 0 0 0 0
7 1 7 P M 43 0 1 0 1 1
8 1 8 A M 28 0 0 0 0 0
Plotting the mean outcome across the four visits for each patient against age
shows a parabolic trend. (The mean outcome across visits is the proportion of
positive responses of a patient.)
[Plot 'mean outcome vs. age' omitted; x-axis: age (10–70), y-axis: mean.outcome (0.0–1.0); legend: treatment 1, treatment 2.]
Figure 7: Mean outcome across visits against age for each patient.
Interest is in comparing the treatments, but also in including center, age,
gender and baseline response in the model.
From Table 2 it is clear that there is a dependency among the response
measurements on the same person – measurements on the same person tend to
be alike.
This dependency must be accounted for in the modeling. □
Example 8.2 A first approach is to ignore the dependency. This approach is
not appropriate, but it is illustrative.
Let $y_{iv}$ denote the response measured on the $i$th person at visit $v$, where
$v = 1, \ldots, 4$. Since the response outcomes are binary, $y_{iv} \in \{0, 1\}$, it is
tempting to consider the binomial distribution as the basis for the modeling.
That is, to assume that $y_{iv} \sim \mathrm{bin}(1, \mu_{iv})$ and that all $y_{iv}$ are
independent.
As specification of $\mu_{iv}$ we consider in the following the linear predictor
(footnote: we used $(\mathrm{age}/10)^2 = \mathrm{age}^2/100$ to get a parameter estimate in
about the range of the others, for better reporting in a table. If $\beta_7$ is the
parameter for $(\mathrm{age}/10)^2$, then the parameter $\tilde{\beta}_7$ for $\mathrm{age}^2$ would be
$\tilde{\beta}_7 = \beta_7/100$.)
$$\eta_{iv} = \mathrm{logit}(\mu_{iv})$$
$$M_1: \quad \eta_{iv} = \beta_1 + \beta_2\,\mathrm{baseline}_i + \beta_{3;\mathrm{center}(i)} + \beta_{4;\mathrm{sex}(i)} + \beta_{5;\mathrm{treat}(i)} + \beta_6\,\mathrm{age}_i + \beta_7\,(\mathrm{age}_i/10)^2 \quad (24)$$
Note that the expression for $\mathrm{logit}(\mu_{iv})$ does not include the visit $v$. We
will write this briefly as
logit(µ) = baseline + center + sex + treat + age + (age/10)²
data(respiratory, package = "dataRep")
respiratory <- transform(respiratory, age2 = (age/10)^2,
center = factor(center), visit = factor(visit))
M.resp.1 <- glm(outcome ~ baseline + center + sex +
treat + age + age2, data = respiratory, family = binomial)
Table 3 contains the parameter estimates under the model.
Table 3: Parameter estimates when assuming independence
Estimate Std. Error z value Pr(>|z|)
(Intercept) 3.87 0.96 4.0 0.000
baseline 1.89 0.25 7.6 0.000
center2 0.51 0.25 2.1 0.038
sexM −0.45 0.32 −1.4 0.154
treatP −1.32 0.24 −5.4 0.000
age −0.21 0.05 −4.4 0.000
age2/100 0.26 0.06 4.1 0.000
□
9 Correlated Pearson residuals
Based on the fitted (wrong) independence model we can calculate the Pearson
residuals
$$r_{P;iv} = \frac{y_{iv} - \hat{\mu}_{iv}}{\sqrt{\hat{\mu}_{iv}(1 - \hat{\mu}_{iv})}}, \quad i = 1, \ldots, N, \; v = 1, \ldots, 4,$$
which under the model approximately have mean 0 and variance 1.
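A minimal sketch computing these residuals directly from the independence fit M.resp.1 above (they should match residuals(M.resp.1, type = "pearson")):

# Pearson residuals by hand from the fitted probabilities of M.resp.1
mu <- fitted(M.resp.1)      # fitted probabilities mu_iv
y <- respiratory$outcome    # observed binary responses
r.P <- (y - mu)/sqrt(mu * (1 - mu))
all.equal(as.numeric(r.P), as.numeric(residuals(M.resp.1, type = "pearson")))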
From these we can estimate the correlation matrix (Table 4) showing
correlations between measurements at the different visits on the same
individual (Rcode 14.4).
Table 4: Correlation matrix based on Pearson residuals.
visit.1 visit.2 visit.3 visit.4
visit.1 1.000 0.351 0.240 0.297
visit.2 0.351 1.000 0.343 0.277
visit.3 0.240 0.343 1.000 0.362
visit.4 0.297 0.277 0.362 1.000
If the observations were independent, then the true (i.e. theoretical)
correlations would be zero. The correlations in Table 4 are estimates, so even
if the observations were independent, the estimated correlations would not
necessarily be zero – but they should be close to it.
There is a clear indication in Table 4 that the correlations tend to be positive.
The task in the following is to account for dependency between measurements
on the same individual.
9.1 Generalized linear model formulation
Example 9.1 The patients may have individually different response
probabilities. To account for these we will include a patient effect in the
linear predictor. Because the patients were randomly selected, we include the
patient effect as a normally distributed random component.
The classical random part of a generalized linear model is now formulated
conditionally on the random effects:
$$y_{iv} \mid s_i \sim \mathrm{bin}(1, \mu_{iv}) \quad (25)$$
and the linear predictor in the systematic part is written
$$G_1: \quad \eta_{iv} = \beta_1 + \beta_2\,\mathrm{baseline}_i + \beta_{3;\mathrm{center}(i)} + \beta_{4;\mathrm{sex}(i)} + \beta_{5;\mathrm{treat}(i)} + \beta_6\,\mathrm{age}_i + \beta_7\,(\mathrm{age}_i/10)^2 + s_{i;\mathrm{patient}} \quad (26)$$
One has to add the assumptions on the distribution of the random effects:
$$s_i \sim N(0, \sigma_s^2), \quad (28)$$
$$s_i \ \text{are independently distributed.} \quad (29)$$
□
The general formulation of a generalized linear mixed model (GLMM),
following Section 3, is
$$y_{ij} \mid b_i \sim F(\mu_{ij}), \quad \text{independent for different } j, \quad (30)$$
$$\eta_i = X_i \beta + Z_i b_i, \quad (31)$$
$$b_i \sim N(0, \Sigma), \quad \text{independent for different } i, \quad (32)$$
where $F$ is a distribution from the exponential family and the expectation
$\mu_{ij}$ is linked to the predictor via the link function $h(\mu_{ij}) = \eta_{ij}$.
One therefore has, in addition to the random part in Eq. (30), a second
random specification in Eq. (32).
9.2 Model fit and estimation
The model is fitted by maximizing the likelihood. For generalized linear mixed
models there is no longer a closed-form representation of the likelihood, as
there is for the linear model with normally distributed observations. The
calculation involves numerical integration, which may be difficult for large
data sets with many random effects. The REML approach is not available.
Example 9.2 The lmer function is used for the model fitting, specifying the
distribution of the response via the family argument. It should be noted
that the id variable does not uniquely identify a patient, as it starts from 1
within each center. Therefore, we define a variable patid which uniquely
identifies a patient.
respiratory <- transform(respiratory, age2 = (age/10)^2,
patid = interaction(center, id))
G.resp.1 <- lmer(outcome ~ baseline + center + sex +
treat + age + age2 + (1 | patid), data = respiratory,
family = binomial)
lmer fits this model using the numerical approximation called 'Laplace'. A
less accurate but sometimes numerically more stable alternative is the 'PQL'
method.
The table of coefficients with (optional) MCMC-based confidence intervals is
obtained by
tab <- coeftable.lmer(G.resp.1, nsim = 0)
Estimate StdErr Wald95lower Wald95upper Pr(>|z|)
(Intercept) 5.5358 1.8689 1.872746 9.198915 0.0031
baseline 2.8603 0.4961 1.888026 3.832553 0.0000
center2 0.7615 0.4997 -0.217882 1.740785 0.1275
sexM -0.6081 0.6466 -1.875347 0.659209 0.3470
treatP -2.0117 0.4874 -2.966924 -1.056457 0.0000
age -0.3073 0.0946 -0.492802 -0.121789 0.0012
age2 0.3892 0.1301 0.134164 0.644330 0.0028
The estimate of the variance component $\sigma_s^2$ describing the
patient-to-patient variability is obtained by
vv <- VarCorr(G.resp.1)
$patid
1 x 1 Matrix of class "dpoMatrix"
(Intercept)
(Intercept) 3.18672
attr(,"sc")
scale
1
The list element vv$patid is the estimate of $\sigma_s^2$ and the attribute
attr(vv, "sc") is the scale or dispersion parameter. □
10 Model comparison
Model comparison is done via the likelihood-ratio test, assuming that twice
the difference in log-likelihoods is asymptotically $\chi^2$ distributed.
Example 10.1 We consider the model where the age effect has been removed:
$$G_0: \quad \eta_{iv} = \beta_1 + \beta_2\,\mathrm{baseline}_i + \beta_{3;\mathrm{center}(i)} + \beta_{4;\mathrm{sex}(i)} + \beta_{5;\mathrm{treat}(i)} + s_{i;\mathrm{patient}} \quad (33)$$
This model is compared to model G1 from Eq. (26) via a LR-test:
G.resp.1 <- lmer(outcome ~ baseline + center + sex +
treat + age + age2 + (1 | patid), data = respiratory,
family = binomial)
G.resp.0 <- lmer(outcome ~ baseline + center + sex +
treat + (1 | patid), data = respiratory, family = binomial)
anova(G.resp.1, G.resp.0)
Data: respiratory
Models:
G.resp.0: outcome ~ baseline + center + sex + treat + (1 | patid)
G.resp.1: outcome ~ baseline + center + sex + treat + age + age2 + (1 |
G.resp.0: patid)
Df AIC BIC logLik Chisq Chi Df Pr(>Chisq)
G.resp.0 6 449.0 473.6 -218.5
G.resp.1 8 440.9 473.7 -212.4 12.09 2 0.00237 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
□
11 Prediction and residuals
Example 11.1 The fitted values on the link-scale are available with the
fitted function.
fit <- fitted(G.resp.1)
Residuals for GLMM models are not yet implemented in the lmer function.
Pearson residuals can be obtained via the residu.lmer function of our
package glmfun.
library(glmfun)
res.p <- residu.lmer(G.resp.1)
Finally, the BLUPs of the random effects are available via
raf <- ranef(G.resp.1)
and should be checked for normality (see Fig. 8)
rand.patient <- raf[[1]][, "(Intercept)"]
qqnorm(rand.patient)
qqline(rand.patient)
[Normal Q-Q plot omitted; axes: Theoretical Quantiles vs. Sample Quantiles.]
Figure 8: Q-Q plot for the random intercepts.
□
12 Covariances and correlation*
12.1 Rules for computing covariances
Let $X, Y, V$ and $W$ be random variables and $a$ and $b$ constants.
Then the following rules hold:
• The covariance between a constant and a random variable is zero:
$\mathrm{Cov}(a, X) = 0$.
• Constants factor out of the covariance:
$\mathrm{Cov}(aX, bY) = a \cdot b\,\mathrm{Cov}(X, Y)$.
• The covariance between sums of random variables is the sum of the
pairwise covariances:
$\mathrm{Cov}(X+Y, V+W) = \mathrm{Cov}(X, V) + \mathrm{Cov}(X, W) + \mathrm{Cov}(Y, V) + \mathrm{Cov}(Y, W)$.
From these rules one may derive the specific rules for the variance.
• The covariance of a variable with itself is the variance:
$\mathrm{Cov}(X, X) = \mathrm{Var}(X)$.
• The variance of a linearly transformed variable:
$\mathrm{Var}(a + bX) = b^2\,\mathrm{Var}(X)$.
The correlation between two random variables is defined as
$$\mathrm{Corr}(X, Y) = \frac{\mathrm{Cov}(X, Y)}{\sqrt{\mathrm{Var}(X)}\sqrt{\mathrm{Var}(Y)}} \quad (35)$$
One has the special result that a linear transformation of a variable from $X$
to $a + bX$ (with $b > 0$) does not change the correlation:
$$\mathrm{Corr}(X, Y) = \mathrm{Corr}(a + bX, Y). \quad (36)$$
In particular, multiplying a variable by a positive constant does not change
the correlation. Therefore, measuring a variable in meters or in centimeters
yields the same correlation.
12.2 Covariances of random vectors
12.2.1 The covariance matrix
We consider the covariances between the three random variables $X, Y, V$,
which we collect in the random vector $m = (X, Y, V)$. One defines the
covariance of the vector $m$ as the matrix $\Sigma$ of all the pairwise covariances:
$$\mathrm{Cov}(m) = \Sigma = \begin{pmatrix}
\mathrm{Var}(X) & \mathrm{Cov}(X, Y) & \mathrm{Cov}(X, V) \\
\mathrm{Cov}(Y, X) & \mathrm{Var}(Y) & \mathrm{Cov}(Y, V) \\
\mathrm{Cov}(V, X) & \mathrm{Cov}(V, Y) & \mathrm{Var}(V)
\end{pmatrix} \quad (37)$$
Because covariances are symmetric, $\mathrm{Cov}(X, V) = \mathrm{Cov}(V, X)$, the
covariance matrix is a symmetric matrix.
If you transform a vector $m$ by some matrix $B$, then the corresponding
covariance matrix is given as
$$\mathrm{Cov}(Bm) = B \Sigma B^\top \quad (38)$$
where $B^\top$ denotes the transposed matrix.
12.2.2 From the covariance to the correlation matrix
You compute the correlation matrix from the covariance matrix by first
collecting the square roots of the variances (the diagonal elements of $\Sigma$)
in a diagonal matrix
$$D = \sqrt{\mathrm{diag}(\Sigma)} = \begin{pmatrix}
\sqrt{\mathrm{Var}(X)} & 0 & 0 \\
0 & \sqrt{\mathrm{Var}(Y)} & 0 \\
0 & 0 & \sqrt{\mathrm{Var}(V)}
\end{pmatrix} \quad (39)$$
and then calculating
$$\mathrm{Corr}(m) = D^{-1} \Sigma D^{-1} \quad (40)$$
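A small sketch of Eq. (40) in R, with an arbitrary covariance matrix; base R's cov2cor gives the same result:

# correlation matrix from a covariance matrix, Eq. (40)
Sigma <- matrix(c(4, 2, 1,
                  2, 9, 3,
                  1, 3, 16), nrow = 3)   # an arbitrary covariance matrix
D.inv <- diag(1/sqrt(diag(Sigma)))
D.inv %*% Sigma %*% D.inv
cov2cor(Sigma)                           # same result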
13 Random coefficients and the positioning of the random intercept*
This section explains why, in a random coefficient model with a random
intercept and random slopes, a simple shift of the independent variable can
substantially change the correlation between the intercept and the slope, and
why it is therefore recommended to model a correlation between both.
We assume the following simple random coefficient model
$$y_{it} = \mu + a_i + b_i x_t + \varepsilon_{it} \quad (41)$$
where $\mu$ is the mean value, $a_i \sim N(0, \sigma_a^2)$ is the random intercept,
$b_i \sim N(0, \sigma_b^2)$ the random slope, and $\varepsilon_{it}$ the residual errors with
variance $\sigma^2$. We allow for the time being that $a_i$ and $b_i$ are correlated,
with covariance $\mathrm{Cov}(a_i, b_i) \neq 0$.
We now make a simple shift of the x-values by an amount $\Delta$ and define the
new regressor variable $z_t$ as
$$z_t = x_t + \Delta.$$
The model equation (41) then changes to
$$y_{it} = \mu + (a_i - b_i \Delta) + b_i(x_t + \Delta) + \varepsilon_{it} = \mu + \tilde{a}_i + b_i z_t + \varepsilon_{it} \quad (42)$$
where the new intercept $\tilde{a}_i$ is
$$\tilde{a}_i = a_i - b_i \Delta.$$
We can now compute the variance of $\tilde{a}_i$ and its covariance with $b_i$:
$$\mathrm{Var}(\tilde{a}_i) = \mathrm{Var}(a_i) + \Delta^2\,\mathrm{Var}(b_i) - 2\Delta\,\mathrm{Cov}(a_i, b_i) \quad (43)$$
$$\mathrm{Cov}(\tilde{a}_i, b_i) = \mathrm{Cov}(a_i, b_i) - \Delta\,\mathrm{Var}(b_i) \quad (44)$$
If we choose
$$\Delta_0 = \frac{\mathrm{Cov}(a_i, b_i)}{\mathrm{Var}(b_i)}$$
then it is seen from equation (44) that the covariance between $\tilde{a}_i$ and
$b_i$ is zero. (This same $\Delta_0$ also minimizes the variance of $\tilde{a}_i$, as
seen from equation (43) by taking the derivative of the right-hand side with
respect to $\Delta$ and setting it to zero.)
We can therefore conclude that by a simple shift of the original $x_t$ by
$\Delta_0$ we can obtain uncorrelated random coefficients.
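A small simulation sketch of this decorrelating shift (the random coefficients below are hypothetical):

# shifting the intercept by Delta_0 = Cov(a, b)/Var(b) removes the correlation
set.seed(2)
b <- rnorm(10000, sd = 2)
a <- 0.5 * b + rnorm(10000)    # correlated intercepts and slopes
Delta0 <- cov(a, b)/var(b)
cov(a - b * Delta0, b)         # approximately zero, cf. Eq. (44)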
It is difficult to know this $\Delta_0$ beforehand. Therefore, it is safest to fit a
parameter describing the correlation between the random intercept and the
random slope when one fits a random coefficient model.
14 Additional R code
14.1 Plot of Fig. 2
data(leafcalcium, package = "dataRep")
library(lattice)
dotplot(leaf ~ jitter(ca), groups = nr, data = leafcalcium,
xlab = "calcium")
trellis.device(file = paste("fig/leafcalcium-1.pdf",
sep = ""), device = pdf, width = 8, height = 8/2)
data(leafcalcium, package = "dataRep")
library(lattice)
dotplot(leaf ~ jitter(ca), groups = nr, data = leafcalcium,
xlab = "calcium")
dev.off()
14.2 Plot of Fig. 3
In the original data set the 9 observations per pig constitute one row. The
data are reshaped so that the weight measurements are contained in one column
called 'weight', and new variables are created to identify the pig and the
week.
pigweight <- get(load("data/pigweight.Rdata"))
pigweightL <- reshape(pigweight, direction = "long",
varying = list(colnames(pigweight)), v.name = "weight",
timevar = "week", idvar = "pigid")
pigweightL <- with(pigweightL, pigweightL[order(pigid,
week), ])
print(xyplot(weight ~ week, groups = pigid, data = pigweightL,
type = "l", col = 1))
14.3 Plot of Fig. 4
W <- as.matrix(VarCorr(M.pig)["pigid"][[1]])
residual.sigma2 <- attr(VarCorr(M.pig), "sc")^2
Zt <- as.matrix(M.pig@Zt[1:2, 1:9])
V <- t(Zt) %*% W %*% Zt + residual.sigma2 * diag(9)
d <- diag(1/sqrt(diag(V)))
Cmodel <- d %*% V %*% d
Cempirical <- cor(pigweight)
pdf(file = paste("fig/pigweightCorrelation.pdf", sep = ""),
width = 7, height = 7/1.2, paper = "special")
plot(Cempirical[1, ], type = "b", pch = 1, ylab = "correlation",
xlab = "week", ylim = c(0.5, 1))
lines(Cmodel[1, ], type = "b", lty = 2, pch = 16)
legend(5, 0.9, legend = c("empirical correlation", "model based correlation"),
lty = c(1, 2), pch = c(1, 16))
dev.off()
14.4 Correlation matrix from Pearson residuals for Table 4
To calculate the correlations from the Pearson residuals of model (24) we
first build a data frame containing the residuals and the variables center, id
and visit. The combination of the center and id variables uniquely identifies
a patient. The data are reshaped so that the residuals for one patient from
the successive visits form one row.
res <- residuals(M.resp.1, type = "pearson")
dummy <- data.frame(res = res, respiratory[, c("center",
"id", "visit")])
dummyL <- reshape(dummy, direction = "wide", v.names = "res",
idvar = c("center", "id"), timevar = "visit")
COR <- cor(dummyL[, c(paste("res.", 1:4, sep = ""))])
colnames(COR) <- paste("visit.", 1:4, sep = "")
rownames(COR) <- paste("visit.", 1:4, sep = "")