Generalized Linear Mixed Models (GLMM)
Ulrich Halekoh
Unit of Statistics and Decision Analysis
Faculty of Agricultural Sciences, University of Aarhus
April 3, 2007
Printed: April 3, 2007 File: lmer.tex
Contents

1 Examples for correlated data
  1.1 Bivariate measurements
  1.2 Repeated measurements - sub-sampling
      1.2.1 Compound symmetry
  1.3 Random coefficient models
2 Repeated measurements and random coefficient models
3 Two stage model formulation
4 Model fitting and estimation
5 Model comparison
6 Prediction and residuals
7 Fixed effect vs. random effect
8 Generalized linear mixed models
  8.1 Working example – respiratory illness
9 Correlated Pearson residuals
  9.1 Generalized linear model formulation
  9.2 Model fit and estimation
10 Model comparison
11 Prediction and residuals
12 Covariances and correlation*
  12.1 Rules for computing covariances
  12.2 Covariances of random vectors
      12.2.1 The covariance matrix
      12.2.2 From the covariance to the correlation matrix
13 Random coefficients and the positioning of the random intercept*
14 Additional R code
  14.1 Plot of Fig. 2
  14.2 Plot of Fig. 3
  14.3 Plot of Fig. 4
  14.4 Correlation matrix from Pearson residuals for Table 4
1 Examples for correlated data
In this section we look at some reasons why data are correlated and the
consequences for the variance of estimates based on correlated data.
1.1 Bivariate measurements
Example 1.1 Fig. 1 shows the measurements of the heights of fathers
against the heights of their sons.
data(fatherson, package = "dataRep")
plot(father ~ son, data = fatherson, xlab = "height son [inch]",
ylab = "height father [inch]")
[Scatter plot omitted; x-axis: height son [inch] (60–75), y-axis: height father [inch] (60–75).]
Figure 1: Height of fathers and sons.
The height of the son provides information about the height of the father.
The two measurements are not independent; they are said to be correlated. □
The example is a case where an observation $Y$ is a bivariate vector of two
measurement variables, $Y = (Y_1, Y_2)$. In the example $Y_1$ is the height of the
father and $Y_2$ the height of the son. Often one can assume that these two
variables are normally distributed, with means $\mu_k$ and variances $\sigma_k^2$:
$$Y_k \sim N(\mu_k, \sigma_k^2), \quad k = 1, 2.$$
The dependence between the two variables is expressed by their covariance
$\mathrm{Cov}(Y_1, Y_2) = \sigma_{12}$. The joint distribution of the two variables is a
bivariate normal distribution with mean vector $\mu = (\mu_1, \mu_2)$ and covariance
matrix $\Sigma$:
$$Y \sim N(\mu, \Sigma),$$
where
$$\Sigma = \begin{pmatrix} \sigma_1^2 & \sigma_{12} \\ \sigma_{21} & \sigma_2^2 \end{pmatrix}.$$
This matrix is symmetric because $\sigma_{12} = \sigma_{21}$. Dividing the covariance by the
square root of the product of the variances one obtains the correlation between
the two variables:
$$\mathrm{Corr}(Y_1, Y_2) = \frac{\mathrm{Cov}(Y_1, Y_2)}{\sqrt{\mathrm{Var}(Y_1)\,\mathrm{Var}(Y_2)}} = \frac{\sigma_{12}}{\sigma_1 \sigma_2}.$$
The correlation always lies in the interval $[-1, 1]$.
The correlation can be estimated by Pearson's correlation coefficient
$$r = \frac{\sum_i (y_{i1} - \bar{y}_{\cdot 1})(y_{i2} - \bar{y}_{\cdot 2})}{\sqrt{\sum_i (y_{i1} - \bar{y}_{\cdot 1})^2 \cdot \sum_i (y_{i2} - \bar{y}_{\cdot 2})^2}}.$$
Example 1.2 The correlation between the height measurements is
with(fatherson, cor(father, son))
[1] 0.501338 □
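As a cross-check of the formula above, a minimal sketch computing $r$ by hand from the sums (assuming the fatherson data frame loaded above):

# hand computation of Pearson's r for the father-son heights
x <- fatherson$father
y <- fatherson$son
sum((x - mean(x)) * (y - mean(y)))/sqrt(sum((x - mean(x))^2) *
    sum((y - mean(y))^2))    # should agree with cor(father, son)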
1.2 Repeated measurements - sub-sampling
Example 1.3 In an experiment the mean calcium content of the leaves of a
plant was to be determined. A random sample of four leaves was obtained, and
for each leaf four independent chemical analyses were made. The measurements
are given in Fig. 2 (Rcode 14.1).
[Dot plot omitted; x-axis: calcium (2.8–3.4), y-axis: leaf (1–4).]
Figure 2: Calcium content of leaves.
Interest lies in the following questions:
1. The typical calcium content of a leaf.
2. The variation of the calcium content between leaves (due, for example, to
different leaf sizes) (the leaf-to-leaf or between-leaf variability).
3. The measurement uncertainty of the chemical analysis (the within-leaf
variability).
There is an obvious leaf-to-leaf variability (the measurements of leaf 3 are
lower than all the others). The within-leaf variability seems to be equal
from leaf to leaf. □
The observations on the same leaf are correlated. An indication of this is given
by the empirical correlation between the observations.
The data are given in 'long' format, where each observation is given in one
row.
data(leafcalcium)
head(leafcalcium)
leaf nr ca
1 1 1 3.28
2 1 2 3.09
3 1 3 3.03
4 1 4 3.03
5 2 1 3.52
6 2 2 3.48
We collect the measurements for one leaf in one row by reshaping the data:
w <- reshape(leafcalcium, direction = "wide", v.name = "ca",
timevar = "nr", idvar = "leaf")
leaf ca.1 ca.2 ca.3 ca.4
1 1 3.28 3.09 3.03 3.03
5 2 3.52 3.48 3.38 3.38
9 3 2.88 2.80 2.81 2.76
13 4 3.34 3.38 3.23 3.26
The following table shows the correlation between the four measurements. It is
calculated as the Pearson correlation between the first through fourth
measurements. Note that any other ordering of the measurements per leaf would
be equally possible, so there is some arbitrariness in this calculation of the
correlation. The correlation between the measurements on a leaf is a result of
the common random individual level of the leaf for the four measurements per
leaf. We will show this by analyzing a model for these data.
cor(w[, -1])
ca.1 ca.2 ca.3 ca.4
ca.1 1.000000 0.952861 0.959217 0.964239
ca.2 0.952861 1.000000 0.993087 0.998671
ca.3 0.959217 0.993087 1.000000 0.996860
ca.4 0.964239 0.998671 0.996860 1.000000
We formulate a model for the data that will reflect the sources of variation.
We make the following definitions
• $y_{ij}$ denotes measurement $j = 1, \ldots, 4$ for leaf $i = 1, \ldots, 4$.
• $\beta$ represents the typical value of the calcium content.
• $L_i \sim N(0, \sigma_L^2)$ is the random deviation of a specific leaf from that typical
value. The variance $\sigma_L^2$ describes the leaf-to-leaf variation.
• $\varepsilon_{ij} \sim N(0, \sigma^2)$ denotes the chemical measurement error, and the variance
$\sigma^2$ the within-leaf variation (also called measurement variation or residual
error variation).
The observations $y_{ij}$ are assumed to follow the model
$$y_{ij} = \beta + L_i + \varepsilon_{ij}. \quad (1)$$
This model contains one fixed effect, the mean $\beta$, the random effect $L_i$,
and the residual measurement error $\varepsilon_{ij}$.
We assume that the $L_i$ are independent, the $\varepsilon_{ij}$ are independent, and
that the $L_i$ are independent of the $\varepsilon_{ij}$.
The common random effect $L_i$ is responsible for the observations on the same
leaf being correlated.
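As an illustration of this point, a small simulation sketch (the variances below are hypothetical, not the estimates for these data) shows how a shared leaf effect induces correlation between repeated measurements:

# simulate model (1): a common leaf effect L_i makes the four
# measurements on a leaf correlated (illustrative variances only)
set.seed(1)
n.leaf <- 10000
L <- rnorm(n.leaf, sd = sqrt(0.07))     # leaf effects, sigma_L^2 = 0.07
y <- sapply(1:4, function(j) 3.17 + L + rnorm(n.leaf, sd = sqrt(0.007)))
round(cor(y), 2)    # off-diagonals near 0.07/(0.07 + 0.007) = 0.91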
1.2.1 Compound symmetry
For this model the covariance between measurements on the same leaf is
given by
$$\mathrm{Cov}(y_{ij}, y_{ik}) = \mathrm{Cov}(\beta + L_i + \varepsilon_{ij},\, \beta + L_i + \varepsilon_{ik}) \quad (2)$$
$$= \mathrm{Cov}(L_i, L_i) + \mathrm{Cov}(L_i, \varepsilon_{ik}) + \mathrm{Cov}(\varepsilon_{ij}, L_i) + \mathrm{Cov}(\varepsilon_{ij}, \varepsilon_{ik}) \quad (3)$$
$$= \mathrm{Cov}(L_i, L_i) = \sigma_L^2 \quad (4)$$
and
$$\mathrm{Var}(y_{ij}) = \mathrm{Cov}(\beta + L_i + \varepsilon_{ij},\, \beta + L_i + \varepsilon_{ij}) \quad (5)$$
$$= \mathrm{Cov}(L_i, L_i) + \mathrm{Cov}(\varepsilon_{ij}, \varepsilon_{ij}) \quad (6)$$
$$= \sigma_L^2 + \sigma^2 \quad (7)$$
If we collect the four observations in the vector $y_i = (y_{i1}, \ldots, y_{i4})$, the
observations are from the multivariate normal distribution with mean vector
$\mu = (\mu_1, \mu_2, \mu_3, \mu_4)$ and covariance matrix
$$\mathrm{Cov}(y_i) = \Sigma = \begin{pmatrix}
\sigma_L^2 + \sigma^2 & \sigma_L^2 & \sigma_L^2 & \sigma_L^2 \\
\sigma_L^2 & \sigma_L^2 + \sigma^2 & \sigma_L^2 & \sigma_L^2 \\
\sigma_L^2 & \sigma_L^2 & \sigma_L^2 + \sigma^2 & \sigma_L^2 \\
\sigma_L^2 & \sigma_L^2 & \sigma_L^2 & \sigma_L^2 + \sigma^2
\end{pmatrix} \quad (8)$$
This is a matrix where all the off-diagonal elements are equal. Such a
covariance structure is called compound symmetry (or exchangeable). If we
collect all the observations into one large vector $y = (y_1, y_2, y_3, y_4)$, the
covariance matrix becomes a block-diagonal matrix, where the off-diagonal
blocks are matrices of zeros, indicating that observations from different
leaves are independent:
$$\mathrm{Cov}(y) = \begin{pmatrix}
\Sigma & 0 & 0 & 0 \\
0 & \Sigma & 0 & 0 \\
0 & 0 & \Sigma & 0 \\
0 & 0 & 0 & \Sigma
\end{pmatrix}. \quad (9)$$
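As a sketch, the compound-symmetry matrix of Eq. (8) and its correlation matrix can be constructed in R for given variance components (the values below are placeholders; estimates for these data appear in Section 4):

# compound-symmetry covariance matrix of Eq. (8) and the implied
# within-leaf correlation sigma_L^2/(sigma_L^2 + sigma^2)
sigma2.L <- 0.072   # between-leaf variance (placeholder value)
sigma2 <- 0.0066    # within-leaf (residual) variance (placeholder value)
Sigma <- matrix(sigma2.L, nrow = 4, ncol = 4) + diag(sigma2, 4)
cov2cor(Sigma)      # equal off-diagonal correlations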
1.3 Random coefficient models
Example 1.4 In Fig. 3 (Rcode 14.2) the development of 48 pigs over 9
successive weeks is shown; measurements on the same pig are connected by a line.
[Line plot omitted; x-axis: week (2–8), y-axis: weight (20–80).]
Figure 3: Growth of 48 pigs.
The interest of the study was the estimation of the typical growth rate of the
population of pigs. Three features are apparent:
• The growth curves appear to be linear in the observed period.
• Pigs which are largest at the beginning of the study are largest throughout.
This effect is known as ’tracking’.
• The variation between pigs in the first week is smaller than in week 9. This
is sometimes called the ’fan’ effect. It may be explained by individually
different growth rates. Even though pigs have similar weights at the start
of the study, pigs with a larger growth rate will end up with higher weights
than those with a low growth rate.
□
In the example there are repeated measurements for each pig over time. That
the observations from the same pig are correlated can be seen by calculating
the empirical correlation between the observations for all 48 pigs (Table 1).
data(pigweight, package = "dataRep")
cor(pigweight)
Table 1: Empirical correlation between the measurements from a pig.
week1 week2 week3 week4 week5 week6 week7 week8 week9
week1 1.00 0.92 0.80 0.80 0.75 0.71 0.66 0.63 0.56
week2 0.92 1.00 0.91 0.91 0.88 0.84 0.78 0.71 0.66
week3 0.80 0.91 1.00 0.96 0.93 0.91 0.84 0.82 0.77
week4 0.80 0.91 0.96 1.00 0.96 0.93 0.87 0.83 0.79
week5 0.75 0.88 0.93 0.96 1.00 0.92 0.85 0.81 0.79
week6 0.71 0.84 0.91 0.93 0.92 1.00 0.96 0.93 0.89
week7 0.66 0.78 0.84 0.87 0.85 0.96 1.00 0.96 0.92
week8 0.63 0.71 0.82 0.83 0.81 0.93 0.96 1.00 0.97
week9 0.56 0.66 0.77 0.79 0.79 0.89 0.92 0.97 1.00
A random effect for pig would account for the individual weight levels of the
pigs and describe to a certain extent the 'tracking' effect.
In addition to this effect, one should account for individual growth rates of the
pigs, which can be mathematically expressed by individual slopes.
We formulate the following model for the weight $y_{ij}$ of pig $i$ in week $t_j$:
$$y_{ij} = \mu + u_i + (\beta + b_i) t_j + \varepsilon_{ij}, \quad i = 1, \ldots, 48, \; j = 1, \ldots, 9 \quad (10)$$
The parameters and their meaning:
• $\mu$: the average weight of all pigs at time $t = 0$.
• $u_i \sim N(0, \sigma_u^2)$ describes the pig-individual random deviation from the
mean weight at time $t = 0$.
• $\beta$ is the mean growth rate across all pigs.
• $b_i \sim N(0, \sigma_b^2)$ are the pig-individual random deviations from the mean
slope $\beta$.
• $\varepsilon_{ij} \sim N(0, \sigma^2)$ is the residual error.
• $\mathrm{Cov}(u_i, b_i) = \sigma_{ub}$.
As before, we assume that both $u_i$ and $b_i$ are independent of the residual
error. But we cannot assume beforehand that $u_i$ and $b_i$ are independent of
each other. This is because a simple shift of the time variable (e.g. adding
some constant value to the week) will change the correlation of these random
effects (for an explanation see Section 13). An assumption of independence of
these two random effects is approximately justified if the x-value (here the
week variable) where there is least variation between the pigs is close to 0.
The correlation of the observations per pig can now be calculated by noting
that
$$\mathrm{Var}(y_{i,t}) = \sigma_u^2 + t^2 \sigma_b^2 + 2t\,\mathrm{Cov}(u_i, b_i) + \sigma^2 \quad (11)$$
$$\mathrm{Cov}(y_{i,1}, y_{i,t}) = \sigma_u^2 + t \sigma_b^2 + (t + 1)\,\mathrm{Cov}(u_i, b_i), \quad t = 2, \ldots, 9 \quad (12)$$
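These formulas can be evaluated directly; a sketch using rounded variance components of the kind obtained for these data in Example 4.3 (the values here are illustrative):

# model-based Corr(y_{i,1}, y_{i,t}) from Eqs. (11)-(12)
s2u <- 6.99; s2b <- 0.38; sub <- -0.104; s2 <- 1.60   # illustrative values
t <- 2:9
v1 <- s2u + 1^2 * s2b + 2 * 1 * sub + s2    # Var(y_{i,1}), Eq. (11) with t = 1
vt <- s2u + t^2 * s2b + 2 * t * sub + s2    # Var(y_{i,t})
c1t <- s2u + t * s2b + (t + 1) * sub        # Cov(y_{i,1}, y_{i,t}), Eq. (12)
round(c1t/sqrt(v1 * vt), 2)                 # decreasing in t, cf. Fig. 4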
In Fig. 4 the empirical correlations already given in the first line of Table 1
are compared to the model-based correlations (for the R code see Section 14.3).
It is apparent that, in contrast to compound symmetry, the correlation
decreases over time.
[Plot omitted; x-axis: week (2–8), y-axis: correlation (0.5–1.0); legend: empirical correlation, model based correlation.]
Figure 4: Empirical and model-based correlations $\mathrm{Corr}(y_{i1}, y_{i,\mathrm{week}})$ for measurements on the same pig $i$.
Example 1.5 Another example of a random coefficient model is given by an
experiment on the influence of two substances, thiouracil and thyroxin, on
the growth of rats in comparison to a control group.
Data preparation
data(ratsbodyweight, package = "dataRep")
rat <- reshape(ratsbodyweight, direction = "long", varying = list(paste("week",
c(0:4), sep = "")), v.name = "weight", idvar = "ratid",
timevar = "week", time = c(0:4))
rat <- transform(rat, treat = factor(treat))
Plot of the data
library(lattice)
print(xyplot(weight ~ week | treat, groups = ratid,
data = rat, type = "l"))
[Panel plot omitted; x-axis: week (0–4), y-axis: weight (50–150); panels: control, thiouracil, thyroxin.]
Figure 5: Growth of rats.
The main question in this experiment is whether the growth rate (= slope of
growth) is related to the treatment.
A model for the data assuming linear growth is
$$y_{ij} = \mu + u_i + \alpha_{\mathrm{treat}(i)} + \beta\,\mathrm{week}_j + \beta_{\mathrm{treat}(i)}\,\mathrm{week}_j + b_i\,\mathrm{week}_j + \varepsilon_{ij} \quad (13)$$
where $i$ indexes the rat, $j$ the week, and treat = control, thiouracil, thyroxin.
The random effects are the $u_i$, which add a random intercept to the
observations of each animal, and the $b_i$, which add a random slope for each rat.
□
2 Repeated measurements and random coefficient models
A linear model where some of the parameters are normally distributed random
variables is called a random coefficients model. It can be generally formulated
as
$$y_{ij} = \sum_{k=1}^{p} \beta_k x_{ijk} + \sum_{l=1}^{q} b_{il} z_{ijl} + \varepsilon_{ij} \quad (14)$$
Here $y_{ij}$ is the normally distributed response for subject (or individual) $i$
at time $t_j$ (of the repeated measurement on the individual). The covariates
split up into those for fixed and those for random effects:
• $x_{ijk}$ is the value of the $k$-th covariate with the fixed parameter $\beta_k$.
• $z_{ijl}$ is the covariate for the random variable $b_{il}$. The $b_i = (b_{i1}, \ldots, b_{iq})$
are assumed to be normally distributed with mean 0 and covariance matrix $\Sigma_b$:
$$b_i \sim N(0, \Sigma_b)$$
The errors $\varepsilon_{ij}$ are independently normally distributed:
$$\varepsilon_{ij} \sim N(0, \sigma^2)$$
In matrix notation the model can be written more concisely as
$$y_i = X_i \beta + Z_i b_i + \varepsilon_i \quad (15)$$
The $n_i$ observations on the $i = 1, \ldots, m$ subjects are correlated because of
the common random variables $b_i$:
$$\mathrm{Cov}(y_i) = Z_i \Sigma_b Z_i^\top + \sigma^2 I,$$
where $I$ is the $n_i \times n_i$ identity matrix.
A fundamental assumption in this model is that observations from different
subjects $i$ and $i'$ are independent, i.e.
$$\mathrm{Cov}(y_{ij}, y_{i'j'}) = 0 \quad \text{for } i \neq i'.$$
Example 2.1 The leaf example is an instance of a random-component model,
with the leaf as the random component. For these models the $z_{ijl}$ simplify to
$$z_{ijl} = z_{il} = 1.$$
In the leaf example, $i = 1, \ldots, 4$ (= number of leaves) and $j = 1, \ldots, 4$
(= number of chemical determinations); $Z_i$ is a vector with as many 1's as the
number of chemical determinations:
$$Z_i = (1, 1, 1, 1)^\top \quad (16)$$
□
Example 2.2 In the growth-of-rats example we have a random coefficient
model where the matrix $Z_i$ has two columns: the first represents the random
intercept and the second the week:
$$Z_i = \begin{pmatrix} 1 & 1 \\ 1 & 2 \\ 1 & 3 \\ 1 & 4 \end{pmatrix} \quad (17)$$
□
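The covariance formula $\mathrm{Cov}(y_i) = Z_i \Sigma_b Z_i^\top + \sigma^2 I$ from above can be checked numerically for such a $Z_i$; a sketch with illustrative variance components (not the fitted values for these data):

# Cov(y_i) for a random intercept and slope, with Z_i from Eq. (17)
Z <- cbind(1, 1:4)                                   # intercept and week columns
Sigma.b <- matrix(c(6, -0.1, -0.1, 0.4), nrow = 2)   # Cov of (u_i, b_i), illustrative
sigma2 <- 1.6                                        # residual variance, illustrative
Z %*% Sigma.b %*% t(Z) + sigma2 * diag(4)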
3 Two stage model formulation
A two stage formulation, where one first describes the distribution of yi given
the random parameters and then the distribution of the random parameters bi
is convenient for generalizations to models with other distributions for yi than
the normal distribution.
• 1st stage:
$$y_i \mid b_i \sim N(\mu_i, \sigma^2 I_{n_i \times n_i}), \quad (18)$$
$$\mu_i = X_i \beta + Z_i b_i. \quad (19)$$
• 2nd stage:
$$b_i \sim N(0, \Sigma). \quad (20)$$
Additionally, one assumes that the vectors $b_i$ are independent. The residuals
$\varepsilon_i$ have not disappeared: they are now included in the description of the
distribution of $y_i \mid b_i$ in Eq. (18). That the $\varepsilon_{ij}$ are independent is
expressed by using a scaled identity matrix as the covariance matrix of $y_i \mid b_i$.
4 Model fitting and estimation
The fitting of a model and the estimation of the model parameters is based on
the log-likelihood.
One drawback of the maximum-likelihood estimates of variance components is
that they are biased. A common example is the maximum likelihood estimate
of the variance of $n$ independent observations $y_i$,
$$\hat{\sigma}^2_{ML} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \bar{y})^2 \quad (21)$$
which has the divisor $n$. The commonly used estimator
$$\hat{\sigma}^2_{REML} = \frac{1}{n-1} \sum_{i=1}^{n} (y_i - \bar{y})^2 \quad (22)$$
is unbiased, i.e. $E(\hat{\sigma}^2_{REML}) = \sigma^2$.
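For concreteness, the two divisors can be compared numerically; a small sketch treating the calcium measurements simply as one numeric sample:

# divisor n (ML-type, biased) versus divisor n - 1 (unbiased)
y <- leafcalcium$ca
n <- length(y)
sum((y - mean(y))^2)/n    # ML-type estimate
var(y)                    # uses divisor n - 1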
A general procedure to obtain unbiased estimates of the variance components
is the restricted maximum likelihood (REML) method. This method proceeds
in three steps:
1. Transform the original observations such that the new data do not depend
on the fixed effects.
2. Maximize the likelihood in the variance components for these new data.
3. Get the fixed-effects parameters from the original likelihood by plugging
in the variance-component estimates.
The main argument in favor of REML estimation is that for balanced designs
its results are the same as those obtained from ANOVA, an alternative
classical estimation method for variance components. The unbiasedness aspect
is not so important, because even though an estimator may be unbiased for
$\sigma^2$, it will not be unbiased for the standard deviation $\sigma$.
Example 4.1 One function for fitting a linear mixed model is lmer from
the package lme4 on CRAN.
Model (1) for the leaf data is fitted by
library(lme4)
m.leaf <- lmer(ca ~ 1 + (1 | leaf), data = leafcalcium)
Linear mixed-effects model fit by REML
Formula: ca ~ 1 + (1 | leaf)
Data: leafcalcium
AIC BIC logLik MLdeviance REMLdeviance
-14.6 -13 9.28 -20.7 -18.6
Random effects:
Groups Name Variance Std.Dev.
leaf (Intercept) 0.072379 0.26903
Residual 0.006602 0.08125
number of obs: 16, groups: leaf, 4
Fixed effects:
Estimate Std. Error t value
(Intercept) 3.166 0.136 23.3
In the table below Fixed effects: the estimates for the fixed effects
parameters are given, similar to the table from the glm-function. You obtain
the coefficients with the extractor function fixef
fixef(m.leaf)
(Intercept)
3.16563
In the table below Random effects: the estimates of the variance
parameters are given. Here leaf refers to $\sigma_L^2$, the between-leaf variance,
and Residual to $\sigma^2$. Note that the values in the column Std.Dev.
(e.g. 0.269) are just the square roots of the estimates of the variance
parameters (e.g. 0.072) and not their standard errors.
The estimates of the variance parameters can be obtained by
VarCorr(m.leaf)
$leaf
1 x 1 Matrix of class "dpoMatrix"
(Intercept)
(Intercept) 0.0723791
attr(,"sc")
scale
0.0812534
where the sc attribute is the square root of the estimate of the residual
variance $\sigma^2$. □
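From these two estimates one can compute the within-leaf correlation implied by the compound-symmetry structure of Eq. (8); a sketch, assuming the VarCorr structure printed above:

# implied intra-leaf correlation sigma_L^2/(sigma_L^2 + sigma^2)
vc <- VarCorr(m.leaf)
sigma2.L <- as.numeric(vc$leaf[1, 1])
sigma2 <- as.numeric(attr(vc, "sc"))^2
sigma2.L/(sigma2.L + sigma2)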
None of the above tables provide a confidence interval for the parameters or a
test of the hypothesis that the parameter is equal to zero.
This was a deliberate decision of the package maintainer D. Bates, because in
more complicated random effects models the distribution of the corresponding
test statistics has not been clarified. One strategy is to assume that the
parameter estimates are approximately t-distributed with the number of
degrees of freedom equal to the residual degrees of freedom. For some balanced
designs this number of degrees of freedom is too large, and one obtains a too
liberal test (e.g. too small p-values). A different strategy is based on the
Bayesian approach in combination with simulation. For the fixed-effects
parameters and the variance parameters one assumes 'uninformative' priors.
Based on the data one can then simulate parameter values from the posterior
distribution of the parameters via a general method known as Markov chain
Monte Carlo (MCMC). This posterior distribution can be used to compute
quantities that are similar to confidence intervals and hypothesis tests.
Conceptually, these intervals and tests are different from those based on the
likelihood, but in many cases they agree closely, and in the present context
they may give a better picture of the uncertainty.
Confidence intervals and simple hypothesis tests are available from our
function coeftable.lmer in the package glmfun.
coeftable.lmer(m.leaf)
Estimate StdErr Wald95lower Wald95upper Pr(>|t|)
(Intercept) 3.1656 0.136 3.05305 3.2782 0
Example 4.2 For the rat-growth data, model (13) is fitted by:
M.rat <- lmer(weight ~ treat + week + treat:week + (1 +
week | ratid), data = rat)
library(glmfun)
coeftable.lmer(M.rat, nsim = 10000)
$fixed
Estimate MCMCmean Wald95lower Wald95upper
(Intercept) 52.8800 52.8909 51.136010 54.62399
treatthiouracil 4.8200 4.7989 2.353625 7.28637
treatthyroxin -0.7943 -0.7835 -3.512098 1.92353
week 26.4800 26.4834 25.429785 27.53022
treatthiouracil:week -9.4300 -9.4244 -10.915228 -7.94477
treatthyroxin:week 0.6629 0.6362 -0.973785 2.29950
HPD95lower HPD95upper Pr(>|t|) pMCMC
(Intercept) 48.4996 57.4297 0.0000 0.0001
treatthiouracil -1.4708 10.9219 0.1054 0.1218
treatthyroxin -7.8713 5.9849 0.8078 0.8236
week 23.8513 29.2751 0.0000 0.0001
treatthiouracil:week -13.2650 -5.5798 0.0000 0.0002
treatthyroxin:week -3.3931 4.9217 0.7360 0.7646
$random
MCMCmean HPD95lower HPD95upper
sigma 4.3452879 3.689706 5.075945
ratd.(In) 5.9515461 3.885008 9.244061
ratd.week 3.9753688 2.849381 5.613429
rtd.(I).wek -0.0988601 -0.562515 0.430946
In the $random component the row rtd.(I).wek displays the estimate of the
correlation between the random intercepts $u_i$ and the random slopes $b_i$
(see Eq. 13). The confidence intervals from the MCMC sample are a bit wider
than those based on the t-distribution. One reason is that the MCMC-based
intervals better reflect the uncertainty induced by estimating the variance
parameters.
The default estimation procedure is REML. Estimates using the maximum
likelihood method are obtained by setting the method='ML' argument:
M.rat.ML <- lmer(weight ~ treat + week + treat:week +
(1 + week | ratid), data = rat, method = "ML")
reml <- coeftable.lmer(M.rat)[, c(1, 2)]
ml <- coeftable.lmer(M.rat.ML)[, c(1, 2)]
tab <- cbind(reml, ml)
colnames(tab) <- paste(colnames(tab), rep(c("REML",
"ML"), each = 2), sep = "")
EstimateREML StdErrREML EstimateML StdErrML
(Intercept) 52.8800 2.0903 52.8800 1.9710
treatthiouracil 4.8200 2.9562 4.8200 2.7874
treatthyroxin -0.7943 3.2576 -0.7943 3.0715
week 26.4800 1.2588 26.4800 1.1867
treatthiouracil:week -9.4300 1.7802 -9.4300 1.6783
treatthyroxin:week 0.6629 1.9617 0.6629 1.8494
The parameter estimates are the same, but the ML-based standard errors are a
bit smaller than the REML-based ones (cf. Eqs. (21) and (22)). □
Example 4.3 The fit of the model for the pig weights is obtained as follows
(the data first have to be reshaped into 'long' format such that the weight
measurements form one column):
library(lme4)
pigweight <- get(load("data/pigweight.Rdata"))
pigweightL <- reshape(pigweight, direction = "long",
varying = list(colnames(pigweight)), v.name = "weight",
timevar = "week", idvar = "pigid")
pigweightL <- with(pigweightL, pigweightL[order(pigid,
week), ])
M.pig <- lmer(weight ~ 1 + week + (1 + week | pigid),
data = pigweightL)
and the table for the fixed effects is given by
coeftable.lmer(M.pig)[, 1:2]
Estimate StdErr
(Intercept) 19.3556 0.4039
week 6.2099 0.0920
The estimates for the variances are given by
VarCorr(M.pig)["pigid"]
$pigid
2 x 2 Matrix of class "dpoMatrix"
(Intercept) week
(Intercept) 6.989555 -0.103670
week -0.103670 0.379924
where the estimate of $\sigma_u^2$ is 6.99, that of $\sigma_b^2$ is 0.38, and their
covariance $\hat{\sigma}_{ub} = -0.104$. The residual variance $\sigma^2$ is obtained as
attr(VarCorr(M.pig), "sc")^2
scale
1.59679
□
5 Model comparison
The comparison of models may be performed by likelihood-ratio tests.
Twice the difference of the log-likelihoods of two nested models $M_0 \subset M_1$
is asymptotically $\chi^2$ distributed with $p_1 - p_0$ degrees of freedom, where
$p_1$ and $p_0$ are the numbers of model parameters. Here one also has to count
the variance-covariance parameters of the model.
The quality of the χ2 approximation of the distribution of the log likelihood
ratio statistic is different for tests of the fixed and the random components.
Example 5.1 (Fixed effect) In the rat growth example a reduced model of
Eq. (13) would be the model with no treatment effect, i.e.
$$E(y_{ij}) = \mu + u_i + \beta\,\mathrm{week}_j + b_i\,\mathrm{week}_j \quad (23)$$
The likelihood-ratio test between models (13) and (23) is performed with the
anova function. When comparing two models with different fixed effects, one
must fit the models with the 'ML', not the 'REML', method.
M.rat <- lmer(weight ~ treat + week + treat:week + (1 +
week | ratid), data = rat, method = "ML")
M.rat.reduc <- lmer(weight ~ week + (1 + week | ratid),
data = rat, method = "ML")
anova(M.rat, M.rat.reduc)
Data: rat
Models:
M.rat.reduc: weight ~ week + (1 + week | ratid)
M.rat: weight ~ treat + week + treat:week + (1 + week | ratid)
Df AIC BIC logLik Chisq Chi Df Pr(>Chisq)
M.rat.reduc 5 930.5 945.0 -460.2
M.rat 9 912.9 939.1 -447.5 25.54 4 3.92e-05 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
□
Example 5.2 (Random effect: removing a covariance) In the rat-growth
example the model for the random effects per rat $i$ was
$$(u_i, b_i) \sim N(0, \Sigma_{2,2})$$
with
$$\Sigma_{2,2} = \begin{pmatrix} \sigma_u^2 & \sigma_{u,b} \\ \sigma_{u,b} & \sigma_b^2 \end{pmatrix}$$
In the analysis (see Example 4.2) we saw that the confidence interval for the
covariance $\sigma_{ub}$ contained 0, and one may consider the simpler model of no
correlation,
$$H_0: \sigma_{ub} = 0.$$
We fit both models and compare them with a likelihood-ratio test:
M.rat <- lmer(weight ~ treat + week + treat:week + (1 +
week | ratid), data = rat)
M.rat.simple <- lmer(weight ~ treat + week + treat:week +
(1 | ratid) + (0 + week | ratid), data = rat)
(Note: by writing 1|ratid and 0+week|ratid separately we assume
independence between these two random effects. If the '0' were forgotten, R
would automatically add a '1' and one would obtain the larger model again.
Here one need not use an 'ML' fit, since the fixed effects of the two models
are identical.)
anova(M.rat, M.rat.simple)
Data: rat
Models:
M.rat.simple: weight ~ treat + week + treat:week + (1 | ratid) + (0 + week |
M.rat: ratid)
M.rat.simple: weight ~ treat + week + treat:week + (1 + week | ratid)
Df AIC BIC logLik Chisq Chi Df Pr(>Chisq)
M.rat.simple 8 911.4 934.6 -447.7
M.rat 9 913.2 939.3 -447.6 0.187 1 0.665
The large p-value corroborates our observation that the two random effects
may be treated as independent. □
Example 5.3 (Random effect: removing a variance) Assuming the simpler
model with covariance $\sigma_{ub} = 0$, we now consider the hypothesis
$$H_0: \sigma_b^2 = 0.$$
In contrast to the previous test on a covariance parameter, which may take
negative and positive values, the current test is on a parameter that can take
only non-negative values. Its value under the hypothesis lies on the boundary of the
parameter space. This causes problems for the χ2 approximation of the
distribution of the likelihood-ratio statistic. A recommendation is to adjust the
p-value by dividing it by 2.
M.rat.simple <- lmer(weight ~ treat + week + treat:week +
(1 | ratid) + (0 + week | ratid), data = rat)
M.rat.simple.RED <- lmer(weight ~ treat + week + treat:week +
(1 | ratid), data = rat)
anova(M.rat.simple, M.rat.simple.RED)
Data: rat
Models:
M.rat.simple.RED: weight ~ treat + week + treat:week + (1 | ratid)
M.rat.simple: weight ~ treat + week + treat:week + (1 | ratid) + (0 + week |
M.rat.simple.RED: ratid)
Df AIC BIC logLik Chisq Chi Df Pr(>Chisq)
M.rat.simple.RED 7 977.2 997.5 -481.6
M.rat.simple 8 911.4 934.6 -447.7 67.85 1 <2e-16
M.rat.simple.RED
M.rat.simple ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The p-value is so small that the adjustment has no effect on the
conclusion. □
The following general rules hold for testing in mixed linear models:
1. Testing for reduction in fixed effects:
• The χ2 approximation is liberal, i.e. p-values tend to be too small.
• The model must not be fitted with the REML method, because the
REML likelihood depends on the parameterization of the fixed effects.
2. Random effects:
• The χ2 approximation is conservative, i.e. p-values tend to be too
large.
• In testing that a variance parameter is equal to zero one should divide
the p-value by 2.
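As a sketch of rule 2, the halving can be applied by hand to a likelihood-ratio statistic (here the value 67.85 from Example 5.3):

# p-value of a chi-square LR statistic, halved for the boundary problem
chisq.stat <- 67.85    # LR statistic from Example 5.3
pchisq(chisq.stat, df = 1, lower.tail = FALSE)/2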
6 Prediction and residuals
The random parameters in a linear mixed model are realizations of random
variables and not parameters in the usual sense, like $\beta$ or the elements of
the covariance matrix. Nevertheless it is possible to obtain estimates for
them; one prefers to speak of prediction. The most common method for
calculating these is known as best linear unbiased prediction (BLUP).
Example 6.1 In the rat example there are two sets of random parameters:
• the random intercepts $u_i$,
• the random slopes $b_i$.
Both are available in the list returned by ranef:
pred <- ranef(M.rat)
pred <- pred[[1]]
Both can be plotted (Fig. 6) to check whether the assumption of their
normality is tenable. pred is a list with one element, which is selected by
pred[[1]].
par(mfrow = c(1, 3))
qqnorm(pred[, "(Intercept)"], main = "Random intercept")
qqline(pred[, "(Intercept)"])
qqnorm(pred$week, main = "Random slope")
qqline(pred$week)
qqnorm(residuals(M.rat), main = "Residuals")
qqline(residuals(M.rat))
[Three normal Q-Q plots omitted; panels: Random intercept, Random slope, Residuals; axes: Theoretical Quantiles vs. Sample Quantiles.]
Figure 6: Q-Q plots for the random intercepts and slopes.
The residuals play a similar role to the random parameters: they can be
considered random parameters on the lowest level. They should also be
checked for normality. □
Based on the predictions for the random parameters one can then obtain
predictions for the expected values of the observations themselves.
Example 6.2 In the rat growth example the predictions are obtained by
evaluating Eq. (13) at the values of the covariates and the random coefficients
using the fitted function.
fitted(M.rat)
□
7 Fixed effect vs. random effect
One alternative to fitting a model where the subject effect is random would be
to use it as a fixed factor. This approach has its pros and cons.
Disadvantages of fixed effects models:
• It is not possible to include covariates that are subject specific, because
their variation is contained in the subject-to-subject effect.
• The model does not allow modeling dependencies between measurements
within a subject, and therefore the variance for the comparison of means
between subjects is not properly inflated.
• Many clusters require the fit of a large number of parameters (one for
each cluster). The parameter estimates may become unstable. Often one is
not interested in the specific value for a cluster.
Arguments for a fixed effects approach:
1. With very few clusters, the assumption of normality of the cluster means
is often not tenable.
2. The interest lies more in the specific subjects than in the population of
subjects. One should add that if the experiment contains only a few
clusters, then by the very design of the experiment the experimenter was
obviously not interested in representing the population or the
subject-to-subject variation.
8 Generalized linear mixed models
In the preceding section the observations were assumed to be normally
distributed. We now generalize to observations that come from distributions of
generalized linear models.
8.1 Working example – respiratory illness
Example 8.1 The data are from a clinical trial of patients with respiratory
illness, where 111 patients from two different clinics were randomized to
receive either placebo or an active treatment. Patients were examined at
baseline and at four visits during treatment. At each examination, respiratory
status (categorized as 1 = good, 0 = poor) was determined.
• The recorded variables are:
Center (1,2), ID, Treatment (A=Active, P=Placebo), Gender
(M=Male,F=Female), Age (in years at baseline), Baseline Response.
• The response variables are:
Visit 1 Response, Visit 2 Response, Visit 3 Response, Visit 4 Response.
Data for 8 patients are shown in Table 2.
Table 2: Respiratory data for eight individuals. Measurements on the same
individual tend to be alike.
center id treat sex age baseline visit1 visit2 visit3 visit4
1 1 1 P M 46 0 0 0 0 0
2 1 2 P M 28 0 0 0 0 0
3 1 3 A M 23 1 1 1 1 1
4 1 4 P M 44 1 1 1 1 0
5 1 5 P F 13 1 1 1 1 1
6 1 6 A M 34 0 0 0 0 0
7 1 7 P M 43 0 1 0 1 1
8 1 8 A M 28 0 0 0 0 0
Plotting the mean outcome across the four visits for each patient against age
shows a parabolic trend. (The mean outcome across visits is the proportion of
positive responses of a patient.)
[Plot 'mean outcome vs. age' omitted; x-axis: age (10–70), y-axis: mean.outcome (0.0–1.0); legend: treatment 1, treatment 2.]
Figure 7: Mean outcome across visits against age for each patient.
Interest is in comparing the treatments, but also in including center, age,
gender and baseline response in the model.
From Table 2 it is clear that there is a dependency among the response
measurements on the same person – measurements on the same person tend to
be alike.
This dependency must be accounted for in the modeling. □
Example 8.2 A first approach is to ignore the dependency. This approach is
not appropriate, but it is illustrative.
Let $y_{iv}$ denote the response measured on the $i$th person at visit $v$, where
$v = 1, \ldots, 4$. Since the response outcomes are binary, $y_{iv} \in \{0, 1\}$, it is
tempting to consider the binomial distribution as the basis for the modeling.
That is, to assume that $y_{iv} \sim \mathrm{bin}(1, \mu_{iv})$ and that all $y_{iv}$ are
independent.
As specification of $\mu_{iv}$ we consider in the following the linear predictor
(footnote: we used $(\mathrm{age}/10)^2 = \mathrm{age}^2/100$ to get a parameter estimate in
about the range of the others, for better reporting in a table. If $\beta_7$ is the
parameter for $(\mathrm{age}/10)^2$, then the parameter $\tilde{\beta}_7$ for $\mathrm{age}^2$ would be
$\tilde{\beta}_7 = \beta_7/100$.)
$$\eta_{iv} = \mathrm{logit}(\mu_{iv})$$
$$M_1: \quad \eta_{iv} = \beta_1 + \beta_2\,\mathrm{baseline}_i + \beta_{3;\mathrm{center}(i)} + \beta_{4;\mathrm{sex}(i)} + \beta_{5;\mathrm{treat}(i)} + \beta_6\,\mathrm{age}_i + \beta_7\,(\mathrm{age}_i/10)^2 \quad (24)$$
Note that the expression for $\mathrm{logit}(\mu_{iv})$ does not include the visit $v$. We
will write this briefly as
logit(µ) = baseline + center + sex + treat + age + (age/10)²
data(respiratory, package = "dataRep")
respiratory <- transform(respiratory, age2 = (age/10)^2,
center = factor(center), visit = factor(visit))
M.resp.1 <- glm(outcome ~ baseline + center + sex +
treat + age + age2, data = respiratory, family = binomial)
Table 3 contains the parameter estimates under the model.
Table 3: Parameter estimates when assuming independence
Estimate Std. Error z value Pr(>|z|)
(Intercept) 3.87 0.96 4.0 0.000
baseline 1.89 0.25 7.6 0.000
center2 0.51 0.25 2.1 0.038
sexM −0.45 0.32 −1.4 0.154
treatP −1.32 0.24 −5.4 0.000
age −0.21 0.05 −4.4 0.000
age2/100 0.26 0.06 4.1 0.000
□
9 Correlated Pearson residuals
Based on the fitted (wrong) independence model we can calculate the Pearson
residuals
$$r_{P;iv} = \frac{y_{iv} - \hat{\mu}_{iv}}{\sqrt{\hat{\mu}_{iv}(1 - \hat{\mu}_{iv})}}, \quad i = 1, \ldots, N, \; v = 1, \ldots, 4,$$
which under the model approximately have mean 0 and variance 1.
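A minimal sketch computing these residuals directly from the independence fit M.resp.1 above (they should match residuals(M.resp.1, type = "pearson")):

# Pearson residuals by hand from the fitted probabilities of M.resp.1
mu <- fitted(M.resp.1)      # fitted probabilities mu_iv
y <- respiratory$outcome    # observed binary responses
r.P <- (y - mu)/sqrt(mu * (1 - mu))
all.equal(as.numeric(r.P), as.numeric(residuals(M.resp.1, type = "pearson")))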
From these we can estimate the correlation matrix (Table 4) showing
correlations between measurements at the different visits on the same
individual (Rcode 14.4).
Table 4: Correlation matrix based on Pearson residuals.
visit.1 visit.2 visit.3 visit.4
visit.1 1.000 0.351 0.240 0.297
visit.2 0.351 1.000 0.343 0.277
visit.3 0.240 0.343 1.000 0.362
visit.4 0.297 0.277 0.362 1.000
If the observations were independent, then the true (i.e. theoretical)
correlations would be zero. The correlations in Table 4 are estimates, so even
if the observations were independent, the estimated correlations would not
necessarily be zero – but they should be close to it.
There is a clear indication in Table 4 that the correlations tend to be positive.
The task in the following is to account for dependency between measurements
on the same individual.
9.1 Generalized linear model formulation
Example 9.1 The patients may have individually different response
probabilities. To account for these we will include a patient effect in the
linear predictor. Because the patients were randomly selected, we include the
patient effect as a normally distributed random component.
The classical random part of a generalized linear model is now formulated
conditionally on the random effects:
$$y_{iv} \mid s_i \sim \mathrm{bin}(1, \mu_{iv}) \quad (25)$$
and the linear predictor in the systematic part is written
$$G_1: \quad \eta_{iv} = \beta_1 + \beta_2\,\mathrm{baseline}_i + \beta_{3;\mathrm{center}(i)} + \beta_{4;\mathrm{sex}(i)} + \beta_{5;\mathrm{treat}(i)} + \beta_6\,\mathrm{age}_i + \beta_7\,(\mathrm{age}_i/10)^2 + s_{i;\mathrm{patient}} \quad (26)$$
One has to add the assumptions on the distribution of the random effects:
$$s_i \sim N(0, \sigma_s^2), \quad (28)$$
$$s_i \ \text{are independently distributed.} \quad (29)$$
□
The general formulation of a generalized linear mixed model (GLMM),
following Section 3, is
$$y_{ij} \mid b_i \sim F(\mu_{ij}), \quad \text{independent for different } j, \quad (30)$$
$$\eta_i = X_i \beta + Z_i b_i, \quad (31)$$
$$b_i \sim N(0, \Sigma), \quad \text{independent for different } i, \quad (32)$$
where $F$ is a distribution from the exponential family and the expectation
$\mu_{ij}$ is linked to the predictor via the link function $h(\mu_{ij}) = \eta_{ij}$.
One therefore has, in addition to the random part in Eq. (30), a second
random specification in Eq. (32).
9.2 Model fit and estimation
The model is fitted by maximizing the likelihood. For generalized linear mixed
models there is no longer a closed-form representation of the likelihood, as
there is for the linear model with normally distributed observations. The
calculation involves numerical integration, which may be difficult for large
data sets with many random effects. The REML approach is not available.
Example 9.2 The lmer function is used for the model fitting, specifying the
distribution of the response via the family argument. It should be noted
that the id variable does not uniquely identify a patient, as it starts from 1
within each center. Therefore, we define a variable patid which uniquely
identifies a patient.
respiratory <- transform(respiratory, age2 = (age/10)^2,
patid = interaction(center, id))
G.resp.1 <- lmer(outcome ~ baseline + center + sex +
treat + age + age2 + (1 | patid), data = respiratory,
family = binomial)
lmer fits this model using the numerical approximation called 'Laplace'. A
less accurate but sometimes numerically more stable alternative is the 'PQL'
method.
The table of coefficients with (optional) MCMC-based confidence intervals is
obtained by
tab <- coeftable.lmer(G.resp.1, nsim = 0)
Estimate StdErr Wald95lower Wald95upper Pr(>|z|)
(Intercept) 5.5358 1.8689 1.872746 9.198915 0.0031
baseline 2.8603 0.4961 1.888026 3.832553 0.0000
center2 0.7615 0.4997 -0.217882 1.740785 0.1275
sexM -0.6081 0.6466 -1.875347 0.659209 0.3470
treatP -2.0117 0.4874 -2.966924 -1.056457 0.0000
age -0.3073 0.0946 -0.492802 -0.121789 0.0012
age2 0.3892 0.1301 0.134164 0.644330 0.0028
The estimate of the variance component $\sigma_s^2$ describing the
patient-to-patient variability is obtained by
vv <- VarCorr(G.resp.1)
$patid
1 x 1 Matrix of class "dpoMatrix"
(Intercept)
(Intercept) 3.18672
attr(,"sc")
scale
1
The list element vv$patid is the estimate of $\sigma_s^2$ and the attribute
attr(vv, "sc") is the scale or dispersion parameter. □
10 Model comparison
Model comparison is done via the likelihood-ratio test, assuming that twice
the difference in log-likelihoods is asymptotically $\chi^2$ distributed.
Example 10.1 We consider the model where the age effect has been removed:
$$G_0: \quad \eta_{iv} = \beta_1 + \beta_2\,\mathrm{baseline}_i + \beta_{3;\mathrm{center}(i)} + \beta_{4;\mathrm{sex}(i)} + \beta_{5;\mathrm{treat}(i)} + s_{i;\mathrm{patient}} \quad (33)$$
This model is compared to model G1 from Eq. (26) via a LR-test:
G.resp.1 <- lmer(outcome ~ baseline + center + sex +
treat + age + age2 + (1 | patid), data = respiratory,
family = binomial)
G.resp.0 <- lmer(outcome ~ baseline + center + sex +
treat + (1 | patid), data = respiratory, family = binomial)
anova(G.resp.1, G.resp.0)
Data: respiratory
Models:
G.resp.0: outcome ~ baseline + center + sex + treat + (1 | patid)
G.resp.1: outcome ~ baseline + center + sex + treat + age + age2 + (1 |
G.resp.0: patid)
Df AIC BIC logLik Chisq Chi Df Pr(>Chisq)
G.resp.0 6 449.0 473.6 -218.5
G.resp.1 8 440.9 473.7 -212.4 12.09 2 0.00237 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
□
11 Prediction and residuals
Example 11.1 The fitted values on the link-scale are available with the
fitted function.
fit <- fitted(G.resp.1)
Residuals for GLMM models are not yet implemented in the lmer function.
Pearson residuals can be obtained via the residu.lmer function of our
package glmfun.
library(glmfun)
res.p <- residu.lmer(G.resp.1)
Finally, the BLUPs of the random effects are available via
raf <- ranef(G.resp.1)
and should be checked for normality (see Fig. 8)
rand.patient <- raf[[1]][, "(Intercept)"]
qqnorm(rand.patient)
qqline(rand.patient)
[Normal Q-Q plot omitted; axes: Theoretical Quantiles vs. Sample Quantiles.]
Figure 8: Q-Q plot for the random intercepts.
□
12 Covariances and correlation*
12.1 Rules for computing covariances
Let $X, Y, V$ and $W$ be random variables and $a$ and $b$ constants.
Then the following rules hold:
• The covariance between a constant and a random variable is zero:
$\mathrm{Cov}(a, X) = 0$.
• Constants factor out of the covariance:
$\mathrm{Cov}(aX, bY) = a \cdot b\,\mathrm{Cov}(X, Y)$.
• The covariance between sums of random variables is the sum of the
pairwise covariances:
$\mathrm{Cov}(X+Y, V+W) = \mathrm{Cov}(X, V) + \mathrm{Cov}(X, W) + \mathrm{Cov}(Y, V) + \mathrm{Cov}(Y, W)$.
From these rules one may derive the specific rules for the variance.
• The covariance of a variable with itself is the variance:
$\mathrm{Cov}(X, X) = \mathrm{Var}(X)$.
• The variance of a linearly transformed variable:
$\mathrm{Var}(a + bX) = b^2\,\mathrm{Var}(X)$.
The correlation between two random variables is defined as
$$\mathrm{Corr}(X, Y) = \frac{\mathrm{Cov}(X, Y)}{\sqrt{\mathrm{Var}(X)}\sqrt{\mathrm{Var}(Y)}} \quad (35)$$
One has the special result that a linear transformation of a variable from $X$
to $a + bX$ (with $b > 0$) does not change the correlation:
$$\mathrm{Corr}(X, Y) = \mathrm{Corr}(a + bX, Y). \quad (36)$$
In particular, multiplying a variable by a positive constant does not change
the correlation. Therefore, measuring a variable in meters or in centimeters
yields the same correlation.
12.2 Covariances of random vectors
12.2.1 The covariance matrix
We consider the covariances between the three random variables $X, Y, V$,
which we collect in the random vector $m = (X, Y, V)$. One defines the
covariance of the vector $m$ as the matrix $\Sigma$ of all the pairwise covariances:
$$\mathrm{Cov}(m) = \Sigma = \begin{pmatrix}
\mathrm{Var}(X) & \mathrm{Cov}(X, Y) & \mathrm{Cov}(X, V) \\
\mathrm{Cov}(Y, X) & \mathrm{Var}(Y) & \mathrm{Cov}(Y, V) \\
\mathrm{Cov}(V, X) & \mathrm{Cov}(V, Y) & \mathrm{Var}(V)
\end{pmatrix} \quad (37)$$
Because covariances are symmetric, $\mathrm{Cov}(X, V) = \mathrm{Cov}(V, X)$, the
covariance matrix is a symmetric matrix.
If you transform a vector $m$ by some matrix $B$, then the corresponding
covariance matrix is given as
$$\mathrm{Cov}(Bm) = B \Sigma B^\top \quad (38)$$
where $B^\top$ denotes the transposed matrix.
12.2.2 From the covariance to the correlation matrix
You compute the correlation matrix from the covariance matrix by first
collecting the square roots of the variances (the diagonal elements of $\Sigma$)
in a diagonal matrix
$$D = \sqrt{\mathrm{diag}(\Sigma)} = \begin{pmatrix}
\sqrt{\mathrm{Var}(X)} & 0 & 0 \\
0 & \sqrt{\mathrm{Var}(Y)} & 0 \\
0 & 0 & \sqrt{\mathrm{Var}(V)}
\end{pmatrix} \quad (39)$$
and then calculating
$$\mathrm{Corr}(m) = D^{-1} \Sigma D^{-1} \quad (40)$$
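A small sketch of Eq. (40) in R, with an arbitrary covariance matrix; base R's cov2cor gives the same result:

# correlation matrix from a covariance matrix, Eq. (40)
Sigma <- matrix(c(4, 2, 1,
                  2, 9, 3,
                  1, 3, 16), nrow = 3)   # an arbitrary covariance matrix
D.inv <- diag(1/sqrt(diag(Sigma)))
D.inv %*% Sigma %*% D.inv
cov2cor(Sigma)                           # same result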
13 Random coefficients and the positioning of the random intercept*
This section explains why, in a random coefficient model with a random
intercept and random slopes, a simple shift of the independent variable can
substantially change the correlation between the intercept and the slope, and
why it is therefore recommended to model a correlation between both.
We assume the following simple random coefficient model
$$y_{it} = \mu + a_i + b_i x_t + \varepsilon_{it} \quad (41)$$
where $\mu$ is the mean value, $a_i \sim N(0, \sigma_a^2)$ is the random intercept,
$b_i \sim N(0, \sigma_b^2)$ the random slope, and $\varepsilon_{it}$ the residual errors with
variance $\sigma^2$. We allow for the time being that $a_i$ and $b_i$ are correlated,
with covariance $\mathrm{Cov}(a_i, b_i) \neq 0$.
We now make a simple shift of the x-values by an amount $\Delta$ and define the
new regressor variable $z_t$ as
$$z_t = x_t + \Delta.$$
The model equation (41) then changes to
$$y_{it} = \mu + (a_i - b_i \Delta) + b_i(x_t + \Delta) + \varepsilon_{it} = \mu + \tilde{a}_i + b_i z_t + \varepsilon_{it} \quad (42)$$
where the new intercept $\tilde{a}_i$ is
$$\tilde{a}_i = a_i - b_i \Delta.$$
We can now compute the variance of $\tilde{a}_i$ and its covariance with $b_i$:
$$\mathrm{Var}(\tilde{a}_i) = \mathrm{Var}(a_i) + \Delta^2\,\mathrm{Var}(b_i) - 2\Delta\,\mathrm{Cov}(a_i, b_i) \quad (43)$$
$$\mathrm{Cov}(\tilde{a}_i, b_i) = \mathrm{Cov}(a_i, b_i) - \Delta\,\mathrm{Var}(b_i) \quad (44)$$
If we choose
$$\Delta_0 = \frac{\mathrm{Cov}(a_i, b_i)}{\mathrm{Var}(b_i)}$$
then it is seen from equation (44) that the covariance between $\tilde{a}_i$ and
$b_i$ is zero. (This same $\Delta_0$ also minimizes the variance of $\tilde{a}_i$, as
seen from equation (43) by taking the derivative of the right-hand side with
respect to $\Delta$ and setting it to zero.)
We can therefore conclude that by a simple shift of the original $x_t$ by
$\Delta_0$ we can obtain uncorrelated random coefficients.
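A small simulation sketch of this decorrelating shift (the random coefficients below are hypothetical):

# shifting the intercept by Delta_0 = Cov(a, b)/Var(b) removes the correlation
set.seed(2)
b <- rnorm(10000, sd = 2)
a <- 0.5 * b + rnorm(10000)    # correlated intercepts and slopes
Delta0 <- cov(a, b)/var(b)
cov(a - b * Delta0, b)         # approximately zero, cf. Eq. (44)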
It is difficult to know this $\Delta_0$ beforehand. Therefore, it is safest to fit a
parameter describing the correlation between the random intercept and the
random slope when one fits a random coefficient model.
14 Additional R code
14.1 Plot of Fig. 2
data(leafcalcium, package = "dataRep")
library(lattice)
dotplot(leaf ~ jitter(ca), groups = nr, data = leafcalcium,
xlab = "calcium")
trellis.device(file = paste("fig/leafcalcium-1.pdf",
sep = ""), device = pdf, width = 8, height = 8/2)
data(leafcalcium, package = "dataRep")
library(lattice)
dotplot(leaf ~ jitter(ca), groups = nr, data = leafcalcium,
xlab = "calcium")
dev.off()
14.2 Plot of Fig. 3
In the original data set the 9 observations per pig constitute one row. The
data are reshaped so that the weight measurements are contained in one column
called 'weight', and new variables are created to identify the pig and the
week.
pigweight <- get(load("data/pigweight.Rdata"))
pigweightL <- reshape(pigweight, direction = "long",
varying = list(colnames(pigweight)), v.name = "weight",
timevar = "week", idvar = "pigid")
pigweightL <- with(pigweightL, pigweightL[order(pigid,
week), ])
print(xyplot(weight ~ week, groups = pigid, data = pigweightL,
type = "l", col = 1))
14.3 Plot of Fig. 4
W <- as.matrix(VarCorr(M.pig)["pigid"][[1]])
residual.sigma2 <- attr(VarCorr(M.pig), "sc")^2
Zt <- as.matrix(M.pig@Zt[1:2, 1:9])
V <- t(Zt) %*% W %*% Zt + residual.sigma2 * diag(9)
d <- diag(1/sqrt(diag(V)))
Cmodel <- d %*% V %*% d
Cempirical <- cor(pigweight)
pdf(file = paste("fig/pigweightCorrelation.pdf", sep = ""),
width = 7, height = 7/1.2, paper = "special")
plot(Cempirical[1, ], type = "b", pch = 1, ylab = "correlation",
xlab = "week", ylim = c(0.5, 1))
lines(Cmodel[1, ], type = "b", lty = 2, pch = 16)
legend(5, 0.9, legend = c("empirical correlation", "model based correlation"),
lty = c(1, 2), pch = c(1, 16))
dev.off()
14.4 Correlation matrix from Pearson residuals for Table 4
To calculate the correlations from the Pearson residuals of model (24) we
first build a data frame containing the residuals and the variables center, id
and visit. The combination of the center and id variables uniquely identifies
a patient. The data are reshaped so that the residuals for one patient from
the successive visits form one row.
res <- residuals(M.resp.1, type = "pearson")
dummy <- data.frame(res = res, respiratory[, c("center",
"id", "visit")])
dummyL <- reshape(dummy, direction = "wide", v.names = "res",
idvar = c("center", "id"), timevar = "visit")
COR <- cor(dummyL[, c(paste("res.", 1:4, sep = ""))])
colnames(COR) <- paste("visit.", 1:4, sep = "")
rownames(COR) <- paste("visit.", 1:4, sep = "")