The linear mixed model: introduction and the basic model · 2014. 8. 19. · dard linear model...

39
Department of Data Analysis Ghent University The linear mixed model: introduction and the basic model Yves Rosseel Department of Data Analysis Ghent University Summer School – Using R for personality research August 23–28, 2014 Bertinoro, Italy AED The linear mixed model: introduction and the basic model 1 of 39

Transcript of The linear mixed model: introduction and the basic model · 2014. 8. 19. · dard linear model...

  • Department of Data Analysis Ghent University

    The linear mixed model: introduction and thebasic model

    Yves RosseelDepartment of Data Analysis

    Ghent University

    Summer School – Using R for personality researchAugust 23–28, 2014

    Bertinoro, Italy

    AED The linear mixed model: introduction and the basic model 1 of 39

  • Department of Data Analysis Ghent University

    Contents1 Data with non-independent observations 3

    1.1 Non-independency . . . . . . . . . . . . . . . . . . . . . . . . . 31.2 Different types of data with non-independent observations . . . . 41.3 The problem with non-independent observations . . . . . . . . . . 81.4 Ways to deal with non-independent observations . . . . . . . . . . 91.5 The many faces of mixed-effects models . . . . . . . . . . . . . . 111.6 Software packages for mixed-models . . . . . . . . . . . . . . . . 14

    2 The linear mixed model 162.1 The one-way random-effects ANOVA model revisited . . . . . . . 162.2 The structure of the model . . . . . . . . . . . . . . . . . . . . . 232.3 Parameter estimation . . . . . . . . . . . . . . . . . . . . . . . . 342.4 Inference in a linear mixed model . . . . . . . . . . . . . . . . . 37

    AED The linear mixed model: introduction and the basic model 2 of 39

  • Department of Data Analysis Ghent University

    1 Data with non-independent observations

    1.1 Non-independencyHow is it that non-independence occurs? There are generally two ways in whichthis might happen:

    • the explicit way: data is (by design) collected in a way that implies non-independence; for example

    – students are sampled from many schools (from several school districts)– within-subjects designs

    • the implicit way: some unintended factors ‘spoil’ the data:

    – data on alcohol use (and abuse) in a large (random) sample have beencollected by means of interviews; 5 different interviewers (each withhis/her own style) have been involved in the data collection process

    – data on depression are collected in a large (random) sample over aperiod of ten days; the last two days were rainy

    AED The linear mixed model: introduction and the basic model 3 of 39

  • Department of Data Analysis Ghent University

    1.2 Different types of data with non-independent observationsHierachical data

    • sampling takes place at two or more levels, one nested within the other

    • for example: students within schools within school districts (three levels)

    • for example: patients within clinical centers (two levels)

    • sometimes, the nesting is not perfect: students belong two several schools,patients have visited several clinical centers (this is called: partial nesting,or partial crossing)

    • observations within the same ‘level’ are not independent of one-another

    Clustered data

    • some observations ‘naturally’ belong/cluster together

    • the clusters/groups are randomly sampled; the units within a cluster/groupare automatically included

    AED The linear mixed model: introduction and the basic model 4 of 39

  • Department of Data Analysis Ghent University

    • for example: family members

    • for example: teeth in a mouth

    • special case: dyadic data: pairwise data; for example: man and woman fromthe same couple

    • observations belonging to the same cluster/group are not independent

    Matched data

    • 1 set of observations is sampled randomly; one (or several other) set(s) arechosen is such a way that they ‘match’ the original set

    • for example: case-control studies

    • matching can be on one dimension (for example ‘age’) or on several dimen-sions (‘age’, ‘gender’, ‘iq’, . . . )

    • 1-1 matching, 1-N matching, . . .

    AED The linear mixed model: introduction and the basic model 5 of 39

  • Department of Data Analysis Ghent University

    • observations that are ‘matched’ are somehow similar and therefore not inde-pendent

    Longitudinal data

    • longitudinal data are collected when individuals (or units) are followed overtime

    • the evolution over time is the focus of the data

    • for example: annual data on vocabulary growth among children

    • for example: blood pressure of patients measured every week (for a periodof two years)

    • often considered as ‘hierarchical data’: the individual measurements are thelower level, the individuals are the higher level (but the time points are notalways randomly sampled, but fixed)

    • observations from the same ‘individual’ are not independent of one-another

    AED The linear mixed model: introduction and the basic model 6 of 39

  • Department of Data Analysis Ghent University

    Repeated measures

    • repeated measures are collected when individuals (or units) are measuredunder different (experimental) conditions

    • typical for experimental within-subjects designs

    • time is (generally) not important

    • the experimental conditions are (usually) considered as fixed

    • longitudinal data are often called repeated measures too

    • observations from the same ‘individual’ are not independent of one-another

    AED The linear mixed model: introduction and the basic model 7 of 39

  • Department of Data Analysis Ghent University

    1.3 The problem with non-independent observations• technically: the standard linear model insists that error (and hence) observa-

    tions are independent; if not, the model is not valid

    • conceptually:

    – the accuracy of the estimates (and our confidence in them) is capturedin the standard errors of the regression coefficients;

    – the linear model assumes there are N independent pieces of informa-tion when it computes these standard errors (actually, N − p− 1, oncethe degrees of freedom for the coefficients are subtracted);

    – if observations are correlated, however, there really aren’t N indepen-dent pieces of information, and the estimated standard errors will betoo small

    AED The linear mixed model: introduction and the basic model 8 of 39

  • Department of Data Analysis Ghent University

    1.4 Ways to deal with non-independent observations• Corrected or robust standard errors:

    – if the only concern is to obtain (more) accurate standard errors for astandard regression model, there are ways to correct them using con-ventional regression software (e.g., Huber-White corrected standard er-rors, sometimes referred to as a ‘sandwich estimator’)

    – cfr. the sandwich package– cfr. generalized estimating equations (GEE)

    • The fixed-effects approach:

    – since it is conditional independence that is assumed, if predictors in themodel can account for all sources of correlation between observations,then this assumption will be satisfied;

    – for instance, to control for school effects among J schools, a set of (J-1) dummy codes might be entered into the regression model to ‘covaryout’ the mean differences among the schools

    AED The linear mixed model: introduction and the basic model 9 of 39

  • Department of Data Analysis Ghent University

    – each specific school will have its own fixed (constant) regression coef-ficient

    – if school mean differences are the only source of non-independence ofobservations, then these differences will be controlled, leaving condi-tionally independent residuals that satisfy the assumptions of the stan-dard linear model

    • The mixed-effects approach:

    – same as the fixed-effects approach, but we consider ‘school’ as a ran-dom factor

    – mixed-effects models include more than one source of random varia-tion

    AED The linear mixed model: introduction and the basic model 10 of 39

  • Department of Data Analysis Ghent University

    1.5 The many faces of mixed-effects models• mixed-effects models have been developed in a variety of disciplines, with

    varying names and terminology:

    – random-effects models, random-effects ANOVA (statistics, economet-rics)

    – variance components models (statistics)– hierarchical linear models (education)– multi-level models (sociology, education)– contextual-effects models (sociology)– random-coefficient models (econometrics)– repeated-measures models, repeated measures ANOVA (statistics, psy-

    chology)

    – . . .

    • while essentially similar, the various approaches differ in terms of:

    AED The linear mixed model: introduction and the basic model 11 of 39

  • Department of Data Analysis Ghent University

    – motivation and notation– assumptions concerning the random effects– estimation method

    • the ‘older’ approaches date back to Fisher and Yates’s work on split-plotagricultural experiments

    • the ‘modern’ approaches are more general (for example: they can handleirregular observations, missing data, . . . )

    • some of the main references of the ‘modern’ mixed models approach are:

    – Harville (1977, JASA) reviewed previous work, unified the methodol-ogy, and described iterative ML algorithms

    – Patterson and Thompson (1971, Biometrika) proposed the alternativeREML approach

    – Laird and Ware (1982, Biometrics): their notation is still the standard– Jennrich and Schluchter (1986, Biometrics)– Laird, Lange, and Stram (1987, JASA)

    AED The linear mixed model: introduction and the basic model 12 of 39

  • Department of Data Analysis Ghent University

    – Diggle (1988, Biometrics)– Lindstrom and Bates (1988, JASA)– Jones and Boadi-Boateng (1991, Biometrics)– . . .

    • some of the main references of the use of these mixed models in the be-havioural sciences are:

    – Raudenbush, S.W. & Bryk A.S. (2001, second edition). HierarchicalLinear Models: Applications and Data Analysis Methods. ThousandOaks, Calif.: Sage. (Cfr. HLM software)

    – Goldstein, H. (2002). Multilevel Statistical Models. London: EdwardArnold. (Cfr. Mlwin software)

    – Snijders, T. & Bosker, R. (1999). Multilevel Analysis: An Introductionto Basic and Advanced Multilevel Modeling. Thousand Oaks, Calif.:Sage.

    – Hox, J. (2002). Multilevel Analysis: Techniques and Applications.Mahwah, N.J.: Lawrence Erlbaum Associates.

    AED The linear mixed model: introduction and the basic model 13 of 39

  • Department of Data Analysis Ghent University

    1.6 Software packages for mixed-models• MLwiN 2

    – Goldstein and co-workers– very popular in europe; lots of users in the FPPW– URL: http://www.cmm.bristol.ac.uk/MLwiN/index.shtml

    • HLM 6

    – Raudenbush and Bryk– very popular in the USA– URL: http://www.ssicentral.com/

    • Mplus

    – Bengt Muthén– does structural equation modeling, multilevel modeling and mixture

    modeling all at once

    AED The linear mixed model: introduction and the basic model 14 of 39

  • Department of Data Analysis Ghent University

    – URL: http://www.statmodel.com/

    • SAS (PROC MIXED)

    – often considered the ‘golden standard’– very flexible, extremely good documentation

    • SPSS (MIXED)

    – since version SPSS 14– very basic, poor documentation

    • R

    – the older package nlme is very flexible, but slow and out-dated– the newer package lme4 is extremely fast, state-of-the-art, but not as

    flexible as nlme or SAS PROC MIXED

    AED The linear mixed model: introduction and the basic model 15 of 39

  • Department of Data Analysis Ghent University

    2 The linear mixed model

    2.1 The one-way random-effects ANOVA model revisitedThe model using classic ‘effect’ notation

    • the effect model can be written as follows: for observation j, level i:

    yij = µ+ ai + �ij

    • µ: the population mean (intercept)

    • ai: the (random) effect for level i of the random factor A

    • �ij : the (random) error term

    • the variances σ2a and σ2� are called variance components

    AED The linear mixed model: introduction and the basic model 16 of 39

  • Department of Data Analysis Ghent University

    A toy dataset: 6 subjects, 3 scores per subject> subject score y Data Data

    subject score y1 1 1 102 1 2 113 1 3 124 2 1 105 2 2 126 2 3 147 3 1 128 3 2 139 3 3 1410 4 1 1211 4 2 1412 4 3 1613 5 1 1414 5 2 1515 5 3 1616 6 1 1417 6 2 1618 6 3 18

    AED The linear mixed model: introduction and the basic model 17 of 39

  • Department of Data Analysis Ghent University

    Exploring the data

    • grand mean and variance

    > mean(Data$y) #$

    [1] 13.5

    > var(Data$y) #$

    [1] 4.852941

    • mean/variance per subject

    > with(Data, tapply(y, list(subject=subject), mean))

    subject1 2 3 4 5 6

    11 12 13 14 15 16

    > with(Data, tapply(y, list(subject=subject), var))

    subject1 2 3 4 5 61 4 1 4 1 4

    AED The linear mixed model: introduction and the basic model 18 of 39

  • Department of Data Analysis Ghent University

    • treat ‘subject’ as a fixed factor

    > fit.lm anova(fit.lm)

    Analysis of Variance Table

    Response: yDf Sum Sq Mean Sq F value Pr(>F)

    subject 5 52.5 10.5 4.2 0.01939 *Residuals 12 30.0 2.5---Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    • treat ‘subject’ as a random factor (classic approach)

    > fit.aov summary(fit.aov)

    Error: subjectDf Sum Sq Mean Sq F value Pr(>F)

    Residuals 5 52.5 10.5

    Error: WithinDf Sum Sq Mean Sq F value Pr(>F)

    Residuals 12 30 2.5

    AED The linear mixed model: introduction and the basic model 19 of 39

  • Department of Data Analysis Ghent University

    • how can we compute σ2a from this output? note that σ2� = MSE = 2.5, and

    E(MSA) = σ2�+n·σ2a, therefore σ2a = (MSA−MSE)/n = (10.5−2.5)/6 =2.6666

    • treat ‘subject’ as a random factor (modern approach)

    > library(lme4)> fit.lmer summary(fit.lmer)

    Linear mixed model fit by REML ['lmerMod']Formula: y ˜ 1 + (1 | subject)

    Data: Data

    REML criterion at convergence: 73.8866

    Random effects:Groups Name Variance Std.Dev.subject (Intercept) 2.667 1.633Residual 2.500 1.581

    Number of obs: 18, groups: subject, 6

    Fixed effects:Estimate Std. Error t value

    (Intercept) 13.5000 0.7638 17.68

    AED The linear mixed model: introduction and the basic model 20 of 39

  • Department of Data Analysis Ghent University

    The model using ‘modern’ notation

    • the model in Laird-Ware form:

    yij = β0x0ij + b0iz0ij + �ij

    = β0 + b0i + �ij

    • β0: the (fixed) intercept, representing the population mean

    • x0ij = 1: the fixed-effect (constant) regressor

    • b0i: the random-effect coefficient for subject i, representing the deviation ofsubject i from the general mean

    • z0ij = 1 the random-effect (constant) regressor

    • �ij : the (random) error term, representing the deviation of the jth score ofsubject i from the subject mean

    AED The linear mixed model: introduction and the basic model 21 of 39

  • Department of Data Analysis Ghent University

    The variance components

    • the two variance components are denoted by:

    – Var(b0i) = d20 = d2

    – Var(�ij) = σ2� = σ2

    • we typically assume the random-effect coefficients are uncorrelated and nor-mally distributed:

    – b0i ∼ N(0, d2)– �ij ∼ N(0, σ2)

    AED The linear mixed model: introduction and the basic model 22 of 39

  • Department of Data Analysis Ghent University

    2.2 The structure of the modelThe Laird-Ware form of the linear mixed model (two-level):

    yij = β1x1ij + β2x2ij + β3x3ij + . . .+ βpxpij+

    b1iz1ij + b2iz2ij + . . .+ bqzqij+

    �ij

    • yij is the value of the response variable for the jth of ni observations ofindividual/cluster i = 1, 2, . . . , N

    • xi1, . . . , xip are the values of the p regressors for observation i; they are fixedconstants (with respect to repeated sampling), and can be anything (productterms, indicator variables, . . . )

    • in many regression models, a constant term xi0 = 1 is added; β0 is calledthe intercept

    • the regression coefficients β1, . . . , βp are the fixed-effect coefficients, whichare identical for all individuals/clusters

    AED The linear mixed model: introduction and the basic model 23 of 39

  • Department of Data Analysis Ghent University

    • b1i, b2i, . . . , bqi are the random-effect coefficients for individual/cluster i;the random-effect coefficients are thought of as random variables, not asparameters (similar to the errors �ij)

    • z1ij , z2ij , . . . , zqij are the random-effect regressors; they are typically a sub-set of the fixed regressors; in many models, a random intercept term z0ij = 1is added and b0i is called a random intercept

    • �ij is the random error for the jth observation of individual/cluster i

    AED The linear mixed model: introduction and the basic model 24 of 39

  • Department of Data Analysis Ghent University

    Stochastic assumptions of the linear mixed model

    • the usual assumptions for the random-effect coefficients:

    – bki ∼ N(0, d2k)– Var(bki) = d2k (constant across individual/clusters)– Cov(bki, bk′i) = dkk′ (within the same i)– Cov(bki, bk′i′) = i (for i 6= i′)

    • the usual assumptions for the error terms:

    – �ij ∼ N(0, σ2ijj)

    – Var(�ij) = σ2ijj (constant across individual/clusters)– Cov(�ij , �ij′) = σijj′ (within the same i)– Cov(�ij , �i′j′) = 0 (for i 6= i′)– the �ij and bki are uncorrelated

    AED The linear mixed model: introduction and the basic model 25 of 39

  • Department of Data Analysis Ghent University

    Model in matrix notation

    The model Laird-ware model in matrix form:

    yi = Xiβ + Zibi + �i i = 1, 2, . . . , N

    where for observations of the ith individual/cluster:

    • yi is the ni × 1 response vector

    • Xi is the ni × p model matrix for the fixed effects

    • β is the p× 1 vector of fixed-effect coefficients

    • Zi is the ni × q model matrix for the random effects

    • bi is the q × 1 vector of random-effect coefficients

    • �i is the ni × 1 vector of errors

    or for the all individuals/clusters:

    y = Xβ + Zb + �

    AED The linear mixed model: introduction and the basic model 26 of 39

  • Department of Data Analysis Ghent University

    Stochastic assumptions in matrix notation

    • the random-effect coefficients:

    – bi ∼ Nq(0,D)– bi and bi′ are independent for i 6= i′

    • the error terms:

    – �i ∼ Nni(0,Σi)– �i and �i′ are independent for i 6= i′

    – Cov[�i,bi] = 0

    • D contains q(q + 1)/2 parameters; Σi contains ni(ni + 1)/2 elements

    AED The linear mixed model: introduction and the basic model 27 of 39

  • Department of Data Analysis Ghent University

    The variance components

    • the elements of D and the elements of Σi (for each i) are collectively calledthe variance components

    • the elements of D are often parameterized in terms of a smaller number offundamental parameters

    • for hierarchical data, where observations within an individual/cluster aresampled independently, it is often assumed that the error variance is con-stant; in this case Σi simplifies to σ2Ini

    • for longitudinal data, Σi may be specified to capture serial (over-time) cor-relation among the errors

    AED The linear mixed model: introduction and the basic model 28 of 39

  • Department of Data Analysis Ghent University

    Conditional and marginal distributions

    • because the right-hand side of the linear mixed model contains two randomvectors bi and �i, we can consider two different distributions:

    – the conditional distribution:

    yi|bi ∼ N(Xiβ + Zibi, Σi)

    – the marginal distribution:

    yi ∼ N(Xiβ,Vi)

    whereCov[yi] = Vi = ZiDZ

    ′i + Σi

    • ZiDZ′i represents the random-effects structure: it implies a covariance struc-ture of a very specific form

    • combining the N clusters, we get

    Cov[y] = V = ZDZ′ + ΣAED The linear mixed model: introduction and the basic model 29 of 39

  • Department of Data Analysis Ghent University

    where Cov[y] contains N(N + 1)/2 elements

    • a prime reason for having random effects in a model is to simplify the other-wise difficult task of specifying theN(N+1)/2 distinct elements of Cov[y];without using random effects we would have to deal with elements of Cov[y]being a variety of forms; but with random factors we can conveniently dealwith variances and covariances attributable to factors acknowledged to beaffecting the data

    AED The linear mixed model: introduction and the basic model 30 of 39

  • Department of Data Analysis Ghent University

    A toy dataset: 6 subjects, 3 scores per subject

    • at the subject level:

    Xi =

    111

    , and Zi = 11

    1

    D =[d2], and Σi =

    σ2 0 00 σ2 00 0 σ2

    Vi = ZiDZ′i + Σi =

    d2 d2 d2d2 d2 d2d2 d2 d2

    + σ2 0 00 σ2 0

    0 0 σ2

    =

    σ2 + d2 d2 d2d2 σ2 + d2 d2d2 d2 σ2 + d2

    AED The linear mixed model: introduction and the basic model 31 of 39

  • Department of Data Analysis Ghent University

    • full model:

    X =

    111111111111111111

    , and Z =

    1 0 0 0 0 01 0 0 0 0 01 0 0 0 0 00 1 0 0 0 00 1 0 0 0 00 1 0 0 0 00 0 1 0 0 00 0 1 0 0 00 0 1 0 0 00 0 0 1 0 00 0 0 1 0 00 0 0 1 0 00 0 0 0 1 00 0 0 0 1 00 0 0 0 1 00 0 0 0 0 10 0 0 0 0 10 0 0 0 0 1

    AED The linear mixed model: introduction and the basic model 32 of 39

  • Department of Data Analysis Ghent University

    V =

    V1 0 0 0 0 00 V2 0 0 0 00 0 V3 0 0 00 0 0 V4 0 00 0 0 0 V5 00 0 0 0 0 V6

    AED The linear mixed model: introduction and the basic model 33 of 39

  • Department of Data Analysis Ghent University

    2.3 Parameter estimationEstimating the fixed coefficients β

    • if we assume that D and Σ and hence V are known, the weighted least-squares estimator of β using a symmetric weight matrix W is given by

    β̂ = (X′WX)−1 X′Wy

    where the weight matrix is given by:

    W = V−1 = Cov[y]−1

    • if V is known, β̂ is normally distributed with mean β; the variances are thediagonal elements of the variance-covariance matrix:

    Cov(β̂) = [X′V−1X]−1

    • if V is unknown, it is replaced by its (current) estimate V̂

    AED The linear mixed model: introduction and the basic model 34 of 39

  • Department of Data Analysis Ghent University

    • if V is unknown, the variances of β̂ underestimate the correct variabilitybecause the variability introduced by estimating the variance components isnot taken into account

    • in practice, one often accounts of this downward bias by using approximatet- and F -statistics for testing hypotheses about β (see inference for fixedeffects)

    AED The linear mixed model: introduction and the basic model 35 of 39

  • Department of Data Analysis Ghent University

    Estimating the variance components

    • historically, several methods have been proposed: the general analysis ofvariance method, Henderson’s method I, II and III, MINQUE, MIVQUE,and I-MINQUE

    • two methods used in the ‘modern’ approach are:

    – maximum likelihood (ML)– restricted maximum likelihood (REML) estimation methods

    both are iterative methods

    AED The linear mixed model: introduction and the basic model 36 of 39

  • Department of Data Analysis Ghent University

    2.4 Inference in a linear mixed modelInference for fixed effects

    • all tests of the fixed effects can be expressed by a linear hypothesis of theform

    H0 : Lβ = 0

    • L is a h × (p + 1) matrix; each row in L specifies a linear combination ofthe parameters to be tested

    • a common statistic is the so-called Wald F -statistic:

    F =(Lb)′

    [L(X′V−1X)−1L′

    ]−1(Lb)

    h

    • unfortunately, despite its name, this statistic does not follow anF -distribution(due to the fact that V is unknown)

    • one approach is to assume that F approximates an F -distribution, with hdegrees of freedom for the nominator, and a to be estimated number of de-nominator degrees of freedom

    AED The linear mixed model: introduction and the basic model 37 of 39

  • Department of Data Analysis Ghent University

    • several methods are available to estimate the denominator degrees of free-dom:

    – the containment method (often the default in software)– the Satterthwaite’s (1946) approximation– the Kenward and Roger (1997) approximation– . . .

    • this ‘problem’ has received little attention in the literature, presumably be-cause in the context of, say, longitudinal data, sample sizes are huge, and thep-values are often very similar under different approximation methods

    • for relatively small datasets (for example: experimental within-subjects de-sign), this problem can not be ignored!

    • another alternative is to use a Bayesian approach

    AED The linear mixed model: introduction and the basic model 38 of 39

  • Department of Data Analysis Ghent University

    Inference for variance components

    • a typical hypothesis is that the variance of a random coefficient equals zero:

    H0 : d2k = 0

    • many software packages report Wald tests for all variance components

    – unfortunately, Wald tests assume asymptotic normality of the parame-ter estimates, and this is not satisfied for the variance components, duethe ‘boundary problem’

    – for example, the value of d2k = 0 is on the boundary of the parameterspace, and in this case, classical MLE results don’t hold in this context

    • alternatively, one can use likelihood ratio tests

    – again, the asymptotic chi-squared null-distribution is not valid, due theboundary problem

    – however, some results are available for the case Σi = σ2Ini

    AED The linear mixed model: introduction and the basic model 39 of 39