R workshop viii--Mixed Effect Analysis in R (random and fixed effect)

Introduction of Mixed effect model

Learning by simulation

Supstat Inc.

http://nycdatascience.com/slides/mixed_effect_model_supstat/index.html#1

34 slides; 6/13/14, 9:51 PM

description

NYC Data Science Academy, NYC Open Data Meetup, Big Data, Data Science, NYC, Vivian Zhang, SupStat Inc, R programming,mixed effect analysis

Transcript of R workshop viii--Mixed Effect Analysis in R (random and fixed effect)

Page 1: Introduction of Mixed effect model

Learning by simulation

Supstat Inc.

Page 2: Outline

· What is a mixed effect model
· Fixed effect model
· Mixed effect model
  - Random Intercept model
  - Random Intercept and Slope model
· General Mixed effect model
· Case study

Page 3: What is a mixed effect model

Page 4: Classical normal linear model

Formulation:

Yi = b0 + b1*Xi + ei

· Yi is the response from subject i.
· Xi are the covariates.
· b0 and b1 are the parameters that we want to estimate.
· ei are the random terms in the model, assumed to be independently and identically distributed as Normal(0, sigma^2). It is very important that there is no structure in ei; it represents the variation that could not be controlled in our study.

Page 5: Violation of the independence assumption

In many cases, responses are not independent of each other. Such data usually have some cluster structure.

· Repeated measures, where measurements are taken multiple times from the same subjects (clustered by subject).
· A survey of all the members of each family (clustered by family).
· A survey of students from 20 classrooms in a high school (clustered by classroom).
· Longitudinal data, also known as panel data, where several responses are collected from the same subjects over time (clustered by subject).

We need a new tool: the mixed effect model.
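To make the classroom example concrete, here is a minimal R sketch under stated assumptions: the classroom count, class size, mean score, and standard deviations are all invented for illustration and do not come from the slides. It simulates scores that share a classroom-level shift and measures how much of the total variance sits between classrooms.

```r
# Simulate clustered data: 20 classrooms, 10 students each (all values hypothetical).
set.seed(42)
n_class   <- 20
n_student <- 10
classroom <- rep(seq_len(n_class), each = n_student)

# Each classroom gets its own shift; students in one classroom share it.
class_shift <- rnorm(n_class, mean = 0, sd = 2)
score <- 70 + class_shift[classroom] +
  rnorm(n_class * n_student, mean = 0, sd = 1)

# Variance of the classroom means relative to the total variance.
# With independent observations this ratio would be roughly 1 / n_student.
between_var <- var(tapply(score, classroom, mean))
total_var   <- var(score)
print(between_var / total_var)
```

A ratio far above 1 / n_student signals that students in the same classroom resemble each other, which is exactly the structure an ordinary linear model ignores.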

Page 6: Mixed effect model

Mixed effect model = Fixed effect + Random effect

· Fixed effects
  - are expected to have a systematic and predictable influence on your data.
  - exhaust "the levels of a factor". Think of sex (male/female).
· Random effects
  - are expected to have a non-systematic, unpredictable, or "random" influence on your data.
  - have factor levels that are drawn from a large population, but we do not know exactly how or why they differ.

Page 7: Example of fixed effects and random effects

FIXED EFFECTS                 RANDOM EFFECTS
Male or female                Individuals with repeated measures
Insecticide sprayed or not    Block within a field
Upland or lowland             Brood
One country versus another    Split plot within a plot
Wet versus dry                Family

Page 8: Fixed effect model

Page 9: Fixed effect model

The fixed effect model is just the linear model that you may already know.

Yi = b0 + b1*Xi + ei

for 1 <= i <= n, where n is the number of samples.

· Yi: response variable
· b0: fixed intercept
· b1: fixed slope
· Xi: explanatory variable (fixed effect)
· ei: noise (error)

Page 10: Data generation of fixed effect model

    set.seed(1)
    # generate x
    x <- seq(1, 5, length.out = 100)
    # generate error
    noise <- rnorm(n = 100, mean = 0, sd = 1)
    b0 <- 1
    b1 <- 2
    # generate y
    y <- b0 + b1 * x + noise

Page 11: Data generation of fixed effect model

    plot(y ~ x)

Page 12: Coefficient estimation of fixed effect model

    model <- lm(y ~ x)
    summary(model)

    Call:
    lm(formula = y ~ x)

    Residuals:
        Min      1Q  Median      3Q     Max
    -2.3401 -0.6058  0.0155  0.5851  2.2975

    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept)   1.1424     0.2491    4.59  1.3e-05 ***
    x             1.9888     0.0774   25.70  < 2e-16 ***
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    Residual standard error: 0.903 on 98 degrees of freedom
    Multiple R-squared:  0.871, Adjusted R-squared:  0.869
    F-statistic:  660 on 1 and 98 DF,  p-value: <2e-16
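Because the data are simulated, the estimates can be checked against the known true values b0 = 1 and b1 = 2. The sketch below repeats the simulation from the earlier slides and uses base R's confint() to see whether the 95% confidence intervals cover those values. The names ci and covers are introduced here for illustration only.

```r
# Same simulation as on the slides: true b0 = 1, true b1 = 2.
set.seed(1)
x <- seq(1, 5, length.out = 100)
noise <- rnorm(n = 100, mean = 0, sd = 1)
y <- 1 + 2 * x + noise

model <- lm(y ~ x)

# 95% confidence intervals for the intercept and the slope.
ci <- confint(model, level = 0.95)
print(ci)

# TRUE where the interval covers the value used in the simulation.
covers <- ci[, 1] < c(1, 2) & ci[, 2] > c(1, 2)
print(covers)
```

Checking coverage of known parameters is the main benefit of learning by simulation: the same check is impossible on real data, where the true values are unknown.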

Page 13: Plot of fixed effect model

    plot(y ~ x)
    abline(model)

Page 14: Mixed effect model

Page 15: Random Intercept model

There are several people, indexed by i, and each person is measured repeatedly, with measurements indexed by j. The people differ from one another in ways we do not know, so there is a random effect caused by the person, plus separate random noise in each measurement of that person.

Yij = b0 + b1*Xij + bi + eij

· b0: fixed intercept
· b1: fixed slope
· Xij: fixed effect
· bi: random effect (influences the intercept)
· eij: noise

Page 16: Data generation of Random Intercept model

    b0 <- 9.9
    b1 <- 2
    # number of repeated measurements for each of the 6 people
    n <- c(13, 14, 14, 15, 12, 13)
    npeople <- length(n)
    set.seed(1)
    # generate x (fixed effect)
    x <- matrix(rep(0, length = max(n) * npeople), ncol = npeople)
    for (i in 1:npeople) {
      x[1:n[i], i] <- runif(n[i], min = 1, max = 5)
      x[1:n[i], i] <- sort(x[1:n[i], i])
    }
    # random effect
    bi <- rnorm(npeople, mean = 0, sd = 10)

Page 17: Data generation of Random Intercept model

    xall <- NULL
    yall <- NULL
    peopleall <- NULL
    for (i in 1:npeople) {
      xall <- c(xall, x[1:n[i], i])  # combine x
      # generate y
      y <- rep(b0 + bi[i], length = n[i]) + b1 * x[1:n[i], i] +
        rnorm(n[i], mean = 0, sd = 2)  # noise
      yall <- c(yall, y)  # combine y
      people <- rep(i, length = n[i])
      peopleall <- c(peopleall, people)
    }
    # final dataset
    data1 <- data.frame(yall, peopleall, xall)

Page 18: Coefficient estimation of Random Intercept model

    library(nlme)
    # xall is the fixed effect
    # bi influences the intercept of the model
    lme1 <- lme(yall ~ xall, random = ~ 1 | peopleall, data = data1)
    summary(lme1)

    Linear mixed-effects model fit by REML
     Data: data1
      AIC BIC logLik
      358 368   -175

    Random effects:
     Formula: ~1 | peopleall
            (Intercept) Residual
    StdDev:         7.3     1.77

    Fixed effects: yall ~ xall
                Value Std.Error DF t-value p-value
    (Intercept)  3.60     3.041 74    1.18    0.24
    xall         1.61     0.186 74    8.67    0.00
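One way to read the "Random effects" part of this output is through the intraclass correlation, the share of the total variance that comes from differences between people: sigma_b^2 / (sigma_b^2 + sigma_e^2). The short sketch below plugs in the two standard deviations printed in the summary(lme1) output. The variable names are chosen here for illustration.

```r
# Standard deviations as printed in the summary(lme1) output.
sd_intercept <- 7.3    # between-person SD, "(Intercept)"
sd_residual  <- 1.77   # within-person SD, "Residual"

# Intraclass correlation: share of the variance that is between people.
icc <- sd_intercept^2 / (sd_intercept^2 + sd_residual^2)
print(round(icc, 3))
# → 0.944
```

Under this fit, roughly 94% of the variation in yall is attributed to who was measured rather than to measurement noise, which is why ignoring the person structure would be costly here.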

Page 19: Plot of Random Intercept model
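The plotting code for this slide is not in the transcript. The following is one possible sketch, not the authors' code: it simulates a small random intercept data set (sizes and seed are assumptions), fits it with lme() from nlme, and draws one fitted line per person using coef(), which returns the fixed effects plus each person's random effect.

```r
library(nlme)  # ships with R as a recommended package

# Compact random intercept simulation (hypothetical sizes and seed).
set.seed(2)
npeople <- 6
nobs    <- 14
person  <- rep(seq_len(npeople), each = nobs)
x       <- runif(npeople * nobs, min = 1, max = 5)
bi      <- rnorm(npeople, mean = 0, sd = 10)
y       <- 9.9 + bi[person] + 2 * x +
  rnorm(npeople * nobs, mean = 0, sd = 2)
dat     <- data.frame(y, x, person)

fit <- lme(y ~ x, random = ~ 1 | person, data = dat)

# One row per person: fixed intercept plus that person's random effect,
# and the common fixed slope.
person_coef <- coef(fit)

# Points coloured by person, with one fitted line per person.
plot(y ~ x, data = dat, col = dat$person, pch = 19)
for (i in seq_len(npeople)) {
  abline(a = person_coef[as.character(i), "(Intercept)"],
         b = person_coef[as.character(i), "x"],
         col = i)
}
```

In a random intercept model the lines are parallel: every person shares the same slope and only the height of the line changes.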

Page 20: Random Intercept and Slope model

Yij = b0 + (b1+si)*Xij + bi + eij

· b0: fixed intercept
· b1: fixed slope
· X: fixed effect
· bi: random effect (influences the intercept)
· eij: noise
· si: random effect (influences the slope)
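Written out with distributional assumptions, the model might look as follows. The variance symbols sigma_b, sigma_s and sigma are notation introduced here, and independence between the three random terms is an assumption that mirrors the simulation on the following slides, where the standard deviations are 10, 0.5 and 0.5.

```latex
\begin{aligned}
Y_{ij} &= (b_0 + b_i) + (b_1 + s_i)\,X_{ij} + e_{ij},\\
b_i &\sim \mathcal{N}(0,\ \sigma_b^2), \qquad
s_i \sim \mathcal{N}(0,\ \sigma_s^2), \qquad
e_{ij} \sim \mathcal{N}(0,\ \sigma^2).
\end{aligned}
```

Grouping the terms this way shows that person i has an own line with intercept b0 + bi and slope b1 + si. The lme fit on a later slide also estimates a correlation between the two random effects rather than fixing it at zero.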

Page 21: Data generation of Random Intercept and Slope model

    a0 <- 9.9
    a1 <- 2
    n <- c(12, 13, 14, 15, 16, 13)
    npeople <- length(n)
    set.seed(1)
    si <- rnorm(npeople, mean = 0, sd = 0.5)  # random slope
    x <- matrix(rep(0, length = max(n) * npeople), ncol = npeople)
    for (i in 1:npeople) {
      x[1:n[i], i] <- runif(n[i], min = 1, max = 5)
      x[1:n[i], i] <- sort(x[1:n[i], i])
    }

Page 22: Data generation of Random Intercept and Slope model

    bi <- rnorm(npeople, mean = 0, sd = 10)  # random intercept
    xall <- NULL
    yall <- NULL
    peopleall <- NULL
    for (i in 1:npeople) {
      xall <- c(xall, x[1:n[i], i])
      y <- rep(a0 + bi[i], length = n[i]) + (a1 + si[i]) * x[1:n[i], i] +
        rnorm(n[i], mean = 0, sd = 0.5)
      yall <- c(yall, y)
      people <- rep(i, length = n[i])
      peopleall <- c(peopleall, people)
    }
    # generate final dataset
    data2 <- data.frame(yall, peopleall, xall)

Page 23: Coefficient estimation of Random Intercept and Slope model

    # bi influences the intercept and si the slope of the model
    lme2 <- lme(yall ~ xall, random = ~ 1 + xall | peopleall, data = data2)
    print(summary(lme2))

    Linear mixed-effects model fit by REML
     Data: data2
      AIC BIC logLik
      179 194  -83.6

    Random effects:
     Formula: ~1 + xall | peopleall
     Structure: General positive-definite, Log-Cholesky parametrization
                StdDev Corr
    (Intercept) 11.593 (Intr)
    xall         0.464 0.044
    Residual     0.445

    Fixed effects: yall ~ xall
                Value Std.Error DF t-value p-value
    (Intercept) 13.42      4.74 76    2.83  0.0059

Page 24: Plot of Random Intercept and Slope model

Page 25: What if we just use a linear model

· complete pooling

    # wrong estimation
    lm1 <- lm(yall ~ xall, data = data2)
    summary(lm1)

    Call:
    lm(formula = yall ~ xall, data = data2)

    Residuals:
       Min     1Q Median     3Q    Max
    -17.80  -6.27  -3.67   2.19  24.33

    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept)     6.86       3.72    1.84  0.06874 .
    xall            4.31       1.15    3.76  0.00032 ***
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    Residual standard error: 11 on 81 degrees of freedom

Page 26: What if we just use a linear model

· no pooling

    # wrong estimation: it wastes too many degrees of freedom, and we don't
    # care about the exact differences between people. we just need to c
    lm2 <- lm(yall ~ xall + factor(peopleall) + xall * factor(peopleall), data = data1)
    summary(lm2)

    Call:
    lm(formula = yall ~ xall + factor(peopleall) + xall * factor(peopleall),
        data = data1)

    Residuals:
       Min     1Q Median     3Q    Max
    -2.983 -1.194  0.054  1.092  4.238

    Coefficients:
                       Estimate Std. Error t value Pr(>|t|)
    (Intercept)          18.818      1.342   14.02  < 2e-16 ***
    xall                  0.929      0.413    2.25    0.028 *
    factor(peopleall)2  -14.014      1.876   -7.47  1.8e-10 ***
    factor(peopleall)3  -16.480      2.140   -7.70  7.1e-11 ***
    factor(peopleall)4  -18.790      1.913   -9.82  9.8e-15 ***
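To see the three approaches side by side, here is a self-contained sketch that fits complete pooling, no pooling, and the mixed model to one simulated random intercept data set. The data set is a compact stand-in with invented sizes and seed, not the slides' data1 or data2, and the names fit_pool, fit_none and fit_mixed are introduced here.

```r
library(nlme)  # ships with R as a recommended package

# Compact random intercept simulation (hypothetical sizes and seed).
set.seed(123)
npeople <- 6
nobs    <- 15
person  <- rep(seq_len(npeople), each = nobs)
x       <- runif(npeople * nobs, min = 1, max = 5)
bi      <- rnorm(npeople, mean = 0, sd = 10)   # person-level intercept shifts
y       <- 9.9 + bi[person] + 2 * x +
  rnorm(npeople * nobs, mean = 0, sd = 2)
dat     <- data.frame(y, x, person = factor(person))

# Complete pooling: ignore people altogether.
fit_pool <- lm(y ~ x, data = dat)
# No pooling: separate intercept and slope for every person, as fixed effects.
fit_none <- lm(y ~ x * person, data = dat)
# Partial pooling: person intercepts treated as draws from one distribution.
fit_mixed <- lme(y ~ x, random = ~ 1 | person, data = dat)

# Number of regression coefficients each approach estimates.
print(c(complete = length(coef(fit_pool)),
        none     = length(coef(fit_none)),
        mixed    = length(fixef(fit_mixed))))

# Standard error of the slope under complete pooling and the mixed model.
se <- c(
  complete = summary(fit_pool)$coefficients["x", "Std. Error"],
  mixed    = summary(fit_mixed)$tTable["x", "Std.Error"]
)
print(se)
```

Complete pooling pushes the person-to-person variation into the residual, which inflates the slope's standard error. No pooling spends twelve coefficients on six people. The mixed model keeps two fixed effects and summarises the people with a single variance.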

Page 27: General Mixed effect model

Page 28: Logistic Mixed effect model

P(Yij = 1) = exp(eta_ij)/(1+exp(eta_ij))

eta_ij = b0 + b1*Xij + bi

· b0: fixed intercept
· b1: fixed slope
· Xij: fixed effect
· bi: random effect (influences the intercept)
· noise: Yij is a 0/1 draw with the probability above, so no separate eij term is added to eta_ij
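The transformation exp(eta)/(1+exp(eta)) is the inverse logit, which maps any real-valued linear predictor to a probability between 0 and 1. As a small check, the sketch below compares the formula written out by hand with base R's plogis(). The eta values are arbitrary illustrations.

```r
# Linear predictor values to try (arbitrary illustration).
eta <- c(-6, -2, 0, 2, 6)

# Inverse logit written out as on the slide.
p_manual <- exp(eta) / (1 + exp(eta))

# Base R's plogis() computes the same function.
p_builtin <- plogis(eta)

print(all.equal(p_manual, p_builtin))
print(round(p_manual, 3))
```

An eta of 0 corresponds to a probability of 0.5, and the random effect bi shifts eta up or down for all measurements of one person.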

Page 29: Data generation of Logistic Mixed effect model

    b0 <- -6
    b1 <- 2.1
    set.seed(1)
    n <- c(12, 13, 14, 15, 16, 13)
    npeople <- length(n)
    x <- matrix(rep(0, length = max(n) * npeople), ncol = npeople)
    bi <- rnorm(npeople, mean = 0, sd = 1.5)
    for (i in 1:npeople) {
      x[1:n[i], i] <- runif(n[i], min = 1, max = 5)
      x[1:n[i], i] <- sort(x[1:n[i], i])
    }

Page 30: Data generation of Logistic Mixed effect model

    xall <- NULL
    yall <- NULL
    peopleall <- NULL
    for (i in 1:npeople) {
      xall <- c(xall, x[1:n[i], i])
      y <- NULL
      for (j in 1:n[i]) {
        eta1 <- b0 + b1 * x[j, i] + bi[i]
        y <- c(y, rbinom(n = 1, size = 1, prob = exp(eta1) / (exp(eta1) + 1)))
      }
      yall <- c(yall, y)
      people <- rep(i, length = n[i])
      peopleall <- c(peopleall, people)
    }
    data3 <- data.frame(xall, peopleall, yall)
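The nested loops can also be written in vectorised form. The sketch below is an alternative under the same parameter values, not a drop-in replacement: it draws the random numbers in a different order and does not sort x within person, so the resulting data set (named data3_alt here to avoid confusion) will not match data3 value for value.

```r
set.seed(1)
b0 <- -6
b1 <- 2.1
n <- c(12, 13, 14, 15, 16, 13)
npeople <- length(n)

# Person label for every observation: 1 repeated n[1] times, and so on.
person <- rep(seq_len(npeople), times = n)

# One random intercept per person, one x value per observation.
bi <- rnorm(npeople, mean = 0, sd = 1.5)
x  <- runif(sum(n), min = 1, max = 5)

# Linear predictor and Bernoulli response, computed for all rows at once.
eta <- b0 + b1 * x + bi[person]
y   <- rbinom(n = sum(n), size = 1, prob = plogis(eta))

data3_alt <- data.frame(xall = x, peopleall = person, yall = y)
print(table(data3_alt$peopleall))
```

Indexing bi by the person vector replaces the outer loop, and rbinom() accepts a vector of probabilities, which replaces the inner one.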

Page 31: Coefficient estimation of Logistic Mixed effect model

    library(lme4)
    # the formula syntax is different from nlme
    lmer3 <- glmer(yall ~ xall + (1 | peopleall), data = data3, family = binomial)
    print(summary(lmer3))

    Generalized linear mixed model fit by maximum likelihood ['glmerMod']
     Family: binomial ( logit )
    Formula: yall ~ xall + (1 | peopleall)
       Data: data3

         AIC      BIC   logLik deviance
        69.8     77.1    -31.9     63.8

    Random effects:
     Groups    Name        Variance Std.Dev.
     peopleall (Intercept) 3.94     1.98
    Number of obs: 83, groups: peopleall, 6

    Fixed effects:
                Estimate Std. Error z value Pr(>|z|)

Page 32: Plot of Logistic Mixed effect model

Page 33: Case study
