A Brief and Friendly Introduction to Mixed-Effects Models in Psycholinguistics

[Title figure: graphical picture of a multi-level model. Shared parameters θ ("fixed effects") and parameters Σb governing inter-cluster variability generate cluster-specific parameters b1, b2, . . . , bM ("random effects"); within each cluster i, θ and bi combine with the predictors xi1, . . . , xini to generate the responses yi1, . . . , yini.]

Roger Levy (modified by Florian Jaeger)
UC San Diego, Department of Linguistics
25 March 2009

Goals of this talk

- Briefly review generalized linear models and how to use them
- Give a precise description of multi-level models
- Show how to draw inferences using a multi-level model (fitting the model)
- Discuss how to interpret model parameter estimates
  - Fixed effects
  - Random effects
- Briefly discuss multi-level logit models

Reviewing generalized linear models I

Goal: model the effects of predictors (independent variables) X on a response (dependent variable) Y.

The picture:

[Figure: graphical model in which the model parameters θ and each predictor xi jointly generate the corresponding response yi, for i = 1, . . . , n. Labels: Predictors, Model parameters, Response.]


Reviewing GLMs II

Assumptions of the generalized linear model (GLM):

1. Predictors {Xi} influence Y through the mediation of a linear predictor η;

2. η is a linear combination of the {Xi}:

   η = α + β1X1 + · · · + βNXN    (linear predictor)

3. η determines the predicted mean µ of Y:

   η = l(µ)    (link function)

4. There is some noise distribution of Y around the predicted mean µ of Y:

   P(Y = y; µ)
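As a concrete illustration of these four assumptions, here is a minimal R sketch (not from the original slides; all variable names and coefficient values are invented for the example). It builds the linear predictor η, maps it to the mean µ through a link (here the identity), and draws the response with Gaussian noise:

  ## The GLM recipe by hand: linear predictor -> link -> noise
  set.seed(1)
  n  <- 100
  x1 <- rnorm(n)                      # a single predictor
  alpha <- 2; beta1 <- 0.5            # arbitrary true coefficients
  eta <- alpha + beta1 * x1           # 2. linear predictor
  mu  <- eta                          # 3. identity link: l(mu) = mu = eta
  y   <- rnorm(n, mean = mu, sd = 1)  # 4. Gaussian noise around mu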


Reviewing GLMs III

Linear regression, which underlies ANOVA, is a kind of generalized linear model.

- The predicted mean is just the linear predictor:

  η = l(µ) = µ

- Noise is normally (= Gaussian) distributed around 0 with standard deviation σ:

  ε ∼ N(0, σ)

- This gives us the traditional linear regression equation:

  Y = α + β1X1 + · · · + βnXn + ε

  where α + β1X1 + · · · + βnXn is the predicted mean µ = η, and ε ∼ N(0, σ) is the noise.
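To see "linear regression is a GLM" concretely, here is a small sketch (not in the original slides; data simulated with invented coefficients): fitting the same data with lm() and with glm(family = "gaussian") yields identical coefficient estimates, because the Gaussian family with identity link is exactly linear regression.

  set.seed(1)
  x1 <- rnorm(100)
  y  <- 2 + 0.5 * x1 + rnorm(100)            # simulated data
  m.lm  <- lm(y ~ x1)                        # ordinary linear regression
  m.glm <- glm(y ~ x1, family = "gaussian")  # the same model as a GLM
  rbind(lm = coef(m.lm), glm = coef(m.glm))  # identical estimates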


Reviewing GLMs IV

Logistic regression, too, is a kind of generalized linear model.

- The linear predictor:

  η = α + β1X1 + · · · + βnXn

- The link function g is the logit transform:

  E(Y) = p = g−1(η), or equivalently

  g(p) = ln[p / (1 − p)] = η = α + β1X1 + · · · + βnXn    (1)

- The distribution around the mean is taken to be binomial.
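Here is a hedged R sketch of this recipe (again not from the slides; data and coefficients are invented): plogis() is the inverse logit g−1, qlogis() is the logit link g itself, and glm(..., family = binomial) fits the model by maximum likelihood.

  set.seed(2)
  n   <- 200
  x   <- rnorm(n)
  eta <- -0.5 + 1.2 * x                 # linear predictor
  p   <- plogis(eta)                    # inverse link: p = g^(-1)(eta)
  y   <- rbinom(n, size = 1, p)         # binomial (here Bernoulli) noise
  m   <- glm(y ~ x, family = binomial)  # logistic regression
  coef(m)                               # estimates near -0.5 and 1.2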



Reviewing GLMs V

Y = α + β1X1 + · · · + βnXn + ε,    ε ∼ N(0, σ)

where α + β1X1 + · · · + βnXn is the predicted mean.

- How do we fit the parameters βi and σ (choose model coefficients)?

- There are two major approaches (deeply related, yet different) in widespread use:

  - The principle of maximum likelihood: pick parameter values that maximize the probability of your data Y, i.e. choose {βi} and σ that make the likelihood P(Y | {βi}, σ) as large as possible

  - Bayesian inference: put a probability distribution on the model parameters and update it on the basis of what parameters best explain the data:

    P({βi}, σ | Y) = P(Y | {βi}, σ) · P({βi}, σ) / P(Y)

    where P(Y | {βi}, σ) is the likelihood and P({βi}, σ) is the prior.
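To make "maximize the likelihood" concrete, here is a small sketch (not from the slides; toy data with invented coefficients) that writes out the Gaussian log-likelihood, the quantity maximum-likelihood fitting pushes as high as possible, and maximizes it numerically with optim():

  set.seed(1)
  x <- rnorm(100); y <- 2 + 0.5 * x + rnorm(100)        # toy data
  loglik <- function(par) {                             # log P(Y | alpha, beta, sigma)
    alpha <- par[1]; beta <- par[2]; sigma <- exp(par[3])  # exp() keeps sigma > 0
    sum(dnorm(y, mean = alpha + beta * x, sd = sigma, log = TRUE))
  }
  fit <- optim(c(0, 0, 0), function(par) -loglik(par))  # minimize the negative log-likelihood
  fit$par[1:2]       # alpha-hat, beta-hat: match coef(lm(y ~ x))
  exp(fit$par[3])    # sigma-hat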

Reviewing GLMs VI: a simple example

- You are studying non-word RTs in a lexical-decision task:

  tpozt    Word or non-word?
  houze    Word or non-word?

- Non-words with different neighborhood densities* should have different average RTs (*= number of neighbors at edit distance 1)

- A simple model: assume that neighborhood density has a linear effect on average RT, and that trial-level noise is normally distributed (n.b. wrong, since RTs are skewed, but not horribly so)

- If xi is neighborhood density, our simple model is

  RTi = α + βxi + εi,    εi ∼ N(0, σ)

- We need to draw inferences about α, β, and σ

  - e.g., "Does neighborhood density affect RT?" → is β reliably non-zero?


Reviewing GLMs VII

- We'll use length-4 non-word data from Bicknell et al. (2008) (thanks!), such as:

  Few neighbors: gaty, peme, rixy
  Many neighbors: lish, pait, yine

- There's a wide range of neighborhood density:

  [Figure: histogram of number of neighbors (0–12) by frequency; the high-density item "pait" is marked at the right edge.]


Reviewing GLMs VIII: maximum-likelihood model fitting

RTi = α + βXi + εi,    εi ∼ N(0, σ)

- Here's a translation of our simple model into R:

  RT ~ 1 + x

- The Gaussian noise term is implicit in asking R to fit a linear model

- (We can omit the 1, the intercept; R assumes it unless otherwise directed)

- Example of fitting via maximum likelihood: one subject from Bicknell et al. (2008):

  > m <- glm(RT ~ neighbors, d, family="gaussian")  # Gaussian noise, implicit intercept
  > summary(m)
  [...]
              Estimate Std. Error t value Pr(>|t|)
  (Intercept)  382.997     26.837  14.271   <2e-16 ***   ← α̂
  neighbors      4.828      6.553   0.737    0.466       ← β̂
  > sqrt(summary(m)[["dispersion"]])
  [1] 107.2248                                           ← σ̂


Reviewing GLMs: maximum-likelihood fitting IX

  Intercept    383.00
  neighbors      4.83
  σ̂           107.22

- Estimated coefficients are what underlie "best linear fit" plots

  [Figure: scatterplot of RT (ms), roughly 300–700, against number of neighbors, 0–12, with the best linear fit drawn through the points.]
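A sketch of how such a plot can be produced from the fitted model (assuming the glm fit m and the data frame d from the previous slide; this is not the original plotting code):

  plot(d$neighbors, d$RT, xlab = "Neighbors", ylab = "RT (ms)")
  abline(coef(m)[1], coef(m)[2])   # intercept and slope: the best linear fit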


Multi-level Models

- But of course experiments don't have just one participant
- Different participants may have different idiosyncratic behavior
- And items may have idiosyncratic properties too
- We'd like to take these into account, and perhaps investigate them directly too
- This is what multi-level (hierarchical, mixed-effects) models are for!

Multi-level Models II

- Recap of the graphical picture of a single-level model:

  [Figure: as on the earlier GLM slides, model parameters θ and each predictor xi generate the response yi. Labels: Predictors, Model parameters, Response.]

Multi-level Models III: the new graphical picture

[Figure: the multi-level graphical model from the title slide. Shared parameters θ ("fixed effects") and parameters Σb governing inter-cluster variability generate cluster-specific parameters b1, b2, . . . , bM ("random effects"); within each cluster i, θ and bi combine with the predictors xi1, . . . , xini to generate the responses yi1, . . . , yini.]


Multi-level Models IV

An example of a multi-level model:

- Back to your lexical-decision experiment

  tpozt    Word or non-word?
  houze    Word or non-word?

- Non-words with different neighborhood densities should have different average decision times

- Additionally, different participants in your study may also have:

  - different overall decision speeds
  - differing sensitivity to neighborhood density (see the lme4 formula sketch below)

- You want to draw inferences about all these things at the same time

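In lme4 formula terms (a hedged aside, not from these slides): "different overall decision speeds" corresponds to a by-participant random intercept, and "differing sensitivity to neighborhood density" to a by-participant random slope. A sketch, where dat is a hypothetical data frame holding RT, x, and participant:

  library(lme4)
  m.int   <- lmer(RT ~ x + (1 | participant), data = dat)      # random intercepts only
  m.slope <- lmer(RT ~ x + (1 + x | participant), data = dat)  # plus random slopes

The slides that follow build the random-intercept model first.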

Multi-level Models V: Model construction

- Once again we'll assume for simplicity that the number of word neighbors x has a linear effect on mean reading time, and that trial-level noise is normally distributed*

- Random effects, starting simple: let each participant i have idiosyncratic differences in reading speed:

  RTij = α + βxij + bi + εij,    bi ∼ N(0, σb),  εij ∼ N(0, σε)

- In R, we'd write this relationship as

  RT ~ 1 + x + (1 | participant)

- Once again we can leave off the 1, and the noise term εij is implicit


Multi-level Models VI: simulating data

RTij = α + βxij + bi + εij,    bi ∼ N(0, σb),  εij ∼ N(0, σε)

- Simulation of trial-level data can be invaluable for achieving a deeper understanding of the data:

  ## simulate some data
  sigma.b <- 125               # inter-subject variation larger than...
  sigma.e <- 40                # ...intra-subject, inter-trial variation
  alpha <- 500
  beta <- 12
  M <- 6                       # number of participants
  n <- 50                      # trials per participant
  b <- rnorm(M, 0, sigma.b)    # individual differences
  nneighbors <- rpois(M*n, 3) + 1      # generate num. neighbors
  subj <- rep(1:M, n)
  RT <- alpha + beta * nneighbors +    # simulate RTs!
        b[subj] + rnorm(M*n, 0, sigma.e)

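A natural follow-up (not on the slides) is to check that a mixed-effects fit recovers the simulated parameters. Assuming the simulation code above has been run and the lme4 package is installed:

  library(lme4)
  d.sim <- data.frame(RT = RT, neighbors = nneighbors, subj = factor(subj))
  m.sim <- lmer(RT ~ neighbors + (1 | subj), data = d.sim)
  fixef(m.sim)    # should be near alpha = 500 and beta = 12
  VarCorr(m.sim)  # standard deviations near sigma.b = 125 and sigma.e = 40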

Multi-level Models VII: simulating data

[Figure: scatterplot of simulated RT (roughly 400–1000 ms) against number of neighbors (0–8); horizontal lines mark the subject intercepts (alpha + b[subj]).]

- Participant-level clustering is easily visible
- This reflects the fact that (simulated) inter-participant variation (125 ms) is larger than (simulated) inter-trial variation (40 ms)
- And the (simulated) effects of neighborhood density are also visible


Statistical inference with multi-level models

RTij = α + βxij + bi + εij,    bi ∼ N(0, σb),  εij ∼ N(0, σε)

- It's insightful to create a model and simulate data (in this case to understand the type of model, but also more generally to understand your data). . .

- . . . but, more commonly, we have data and we need to infer a model

  - Specifically, the "fixed-effect" parameters α, β, and σε, plus the parameter governing inter-subject variation, σb
  - e.g., hypothesis tests about effects of neighborhood density: can we reliably infer that β is {non-zero, positive, . . . }?

- Fortunately, we can use the same principles as before to do this:

  - The principle of maximum likelihood
  - Or Bayesian inference


Fitting a multi-level model using maximum likelihood

RTij = α + βxij + bi + εij,    bi ∼ N(0, σb),  εij ∼ N(0, σε)

  > m <- lmer(time ~ neighbors.centered + (1 | participant), dat, REML=F)
  > print(m, corr=F)
  [...]
  Random effects:
   Groups      Name        Variance Std.Dev.
   participant (Intercept)  4924.9   70.177    ← σ̂b
   Residual                19240.5  138.710    ← σ̂ε
  Number of obs: 1760, groups: participant, 44

  Fixed effects:
                     Estimate Std. Error t value
  (Intercept)         583.787     11.082   52.68   ← α̂
  neighbors.centered    8.986      1.278    7.03   ← β̂
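One standard way (not spelled out on these slides) to ask whether β is reliably non-zero is a likelihood-ratio test between nested models, both fitted with REML=FALSE; a sketch, assuming the data frame dat from the slide above:

  m0 <- lmer(time ~ 1 + (1 | participant), dat, REML=FALSE)                  # no density effect
  m1 <- lmer(time ~ neighbors.centered + (1 | participant), dat, REML=FALSE) # with density effect
  anova(m0, m1)   # chi-squared likelihood-ratio test for the neighbors effect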


Interpreting parameter estimates

  Intercept             583.79
  neighbors.centered      8.99
  σ̂b                     70.18
  σ̂ε                    138.70

- The fixed effects are interpreted just as in a traditional single-level model:

  - The "average" RT for a non-word in this study is 583.79 ms
  - Every extra neighbor increases "average" RT by 8.99 ms

- Inter-trial variability σε also has the same interpretation:

  - Inter-trial variability for a given participant is Gaussian, centered around the participant+word-specific mean with standard deviation 138.7 ms

- Inter-participant variability σb is what's new:

  - Variability in average RT in the population from which the participants were drawn has standard deviation 70.18 ms

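As a quick worked check of these interpretations (an illustration, not from the slides), the predicted "average" RT is just the fixed-effect line; for a non-word whose centered neighborhood density is 2:

  583.79 + 2 * 8.99   # = 601.77 ms predicted "average" RT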

Page 84: A Brief and Friendly Introduction to Mixed-Effects Models in … · 2009. 4. 30. · A Brief and Friendly Introduction to Mixed-E ects Models in Psycholinguistics parameters b b 1

Inferences about cluster-level parameters

RTij = α + βxij + bi + εij,   where bi ∼ N(0, σb) and the noise εij ∼ N(0, σε)

I What about the participants’ idiosyncrasies themselves—the bi?
I We can also draw inferences about these—you may have heard about BLUPs (best linear unbiased predictors)
I To understand these: committing to fixed-effect and random-effect parameter estimates determines a conditional probability distribution on participant-specific effects:

P(bi | α̂, β̂, σ̂b, σ̂ε, X)

I The BLUPs are the conditional modes of the bi’s—the choices that maximize the above probability
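In lme4 these conditional modes can be extracted directly; the following is a minimal sketch, assuming the data frame dat and variable names from the running example (the object name fit is ours):

## Refit the varying-intercept model and pull out the conditional
## modes (BLUPs) of the participant-specific effects b_i
library(lme4)
fit <- lmer(RT ~ neighbors.centered + (1 | participant), dat)
blups <- ranef(fit)$participant      # one row per participant
head(blups)                          # column "(Intercept)" holds each b_i
## Participant-specific "average" RT = fixed intercept + b_i
part.means <- fixef(fit)["(Intercept)"] + blups[["(Intercept)"]]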


Inferences about cluster-level parameters II

I The BLUP participant-specific “average” RTs for this dataset are black lines on the base of this graph

[Figure: density over participant-specific “average” RTs; x-axis: Milliseconds (400–800), y-axis: Probability density]

I The solid line is a guess at their distribution
I The dotted line is the distribution predicted by the model for the population from which the participants are drawn
I Reasonably close correspondence
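A sketch of how such a comparison could be drawn in base R, reusing fit and part.means from the snippet above and the estimate σ̂b = 70.18 reported earlier:

## Empirical distribution of the BLUP participant means vs. the
## fitted population distribution N(alpha-hat, sigma_b-hat)
plot(density(part.means), xlab = "Milliseconds",
     ylab = "Probability density", main = "")
rug(part.means)                               # black lines at the base
curve(dnorm(x, mean = fixef(fit)["(Intercept)"], sd = 70.18),
      add = TRUE, lty = 3)                    # dotted population curve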


Inference about cluster-level parameters III

I Participants may also have idiosyncratic sensitivities to neighborhood density
I Incorporate by adding cluster-level slopes into the model:

RTij = α + βxij + b1i + b2i xij + εij,   where (b1i, b2i) ∼ N(0, Σb) and the noise εij ∼ N(0, σε)

I In R (once again we can omit the 1’s):

RT ∼ 1 + x + (1 + x | participant)

> lmer(RT ~ neighbors.centered +
       (neighbors.centered | participant), dat, REML=F)
[...]
Random effects:
 Groups      Name               Variance  Std.Dev. Corr
 participant (Intercept)         4928.625  70.2042
             neighbors.centered    19.421   4.4069 -0.307
 Residual                       19107.143 138.2286

These three numbers jointly characterize Σ̂b
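The components of Σ̂b can also be pulled out programmatically; a minimal sketch, assuming the random-slope fit above is stored in fit2:

## Fit the varying-intercept, varying-slope model and inspect Sigma_b-hat
fit2 <- lmer(RT ~ neighbors.centered +
             (neighbors.centered | participant), dat, REML = FALSE)
vc <- VarCorr(fit2)$participant   # 2 x 2 covariance matrix of (b1, b2)
attr(vc, "stddev")                # 70.2042 and 4.4069
attr(vc, "correlation")           # off-diagonal: -0.307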


Inference about cluster-level parameters IV

[Figure: participant-specific BLUPs, Intercept (450–750ms) against neighbors slope (0–10); one point per participant]

I Correlation visible in participant-specific BLUPs
I Participants who were faster overall also tend to be more affected by neighborhood density

Σ̂ = ( 70.20    −0.3097
      −0.3097    4.41  )
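One way such a plot could be produced, reusing fit2 and the conditional modes from the earlier snippets:

## Scatter the participant-specific intercepts against their
## neighborhood-density slopes (fixed effect + conditional mode)
re <- ranef(fit2)$participant
plot(fixef(fit2)["(Intercept)"] + re[["(Intercept)"]],
     fixef(fit2)["neighbors.centered"] + re[["neighbors.centered"]],
     xlab = "Intercept", ylab = "neighbors")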


Advantages of Multilevel Approach I

I There are several respects in which multi-level models go beyond AN(C)OVA:

1. They handle imbalanced datasets quite well, in contrast to AN(C)OVA (see e.g. Dixon, 2008; Baayen et al., 2008).
2. They handle empty cells.
3. Group-level means are fitted considering the number of observations in the group in a rational way (cf. shrinkage).
4. Multilevel models have more power.
5. As with any regression model, they make it easier to investigate effect direction and even shape.


Advantages of Multilevel Approach II

6. You can use non-linear link functions (e.g., logit models for binary-choice data).
7. You can cross cluster-level (random) effects (a minimal lmer sketch follows this list):

I Every trial belongs to both a participant cluster and an item cluster (and could belong to even more clusters, e.g. social variables, experimental groups)
I You can build a single unified model for inferences from your data
I ANOVA requires separate by-participants and by-items analyses (quasi-F′ is quite conservative)
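A sketch of crossed random intercepts in lmer syntax; the item column is hypothetical, standing in for whatever identifies the stimulus in a given dataset:

## Crossed random intercepts: each observation is simultaneously
## grouped by participant and by item
lmer(RT ~ neighbors.centered + (1 | participant) + (1 | item), dat)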


The logit link function for categorical data

I Much psycholinguistic data is categorical rather than continuous:

I Yes/no answers to alternation questions
I Speaker choice: (realized (that) her goals were unattainable)
I Cloze continuations, and so forth. . .

I Linear models are inappropriate; they predict continuous values
I We can stay within the multi-level generalized linear models framework but use different link functions and noise distributions to analyze categorical data
I e.g., the logit model (Agresti, 2002; Jaeger, 2008):

ηij = α + βXij + bi              (linear predictor)
ηij = log [ µij / (1 − µij) ]    (link function)
P(Y = y; µij) = µij              (binomial noise distribution)
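To move between log-odds and probabilities one inverts the link; a small numeric illustration in R (the value of η is arbitrary):

eta <- 1.5          # a linear-predictor value, in log-odds
mu  <- plogis(eta)  # inverse logit: exp(eta) / (1 + exp(eta)) = ~0.8176
qlogis(mu)          # the logit link recovers eta = 1.5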


The logit link function for categorical data II

I We’ll look at the effect of neighborhood density on correct identification as a non-word in Bicknell et al. (2008)
I Assuming that any effect is linear in log-odds space (see Jaeger (2008) for discussion)

> lmer(correct ~ neighbors.centered + (1 | participant),
       dat, family="binomial")
[...]
Random effects:
 Groups      Name        Variance Std.Dev.
 participant (Intercept) 0.9243   0.9614
Number of obs: 1760, groups: participant, 44

Fixed effects:
                   Estimate Std. Error z value Pr(>|z|)
(Intercept)         3.16310    0.18998  16.649  < 2e-16 ***
neighbors.centered -0.18207    0.03483  -5.228 1.72e-07 ***

I α̂—participants are usually right
I σ̂b (note there is no σ̂ε for logit models)
I β̂—the effect is small compared with inter-subject variation
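As a quick check on the α̂ annotation, the estimates above can be converted back to probabilities:

plogis(3.16310)            # ~0.959: probability of a correct response at the
                           # average neighborhood density, average participant
plogis(3.16310 - 0.18207)  # ~0.952: one extra neighbor lowers it slightly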


Summary

I Multi-level models may seem strange and foreign
I But all you really need to understand them is three basic things:

I Generalized linear models
I The principle of maximum likelihood
I Bayesian inference


Reviewing GLMs III: Bayesian model fitting

P({βi}, σ | Y) = [ P(Y | {βi}, σ) × P({βi}, σ) ] / P(Y)   (likelihood × prior, normalized)

I Alternative to maximum-likelihood: Bayesian model fitting
I Simple (uniform, non-informative) prior: all combinations of (α, β, σ) equally probable
I Multiply by likelihood → posterior probability distribution over (α, β, σ)
I Bound the region of highest posterior probability containing 95% of probability density → HPD confidence region

[Figure: marginal posterior P(β | Y) with pMCMC = 0.46; joint posterior samples of ⟨α̂, β̂⟩ over Intercept (200–600) × Neighbors (−10–20); marginal posterior P(α | Y)]

I pMCMC (Baayen et al., 2008) is 1 minus the coverage of the largest possible symmetric confidence interval lying wholly on one side of 0
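Under that definition, pMCMC is straightforward to approximate from posterior samples; a minimal sketch, with a simulated vector standing in for real MCMC output:

## pMCMC: the largest symmetric (equal-tail) interval wholly on one
## side of 0 has coverage 1 - 2*min(Pr(beta < 0), Pr(beta > 0))
pmcmc <- function(samples) {
  p <- mean(samples < 0)
  2 * min(p, 1 - p)
}
set.seed(1)
beta.samples <- rnorm(10000, mean = 0.2, sd = 1)  # stand-in for MCMC draws
pmcmc(beta.samples)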


Bayesian inference for multilevel models

P({βi}, σb, σε | Y) = [ P(Y | {βi}, σb, σε) × P({βi}, σb, σε) ] / P(Y)   (likelihood × prior, normalized)

I We can also use Bayes’ rule to draw inferences about fixed effects
I Computationally more challenging than with single-level regression; Markov-chain Monte Carlo (MCMC) sampling techniques allow us to approximate it

[Figure: MCMC samples of ⟨Intercept, neighbors⟩, with marginal posteriors P(α | Y) and P(β | Y)]
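Given such samples, the 95% HPD region from the previous slides can be computed with the coda package; a sketch assuming a matrix of posterior draws (here simulated):

## 95% highest-posterior-density bounds, one row per parameter
library(coda)
set.seed(1)
draws <- cbind(intercept = rnorm(5000, 590, 15),  # stand-in draws
               neighbors = rnorm(5000, 9, 2))
HPDinterval(as.mcmc(draws), prob = 0.95)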


References I

Agresti, A. (2002). Categorical Data Analysis. Wiley, second edition.

Baayen, R. H., Davidson, D. J., and Bates, D. M. (2008). Mixed-effects modeling with crossed random effects for subjects and items. Journal of Memory and Language. In press.

Bicknell, K., Elman, J. L., Hare, M., McRae, K., and Kutas, M. (2008). Online expectations for verbal arguments conditional on event knowledge. In Proceedings of the 30th Annual Conference of the Cognitive Science Society, pages 2220–2225.

Jaeger, T. F. (2008). Categorical data analysis: Away from ANOVAs (transformation or not) and towards logit mixed models. Journal of Memory and Language. In press.