Longitudinal data

• Used commonly in developmental psychology, but also in other fields

• Longitudinal data are very useful for determining trajectories of functioning (e.g., depression over 5 years)

• Also useful for determining causal relations

My favorite example

[Figure: Stress → Psycho. adaptation, path coefficient = -.58**]

Note. These data are typically obtained at one point in time, i.e., they are concurrent. Can we draw the arrow as we have here?

Correlation is not causation! (Have we heard that before?)

[Figure: Stress and Psycho. adaptation, correlated at r = -.58**]

What is the direction of causation, if any exists? Does stress cause poorer psychological adaptation, or the reverse? Can’t tell with these data. Can propose your hypothesis (previous page), but can’t prove this relationship this way.

How about this?

[Figure: Stress (Time 1) → Psycho. adaptation (Time 2), path coefficient = -.36*, R² = .20]

Well, there are a lot of variables that might be affecting this relationship

[Figure: Psycho. adaptation (Time 1), SES, and Stress as possible influences on Psycho. adaptation (Time 2)]

The recommended method (Pedhazur, 1997): Residualized regression

[Figure: Psycho. adaptation (Time 1), SES, and Stress predicting Psycho. adaptation (Time 2). Psycho. adaptation (Time 1) is entered first; all other variables are entered subsequently. Values shown: R² = .85, R² change = .02, R² change = .05]

Note. However, exogenous variables like SES, gender, etc. may be entered first as covariates.
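A minimal sketch of the residualized-regression logic, in plain Python. The eight cases are made up, the helper names (`solve`, `r_squared`) are hypothetical, and SES is left out to keep it short. Step 1 enters the Time 1 score; step 2 adds stress; the R² change is what stress explains in Time 2 adaptation beyond its own earlier level.

```python
# Residualized (hierarchical) regression on hypothetical data.

def solve(a, b):
    """Solve a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def r_squared(predictors, y):
    """R-squared from an OLS regression of y on the predictors (with intercept)."""
    rows = [[1.0] + [p[i] for p in predictors] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(k)] for a in range(k)]
    xty = [sum(r[a] * yi for r, yi in zip(rows, y)) for a in range(k)]
    coef = solve(xtx, xty)
    fitted = [sum(c * v for c, v in zip(coef, r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

# Hypothetical scores for eight subjects
adapt_t1 = [10, 12, 9, 15, 11, 14, 8, 13]
stress_t1 = [7, 5, 8, 3, 6, 2, 9, 5]
adapt_t2 = [9, 12, 8, 15, 10, 15, 7, 12]

r2_step1 = r_squared([adapt_t1], adapt_t2)             # Time 1 score entered first
r2_step2 = r_squared([adapt_t1, stress_t1], adapt_t2)  # stress entered next
r2_change = r2_step2 - r2_step1
print(f"R2 step 1 = {r2_step1:.3f}, R2 change for stress = {r2_change:.3f}")
```

Because the Time 1 score goes in first, whatever stress adds is about change in adaptation, not its level.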

So, how would it work with the stress and coping paradigm?

[Figure: Stress and Psycho. adaptation, each measured at Time 1 and Time 2]

How do we assess causality here?

[Figure: Stress and Psycho. adaptation at Time 1 and Time 2. The Time 1 → Time 2 path within each variable is labeled "Stability & autocorrelation"; the two cross-lagged paths are labeled A and B.]

A: Stress1 -> Outcome2
B: Outcome1 -> Stress2
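One hedged way to put numbers on paths A and B is two residualized regressions, each controlling for the outcome's own Time 1 score. The sketch below uses the standard two-predictor formula for a standardized regression weight. The data and the function names (`pearson`, `cross_lag_beta`) are hypothetical, and this is a regression shortcut, not a full structural model.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def cross_lag_beta(cause_t1, outcome_t1, outcome_t2):
    """Standardized weight of cause_t1 predicting outcome_t2,
    controlling for outcome_t1 (the stability path)."""
    r_c_o2 = pearson(cause_t1, outcome_t2)
    r_o1_o2 = pearson(outcome_t1, outcome_t2)
    r_c_o1 = pearson(cause_t1, outcome_t1)
    return (r_c_o2 - r_o1_o2 * r_c_o1) / (1 - r_c_o1 ** 2)

# Hypothetical scores for eight subjects at two time points
stress_t1 = [7, 5, 8, 3, 6, 2, 9, 5]
adapt_t1 = [10, 11, 9, 13, 12, 12, 10, 14]
stress_t2 = [6, 5, 7, 4, 6, 3, 8, 4]
adapt_t2 = [9, 12, 8, 14, 11, 13, 8, 13]

path_a = cross_lag_beta(stress_t1, adapt_t1, adapt_t2)   # A: Stress1 -> Outcome2
path_b = cross_lag_beta(adapt_t1, stress_t1, stress_t2)  # B: Outcome1 -> Stress2
print(f"A = {path_a:.2f}, B = {path_b:.2f}")
```

If A is reliably non-zero and B is not, that pattern favors stress leading to later adaptation over the reverse, though it still does not prove it.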

A simpler example: Stability and autocorrelation

• Used a dataset by Wheaton et al. (1977), in which they were interested in whether alienation was reasonably stable over a period of 4 years

• Latent variable of alienation, made up of anomie and powerlessness

• Also curious whether socio-economic status would affect this stability

Wheaton et al. model

[Figure: Anomie 67 and Powerl 67 load on Alienation 67; Anomie 71 and Powerl 71 load on Alienation 71; one loading per latent variable is fixed at 1]

Syntax, of course

Stability of alienation, First Model: uncorrelated error terms
! The matrix below holds variances and covariances, so it is read with CM;
! MA=KM then asks LISREL to analyze the correlation matrix
DA NI=6 NO=932 MA=KM
CM
11.834
6.947 9.364
6.819 5.091 12.532
4.783 5.028 7.495 9.986
-3.839 -3.889 -3.841 -3.625 9.610
-2.190 -1.883 -2.175 -1.878 3.552 4.503
la
anomia67 power67 anomia71 power71 educatin socioind
se
1 2 3 4 /
MO NY=4 NE=2 BE=SD PS=DI TE=SY
le
alien67 alien71
! Free the powerlessness loadings; fix the anomia loadings at 1.0
FR LY 2 1 LY 4 2
VA 1.0 LY 1 1 LY 3 2
PD
OU SE TV MI ND=2

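First model: uncorrelated error terms

[Figure: estimates for the first model]
Error variances: Anomie 67 = .31, Powerl 67 = .37, Anomie 71 = .27, Powerl 71 = .39
Loadings on Alienation 67: Anomie 67 = 1.0, Powerl 67 = .95
Loadings on Alienation 71: Anomie 71 = 1.0, Powerl 71 = .91
Alienation 67 variance = .69; Alienation 67 → Alienation 71 = .77; Alienation 71 residual variance = .33; R² = .56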

How does it fit?

• χ²(1) = 61.11
• RMSEA = .25
• NFI = .96
• PNFI = .16
• CFI = .96
• RFI = .81
• Crit. N = 102.09
• GFI = .97
• AGFI = .69
• PGFI = .10
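For a feel of where the RMSEA comes from: it can be recovered from the chi-square, its df, and the sample size (NO=932 in the syntax). A small sketch, assuming the common N − 1 form of the formula; software versions differ slightly in which form they use, and the function name is ours.

```python
import math

def rmsea(chi_square, df, n):
    """Root mean square error of approximation (N - 1 variant)."""
    return math.sqrt(max(chi_square - df, 0.0) / (df * (n - 1)))

rmsea_first = rmsea(61.11, 1, 932)
print(round(rmsea_first, 2))  # 0.25
```

With only 1 df, a chi-square of 61 leaves a lot of misfit per degree of freedom, which is why the RMSEA looks so bad even though NFI and CFI look respectable.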

What’s da problem?

• Would like to allow correlated error . . .

• But there’s another problem: namely, there’s only 1 df; freeing even one more parameter would leave 0 df, and with it no test of fit.

• Ooops, now what? Well, can increase the scope of the model to include other variables, and “borrow” some dfs from those relationships

• Two SES variables: education and socioeconomic indicator

• Make the model a bit more complex (see next page)
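The df count for the first model can be checked by hand: distinct variances and covariances among the observed variables, minus free parameters. The sketch below tallies the free parameters from the MO and FR lines of the syntax. The tally is our reading of that syntax, and the function name is hypothetical.

```python
def model_df(n_observed, n_free_parameters):
    """df = distinct variances/covariances minus free parameters."""
    n_moments = n_observed * (n_observed + 1) // 2
    return n_moments - n_free_parameters

# First model: four observed variables, so 4 * 5 / 2 = 10 moments
free_parameters = {
    "free loadings (LY 2 1, LY 4 2)": 2,
    "error variances (diagonal of TE)": 4,
    "structural path (BE 2 1)": 1,
    "latent variances (diagonal of PS)": 2,
}
n_free = sum(free_parameters.values())

df_first = model_df(4, n_free)         # 10 - 9
df_one_more = model_df(4, n_free + 1)  # one correlated error freed
df_two_more = model_df(4, n_free + 2)  # both correlated errors freed
print(df_first, df_one_more, df_two_more)
```

Adding observed variables raises the number of moments faster than the number of parameters, which is where the "borrowed" dfs come from.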



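Second model: new latent variable, uncorrelated error terms

[Figure: the first model plus a latent SES 67 variable with two indicators, Education and Socio-econ indicator; one loading per latent variable is fixed at 1]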

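Second model: uncorrelated error terms

[Figure: estimates for the second model]
Error variances: Anomie 67 = .34, Powerl 67 = .34, Anomie 71 = .30, Powerl 71 = .36, Education 67 = .31, SEI 67 = .58
Loadings: Anomie 67 = 1.0, Powerl 67 = 1.0, Anomie 71 = 1.0, Powerl 71 = .95, Education 67 = 1.0, SEI 67 = .78
Paths: SES → Alienation 67 = -.55*, SES → Alienation 71 = -.15*, Alienation 67 → Alienation 71 = .68**
Residual variances: Alienation 67 = .45, Alienation 71 = .30
R² = .32 (Alienation 67), R² = .58 (Alienation 71)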

A better fit?

• χ²(6) = 71.47
• RMSEA = .11
• NFI = .97
• PNFI = .39
• CFI = .97
• RFI = .92
• Crit. N = 219.99
• GFI = .98
• AGFI = .91
• PGFI = .28

Now, what about correlated error? Would that help?



Third model: 3 latents & correlated error for anomie

[Figure: the second model with correlated error terms added between Anomie 67 and Anomie 71]

Now, do we have a better fit?

• χ²(5) = 6.33
• RMSEA = .02
• NFI = 1.00
• PNFI = .33
• CFI = 1.00
• RFI = .99
• Crit. N = 2218.49
• GFI = 1.00
• AGFI = .99
• PGFI = .24
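Since the third model is the second model with one extra free parameter, the two are nested and can be compared with a chi-square difference test. A small sketch using the reported values; the p-value helper is ours and is valid only for a 1-df difference (for 1 df, the chi-square upper tail equals erfc(sqrt(x / 2))).

```python
import math

def chi2_sf_df1(x):
    """Upper-tail probability of a chi-square variate with 1 df."""
    return math.erfc(math.sqrt(x / 2.0))

chi2_second, df_second = 71.47, 6  # uncorrelated error terms
chi2_third, df_third = 6.33, 5     # correlated error for anomie

delta_chi2 = chi2_second - chi2_third
delta_df = df_second - df_third
p_value = chi2_sf_df1(delta_chi2)
print(f"chi-square difference = {delta_chi2:.2f} on {delta_df} df, p = {p_value:.2g}")
```

A drop of about 65 chi-square points for a single df is far beyond the 3.84 needed at the .05 level, so the correlated error for anomie clearly earns its keep.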

What about correlated error for powerlessness?

• I ran a fourth model, and the change in chi-square was trivial; in other words, the correlated error for anomie was important, but the one for powerlessness was not.

• Why not? Who knows? Sometimes the same measure carries forward more error than other measures.

• Defensible to correlate all identical measures, but maybe should check to find out whether it makes a difference or not.

Other techniques?

• There are a number of other approaches to longitudinal data.

• Let’s briefly consider the SIMPLEX model

• One obtains measures from the same subjects on the same measure over time, usually more than three times.

• Find that correlations between close time points are higher than between distant time points

Simplex for college grades

• Humphreys (1968) got eight semesters of grade point average (0 to 4 scale)

• Wanted to see how stable the grades were over time

• Simply a matter of computing betas between etas (see next page)

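A simplex model

[Figure: a chain of latent variables (etas), each measured by one indicator (y1, y2, y3, y4, etc.) and each predicting the next]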

Findings

• Findings were not all that earth-shattering:

– Correlations between contiguous GPAs were about .90; the correlation between T1 and T8 was .62

– Found that the stabilities (i.e., betas) stayed about the same across time

– Reliabilities of GPA grew slightly over time; in other words, it became a more reliable indicator

– Doesn’t tell us anything about the direction of change in the variable (i.e., do GPAs go up, down, or stay the same?). Next method can tell us something about this.
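To see why a simplex gives correlations that fall off with distance in time, here is a toy sketch. It assumes a standardized quasi-simplex with one constant stability (beta) and one constant reliability, so the implied correlation between occasions k steps apart is reliability × beta^k. The values .94 and .96 are assumptions picked to roughly echo the reported pattern; they are not Humphreys's estimates, and holding reliability constant simplifies what he found.

```python
def quasi_simplex_corr(n_occasions, beta, reliability):
    """Implied correlations among observed scores when true scores follow
    a first-order chain with constant stability and constant reliability."""
    corr = []
    for i in range(n_occasions):
        row = []
        for j in range(n_occasions):
            if i == j:
                row.append(1.0)
            else:
                row.append(reliability * beta ** abs(i - j))
        corr.append(row)
    return corr

# Eight semesters; beta and reliability are assumed values
corr = quasi_simplex_corr(8, beta=0.94, reliability=0.96)
first_row = [round(r, 2) for r in corr[0]]
print(first_row)  # correlations of semester 1 with semesters 1 through 8
```

Each extra step multiplies in another beta, so distant semesters correlate less than neighbors even when every single step is highly stable.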

Other methods for analyzing change

• Hierarchical linear modeling (Raudenbush & Bryk) or latent growth curve modeling is all the rage right now.

• Can use LISREL (HLM is sold by SSI too) to perform this analysis. Multilevel modeling (in SAS) does the same thing.

• Must have at least three time points (they don’t have to be equidistant)

• Don’t need a gazillion subjects to do this analysis: big advantage

What’s the logic of HLM?

• Think of it within a regression perspective

• Want to perform a regression in a case where subjects are nested within another variable. For example, students are nested within schools. You have information about school-level variables, but can’t regress these variables in the typical fashion.

Student-level variables: academic ability scores, gender, and socio-economic status

School-level variables: teacher/student ratio, tax dollars spent per pupil, amount of teacher training.

Dependent variable: grade point average

GPA = slope(ability) + slope(gender) + slope(ratio) + slope(tax)

[Figure: individual students grouped into schools of 123, 250, 34, 194, and 62 students]

Problem: students are nested within a higher level

Why is this a problem?

• If you just throw all the variables into a single regression equation, you would have many students with the same school-level variables.

• You would treat the school-level variables as though they are individual-level variables, i.e., varying for each individual.

• In other words, those values would be correlated among individuals.

• HLM “separates” out the two levels (hence the name, multilevel modeling) in a more appropriate statistical fashion.
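One hedged way to see how much the nesting matters is the intraclass correlation: the share of outcome variance that lies between groups. The sketch below computes the one-way ANOVA estimate for balanced groups. The GPAs, the school names, and the function name are all made up.

```python
def icc1(groups):
    """One-way ANOVA intraclass correlation for equal-sized groups."""
    g = len(groups)     # number of groups (schools)
    k = len(groups[0])  # students per school (balanced design assumed)
    group_means = [sum(grp) / k for grp in groups]
    grand_mean = sum(group_means) / g
    ms_between = k * sum((m - grand_mean) ** 2 for m in group_means) / (g - 1)
    ss_within = sum((x - m) ** 2
                    for grp, m in zip(groups, group_means) for x in grp)
    ms_within = ss_within / (g * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)

# Hypothetical GPAs for four students in each of three schools
schools = {
    "school_a": [3.6, 3.4, 3.8, 3.5],
    "school_b": [2.5, 2.8, 2.6, 2.9],
    "school_c": [3.0, 3.2, 2.9, 3.1],
}
icc = icc1(list(schools.values()))
print(f"ICC = {icc:.2f}")
```

An ICC well above zero means students in the same school resemble each other, so treating them as independent cases in one regression overstates how much information you have.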

Why am I telling you all this?

• You might have an occasion when your data are nested (within institutions, within geographical locations, etc.) and you want to consider both individual-level and group-level data.

• A second reason: it works brilliantly for longitudinal designs in that repeated measurements are nested within subjects. There are certain statistical advantages in treating the data in this fashion.
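For the longitudinal case, the two levels can be written out as a simple linear growth model. This is a sketch in the style of Raudenbush and Bryk's notation; the symbols and the single group-level predictor are assumptions for illustration, not something taken from these slides.

```latex
\begin{align*}
\text{Level 1 (occasions within subject } i\text{):}\quad
  & Y_{ti} = \pi_{0i} + \pi_{1i}\,\mathrm{Time}_{ti} + e_{ti} \\
\text{Level 2 (between subjects):}\quad
  & \pi_{0i} = \beta_{00} + \beta_{01}\,\mathrm{Group}_{i} + r_{0i} \\
  & \pi_{1i} = \beta_{10} + \beta_{11}\,\mathrm{Group}_{i} + r_{1i}
\end{align*}
```

Each subject gets an intercept and a slope at level 1, and level 2 asks whether those differ by group. A non-zero β11 is a group difference in slopes, i.e., a Group × Time interaction.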

[Figure: PPVT scores (vertical axis, 0 to 160) in 1993, 1995, and 1997 for the Chin-Amer and Euro-Amer groups]

Found an Ethnic group X Time of measurement interaction (p < .001) on PPVT scores

Advantages

• Can learn similar things from repeated measures MANOVA and regression, but HLM is more powerful in its ability to combine variables across nested levels.

Another topic: MIMIC (Multiple indicators and multiple causes)

• Back to LISREL

• Sometimes you’ll have exogenous variables that predict something, and indicators that can be combined into a latent variable, but how does one combine them in a sensible model?

• The exogenous variables are X indicators, and the other variables are Y indicators.

[Figure: Income, Occupation, and Education (X variables) predict the latent variable Social participation, which is measured by Church attend., Memberships, and Friends seen]

One latent variable is predicted by Xs: mixed model

Mixed models

• The implication is that you can combine observed and latent variables in a single model.

• Why? Might not have multiple indicators of a construct. (But there are ways around this: splitting a measure into 2 or 3 equal parts.)

• Why couldn’t Hodge and Treiman (1968) create a latent variable—SES—from the three X indicators? (see next page)

• May want to know the strength of each specific predictor.

[Figure: Income, Occupation, and Education as indicators of a latent SES variable, which predicts Social participation (indicators: Church attend., Memberships, Friends seen)]

Two latent variables instead of one

The end, or is it just the beginning?

• Thanks for listening, and struggling through a lot of arcane jargon and arbitrary nomenclature.

• I hope that you have a new appreciation for the possibilities of structural equation modeling, whether you use LISREL, AMOS, or EQS.

• May the thought of these possibilities lead you to construct more sophisticated studies and become rich and famous! Well, maybe famous. Okay, maybe mildly known among a few other isolated academics. . . At the least, have fun.
