Basics of Structural Equation Modeling


Transcript of Basics of Structural Equation Modeling

Page 1: Basics of Structural Equation Modeling

Basics of Structural Equation Modeling

Dr. Sean P. Mackinnon

Page 2: Basics of Structural Equation Modeling

Virtually every model you’ve already run using the Ordinary Least Squares approach (linear regression; uses sums of squares) can also be run using SEM

The difference is primarily in how the parameters and SEs are calculated (SEM uses Maximum Likelihood Estimation instead of sums of squares)

First, let’s get used to the notation of SEM diagrams

Page 3: Basics of Structural Equation Modeling

Correlation Coefficient

[Diagram: Depression ↔ Anxiety, covariance = .50]

Rectangles indicate observed variables

Double-headed arrows indicate covariances

(so if standardized variables are used, it’s a Pearson r)

Page 4: Basics of Structural Equation Modeling

Linear Regression

[Diagram: Depression → Anxiety, path = .50]

Single headed arrows are paths

In this example, depression is the IV and anxiety is the DV

IVs = exogenous variables (no arrows pointing to them)
DVs = endogenous variables (arrows pointing to them)

Page 5: Basics of Structural Equation Modeling

Variances and Residual Variances

[Diagram: Depression → Anxiety, path = .50]

Exogenous variables also have a variance as a parameter

Endogenous variables have residual variance as a parameter (i.e., error; the portion of variance unexplained by model)

These are rarely drawn out explicitly in the diagrams, but worth remembering for later when we’re counting parameters and for more advanced applications.

Page 6: Basics of Structural Equation Modeling

Multiple Regression

[Diagram: Perfectionism, Depression, and SES predicting Anxiety; path coefficients, predictor correlations, and the R² value are shown (.40, .26, −.11, .25, .09, .30, .01)]

The correlations among the IVs are specified in SPSS too

You just don’t get the output from it

R² values are often put in the top right corner of DVs

Page 7: Basics of Structural Equation Modeling

Moderation

[Diagram: Perfectionism, Stress, and Perfectionism * Stress predicting Depression; path coefficients, predictor correlations, and the R² value are shown (.40, .26, −.11, .25, .09, .30, .01)]

Moderation is specified the same way as multiple regression

The only difference is that one of the predictors is an interaction term (Perfectionism * Stress)

Page 8: Basics of Structural Equation Modeling

Mediation

[Diagram: Perfectionism → Conflict (a-path), Conflict → Depression (b-path), Perfectionism → Depression (c′-path)]

Instead of a two-step process, it’s all done in one single analysis

If you want to get the c-path, run one more linear regression w/o the conflict variable included

Usually you’d use bootstrapping to test the indirect effect (a*b) in SEM
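In lavaan, this one-analysis mediation model can be sketched with labeled paths and a defined indirect effect. A minimal sketch; the variable names (perfectionism, conflict, depression) and the data frame dat are hypothetical:

```r
library(lavaan)

# Hypothetical variable names; a, b, and cp are path labels
med.model <- '
  conflict   ~ a * perfectionism                   # a-path
  depression ~ b * conflict + cp * perfectionism   # b-path and c-prime path
  indirect := a * b                                # defined indirect effect
'

# Bootstrap the indirect effect (assumes a data frame called dat):
# fit <- sem(med.model, data = dat, se = "bootstrap", bootstrap = 1000)
# parameterEstimates(fit, boot.ci.type = "perc")
```

The := operator defines the indirect effect as a new parameter, so lavaan tests a*b directly rather than requiring a separate step.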

Page 9: Basics of Structural Equation Modeling

Independent t-test

[Diagram: Sex → Anxiety, B = 1.25]

Sex is coded as 0 (women) or 1 (men)

Use unstandardized coefficients

The value of the intercept is the mean for women

The intercept + the slope is the mean for men

If p-value for the slope < .05, the means are different
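This equivalence is easy to check in base R with simulated data (the variable names and values below are made up for illustration):

```r
# Simulated data; 0 = women, 1 = men, true mean difference of 1.25
set.seed(1)
sex <- rep(c(0, 1), each = 50)
anxiety <- 5 + 1.25 * sex + rnorm(100)

# The regression slope test is equivalent to an independent t-test
fit <- lm(anxiety ~ sex)
coef(fit)["(Intercept)"]  # estimated mean for women
sum(coef(fit))            # intercept + slope = estimated mean for men
summary(fit)$coefficients["sex", "Pr(>|t|)"]  # p-value for the mean difference
```

The p-value on the slope matches t.test(anxiety ~ sex, var.equal = TRUE), since both are the same linear model.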

Page 10: Basics of Structural Equation Modeling

One-Way ANOVA (3 groups)

[Diagram: Treatment 1 (dummy) and Treatment 2 (dummy) → Anxiety]

Original variable: 1 = Control group; 2 = Treatment 1; 3 = Treatment 2

Treatment 1 (dummy): 1 = Treatment 1, 0 = other groups
Treatment 2 (dummy): 1 = Treatment 2, 0 = other groups

Similar to the t-test, you can get means for each group

This kind of dummy coding compares treatments to the control group
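The same dummy coding can be sketched in base R (simulated data; group labels and means are hypothetical):

```r
# Simulated 3-group data
set.seed(2)
group <- rep(c("control", "treat1", "treat2"), each = 40)
anxiety <- c(rnorm(40, 10), rnorm(40, 8), rnorm(40, 7))

# Dummy codes: each treatment compared against the control group
t1 <- as.numeric(group == "treat1")
t2 <- as.numeric(group == "treat2")

fit <- lm(anxiety ~ t1 + t2)
coef(fit)  # intercept = control mean; each slope = treatment mean minus control mean
```

Recoding which group is the reference (here, control) changes which comparisons the slopes represent, not the overall model fit.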

Page 11: Basics of Structural Equation Modeling

SEM can also address more complicated questions

Page 12: Basics of Structural Equation Modeling

Path Analysis

Complex relationships between variables can be used to test theory

Mackinnon et al. (2011)

Page 13: Basics of Structural Equation Modeling

Confirmatory Factor Analysis

[Diagram: latent Negative Affect → Anger, Shame, Sadness]

Ovals represent latent variables

Paths are factor loadings in this diagram

Conceptually, this is like an EFA except you have an idea ahead of time about what items should comprise the latent variable

(and we can test hypotheses!)

Page 14: Basics of Structural Equation Modeling

Structural Equation Modeling

Like path analysis, except it looks at relationships among latent variables

Useful, because it accounts for the unreliability of measurement, so it offers less biased parameter estimates

Also lets you test virtually any theory you might have

Mackinnon et al. (2012)

Page 15: Basics of Structural Equation Modeling

Rules for Building Models

• Every path, correlation, and variance is a parameter

• The number of parameters cannot exceed the number of data points
– If it does, your model is under-identified, and can’t be estimated using SEM

• Data points are calculated by: p(p + 1) / 2
– Where p = the number of observed variables
– Ex. with 3 variables: 3(4) / 2 = 6
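The data-point count is easy to verify with a one-line helper (the function name is ours, not standard):

```r
# Number of unique variances and covariances among p observed variables
data_points <- function(p) p * (p + 1) / 2

data_points(3)  # 6
data_points(4)  # 10
```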

Page 16: Basics of Structural Equation Modeling

A just-identified or “saturated” model

[Diagram: Perfectionism, Anxiety, Depression, and SES, all pairwise correlated]

In this case, 4 variables: 4(5) / 2 = 10 possible data points

Ten parameters: 4 variances + 6 covariances

So really, it’s a model where everything is related to everything else! Not very parsimonious

Page 17: Basics of Structural Equation Modeling

Another just-identified model

[Diagram: Perfectionism and Anxiety (correlated) each predicting Depression]

In this case, 3 variables: 3(4) / 2 = 6 possible data points

Six parameters: 3 variances, 1 covariance, 2 paths

Note that the variances for endogenous variables will be residual variances (the parts unexplained by the predictors)

Page 18: Basics of Structural Equation Modeling

More Parsimonious Models

Just-identified models are interesting, but often not parsimonious (i.e., everything is related to everything)

Are there paths or covariances in your model that you can remove, but still end up with a well-fitting model?

Path analysis and SEM can answer these questions. When we fit models with fewer parameters than data points, we can see if the model is still a good “fit” with some paths omitted

Page 19: Basics of Structural Equation Modeling

An identified mediation model

[Diagram: Perfectionism → Conflict (a-path), Conflict → Depression (b-path); the c′-path from Perfectionism to Depression is fixed to zero]

In this case, 3 variables: 3(4) / 2 = 6 possible data points

Five parameters: 3 variances, 2 paths (the path fixed to zero is no longer freely estimated)

Can we remove the c′ path from this mediation model? This model is more parsimonious, so it would be preferred. Fit indices judge the adequacy of this model.

Page 20: Basics of Structural Equation Modeling

Model Fit

Fit refers to the ability of a model to reproduce the data (i.e., usually the variance-covariance matrix).

Predicted by model:
                1     2     3
1. Perfect     2.6
2. Conflict    .40   5.2
3. Depression  0     .32   3.5

Actually observed in your data:
                1     2     3
1. Perfect     2.5
2. Conflict    .39   5.3
3. Depression  .03   .40   3.1

So, in SEM we compare these matrices (model-created vs. actually observed in your data), and see how discrepant they are. If they are basically identical, the model “fits well”

Page 21: Basics of Structural Equation Modeling

Model Fit: χ²

We condense these matrix comparisons into a SINGLE NUMBER:

Chi-square (χ²), with df = (data points) – (estimated parameters)

It tests the null hypothesis that the model fits the data well (i.e., the model covariance matrix is very similar to the observed covariance matrix)

Thus, non-significant chi-squares are better!

Page 22: Basics of Structural Equation Modeling

Problems with χ2

Simulation studies show that the chi-square is TOO sensitive: it rejects models far more often than it should.

More importantly, it is tied to sample size. As sample size increases, the likelihood of a significant chi-square increases.

Thus, there is a very high Type I error rate, and it gets worse as sample size increases. We need alternative methods that account for this.

Page 23: Basics of Structural Equation Modeling

Incremental Fit Indices

Incremental fit indices compare your model to the fit of the baseline or “null” model:

[Diagram: Perfectionism, Depression, and Conflict, with all paths and covariances fixed to zero]

The null model fixes all covariances and paths to zero, so every variable is unrelated

Technically, it is the most parsimonious model, but not a useful one

Page 24: Basics of Structural Equation Modeling

Incremental Fit Indices

Comparative Fit Index (CFI)

CFI = [d(Null Model) − d(Proposed Model)] / d(Null Model)

Let d = χ² − df, where df are the degrees of freedom of the model. If the index is greater than one, it is set to one, and if less than zero, it is set to zero.

Values range from 0 (no fit) to 1.0 (perfect fit)

http://davidakenny.net/cm/fit.htm
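The CFI formula above can be sketched as a small R function (the fit values plugged in are hypothetical):

```r
# CFI from the chi-square and df of the null and proposed models,
# clamped to the [0, 1] range as described above
cfi <- function(chisq_null, df_null, chisq_prop, df_prop) {
  d_null <- chisq_null - df_null
  d_prop <- chisq_prop - df_prop
  min(max((d_null - d_prop) / d_null, 0), 1)
}

# Hypothetical fit results: null chi-sq = 200 (df = 10), proposed = 15 (df = 8)
cfi(200, 10, 15, 8)  # about .96
```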

Page 25: Basics of Structural Equation Modeling

Tucker-Lewis Index

Tucker-Lewis Index (TLI): assigns a penalty for model complexity (prefers more parsimonious models).

TLI = [χ²/df (Null Model) − χ²/df (Proposed Model)] / [χ²/df (Null Model) − 1]

Values range from 0 (no fit) to 1.0 (perfect fit)

The TLI is more conservative, and will almost always reject more models than the CFI

http://davidakenny.net/cm/fit.htm
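As with the CFI, the TLI formula translates directly into R (same hypothetical fit values as before):

```r
# TLI from the chi-square/df ratios of the null and proposed models
tli <- function(chisq_null, df_null, chisq_prop, df_prop) {
  (chisq_null / df_null - chisq_prop / df_prop) / (chisq_null / df_null - 1)
}

# Hypothetical fit results: null chi-sq = 200 (df = 10), proposed = 15 (df = 8)
tli(200, 10, 15, 8)  # about .95
```

With these numbers the TLI (.95) comes out slightly below the CFI (.96), illustrating its extra penalty for model complexity.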

Page 26: Basics of Structural Equation Modeling

Parsimonious Indices

Root Mean Square Error of Approximation (RMSEA)

Similar to the others, except that it doesn’t actually compare to the null model, and (like the TLI) penalizes more complex models:

RMSEA = √(χ² − df) / √[df(N − 1)]

Can also calculate a 90% CI for the RMSEA

http://davidakenny.net/cm/fit.htm
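The RMSEA formula is likewise a one-liner in R (the fit values plugged in are hypothetical):

```r
# RMSEA from the model chi-square, its df, and the sample size N;
# chi-square values below the df are truncated to zero
rmsea <- function(chisq, df, n) {
  sqrt(max(chisq - df, 0)) / sqrt(df * (n - 1))
}

# Hypothetical fit: chi-sq = 15, df = 8, N = 200
rmsea(15, 8, 200)  # about .066
```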

Page 27: Basics of Structural Equation Modeling

Absolute Indices

Standardized Root Mean Square Residual (SRMR)

The formula is kind of complicated, so conceptual understanding is better. This one uses the residuals.

The SRMR is an absolute measure of fit and is defined as the standardized difference between the observed correlation matrix and the predicted correlation matrix.

A value of 0 = perfect fit (i.e., residuals of zero)

The SRMR has no penalty for model complexity.

http://davidakenny.net/cm/fit.htm

Page 28: Basics of Structural Equation Modeling

Fit Indices Cut-offs

• χ² – ideally non-significant, p > .01 or even p > .001
• CFI and TLI – ideally greater than .95
• RMSEA – ideally less than .06; ideally, the 90% CI for the RMSEA doesn’t contain .08 or higher
• SRMR – ideally less than .08

Citations for papers:

Kline, R. B. (2011). Principles and practice of structural equation modeling (3rd ed.). New York, NY: Guilford.

Hooper, D., Coughlan, J., & Mullen, M. (2008). Structural equation modelling: guidelines for determining model fit. Electronic Journal of Business Research Methods, 6, 53-60.

Page 29: Basics of Structural Equation Modeling

A problem with latent variables

In this case, 3 observed variables: 3(4) / 2 = 6 possible data points

Seven parameters: 3 variances for the observed variables, 1 variance for the LATENT variable, 3 paths (factor loadings)

This model can’t be estimated!

Also, the latent variable has no metric (what does a “1” on this latent variable even mean?)

[Diagram: latent Negative Affect → Anger, Shame, Sadness]

Page 30: Basics of Structural Equation Modeling

A problem with latent variables

A solution: fix the variance of the latent variable to 1. This frees up one parameter.

The latent variable becomes standardized, with a mean of zero and a standard deviation of 1.

(Actually, all along we’ve been constraining the means to be zero to simplify the math (a “saturated mean structure”). Usually we don’t care about the means for our theory, so they aren’t explicitly modeled.)

[Diagram: latent Negative Affect (variance constrained to 1.0) → Anger, Shame, Sadness]

Page 31: Basics of Structural Equation Modeling

A problem with latent variables

An alternate solution: fix one of the factor loadings (typically the one expected to have the largest loading) to 1. This also frees up one parameter.

The latent variable will have the same variance as the observed variable whose loading was constrained to be 1.0

Either solution works, and the choice won’t affect fit indices

[Diagram: latent Negative Affect → Anger (loading constrained to 1.0), Shame, Sadness]
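In lavaan, these two scaling choices correspond to the std.lv argument of cfa(). A minimal sketch; the indicator names and the data frame dat are hypothetical:

```r
library(lavaan)

# Hypothetical indicator names for the Negative Affect factor
na.model <- 'negaff =~ anger + shame + sadness'

# std.lv = TRUE fixes each latent variance to 1 and frees all the loadings;
# the default (std.lv = FALSE) fixes the first loading to 1 instead
# fit <- cfa(na.model, data = dat, std.lv = TRUE)
```

Note that lavaan's default fixes the first indicator listed, rather than the one with the largest expected loading, so order your indicators accordingly if you use the default.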

Page 32: Basics of Structural Equation Modeling

Let’s try a sample analysis in R

A confirmatory factor analysis with 10 items and 1 latent variable (general self-esteem).

Page 33: Basics of Structural Equation Modeling

Install packages you’ll need

#For converting an SPSS file for R
install.packages("foreign", dependencies = TRUE)

#For running structural equation modeling
install.packages("lavaan", dependencies = TRUE)

You only need to do this once ever (not every time you load R)

Page 34: Basics of Structural Equation Modeling

Get the SPSS file into R

#Load the foreign package
library(foreign)

#Set working directory to where the dataset is located. This is also where you’ll save files.
#I’d create a new folder for this somewhere on your computer
setwd("C:/Users/Sean Mackinnon/Desktop/R Analyses")

#Take the datafile and read it into R. This datafile will be henceforth called "lab9data" when working in R
lab9data <- read.spss("A4.selfesteem.sav", use.value.labels = TRUE, to.data.frame = TRUE)

Page 35: Basics of Structural Equation Modeling

Specify the model

#Load the lavaan package (only need to do this once per time you open R)
library(lavaan)

#Specify the model you’re testing, and call that model "se.g.model1" (could call it anything)
#By default, lavaan will constrain the first indicator’s loading to be 1.0
se.g.model1 <- 'se_g =~ se3 + se16r + se29 + se42r + se55 + se68 + se81r + se94 + se107r + se120r + se131 + se135r'

Page 36: Basics of Structural Equation Modeling

Fit the model

#Fit the data, and call that fitted model "fit" (or anything you want)
#estimator = "MLR" is a robust estimator. I recommend always using this instead of the default.
#missing = "ML" handles missing data using a full information maximum likelihood method
#fixed.x = TRUE is optional. I include it because I want results to be similar to Mplus, which is another program I use often. See the lavaan documentation for more info.

fit <- cfa(se.g.model1, data = lab9data, estimator = "MLR", missing = "ML", fixed.x = TRUE)

Page 37: Basics of Structural Equation Modeling

Request Output

#request the summary statistics to interpret

#In this case, I request fit indices and standardized values in addition to default output

summary(fit, fit.measures = TRUE, standardized = TRUE)