ICPSR General Structural Equations

1

Week 4 No. 2

2

Review of solutions for non-normal and missing data (see handout)

Issue #1: My data are not normally distributed

Each variable has a reasonable number of discrete values (10 or more for most, with perhaps the odd variable with 5-6*, but definitely no variables with fewer than 5 discrete values).
[*Variables with a smaller number of categories should not be heavily skewed.]

Solution #1: AMOS, LISREL and SAS-CALIS

Transform the data to reduce the level of kurtosis within the stat package (SAS or SPSS).

COMPUTE LVAR1 = LN(VAR1).
COMPUTE LVAR1 = LN(VAR1 + .1).   [if there are 0 values]
COMPUTE VAR1_2 = VAR1**2.
COMPUTE VAR1_2 = 1 / VAR1.

See John Fox’s regression text or other regression texts for more details. Usually, dealing with skewness also deals with kurtosis.

Checks:
DESCRIPTIVES VARIABLES=VAR1 /STATISTICS = SKEW KURTOSIS.

With transformed data, regular ML covariance analysis can be used.
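The effect of a log transformation on skewness and kurtosis can be checked outside the stat package as well. The following is a minimal Python sketch, not part of the course materials: it uses simulated lognormal data (an assumption chosen only because it is heavily skewed) and simple moment-based formulas, which differ slightly from the bias-corrected values SPSS reports.

```python
import math
import random


def skewness(values):
    """Moment-based skewness: third central moment over variance**1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def excess_kurtosis(values):
    """Moment-based kurtosis minus 3, so a normal distribution gives about 0."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m4 / m2 ** 2 - 3.0


# Simulated, heavily skewed variable (hypothetical data for illustration).
random.seed(1)
raw = [random.lognormvariate(0.0, 1.0) for _ in range(5000)]

# Same transformation as COMPUTE LVAR1 = LN(VAR1).
logged = [math.log(v) for v in raw]

print("raw:    skew=%.2f kurtosis=%.2f" % (skewness(raw), excess_kurtosis(raw)))
print("logged: skew=%.2f kurtosis=%.2f" % (skewness(logged), excess_kurtosis(logged)))
```

With real data the transformed values will rarely be this close to normal; the point is only that both statistics should move toward 0 after a suitable transformation.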

3

Non-normal data: ADF estimation

Solution #2: AMOS, LISREL, SAS-CALIS

Use an ADF (asymptotically distribution-free) estimator.

AMOS: An option under Analysis Options.

LISREL: Input of an asymptotic covariance matrix (4th moment matrix) is required. To generate such a matrix in PRELIS, check off the asymptotic covariances check box and insert a file name. In LISREL, you will need to add a line to read in this matrix:
CM FI=   AC FI=
And you will need to specify the ADF fit function:
OU ME=WL

SAS: PROC CALIS METHOD=WLS

Important note on the ADF fit function: large sample sizes are required. For the acov matrix to be non-singular, N must be greater than p + (1/2)(p)(p+1), where p is the number of variables.

20 variables: N > 230
30 variables: N > 495

Working anywhere near these minima is not recommended.
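The sample-size bound for ADF estimation is easy to tabulate for other model sizes. A small Python sketch of the formula p + (1/2)(p)(p+1) follows; the function name is hypothetical, and the result is only the bound N must exceed for a non-singular acov matrix, not a recommended sample size.

```python
def adf_min_n(p: int) -> int:
    """Bound that N must exceed for ADF with p observed variables."""
    if p < 1:
        raise ValueError("p must be a positive number of variables")
    # p means plus p(p+1)/2 distinct variances and covariances.
    return p + p * (p + 1) // 2


for p in (10, 20, 30, 40):
    print(p, "variables: N >", adf_min_n(p))
```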

4

Solution #3: LISREL only. Scaled test statistics.

Use a scaled or adjusted chi-square and standard error calculation (e.g., Satorra-Bentler). Input of an asymptotic covariance matrix is required, so it is necessary to specify an AC= line that points to the asymptotic covariance matrix, but specify ME=ML and not ME=WL in LISREL. Probably, if not definitely, better than ADF for small to moderate-sized samples.

5

Missing Data

Issue #2: I have missing cases. My data are fairly normally distributed (or I have transformed them to near normality: kurtosis values in the +1 to -1 range or fairly close to this).

Solution #1: Use the EM algorithm to construct an imputed covariance matrix [assumes normality]

LISREL: This is an option under PRELIS.
Small limitation: imputed data are treated as “real” data by LISREL (affects N, significance tests).

AMOS: If you have the SPSS Missing Data module, you may be able to generate an imputed covariance matrix. If you do not, a “last ditch” approach would be to “flip” the dataset into SAS (if you have it), use the SAS MI procedure, then “flip” the covariance matrix file back into SPSS. See Appendix in handout.

6

Missing Data


Solution #2: Use a multiple-group model to explicitly model missing data.

AMOS, LISREL (SAS CALIS will not estimate multiple-group models)

This works if the number of missing data patterns is fairly small (say, fewer than 3-5), or if cleaning up problems with a small number of missing data patterns deals with most of the overall problem.
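Before choosing the multiple-group approach it helps to count how many distinct missing data patterns there actually are. A minimal Python sketch, assuming the data are held as rows with None marking a missing value (the rows below are made up for illustration):

```python
from collections import Counter


def missing_patterns(rows):
    """Count cases per missing-data pattern (True marks a missing value)."""
    return Counter(tuple(value is None for value in row) for row in rows)


# Hypothetical data: three variables, five cases.
rows = [
    (3.0, 2.0, 5.0),
    (4.0, None, 1.0),
    (2.0, None, 4.0),
    (1.0, 3.0, None),
    (5.0, 1.0, 2.0),
]

patterns = missing_patterns(rows)
for pattern, count in patterns.most_common():
    print(pattern, count)
```

Each distinct pattern would become one group in the multiple-group model, so a long list of patterns with only a few cases each suggests another approach.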

7

Missing Data


Solution #3: Use nearest-neighbor imputation

LISREL only.
Limitation: for data with a small number of values for each variable, “ties” will be generated. Even with a generous criterion, imputation could easily fail for ½ of the cases.
Small limitation: imputed data are treated as “real” data by LISREL (affects N, significance tests).

8

Missing Data

Solution #3: Use nearest-neighbor imputation

If working with Stata files, there is a user routine called hotdeck (see Stata Technical Bulletins #51 and #54). It must be installed. This is not the same as the PRELIS nearest-neighbor procedure, but it uses some similar principles.

With AMOS, you must use Stata or PRELIS.

Stata: use Stat-Transfer or DBMS-Copy to convert the file to an AMOS-readable SPSS .sav file.

Important note, from the hotdeck documentation:
If a dataset contains many variables with missing values then it is possible that many of the rows of data will contain at least one missing value. The hotdeck procedure will not work very well in such circumstances. There are more elaborate methods that only replace missing values, rather than the whole row, for imputed values.

PRELIS: More complicated process to move data into SPSS (see point #4 in handout “PRELISQuirks.doc”).

9

Missing Data

Solution #4: Use FIML estimation [assumes normality]

AMOS: Check off “estimation using means and intercepts” under Analysis Options and then input the dataset with missing values. AMOS will not provide modification indices with its version of FIML estimation (some other form of estimation is needed for model-fitting).

LISREL: Must input raw data into LISREL. Declare missing values in PRELIS (already done if an SPSS file is read into PRELIS), save the PRELIS .psf file and then read it into LISREL. Instead of CM FI= or SY FI= :
RA FI=C:\TEMP\MYDATA.PSF
Will also need a DA statement:

10

Missing Data

Issue #3: My data need to be weighted

Note: sophisticated adjustment of standard errors and test statistics (see Stata documentation) is not available. It is possible to construct some stratified sample problems as multiple-group analyses.

Solution #1: Use weighting in generating a covariance matrix to be passed to the SEM program

PRELIS: Under Transformation select Weight Variable before generating the covariance matrix.

*It is not clear if LISREL can handle weighted data in conjunction with FIML estimation. Some other missing data technique may be required.

11

Missing Data

Data weight cases menu

Note: it is not clear if weight variable needs to be rescaled to mean=1.0 (probably a good idea)
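Rescaling the weight variable to a mean of 1.0 keeps the weighted N equal to the actual number of cases. The Python sketch below illustrates the idea; it is not what PRELIS or SPSS does internally. The function names are hypothetical, and the sum(w) - 1 divisor is an assumption (frequency-style weighting), which is exactly why the scale of the weights matters.

```python
def rescale_weights(weights):
    """Rescale weights so their mean is 1.0 (their sum equals the case count)."""
    mean_weight = sum(weights) / len(weights)
    return [w / mean_weight for w in weights]


def weighted_cov(x, y, weights):
    """Weighted covariance; the divisor sum(w) - 1 is an assumed convention."""
    total = sum(weights)
    mean_x = sum(w * a for w, a in zip(weights, x)) / total
    mean_y = sum(w * b for w, b in zip(weights, y)) / total
    cross = sum(w * (a - mean_x) * (b - mean_y)
                for w, a, b in zip(weights, x, y))
    return cross / (total - 1.0)


# Hypothetical data and weights.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0, 8.0]
raw_weights = [2.0, 4.0, 6.0, 8.0]
scaled_weights = rescale_weights(raw_weights)

print(weighted_cov(x, y, raw_weights))     # weighted N of 20
print(weighted_cov(x, y, scaled_weights))  # weighted N of 4, the true case count
```

The two results differ only through the divisor, which is also where the effective N used for significance tests comes from.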

12

Missing Data

Issue #3: My data need to be weighted

Solution #1: Use weighting in generating a covariance matrix to be passed to the SEM program

AMOS: AMOS will not accept a weighted SPSS dataset. In fact, if you try to get AMOS to work with a dataset where a WEIGHT command has been issued, it may generate an error message. To unweight the data, simply use the commands:

COMPUTE WTVAR=1.0.
WEIGHT BY WTVAR.

But it should be possible to construct a covariance matrix within SPSS (using weighting) and then pass the “covariance matrix system file” to AMOS.

In SPSS:

Weight by wgtvar.
correlations variables= [list of variables]
  /missing=listwise
  /matrix out(*).
mconvert matrix=in(*) / replace.
save outfile = 'c:\temp\covs1.sav'.

13

Coarsely categorized data

Issue #4: My data are at best ordinal (3-5 discrete values per indicator)

Solution #1: Use CVM techniques for ordinal data.

PRELIS only: By default, variables with fewer than 15 discrete values are treated as “ordinal” and matrices are not simple covariance matrices. Use the Data, Define Variables menus to alter any defaults.

Usually, you will want to generate an asymptotic covariance matrix too.

If there are also missing data, strictly speaking, the use of FIML or EM imputation is not correct. Nearest-neighbor approaches (issue #2, solution #3 above) are acceptable.

14

Coarsely categorized data

Issue #4: My data are at best ordinal (3-5 discrete values per indicator)

Solution #2: Resort to “item parcels”

Add scores of 2 or more variables you believe to be parallel indicators to form single indicators. (Best check these variables, with crosstabulations, first.)

Missing data approaches for parcels can be tricky. Consider trying to create parcels with very similar patterns of missingness (same respondents missing, same respondents non-missing across both) and then give the parcel a missing value when either of the variables is missing.

Once variables have a sufficient number of discrete values with parceling, if the distributions are not normal, refer to issue #1 for solutions.

If you parcel variables, read the “pro and con” literature (see course outline).
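The parceling rule described above (sum the items, but set the parcel to missing when either item is missing) can be sketched in a few lines of Python. The item scores are invented for illustration, and None stands in for a missing value.

```python
def make_parcel(item_a, item_b):
    """Sum two items case by case; the parcel is None if either item is None."""
    return [
        None if a is None or b is None else a + b
        for a, b in zip(item_a, item_b)
    ]


# Hypothetical 5-category items for five respondents.
item1 = [1, 3, None, 4, 2]
item2 = [2, 3, 1, None, 5]

parcel = make_parcel(item1, item2)
print(parcel)  # [3, 6, None, None, 7]
```

Two 5-category items give a parcel with up to 9 discrete values, which is the reason for parceling in the first place.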

15

Ordinal Data models

CVM approaches in PRELIS/LISREL.

Example file: Week4Examples\OrdinalData2

See folder for listing of programs, output listings and a codebook for variables used.

Program LisrelU1.ls8 is a simple model based on the PM (polychoric correlation) matrix.

16

Extensions of the ordinal variable model

Basic form: Threshold parameters, representing the mapping of z* (latent variable, continuous) onto z (coarsely categorized variable), where z has m categories.

These thresholds will be familiar to anyone used to working with logistic regression models (or probit models):

Univariate case:
ln (cumulative odds) = τ(k)

Tau coefficient = ln (kth category or lower / higher categories)

17

Extensions of the ordinal variable model

Example:

Distribution of cases across 5 categories: 20 20 30 40 50

Tau1 = ln (20 / (20+30+40+50))
Tau2 = ln (40 / (30+40+50))
Tau3 = ln (70 / (40+50))
Tau4 = ln (110 / 50)
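The worked example generalizes to any set of category counts. A short Python sketch of the cumulative-odds calculation follows; the function name is hypothetical, and it assumes the first and last categories are non-empty so that no log of zero occurs.

```python
import math


def cumulative_logit_thresholds(counts):
    """Tau(k) = ln(cases in category k or lower / cases in higher categories)."""
    total = sum(counts)
    thresholds = []
    lower = 0
    # m categories give m - 1 thresholds, so the last count is never a cut point.
    for count in counts[:-1]:
        lower += count
        thresholds.append(math.log(lower / (total - lower)))
    return thresholds


taus = cumulative_logit_thresholds([20, 20, 30, 40, 50])
for k, tau in enumerate(taus, start=1):
    print("Tau%d = %.3f" % (k, tau))
```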

18

Polychoric correlations

Polychoric correlations:
- Estimate thresholds from the univariate distributions.
- Then, minimize a fit function involving reproduced probabilities based on a parameter vector that includes the thresholds + ρ (the estimated correlation).
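The first step, estimating thresholds from a univariate distribution, can be sketched with the Python standard library. This assumes the usual normal (probit) version of the thresholds, i.e. the inverse standard normal CDF of the cumulative proportions; it covers only step one and does not estimate the correlation itself. The function name and counts are illustrative.

```python
from statistics import NormalDist


def normal_thresholds(counts):
    """Thresholds on a standard normal z* that reproduce the category proportions."""
    total = sum(counts)
    standard_normal = NormalDist()  # mean 0, standard deviation 1
    thresholds = []
    cumulative = 0
    # m categories give m - 1 thresholds; every cumulative proportion
    # must lie strictly between 0 and 1 for inv_cdf to be defined.
    for count in counts[:-1]:
        cumulative += count
        thresholds.append(standard_normal.inv_cdf(cumulative / total))
    return thresholds


probit_taus = normal_thresholds([20, 20, 30, 40, 50])
for k, tau in enumerate(probit_taus, start=1):
    print("tau%d = %.3f" % (k, tau))
```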

19

Categorical Variable Model(ordinal data)

For each of the variables, the mean is fixed to 0 and the standard deviation is fixed to 1.0 (otherwise, under-identified).

Parameterization    Mean    Std. dev.    Thresholds
Standard            0.0     1.0          τ1   τ2   τ3   τ4
Alternative         μ1      σ1           0    1    τ3*  τ4*
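One way to see that the two parameterizations carry the same information is to convert between them. The Python sketch below assumes the alternative parameterization is a linear rescaling of the latent variable, y* = μ + σ z*, chosen so that the first two thresholds land on 0 and 1; that reading, and the function name, are assumptions rather than something stated on the slide.

```python
def to_alternative(thresholds):
    """Convert (mean 0, SD 1, tau1..tauK) to (mu, sigma, 0, 1, tau3*, ...)."""
    if len(thresholds) < 2:
        raise ValueError("need at least two thresholds (three categories)")
    tau1, tau2 = thresholds[0], thresholds[1]
    width = tau2 - tau1
    if width <= 0:
        raise ValueError("thresholds must be strictly increasing")
    sigma = 1.0 / width      # solves mu + sigma * tau2 = 1
    mu = -tau1 / width       # solves mu + sigma * tau1 = 0
    rescaled = [mu + sigma * tau for tau in thresholds]
    return mu, sigma, rescaled


# Hypothetical standard-parameterization thresholds.
mu, sigma, new_taus = to_alternative([-1.0, 0.0, 1.0, 2.0])
print(mu, sigma, new_taus)
```

Two thresholds are fixed and two parameters (mean and standard deviation) are freed, so the number of estimated parameters is unchanged.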

20

Fixing thresholds

“Equal Thresholds”
- Same threshold for 2 variables measured over time (longitudinal data)
- Same threshold for 1 variable measured in two different groups

See Week4Examples/OrdinalData2 files

21

Longitudinal data

I. Modeling of latent variable mean differences over time

II. More complicated tests (linear growth, quadratic growth, etc.)

See slides from previous class

23


Basic model for assessing latent variable mean change:

Can run this model on X or Y side (LISREL)

Equations:

X1 = a1 + 1.0 L1 + e1
X2 = a2 + b1 L1 + e2
X3 = a3 + b2 L1 + e3
X4 = a4 + 1.0 L2 + e4
X5 = a5 + b3 L2 + e5
X6 = a6 + b4 L2 + e6

Constraints:

b1=b3  b2=b4            LX=IN
a1=a4  a2=a5  a3=a6     TX=IN
ka1 = 0   ka2 = (to be estimated)
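With equal intercepts and loadings across time, any difference in indicator means has to come through the latent mean. A minimal Python sketch of the implied indicator means, E[X] = a + b * ka, follows; all numeric values are hypothetical, and error terms are assumed to have mean 0.

```python
def implied_means(intercepts, loadings, latent_mean):
    """Implied indicator means for one latent variable: a_i + b_i * kappa."""
    return [a + b * latent_mean for a, b in zip(intercepts, loadings)]


# Hypothetical parameter values, constrained equal across the two time points.
intercepts = [2.0, 1.5, 3.0]   # a1=a4, a2=a5, a3=a6
loadings = [1.0, 0.8, 1.2]     # 1.0 (reference), b1=b3, b2=b4
ka1, ka2 = 0.0, 0.5            # ka1 fixed to 0, ka2 estimated

time1 = implied_means(intercepts, loadings, ka1)   # X1, X2, X3
time2 = implied_means(intercepts, loadings, ka2)   # X4, X5, X6
differences = [m2 - m1 for m1, m2 in zip(time1, time2)]
print(differences)
```

For the reference indicator (loading 1.0) the mean difference equals ka2 directly; for the others it is ka2 scaled by the loading.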

24


Basic model for assessing latent variable mean change, with correlated errors

25


Model for assessing latent variable mean change

[Path diagram: latent variables Ksi-1, Ksi-2 and Ksi-3, measured by x1-x3, x4-x6 and x7-x9 respectively]

Usual parameter constraints:

TX(1)=TX(4)=TX(7)

LISREL: EQ TX 1 TX 4 TX 7

AMOS: same parameter name

[AMOS path diagram: Ksi-1, Ksi-2 and Ksi-3 each with mean 0; intercept labels a1, a2, a3 repeated for x1-x3, x4-x6 and x7-x9]

26




TX(1)=TX(4)=TX(7)



KA(1) = 0

KA(2) = mean difference parameter #1


LISREL: KA=FI group 1 KA=FR groups 2,3

IN AMOS:

[AMOS path diagram: Ksi-1 mean fixed to 0; Ksi-2 and Ksi-3 means labelled kappa1 and kappa2; intercept labels a1, a2, a3 repeated for x1-x3, x4-x6 and x7-x9]

27




Some tests:

Test for change: H0: ka1=ka2=0

Linear change model: ka2 = 2*ka1

Quadratic change model: ka2 = 4*ka1
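The multipliers 2 and 4 in the linear and quadratic constraints come from coding time as 0, 1, 2. The Python sketch below makes this explicit, assuming ka1 and ka2 here are the latent means at the second and third time points (with the first fixed to 0); the value 0.3 is arbitrary.

```python
def implied_latent_means(ka1, time_codes=(0, 1, 2), power=1):
    """Latent means of the form ka1 * t**power for time codes t."""
    return [ka1 * t ** power for t in time_codes]


linear = implied_latent_means(0.3, power=1)      # third mean is 2 * second
quadratic = implied_latent_means(0.3, power=2)   # third mean is 4 * second

print("linear:   ", linear)
print("quadratic:", quadratic)
```

Each constrained model has one fewer free parameter than the model with ka1 and ka2 both free, so each can be tested against it with a 1 df chi-square difference.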

28

As a causal model:

• Beta-1: “stability coefficient”

[Path diagram: Eta-1 -> Eta-2, with path coefficient Beta-1]

• Stability coefficient is high if relative rankings are preserved, even if there has been massive change with respect to means

• In a model with AL(1)=0 and AL(2)=free, can have high Beta(2,1) with a) AL(1)=AL(2) or b) AL(1) massively different from AL(2)
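The point that stability and mean change are separate questions can be seen with a plain correlation, used here as a rough stand-in for a standardized stability coefficient. In this Python sketch the scores are invented: one time-2 variable is identical to time 1 and the other is shifted up by 10 for everyone.

```python
def pearson_r(x, y):
    """Pearson correlation computed from sums of squares and cross-products."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical scores for five respondents.
time1 = [1.0, 2.0, 3.0, 4.0, 5.0]
no_change = list(time1)                  # same means at time 2
big_shift = [v + 10.0 for v in time1]    # everyone moves up by 10

print(pearson_r(time1, no_change))
print(pearson_r(time1, big_shift))
```

Both correlations are at their maximum because rankings are perfectly preserved, even though the means differ by 10 in the second case.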

29

Causal models:

[Path diagram: Ksi-1 -> Eta-1 (gamma1,1) and Ksi-2 -> Eta-1 (gamma1,2)]

Ksi-2 as lagged (time 1) version of Eta-1 (could re-specify as an eta variable)

Temporal order in Ksi-1 -> Eta-1 relationship

30

Causal models:

[Path diagram: Ksi-1, Ksi-2, Eta-1 and Eta-2, with paths ga2,1 and ga1,2]

Cross-lagged panel coefficients

[Reduced form of model on next slide]

31

Causal models:

Reciprocal effects, using lagged values to achieve model identification

[Path diagram: Ksi-1, Ksi-2, Eta-1 and Eta-2]

32

Causal models:

A variant

[Path diagram: TV Use, Political Trust and Political Trust (Time 2), with paths gamma 1,1, gamma 2,1 and Beta 2,1]

Issue: what does ga(1,1) mean given concern over causal direction?

33

Lagged and contemporaneous effects

This model is underidentified

34

Lagged effects model

[Path diagram: ksi-1, ksi-2, eta-1 and eta-2]

Ksi-1 could be an “event” (1/0 dummy variable)

35

First-order model for three-wave data (univariate)

[Path diagram: one latent variable measured at Time 1, Time 2 and Time 3]

36

First-order model for three-wave data (univariate)

[Path diagram: two stability paths, both labelled b1]

Tests:
- Equivalence of stability coefficients (b1)
- Mean differences (see earlier slide)

37

Second-order model for three-wave data (univariate)

[Path diagram: two paths labelled b1]

No longer comparable to b1 (t1 -> t2)

38

Second-order model for three-wave data (univariate)

[Path diagram: two paths labelled b1]

Issue: adding appropriate error terms (2nd order)

39

Multivariate Model for Three-wave panel data: cross-lagged effects (first order)

40

Multivariate Model for Three-wave panel data: cross-lagged effects (first order)

Equivalence of parameters:
T1 -> T2
T2 -> T3

41

Multivariate Model for Three-wave panel data: cross-lagged effects (second order)

42

Multivariate Model for Four-wave panel data: cross-lagged effects (second order)

43

Lagged and contemporaneous effects

Three-wave model with constraints:

[Path diagram: path labels a, b, c, d, e and f, each appearing twice]

Under many circumstances, there will be an empirical under-identification problem, though in theory this model is identified.

44

Example:

• Canada, Quality of Life data

• In directory \Panel in Week4Examples

45

Re-expressing parameters: GROWTH CURVE MODELS

Intercept & linear (& sometimes quadratic) terms

46

Linear Growth Model

Two Factor LGM

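[AMOS path diagram: Intercept factor (mean Parm1) and Slope factor (mean Parm2); indicators V1 - t1 and V2 - t2 with intercepts fixed to 0; Intercept loadings 1, 1; Slope loadings 0, 1]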

47

Linear Growth Model

Two Factor LGM

[AMOS path diagram: Intercept factor (mean Parm1) and Slope factor (mean Parm2) defined on the latent variables LV-t1 and LV-t2, each with its own indicators]

A bit more complicated with latent variables instead of single manifest variables

48

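Linear Growth Model

Two Factor Linear Growth Model

[AMOS path diagram: Intercept factor (mean Parm1) and Slope factor (mean Parm2); indicators t1, t2, t3 with intercepts fixed to 0; Intercept loadings 1, 1, 1; Slope loadings 0, 1, 2]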

49

Unspecified 2 factor Growth Curve Model

Two Factor Unspecified Growth Model

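[AMOS path diagram: Intercept factor (mean Parm1) and Slope factor (mean Parm2); indicators t1, t2, t3 with intercepts fixed to 0; Intercept loadings 1, 1, 1; Slope loadings 0, 1, lambda]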

50

3 factor Growth Curve Model

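[AMOS path diagram: Intercept factor (mean Parm1), Linear factor (mean Parm2) and Quadratic factor; indicators t1, t2, t3 with intercepts fixed to 0; Intercept loadings 1, 1, 1; Linear loadings 0, 1, 2; Quadratic loadings 0, 1, 4]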
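The fixed loadings in the growth curve models are just powers of the time codes: 1, 1, 1 for the intercept, 0, 1, 2 for the linear factor and 0, 1, 4 for the quadratic factor. A short Python sketch that builds these loading patterns and the means they imply follows; the function names and factor means are hypothetical, and residual means are assumed to be 0.

```python
def growth_loadings(time_codes, include_quadratic=False):
    """One row per time point: [intercept, linear] or [intercept, linear, quadratic]."""
    rows = []
    for t in time_codes:
        row = [1, t]
        if include_quadratic:
            row.append(t ** 2)
        rows.append(row)
    return rows


def implied_means(loadings, factor_means):
    """Implied mean at each time point: loadings times the factor means."""
    return [sum(l * m for l, m in zip(row, factor_means)) for row in loadings]


linear_model = growth_loadings([0, 1, 2])
quadratic_model = growth_loadings([0, 1, 2], include_quadratic=True)

# Hypothetical factor means: intercept 10, linear slope 2, quadratic term 1.
print(implied_means(linear_model, [10, 2]))
print(implied_means(quadratic_model, [10, 2, 1]))
```

In the unspecified two-factor model the last linear loading is left free (lambda) rather than fixed to 2, so the shape of the trajectory is estimated rather than imposed.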
