ICPSR General Structural Equations
Week 4 No. 2
2
Review of solutions for non-normal and missing data (see handout)
Issue #1: My data are not normally distributed
Each variable has a reasonable number of discrete values (10 or more for most, with perhaps the odd variable with 5-6*, but definitely no variables with fewer than 5 discrete values). [*Variables with a smaller number of categories should not be heavily skewed.]
Solution #1: AMOS, LISREL and SAS-CALIS
Transform the data to reduce the level of kurtosis within the stat package (SAS or SPSS).
COMPUTE LVAR1 = LN(VAR1).
COMPUTE LVAR1 = LN(VAR1 + .1). [if there are 0 values]
COMPUTE VAR1_2 = VAR1**2.
COMPUTE VAR1_INV = 1 / VAR1.
See John Fox’s regression text or other regression texts for more details. Usually, dealing with skewness also deals with kurtosis.
Checks: DESCRIPTIVES VARIABLES=VAR1 /STATISTICS = SKEW KURTOSIS.
With transformed data, regular ML covariance analysis can be used.
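A rough sketch of the same check outside the stat package, assuming a made-up right-skewed variable (the values of var1 are hypothetical). It uses simple moment estimators, so the numbers will differ somewhat from the bias-adjusted skewness and kurtosis that SPSS reports.

```python
import math

def skew_kurtosis(values):
    """Return (skewness, excess kurtosis) from simple moment estimators."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0

# Hypothetical right-skewed variable (not from the course data)
var1 = [1, 1, 2, 2, 3, 4, 6, 9, 15, 40]
lvar1 = [math.log(v) for v in var1]  # same idea as COMPUTE LVAR1 = LN(VAR1).

raw_skew, raw_kurt = skew_kurtosis(var1)
log_skew, log_kurt = skew_kurtosis(lvar1)
print("raw: skew=%.2f kurtosis=%.2f" % (raw_skew, raw_kurt))
print("log: skew=%.2f kurtosis=%.2f" % (log_skew, log_kurt))
```

For this toy variable the log transform pulls both statistics toward 0, which is the pattern the DESCRIPTIVES check is meant to confirm.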
3
Non-normal data: ADF estimation
Solution #2: AMOS, LISREL, SAS-CALIS
Use an ADF (asymptotically distribution-free) estimator.
AMOS: An option under Analysis Options.
LISREL: Input of an asymptotic covariance matrix (4th moment matrix) is required. To generate such a matrix in PRELIS, check off the asymptotic covariances check box and insert a file name. In LISREL, you will need to add a line to read in this matrix:
CM FI=
AC FI=
And you will need to specify the ADF fit function:
OU ME=WL
SAS: PROC CALIS METHOD=WLS
Important note on the ADF fit function:
Large sample sizes are required. For the acov matrix to be non-singular, N must be greater than p + (1/2)(p)(p+1), where p is the number of variables.
20 variables: N > 230
30 variables: N > 495
Working anywhere near these minima is not recommended.
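The bound above is easy to tabulate for other model sizes. A minimal sketch (the function name adf_minimum_n is made up for illustration):

```python
def adf_minimum_n(p):
    """Bound from the slide: N must exceed p + (1/2)(p)(p+1)."""
    return p + p * (p + 1) // 2

for p in (10, 20, 30, 40):
    print("%d variables: N > %d" % (p, adf_minimum_n(p)))
```

These are bare minima for a non-singular matrix, not recommended sample sizes.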
4
Solution #3: LISREL only. Scaled test statistics.
Use a scaled or adjusted chi-square and standard error calculation (e.g., Satorra-Bentler). Input of an asymptotic covariance matrix is required, so it is necessary to specify an AC FI= line that points to the asymptotic covariance matrix, but specify ME=ML and not ME=WL in LISREL. Probably, if not definitely, better than ADF for small to moderate-sized samples.
5
Missing Data
Issue #2: I have missing cases. My data are fairly normally distributed (or I have transformed them to near normality – Kurtosis values in the +1 to -1 range or fairly close to this).
Solution #1: Use the EM algorithm to construct an imputed covariance matrix [assumes normality]
LISREL: This is an option under PRELIS.
Small limitation: imputed data are treated as “real” data by LISREL (affects N, significance tests).
AMOS: If you have the SPSS Missing Data module, you may be able to generate an imputed covariance matrix. If you do not, a “last ditch” approach would be to “flip” the dataset into SAS (if you have it), use the SAS MI procedure, then “flip” the covariance matrix file back into SPSS. See Appendix in handout.
6
Missing Data
Solution #2: Use a multiple-group model to explicitly model missing data.
AMOS, LISREL (SAS CALIS will not estimate multiple-group models)
This works if the number of missing data patterns is fairly small (say <3-5), or if cleaning up problems with a small number of missing data patterns deals with most of the overall problem.
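Before trying the multiple-group approach it helps to count how many distinct missing data patterns there are and how many cases fall into each. A minimal sketch, assuming cases are rows and None marks a missing value (the data are invented):

```python
from collections import Counter

def missing_patterns(rows):
    """Count cases per missing-data pattern (True = value is missing)."""
    return Counter(tuple(value is None for value in row) for row in rows)

# Hypothetical cases on three variables
rows = [
    (1.0, 2.0, 3.0),
    (1.5, None, 2.5),
    (2.0, 2.2, 3.1),
    (0.5, None, 1.9),
    (None, 1.0, 2.0),
]
patterns = missing_patterns(rows)
for pattern, n_cases in patterns.most_common():
    print(pattern, n_cases)
```

Each distinct pattern would become one group in the multiple-group model, so a long list of small patterns is a sign that another approach is needed.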
7
Missing Data
Solution #3: Use nearest-neighbor imputation
LISREL only.
Limitation: for data with a small number of values for each variable, “ties” will be generated. Even with a generous criterion, imputation could easily fail for ½ of the cases.
Small limitation: imputed data are treated as “real” data by LISREL (affects N, significance tests).
If working with Stata files, there is a user routine called hotdeck (see Stata Tech. bulletins #51 and #54). Must be installed.
8
Missing Data
Solution #3: Use nearest-neighbor imputation (continued)
The Stata hotdeck routine is not the same as the PRELIS nearest neighbour procedure, but uses some similar principles. With AMOS, must use Stata or PRELIS. Stata: use Stat-Transfer or DBMS-Copy to convert the file to an AMOS-readable SPSS .sav file.
Important note, from hotdeck documentation:
If a dataset contains many variables with missing values then it is possible that many of the rows of data will contain at least one missing value. The hotdeck procedure will not work very well in such circumstances. There are more elaborate methods that only replace missing values, rather than the whole row, for imputed values.
PRELIS: More complicated process to move data into SPSS (see point #4 in handout “PRELISQuirks.doc”).
9
Missing Data
Solution #4: Use FIML estimation [assumes normality]
AMOS: Check off “estimation using means and intercepts” under Analysis Options and then input the dataset with missing values. AMOS will not provide modification indices with its version of FIML estimation (some other form of estimation is needed for model-fitting).
LISREL: Must input raw data into LISREL. Declare missing values in PRELIS (already done if an SPSS file was read into PRELIS), save the PRELIS .psf file and then read it into LISREL. Instead of CM FI= or SY FI= :
RA FI=C:\TEMP\MYDATA.PSF
Will also need a DA statement:
10
Missing Data
Issue #3: My data need to be weighted
Note: sophisticated adjustment of standard errors and test statistics (see Stata documentation) is not available. It is possible to construct some stratified sample problems as multiple group analyses.
Solution #1: Use weighting in generating a covariance matrix to be passed to the SEM program
PRELIS: Under Transformation select Weight Variable before generating the covariance matrix.
*It is not clear if LISREL can handle weighted data in conjunction with FIML estimation. Some other missing data technique may be required.
11
Missing Data
Solution #1: Use weighting in generating a covariance matrix to be passed to the SEM program (continued)
Data > Weight Cases menu
Note: it is not clear whether the weight variable needs to be rescaled to mean = 1.0 (probably a good idea).
12
Missing Data
Issue #3: My data need to be weighted
Solution #1: Use weighting in generating a covariance matrix to be passed to the SEM program
AMOS: AMOS will not accept a weighted SPSS dataset. In fact, if you try to get AMOS to work with a dataset where a WEIGHT command has been issued, it may generate an error message. To unweight the data, simply use the commands:
COMPUTE WTVAR=1.0.
WEIGHT BY WTVAR.
But it should be possible to construct a covariance matrix within SPSS (using weighting) and then pass the “covariance matrix system file” to AMOS.
In SPSS:
WEIGHT BY wgtvar.
CORRELATIONS VARIABLES= [list of variables]
  /MISSING=LISTWISE
  /MATRIX OUT(*).
MCONVERT MATRIX=IN(*) /REPLACE.
SAVE OUTFILE='c:\temp\covs1.sav'.
13
Coarsely categorized data
Issue #4: My data are at best ordinal (3-5 discrete values per indicator)
Solution #1: Use CVM techniques for ordinal data.
PRELIS only: By default, variables with fewer than 15 discrete values are treated as “ordinal” and matrices are not simple covariance matrices. Use the Data > Define Variables menus to alter any defaults.
Usually, you will want to generate an asymptotic covariance matrix too.
If there are also missing data, strictly speaking, the use of FIML or EM imputation is not correct. Nearest neighbor approaches (issue #2, solution #3 above) are acceptable.
14
Coarsely categorized data
Issue #4: My data are at best ordinal (3-5 discrete values per indicator)
Solution #2: Resort to “item parcels”
(Best check these variables, with crosstabulations, first.)
Add scores of 2 or more variables you believe to be parallel indicators to form single indicators.
Missing data approaches for parcels can be tricky. Consider trying to create parcels with very similar patterns of missing-ness (same respondents missing, same respondents non-missing across both) and then give the variable a missing value when either of the variables is missing.
Once variables have a sufficient number of discrete values with parceling, if the distributions are not normal, refer to issue #1 for solutions.
If you parcel variables, read the “pro and con” literature (see course outline).
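The parceling rule described above (sum the items, and set the parcel to missing when either item is missing) can be sketched as follows. The function name make_parcel and the item scores are hypothetical; None stands for a missing value.

```python
def make_parcel(item_a, item_b):
    """Sum two parallel indicators; the parcel is missing if either item is."""
    return [
        None if a is None or b is None else a + b
        for a, b in zip(item_a, item_b)
    ]

# Hypothetical 5-category items for four respondents
item1 = [1, 2, None, 4]
item2 = [3, None, None, 5]
parcel = make_parcel(item1, item2)
print(parcel)
```

Two 5-category items give a parcel with up to 9 discrete values, which is the point of the exercise.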
15
Ordinal Data models
CVM approaches in PRELIS/LISREL.
Example file: Week4Examples\OrdinalData2
See folder for listing of programs, output listings and a codebook for variables used.
Program LisrelU1.ls8 is a simple model based on the PM matrix.
16
Extensions of the ordinal variable model
Basic form: Threshold parameters, representing the mapping of z* (latent variable, continuous) onto z (coarsely categorized variable), where z has m categories.
These thresholds will be familiar to anyone used to working with logistic regression models (or probit models):
Univariate case:
ln (cumulative odds) = τ(k)
Tau coefficient = ln (cases in kth category or lower / cases in higher categories)
17
Extensions of the ordinal variable model
Example:
Distribution of cases: 20 20 30 40 50
Tau1 = ln (20 / (20+30+40+50))
Tau2 = ln (40 / (30+40+50))
Tau3 = ln (70 / (40+50))
Tau4 = ln (110 / 50)
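The worked example can be reproduced with a few lines of code. This is a sketch only: logit_thresholds follows the cumulative-odds arithmetic shown above, while probit_thresholds is an added normal-scale analogue (the inverse normal CDF of the same cumulative proportions), included on the assumption that this is the scale used when thresholds feed into polychoric correlations. All function names are made up for illustration.

```python
import math
from statistics import NormalDist

def cumulative_proportions(counts):
    """Proportion of cases in category k or lower, for k = 1 .. m-1."""
    total = sum(counts)
    running = 0
    proportions = []
    for count in counts[:-1]:
        running += count
        proportions.append(running / total)
    return proportions

def logit_thresholds(counts):
    """Tau(k) = ln(cases at or below k / cases above k), as in the example."""
    return [math.log(p / (1.0 - p)) for p in cumulative_proportions(counts)]

def probit_thresholds(counts):
    """Normal-scale analogue: inverse normal CDF of each cumulative proportion."""
    return [NormalDist().inv_cdf(p) for p in cumulative_proportions(counts)]

counts = [20, 20, 30, 40, 50]
logit_taus = logit_thresholds(counts)
probit_taus = probit_thresholds(counts)
print([round(t, 3) for t in logit_taus])
print([round(t, 3) for t in probit_taus])
```

Both sets of thresholds are increasing and change sign at the same place; only the scale differs.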
18
Polychoric correlations
Polychoric correlations:
- Estimate thresholds from univariate distributions
- Then, minimize a fit function involving reproduced probabilities based on a parameter vector that includes the thresholds + ρ (estimated correlation)
19
Categorical Variable Model (ordinal data)
For each of the variables, the mean is fixed to 0 and the standard deviation is fixed to 1.0 (otherwise, under-identified).
Parameterization:
Mean   Std. dev.   Thresholds
0.0    1.0         τ1 τ2 τ3 τ4
Alternative parameterization:
Mean   Std. dev.   Thresholds
μ1     σ1          0 1 τ3* τ4*
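One way to see that the two parameterizations carry the same information: assuming the alternative form is simply a linear rescaling of the latent variable that sends the first threshold to 0 and the second to 1, the mean, standard deviation and remaining thresholds follow directly. This is a sketch under that assumption, with made-up threshold values; it is not taken from the PRELIS/LISREL documentation.

```python
def to_alternative(taus):
    """Rescale standard thresholds (mean 0, sd 1) so that tau1 = 0 and tau2 = 1.

    Returns (mean, sd, rescaled thresholds) of the rescaled latent variable.
    """
    sd = 1.0 / (taus[1] - taus[0])
    mean = -taus[0] * sd
    rescaled = [(t - taus[0]) * sd for t in taus]
    return mean, sd, rescaled

# Hypothetical standard-parameterization thresholds
mean, sd, rescaled = to_alternative([-1.0, 0.0, 0.5, 1.5])
print(mean, sd, rescaled)
```

Two fixed thresholds replace the fixed mean and standard deviation, so the number of free parameters per variable is unchanged.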
20
Fixing thresholds
“Equal Thresholds”
- Same threshold for 2 variables measured over time (longitudinal data)
- Same threshold for 1 variable measured in two different groups
See Week4Examples/OrdinalData2 files
21
Longitudinal data
I. Modeling of latent variable mean differences over time
II. More complicated tests (linear growth, quadratic growth, etc.)
See slides from previous class
23
Applications to longitudinal data
Basic model for assessing latent variable mean change. Can run this model on the X or Y side (LISREL).
Equations:
X1 = a1 + 1.0 L1 + e1
X2 = a2 + b1 L1 + e2
X3 = a3 + b2 L1 + e3
X4 = a4 + 1.0 L2 + e4
X5 = a5 + b3 L2 + e5
X6 = a6 + b4 L2 + e6
Constraints:
b1=b3 b2=b4 (LX=IN)
a1=a4 a2=a5 a3=a6 (TX=IN)
ka1 = 0, ka2 = (to be estimated)
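To see why these constraints identify the mean change, it helps to write out the indicator means the model implies: each is intercept + loading x latent mean. The sketch below uses invented parameter values; with equal intercepts and loadings across time, any difference in indicator means has to come from ka2.

```python
def implied_means(intercepts, loadings, latent_mean):
    """Model-implied indicator means: a + b * (latent variable mean)."""
    return [a + b * latent_mean for a, b in zip(intercepts, loadings)]

# Hypothetical values; the same intercepts and loadings are used at both
# time points, which is what a1=a4, a2=a5, a3=a6 and b1=b3, b2=b4 impose.
intercepts = [2.0, 1.5, 1.0]
loadings = [1.0, 0.8, 1.2]
ka1, ka2 = 0.0, 0.5

time1 = implied_means(intercepts, loadings, ka1)  # means of X1, X2, X3
time2 = implied_means(intercepts, loadings, ka2)  # means of X4, X5, X6
change = [m2 - m1 for m1, m2 in zip(time1, time2)]
print(time1, time2, change)
```

The change in the reference indicator (loading 1.0) equals ka2 itself, and the change in each other indicator is ka2 times its loading.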
24
Applications to longitudinal data
Basic model for assessing latent variable mean change: same equations and constraints as on the previous slide, with correlated errors.
25
Applications to longitudinal data
Model for assessing latent variable mean change
[Path diagram: three latent variables, Ksi-1 (indicators x1-x3), Ksi-2 (indicators x4-x6) and Ksi-3 (indicators x7-x9)]
Usual parameter constraints:
TX(1)=TX(4)=TX(7)
LISREL: EQ TX 1 TX 4 TX 7
AMOS: same parameter name
[AMOS path diagram: the same three-wave model, with indicator intercepts labelled a1, a2, a3 within each wave and the means of Ksi-1, Ksi-2 and Ksi-3 all set to 0]
26
Applications to longitudinal data
Model for assessing latent variable mean change
[Path diagram: three latent variables, Ksi-1 (indicators x1-x3), Ksi-2 (indicators x4-x6) and Ksi-3 (indicators x7-x9)]
Usual parameter constraints:
TX(1)=TX(4)=TX(7)
LISREL: EQ TX 1 TX 4 TX 7
AMOS: same parameter name
KA(1) = 0
KA(2) = mean difference parameter #1
KA(3) = mean difference parameter #2
LISREL: KA=FI group 1 KA=FR groups 2,3
In AMOS:
[AMOS path diagram: indicator intercepts labelled a1, a2, a3 within each wave; mean of Ksi-1 set to 0; means of Ksi-2 and Ksi-3 labelled kappa1 and kappa2]
27
Applications to longitudinal data
Model for assessing latent variable mean change
[Path diagram: three latent variables, Ksi-1 (indicators x1-x3), Ksi-2 (indicators x4-x6) and Ksi-3 (indicators x7-x9)]
Usual parameter constraints:
TX(1)=TX(4)=TX(7)
LISREL: EQ TX 1 TX 4 TX 7
AMOS: same parameter name
KA(1) = 0
KA(2) = mean difference parameter #1
KA(3) = mean difference parameter #2
LISREL: KA=FI group 1 KA=FR groups 2,3
Some tests:
Test for change: H0: ka1=ka2=0
Linear change model: ka2 = 2*ka1
Quadratic change model: ka2 = 4*ka1
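The factors 2 and 4 in these constraints follow from the time coding. A small sketch, assuming three equally spaced waves coded t = 0, 1, 2, with ka1 and ka2 read as the mean differences of waves 2 and 3 from wave 1 (the trend coefficient 0.3 is arbitrary):

```python
def mean_differences(trend, waves=3):
    """Differences from the wave-1 mean for waves 2..n, with t = 0, 1, 2, ..."""
    base = trend(0)
    return [trend(t) - base for t in range(1, waves)]

# Linear trend: mean = c * t; purely quadratic trend: mean = c * t**2
linear = mean_differences(lambda t: 0.3 * t)
quadratic = mean_differences(lambda t: 0.3 * t ** 2)
print("linear:", linear)
print("quadratic:", quadratic)
```

Under the linear trend the second difference is twice the first; under the purely quadratic trend it is four times the first. A quadratic trend that also has a linear term would not satisfy ka2 = 4*ka1.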
28
As a causal model:
• Beta-1: “stability coefficient”
[Path diagram: Eta-1 → Eta-2, with the path labelled Beta-1]
• The stability coefficient is high if relative rankings are preserved, even if there has been massive change with respect to means.
• In a model with AL(1)=0 and AL(2)=free, can have high Beta(2,1) with a) AL(1)=AL(2) or b) AL(1) massively different from AL(2).
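A toy illustration of the point that stability and mean change are separate questions. The observed correlation stands in here for a standardized stability coefficient, which is a simplification, and the scores are invented.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical scores: everyone gains 10 points, so rankings are unchanged
time1 = [3.0, 5.0, 4.0, 8.0, 6.0]
time2 = [score + 10.0 for score in time1]

stability = pearson_r(time1, time2)
mean_change = sum(time2) / len(time2) - sum(time1) / len(time1)
print("stability: %.3f, mean change: %.1f" % (stability, mean_change))
```

Rankings are perfectly preserved, so stability is at its maximum even though the mean has shifted by 10.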
29
Causal models:
[Path diagram: Ksi-1 → Eta-1 (gamma1,1); Ksi-2 → Eta-1 (gamma1,2)]
Ksi-2 as lagged (time 1) version of Eta-1 (could re-specify as an eta variable)
Temporal order in the Ksi-1 → Eta-1 relationship
30
Causal models:
[Path diagram: Ksi-1, Ksi-2, Eta-1 and Eta-2, with paths labelled ga2,1 and ga1,2]
Cross-lagged panel coefficients
[Reduced form of model on next slide]
31
Causal models:
Reciprocal effects, using lagged values to achieve model identification
[Path diagram: Ksi-1, Ksi-2, Eta-1 and Eta-2]
32
Causal models:
[Path diagram: TV Use, Political Trust and Political Trust (Time 2), with paths labelled gamma 1,1, gamma 2,1 and Beta 2,1]
A variant
Issue: what does ga(1,1) mean given concern over causal direction?
33
Lagged and contemporaneous effects
This model is underidentified
34
Lagged effects model
[Path diagram: Ksi-1, Ksi-2, Eta-1 and Eta-2]
Ksi-1 could be an “event” (1/0 dummy variable)
35
First order model for three wave data (univariate)
[Path diagram: latent variables at Time 1, Time 2 and Time 3]
36
First order model for three wave data (univariate)
[Path diagram: three waves, with paths labelled b1, b1]
Tests: Equivalent of stability coefficients (b1)
Mean differences (see earlier slide)
37
Second order model for three wave data (univariate)
[Path diagram: three waves, with paths labelled b1, b1]
No longer comparable to b1 (t1 → t2)
38
Second order model for three wave data (univariate)
[Path diagram: three waves, with paths labelled b1, b1]
Issue: adding appropriate error terms (2nd order)
39
Multivariate Model for Three-wave panel data: cross-lagged effects (first order)
40
Multivariate Model for Three-wave panel data: cross-lagged effects (first order)
[Path diagram]
Equivalence of parameters:
T1 → T2
T2 → T3
41
Multivariate Model for Three-wave panel data: cross-lagged effects (second order)
42
Multivariate Model for Four-wave panel data: cross-lagged effects (second order)
43
Lagged and contemporaneous effects
Three wave model with constraints:
[Path diagram: paths labelled a, b, c, d, e and f, each label appearing twice]
Under many circumstances, there will be an empirical underidentification problem, though in theory this model is identified.
44
Example:
• Canada, Quality of Life data
• In directory \Panel in Week4Examples
45
Re-expressing parameters: GROWTH CURVE MODELS
Intercept & linear (& sometimes quadratic) terms
46
Linear Growth Model
Two Factor LGM
[Path diagram: Intercept factor (mean Parm1) and Slope factor (mean Parm2); observed variables V1 (t1) and V2 (t2)]
47
Linear Growth Model
Two Factor LGM
[Path diagram: Intercept factor (mean Parm1) and Slope factor (mean Parm2); latent variables LV-t1 and LV-t2]
A bit more complicated with latent variables instead of single manifest variables
48
Linear Growth Model
Two Factor Linear Growth Model
[Path diagram: Intercept factor (mean Parm1) and Slope factor (mean Parm2); observed variables t1, t2, t3; intercept loadings 1, 1, 1; slope loadings 0, 1, 2]
49
Unspecified 2 factor Growth Curve Model
Two Factor Unspecified Growth Model
[Path diagram: Intercept factor (mean Parm1) and Slope factor (mean Parm2); observed variables t1, t2, t3; intercept loadings 1, 1, 1; slope loadings 0, 1, lambda]
50
3 factor Growth Curve Model
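[Path diagram: Intercept factor (mean Parm1), Linear factor (mean Parm2) and Quadratic factor; observed variables t1, t2, t3; intercept loadings 1, 1, 1; linear loadings 0, 1, 2; quadratic loadings 0, 1, 4]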
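The growth curve models differ only in their fixed loadings. A minimal sketch of the mean structure they imply, assuming three equally spaced waves, loadings of 1, 1, 1 on the intercept, 0, 1, 2 on the linear factor and 0, 1, 4 on the quadratic factor, and invented factor means:

```python
def growth_means(factor_means, loadings):
    """Implied mean at each wave: sum of loading * factor mean over factors."""
    return [
        sum(load * mean for load, mean in zip(row, factor_means))
        for row in loadings
    ]

# One row per wave; columns are (intercept, linear) or (intercept, linear, quadratic)
linear_loadings = [(1, 0), (1, 1), (1, 2)]
quadratic_loadings = [(1, 0, 0), (1, 1, 1), (1, 2, 4)]

# Hypothetical factor means: intercept 10.0, linear 2.0, quadratic 0.5
linear_curve = growth_means([10.0, 2.0], linear_loadings)
quadratic_curve = growth_means([10.0, 2.0, 0.5], quadratic_loadings)
print(linear_curve)
print(quadratic_curve)
```

In the unspecified model the last slope loading (lambda) is estimated rather than fixed at 2, so the shape of the curve between waves 2 and 3 comes from the data.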
51
Last slide