Factor Analysis

Lua Augustin, Savannah Guo, and Blair Marquardt


Learning Outcomes

To understand:

1. What factor analysis is.

2. What its model is.

3. Latent vs. observable variables, with examples of each.

4. Potential applications of factor analysis.

5. The assumptions of factor analysis.

6. How to perform a factor analysis using R.

Big Picture – What is Factor Analysis?

Goal: explain the covariance among observed variables in terms of unobserved (latent) variables.

Observable variables $x_i$ are modeled as a linear combination of factors plus errors.

$$x_i - \mu_i = l_{i1}F_1 + \dots + l_{ik}F_k + \varepsilon_i$$

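Here, $x_i$ refers to variable $i$ for a generic person, $\mu_i$ is its mean, $l_{is}$ is the loading of variable $i$ on factor $F_s$, and $\varepsilon_i$ is the error term for variable $i$.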

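In matrix form: $X - \mu = LF + \varepsilon$.

$X$ is $p \times 1$ (variables), $F$ is $k \times 1$ (common factors), $L$ is the $p \times k$ matrix of loadings, and $\varepsilon$ is $p \times 1$ (errors).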
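A short simulation can make the model concrete. The sketch below is a minimal illustration, not a fitting procedure, and every value in it is hypothetical: $p = 3$ variables, $k = 1$ factor, loadings 0.9, 0.8 and 0.7, error variances chosen so that each $x_i$ has variance 1, and $\mu = 0$. It draws $F$ and $\varepsilon$ as independent normals, builds $X = \mu + LF + \varepsilon$ for many people, and compares the sample covariance with $LL' + \Psi$, the covariance the model implies ($\Psi$ being the diagonal matrix of error variances). The helper names are illustrative only.

```python
import random

# Hypothetical example: p = 3 observed variables, k = 1 common factor.
L = [[0.9], [0.8], [0.7]]        # p x k loading matrix (illustrative values)
psi = [0.19, 0.36, 0.51]         # error variances, chosen so that Var(x_i) = 1
mu = [0.0, 0.0, 0.0]             # means of the observed variables


def simulate_person(rng):
    """Draw one person's X = mu + L F + eps under the factor model."""
    k = len(L[0])
    F = [rng.gauss(0.0, 1.0) for _ in range(k)]       # common factors, Cov(F) = I
    eps = [rng.gauss(0.0, s ** 0.5) for s in psi]     # independent errors
    return [mu[i] + sum(L[i][j] * F[j] for j in range(k)) + eps[i]
            for i in range(len(L))]


def sample_cov(data):
    """Sample covariance matrix with divisor n - 1."""
    n, p = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(p)]
    return [[sum((row[a] - means[a]) * (row[b] - means[b]) for row in data) / (n - 1)
             for b in range(p)] for a in range(p)]


rng = random.Random(1)
data = [simulate_person(rng) for _ in range(20000)]   # rows = people, columns = variables
S = sample_cov(data)

# Implied covariance LL' + Psi for comparison
implied = [[sum(L[a][j] * L[b][j] for j in range(len(L[0]))) + (psi[a] if a == b else 0.0)
            for b in range(3)] for a in range(3)]

for a in range(3):
    print([round(x, 2) for x in S[a]], [round(x, 2) for x in implied[a]])
```

With this many simulated people, the two matrices should agree to roughly two decimal places.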

What can we do with it?

Originated in psychometrics as a tool to understand human thought and behavior.

Some applications of Factor Analysis:

- Development of a scale or questionnaire

- Development or corroboration of theory

- Prediction of some other variable as a function of the modeled latent variables

Unobservable vs. Observable Variables

Latent variable: “...variables that are not directly observed but are rather inferred from other variables that are directly measured...” (Wikipedia)

DNA exists, but it is unobservable.

DNA predicts observable traits.

Unobservable | Observable
DNA* | eye color, hair type
Depression | suicidal thoughts, insomnia

* DNA used to be unobservable.

Do factors exist?

Using intelligence as an example:

• Does everyone have a level of intelligence?

• Is it just ONE number?

• Factors explain the variables in the model.

Unobservable vs. Observable Variables

Our model assumes...

Latent variables are unobservable but correlated with observable variables.

Use data from observable variables to infer information about unobservable variables.

Examine the covariance matrix of observable variables.

What do the data look like?

Formula:

$X = LF + \varepsilon$ (with $X$ centered at its mean $\mu$)

Let’s look at a basic matrix that applies to any example.

Unobservable vs. Observable Variables - Examples

Example 1: DNA

- Our genetic makeup (our DNA) is the underlying causal factor for our traits.

- Loadings communicate the strength of the relationship.

- The model is still random: the error term $\varepsilon$ captures variation in the traits that DNA does not explain.

Unobservable vs. Observable Variables - Examples

Example 2: Boss appreciation (factor)

- Observed variables (outward manifestations of your latent appreciation): Performance evaluation rating, absenteeism, survey responses.

- Unobserved variable: How much you appreciate your boss.

- When F is high, the Xs tend to be high, and vice versa (the reverse holds for an X with a negative loading, as absenteeism may have).

Unobservable vs. Observable Variables - Examples

Example 3: Depression and anxiety*

- Observed variables: insomnia, suicidal thoughts, nausea, and hyperventilation; e.g., Cov(Ins, Suic) = 0.3.

- Unobserved: depression and anxiety, which cause the variance and covariance among the observed variables.

*Example courtesy of Ben Lambert (see References)

Unobservable vs. Observable Variables - Examples

Example 4: Student evaluation data

- Observed: faculty expertise rating, teaching ability rating

- Unobserved: satisfaction with teacher

Assumptions of Factor Analysis

1. Latent factors exist.

- Think back to our examples - this is an easier sell for some examples than for others.

- Ex: Can we really boil depression down into a single value? That’s what our model assumes!

2. The observed variables are linearly related to the factors.

- Evident in our model: $x_i - \mu_i = l_{i1}F_1 + \dots + l_{ik}F_k + \varepsilon_i$.

Assumptions of Factor Analysis

3. The model is invariant across subjects.

- The same loadings and error variance apply for all subjects

- Implies the factors affect all people equally, subject only to random error. Reasonable?

4. No association between the factor and measurement error.

- $\mathrm{Cov}(F, \varepsilon) = 0$

- This is automatically true if L is the regression coefficient matrix (of X on F).
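To see why, suppose $L$ is defined as the population regression coefficient matrix of $X$ on $F$, $L = \mathrm{Cov}(X, F)\,\mathrm{Cov}(F)^{-1}$ (assuming $\mathrm{Cov}(F)$ is invertible), and $\varepsilon = X - \mu - LF$ is the regression residual. The residual is then uncorrelated with $F$ by construction:

```latex
\begin{aligned}
\mathrm{Cov}(\varepsilon, F) &= \mathrm{Cov}(X - \mu - LF,\; F) \\
&= \mathrm{Cov}(X, F) - L\,\mathrm{Cov}(F) \\
&= \mathrm{Cov}(X, F) - \mathrm{Cov}(X, F)\,\mathrm{Cov}(F)^{-1}\,\mathrm{Cov}(F) = 0 .
\end{aligned}
```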

Assumptions of Factor Analysis

5. No association between the errors (uncorrelated errors).

- Implies all items are “independent” measures of the common factor. Reasonable?

6. The factors (if we include more than one) are uncorrelated with each other.

- $\mathrm{Cov}(F) = I$. Key distinction between EFA and CFA.

7. The observed variables are multivariate normally distributed.

The Steps to Factor Analysis

1. Obtain a multivariate dataset with numeric columns.

2. Estimate (rotated) models for various pre-specified numbers of factors.

a. Goal: make the implied covariance matrix as close as possible to the observed covariance matrix (of the Xs).

b. Maximum likelihood methods typically work well.

3. Choose a particular model (number of factors).

a. Based primarily on interpretability.

b. Aided by objective statistical measures.

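Estimating our Model using Orthogonal Rotations

Observed vs. implied covariance matrix:

- The model implies the covariance matrix $L\Sigma_F L' + \Psi$, where $\Sigma_F = \mathrm{Cov}(F)$ and $\Psi = \mathrm{Cov}(\varepsilon)$ is diagonal (the errors are uncorrelated). This is the implied covariance matrix.

- Since $\Sigma_F$ is the identity matrix (recall the assumptions), it simplifies to $LL' + \Psi$.

- $S_X$ is the observed covariance matrix (e.g., via cov in R); it involves no model assumption.

- Choose the parameters $L$ and $\Psi$ to make $LL' + \Psi$ as close as possible to $S_X$.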
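Written element by element, $LL' + \Psi$ has diagonal entries $\sum_s l_{is}^2 + \psi_i$ and off-diagonal entries $\sum_s l_{is} l_{js}$, where $\psi_i$ is the error variance of $x_i$. The sketch below computes this implied matrix and one simple measure of closeness to $S_X$: the unweighted (ordinary) least-squares discrepancy, i.e. the sum of squared differences between the two matrices. It is an illustration only; factanal minimizes a maximum likelihood discrepancy instead, and the matrix `S` and both candidate parameter sets are hypothetical.

```python
def implied_cov(L, psi):
    """Implied covariance LL' + Psi for a p x k loading matrix L and error variances psi."""
    p, k = len(L), len(L[0])
    return [[sum(L[a][s] * L[b][s] for s in range(k)) + (psi[a] if a == b else 0.0)
             for b in range(p)] for a in range(p)]


def ols_discrepancy(S, L, psi):
    """Unweighted least-squares discrepancy: sum of squared entries of S - (LL' + Psi)."""
    sigma = implied_cov(L, psi)
    p = len(S)
    return sum((S[a][b] - sigma[a][b]) ** 2 for a in range(p) for b in range(p))


# Hypothetical observed matrix that a one-factor model happens to reproduce exactly
S = [[1.00, 0.72, 0.63],
     [0.72, 1.00, 0.56],
     [0.63, 0.56, 1.00]]

good = ols_discrepancy(S, [[0.9], [0.8], [0.7]], [0.19, 0.36, 0.51])
worse = ols_discrepancy(S, [[0.5], [0.5], [0.5]], [0.75, 0.75, 0.75])
print(good, worse)   # the first parameter set fits far better
```

An estimation routine would search over $L$ and $\Psi$ to make such a discrepancy as small as possible.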

Model Implications

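The model implies that $\mathrm{Cov}(F_s, X_j) = l_{js}$; when the $X$s are standardized (unit variance), this gives $\mathrm{Corr}(F_s, X_j) = l_{js}$.

Thus, for standardized variables, the loadings are also simple correlations.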

The squared loadings capture the reliability of the observed variable as a measure of the underlying factor.

- We want loadings to capture as much of the observed variation as possible!

- Higher loadings → Higher correlation between the factor and its manifestation.
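For standardized variables and uncorrelated factors, the sum of a variable's squared loadings is its communality $h_j^2 = \sum_s l_{js}^2$, the share of its variance explained by the common factors; the remainder $\psi_j = 1 - h_j^2$ is its uniqueness (error variance), which R's factanal reports as "Uniquenesses". A minimal sketch with a hypothetical $4 \times 2$ loading matrix:

```python
def communalities(loadings):
    """Return (communality, uniqueness) per variable, assuming standardized Xs
    and uncorrelated factors."""
    result = []
    for row in loadings:
        h2 = sum(l ** 2 for l in row)      # variance explained by the common factors
        result.append((h2, 1.0 - h2))      # uniqueness = remaining (error) variance
    return result


# Hypothetical loadings: 4 variables (rows) x 2 factors (columns)
loadings = [
    [0.8, 0.1],
    [0.7, 0.2],
    [0.1, 0.9],
    [0.2, 0.6],
]

for j, (h2, u) in enumerate(communalities(loadings), start=1):
    print(f"X{j}: communality={h2:.2f}, uniqueness={u:.2f}")
```

Here X3 is the most reliable measure of its factor and X4 the least.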

An example

Assume the covariance matrix of X below. Our goal is to identify the $L$ and $\Psi$ that result in an implied covariance matrix $LL' + \Psi$ as close as possible to $\mathrm{Cov}(X)$.

What might some reasonable estimates of L be here?

In practice, we use maximum likelihood or other methods (OLS, WLS, GLS).

The R function “factanal” fits the model by maximum likelihood (ML).
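For intuition about reasonable estimates of $L$, consider the simplest case: one factor and three standardized variables. The model then implies $r_{12} = l_1 l_2$, $r_{13} = l_1 l_3$ and $r_{23} = l_2 l_3$, so each loading can be solved exactly, e.g. $l_1^2 = r_{12} r_{13} / r_{23}$. The sketch below assumes a hypothetical correlation matrix (not the one from the original slide) with all correlations positive. Loadings are only determined up to a common sign flip, and with more variables an exact solution generally does not exist, which is why ML or least-squares methods are used.

```python
import math


def one_factor_loadings(r12, r13, r23):
    """Exact one-factor loadings for three standardized variables.

    Assumes all three correlations are positive; returns the positive root.
    """
    l1 = math.sqrt(r12 * r13 / r23)
    l2 = math.sqrt(r12 * r23 / r13)
    l3 = math.sqrt(r13 * r23 / r12)
    return l1, l2, l3


# Hypothetical correlations among three observed variables
loads = one_factor_loadings(r12=0.72, r13=0.63, r23=0.56)
uniquenesses = [1.0 - l ** 2 for l in loads]   # psi_i = 1 - l_i^2 for standardized X
print([round(l, 3) for l in loads], [round(u, 3) for u in uniquenesses])
```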

Orthogonal Rotation

T is an orthogonal rotation matrix ($TT' = I$) applied to the original loading matrix.

If $X = (LT)F + \varepsilon$, then the implied covariance matrix is identical to the original:

$$(LT)(LT)' + \Psi = LTT'L' + \Psi = LL' + \Psi$$
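This identity can be checked numerically. The sketch assumes a hypothetical $4 \times 2$ loading matrix and a $2 \times 2$ rotation by an arbitrary angle (30 degrees). Because $TT' = I$, $LL'$ and $(LT)(LT)'$ agree up to floating-point rounding, so the fit to $S_X$ is unchanged by the rotation even though the individual loadings change.

```python
import math


def matmul(A, B):
    """Plain-Python matrix product A B."""
    return [[sum(A[i][t] * B[t][j] for t in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def transpose(A):
    return [list(col) for col in zip(*A)]


theta = math.radians(30)                      # arbitrary rotation angle
T = [[math.cos(theta), -math.sin(theta)],
     [math.sin(theta), math.cos(theta)]]      # orthogonal: T T' = I

L = [[0.8, 0.1], [0.7, 0.2], [0.1, 0.9], [0.2, 0.6]]   # hypothetical loadings
LT = matmul(L, T)                             # rotated loadings

before = matmul(L, transpose(L))              # LL'
after = matmul(LT, transpose(LT))             # (LT)(LT)'
max_diff = max(abs(before[i][j] - after[i][j]) for i in range(4) for j in range(4))
print(f"largest difference between LL' and (LT)(LT)': {max_diff:.1e}")
```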

Varimax maximizes the variance of the squared loadings within each factor (across variables); quartimax maximizes it within each variable (across factors).

Increases the discriminant power of the factors.

*Image thanks to Elizabeth Garrett-Mayer (see References)
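A minimal sketch of the varimax idea for two factors. It assumes the raw criterion (the sum over factors of the variance of the squared loadings, without the Kaiser row normalization that R's varimax applies by default) and replaces the iterative algorithm used by real implementations with a simple grid search over the rotation angle. The "unrotated" loadings are hypothetical: a clean simple structure deliberately rotated by -30 degrees, so the search should recover an angle of about 30 degrees.

```python
import math


def varimax_criterion(L):
    """Raw varimax criterion: sum over factors of the variance of squared loadings."""
    p, k = len(L), len(L[0])
    total = 0.0
    for s in range(k):
        sq = [L[j][s] ** 2 for j in range(p)]
        mean = sum(sq) / p
        total += sum((x - mean) ** 2 for x in sq) / p
    return total


def rotate2(L, theta):
    """Return L T for the 2 x 2 rotation T = [[cos, -sin], [sin, cos]]."""
    c, s = math.cos(theta), math.sin(theta)
    return [[a * c + b * s, -a * s + b * c] for a, b in L]


def varimax_2factor(L, steps=360):
    """Grid-search angles in [0, 90 degrees); the criterion repeats every 90 degrees."""
    best_theta, best_value = 0.0, varimax_criterion(L)
    for i in range(1, steps):
        theta = (math.pi / 2) * i / steps
        value = varimax_criterion(rotate2(L, theta))
        if value > best_value:
            best_theta, best_value = theta, value
    return best_theta, rotate2(L, best_theta)


simple = [[0.8, 0.0], [0.7, 0.0], [0.0, 0.9], [0.0, 0.6]]   # hypothetical simple structure
unrotated = rotate2(simple, math.radians(-30))               # "messy" starting loadings
theta, rotated = varimax_2factor(unrotated)
print(round(math.degrees(theta), 2), [[round(x, 2) for x in row] for row in rotated])
```

After rotation each variable loads heavily on one factor and near zero on the other, which is what makes the factors easier to label.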

Which factors do we keep?

Factor analysis is a “data reduction” method, in that we desire fewer factors than observed variables.

1. Theory basis:

a. Theory suggests relevant factors.

b. Use judgment to label the factors based on heavy loadings.

2. Mathematical basis:

a. Kaiser criterion: keep factors whose eigenvalue (of the correlation matrix) is greater than 1 (ugly rule of thumb)

b. Scree test

Scree test

*Image thanks to Dell (see References)
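Both mathematical rules work from the eigenvalues of the correlation matrix of the Xs: the Kaiser criterion counts eigenvalues above 1, and the scree test plots the eigenvalues in decreasing order and keeps the factors before the point where the curve levels off (the "elbow"). The sketch below assumes a small hypothetical correlation matrix, computes its eigenvalues with a plain-Python cyclic Jacobi method (in R one would typically call eigen), and prints a text version of a scree plot.

```python
import math


def matmul(A, B):
    """Plain-Python matrix product A B."""
    return [[sum(A[i][t] * B[t][j] for t in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def jacobi_eigenvalues(A, tol=1e-20, max_sweeps=50):
    """Eigenvalues of a symmetric matrix by cyclic Jacobi rotations, largest first."""
    n = len(A)
    a = [row[:] for row in A]
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle that makes entry (p, q) of J' A J zero
                theta = 0.5 * math.atan2(2 * a[p][q], a[q][q] - a[p][p])
                c, s = math.cos(theta), math.sin(theta)
                J = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
                J[p][p], J[q][q], J[p][q], J[q][p] = c, c, s, -s
                Jt = [list(col) for col in zip(*J)]
                a = matmul(matmul(Jt, a), J)
    return sorted((a[i][i] for i in range(n)), reverse=True)


# Hypothetical correlation matrix: variables 1-2 correlate, variables 3-4 correlate
R = [[1.00, 0.56, 0.00, 0.00],
     [0.56, 1.00, 0.00, 0.00],
     [0.00, 0.00, 1.00, 0.54],
     [0.00, 0.00, 0.54, 1.00]]

eigenvalues = jacobi_eigenvalues(R)
n_kaiser = sum(1 for ev in eigenvalues if ev > 1.0)     # Kaiser criterion
print("factors kept by Kaiser criterion:", n_kaiser)
for i, ev in enumerate(eigenvalues, start=1):           # text scree plot
    print(f"{i}: {ev:5.2f} " + "#" * round(ev * 10))
```

The eigenvalues of a correlation matrix always sum to the number of variables, which is a quick sanity check on the output.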

Exploratory vs. Confirmatory Factor Analysis

- EFA is used when no a priori hypothesis about the factors exists:

  - All factors are uncorrelated.

- CFA is used to test whether the data are consistent with an a priori hypothesis:

  - Allows for interfactor correlations based on theory.

  - Pre-specify the factors and loadings, setting some loadings to zero based on theory.
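With correlated factors, the implied covariance matrix generalizes to $L\Phi L' + \Psi$, where $\Phi = \mathrm{Cov}(F)$ is the factor correlation matrix (setting $\Phi = I$ recovers $LL' + \Psi$). The sketch below assumes a hypothetical CFA pattern in which items 1 and 2 load only on factor 1 and items 3 and 4 only on factor 2, with the zero loadings fixed in advance. It shows that the covariance between items on different factors comes entirely from the interfactor correlation $\phi_{12}$. It only evaluates the implied matrix; actually fitting a CFA requires dedicated software.

```python
def implied_cov_oblique(L, phi, psi):
    """Implied covariance L Phi L' + Psi, with Phi the factor correlation matrix."""
    p, k = len(L), len(L[0])
    return [[sum(L[a][s] * phi[s][t] * L[b][t] for s in range(k) for t in range(k))
             + (psi[a] if a == b else 0.0)
             for b in range(p)] for a in range(p)]


# Hypothetical CFA pattern: items 1-2 load only on factor 1, items 3-4 only on factor 2
L = [[0.8, 0.0], [0.7, 0.0], [0.0, 0.9], [0.0, 0.6]]
psi = [1 - row[0] ** 2 - row[1] ** 2 for row in L]   # gives each item unit variance

for r in (0.0, 0.5):                                  # interfactor correlation phi_12
    phi = [[1.0, r], [r, 1.0]]
    sigma = implied_cov_oblique(L, phi, psi)
    # Cov(X1, X3) = l_11 * phi_12 * l_32 = 0.8 * r * 0.9
    print(f"phi_12 = {r}: Cov(X1, X3) = {sigma[0][2]:.3f}, Var(X1) = {sigma[0][0]:.3f}")
```

With $\phi_{12} = 0$ (the EFA-style assumption) items on different factors are implied to be uncorrelated; letting $\phi_{12}$ be estimated is what allows a CFA to reproduce such cross-factor covariances.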

References

Carpita, M., Brentari, E., & Qannari, E. M. (Eds.). (2015). Advances in Latent Variables: Methods, Models and Applications. Springer.

Floyd, F. J., & Widaman, K. F. (1995). Factor analysis in the development and refinement of clinical assessment instruments. Psychological Assessment, 7(3), 286.

Westfall, P. H., Henning, K. S., & Howell, R. D. (2012). The effect of error correlation on interfactor correlation in psychometric measurement. Structural Equation Modeling: A Multidisciplinary Journal, 19(1), 99-117.

Factor analysis. (n.d.). In Wikipedia. Retrieved 22 September 2015 from https://en.wikipedia.org/wiki/Factor_analysis

Exploratory factor analysis. (n.d.). In Wikipedia. Retrieved 22 September 2015 from https://en.wikipedia.org/wiki/Exploratory_factor_analysis

Latent variable. (n.d.). In Wikipedia. Retrieved 22 September 2015 from https://en.wikipedia.org/wiki/Latent_variable

Lambert, B. (2014, February 20). Factor analysis - an introduction. Retrieved 22 September 2015 from https://www.youtube.com/watch?v=WV_jcaDBZ2I

Lambert, B. (2014, February 20). Factor analysis - model representation. Retrieved 22 September 2015 from https://www.youtube.com/watch?v=TeIx7dRedkg

Lambert, B. (2014, February 20). Factor analysis - assumptions. Retrieved 22 September 2015 from https://www.youtube.com/watch?v=PgqiBezoAUA

Dell. (n.d.). Principal components factor analysis. Retrieved 22 September 2015 from http://documents.software.dell.com/Statistics/Textbook/Principal-Components-Factor-Analysis

Garrett-Mayer, E. (n.d.). Statistics in psychosocial research, Lecture 8: Factor analysis I. Retrieved 22 September 2015 from http://ocw.jhsph.edu/courses/StatisticsPsychosocialResearch/PDFs/Lecture8.pdf