Factor Analysis Lua Augustin, Savannah Guo, and Blair Marquardt.

Learning Outcomes

To understand:

1. What factor analysis is.

2. What its model is.

3. Latent vs. observable variables; examples of each.

4. Potential applications of factor analysis.

5. Assumptions of factor analysis.

6. How to perform an analysis using R.

Big Picture – What is Factor Analysis?

Goal: to explain the covariance between observed variables in terms of unobserved variables (factors).

Observable variables xi are modeled as a linear combination of factors plus errors:

xi − μi = li1F1 + … + likFk + Ɛi

Here, xi refers to variable i for the generic person.

In matrix form: X − μ = LF + Ɛ.

X is p×1 (observed variables), L is p×k (loadings), F is k×1 (common factors), and Ɛ is p×1 (errors).
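As a minimal sketch of the model equation, the snippet below evaluates xi = μi + li1F1 + … + likFk + Ɛi for one person. The helper name `observed_from_factors` and every number in it (means, loadings, factor scores, errors) are hypothetical and chosen only for illustration; in a real analysis F and Ɛ are never observed.

```python
def observed_from_factors(mu, L, F, eps):
    """Return x where x[i] = mu[i] + sum_k L[i][k] * F[k] + eps[i]."""
    return [
        mu_i + sum(l_ik * f_k for l_ik, f_k in zip(row, F)) + e_i
        for mu_i, row, e_i in zip(mu, L, eps)
    ]

# Hypothetical example: p = 3 observed variables, k = 2 common factors.
mu = [10.0, 20.0, 30.0]          # means of the observed variables
L = [[0.9, 0.0],                 # loadings, one row per observed variable
     [0.8, 0.1],
     [0.0, 0.7]]
F = [1.0, -2.0]                  # this person's (unobservable) factor scores
eps = [0.05, -0.10, 0.20]        # this person's (unobservable) errors

x = observed_from_factors(mu, L, F, eps)
print(x)
```

Each row of L describes how strongly one observed variable responds to each factor, which is why L has p rows and k columns.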

What can we do with it?

Factor analysis originated in psychometrics as a tool to understand human thought and behavior.

Some applications of Factor Analysis:

- Development of a scale or questionnaire

- Development or corroboration of theory

- Prediction of some other variable as a function of the modeled latent variables

Unobservable vs. Observable Variables

Latent variable: “...variables that are not directly observed but are rather inferred from other variables that are directly measured...” (Wikipedia)

DNA exists but it is unobservable.

DNA predicts observable traits.

Unobservable    Observable
DNA*            eye color, hair type
Depression      suicidal thoughts, insomnia

* DNA used to be unobservable.

Do factors exist?

Using the intelligence example:

• Does everyone have a level of intelligence?

• Is it just ONE number?

• Factors explain the variables in the model.

Unobservable vs. Observable Variables

Our model assumes...

Latent variables are unobservable but correlated with observable variables.

Use data from observable variables to infer information about unobservable variables.

Examine the covariance matrix of observable variables.

What do the data look like?

Formula:

X = LF + Ɛ (with X centered, i.e., the means μ already subtracted)

Let’s look at a basic matrix that applies to any example.
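Written out element by element, the matrix form might look like the following. This is a sketch that follows the deck's notation (p observed variables, k factors, loadings lij); the left-hand side is shown with the means subtracted to match the scalar equation, and the μ terms drop out if X is already centered.

```latex
\begin{pmatrix} x_1 - \mu_1 \\ x_2 - \mu_2 \\ \vdots \\ x_p - \mu_p \end{pmatrix}
=
\begin{pmatrix}
l_{11} & l_{12} & \cdots & l_{1k} \\
l_{21} & l_{22} & \cdots & l_{2k} \\
\vdots & \vdots & \ddots & \vdots \\
l_{p1} & l_{p2} & \cdots & l_{pk}
\end{pmatrix}
\begin{pmatrix} F_1 \\ F_2 \\ \vdots \\ F_k \end{pmatrix}
+
\begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_p \end{pmatrix}
```

Row i of this system is exactly the scalar model for variable i.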

Unobservable vs. Observable Variables - Examples

Example 1: DNA

- Our genetic makeup (our DNA) is the underlying causal factor for our traits.

- Loadings communicate the strength of the relationship.

- The model is still random.

Unobservable vs. Observable Variables - Examples

Example 2: Boss appreciation (factor)

- Observed variables (outward manifestations of your latent appreciation): Performance evaluation rating, absenteeism, survey responses.

- Unobserved variable: How much you appreciate your boss.

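- When F is high, all X’s with positive loadings tend to be high (and those with negative loadings, such as absenteeism, tend to be low); vice versa when F is low.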

Unobservable vs. Observable Variables - Examples

Example 3: Depression and anxiety*

- Observed variables: Insomnia, suicidal thoughts, nausea, and hyperventilation (e.g., Cov(Ins, Suic) = 0.3)

- Unobserved: Depression, anxiety (cause the variance and covariance between the observed variables)

*Example courtesy of Ben Lambert (see References)

Unobservable vs. Observable Variables - Examples

Example 4: Student evaluation data

- Observed: faculty expertise rating, teaching ability rating

- Unobserved: satisfaction with teacher

Assumptions of Factor Analysis

1. Latent factors exist.

- Think back to our examples - this is an easier sell for some examples than for others.

- Ex: Can we really boil depression down into a single value? That’s what our model assumes!

2. The observed variables are linearly related to the factors.

- Evident in our model: xi − μi = li1F1 + … + likFk + Ɛi.


3. The model is invariant across subjects.

- The same loadings and error variance apply for all subjects

- Implies the factors affect all people equally, subject only to random error. Reasonable?

4. No association between the factor and measurement error.

- Cov(F, ε) = 0

- This is automatically true if L is the regression coefficient matrix.


5. No association between the errors (uncorrelated errors).

- Implies all items are “independent” measures of the common factor. Reasonable?

6. The factors (if we include more than one) are uncorrelated with each other.

- Cov(F) = I. Key distinction between EFA and CFA.

7. Multivariate normally distributed observed variables.

The Steps to Factor Analysis

1. Obtain a multivariate dataset having numeric columns.

2. Estimate (rotated) models for various pre-defined numbers of factors.

a. Goal: observed covariance matrix (of Xs) as close as possible to the implied covariance matrix.

b. Maximum likelihood methods typically work well.

3. Choose a particular model (number of factors).

a. Based primarily on interpretability.

b. Aided by objective statistical measures.

Estimating our Model using Orthogonal Rotations

Observed vs. implied covariance matrix:

- The model implies Cov(X) = LΣFL’ + Ψ, where ΣF = Cov(F) and Ψ = Cov(Ɛ).

- LΣFL’ + Ψ is the implied covariance matrix.

- Since ΣF is the identity matrix (recall the assumptions), it simplifies to LL’ + Ψ.

- SX is the observed covariance matrix (e.g., cov in R; no model assumption).

- Choose the parameters L and Ψ to make LL’ + Ψ as close as possible to SX.
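To make "as close as possible" concrete, the sketch below builds the implied covariance matrix LL’ + Ψ from candidate parameters and measures its distance from an observed matrix. The loadings, error variances, and the matrix S are hypothetical, and the sum of squared differences is used only because it is easy to read; it is not the maximum likelihood criterion.

```python
def implied_covariance(L, psi):
    """Return L L' + Psi, where Psi is diagonal with entries psi."""
    p = len(L)
    sigma = [[sum(a * b for a, b in zip(L[i], L[j])) for j in range(p)]
             for i in range(p)]
    for i in range(p):
        sigma[i][i] += psi[i]   # error variances only affect the diagonal
    return sigma

def squared_discrepancy(S, sigma):
    """Sum of squared entrywise differences between S and sigma."""
    return sum((s - m) ** 2
               for s_row, m_row in zip(S, sigma)
               for s, m in zip(s_row, m_row))

# Hypothetical one-factor model with p = 3 variables.
L = [[0.9], [0.8], [0.7]]
psi = [0.19, 0.36, 0.51]
sigma = implied_covariance(L, psi)

# Hypothetical observed covariance matrix S_X.
S = [[1.00, 0.70, 0.65],
     [0.70, 1.00, 0.55],
     [0.65, 0.55, 1.00]]

gap = squared_discrepancy(S, sigma)
print(sigma)
print(gap)
```

An estimation routine would search over L and Ψ to shrink a discrepancy of this kind; different estimation methods differ mainly in how the discrepancy is defined.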

Model Implications

When the Xs are standardized (unit variance), the model implies that Corr(Fs, Xj) = Ljs.

Thus, the loadings are also simple correlations.

The squared loadings capture the reliability of the observed variable as a measure of the underlying factor.

- We want loadings to capture as much of the observed variation as possible!

- Higher loadings → Higher correlation between the factor and its manifestation.
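A small sketch of how squared loadings translate into explained variance, assuming standardized observed variables and uncorrelated factors. The loading matrix is hypothetical. The row sums of squared loadings are commonly called communalities, and one minus that value is the uniqueness (the share of variance left to the error term).

```python
def communalities(L):
    """Sum of squared loadings per observed variable (row of L)."""
    return [sum(l * l for l in row) for row in L]

# Hypothetical loadings: 3 standardized variables on 2 uncorrelated factors.
L = [[0.9, 0.1],
     [0.2, 0.8],
     [0.6, 0.6]]

h2 = communalities(L)                 # variance explained by the common factors
uniqueness = [1.0 - h for h in h2]    # variance left to the error term

for row, h, u in zip(L, h2, uniqueness):
    print(row, round(h, 2), round(u, 2))
```

The third variable loads moderately on both factors, so it is explained reasonably well overall but does not discriminate between the factors.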

An example

Assume the covariance matrix of X below. Our goal is to identify the L and Ψ that result in an implied covariance matrix LL’ + Ψ as close as possible to Cov(X).

What might some reasonable estimates of L be here?

In practice, we use maximum likelihood or other methods (OLS, WLS, GLS).

The R function “factanal” fits the model by ML.
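For intuition about what reasonable loadings look like, here is a sketch of the one special case that can be solved by hand: one factor and three standardized variables, where the off-diagonal entries of LL’ + Ψ are rij = li·lj. The correlations below are hypothetical, and the closed form assumes all three are positive. This is not maximum likelihood estimation, only an exact algebraic match.

```python
import math

def one_factor_loadings(r12, r13, r23):
    """Solve r_ij = l_i * l_j for three standardized variables.

    Assumes one common factor and positive correlations.
    """
    l1 = math.sqrt(r12 * r13 / r23)
    l2 = math.sqrt(r12 * r23 / r13)
    l3 = math.sqrt(r13 * r23 / r12)
    return l1, l2, l3

# Hypothetical correlations among three observed variables.
l1, l2, l3 = one_factor_loadings(0.72, 0.63, 0.56)

# Error variances that make the implied diagonal equal to 1.
psi = [1.0 - l * l for l in (l1, l2, l3)]

print(l1, l2, l3)
print(psi)
```

With more variables or more factors the system is no longer solvable exactly, which is where numerical methods such as ML come in.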

Orthogonal Rotation

T is an orthogonal rotation matrix (TT’ = I) applied to the original loading matrix.

If X = (LT)F + Ɛ, then the implied covariance matrix is identical to the original:

(LT)(LT)’ + Ψ = LTT’L’ + Ψ = LL’ + Ψ

Choose T to maximize the variance of the squared loadings across variables (varimax) or across factors (quartimax).

Increases the discriminant power of the factors.

*Image thanks to Elizabeth Garrett-Mayer (see References)
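The invariance of LL’ under an orthogonal rotation can be checked numerically. The sketch below uses a hypothetical 3×2 loading matrix and an arbitrary 30-degree rotation; the helper functions are plain-Python stand-ins for matrix routines and are not part of any factor analysis package.

```python
import math

def matmul(A, B):
    """Matrix product of two lists-of-lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

# Hypothetical unrotated loadings: 3 variables, 2 factors.
L = [[0.7, 0.5],
     [0.6, 0.6],
     [0.5, -0.4]]

# Orthogonal rotation by an arbitrary angle (30 degrees).
theta = math.radians(30)
T = [[math.cos(theta), -math.sin(theta)],
     [math.sin(theta),  math.cos(theta)]]

L_rot = matmul(L, T)

before = matmul(L, transpose(L))          # L L'
after = matmul(L_rot, transpose(L_rot))   # (LT)(LT)'

max_diff = max(abs(b - a)
               for b_row, a_row in zip(before, after)
               for b, a in zip(b_row, a_row))
print(L_rot)
print(max_diff)
```

The individual loadings change under rotation while the implied covariance does not, which is why a rotation can be chosen purely for interpretability.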

Which factors do we keep?

Factor analysis is a “data reduction” method, in that we desire fewer factors than observed variables.

1. Theory basis:

a. Theory suggests relevant factors.

b. Use judgment to label the factors based on heavy loadings.

2. Mathematical basis:

a. Kaiser criterion: Keep factors with eigenvalue greater than 1 (ugly rule of thumb)

b. Scree test
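As a rough sketch of the mathematical basis, the snippet below computes the eigenvalues of a correlation matrix and counts how many exceed 1. The correlation matrix is hypothetical, and the eigenvalue routine is a small cyclic Jacobi implementation written here for illustration so the example needs no external libraries. The sorted eigenvalues are also the values a scree plot would display.

```python
import math

def symmetric_eigenvalues(A, sweeps=50, tol=1e-18):
    """Eigenvalues of a symmetric matrix via Jacobi rotations, largest first."""
    n = len(A)
    a = [row[:] for row in A]  # work on a copy
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Rotation angle that zeroes the (p, q) entry.
                theta = 0.5 * math.atan2(2.0 * a[p][q], a[q][q] - a[p][p])
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):   # update columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):   # update rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)

# Hypothetical correlation matrix of three observed variables.
R = [[1.00, 0.72, 0.63],
     [0.72, 1.00, 0.56],
     [0.63, 0.56, 1.00]]

eigs = symmetric_eigenvalues(R)
n_keep = sum(1 for ev in eigs if ev > 1.0)   # Kaiser criterion
print(eigs)
print(n_keep)
```

For a correlation matrix the eigenvalues sum to the number of variables, so an eigenvalue above 1 marks a factor that accounts for more variance than a single observed variable.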

Scree test

*Image thanks to Dell (see References)

Exploratory vs. Confirmatory Factor Analysis

- EFA is used when no a priori hypothesis about the factors exists:

  - All factors are uncorrelated.

- CFA is used to test for consistency with an a priori hypothesis:

  - Allows for interfactor correlations based on theory.

  - Pre-specify factors and loadings. Specify some loadings as zero, based on theory.

References

Carpita, M., Brentari, E., & Qannari, E. M. (Eds.). (2015). Advances in Latent Variables: Methods, Models and Applications. Springer.

Floyd, F. J., & Widaman, K. F. (1995). Factor analysis in the development and refinement of clinical assessment instruments. Psychological assessment, 7(3), 286.

Westfall, P. H., Henning, K. S., & Howell, R. D. (2012). The effect of error correlation on interfactor correlation in psychometric measurement. Structural Equation Modeling: A Multidisciplinary Journal, 19(1), 99-117.

Factor analysis (n.d.) In Wikipedia. Retrieved 22 September 2015 from https://en.wikipedia.org/wiki/Factor_analysis

Exploratory factor analysis (n.d.) In Wikipedia. Retrieved 22 September 2015 from https://en.wikipedia.org/wiki/Exploratory_factor_analysis

Latent variable (n.d.) In Wikipedia. Retrieved 22 September 2015 from https://en.wikipedia.org/wiki/Latent_variable

Lambert, Ben. (published on Feb 20, 2014). Factor Analysis - an introduction. Retrieved 22 September 2015 from https://www.youtube.com/watch?v=WV_jcaDBZ2I

Lambert, Ben. (published on Feb 20, 2014). Factor Analysis - model representation. Retrieved 22 September 2015 from https://www.youtube.com/watch?v=TeIx7dRedkg

Lambert, Ben. (published on Feb 20, 2014). Factor Analysis - assumptions. Retrieved 22 September 2015 from https://www.youtube.com/watch?v=PgqiBezoAUA

Principal Components Factor Analysis. Retrieved on 22 September 2015 from http://documents.software.dell.com/Statistics/Textbook/Principal-Components-Factor-Analysis

Garrett-Mayer, Elizabeth. Statistics in Psychosocial Research Lecture 8. Factor Analysis I. Retrieved 22 September 2015 from http://ocw.jhsph.edu/courses/StatisticsPsychosocialResearch/PDFs/Lecture8.pdf
