An Introduction to Factor Analysis
Reducing variables and/or detecting underlying structures
Books you’ll never see . . .
Uses
• Data reduction
[Figure: 24 actual variables reduced to two latent variables, Factor 1 and Factor 2]
Uses
• Create composites/scales for psychometric instruments
Depression, Anxiety
Uses
• Validate composites/scales for psychometric instruments
Depression, Anxiety
Summary of uses
• Factor analytic techniques are most commonly used to reduce many items into a more usable number of factors. This way, the simplified data can be used more easily in research.
• Also used in the development or exploration of questionnaires or other psychometric instruments.
Latent variables
A metaphor
An example of common variance using bivariate relationships
• I measure a sample of kindergarten children’s ability to recognize the sound(s) at the beginning of words, e.g., /k/ in “cat”
• I also measure the children’s ability to segment (break apart) sounds
e.g., “cat” = /k/ /a/ /t/
• I correlate these two measures
[Figure: Beginning letter sounds and Phoneme segmentation]
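To make the idea of common variance concrete, here is a minimal sketch of correlating the two measures. The scores are hypothetical (invented for illustration, not taken from any study), and the helper name `pearson_r` is an assumption of this example. The squared correlation is the proportion of variance the two measures share.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical scores for eight kindergarten children (made up for illustration).
beginning_sounds = [2, 4, 5, 6, 7, 8, 9, 10]
phoneme_segmentation = [1, 3, 4, 4, 6, 7, 9, 9]

r = pearson_r(beginning_sounds, phoneme_segmentation)
shared_variance = r ** 2  # proportion of variance the two measures have in common
print(f"r = {r:.2f}, shared variance = {shared_variance:.2f}")
```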
Not useful when
• A vast array of variables with no theoretical association is forced into the analysis just to see what turns up.
• The variables have inadequate reliability. This lack of stability of measurement affects the meaningfulness of the derived factors.
Approaches to Factor Analytic Techniques
Exploratory
• Mathematically driven technique
• Seeks to identify the underlying structure of a set of items or variables
• Use of scholarly intuition to figure out what the factors mean
Confirmatory
• Starts with a theory of what you expect to confirm (a priori)
• Do the items load as you expected on the factors that you predicted?
• Much more involved Structural Equation Modeling approach (test of model fit)
Methodological Considerations
1. Selection of variables
2. Size of sample
3. Reliability of measures
4. Appropriateness of using Factor Analytic techniques (given the goal of the research)
5. Choice of method (how to extract the factors)
6. How many factors to retain
7. Methods of rotation (to ease interpretability)
Hogarty, K. Y., Kromrey, J. D., Ferron, J. M., & Hines, C. V. (2004). Selection of variables in exploratory factor analysis: An empirical comparison of a stepwise and traditional approach. Psychometrika, 69(4), 593-611.
Assumptions and Requirements of Factor Analytic Techniques
• More than one variable involved
• Sample acquired through random selection
• Robust bivariate relationships among variables
• Variables are measured using either interval or ratio (or ordinal, quasi-interval?) level data
• Data approximate a normal distribution (multivariate normality is also nice)
• Relationships among variables are linear
• Variables are measured reliably
• No multicollinearity (e.g., bivariate r above 0.90)
• Few missing observations
• “Large” number of observations
Size of sample
What is a reasonable sample size? How many observations do you need?
• Old school: Ten observations per planned extracted factor (with a minimum of 100 recommended)
• “More is better” rule. Similar reasoning as other parametric statistical techniques, but less can be okay under some circumstances.
• Recently, it has become more widely recognized that smaller samples can be reasonably factor analyzed, but this is still hotly debated.
Reliability of measures
• Factor analysis is a correlational technique (multiple regression)
• Low reliabilities attenuate correlations
• Low reliabilities introduce “noise” and obscure “signal” for the factors you are trying to detect and extract
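The attenuation point can be illustrated with the classical (Spearman) attenuation formula, in which the observed correlation equals the true correlation multiplied by the square root of the product of the two reliabilities. The sketch below assumes a hypothetical true correlation of .60; the function name `attenuated_r` is illustrative only.

```python
from math import sqrt

def attenuated_r(true_r, reliability_x, reliability_y):
    """Observed correlation expected when both measures contain measurement error."""
    return true_r * sqrt(reliability_x * reliability_y)

true_r = 0.60  # hypothetical correlation between error-free scores
for reliability in (0.9, 0.7, 0.5):
    # Both measures are given the same reliability here to keep the example simple.
    observed = attenuated_r(true_r, reliability, reliability)
    print(f"reliability = {reliability:.1f} -> observed r = {observed:.2f}")
```

As the reliabilities drop, the correlations that factor analysis works from shrink, which is the "noise" obscuring the "signal".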
Researcher as Quality Control
Appropriateness of Factor Analysis
• Test development and instrument validation
– Create composites/sub-scales for psychometric instruments
– Detect underlying structures within
• Construct validity
• Evaluation of a theory
• Data reduction
– Reduce multiple variables to a smaller group, while maintaining the diversity of information offered.
– Demonstrate that multiple instruments test the same thing
– Demonstrate that items load on one factor, or no factors, or multiple factors
Partitioning Variance
1. Variance common to other variables
2. Variance specific to that variable
3. Random measurement error
Most common methods of extracting factors?
Common Factor Analysis (CFA)
Assumption: The factors explain the correlations among the variables (variance in common)
Finds common variance among many items, groups it, and then it must be appropriately labeled
Goal: To find the fewest number of factors that account for the relationships among variables
Kahn 2006
[Figure: three items, each with its own unique (item) variance, overlap in a shared region of common variance. CFA considers the common variance.]
DeCoster (1998) Overview of Factor Analysis
Principal Components Analysis (PCA)
Assumption: Components explain the variance in common among the variables and the amount of unique variance (item & error) present
Goal: To find the fewest components that account for the relationships among variables
[Figure: three items, each with its own unique variance (item + error), overlapping in a shared region]
Comparisons
Common Factor Analysis
• Seeks the factors that account for the common variance among the variables
• Used for Exploratory Factor Analysis (EFA) or Confirmatory Factor Analysis (CFA)
• Easier to generalize to other samples/populations since the unique and error variance of items isn’t considered
• Most often used to detect underlying structures among variables.
Principal Components Analysis
• Seeks factors that account for all of the common and other variance among the variables
• Harder to generalize since other sources of variance (that are item specific and not shared) are included in the model
• Most often used for data reduction to use in research
Factor Analytic Techniques
[Figure: path diagram. Latent variables (unobserved): Factor 1 (Items 1, 4, 5, 7, 8) and Factor 2 (Items 2, 3, 6, 9, 10). Observed variables: Items 1 through 10, each with its own unique variance.]
Exploratory questions:
• What factors exist among the variables?
• To what degree are the variables (items) related to the factors that were extracted? (FACTOR LOADINGS)
Kahn 2006
Common Factor Analysis
• CFA takes into account shared (common) and item specific variance and uses the squared multiple correlation (R squared) as the measure of communality.
• Communality is the variance in one variable that is shared with the other variables.
• The factors extracted by CFA, therefore, explain the shared variance common to more than one variable.
Common Factor Analysis
1. Variance common to other variables
Multicultural Counseling Inventory—Item 6:
“I include the facts of age, gender roles, and socioeconomic status in my understanding of different minority cultures.”
The measured overlap (R square) between this item and the other items on the MCI is the communality.
Common Factor Analysis
Partitions out the variance in each variable that is in common with the other variables. How?
Uses Multiple Regression.
a. Use each item as an outcome in MR
b. Use all other items as predictors
c. Finds the communality among all of the variables, relative to one another
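The regression steps above can be sketched with NumPy. The correlation matrix is hypothetical (three items, all correlating .50), and the function names are assumptions of this example. The shortcut used in the first function, one minus the reciprocal of each diagonal element of the inverse correlation matrix, is a standard identity for the squared multiple correlation; the second function runs the regression explicitly so the two can be compared.

```python
import numpy as np

def squared_multiple_correlations(R):
    """R-squared from regressing each item on all the other items."""
    R = np.asarray(R, dtype=float)
    return 1.0 - 1.0 / np.diag(np.linalg.inv(R))

def r_squared_by_regression(R, outcome):
    """Same quantity for one item, computed as an explicit regression."""
    R = np.asarray(R, dtype=float)
    predictors = [j for j in range(R.shape[0]) if j != outcome]
    r_outcome = R[outcome, predictors]            # outcome-predictor correlations
    R_predictors = R[np.ix_(predictors, predictors)]
    beta = np.linalg.solve(R_predictors, r_outcome)  # standardized weights
    return float(r_outcome @ beta)

# Hypothetical correlation matrix for three items.
R = np.array([
    [1.0, 0.5, 0.5],
    [0.5, 1.0, 0.5],
    [0.5, 0.5, 1.0],
])

smc = squared_multiple_correlations(R)
print(np.round(smc, 3))
print(round(r_squared_by_regression(R, 0), 3))
```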
Common Factor Analysis
• Outcome: Item 1. Predictors: Items 2 through 10.
• Outcome: Item 2. Predictors: Item 1 and Items 3 through 10.
• Outcome: Item 3. Predictors: Items 1, 2, and 4 through 10.
For each regression, the R square is the variance that the outcome item shares with the other items.
How is communality reported with CFA?
          Item 1   Item 2   Item 3   Item 4   Item 5   Item 6
Item 1     .76
Item 2     .60      .56
Item 3     .43      .76      .87
Item 4     .34      .45      .64      .56
Item 5     .33      .32      .34      .65      .52
Item 6     .82      .81      .45      .57      .33      .41
Squared multiple correlations (R square) are on the diagonal of the correlation matrix
What makes a good factor?
• It is consistent with the literature regarding past investigations of variable relationships
• It is easy to understand and interpret
• It adheres to the “simple structure” model
Principal Component Analysis
Data reduction
Principal Component Analysis
[Figure: path diagram. Component 1 (Items 1, 4, 5, 7, 8) and Component 2 (Items 2, 3, 6, 9, 10); each item has its own unique variance.]
How many components are there that can account for all or most of the information contained in the original data?
Kahn 2006
How is communality reported with PCA?
          Item 1   Item 2   Item 3   Item 4   Item 5   Item 6
Item 1     1.0
Item 2     .71      1.0
Item 3     .62      .76      1.0
Item 4     .34      .45      .64      1.0
Item 5     .33      .32      .34      .65      1.0
Item 6     .82      .81      .45      .57      .33      1.0
CFA vs. PCA
• Common factor analysis and principal components analysis often yield similar results when sample sizes are large and/or if item communalities are large.
• Common factor analysis is preferred in situations in which these criteria are not met, especially when the researcher wishes to better understand the latent variables that underlie a mass of items.
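The difference between the two methods comes down to what sits on the diagonal of the matrix being analyzed: 1.0 for PCA, squared multiple correlations for common factor analysis. The sketch below shows that contrast with NumPy on a hypothetical four-item correlation matrix. It performs only a single, non-iterated extraction step; statistical packages typically iterate the communality estimates, which this example does not attempt.

```python
import numpy as np

# Hypothetical correlation matrix for four items (values invented for illustration).
R = np.array([
    [1.0, 0.6, 0.5, 0.1],
    [0.6, 1.0, 0.4, 0.2],
    [0.5, 0.4, 1.0, 0.1],
    [0.1, 0.2, 0.1, 1.0],
])

# PCA analyzes the matrix as is: 1.0 on the diagonal, so all variance is analyzed.
pca_eigenvalues = np.linalg.eigvalsh(R)[::-1]

# Common factor analysis replaces the diagonal with communality estimates
# (squared multiple correlations), so only shared variance is analyzed.
smc = 1.0 - 1.0 / np.diag(np.linalg.inv(R))
R_reduced = R.copy()
np.fill_diagonal(R_reduced, smc)
cfa_eigenvalues = np.linalg.eigvalsh(R_reduced)[::-1]

print("PCA eigenvalues:", np.round(pca_eigenvalues, 2))
print("CFA eigenvalues:", np.round(cfa_eigenvalues, 2))
print("Variance analyzed, PCA:", round(pca_eigenvalues.sum(), 2))
print("Variance analyzed, CFA:", round(cfa_eigenvalues.sum(), 2))
```

The more the communalities approach 1.0, the closer the two sets of eigenvalues become, which is consistent with the first bullet above.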
Factor Analytic Family of Techniques
Metaphors for extraction of factors/components
• With each extraction of a component, less and less variance is unaccounted for.
[Figure: successive components numbered 1 through 8]
Factor Analysis Metaphor
ITEM POOL: Variance-covariance matrix for an instrument
First factor: Extracts the shared variation only (i.e., plusses)
[Figure: grid of + and - marks representing the item pool; the first factor pulls out a block of plusses]
Second factor: Extracts the shared variation only (i.e., plusses)
ITEM POOL: There is still shared variance left, but it is different than the first batch
The Principle of Parsimony
• Goal: We often want to use the smallest number of separate variables to convey the most information about the relationships among constructs.
“Less is more”
Kahn 2006
How many factors to retain?
If you keep letting the program extract factors, it will extract as many factors as there are items.
So how do you decide how many factors to extract?
Bryant & Yarnold (1995). Principal-Components and Factor Analysis from Grimm & Yarnold’s (Eds.) Reading and Understanding Multivariate Statistics
You want the fewest factors necessary to account for the most variance.
Factor Analytic techniques will give you as many factors as you want (even if they’re complete nonsense). The aim is to find the real factors that are consistent with the theoretical structure, not just factors that pop up and have no logical explanation.
Ferketich & Muller (1999) Readings in Research Methodology, Second Edition
How many factors to retain?
A priori criterion
• Replication criterion
• Percentage criterion
Stopping rules
• Kaiser rule
• Cattell’s scree plot
• Parallel analysis
Bryant & Yarnold (1995). Principal-Components and Factor Analysis from Grimm & Yarnold’s (Eds.) Reading and Understanding Multivariate Statistics
A priori criterion
1. When you are replicating research and you want to retain the same number of factors as previous researchers.
2. You decide on a cut-off point, based on some theoretical rationale (e.g., retain factors until 80% of the variance is explained by the extracted factors).
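The percentage cut-off in point 2 can be sketched as a running total over the eigenvalues. The eigenvalues below are hypothetical, and the function name and the 80% default are assumptions chosen to mirror the example in the text.

```python
def factors_for_variance(eigenvalues, target=0.80):
    """Smallest number of factors whose cumulative share of variance reaches target."""
    total = sum(eigenvalues)
    cumulative = 0.0
    for k, eigenvalue in enumerate(sorted(eigenvalues, reverse=True), start=1):
        cumulative += eigenvalue
        if cumulative / total >= target:
            return k
    return len(eigenvalues)

# Hypothetical eigenvalues for a six-item instrument (they sum to 6.0).
eigenvalues = [2.9, 1.4, 0.7, 0.5, 0.3, 0.2]
print(factors_for_variance(eigenvalues))        # 80% cut-off
print(factors_for_variance(eigenvalues, 0.70))  # a looser 70% cut-off
```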
Eigenvalues
The eigenvalue is the total variance, summed across all of the variables, that is accounted for by the factor in question.
The sum of all eigenvalues = number of variables/items in component analysis
Ferketich & Muller (1999) Readings in Research Methodology, Second Edition
How many factors to retain?
Kaiser criterion: Retain all factors with an eigenvalue greater than 1.0.
This sets the limit so that a component must account for at least as much variance as a single variable (to be considered useful).
Kahn 2006
(For CFA, which SPSS calls principal axis factoring, this would be “factor” instead of “component”)
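A minimal sketch of the Kaiser criterion in a component analysis follows. The correlation matrix is hypothetical: six items in two clusters of three, correlating .60 within a cluster and .10 between clusters. The example also checks that the eigenvalues sum to the number of items.

```python
import numpy as np

# Hypothetical six-item correlation matrix with two clusters of three items.
within, between = 0.6, 0.1
R = np.full((6, 6), between)
R[:3, :3] = within
R[3:, 3:] = within
np.fill_diagonal(R, 1.0)

# Eigenvalues of the correlation matrix, largest first.
eigenvalues = np.linalg.eigvalsh(R)[::-1]

# Kaiser criterion: keep components whose eigenvalue exceeds 1.0,
# i.e. components that carry more variance than a single standardized item.
n_retained = int((eigenvalues > 1.0).sum())

print("eigenvalues:", np.round(eigenvalues, 2))
print("sum of eigenvalues:", round(eigenvalues.sum(), 2))
print("components retained:", n_retained)
```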
How many factors to retain?
Cattell’s scree test: Plot the eigenvalues in descending order and retain the factors that come before the big drop (change in slope) levels off. Can be combined with the Kaiser criterion (factors with an eigenvalue greater than 1.0).
This adds the limit that a factor must account for more variance than a single item.
Parallel Analysis
• You generate a scree plot (with eigenvalues) based on random data that uses the same number of variables (items) and the same number of cases.
• Retain the factors with eigenvalues higher than the random eigenvalues.
• Not an option in SPSS
Kahn 2006
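Parallel analysis can be sketched in a few lines of NumPy. This is a simplified version under stated assumptions: the observed eigenvalues are hypothetical, the random data are drawn from a standard normal distribution, and the comparison uses the mean random eigenvalue (some implementations use the 95th percentile instead). The function names are illustrative.

```python
import numpy as np

def random_eigenvalues(n_cases, n_items, n_sims=200, seed=0):
    """Mean eigenvalues of correlation matrices computed from random normal data."""
    rng = np.random.default_rng(seed)
    eigs = np.empty((n_sims, n_items))
    for s in range(n_sims):
        data = rng.standard_normal((n_cases, n_items))
        corr = np.corrcoef(data, rowvar=False)
        eigs[s] = np.linalg.eigvalsh(corr)[::-1]  # largest first
    return eigs.mean(axis=0)

def parallel_analysis(observed_eigenvalues, n_cases, n_sims=200, seed=0):
    """Count leading factors whose eigenvalue exceeds the random benchmark."""
    observed = np.sort(np.asarray(observed_eigenvalues, dtype=float))[::-1]
    benchmark = random_eigenvalues(n_cases, observed.size, n_sims, seed)
    n_retain = 0
    for obs, rand in zip(observed, benchmark):
        if obs <= rand:
            break  # stop at the first factor that random data could match
        n_retain += 1
    return n_retain, benchmark

# Hypothetical observed eigenvalues from six items measured on 300 cases.
observed = [2.5, 1.9, 0.4, 0.4, 0.4, 0.4]
n_retain, benchmark = parallel_analysis(observed, n_cases=300)
print("random benchmark:", np.round(benchmark, 2))
print("factors retained:", n_retain)
```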
Factor Rotation
Obtaining a clearer pattern of factor loadings
The Goal of Rotating Factors
To create high factor loadings for each item on one factor
And create low factor loadings on all other factors
THIS COMBINATION OF CHARACTERISTICS IS REFERRED TO AS THE SIMPLE STRUCTURE.
IT MAKES THE FACTORS MORE INTERPRETABLE
Ferketich & Muller (1999) Readings in Research Methodology, Second Edition
Factor Structure Coefficients
• These are correlations between the item and its associated factor.
• The simple structure dictates that factor coefficients are best if they are very high (in reference to their own factor) and very low (in reference to any other retained factor).
• Rotating factors will change their structure coefficients, thus better approximating the simple structure being sought.
Thurstone’s Rule
• Good items (variables) should load onto only one factor
• Items should load on that one factor with a magnitude of at least 0.30
• The factor should not have an eigenvalue of less than 1.0
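The first two bullets can be checked mechanically. The sketch below applies the 0.30 cut-off to the after-rotation loadings tabulated later in this transcript; the function name `salient_factors` and the idea of flagging cross-loading items are assumptions of this example rather than part of the original slides.

```python
def salient_factors(loadings, cutoff=0.30):
    """Return the (1-based) factors on which an item loads at or above the cutoff."""
    return [j for j, value in enumerate(loadings, start=1) if abs(value) >= cutoff]

# Rotated loadings for five variables on two factors, copied from the
# after-rotation table that appears later in this transcript.
rotated = {
    1: (0.65, 0.45),
    2: (0.62, 0.09),
    3: (0.05, 0.69),
    4: (0.02, 0.68),
    5: (0.10, 0.82),
}

clean_items = []
for item, row in rotated.items():
    hits = salient_factors(row)
    if len(hits) == 1:
        clean_items.append(item)
        print(f"Variable {item}: loads on factor {hits[0]} only")
    else:
        print(f"Variable {item}: salient on factors {hits} (not simple structure)")
```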
[Figure: Items 1 through 8 sorted between Factor 1 and Factor 2]
Distillation
Kirby, J. R., Parrila, R., & Pfeiffer, S. (2003). Naming speed and phonological awareness as predictors of reading development. Journal of Educational Psychology, 95(3), 453-464.
[Figure: two-factor loading diagram shown before and after. Factors: Rapid automatized naming and Phonological awareness. Measures: Picture naming, Color naming, Sound isolation, Phoneme elision, Blending onset-rime, Blending phonemes. Values shown: .96, .90, .77, .63, .90, .75, .47 and -.10, -.05, .06, .15, .03, -.05]
Common rotations
Orthogonal - factors are at 90 degree angles (i.e., uncorrelated)
• *Varimax
• Quartimax
• Equimax
*most popular
Oblique - factors may be correlated with each other.
Ferketich & Muller (1999) Readings in Research Methodology, Second Edition
Factor Extraction
Because the first factor extracted accounts for the most variance among the variables, the next factor extracted will capture variance not accounted for by the first factor. This helps the latent variables be “orthogonal,” meaning that the extracted factors are generally uncorrelated with each other.
Orthogonal Rotations
Varimax: Most common. Maximizes loadings on one factor while minimizing loadings on other factors.
Quartimax: Uncommon. Maximizes factor loading on the first factor only.
Equimax: Also less common. Combines other techniques and because of this, is more difficult to interpret than the other two options.
Ferketich & Muller (1999) Readings in Research Methodology, Second Edition
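For readers who want to see what a varimax rotation does numerically, here is a minimal NumPy sketch of the widely used SVD-based iteration. It is an illustration under stated assumptions, not the routine any particular package uses: the function name, iteration limit, and tolerance are choices made for this example, and no Kaiser normalization is applied. The input is the before-rotation loading table that appears later in this transcript.

```python
import numpy as np

def varimax(loadings, max_iter=100, tol=1e-8):
    """Orthogonal varimax rotation; returns (rotated loadings, rotation matrix)."""
    L = np.asarray(loadings, dtype=float)
    n_items, n_factors = L.shape
    rotation = np.eye(n_factors)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = L @ rotation
        # Gradient-like target for the varimax criterion.
        column_ss = (rotated ** 2).sum(axis=0)
        target = rotated ** 3 - rotated @ np.diag(column_ss) / n_items
        u, s, vt = np.linalg.svd(L.T @ target)
        rotation = u @ vt  # nearest orthogonal matrix
        new_criterion = s.sum()
        if new_criterion < criterion * (1 + tol):
            break  # no meaningful improvement
        criterion = new_criterion
    return L @ rotation, rotation

# Unrotated loadings for five variables on two factors.
unrotated = np.array([
    [0.62, 0.52],
    [0.54, 0.25],
    [0.25, 0.59],
    [0.39, 0.66],
    [0.35, 0.68],
])

rotated, rotation = varimax(unrotated)
print(np.round(rotated, 2))
# An orthogonal rotation redistributes variance between factors, but each
# item's communality (row sum of squared loadings) stays the same.
print(np.round((unrotated ** 2).sum(axis=1), 3))
print(np.round((rotated ** 2).sum(axis=1), 3))
```

The signs and order of the rotated columns are arbitrary, so a solution may need its columns reflected or swapped before it is compared with published output.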
Oblique rotations
Not used frequently but should be when factors are correlated.
Promax is the most popular of the oblique methods
• First rotates orthogonally
• Then followed by oblique rotation
• Minimizes small loadings
• Simple structure is best approximated
Ferketich & Muller (1999) Readings in Research Methodology, Second Edition
How to decide?
• You want what will give you the most interpretable result, with the simplest solution, consistent with an underlying theoretical structure.
• You can use different rotational techniques and compare results. Similar results strengthen confidence in the outcome.
Ferketich & Muller (1999) Readings in Research Methodology, Second Edition
How to clarify factor loadings using rotation
[Figure sequence: Items 1 through 4 plotted against the Factor 1 and Factor 2 axes. The axes are rotated step by step until the rotated Factor 1 and rotated Factor 2 axes pass close to the item clusters.]
• Factor loading coefficients define the eigenvector. The factor loading coefficient represents the correlation between the item and the eigenvector.
Factor coefficients: before and after
             Before orthogonal rotation    After orthogonal rotation
             Eigenvectors                  Eigenvectors
Variables    1      2                      1      2
1            .62    .52                    .65    .45
2            .54    .25                    .62    .09
3            .25    .59                    .05    .69
4            .39    .66                    .02    .68
5            .35    .68                    .10    .82
Uses of Factor Analytic Techniques
• All of the techniques associated with creating factors from many variables are sample specific; however, the better the quality of your sample (size, representativeness, etc.), the more likely your results will generalize to other samples, and theoretically, to the population of interest.
Floyd & Widaman (1995)
“Thus, common factor analysis can provide valuable insights into the multivariate structure of a measuring instrument, isolating the theoretical constructs [i.e., factors] whose effects are reflected in responses on the instrument.” (p. 287)
Cross Validation
• Randomly divide your sample (2/3, 1/3)
• Try to replicate factor solutions across groups
• Explore for part of the sample, then confirm with the other portion
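The random two-thirds/one-third split can be sketched with the standard library. The case IDs, the seed, and the function name are hypothetical; the point is only that cases are shuffled before being divided, so the exploratory and confirmatory portions are both random subsamples.

```python
import random

def split_sample(case_ids, explore_fraction=2 / 3, seed=42):
    """Randomly split cases into an exploratory portion and a confirmatory portion."""
    ids = list(case_ids)
    random.Random(seed).shuffle(ids)  # seeded so the split can be reproduced
    cut = round(len(ids) * explore_fraction)
    return ids[:cut], ids[cut:]

# Hypothetical sample of 300 cases identified by row number.
explore, confirm = split_sample(range(300))
print(len(explore), "cases for the exploratory analysis")
print(len(confirm), "cases held out for confirmation")
```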
EFA vs. CFA
Exploratory
• Find and retain factors (no test of significance, per se)
Confirmatory
• See how well the constructed model fits the data
• Chi-square goodness of fit test
Confirmatory Factor Analysis and Model Fit
The researcher specifies in advance (predicts) how many factors will be found and which items should load on which factors.
[Figure: confirmatory model with four factors, Factor 1 through Factor 4]
Links and Resources
• http://www.siu.edu/~epse1/pohlmann/factglos/