AN INTRODUCTION TO LATENT CLASS AND
LATENT PROFILE ANALYSIS
Social Science Research Commons
Indiana University Bloomington
Workshop in Methods
BETHANY C. BRAY, PH.D.
• Associate Director for Scientific and Infrastructure Development, Institute for Health Research and Policy
• The University of Illinois at Chicago
OVERVIEW
• Conceptual introduction to latent class analysis (LCA)
• An example: Latent classes of adolescent drinking behavior
• Parameters estimated in LCA
• Technical considerations: Model identification, model selection
• Software options
OVERVIEW
• Including grouping variables
• Predicting latent class membership
• Predicting a distal outcome
• Considerations with latent profile analysis
• Resources
• Question & Answer
LATENT CLASS ANALYSIS (LCA)
ABBREVIATIONS
• LCA = latent class analysis
• Static, categorical latent variable measured with categorical items
• LPA = latent profile analysis
• Static, categorical latent variable measured with continuous items
• LTA = latent transition analysis
• Dynamic, categorical latent variable
CONCEPTUAL INTRODUCTION: LCA
THE BASIC IDEAS
• Individuals can be divided into subgroups based on unobservable construct
• The construct of interest is the latent variable
• Subgroups are called latent classes
• True class membership is unknown
• Unknown due to measurement error
• Measurement of the construct is typically based on several categorical indicators
• Latent classes are mutually exclusive & exhaustive
ESTIMATED PARAMETERS
• Latent class prevalences
• e.g., probability of membership in HIGH DEPRESSION latent class
• Item-response probabilities
• e.g., probability of reporting Felt Lonely given membership in HIGH DEPRESSION latent class
LATENT CLASSES OF ADOLESCENT DRINKING BEHAVIOR
DRINKING IN 12TH GRADE
• Data from 2004 cohort of Monitoring the Future public release
• n = 2490 high school seniors who answered at least one question about alcohol use (48% boys, 52% girls)
• Goals of the study:
• Alcohol use behavior among U.S. 12th graders
• Gender differences in measurement and behavior
• Predict behavior from skipping school and grades
DRINKING IN 12TH GRADE
Seven indicators of drinking behavior

Item                         Proportion ‘Yes’
Lifetime alcohol use         82%
Past-year alcohol use        73%
Past-month alcohol use       50%
Lifetime drunkenness         57%
Past-year drunkenness        49%
Past-month drunkenness       29%
5+ drinks in past 2 weeks    26%
WE WILL USE LCA TO…
• Identify and describe underlying classes of drinking behavior in U.S. 12th grade students
THE 5-CLASS MODEL
Probability of ‘Yes’ response, by latent class (class prevalence in parentheses)

Item                   Class 1   Class 2   Class 3   Class 4   Class 5
                       (18%)     (22%)     (9%)      (17%)     (34%)
Lifetime alcohol use   .00       1.00      1.00      1.00      1.00
Past-year alcohol      .00       .61       1.00      1.00      1.00
Past-month alcohol     .00       .00       1.00      .39       1.00
Lifetime drunk         .00       .24       .29       1.00      1.00
Past-year drunk        .00       .00       .00       1.00      1.00
Past-month drunk       .00       .00       .00       .00       .92
5+ drinks past 2 wk    .00       .00       .16       .00       .73

What would you name these 5 classes?
THE 5-CLASS MODEL
Probability of ‘Yes’ response (√ = high probability, - = low probability)

Item                   Non-Drinkers   Experimenters   Light Drinkers   Past Partiers   Heavy Drinkers
Lifetime alcohol use   -              √               √                √               √
Past-year alcohol      -              √               √                √               √
Past-month alcohol     -              -               √                -               √
Lifetime drunk         -              -               -                √               √
Past-year drunk        -              -               -                √               √
Past-month drunk       -              -               -                -               √
5+ drinks past 2 wk    -              -               -                -               √
GRAPHICAL REPRESENTATION
[Figure: path diagram in which the latent variable Drinking Classes points to its indicators: Lifetime Use, Past-Year Use, …, 5+ Drinks]
SOME TECHNICAL DETAILS: LCA
PARAMETERS ESTIMATED
LATENT CLASS NOTATION
• Y represents the vector of all possible response patterns
• y represents a particular response pattern
• Example: y = (Y, Y, N, N, N, N, N)
• X represents the vector of all covariates of interest
• x represents a particular covariate
LATENT CLASS NOTATION
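• The latent class model can be expressed as

$$P(Y_i = y_i \mid X_i = x_i) = \sum_{c=1}^{K} \gamma_c(x_i) \prod_{m=1}^{M} \prod_{r_m=1}^{R_m} \rho_{m r_m \mid c}^{I(y_m = r_m)}$$

where

$$\gamma_c(x_i) = P(C_i = c \mid X_i = x_i) = \frac{\exp[\beta_{0c} + \beta_{1c} x_{i1} + \cdots + \beta_{pc} x_{ip}]}{1 + \sum_{c'=1}^{K-1} \exp[\beta_{0c'} + \beta_{1c'} x_{i1} + \cdots + \beta_{pc'} x_{ip}]}$$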
LATENT CLASS NOTATION
…with c = 1, 2, …, K latent classes and m = 1, 2, …, M indicators, each with r_m = 1, 2, …, R_m response options.
γ_c = probability of membership in latent class c (latent class membership probabilities)
ρ_{m r_m | c} = probability of response r_m to indicator m, conditional on membership in latent class c (item-response probabilities)
I(y_m = r_m) = indicator function, equal to 1 when the response to indicator m is r_m and 0 otherwise
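The model expression can be made concrete with a small sketch. It evaluates the no-covariate form of the model (the sum over classes of γ times the product of ρ's) using the rounded values from the 5-class drinking table and the example pattern y = (Y, Y, N, N, N, N, N). The function names are illustrative only, and because the inputs are rounded to two decimals the result is approximate.

```python
# Class prevalences (gamma) and probabilities of a 'Yes' response (rho),
# copied from the rounded 5-class table in the slides.
GAMMA = [0.18, 0.22, 0.09, 0.17, 0.34]
RHO_YES = [
    # Class 1, Class 2, Class 3, Class 4, Class 5
    [0.00, 1.00, 1.00, 1.00, 1.00],  # Lifetime alcohol use
    [0.00, 0.61, 1.00, 1.00, 1.00],  # Past-year alcohol
    [0.00, 0.00, 1.00, 0.39, 1.00],  # Past-month alcohol
    [0.00, 0.24, 0.29, 1.00, 1.00],  # Lifetime drunk
    [0.00, 0.00, 0.00, 1.00, 1.00],  # Past-year drunk
    [0.00, 0.00, 0.00, 0.00, 0.92],  # Past-month drunk
    [0.00, 0.00, 0.16, 0.00, 0.73],  # 5+ drinks past 2 wk
]


def class_joint(pattern):
    """Return gamma_c * prod_m rho for each class; pattern holds 1 (Yes) or 0 (No)."""
    joint = []
    for c, gamma in enumerate(GAMMA):
        prob = gamma
        for m, answer in enumerate(pattern):
            p_yes = RHO_YES[m][c]
            prob *= p_yes if answer == 1 else 1.0 - p_yes
        joint.append(prob)
    return joint


def pattern_probability(pattern):
    """Marginal probability of a response pattern: the sum over latent classes."""
    return sum(class_joint(pattern))


def posterior(pattern):
    """Posterior probability of each class given the pattern (Bayes' theorem)."""
    joint = class_joint(pattern)
    total = sum(joint)
    return [j / total for j in joint]


y = (1, 1, 0, 0, 0, 0, 0)  # Y, Y, N, N, N, N, N
print(round(pattern_probability(y), 4))
print([round(p, 2) for p in posterior(y)])
```

With these rounded inputs only Class 2 gives the pattern a nonzero probability, so the marginal probability reduces to .22 × .61 × .76 and the posterior puts all of its mass on Class 2.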
ITEM-RESPONSE PROBABILITIES
• ρ parameters express the relation between…
• The discrete latent variable in an LCA and
• The observed indicator variables
• Similar conceptually to factor loadings
• Basis for interpretation of latent classes
• Are probabilities (between 0 and 1)
ITEM-RESPONSE PROBABILITIES
• ρ parameters are analogous to factor loadings; both…
• Express relation between manifest and latent variables
• Form basis for interpreting latent structure
• But…
• Factor loadings are regression weights (β-weights)
• ρ parameters are probabilities
ρ PARAMETERS
• 0 ≤ ρ ≤ 1
• When latent variable and manifest variable completely correspond, ρ = 0 OR ρ = 1
• When latent variable does not at all predict manifest variable, ρ = marginal probability for all classes
• So, if we are trying to measure a latent variable, what kind of ρ’s do we like?
CHARACTERISTICS OF PATTERNS OF ρ PARAMETERS
• Homogeneity: degree to which ρ parameters for a particular latent class are close to 0 and 1
• Latent class separation: degree to which latent classes can clearly be distinguished from each other
CHARACTERISTICS OF PATTERNS OF ρ PARAMETERS
Probability of correctly performing practical task

          Latent Class 1   Latent Class 2
Task 1    .10              .91
Task 2    .15              .90
Task 3    .05              .89
Task 4    .10              .95
Task 5    .12              .90

High homogeneity + High latent class separation
CHARACTERISTICS OF PATTERNS OF ρ PARAMETERS
Probability of correctly performing practical task

          Latent Class 1   Latent Class 2
Task 1    .80              .91
Task 2    .82              .90
Task 3    .81              .89
Task 4    .80              .95
Task 5    .84              .90

High homogeneity + Lower latent class separation
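One way to see what separation buys is to ask how confidently a person who performs all five tasks correctly can be placed in Latent Class 2 under each table. The sketch below applies Bayes' theorem to the two sets of item-response probabilities. The equal class prevalences (0.5 each) are an assumption made only for illustration, since the slides do not report prevalences for this example.

```python
# Item-response probabilities from the two practical-task tables.
CLASS1_HIGH_SEPARATION = [0.10, 0.15, 0.05, 0.10, 0.12]
CLASS1_LOW_SEPARATION = [0.80, 0.82, 0.81, 0.80, 0.84]
CLASS2 = [0.91, 0.90, 0.89, 0.95, 0.90]  # identical in both tables


def pattern_likelihood(rho, pattern):
    """Probability of a 0/1 response pattern given one class's rho values."""
    prob = 1.0
    for p_correct, correct in zip(rho, pattern):
        prob *= p_correct if correct else 1.0 - p_correct
    return prob


def posterior_class2(rho1, rho2, pattern, prevalence2=0.5):
    """Posterior probability of Class 2; prevalence2 is an assumed value."""
    joint1 = (1.0 - prevalence2) * pattern_likelihood(rho1, pattern)
    joint2 = prevalence2 * pattern_likelihood(rho2, pattern)
    return joint2 / (joint1 + joint2)


all_correct = (1, 1, 1, 1, 1)
high = posterior_class2(CLASS1_HIGH_SEPARATION, CLASS2, all_correct)
low = posterior_class2(CLASS1_LOW_SEPARATION, CLASS2, all_correct)
print(round(high, 3), round(low, 3))
```

Under the high-separation table the posterior for Class 2 is essentially 1, while under the lower-separation table it is only about .64, so the same response pattern classifies a person far less clearly.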
MODEL IDENTIFICATION
MODEL IDENTIFICATION
• What is “maximum likelihood estimation”?
• Likelihood function expresses likelihood of observed data, given model being fit and as a function of all possible parameter estimates
• “Winning” parameter estimates (if identified): the set that maximizes the likelihood
DEALING WITH IDENTIFICATION IN PRACTICE
• Many estimation procedures require initial values for the parameters to “kick off” the estimation procedure
• If different starting values produce very different estimates and different G² values, the model is not well-identified
• Run many different sets of starting values, say 100 or more
• Look at the distribution of G² values
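The multiple-starting-values check can be sketched with a bare-bones EM routine for binary items. This is an illustrative toy, not the algorithm used by any of the packages named in these slides. The data are simulated from the high-separation task table with an assumed 50/50 class split, and only 10 starts are run to keep it quick. It tracks the log-likelihood rather than G²; for a fixed data set the two differ only by a constant and a factor of -2, so the largest log-likelihood corresponds to the smallest G².

```python
import math
import random


def joint_probs(row, gamma, rho):
    """gamma_c * prod_m P(y_m | class c) for each class, for one 0/1 response row."""
    out = []
    for c, g in enumerate(gamma):
        p = g
        for m, y in enumerate(row):
            p *= rho[c][m] if y else 1.0 - rho[c][m]
        out.append(p)
    return out


def log_likelihood(data, gamma, rho):
    return sum(math.log(sum(joint_probs(row, gamma, rho))) for row in data)


def fit_lca(data, n_classes, rng, n_iter=100):
    """Run EM from one random start; return (log-likelihood, gamma, rho)."""
    n_items = len(data[0])
    gamma = [1.0 / n_classes] * n_classes
    rho = [[rng.uniform(0.1, 0.9) for _ in range(n_items)] for _ in range(n_classes)]
    for _ in range(n_iter):
        # E-step: posterior class probabilities for every respondent.
        post = []
        for row in data:
            joint = joint_probs(row, gamma, rho)
            total = sum(joint)
            post.append([j / total for j in joint])
        # M-step: update prevalences and item-response probabilities.
        for c in range(n_classes):
            weight = max(sum(p[c] for p in post), 1e-12)
            gamma[c] = weight / len(data)
            for m in range(n_items):
                yes = sum(p[c] for p, row in zip(post, data) if row[m])
                # Keep estimates strictly inside (0, 1) so logs stay finite.
                rho[c][m] = min(max(yes / weight, 1e-6), 1.0 - 1e-6)
    return log_likelihood(data, gamma, rho), gamma, rho


def simulate(n, gamma, rho, rng):
    """Draw n respondents from a known latent class model."""
    data = []
    for _ in range(n):
        c = rng.choices(range(len(gamma)), weights=gamma, k=1)[0]
        data.append([1 if rng.random() < p else 0 for p in rho[c]])
    return data


rng = random.Random(2024)
true_rho = [[0.10, 0.15, 0.05, 0.10, 0.12], [0.91, 0.90, 0.89, 0.95, 0.90]]
data = simulate(200, [0.5, 0.5], true_rho, rng)  # assumed 50/50 prevalences

fits = [fit_lca(data, 2, rng) for _ in range(10)]  # slides suggest 100 or more
logliks = sorted(f[0] for f in fits)
best_loglik, best_gamma, best_rho = max(fits, key=lambda f: f[0])
print("log-likelihood range across starts:", round(logliks[0], 3), "to", round(logliks[-1], 3))
print("best-fitting prevalences:", [round(g, 2) for g in best_gamma])
```

If most starts land on the same best log-likelihood, that is reassuring; a wide spread, or a best value reached only once, would suggest an identification problem of the kind described above.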
MODEL SELECTION
ABSOLUTE VS. RELATIVE MODEL FIT
• Absolute model fit refers to whether a specified LCA model provides an ‘adequate’ representation of the data
• Adequate, according to some test statistic
• To test absolute model fit, we need the distribution of the test statistic under the null hypothesis
• H0: the specified model fits the data
COMMON TEST STATISTIC: G²
• As in many contingency table methods, LCA computes predicted response pattern proportions according to the model and estimated parameters
• These predicted response pattern proportions are compared to the observed response pattern proportions
• This comparison is expressed in the likelihood ratio statistic G²
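For reference, the likelihood ratio statistic is usually written in the standard form below. The symbols f_y (observed count for response pattern y) and \hat{f}_y (count expected under the fitted model) are notation introduced here, not taken from the slides, and the sum runs over the response patterns that were actually observed.

$$G^2 = 2 \sum_{y} f_y \log\left(\frac{f_y}{\hat{f}_y}\right)$$

The degrees of freedom are the number of possible response patterns, minus 1, minus the number of estimated parameters p. With seven binary items there are 2^7 = 128 possible patterns, so a 1-class model with p = 7 has 128 - 1 - 7 = 120 df, which matches the df reported for the 1-class model in the model-selection table.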
ISSUES WITH THIS APPROACH
• There are issues with this approach to model selection in LCA, and especially in LTA
• When data are sparse, G² is not distributed as chi-square
• This makes it hard to test the fit of the model
ABSOLUTE VS. RELATIVE MODEL FIT
• Relative model fit refers to deciding whether Model A or Model B is better
• AIC, BIC good tools for relative model fit
• These are information criteria (penalized log-likelihood)
• Optimize balance between fit and parsimony
• Usually scaled so that smaller AIC, BIC is better
AIC AND BIC
• p = number of parameters estimated in the model
• n = sample size
$$\mathrm{AIC} = G^2 + 2p$$

$$\mathrm{BIC} = G^2 + [\log(n)][p]$$
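As a quick check on these definitions, the sketch below recomputes AIC and BIC for the 2-class drinking model from the rounded values reported in the model-selection table (G² = 3019, n = 2490). The parameter count uses the usual tally for binary items, (K - 1) prevalences plus K × M item-response probabilities; the function names are illustrative.

```python
import math


def n_parameters(n_classes, n_items):
    """Free parameters with binary items: (K - 1) prevalences + K * M rho's."""
    return (n_classes - 1) + n_classes * n_items


def degrees_of_freedom(n_classes, n_items):
    """Possible response patterns (2^M for binary items) minus 1, minus p."""
    return 2 ** n_items - 1 - n_parameters(n_classes, n_items)


def aic(g2, p):
    return g2 + 2 * p


def bic(g2, p, n):
    return g2 + math.log(n) * p


p = n_parameters(2, 7)
print("p =", p, "df =", degrees_of_freedom(2, 7))
print("AIC =", aic(3019, p))
print("BIC =", round(bic(3019, p, 2490), 1))
```

This gives p = 15, df = 112, and AIC = 3049, matching the table. The BIC comes out near 3136 rather than the tabled 3137, presumably because the table was computed from an unrounded G².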
DIFFERENCE IN G² VS. BLRT
• It is tempting to calculate the G² difference for two competing models
• For example, 3 vs. 4 classes
• But the test is not appropriate because we do not know the correct reference distribution for the test
• One solution: bootstrap the G² difference (the bootstrap likelihood ratio test, BLRT)
• H0: 3 class model sufficient
• H1: 4 classes required
SELECTING THE NUMBER OF DRINKING CLASSES
• BLRT not significant for 6- vs. 5-class model, indicating 6 classes are not needed
Classes   G²     df    AIC    BIC    BLRT
1         9510   120   9524   9564   N/A
2         3019   112   3049   3137   .01
3         911    104   957    1091   .01
4         209    96    271    452    .01
5         4      88    81     308    .01
6         4      80    98     372    .08
7         3      72    113    434    N/A
SOFTWARE OPTIONS
MAIN OPTIONS
• SAS (LCA, LTA)
• Stata (LCA)
• Mplus (LCA, LPA, LTA)
• Latent GOLD (LCA, LPA, LTA)
• R – poLCA (LCA)
• http://www.john-uebersax.com/stat/soft.htm
INCLUDING GROUPING VARIABLES
MULTIPLE-GROUPS LCA
• Two reasons to include a grouping variable:
• To explore measurement invariance
• e.g., “Do the items map onto the latent construct in the same way for males and females?”
• To divide sample into groups for comparison purposes
• e.g., “How does the probability of membership in the HEAVY DRINKERS latent class differ in the experimental and control groups?”
MULTIPLE-GROUPS LCA
• ρ parameters may vary as a function of the grouping variable
• Allows test of measurement invariance
• γ parameters may vary as a function of the grouping variable
• Allows comparison of latent class prevalences
WE WILL USE LCA TO…
• Identify and describe underlying classes of drinking behavior in U.S. 12th grade students
• Include a grouping variable (i.e., sex)
• Test for measurement invariance across males and females
• Examine sex differences in prevalence of behavior types
MEASUREMENT INVARIANCE ACROSS GROUPS
• Models with ρ parameters free and constrained equal across groups are statistically nested
• Free parameters allow measurement to differ across groups
• Constrained parameters equate corresponding measurement parameters across groups
In general, two models are nested if the simpler model can be arrived at by imposing parameter restrictions on the more complex model.
TESTING MEASUREMENT INVARIANCE
• H0: Simpler model is adequate
• H1: Simpler model is not adequate
• Often, we hope to fail to reject the null hypothesis
• If non-significant, strong support for measurement invariance
• Our result is not significant…
• G² = 18 with 35 df, p > .05
• Measurement invariance is plausible
• Keep parameter restrictions
SEX DIFFERENCES IN CLASS PREVALENCES
Sex differences in probabilities of membership in drinking classes: γ parameters

Class            Males   Females
Nondrinkers      18%     18%
Experimenters    22%     23%
Light Drinkers   9%      9%
Past Partiers    13%     21%
Heavy Drinkers   38%     28%
PREDICTING LATENT CLASS MEMBERSHIP
WE WILL USE LCA TO…
• Identify and describe underlying classes of drinking behavior in U.S. 12th grade students
• Include a grouping variable (i.e., sex)
• Test for measurement invariance across males and females
• Examine sex differences in prevalence of behavior types
• Explore whether grades and skipping school predict drinking class membership
GRAPHICAL REPRESENTATION
[Figure: path diagram in which Skipping School predicts the latent variable Drinking Classes, which points to its indicators: Lifetime Use, Past-Year Use, …, 5+ Drinks]
LCA WITH COVARIATES
• Regress latent class variable on predictors
• Logistic regression with latent outcome
• β parameters express relation between covariates and class membership
INTERPRETING BETA PARAMETERS

$$\log\left(\frac{\gamma_1(x_i)}{\gamma_2(x_i)}\right) = \beta_{01} + \beta_{11} x_i$$

• β₁₁ is a logistic regression coefficient influencing the log-odds that an individual falls into Class 1 relative to Class 2
TRANSFORMING BETAS TO ODDS RATIOS
• Exponentiated β parameters are odds ratios
• They reflect the multiplicative change in the odds of class membership, relative to the reference class, corresponding to a one-unit increase in the covariate
LCA WITH COVARIATES:SKIPPING SCHOOL, GRADES
• Skipped school in past month (dummy coded; 33% yes)
• Grades (standardized)
• Covariates added in separate models here, but can be added simultaneously to control for effects
• Non-drinkers class specified as reference group for multinomial logit model
OVERALL TESTS OF SIGNIFICANCE
• Skipped school:
• Change in -2 log L (4 df) = 162.1
• p<.0001
• Grades:
• Change in -2 log L (4 df) = 56.8
• p<.0001
BETAS AND ODDS RATIOS
                 Skipped School   Grades
Class            β       OR       β       OR
Nondrinkers      ---     1.0      ---     1.0
Experimenters    0.4     1.5      -0.2    0.8
Light Drinkers   0.7     2.0      -0.4    0.7
Past Partiers    0.9     2.5      -0.3    0.7
Heavy Drinkers   1.6     5.0      -0.5    0.6
The odds of membership in the Heavy Drinkers class relative to the Non-Drinkers class are 5 times as high for adolescents who skipped school as for those who did not skip.
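The link between the β and OR columns can be checked directly, since each odds ratio is exp(β). The sketch below does that for the tabled coefficients and then shows, with a small multinomial-logit function, that the Heavy Drinkers vs. Nondrinkers odds change by the factor exp(1.6) when the skipped-school dummy goes from 0 to 1. The intercepts are hypothetical placeholders (the slides do not report them); the odds ratio does not depend on their values.

```python
import math

# (beta for skipped school, beta for grades) from the table above;
# Nondrinkers are the reference class.
BETAS = {
    "Experimenters": (0.4, -0.2),
    "Light Drinkers": (0.7, -0.4),
    "Past Partiers": (0.9, -0.3),
    "Heavy Drinkers": (1.6, -0.5),
}


def odds_ratio(beta):
    return math.exp(beta)


def membership_probabilities(intercepts, slopes, x):
    """Multinomial-logit class probabilities for covariate value x.

    The reference class has intercept and slope fixed at 0 and is returned first.
    """
    scores = [1.0] + [math.exp(b0 + b1 * x) for b0, b1 in zip(intercepts, slopes)]
    total = sum(scores)
    return [s / total for s in scores]


for name, (b_skip, b_grades) in BETAS.items():
    print(name, round(odds_ratio(b_skip), 1), round(odds_ratio(b_grades), 1))

HYPOTHETICAL_INTERCEPTS = [0.0, -0.5, -0.2, 0.3]  # not from the slides
skip_slopes = [b[0] for b in BETAS.values()]
did_not_skip = membership_probabilities(HYPOTHETICAL_INTERCEPTS, skip_slopes, 0)
skipped = membership_probabilities(HYPOTHETICAL_INTERCEPTS, skip_slopes, 1)

# Odds of Heavy Drinkers (index 4) vs. Nondrinkers (index 0) in each group.
odds_change = (skipped[4] / skipped[0]) / (did_not_skip[4] / did_not_skip[0])
print(round(odds_change, 2))
```

Rounded to one decimal, the exponentiated coefficients reproduce every OR in the table.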
RELATION BETWEEN SKIPPING SCHOOL AND DRINKING CLASSES
[Figure: one series per drinking class, plotted for students who did not skip vs. skipped school; vertical axis from 0.0 to 0.6]
RELATION BETWEEN GRADES AND DRINKING CLASSES
[Figure: one series per drinking class, plotted against Grades (horizontal axis from -2.0 to 2.0); vertical axis from 0.00 to 0.50]
PREDICTING A DISTAL OUTCOME
MOTIVATION
• Now, we are interested in predicting later academic achievement from drinking subtypes
• Latent class variable: drinking
• Distal outcome: academic achievement
• Recent research has been aimed at developing new approaches to estimate these types of associations
GRAPHICAL REPRESENTATION
[Figure: path diagram in which the latent variable Drinking Classes, measured by Lifetime Use, Past-Year Use, …, 5+ Drinks, predicts Academic Achievement]
USING CLASSES TO PREDICT AN OUTCOME
• Broadly categorized as 1-step and 3-step approaches based on terminology by Vermunt
• 1-step approaches are sometimes called model-based approaches
• 3-step approaches are sometimes called classify-analyze approaches
• These approaches are based on posterior probabilities
TWO TRADITIONAL APPROACHES
• Maximum probability assignment, also known as modal probability assignment
• Multiple pseudo-class draws assignment
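Both traditional approaches start from each person's posterior class probabilities. A minimal sketch follows, using a made-up posterior matrix (three people, three classes) purely for illustration: modal assignment picks the most probable class, while pseudo-class assignment draws a class at random from the posterior, repeated over several draws.

```python
import random


def modal_assignment(posteriors):
    """Assign each person to the class with the highest posterior probability."""
    return [max(range(len(p)), key=p.__getitem__) for p in posteriors]


def pseudo_class_draws(posteriors, n_draws, rng):
    """Return n_draws class assignments, each sampled from the posteriors."""
    draws = []
    for _ in range(n_draws):
        draws.append([rng.choices(range(len(p)), weights=p, k=1)[0] for p in posteriors])
    return draws


# Hypothetical posterior probabilities, one row per person.
example_posteriors = [
    [0.70, 0.20, 0.10],
    [0.10, 0.30, 0.60],
    [1.00, 0.00, 0.00],
]

modal = modal_assignment(example_posteriors)
draws = pseudo_class_draws(example_posteriors, n_draws=20, rng=random.Random(1))
print(modal)
print(draws[:3])
```

In a full analysis the outcome model would be fit once per draw and the results combined across draws; that step is omitted here.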
THREE MODERN APPROACHES
• 3-step approach with adjustment for classification error using specialized weights, often referred to as the “BCH approach”
• Model-based approach using Bayes’ Theorem, often referred to as the “LTB approach”
• 3-step approach based on multiple imputation, which relies on a model, often referred to as the “inclusive classify-analyze approach”
MODERN APPROACH #1
• Classification error correction using the BCH approach
• Good references to consider:
Bakk, Z., & Vermunt, J. K. (2016). Robustness of stepwise latent class modeling with continuous distal outcomes. Structural Equation Modeling, 23, 20-31.
Dziak, J. J., Bray, B. C., Zhang, J.-T., Zhang, M., & Lanza, S. T. (2016). Comparing the performance of improved classify-analyze approaches for distal outcomes in latent profile analysis. Methodology: European Journal of Research Methods for the Behavioral and Social Sciences, 12, 107-116.
MODERN APPROACH #1
• Corrects the traditional 3-step approach by accounting for classification error
• Uses the idea that the joint probability distribution of the distal outcome Y and the assigned class variable W is a linear combination of the joint probability distribution of Y and the true latent class variable C, weighted by classification error probabilities
• In other words, it weights the outcome analysis model to adjust for classification error
MODERN APPROACH #1
• Some important notes:
• BCH weights are NOT like survey weights: they CANNOT be imported into other software packages and used as weights in generalized models; they MUST be used in a software package/routine designed to handle BCH weights
• Works well for both binary and continuous outcomes
• Has been shown to be fairly robust to violations of homoscedasticity of the outcome across classes
• Implemented robustly in Mplus and Latent GOLD; support in SAS is limited
MODERN APPROACH #1
• Some important notes:
• Can only be used with a single latent class variable
LATENT PROFILE ANALYSIS (LPA)
TO MAKE A LONG STORY SHORT…
• LPA is conceptually the same as LCA
• Original cite for LPA is often attributed to:
• Lazarsfeld, P. F., & Henry, N. W. (1968). Latent structure analysis. Boston, MA: Houghton Mifflin.
• Just about everything from the LCA part of the workshop is relevant to LPA…
CONCEPTUAL INTRODUCTION: LPA
AN EXAMPLE RESEARCH QUESTION
• Are there distinct profiles of self leader perceptions?
ESTIMATED PARAMETERS
• Latent class prevalences
• e.g., probability of membership in the PROTOTYPICAL latent class
• Class-specific means
• e.g., mean of SINCERE indicator given membership in the PROTOTYPICAL latent class
• Class-specific variances
• e.g., variance of SINCERE indicator given membership in the PROTOTYPICAL latent class
ESTIMATED PARAMETERS

Latent profile membership proportions
Prototypical        .31 (n=151)
Laissez-Faire       .38 (n=181)
Narcissistic        .18 (n=87)
Anti-Prototypical   .13 (n=64)

Within-profile item means
Factor         Item                   Overall Item Mean   Prototypical   Laissez-Faire   Narcissistic   Anti-Prototypical
Sensitivity    Sincere                5.75                6.60           5.89            4.93           4.47
               Compassionate          5.60                6.53           5.76            4.44           4.56
               Sensitive              5.38                6.34           5.48            4.33           4.28
               Warm                   5.59                6.53           5.65            4.66           4.48
               Sympathetic            5.34                6.39           5.53            3.90           4.30
               Within-Profile Means                       6.48           5.66            4.45           4.42
Intelligence   Knowledgeable          5.57                6.39           5.21            5.73           4.41
               Educated               5.72                6.43           5.30            6.14           4.71
               Wise                   4.97                5.64           4.55            5.33           4.10
               Intellectual           5.43                6.22           5.07            5.67           4.23
               Intelligent            5.64                6.26           5.26            6.13           4.57
               Within-Profile Means                       6.19           5.08            5.80           4.40
Dedication     Motivated              5.86                6.55           5.62            6.14           4.51
               Dedicated              6.00                6.71           5.80            6.13           4.67
               Hardworking            6.12                6.69           5.86            6.52           4.96
               Within-Profile Means                       6.65           5.76            6.26           4.71
Tyranny        Pushy                  3.02                2.62           2.88            3.85           3.27
               Manipulative           2.73                2.41           2.48            3.50           3.17
               Conceited              2.55                2.29           2.31            3.27           2.90
               Selfish                2.61                2.11           2.49            3.21           3.30
               Within-Profile Means                       2.36           2.54            3.46           3.16
ESTIMATED PARAMETERS
• Means and variances can be held constant or be made free-to-vary across classes
• You probably want to leave the means free-to-vary because this allows the classes to be different from each other
• If you have estimation issues, though, one of the first things to try is restricting the variances to be equal across classes
• Greatly reduces the ‘unknowns’ in the model and can help with model identification
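To get a feel for how much the equal-variance restriction shrinks the model, the sketch below counts free parameters for an LPA with K classes and M indicators. It assumes the setup described in these slides (class-specific means, one variance per indicator, no within-class covariances) and uses the leader-perception example with 4 profiles and 17 items; the function name is illustrative.

```python
def lpa_parameter_count(n_classes, n_items, equal_variances):
    """Count free parameters: prevalences + class-specific means + variances."""
    prevalences = n_classes - 1
    means = n_classes * n_items
    # Equal variances: one per indicator; otherwise one per indicator per class.
    variances = n_items if equal_variances else n_classes * n_items
    return prevalences + means + variances


free = lpa_parameter_count(4, 17, equal_variances=False)
restricted = lpa_parameter_count(4, 17, equal_variances=True)
print(free, restricted, free - restricted)
```

For this example the restriction removes 51 of the 139 parameters.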
ESTIMATED PARAMETERS
• If we are trying to measure a latent variable, what kind of means and variances do we like?
LATENT PROFILE HOMOGENEITY
• In LCA, homogeneity is the degree to which the item-response probabilities for a particular latent class are close to 0 and 1
• In FA, homogeneity is the degree to which the factor loadings for a particular factor are close to -1 and 1
• Unlike in LCA and FA, when the latent variable completely predicts the manifest variables, the means aren’t any specific value(s)
LATENT PROFILE HOMOGENEITY
• Homogeneity is tied to both the within-class mean and the amount of within-class variance for each manifest variable
• But, for estimation purposes, usually we have to constrain the variances to be equal across classes
• Thus, homogeneity is not as straightforward as it is in LCA and FA
LATENT PROFILE SEPARATION
• The good news is that latent profile separation is still a very helpful concept to consider
• It is the degree to which the latent profiles can clearly be distinguished from each other
KEY ASSUMPTIONS
• Latent profile indicators are continuous and normally distributed within classes
• Why is this important?
• If they are not normally distributed, simulation studies in the context of GMM suggest you will over-extract the number of classes
• Bauer, D. J., & Curran, P. J. (2003). Distributional assumptions of growth mixture models: Implications for overextraction of latent trajectory classes. Psychological Methods, 8, 338-363.
KEY ASSUMPTIONS
• Latent profile indicators are independent within classes (conditional independence)
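Taken together, the two assumptions mean that within a profile the density of a person's responses is a product of univariate normal densities, one per indicator. The sketch below uses that to compute posterior profile probabilities from two of the tabled item means (Sincere and Knowledgeable) and the tabled profile proportions. The within-class standard deviation of 1.0 is a hypothetical value, held equal across classes and items, because the slides do not report variances.

```python
import math

PROFILES = ["Prototypical", "Laissez-Faire", "Narcissistic", "Anti-Prototypical"]
PROPORTIONS = [0.31, 0.38, 0.18, 0.13]
MEANS = {
    "Sincere": [6.60, 5.89, 4.93, 4.47],
    "Knowledgeable": [6.39, 5.21, 5.73, 4.41],
}
ASSUMED_SD = 1.0  # hypothetical; not reported in the slides


def normal_pdf(y, mean, sd):
    z = (y - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def profile_posteriors(responses):
    """Posterior probability of each profile given a dict of item responses."""
    joint = []
    for c, proportion in enumerate(PROPORTIONS):
        density = proportion
        # Conditional independence: multiply the per-item normal densities.
        for item, y in responses.items():
            density *= normal_pdf(y, MEANS[item][c], ASSUMED_SD)
        joint.append(density)
    total = sum(joint)
    return [j / total for j in joint]


post = profile_posteriors({"Sincere": 6.5, "Knowledgeable": 6.5})
for name, p in zip(PROFILES, post):
    print(name, round(p, 2))
```

A respondent scoring 6.5 on both items is most likely Prototypical under these assumptions, but with only two indicators and an SD of 1.0 the Laissez-Faire profile still receives a noticeable share of the posterior.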
RESOURCES
LCA, LPA, LTA RESOURCES
• Recommended reading list included in the download of these slides
www.latentclassanalysis.com
LCA, LPA, LTA RESOURCES
• YouTube videos include…
• Intro to LCA
• Intro to LTA
• 1-and-1 webinar on LCA
• 1-and-1 webinar on LTA
QUESTIONS?
THANK YOU!!
• Bethany C. Bray
• Associate Director for Scientific and Infrastructure Development, Institute for Health Research and Policy, The University of Illinois at Chicago
• bcbray.com
• latentclassanalysis.com