Latent Class Analysis
Foundation Entries
SAGE Research Methods Foundations
By: Jay Magidson, Jeroen K. Vermunt & John P. Madura
Published: 2020
Length: 10,000 Words
DOI: http://dx.doi.org/10.4135/9781526421036
Methods: Latent Class Analysis
Online ISBN: 9781526421036
Disciplines: Anthropology, Business and Management, Criminology and Criminal Justice,
Communication and Media Studies, Counseling and Psychotherapy, Economics, Education, Geography,
Health, History, Marketing, Nursing, Political Science and International Relations, Psychology, Social
Policy and Public Policy, Social Work, Sociology, Science, Technology, Computer Science, Engineering,
Mathematics, Medicine
Access Date: September 10, 2020
Publishing Company: SAGE Publications Ltd
City: London
© 2020 SAGE Publications Ltd All Rights Reserved.
This PDF has been generated from SAGE Research Methods.
Not for redistribution beyond Jay Magidson's online course in Latent Class Modeling
Abstract
Latent class (LC) analysis is a widely used method for extracting meaningful groups (LCs) from data.
The basic concept was introduced by Paul Lazarsfeld in 1950 for building typologies (or clusters) from
dichotomous variables as part of his more general latent structure analysis. In 1974, Leo Goodman
operationalized and extended LC analysis using maximum likelihood methods, which resolved previous
implementation problems. After 1995, extensions to traditional LC modeling took place, and Jeroen K.
Vermunt and Jay Magidson defined the LC model more generally as any model where some parameters differ
across unobserved subgroups called LCs. LC modeling has since become a general multivariate modeling
approach for revealing latent segments based on any set of observed indicators, across a wide range of
applications. In particular, LC generalizes traditional cluster, factor, and item response theory analyses and
also generalizes various kinds of regression modeling where the parameters that differ for different classes
are the regression coefficients. This entry discusses traditional LC modeling, tools for determining the number
of classes, approaches for identifying meaningful classes, and advanced LC regression models for analyzing
ratings and choice data. Several other topics—such as dealing with covariates using one-step and three-step
approaches, multilevel LC models, latent Markov models, and LC growth models—are also briefly discussed.
Introduction
Latent class (LC) analysis has become one of the most widely used methods for extracting meaningful groups
(LCs) from data. The concept of LCs was originally introduced by Paul Lazarsfeld (1950) in building typologies
(or clusters) from dichotomous variables as part of his more general latent structure analysis (Lazarsfeld &
Henry, 1968). However, implementation problems were substantial and not resolved until 1974 when Leo
Goodman operationalized and extended LC analysis in a major breakthrough using maximum likelihood
methods (Goodman, 1974a, 1974b).
Goodman’s algorithm, now referred to as an implementation of the expectation-maximization (EM) algorithm,
was made publicly accessible by Clifford Clogg’s computer program maximum likelihood latent structure
analysis (MLLSA). During the period 1974–1995, MLLSA was the primary LC program for extracting
underlying LCs from a set of categorical response variables (categorical indicators) as a method for
understanding the underlying structure of those variables.
In this entry, Goodman’s LC model is referred to as the traditional LC model. Numerous extensions to
traditional LC modeling took place after 1995, prompting Jeroen K. Vermunt and Jay Magidson (2004a)
to redefine the LC model more generally as any model where some parameters differ across unobserved
subgroups called LCs (or clusters or segments).
LC as a Measurement Model
Table 1 shows how the traditional LC model differs from other (traditional) latent variable measurement
models according to the scale types of the observed and latent variables (categorical vs. continuous).
Table 1. Traditional latent variable measurement/classification models.
                       Latent Variable
Observed Indicators    Continuous              Categorical
Continuous             Factor analysis         Latent profile analysis
Categorical            Item response theory    Latent class analysis
LC modeling has evolved during the 21st century to become a general multivariate modeling approach for
revealing meaningful latent segments based on any set of observed indicators, not just categorical indicators,
and across a wide range of applications. In particular, LC generalizes traditional cluster, factor, and item
response theory (IRT) analyses in many ways and also generalizes various kinds of regression modeling
(including discrete choice models) where the parameters that differ for different classes are the regression
coefficients.
LC can be performed with cross-sectional or longitudinal data with response variables that are categorical,
continuous, or counts (binomial and/or Poisson) or combinations of these and other scale types. LC
clustering, discrete factor (DFactor) analysis, and LC regression analyses are generally conducted with cross-
sectional data, while LC growth modeling and mixture latent Markov (LM) or latent transition models can
provide valuable insights for longitudinal or panel data.
As used here, the term LC model refers to any statistical model conceptualized using one or more categorical
latent variables. Thus, LC modeling includes both traditional LC analysis and latent profile analysis (see Table
1), DFactor models containing two or more categorical latent variables, and hybrid models containing both
categorical and continuous latent variables such as the random intercept LC regression model illustrated later
to analyze ratings.
Example of LC Cluster Analysis
As an example of a traditional LC cluster analysis, Table 2 provides results from a 3-class LC model estimated
on responses from the 1982 General Social Survey (GSS; see McCutcheon, 1987). Two of the variables
ascertain the respondent’s opinion regarding (Y1) the purpose of surveys and (Y2) how accurate they are,
and the others are evaluations made by the interviewer of (Y3) the respondent’s levels of understanding of
the survey questions and (Y4) cooperation shown in answering the questions. Alan McCutcheon named the
classes “Ideals,” “Believers,” and “Skeptics.”
Table 2. Parameter estimates from a traditional 3-class latent class analysis.ᵃ
White Respondents (n = 1,202)
                             Class 1    Class 2      Class 3
                             Ideals     Believers    Skeptics
LC probabilities             0.62       0.20         0.18
Conditional probabilities
PURPOSE
  Good                       0.89       0.92         0.16
  Depends                    0.05       0.07         0.22
  Waste                      0.06       0.01         0.62
ACCURACY
  Mostly true                0.61       0.65         0.04
  Not true                   0.39       0.35         0.96
UNDERSTAND
  Good                       1.00       0.32         0.75
  Fair, poor                 0.00       0.68         0.25
COOPERATE
  Interested                 0.95       0.69         0.64
  Cooperative                0.05       0.26         0.26
  Impatient/hostile          0.00       0.05         0.10
ᵃ Data consist of four variables obtained from the 1982 General Social Survey.
Comparing these classes with respect to their conditional probabilities shows that they clearly differ from
each other. The “Ideals” class (Class 1), representing an estimated 62% of the population, is favorable toward
surveys, with approximately 90% agreeing that “Surveys serve a good purpose” (PURPOSE = Good), and its
members are rated by the interviewer as showing a “Good understanding of these survey questions” (UNDERSTAND =
Good).
The “Believers” class (Class 2), representing an estimated 20% of the population, is similar to the “Ideals” in being
favorable toward surveys but differs from the Ideals in that its members showed only a fair or poor understanding of
the questions (UNDERSTAND = Fair, poor). In contrast to both of these classes, the remaining 18%, called
“Skeptics” (Class 3), mostly state that “surveys are a waste of time” (PURPOSE = Waste).
This entry begins with a formal introduction to Goodman’s approach to LC (Goodman, 1974a, 1974b),
followed by the most important extensions. These extensions include (a) the ability to analyze a large
number of indicators, where the indicators are not only categorical (nominal and ordinal) indicators but also
continuous, count, and other scale types or any combination of these; (b) models for longitudinal multivariate
data that allow respondents to change from one LC (latent state) to another over time (LM or latent transition
models); (c) LC regression (including growth and conjoint models) for a repeated univariate response, where
the parameters that differ over classes are regression coefficients; and (d) the inclusion of covariates in any
LC model.
In addition, this entry discusses two recent approaches that are especially useful when analyzing a large
number of indicators and/or a large number of covariates—LC Tree modeling and three-step LC modeling.
Some advanced LC models, such as multilevel LC models, random intercept regression models, and scale-
adjusted choice models, are also introduced. All approaches discussed here have been implemented in the
program Latent GOLD (see, e.g., Vermunt & Magidson, 2016).
Traditional LC Analysis
This section introduces the basic ideas of traditional LC modeling for categorical (nominal or ordinal) response
variables. Formally, each LC corresponds to one of K mutually exclusive and exhaustive categories that
comprise the underlying categorical latent variable, X. Each observation is a member of one and only one
latent (unobservable) class. In other words, the population of interest contains K subgroups, but we do not
know to which subgroup an individual belongs, and the number of classes, K, is likewise unknown in advance.
Basic Ideas
The fundamental idea of traditional LC modeling is to extract the latent variable X with the smallest number of
classes sufficient to explain all the associations among the J observed response variables (indicators) y1, y2,
…, yJ. For example, with J = 4 indicators, the joint response distribution can be expressed in terms of the LCs
as
$$P(y_1, y_2, y_3, y_4) = \sum_{k=1}^{K} P(X = k)\, P(y_1, y_2, y_3, y_4 \mid X = k) \tag{1}$$
where P(X = k) is the probability of belonging to LC k = 1, 2, …, K and P(y1, y2, y3, y4 | X = k) is the
probability of observing a particular response pattern given that one belongs to LC k. Equation 1 shows that
each LC or subgroup has its own (joint response) probability P(y1, y2, y3, y4 | X = k) and that the overall
probability P(y1, y2, y3, y4) for the total population is obtained as a weighted average of the conditional
probabilities using the class proportions, P(X = k), as weights. That is, the population consists of a mixture of
K unordered (nominal) LCs.
We denote the number of categories for the indicators by M1, M2, M3, and M4, respectively, implying that
1 ≤ y1 ≤ M1, 1 ≤ y2 ≤ M2, and so on. Thus, in total, there are M1 · M2 · M3 · M4 possible response
patterns, where P(y1, y2, y3, y4) denotes the probability of occurrence of a particular response pattern on
these variables. The traditional K-class LC model is a measurement model in the sense that the model
includes parameters that define the underlying discrete latent variable in terms of the indicators. Specifically,
the model parameters consist of class sizes, P(X = k), that sum to 1, and for each indicator the conditional
probabilities, P(yj | X = k), that define class k in terms of the response distribution expected on
indicator j. The extent to which the conditional probabilities for a given class differ from those of the other classes
serves to define that class. For example, LC 3 in Table 2 differs primarily from the other classes in that its members
have a relatively high probability of stating that “surveys are a waste of time” (PURPOSE = Waste) and that they are not
accurate (ACCURACY = Not true), and hence that class was named “Skeptics.”
The key assumption made for any latent variable model is that the latent variable or variables explain
all associations between the indicators, as implied by the assumption of local independence (Vermunt &
Magidson, 2004b). That is, the indicators are mutually independent of each other conditional on class
membership. This assumption supports the meaningfulness of the classes as being the underlying source of
the associations among the indicators.
Revisiting our four-indicator example, local independence is expressed by the following equation:
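$$P(y_1, y_2, y_3, y_4 \mid X = k) = P(y_1 \mid X = k)\, P(y_2 \mid X = k)\, P(y_3 \mid X = k)\, P(y_4 \mid X = k) \tag{2}$$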
where P(y1 | X = k) denotes the probability of giving response y1 on the first indicator given that one belongs
to LC k, P(y2 | X = k) is the corresponding conditional response probability for the second indicator, and
so on. As can be seen, conditional on the class one belongs to, the probability of occurrence of any given
response pattern can be computed as the product of the separate indicator response probabilities, which is
the definition of (conditional) independence.
This computation in Equation 2 can be illustrated with the example in Table 2. The local independence
assumption implies that for LC 1 (“Ideals”) the probabilities of choosing the first, second, first, and first response
category on the four indicators, which equal 0.89, 0.39, 1.0, and 0.95, respectively, can simply be multiplied. Thus, the
probability for the full response pattern, P(1, 2, 1, 1 | X = 1), is (0.89)(0.39)(1.0)(0.95) ≈ 0.33.
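The arithmetic of Equation 2, and the class-weighted sum of Equation 1, can be sketched in a few lines of Python. The numbers are the rounded estimates from Table 2; the names `class_size`, `cond_prob`, `conditional_pattern_prob`, and `pattern_prob` are illustrative choices, not taken from any LC software.

```python
from itertools import product

# Rounded estimates from Table 2 (classes: Ideals, Believers, Skeptics)
class_size = [0.62, 0.20, 0.18]
# For each indicator: one list of category probabilities per class
cond_prob = {
    "PURPOSE":    [[0.89, 0.05, 0.06], [0.92, 0.07, 0.01], [0.16, 0.22, 0.62]],
    "ACCURACY":   [[0.61, 0.39], [0.65, 0.35], [0.04, 0.96]],
    "UNDERSTAND": [[1.00, 0.00], [0.32, 0.68], [0.75, 0.25]],
    "COOPERATE":  [[0.95, 0.05, 0.00], [0.69, 0.26, 0.05], [0.64, 0.26, 0.10]],
}
indicators = list(cond_prob)

def conditional_pattern_prob(pattern, k):
    """Equation 2: product of indicator probabilities within class k (0-based codes)."""
    p = 1.0
    for name, y in zip(indicators, pattern):
        p *= cond_prob[name][k][y]
    return p

def pattern_prob(pattern):
    """Equation 1 with Equation 2 substituted: class-size-weighted sum over classes."""
    return sum(class_size[k] * conditional_pattern_prob(pattern, k)
               for k in range(len(class_size)))

# Pattern (Good, Not true, Good, Interested) in 0-based codes
p_given_ideals = conditional_pattern_prob((0, 1, 0, 0), 0)  # 0.89 * 0.39 * 1.00 * 0.95
p_marginal = pattern_prob((0, 1, 0, 0))

# The 3 * 2 * 2 * 3 = 36 pattern probabilities should sum to 1
all_patterns = list(product(range(3), range(2), range(2), range(3)))
total = sum(pattern_prob(y) for y in all_patterns)
```

Because the inputs are rounded to two decimals, results computed this way can differ slightly from values based on the unrounded estimates.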
Substituting Equation 2 into Equation 1 yields the standard form of the LC model where the basic probability
parameters are included in the right side of the equation:
$$P(y_1, y_2, y_3, y_4) = \sum_{k=1}^{K} P(X = k) \prod_{j=1}^{4} P(y_j \mid X = k) \tag{3}$$
Table 2 provides the parameter estimates obtained by maximizing the likelihood function under the traditional
3-class LC model.
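To make the idea of maximum likelihood estimation by EM concrete, here is a minimal pure-Python sketch for categorical indicators. It is not the MLLSA or Latent GOLD implementation; the function names, the random starting values, and the simulated two-class data are all assumptions made for illustration.

```python
import math
import random

def em_latent_class(data, n_classes, n_categories, n_iter=100, seed=0):
    """EM for a traditional LC model; data holds tuples of 0-based category codes."""
    rng = random.Random(seed)
    n_ind = len(n_categories)
    class_size = [1.0 / n_classes] * n_classes
    # Random positive starting values for P(y_j = m | X = k), normalized per indicator
    cond = []
    for _ in range(n_classes):
        rows = []
        for m in n_categories:
            raw = [rng.uniform(0.5, 1.5) for _ in range(m)]
            rows.append([r / sum(raw) for r in raw])
        cond.append(rows)
    loglik_trace = []
    for _ in range(n_iter):
        # E-step: accumulate posterior class memberships (expected counts)
        size_sum = [0.0] * n_classes
        count = [[[0.0] * m for m in n_categories] for _ in range(n_classes)]
        loglik = 0.0
        for y in data:
            joint = []
            for k in range(n_classes):
                p = class_size[k]
                for j in range(n_ind):
                    p *= cond[k][j][y[j]]
                joint.append(p)
            marginal = sum(joint)
            loglik += math.log(marginal)
            for k in range(n_classes):
                post = joint[k] / marginal
                size_sum[k] += post
                for j in range(n_ind):
                    count[k][j][y[j]] += post
        loglik_trace.append(loglik)
        # M-step: re-estimate parameters from the expected counts
        for k in range(n_classes):
            class_size[k] = size_sum[k] / len(data)
            for j in range(n_ind):
                cond[k][j] = [c / size_sum[k] for c in count[k][j]]
    return class_size, cond, loglik_trace

def simulate(n, seed=1):
    """Hypothetical two-class population with four binary indicators."""
    rng = random.Random(seed)
    p_one = [[0.9, 0.8, 0.85, 0.7], [0.2, 0.3, 0.1, 0.25]]  # P(y_j = 1 | class)
    data = []
    for _ in range(n):
        k = 0 if rng.random() < 0.6 else 1
        data.append(tuple(int(rng.random() < p_one[k][j]) for j in range(4)))
    return data

data = simulate(500)
class_size, cond, loglik_trace = em_latent_class(data, 2, [2, 2, 2, 2])
```

A useful sanity check on any EM implementation is that the log-likelihood trace never decreases from one iteration to the next. In practice several random starts are advisable, since EM can stop at a local maximum.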
Determining the Number of Classes
The primary model selection problem is deciding on the number of classes. This can be done formally using
global goodness-of-fit tests, information criteria, or local goodness-of-fit tests (bivariate residuals [BVRs]).
The traditional global test compares maximum likelihood estimates for frequency counts expected under the
model with the actual observed frequencies using the likelihood ratio or Pearson lack-of-fit statistic (L² or X²;
Goodman, 1974a). The associated p value assesses the extent to which the estimated counts are sufficiently
close to the observed counts so that any differences can be explained by sampling variability.
In the special case where the expected frequencies in Equation 4 reproduce each of the corresponding
observed cell counts perfectly, the model fit is perfect and both L² and X² equal zero. To the extent that
the value for L² exceeds 0, L² quantifies the lack of model fit, representing the amount of association
(nonindependence) that remains unexplained by that model. Since the L² for the null model of independence
(model H0 in Table 3) measures the total amount of association in the data, an R²-like measure can assess
the percentage of the association explained by each LC model (see % reduction of L² in Table 3).
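The R²-like measure can be reproduced directly from the L² values reported in Table 3. The short sketch below does so for the K-class models and the 2-DFactor model; the function name `pct_reduction` is an illustrative choice.

```python
# L-squared values as reported in Table 3
l2_null = 257.3  # H0, the 1-class (independence) model
l2 = {"H1C": 79.5, "H2C": 22.1, "H3C": 6.6, "H2F": 11.1}

def pct_reduction(l2_model, l2_baseline):
    """Percentage of the baseline association (L-squared of H0) explained by a model."""
    return 100.0 * (1.0 - l2_model / l2_baseline)

reduction = {name: round(pct_reduction(value, l2_null), 1) for name, value in l2.items()}
```

For example, the 3-class model H2C leaves 22.1 of the original 257.3 unexplained, a reduction of 91.4%.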
Estimates for the expected frequencies, F̂, are obtained by multiplying both sides of Equation 3 by the sample
size N to obtain F = N · P, and then plugging in the estimates for the probability parameters in the right side
of Equation 3 to obtain estimates for P:
$$\hat{F}(y_1, y_2, y_3, y_4) = N \sum_{k=1}^{K} \hat{P}(X = k) \prod_{j=1}^{4} \hat{P}(y_j \mid X = k) \tag{4}$$
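As a rough illustration of how expected frequencies feed the two global fit statistics, the sketch below assumes the usual definitions, L² = 2 Σ f ln(f / F̂) and X² = Σ (f − F̂)² / F̂, with degrees of freedom equal to the number of cells minus 1 minus the number of free parameters. The four-cell counts are hypothetical and are not the GSS data.

```python
import math

def expected_frequencies(pattern_probs, n):
    """Expected count for each response pattern: N times its model probability."""
    return [n * p for p in pattern_probs]

def likelihood_ratio_l2(observed, expected):
    """L2 = 2 * sum f * ln(f / F_hat); empty observed cells contribute zero."""
    return 2.0 * sum(f * math.log(f / e) for f, e in zip(observed, expected) if f > 0)

def pearson_x2(observed, expected):
    """X2 = sum (f - F_hat)^2 / F_hat."""
    return sum((f - e) ** 2 / e for f, e in zip(observed, expected))

def degrees_of_freedom(n_cells, n_par):
    """Number of cells minus 1, minus the number of free model parameters."""
    return n_cells - 1 - n_par

# Hypothetical 4-cell example
probs = [0.4, 0.3, 0.2, 0.1]
expected = expected_frequencies(probs, 200)
observed = [85, 55, 42, 18]
l2_value = likelihood_ratio_l2(observed, expected)
x2_value = pearson_x2(observed, expected)
```

With the 36 cells of the four-indicator example, `degrees_of_freedom(36, 20)` gives 15, matching the df reported for the 3-class model in Table 3. When observed and expected counts coincide, both statistics are zero.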
However, when the number of indicators becomes large, the data become sparse (containing many empty or
small cell counts), in which case L² and X² no longer follow a χ² distribution. An alternative is to use bootstrap
p values, which do not depend upon the χ² distribution (Langeheine et al., 1996). However, since computation
of the bootstrap p value is somewhat complex and can take considerable time for large models, a
popular alternative is to use information criteria such as the Bayesian information criterion (BIC).
Traditional exploratory applications proceed by estimating successive LC models with K = 1, 2, … classes
and stopping when the hypothesis of local independence can no longer be rejected at (say) the .05 level. For
our example, Table 3 shows that the 1-class and 2-class models are rejected because they fail to fit the data
(p < .05) but the 3-class model does provide an adequate fit (p = .11), and hence the number of classes K =
3 is accepted as the simplest solution that explains all of the associations among the indicators.
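The stopping rule just described can be written as a small loop over the p values reported in Table 3. This is only a sketch of the decision logic; the function name `first_adequate_model` and the tuple layout are illustrative.

```python
# (label, number of classes, p value) as reported in Table 3
fits = [("H0", 1, 2.0e-38), ("H1C", 2, 2.0e-8), ("H2C", 3, 0.11), ("H3C", 4, 0.58)]

def first_adequate_model(fits, alpha=0.05):
    """Return the first model, in order of increasing K, that is not rejected at level alpha."""
    for label, k, p_value in fits:
        if p_value >= alpha:
            return label, k
    return None  # every candidate model was rejected

chosen = first_adequate_model(fits)
```

Applied to these values, the rule stops at the 3-class model H2C, since it is the first model with p ≥ .05.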
To pinpoint where any two-way residual associations remain, a local goodness-of-fit statistic such as the BVR
can be used to determine where violations of local independence may occur.
Table 3. Results from LC model fit for the General Social Survey data.
Model  Description               BICLL     L²      Npar  df   p Value        % Reduction in L²(H0)
H0     1-class                   5,787.0   257.3   6     29   2.0 × 10⁻³⁸    0.0%
H1C    2-class                   5,658.9   79.5    13    22   2.0 × 10⁻⁸     69.1%
H2C    3-class                   5,651.1   22.1    20    15   0.11           91.4%
H3C    4-class                   5,685.3   6.6     27    8    0.58           97.4%
Some additional models
H1D    2-class + direct effect   5,606.1   12.6    15    20   0.89           95.1%
H2F    Basic 2-DFactor           5,640.1   11.1    20    15   0.75           95.7%
In practice, LC analysis is often conducted with a relatively large number of indicators, in which case the p
value determined from L² or X² is not valid because those statistics no longer follow a χ² distribution due to
the data being sparse. An indication that p values cannot be trusted is when the p values from L² and X² differ
substantially (in the extreme, 1 and 0). In this case, one can use either a bootstrap of L² to estimate the p
value or use an information criterion based on the log-likelihood (LL), such as AICLL or BICLL, to determine
the number of classes. BICLL reported in Table 3 (with nonsparse data) agrees that the best fit is provided
with 3 classes (BICLL reaches its minimum with 3 classes before increasing with 4 classes).
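The BIC comparison can be checked against the other columns of Table 3. The sketch below assumes the usual definition BICLL = −2·LL + ln(N)·Npar and uses N = 1,202 from Table 2. Because L² = 2·(LL of the saturated model − LL), the saturated-model term cancels when two models are compared, so differences in BICLL can be rebuilt from L² and Npar alone. Only the four K-class models are included here.

```python
import math

N = 1202  # sample size reported in Table 2

# (L-squared, Npar, BIC_LL) for the K-class models in Table 3
table3 = {
    "H0":  (257.3, 6, 5787.0),
    "H1C": (79.5, 13, 5658.9),
    "H2C": (22.1, 20, 5651.1),
    "H3C": (6.6, 27, 5685.3),
}

def bic_difference(model, baseline):
    """BIC_LL(model) - BIC_LL(baseline), rebuilt from L-squared and Npar."""
    l2_m, npar_m, _ = table3[model]
    l2_b, npar_b, _ = table3[baseline]
    return (l2_m - l2_b) + math.log(N) * (npar_m - npar_b)

rebuilt = {m: bic_difference(m, "H0") for m in table3}
reported = {m: table3[m][2] - table3["H0"][2] for m in table3}
best = min(table3, key=lambda m: table3[m][2])  # smallest BIC_LL among K-class models
```

The rebuilt differences agree with the reported ones up to rounding, and the minimum among these four models falls on H2C. Each added class costs ln(1202) × 7 ≈ 49.6 BIC points, which the drop in L² from 3 to 4 classes (15.5) does not repay.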
A bootstrap estimate of the p value corresponding to the difference in log-likelihood values for any two nested
models can also assist in the determination of the number of classes. This test finds that the 4-class model
(H3C) provides a significant improvement over the 3-class model. While this result is contrary to the BIC
criterion, which suggests that the 3-class model is best, Table 3 shows that BIC suggests that a restricted
4-class model (model H2F in Table 3) is preferred over the 3-class model. Model H2F is discussed in the
section “DFactor Models.”
Bivariate Residuals
Modification indices, called BVRs, help pinpoint where violations of local independence occur. BVRs are
Lagrange-type χ² statistics which test each pair of indicators for local independence (Oberski & Vermunt,
2013; Vermunt & Magidson, 2016). As a rule of thumb, a BVR that exceeds a value of four indicates that a
significant residual association remains unexplained by the classes.
Table 4. Bivariate residuals for LC models.
                          LC Model
Indicator Pair            1-class   2-class   3-class   2-class + direct effect   2 DFactor
PURPOSE, ACCURACY         61.64     0.07      0.11      0.01                      0.01
PURPOSE, UNDERSTAND       0.53      0.74      0.09      0.10                      0.03
PURPOSE, COOPERATE        10.59     0.05      0.12      0.07                      0.10
ACCURACY, UNDERSTAND      0.26      1.10      0.00      0.02                      0.02
ACCURACY, COOPERATE       8.61      0.41      0.25      0.22                      0.44
UNDERSTAND, COOPERATE     43.37     32.33     2.41      0.00                      0.19
Recall from Table 3 that the 1- and 2-class models were both rejected (p < .05), which means that they
violate the assumption of local independence. The first two columns of Table 4 identify the particular indicator
pairs for which the local independence assumption is violated for these models—the residual association
remains large as indicated by a large BVR. The magnitude of the BVRs is reduced as the number of classes
is increased from 1 to 2, but not until a 3rd class is added is the model fit judged to be adequate (and the
BVRs are all small).
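The rule of thumb (a BVR above 4 signals a remaining association) can be applied mechanically to the values in Table 4. The sketch below flags the offending indicator pairs for the 1-, 2-, and 3-class models; the function name `flag_violations` is illustrative.

```python
# Bivariate residuals as reported in Table 4
pairs = [
    ("PURPOSE", "ACCURACY"), ("PURPOSE", "UNDERSTAND"), ("PURPOSE", "COOPERATE"),
    ("ACCURACY", "UNDERSTAND"), ("ACCURACY", "COOPERATE"), ("UNDERSTAND", "COOPERATE"),
]
bvr = {
    "1-class": [61.64, 0.53, 10.59, 0.26, 8.61, 43.37],
    "2-class": [0.07, 0.74, 0.05, 1.10, 0.41, 32.33],
    "3-class": [0.11, 0.09, 0.12, 0.00, 0.25, 2.41],
}

def flag_violations(values, threshold=4.0):
    """Indicator pairs whose BVR exceeds the rule-of-thumb threshold."""
    return [pair for pair, value in zip(pairs, values) if value > threshold]

flagged = {model: flag_violations(values) for model, values in bvr.items()}
```

Four pairs are flagged under the 1-class model, only (UNDERSTAND, COOPERATE) under the 2-class model, and none under the 3-class model.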
The 1-class model, H0, assumes that respondents are homogeneous and is referred to as the null model
because it assumes all indicators are mutually independent. Each BVR for this model is equivalent to the
Pearson χ² lack-of-fit statistic (X²) divided by the degrees of freedom. From Table 4 we see that this model
fails to fit the data because four of the six indicator pairs have large BVRs, indicating that significant associations
exist between those pairs.
For this model, the two small BVRs (0.53 and 0.26), which correspond to the pairs (PURPOSE, UNDERSTAND)
and (ACCURACY, UNDERSTAND), suggest that a respondent’s level of understanding of the questions
(UNDERSTAND) is statistically independent of their opinion of surveys as reflected by the indicators
PURPOSE and ACCURACY. Thus, we might expect the data to reflect two distinct dimensions—one
assessing the respondents’ level of understanding, the other their attitude toward surveys (favorable vs.
unfavorable).
The 2-class model, H1C, is a one-dimensional model which assumes the existence of two homogeneous
classes, mutual independence existing within each class (local independence). For these data, the 2-class
model is rejected (p < .05). Table 4 pinpoints that the reason for the lack of fit is that a residual association
between UNDERSTAND and COOPERATE remains unexplained (i.e., the BVR for this pair remains large).
As indicated in the 3rd column of Table 4, this association requires a 3rd class to explain. By accounting for a
2nd dimension, the 3-class model, H2C, reduces the BVR from 32.33 to the acceptable value of 2.41.
It should be noted that K-class models may sometimes represent fewer than K − 1 dimensions. In particular,
a one-dimensional model can capture a unique ordering among the classes. For example, data from Leonard
Pearlin and Joyce Johnson (1977), reanalyzed by Magidson & Vermunt (2001), suggested that the three
classes extracted from five dichotomous indicators of depression represent three ordered classes. Class 1
(“Healthy” group) exhibits the lowest probabilities of depressive symptoms (lack of enthusiasm, low energy,
sleeping problems, poor appetite, and feeling hopeless) during the past week. At the other extreme, Class
3 (“Depressed” class) exhibits the highest probabilities for these symptoms, and Class 2 falls somewhere
between these extremes on all symptoms. We might interpret Class 2 as having a “bad week” but not
necessarily being depressed. The summary of these results can be found in Figure 1 (for additional details of
this analysis, see Magidson & Vermunt, 2001).
Figure 1. Plot of class-specific response probabilities for each of five symptoms of depression.
Classification
Traditional clustering approaches, such as K-means and hierarchical clustering, focus primarily on
classification. For example, Leonard Kaufman and Peter J. Rousseeuw (1990) define cluster analysis as
“classification of similar objects into groups, where the number of groups, as well as their forms are unknown”
(p. 1). N. Kumar (2005) provides a similar definition:
Cluster analysis divides data into groups (clusters) that are meaningful, useful, or both. If meaningful
groups are the goal, then the clusters should capture the natural structure of the data. (p. 487)
In contrast, LC is model based, classification being accomplished as a rigorous second step following
parameter estimation. This is similar to traditional factor analysis, where factor scores are estimated as
a second step following estimation of the model parameters (factor loadings). Specifically, for any given
observed response pattern (y1, …, yJ), the posterior probability of belonging to class k, P(X = k | y1, …, yJ),
can be expressed in terms of the LC model probability parameters (the unconditional class size probability,
P(X = k), and the conditional probabilities associated with each of the J indicators for respondents in this
class, P(yj | X = k), j = 1, 2, …, J):
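$$P(X = k \mid y_1, \ldots, y_J) = \frac{P(X = k)\, P(y_1, \ldots, y_J \mid X = k)}{P(y_1, \ldots, y_J)} = \frac{P(X = k) \prod_{j=1}^{J} P(y_j \mid X = k)}{\sum_{k'=1}^{K} P(X = k') \prod_{j=1}^{J} P(y_j \mid X = k')} \tag{6}$$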
This result is an application of Bayes’ theorem, the last equality resulting from the assumption of local
independence (recall Equation 2).
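As a worked instance of this Bayes computation, the sketch below evaluates the posterior class probabilities for the response pattern (Good, Mostly true, Good, Interested) using the rounded estimates in Table 2. The function and variable names are illustrative.

```python
class_size = [0.62, 0.20, 0.18]  # Ideals, Believers, Skeptics (Table 2)
# P(y_j | X = k) for the pattern (Good, Mostly true, Good, Interested), one row per class
cond_for_pattern = [
    [0.89, 0.61, 1.00, 0.95],
    [0.92, 0.65, 0.32, 0.69],
    [0.16, 0.04, 0.75, 0.64],
]

def posterior(class_size, cond_for_pattern):
    """Bayes' theorem under local independence: normalize P(X = k) * prod_j P(y_j | X = k)."""
    joint = []
    for size, probs in zip(class_size, cond_for_pattern):
        p = size
        for value in probs:
            p *= value
        joint.append(p)
    total = sum(joint)
    return [p / total for p in joint]

post = posterior(class_size, cond_for_pattern)
# Modal assignment: pick the class with the highest posterior probability
modal_class = max(range(len(post)), key=lambda k: post[k])
```

Rounded to two decimals, the result is 0.92, 0.08, and 0.00 for the three classes, so modal assignment places respondents with this pattern in the Ideals class.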
Table 5 provides the posterior probabilities obtained from the 3-class model H2C. Given their response pattern,
respondents are typically assigned to the LC for which their posterior probability is highest (bolded in Table
5), a classification rule called modal assignment.
Table 5. Posterior classification probabilities for 3-class latent class model.
PURPOSE   ACCURACY      UNDERSTAND   COOPERATION         Ideal   Believer   Skeptic
Good      Mostly true   Good         Interested          0.92    0.08       0.00
                                     Cooperative         0.64    0.35       0.01
                                     Impatient/hostile   0.02    0.94       0.04
                        Fair, poor   Interested          0.02    0.97       0.00
                                     Cooperative         0.00    0.99       0.00
                                     Impatient/hostile   0.00    0.99       0.01
          Not true      Good         Interested          0.88    0.06       0.06
                                     Cooperative         0.52    0.24       0.24
                                     Impatient/hostile   0.01    0.35       0.64
                        Fair, poor   Interested          0.02    0.85       0.12
                                     Cooperative         0.00    0.86       0.13
                                     Impatient/hostile   0.00    0.78       0.22
Depends   Mostly true   Good         Interested          0.87    0.10       0.04
                                     Cooperative         0.49    0.36       0.15
                                     Impatient/hostile   0.01    0.56       0.44
                        Fair, poor   Interested          0.02    0.93       0.06
                                     Cooperative         0.00    0.94       0.06
                                     Impatient/hostile   0.00    0.89       0.11
          Not true      Good         Interested          0.37    0.04       0.59
                                     Cooperative         0.08    0.05       0.87
                                     Impatient/hostile   0.00    0.03       0.97
                        Fair, poor   Interested          0.01    0.28       0.72
                                     Cooperative         0.00    0.27       0.73
                                     Impatient/hostile   0.00    0.16       0.84
Waste     Mostly true   Good         Interested          0.88    0.02       0.10
                                     Cooperative         0.53    0.07       0.41
                                     Impatient/hostile   0.01    0.08       0.91
                        Fair, poor   Interested          0.05    0.50       0.44
                                     Cooperative         0.01    0.51       0.48
                                     Impatient/hostile   0.00    0.36       0.64
          Not true      Good         Interested          0.20    0.00       0.80
                                     Cooperative         0.03    0.00       0.96
                                     Impatient/hostile   0.00    0.00       1.00
                        Fair, poor   Interested          0.00    0.03       0.97
                                     Cooperative         0.00    0.02       0.97
                                     Impatient/hostile   0.00    0.01       0.99
These posterior probabilities provide useful information about how good the classification is. For example,
from the first row of Table 5, we see that for persons categorized as (Good, Mostly true, Good, Interested)
on the four indicators, the estimated probability that they belong to the Ideals class is quite high: 92%. Only
8% of these respondents would be expected to be misclassified; they actually belong to the Believers class.
Averaging these accuracies over all respondents yields an overall expected accuracy of 87%.
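One way to sketch an overall expected accuracy under modal assignment is to sum, over all 36 response patterns, the model probability of the pattern times its largest posterior probability. The version below uses the rounded Table 2 estimates and weights patterns by their model-implied probabilities rather than by the observed respondents, so it need not reproduce the reported 87% exactly. All names are illustrative.

```python
from itertools import product

class_size = [0.62, 0.20, 0.18]  # Ideals, Believers, Skeptics (Table 2, rounded)
# cond_prob[j][k][m] = P(indicator j takes category m | class k)
cond_prob = [
    [[0.89, 0.05, 0.06], [0.92, 0.07, 0.01], [0.16, 0.22, 0.62]],  # PURPOSE
    [[0.61, 0.39], [0.65, 0.35], [0.04, 0.96]],                    # ACCURACY
    [[1.00, 0.00], [0.32, 0.68], [0.75, 0.25]],                    # UNDERSTAND
    [[0.95, 0.05, 0.00], [0.69, 0.26, 0.05], [0.64, 0.26, 0.10]],  # COOPERATE
]

def joint_prob(pattern, k):
    """P(X = k) * prod_j P(y_j | X = k), the numerator of the posterior."""
    p = class_size[k]
    for j, y in enumerate(pattern):
        p *= cond_prob[j][k][y]
    return p

n_categories = [len(indicator[0]) for indicator in cond_prob]
expected_accuracy = 0.0
for pattern in product(*(range(m) for m in n_categories)):
    joints = [joint_prob(pattern, k) for k in range(len(class_size))]
    # P(y) * max_k P(X = k | y) simplifies to the largest joint probability
    expected_accuracy += max(joints)
```

The quantity is bounded below by the largest class size (0.62 here), which is the accuracy obtained by assigning everyone to the largest class without looking at the responses.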
Extracting Meaningful LCs
As one moves from an exploratory to a more confirmatory use of LC modeling the question “What do you want
the LCs to represent?” becomes relevant. As argued by Christian Hennig (2015), the idea of “truth” in cluster
analysis depends on the context and the clustering aims. Thus, for example, if the goal of the LC analysis is to
extract the classes that represent different attitudes toward surveys, the association between UNDERSTAND
and COOPERATE should be ignored when deriving the classes. There are several approaches for modifying
the traditional LC approach that can accomplish this goal.
Next, four such approaches, each with its own advantages, are outlined. For the current example, the first
three yield essentially the same two substantively meaningful LCs: one favorable and the other unfavorable toward
surveys. The approaches are as follows:
1. Add a direct effect to the model (see “Adding Direct Effects” section). This is the simplest approach
when dealing with a small number of indicators.
2. Use a separate discrete latent factor for each dimension, the first corresponding to the desired
classes, the second a nuisance factor (see “DFactor Models” section). Directly analogous to
traditional factor analysis, the DFactor approach is especially useful when the number of indicators is
large, in both exploratory settings where the number of dimensions is unknown and in confirmatory
settings.
3. Use LC tree models which impose a hierarchical tree structure on the data (see “LC Tree Models”
section). This approach is similar to the use of DFactors but easier to implement. It is designed
primarily for exploratory applications with a large number of indicators.
4. Weight the indicators to downplay the relative importance of one or more “less important” variables
(Eagle & Magidson, 2019).
Adding Direct Effects
While increasing the number of classes is one way to achieve an adequate fit, at some point increasing the
number of classes from, say, K to K + 1 may result in classes that are less rather than more meaningful. For
example, the previous 3-class model distinguishes both between respondents who have a more versus less
favorable view of surveys (Classes 1 and 2 vs. Class 3) and between respondents who show a good versus
not-so-good understanding of the survey questions (Classes 1 and 3 vs. Class 2). If the latter distinction were not
of interest, one could modify the model in either of two ways:
1. by removing the indicator UNDERSTAND or
2. by adding a direct effect between UNDERSTAND and COOPERATE (Model H1D)
Approach 2 adds a parameter directly to the model to account for the association between UNDERSTAND
and COOPERATE, allowing that association to be explained directly rather than requiring it to be explained by
the classes (see Hagenaars, 1988, for details). As shown in Table 3, modifying the 2-class model (Model H1C)
either by (a) adding a 3rd class (Model H2C) or by (b) adding a direct effect (Model H1D) provides an acceptable fit
to the data.
DFactor Models
DFactor models are restricted LC cluster models where ordinality restrictions are imposed on each DFactor
(Magidson & Vermunt, 2001). Each DFactor may have 2 or more levels. For example, Table 6 shows that a
DFactor model with two dichotomous DFactors V = 1, 2 and W = 1, 2 corresponds to a 4-class cluster model.
Similar to traditional factor analysis, the basic DFactor model restricts the factors to be independent of each
other.
LC factor models were introduced originally by Goodman (1974b) for confirmatory applications and extended
by Magidson & Vermunt (2001), who developed them for use as a general exploratory alternative to traditional LC
cluster modeling. DFactor models are analogous to traditional factor analysis where each factor corresponds
to a distinct dimension.
The dimensionality of a K-class LC cluster model is K − 1, corresponding to the K − 1 distinct contrasts
that can be formed by comparing classes. The 1-class model, which assumes mutual independence among
the indicators, means that the population is completely homogeneous, so there can be no justification for
subdividing the respondents into two or more subgroups.
The 2-class model is a one-dimensional model. As such, the classes can be interpreted as being high
versus low (e.g., favorable vs. unfavorable) on that dimension. The 3-class model is a two-dimensional model,
and hence the solution can be plotted in a two-dimensional space using a barycentric coordinate plot as
described by Magidson & Vermunt (2001; see Figure 2).
Figure 2. Barycentric coordinate display of results reported for model H2C.
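As a rough illustration of why a 3-class solution fits in a plane, the sketch below maps a vector of three class membership probabilities to a point inside a triangle. The vertex layout (an equilateral triangle with unit side) and the function name are assumptions made for illustration; they are not taken from Magidson and Vermunt (2001) or from any particular software display.

```python
import math

# Assumed vertex layout: Class 1 bottom-left, Class 2 bottom-right, Class 3 top.
VERTICES = [(0.0, 0.0), (1.0, 0.0), (0.5, math.sqrt(3) / 2)]

def barycentric_to_xy(probs):
    """Map three probabilities that sum to 1 to a point inside the triangle."""
    if len(probs) != 3 or abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("expected three probabilities summing to 1")
    # The point is the probability-weighted average of the three vertices.
    x = sum(p * vx for p, (vx, _) in zip(probs, VERTICES))
    y = sum(p * vy for p, (_, vy) in zip(probs, VERTICES))
    return x, y

corner = barycentric_to_xy([1.0, 0.0, 0.0])        # certain member of Class 1
centre = barycentric_to_xy([1 / 3, 1 / 3, 1 / 3])  # equally likely in all classes
```

Because the three probabilities sum to 1, only two of them are free, which is why two coordinates suffice for three classes.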
Later, another two-dimensional LC model consisting of two DFactors, where each DFactor corresponds to
one of the dimensions, will be introduced.
Table 6. Relationship between DFactors V and W and the joint DFactor X = (V, W).
         W = 1    W = 2
V = 1    X = 1    X = 2
V = 2    X = 3    X = 4
While the basic 2-DFactor model, H2F, is a restricted 4-class model, Table 3 shows that it has the same
number of parameters as the 3-class model H2C (Npar = 20) but fits better. Table 7 displays the probability
parameters for the joint DFactor X = (V, W), while Table 8 provides the probability parameters for each
DFactor separately.
Table 7. Joint DFactor output for the Basic 2-DFactor Model H2F.
DFactor1 (V)               1       1       2       2
DFactor2 (W)               1       2       1       2
Class size              0.43    0.34    0.13    0.10
Indicators
PURPOSE
  good                  0.97    0.61    0.90    0.21
  depends               0.03    0.13    0.10    0.18
  waste                 0.00    0.26    0.00    0.62
ACCURACY
  mostly true           0.81    0.24    0.60    0.10
  not true              0.19    0.76    0.40    0.90
UNDERSTAND
  good                  0.91    0.97    0.29    0.57
  fair/poor             0.09    0.03    0.71    0.43
COOPERATE
  interested            0.95    0.92    0.58    0.42
  cooperative           0.05    0.08    0.34    0.39
  impatient/hostile     0.00    0.00    0.07    0.19
Table 8. Parameter estimates for the Basic 2-DFactor Model H2F.
                          DFactor1 (V)          DFactor2 (W)
Indicators             Level 1   Level 2     Level 1   Level 2
size                      0.77      0.23        0.56      0.44
PURPOSE
  good                    0.82      0.59        0.96      0.52
  depends                 0.07      0.13        0.04      0.14
  waste                   0.11      0.27        0.00      0.34
ACCURACY
  mostly true             0.56      0.38        0.76      0.21
  not true                0.44      0.62        0.24      0.79
UNDERSTAND
  good                    0.94      0.41        0.77      0.88
  fair/poor               0.06      0.59        0.23      0.12
COOPERATE
  interested              0.94      0.51        0.86      0.81
  cooperative             0.06      0.36        0.12      0.15
  impatient/hostile       0.00      0.13        0.02      0.04
Table 8 makes clear that DFactor 1 distinguishes primarily between those having a good (V = 1) versus fair/poor
(V = 2) understanding of the questions. Figure 3 provides the biplot (Magidson & Vermunt, 2001), which
confirms graphically that the variable UNDERSTAND loads primarily on DFactor 1 (horizontal dimension),
while the survey attitude indicators (PURPOSE and ACCURACY) load primarily on DFactor 2 (vertical
dimension).
Figure 3. Biplot of DFactor 1 and DFactor 2.
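The link between the joint output in Table 7 and the separate DFactor output in Table 8 can be checked by hand: each level of a DFactor pools the joint classes that share that level. The sketch below does this for the class sizes and for the probability of answering "good" on PURPOSE, using the rounded values printed in Table 7. The variable and function names are illustrative, and small discrepancies from Table 8 are expected because the inputs are rounded to two decimals.

```python
# Class sizes for the joint DFactor X = (V, W), in the column order of Table 7:
# (V=1, W=1), (V=1, W=2), (V=2, W=1), (V=2, W=2).
joint_size = [0.43, 0.34, 0.13, 0.10]
# P(PURPOSE = good | X) from the same table.
purpose_good = [0.97, 0.61, 0.90, 0.21]
# Level of V and of W for each joint class (the layout of Table 6).
v_level = [1, 1, 2, 2]
w_level = [1, 2, 1, 2]

def marginal(levels, level):
    """Return (size, conditional probability) for one level of one DFactor."""
    idx = [i for i, lv in enumerate(levels) if lv == level]
    size = sum(joint_size[i] for i in idx)
    # Size-weighted average of the joint-class probabilities.
    prob = sum(joint_size[i] * purpose_good[i] for i in idx) / size
    return size, prob

v1_size, v1_good = marginal(v_level, 1)
w2_size, w2_good = marginal(w_level, 2)
```

With these inputs, Level 1 of DFactor 1 has size 0.77 and a pooled probability of about 0.81, against the 0.82 reported in Table 8.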
This interpretation is further supported by the estimated loadings (Table 9) showing that the indicator
UNDERSTAND loads primarily on DFactor 1 (loading = .57), while PURPOSE and ACCURACY load primarily
on DFactor 2 (for details on how these loadings are computed, see Magidson & Vermunt, 2003; Vermunt &
Magidson, 2004c).
Table 9. Loadings output.
                     Loadings
Indicator      DFactor1   DFactor2     R²
PURPOSE            0.19       0.45   0.26
ACCURACY           0.15       0.55   0.33
UNDERSTAND         0.57       0.14   0.36
COOPERATE          0.42       0.07   0.19
Comparing the LC cluster and DFactor approaches, in practice the DFactor model almost always fits better,
and its parameters are easier to interpret since each DFactor corresponds to a distinct dimension. In
addition, fitting a 1-DFactor model with three or more levels can be used to test whether the resulting classes
can be ordered. Specifically, by default equidistant scores are assigned to each level so that for a DFactor
with three levels, Level 2 can be interpreted as being midway between Levels 1 and 3. More general DFactor
models with unequal spacing can also be specified using the scores keyword in the Latent GOLD® syntax
(see Vermunt & Magidson, 2013; for related work on scoring LCs and the relationship to IRT models, see
Clogg, 1988; Formann, 1992; Heinen, 1996; Uebersax, 1993; Vermunt, 2001).
On the other hand, classification is more complex for DFactor models with two or more DFactors in that
posterior membership probabilities are available for each DFactor separately as well as the joint DFactor.
However, as an interpretive tool, the additional complexity may be worthwhile as one of the DFactors may
represent the factor of interest, while the others correspond to nuisance factors that allow the meaningful
factor to be measured without bias (for an example of how the DFactor model has proved useful in
adjusting for response style behavior, see Moors, 2003).
LC Tree Models
Especially when there are many indicators, the number of LCs may become large and thus more difficult
to interpret. A remedy for this is to structure the latent variable using LC tree models, which impose a
hierarchical tree structure on the latent variable X (Van den Bergh et al., 2017, 2018). LC tree modeling begins
by extracting a relatively small number of interpretable root classes. Next, to the extent to which a root class
is heterogeneous, this class is split into two subclasses, and this splitting process continues to trace out the
full tree until all the heterogeneity is explained. The terminal nodes of the full tree replace the K classes in a
standard LC cluster analysis.
Each of the relatively small number of root classes represents an overall theme in the data, which makes
for easier interpretation of the K terminal classes. Thus, these root classes might be referred to as theme
classes.
To demonstrate the easier interpretation provided by the LC tree structure, an 8-class solution obtained in
choice modeling is re-examined using the LC tree approach. The LC tree solution yields three theme classes
which reveal the primary differences in preferences, each of which contains some within-class variation. For
each of these theme classes, the within-class variation is then modeled in a meaningful manner by splitting it
into two subclasses, which reveal secondary differences. Figure 4 depicts the LC tree process which results
in six classes that are much easier to interpret than the original eight classes.
Figure 4. General latent class tree model structure.
This application demonstrates the improvement in interpretation which can result from the LC tree model
when there are many indicators and many classes. It is presented in the “SALC Tree Models” section, where
an extended LC tree model, referred to as the scale-adjusted LC (SALC) tree model, is also introduced.
The LC tree model can also provide useful insights in smaller applications. For example, when applied
to the GSS data, it yields two theme classes at the first level, corresponding to those who are favorable
versus unfavorable toward surveys, as previously estimated with the 2-class model H1C. In order to obtain an
adequate fit which explains all the heterogeneity, the LC tree approach then splits both of these classes to
account for the differences in understanding. Figure 5 shows how the largest BVR is reduced to acceptable
levels after this second split.
Figure 5. LC tree model using four indicators from the 1982 General Social Survey data (see McCutcheon,
1987).
Extensions of Traditional LC Modeling
The most important extensions to traditional LC modeling involve the ability to analyze continuous indicators
and other scale types in addition to categorical indicators, the inclusion of covariates with either a
one-step or three-step modeling approach, multilevel LC models which yield a classification at multiple levels, LM
models which allow respondents to change from one LC to another over time, and LC regression modeling,
including LC ratings-based and choice-based conjoint models which allow LCs to be defined based on
differences in meaningful regression coefficients.
Continuous Indicators, Counts, and Other Scale Types
While traditional LC modeling was limited to categorical indicators, LC modeling today may be conducted not
only with response variables that are categorical (nominal and/or ordinal) but also with continuous variables, counts,
or a combination of these (or other) scale types. This extension is straightforward: the rightmost
terms in Equation 4 are replaced by the appropriate distribution for each indicator. For example, if y3 is a count, the
Poisson or binomial distribution is used in Equation 4 for P(y3 | X = k). If y3 were continuous, P(y3 | X = k)
would correspond to the normal density with associated class-specific mean and variance parameters (μ3k, σ²3k).
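To make the substitution of distributions concrete, the following sketch computes posterior class membership probabilities for a respondent with one count indicator (Poisson) and one continuous indicator (normal), assuming local independence. All parameter values and names are hypothetical and chosen only for illustration; they are not estimates from any data set discussed here.

```python
import math

def poisson_pmf(y, rate):
    """Poisson probability of observing the count y."""
    return math.exp(-rate) * rate ** y / math.factorial(y)

def normal_pdf(y, mean, var):
    """Normal density with the given mean and variance."""
    return math.exp(-(y - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# Hypothetical 2-class model: class sizes, Poisson rates for the count
# indicator, and class-specific means and variances for the continuous one.
class_size = [0.6, 0.4]
rate = [1.0, 4.0]
mean = [10.0, 14.0]
var = [1.0, 9.0]

def posterior(count, value):
    """Posterior class membership probabilities under local independence."""
    joint = [
        class_size[k] * poisson_pmf(count, rate[k]) * normal_pdf(value, mean[k], var[k])
        for k in range(len(class_size))
    ]
    total = sum(joint)  # mixture density of the observed pair
    return [j / total for j in joint]

post = posterior(count=3, value=13.0)
```

Each indicator contributes its own distribution to the product, so mixing scale types changes only which density function is called for each term.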
Finite mixture modeling, which involves un-mixing continuous variables (McLachlan & Peel, 2004), predates
traditional LC modeling. In fact, it was Karl Pearson who first brought attention to this type of statistical
application. In his first statistical publication, Pearson (1893) dealt with the approximation of a complicated
continuous density as a finite mixture of simpler densities. In a classic application, he showed that the
asymmetric nature of the forehead to body length distribution in crabs can be explained as a mixture of
two normal probability density functions with different means and different variances (Pearson, 1894). He
interpreted the results as providing evidence that this population was evolving into two new species.
Since inclusion of continuous indicators introduces variance parameters into LC models, this allows classes
to be revealed that differ not only in their means for one or more indicators but also in variance. As explained
in the next subsection, the ability to account for variance heterogeneity allows LC models to extract segments
that are more meaningful than those obtained from the K-means clustering algorithm. While χ² model fit
statistics such as L² and X² are not available for continuous variables, the BICLL statistic can be used with
such models since its computation requires only knowledge of the likelihood function.
Relationship to K-Means
When all variables in a LC model are continuous, LC models can be compared directly to the popular
K-means approach to clustering, which has been shown to be equivalent to maximizing the classification
likelihood of a restricted mixture model (Vermunt, 2011). The K-means Euclidean distance criterion translates
into implicit assumptions of (a) local independence and (b) equal within-cluster variances. These assumptions
are often referred to as sphericity because the locus of points associated with each of the K clusters corresponds
to equal-sized spheres.
In comparison to LC, K-means has been shown to perform poorly in cluster recovery because the K-means
assumption of equal within-cluster variances often fails to hold in practice (Magidson &
Vermunt, 2002a, 2002b; Vermunt, 2011).
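A one-dimensional toy example may help show why unequal variances matter. The sketch below compares the nearest-centroid rule implied by K-means with modal assignment from a two-component normal mixture whose variances differ. The means, standard deviations, and class sizes are hypothetical, and the "K-means" rule here is only the assignment step with centroids taken as given, not a full clustering run.

```python
import math

means = [0.0, 6.0]   # hypothetical cluster means
sds = [0.5, 3.0]     # hypothetical, clearly unequal standard deviations
sizes = [0.5, 0.5]   # equal class sizes, to isolate the variance effect

def kmeans_assign(x):
    """Nearest-centroid rule: ignores the within-cluster variances."""
    return min(range(len(means)), key=lambda k: (x - means[k]) ** 2)

def normal_pdf(x, mean, sd):
    """Normal density with the given mean and standard deviation."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def mixture_assign(x):
    """Modal assignment from a mixture with class-specific variances."""
    return max(range(len(means)),
               key=lambda k: sizes[k] * normal_pdf(x, means[k], sds[k]))

x = 2.0
by_distance = kmeans_assign(x)   # index 0: x is closer to the first mean
by_mixture = mixture_assign(x)   # index 1: x is 4 SDs from the first mean
```

The point x = 2 lies nearer to the first centroid in Euclidean terms, yet it is four standard deviations from that cluster and only about 1.3 from the second, so the two rules disagree.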
As an example, Figure 6 shows how patients from three known groups (those with “overt diabetes,” those
with “chemical diabetes,” and “normal” individuals who do not have diabetes), compare with respect to three
continuous measurements: GLUCOSE, INSULIN, and SSPG (Reaven & Miller, 1979). In particular, the
scatterplot of GLUCOSE by INSULIN reveals that the variances for these variables are much larger in the
Overt diabetes group.
Figure 6. Matrix scatterplot of diabetes data set by clinical classification.
Three-class LC models that allow variances to differ across classes not only provide the best fit to these data
in terms of the BIC statistic but also have been shown to recover the three structural groups with much higher
accuracy than K-means. Using simulated data, Vermunt (2011) confirms the general superiority of LC over
K-means in recovering true class membership in the commonly occurring situation where variances differ
across classes.
The scatterplot between (y1) GLUCOSE and (y2) INSULIN in Figure 6 shows that the Overt group has larger
variances than the other groups, and also that a positive correlation between these variables remains within
this group, violating the local independence assumption. One way to handle such violations without increasing
the number of classes is to include a direct effect in the model. This is handled by applying the bivariate
normal distribution to the variable pair, P(y1, y2 | X = k) as shown in Equation 7, which introduces a covariance
parameter along with the mean and variance parameters:
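\[
P(y_1, y_2 \mid X = k) = \frac{1}{2\pi \sqrt{|\Sigma_k|}} \exp\left( -\tfrac{1}{2} (\mathbf{y} - \boldsymbol{\mu}_k)^{\top} \Sigma_k^{-1} (\mathbf{y} - \boldsymbol{\mu}_k) \right), \quad
\mathbf{y} = \begin{pmatrix} y_1 \\ y_2 \end{pmatrix}, \;
\boldsymbol{\mu}_k = \begin{pmatrix} \mu_{1k} \\ \mu_{2k} \end{pmatrix}, \;
\Sigma_k = \begin{pmatrix} \sigma_{1k}^2 & \sigma_{12k} \\ \sigma_{12k} & \sigma_{2k}^2 \end{pmatrix} \tag{7}
\]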
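The effect of the covariance parameter can be seen numerically. The sketch below evaluates a bivariate normal density for one class with and without a within-class covariance. With the covariance set to 0 it reduces to the product of two univariate normal densities, which is the local independence case. The parameter values are hypothetical and the function names are illustrative.

```python
import math

def normal_pdf(y, mean, var):
    """Univariate normal density with the given mean and variance."""
    return math.exp(-(y - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def bivariate_normal_pdf(y1, y2, mean1, mean2, var1, var2, cov):
    """Density of (y1, y2) with a within-class covariance (the direct effect)."""
    det = var1 * var2 - cov ** 2
    if det <= 0:
        raise ValueError("covariance matrix must be positive definite")
    d1, d2 = y1 - mean1, y2 - mean2
    # Quadratic form (y - mu)' Sigma^{-1} (y - mu), written out for the 2 x 2 case.
    quad = (var2 * d1 ** 2 - 2 * cov * d1 * d2 + var1 * d2 ** 2) / det
    return math.exp(-0.5 * quad) / (2 * math.pi * math.sqrt(det))

# Hypothetical class-specific parameters for a single class.
args = dict(mean1=100.0, mean2=200.0, var1=400.0, var2=900.0)
independent = bivariate_normal_pdf(120.0, 230.0, cov=0.0, **args)
correlated = bivariate_normal_pdf(120.0, 230.0, cov=300.0, **args)
product = normal_pdf(120.0, 100.0, 400.0) * normal_pdf(230.0, 200.0, 900.0)
```

Here the observation lies one standard deviation above both means, so a positive within-class covariance makes it more plausible than it would be under local independence.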
Covariate Extension: One-Step and Three-Step Approaches
An important extension of the LC model involves inclusion of covariates predicting class membership (Dayton
& Macready, 1988; Kamakura, Wedel, & Agrawal, 1994). Denoting a person i’s covariate vector by zi, this
extended LC model is defined as:
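\[
P(\mathbf{y}_i \mid \mathbf{z}_i) = \sum_{k=1}^{K} P(X = k \mid \mathbf{z}_i) \prod_{j=1}^{J} P(y_{ij} \mid X = k) \tag{8}
\]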
The main change compared to the basic LC model is that the class membership probabilities may now be
dependent on zi, whereas the conditional probability of the indicators, P(yij | X = k), remains unchanged.
Note that an important additional assumption is made, namely that the effect of the zi on the yi is fully
mediated by the LCs. It is possible to test this assumption using local fit measures (BVRs) similar to those
discussed earlier, as well as to relax it by allowing for direct effects, which implies replacing P(yij | X = k)
by P(yij | X = k, zi) for one or more of the yij. Typically, P(X = k | zi) is modeled using a multinomial logistic
specification; that is,
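\[
P(X = k \mid \mathbf{z}_i) = \frac{\exp\left( \gamma_{0k} + \sum_{p} \gamma_{pk} z_{ip} \right)}{\sum_{k'=1}^{K} \exp\left( \gamma_{0k'} + \sum_{p} \gamma_{pk'} z_{ip} \right)} \tag{9}
\]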
where γ0k and γpk represent the intercept and the slope of predictor zip for LC k. For identification, we assume
that the parameters sum to 0 across classes (effect coding) or are equated to 0 for one class (dummy coding).
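A minimal sketch of the multinomial logistic specification for class membership follows. It uses dummy coding, with the parameters of the first class fixed to 0. The coefficient values and the function name are hypothetical and serve only to show how the class probabilities respond to a covariate vector.

```python
import math

def class_probabilities(z, intercepts, slopes):
    """P(X = k | z) under a multinomial logit with one linear predictor per class.

    intercepts[k] plays the role of gamma_0k; slopes[k][p] that of gamma_pk.
    """
    eta = [
        intercepts[k] + sum(g * zp for g, zp in zip(slopes[k], z))
        for k in range(len(intercepts))
    ]
    top = max(eta)  # subtract the maximum for numerical stability
    expo = [math.exp(e - top) for e in eta]
    total = sum(expo)
    return [e / total for e in expo]

# Hypothetical 3-class model with two covariates; the first class is the
# reference class (dummy coding), so its parameters are all 0.
intercepts = [0.0, 0.5, -1.0]
slopes = [[0.0, 0.0], [0.8, -0.2], [-0.4, 1.1]]
probs = class_probabilities([1.0, 2.0], intercepts, slopes)
```

Switching to effect coding would change the individual coefficients but not the resulting probabilities, since only differences between the linear predictors matter.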
The simultaneous modeling of responses yi and covariates zi using this one-step approach may sometimes
be impractical, especially when the number of possibly relevant covariates is large. Moreover, in most
applications, we wish to obtain a clustering that is not affected by the chosen covariates but only by the
selected indicator variables. Therefore, most researchers prefer using a three-step approach involving:
1. estimating the LC model without covariates,
2. obtaining the individuals’ class assignments using the posterior membership probabilities, and
3. investigating how the class assignments are related to covariates.
However, as shown by Annabel Bolck, Marcel Croon, and Jacques Hagenaars (2004), this three-step
approach yields downward biased estimates of the covariate effects. Based on the work of these authors,
Vermunt (2010) proposed a simple method to adjust for this bias (see also Bakk, Tekle, & Vermunt, 2013).
The adjustment is based on the following relationship between the class assignments wi and the true class
memberships:
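\[
P(w_i \mid \mathbf{z}_i) = \sum_{k=1}^{K} P(X = k \mid \mathbf{z}_i) \, P(w_i \mid X = k) \tag{10}
\]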
Note that this again is a LC model but with wi as a single “response” variable. The adjustment proposed by
Vermunt (2010) therefore involves estimating a LC model with zi as concomitant variables and wi as the single
response variable, while fixing the P(wi | X = k) at the values computed using the parameter estimates from
the first step.
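The fixed quantities P(wi | X = k) form a classification error matrix. One common way to estimate it, assuming modal assignment in step two, is to cross-tabulate the assigned class against the step-one posterior probabilities. The sketch below does this for a handful of hypothetical respondents. It illustrates the idea only and is not a reproduction of any particular software implementation.

```python
def classification_error_matrix(posteriors):
    """Estimate P(W = s | X = k) from step-one posterior probabilities.

    posteriors[i][k] is P(X = k | y_i); W is the modal class assignment.
    Row k of the result holds the distribution of W given true class k.
    """
    n_classes = len(posteriors[0])
    counts = [[0.0] * n_classes for _ in range(n_classes)]
    for post in posteriors:
        assigned = max(range(n_classes), key=lambda k: post[k])  # modal assignment
        for k in range(n_classes):
            counts[k][assigned] += post[k]
    # Normalise each row k by the expected number of cases in class k.
    return [[c / sum(row) for c in row] for row in counts]

# Hypothetical posteriors for five respondents in a 2-class model.
posteriors = [[0.9, 0.1], [0.8, 0.2], [0.6, 0.4], [0.3, 0.7], [0.1, 0.9]]
errors = classification_error_matrix(posteriors)
```

The off-diagonal entries quantify how often a member of one class is assigned to another. Ignoring this misclassification is what biases the covariate effects toward zero in the unadjusted three-step approach.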
Multilevel LC Analysis
Vermunt (2003) proposed the multilevel LC model, which can be used when individuals (lower level units
such as students) belong to groups (higher level units such as schools), and when the number of groups is
too large to use the grouping variable as a series of dummy variables in a LC model with covariates. The
description of the multilevel LC model requires expansion of our notation. We refer to a particular group or
higher level unit as g and to the response vector of a group and an individual within a group as yg and ygi,
respectively. The number of individuals within a group is denoted by ng, the group-level LC variable by V, a
group-level LC by d, and the number of group-level LCs by D. The lower level part of the two-level LC model
has the following form:
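\[
P(\mathbf{y}_{gi} \mid V = d) = \sum_{k=1}^{K} P(X = k \mid V = d) \prod_{j=1}^{J} P(y_{gij} \mid X = k) \tag{11}
\]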
which is the same as a standard LC model, except for the fact that the lower level LC proportions are allowed
to differ across higher level LCs V. As in the standard LC model, we assume local independence across the
J indicators. The higher level part of the model which connects the responses of the ng persons in group g
equals
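\[
P(\mathbf{y}_g) = \sum_{d=1}^{D} P(V = d) \prod_{i=1}^{n_g} P(\mathbf{y}_{gi} \mid V = d) \tag{12}
\]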
As can be seen, the main additional model assumptions are that each group belongs to one of D group-level
LCs and that the individuals’ responses within a group are independent given the group’s class membership.
Combining Equations 11 and 12 yields the full equation of a two-level LC model:
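\[
P(\mathbf{y}_g) = \sum_{d=1}^{D} P(V = d) \prod_{i=1}^{n_g} \left[ \sum_{k=1}^{K} P(X = k \mid V = d) \prod_{j=1}^{J} P(y_{gij} \mid X = k) \right] \tag{13}
\]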
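The two-level structure can be followed step by step in code. The sketch below evaluates the probability of one group's responses for binary indicators, first within each group-level class and then mixed over those classes. The model sizes (D = 2, K = 2, J = 2) and all parameter values are hypothetical.

```python
def individual_prob(y, class_probs, item_probs):
    """P(y_gi | V = d): standard LC formula with group-class-specific proportions.

    class_probs[k] = P(X = k | V = d); item_probs[k][j] = P(y_j = 1 | X = k).
    """
    total = 0.0
    for k, pk in enumerate(class_probs):
        cond = 1.0
        for j, yj in enumerate(y):  # local independence across indicators
            p = item_probs[k][j]
            cond *= p if yj == 1 else 1.0 - p
        total += pk * cond
    return total

def group_prob(group, group_class_probs, class_probs_by_d, item_probs):
    """P(y_g): members are independent given the group-level class V."""
    total = 0.0
    for d, pd in enumerate(group_class_probs):
        cond = 1.0
        for y in group:
            cond *= individual_prob(y, class_probs_by_d[d], item_probs)
        total += pd * cond
    return total

# Hypothetical parameters.
group_class_probs = [0.7, 0.3]               # P(V = d)
class_probs_by_d = [[0.8, 0.2], [0.3, 0.7]]  # P(X = k | V = d)
item_probs = [[0.9, 0.8], [0.2, 0.1]]        # P(y_j = 1 | X = k)
group = [(1, 1), (1, 0), (0, 0)]             # three members, two indicators each
likelihood = group_prob(group, group_class_probs, class_probs_by_d, item_probs)
```

Note that the item probabilities do not depend on d: the group-level class affects responses only through the individual-level class proportions.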
As in a standard LC model, we can include covariates predicting the higher level and lower level class
memberships, either using a one-step or a three-step approach. Moreover, the assumption that the ygij are
independent of the group’s class membership given the individual’s class membership can be tested and
relaxed (Nagelkerke, Oberski, & Vermunt, 2016).
LM or Latent Transition Models
A LM model is a LC model for longitudinal data in which persons are allowed to switch between latent states
across measurement occasions. It is also referred to as a latent transition model (Collins & Lanza, 2010),
a hidden Markov model (MacDonald & Zucchini, 1997; Visser, 2011), or a Markov switching or regime-switching
model.
More generally, a mixture LM model utilizes both LCs and latent states to study different transition patterns,
from one latent state to another, that occur for different LCs. For example, a Mover-Stayer model is a 2-class
LM model where one class consists of “stayers” who always remain in the same state, while the other class
consists of “movers” who change from one state to another over time.
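As an illustration of the Mover-Stayer idea, the sketch below computes the probability of a response sequence under a mixture of two latent Markov chains. The two chains share initial and measurement probabilities but differ in their transition matrices: the stayer class has an identity matrix. All numerical values are hypothetical, and sharing the initial and measurement probabilities across classes is a simplifying assumption.

```python
def sequence_prob(ys, initial, transition, emission):
    """Forward algorithm: P(y_1, ..., y_T) for one latent Markov chain.

    initial[s] = P(state_1 = s);
    transition[r][s] = P(state_t = s | state_{t-1} = r);
    emission[s][y] = P(y_t = y | state_t = s).
    """
    n = len(initial)
    alpha = [initial[s] * emission[s][ys[0]] for s in range(n)]
    for y in ys[1:]:
        alpha = [
            sum(alpha[r] * transition[r][s] for r in range(n)) * emission[s][y]
            for s in range(n)
        ]
    return sum(alpha)

# Hypothetical setup with two latent states and a binary response.
initial = [0.5, 0.5]
emission = [[0.9, 0.1], [0.2, 0.8]]
stayer = [[1.0, 0.0], [0.0, 1.0]]  # stayers never leave their state
mover = [[0.7, 0.3], [0.4, 0.6]]   # movers may switch between occasions
class_size = {"stayer": 0.6, "mover": 0.4}

def mixture_prob(ys):
    """Mixture LM likelihood: weight each class's chain probability by its size."""
    return (class_size["stayer"] * sequence_prob(ys, initial, stayer, emission)
            + class_size["mover"] * sequence_prob(ys, initial, mover, emission))

p_switch = mixture_prob([0, 0, 1])
```

A sequence that changes response, such as (0, 0, 1), can arise in the stayer class only through measurement error, whereas in the mover class it can also reflect a true state change.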
LC Regression or Conjoint Models
LC regression models differ from LC cluster models in that the parameters that differ across unobserved
subgroups are regression coefficients rather than conditional probabilities. However, unlike standard
regression, where regression coefficients are treated as fixed effects, LC regression is more like a mixed
model, which allows for heterogeneity in the regression coefficients between observations.
Typical LC regression applications often involve multiple replications or repeated univariate responses for
each case, where the replications correspond to time (one replication for each time point), or different
situations, such as ratings of different products. Applications of the former include LC growth modeling, where
observations are clustered or grouped based on the way they change over time. The latter includes ratings-
based and choice-based conjoint applications, where observations are generally clustered or segmented
based on the attributes that drive their ratings (or choices).
To facilitate LC regression models that involve multiple replications per case, the data are organized as a
long file rather than the typical wide file format. For example, in a taste testing experiment, consumers were
asked to rate each of J = 15 cracker products on an M = 9-point liking scale (Popper et al., 2004; Magidson &
Vermunt, 2006). Figure 7 shows the data as a wide file in which each of the 15 ratings are stored in separate
columns, and each record corresponds to a different consumer (e.g., ratings for consumer #1101 are provided
in the first record). This format is typical for a LC cluster analysis.
Figure 7. Example of wide file format.
In contrast, Figure 8 shows the same data as a long file in which the product ratings for a given consumer
are stored in a single response variable RATING, with separate records for each of the 15 products
for that consumer. The latter format allows for the regression of the dependent variable RATING to be
performed as a function of the single nominal predictor PRODUCT or as a function of separate
product attributes. To accommodate the latter regression, the restructured data also include four appearance
attributes (JAPP1-JAPP4), four flavor attributes (FLV1-FLV4), and four texture attributes (TEX1-TEX4) for
each cracker product (Figure 8). The values for these sensory attributes were obtained from food experts
(see Popper et al., 2004).
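The restructuring from wide to long format is mechanical, as the following sketch shows for a reduced example with three products rather than fifteen. The column names ID, R1, R2, and R3 are hypothetical; PRODUCT and RATING follow the variable names used in the text.

```python
def wide_to_long(wide_rows, id_column, rating_columns):
    """Return one record per (consumer, product) pair."""
    long_rows = []
    for row in wide_rows:
        for product, column in enumerate(rating_columns, start=1):
            long_rows.append({
                id_column: row[id_column],
                "PRODUCT": product,
                "RATING": row[column],
            })
    return long_rows

# Hypothetical wide file: one record per consumer, one column per product.
wide = [
    {"ID": 1101, "R1": 7, "R2": 4, "R3": 9},
    {"ID": 1102, "R1": 5, "R2": 5, "R3": 6},
]
long_file = wide_to_long(wide, "ID", ["R1", "R2", "R3"])
```

Product attributes could then be merged onto the long file by PRODUCT, giving one set of predictor values per record.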
For these data, regardless of whether a LC cluster analysis, a LC regression with product as a
nominal predictor, or a LC regression with the product attributes as predictors is performed, the resulting class assignments
are similar. In all three cases, a 2-class model consists of respondents who tend to rate all crackers relatively
low (Class 1) and respondents who tend to rate all crackers relatively high (Class 2; see Figure 9). This result
is typical when analyzing ratings data. While successful in capturing the strong response level difference,
this type of result is not useful for food manufacturers, who want to know which types of crackers appeal to
different consumer segments.
Figure 8. Example of long file format for LC regression model.
Figure 9. Results from standard 2-class LC analysis.
Next, some approaches that separate out confounding factors, resulting in more meaningful classes, are
described. In particular, the random intercept model is very useful when analyzing ratings data to remove the
response level confound, by capturing local dependencies in a structured manner (see “Random Intercept
Model for Analyzing Ratings” section), and the SALC model (see “SALC Tree Models for Analyzing Choices”
section) is useful in adjusting classes to remove the confounding effects of scale in choice models.
Regardless of whether ratings or choices are analyzed, the challenge in both cases is to extract classes that
are meaningful and free from the potential confounding effects of response level (for the analysis of ratings)
or scale effects (for the analysis of choices).
Random Intercept Model for Analyzing Ratings
In practice, when LC regression is used with ratings data, care must be taken to avoid regression intercept
heterogeneity from dominating the model, resulting in LC segments that differ primarily in their ratings
style—one class tends to rate all objects high while a second class tends to give lower ratings to all objects.
To deal with this problem, a random intercept can be introduced into the LC regression model to account for
this heterogeneity, allowing the LCs to capture the more meaningful heterogeneity related to differences in
the regression coefficients. Using the ordinal scale type based on the adjacent-category logit model for
rating m, the LC random intercept regression model (Magidson & Vermunt, 2006) for the rating of product t can
be expressed as:
(14)
where the continuous latent variable F is used to model the random intercept, allowing the regression
coefficients βk1, βk2, …, βk15 to assess liking for each of the 15 cracker products relative to individual i’s overall
liking of crackers (as reflected by their intercept). This is similar to centering, where the ratings for an individual
are measured relative to their average rating for all 15 crackers. Figure 10 plots the results obtained from the
2-class random intercept regression model. Compared to the results from the standard LC analysis (Figure 9), the
resulting classes now differ in their relative preferences for one cracker over another and thus are more useful
to food manufacturers.
Figure 10. Results from 2-class LC random intercept regression analysis.
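The centering analogy can be illustrated directly. In the sketch below, two hypothetical raters have identical relative preferences but different overall response levels; after subtracting each rater's own mean, their ratings coincide. This is only an analogy: the random intercept model estimates the individual-level effect jointly with the class-specific coefficients rather than subtracting an observed mean.

```python
def center_ratings(ratings):
    """Subtract an individual's mean rating from each of their ratings."""
    mean = sum(ratings) / len(ratings)
    return [r - mean for r in ratings]

# Two hypothetical raters with the same relative preferences but different
# overall response levels.
low_rater = [2, 3, 1, 4]
high_rater = [6, 7, 5, 8]
centered_low = center_ratings(low_rater)
centered_high = center_ratings(high_rater)
```

A standard LC analysis of the raw ratings would tend to separate these two raters on level alone, whereas after centering they are indistinguishable.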
Results from the alternative 2-class random intercept regression, where the sensory attributes Z1 − ZQ are
utilized as predictors instead of PRODUCT, are very similar to those in Figure 10, providing evidence
that class differences in the liking ratings reflect different preferences with respect to the cracker attributes.
The specification of this latter model is
(15)
SALC Tree Models for Analyzing Choices
By asking respondents to choose between two or more alternatives rather than rate each alternative, choice-
based conjoint avoids the response style problem inherent in ratings-based conjoint. However, LC choice
modeling has its own unique scaling problem that should be dealt with in order to avoid the confounding
effects of scale classes (Groothuis-Oudshoorn et al., 2018). By decomposing utilities into separate scale and
preference components, the SALC model (Magidson & Vermunt, 2007) was introduced as a potential solution
to this problem, allowing LCs to reflect differences in preferences rather than differences in scale.
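The scale confound can be made tangible with a small numerical sketch. It assumes, for illustration, that utilities are the product of a scale factor and a set of preference weights, and that choices follow a multinomial logit. The preference values, scale values, and function name are hypothetical, and the exact parameterization used in the SALC model may differ.

```python
import math

def choice_probabilities(preferences, scale):
    """Multinomial logit choice probabilities with utilities = scale * preference."""
    utilities = [scale * b for b in preferences]
    top = max(utilities)  # stabilise the exponentials
    expo = [math.exp(u - top) for u in utilities]
    total = sum(expo)
    return [e / total for e in expo]

# Hypothetical preference weights shared by two groups that differ only in scale.
preferences = [1.0, 0.2, -1.2]
high_scale = choice_probabilities(preferences, scale=1.0)
low_scale = choice_probabilities(preferences, scale=0.3)
```

Both groups rank the alternatives identically, but the low-scale group chooses less consistently. A standard LC choice model could therefore place them in separate classes even though their preferences are the same.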
Table 9 provides results from a standard LC best-worst choice model where each respondent selected their
most and least preferred principle to be used in health plan design (for details of these data, see Louviere &
Flynn, 2007).
The class-specific utility estimates presented in Table 9 yield different rankings of the 15 principles, the
highest estimate corresponding to the principle with the largest utility for that class. Because of the relatively
large number of classes and the potential scale confound, interpretation becomes somewhat difficult.
Table 9. Utilities estimates for the 8-class model.
Description                        Class 1  Class 2  Class 3  Class 4  Class 5  Class 6  Class 7  Class 8
Culture of reflective improvement    −0.39    −1.77    −1.18    −0.40    −1.03    −2.51    −0.48    −0.15
A respectful, ethical system         −0.07    −0.35     1.06     1.29    −0.56     0.19     1.08     0.40
Comprehensiveness                    −0.55    −0.59     0.74    −1.40     0.57    −0.12    −1.46     0.45
Equity                               −0.64    −0.61     0.12    −2.05    −0.95     2.18    −1.89     2.65
People and family centered            0.90     0.94    −0.17     2.59    −1.47     1.90    −0.58    −0.91
Promoting wellness and prevent        0.25    −0.68     0.52     1.55     2.15    −0.47     3.31     0.24
Providing for future generations      0.82     0.07    −0.36     1.24    −0.26     0.17     0.16     0.32
Public voice community engage        −0.46    −1.53    −0.38    −0.70    −2.74    −1.31    −0.15    −1.04
Quality and safety                    0.06     2.12     0.86     2.01     2.47     3.26     0.72     0.46
Social and environ shape health       0.14    −1.27    −1.08     0.87    −1.01    −1.90     2.46     1.14
Responsible spending                  0.27     1.44     0.65    −1.24     1.38    −0.27    −0.70    −1.42
Shared responsibility                −0.50    −0.95    −0.82    −0.62    −0.03    −0.73    −1.26    −0.62
Taking the long-term view             0.43     0.03    −0.92    −0.63     0.24    −0.60    −0.02     0.45
Transparency and accountability      −0.41     0.08     1.12    −0.03    −0.23     0.28    −0.99    −1.20
Value for the money                   0.15     3.06    −0.15    −2.48     1.47    −0.08    −0.20    −0.76
Table 10 provides results obtained for the three root level classes obtained from a LC tree analysis of these
data. Separate results are presented based on the standard LC choice model that does not adjust for scale
and the SALC model.
Table 10. Standard estimates (utilities) vs. scale-adjusted estimates (preferences) for 3-class models.a
a. There are six indicators that stand out in discriminating across the three Non Adjusted and Adjusted classes.
The first is the equity indicator (shown in blue); the second is a set of three indicators associated with families,
prevention, and future generations (shown in yellow); and the last is the pair of indicators associated with value
for the money and responsible spending (shown in green).
Note that the estimates for Class 2 of the standard 3-class model have very low magnitude as compared to
the other classes (standard deviation of 0.31 compared to 1.08 and 0.96 for the other classes). This reflects
a scale confound which makes this class difficult to interpret. In contrast, the results obtained from the
SALC model show no evidence of this confound as the magnitudes of the three class-specific estimates
are comparable (as reflected by similar standard deviations). The SALC model shows clear preference
differences between the three classes. Class 1 tends to choose “value for the money” as their most preferred
principle, Class 2 tends toward principles that are “people and family centered” whereas Class 3 chooses
“equity” as most important.
Example of SALC Tree Modeling
Using the 3-class SALC model as the root classes of the tree, within-class heterogeneity is found to be sufficient
to split each class into two subclasses, revealing what appears to be secondary preferences. This yields six
terminal classes (Table 11). For example, the first two classes primarily prefer “value for money” (relatively
high preference utilities of 2.39 and 1.11 for this principle), and “responsible spending” but differ on the relative
importance of “people and family centered.”
Table 11. Estimated utilities revealing primary and secondary preferences for scale-adjusted LC tree model
with three theme classes.
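As a rough illustration of how the six terminal classes relate to the three root classes, the sketch below computes
each terminal class size as the root class size times the share of that subclass within its root. All sizes, shares,
and class labels are hypothetical placeholders, not values from Table 11.

```python
# Hypothetical root (theme) class sizes; they sum to 1.
root_sizes = {"1": 0.40, "2": 0.35, "3": 0.25}

# Hypothetical share of each of the two subclasses within its root class.
split_shares = {
    "1": (0.6, 0.4),
    "2": (0.5, 0.5),
    "3": (0.7, 0.3),
}

# Terminal class size = root class size * share within that root class.
terminal_sizes = {}
for root, size in root_sizes.items():
    for k, share in enumerate(split_shares[root], start=1):
        terminal_sizes[f"{root}.{k}"] = size * share

for label, size in terminal_sizes.items():
    print(f"terminal class {label}: {size:.3f}")
```

Because each pair of subclass shares sums to 1, the six terminal class sizes again sum to 1, so the tree refines
the root solution without changing the size of any root class.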
Further Reading
Goodman, L. A. (2002). Latent class analysis: The empirical study of latent types, latent variables, and latent
structures. In J. A. Hagenaars & A. L. McCutcheon (Eds.), Applied latent class analysis (pp. 2–55). Cambridge,
England: Cambridge University Press.
Hagenaars, J. A., & McCutcheon, A. L. (2002). Applied latent class analysis. Cambridge, England:
Cambridge University Press.
Magidson, J., & Vermunt, J. K. (2004). Latent class models. In D. Kaplan (Ed.), The SAGE handbook of
quantitative methodology for the social sciences (Chapter 10, pp. 175–198). Thousand Oaks, CA: SAGE.
Magidson, J., & Madura, J. P. (2018). Development of an adaptive typing tool from MaxDiff response data.
Sawtooth Software Conference Proceedings, Orlando, FL.
Moors, G. (2010). Ranking the ratings: A latent-class regression model to control for overall agreement in
opinion research. International Journal of Public Opinion Research, 22, 93–119.
Vermunt, J. K., Tran, B., & Magidson, J. (2008). Latent class models in longitudinal research. In S. Menard
(Ed.), Handbook of longitudinal research: Design, measurement, and analysis (pp. 373–385). Burlington, MA:
Elsevier.
References
Bakk, Z., Tekle, F. B., & Vermunt, J. K. (2013). Estimating the association between latent class membership
and external variables using bias adjusted three-step approaches. Sociological Methodology, 43, 272–311.
Bolck, A., Croon, M. A., & Hagenaars, J. A. (2004). Estimating latent structure models with categorical
variables: One-step versus three-step estimators. Political Analysis, 12, 3–27.
Clogg, C. C. (1988). Latent class models for measuring. In R. Langeheine & J. Rost (Eds.), Latent trait and
latent class models (pp. 173–205). New York, NY: Plenum Press.
Collins, L. M., & Lanza, S. T. (2010). Latent class and latent transition analysis for the social, behavioral, and
health sciences. New York, NY: Wiley.
Dayton, C. M., & Macready, G. B. (1988). Concomitant-variable latent-class models. Journal of the American
Statistical Association, 83, 173–178.
Eagle, T., & Magidson, J. (2020, forthcoming). Segmenting choice and non-choice data simultaneously: Part
deux. Sawtooth Software Conference Proceedings, San Diego, CA.
Formann, A. K. (1992). Linear logistic latent class analysis for polytomous data. Journal of the American
Statistical Association, 87, 476–486.
Goodman, L. A. (1974a). The analysis of systems of qualitative variables when some of the variables are
unobservable: Part I—A modified latent structure approach. American Journal of Sociology, 79, 1179–1259.
Reproduced in Goodman, Analyzing qualitative/categorical data: Log-linear models and latent-structure
analysis (J. Magidson, Ed.). Lanham, MD: University Press, 1978.
Goodman, L. A. (1974b). Exploratory latent structure analysis using both identifiable and unidentifiable
models. Biometrika, 61, 215–231. Reproduced in Goodman, Analyzing qualitative/categorical data: Log-linear
models and latent-structure analysis (J. Magidson, Ed.). Lanham, MD: University Press, 1978.
Groothuis-Oudshoorn, C. G. M., Flynn, T., Yoo, H. I., Magidson, J., & Oppe, M. (2018). Key issues
and potential solutions for understanding health care preference heterogeneity free from patient level scale
confounds. The Patient: Patient-Centered Outcomes Research. Retrieved from https://rdcu.be/Mx8e
Hagenaars, J. A. (1988). Latent structure models with direct effects between indicators: Local dependence
models. Sociological Methods and Research, 16, 379–405.
Heinen, T. (1996). Latent class and discrete latent trait models: Similarities and differences. Thousand Oaks,
CA: SAGE.
Hennig, C. (2015). What are the true clusters? Pattern Recognition Letters, 64, 53–62.
Kamakura, W. A., Wedel, M., & Agrawal, J. (1994). Concomitant variable latent class models for the external
analysis of choice data. International Journal of Marketing Research, 11, 451–464.
Kaufman, L., & Rousseeuw, P. J. (1990). Finding groups in data: An introduction to cluster analysis. New
York, NY: John Wiley.
Kumar, N. (2005). Cluster analysis: Basic concepts and algorithms. In P. N. Tan, M. Steinbach, & V. Kumar
(Eds.), Introduction to data mining (2nd ed., Chapter 7, pp. 487–568). London, England: Pearson.
Langeheine, R., Pannekoek, J., & Van de Pol, F. (1996). Bootstrapping goodness-of-fit measures in
categorical data analysis. Sociological Methods and Research, 24, 492–516.
Lazarsfeld, P. F. (1950). The logical and mathematical foundation of latent structure analysis & the
interpretation and mathematical foundation of latent structure analysis. In S. A. Stouffer, L. Guttman, E. A.
Suchman, P. F. Lazarsfeld, S. A. Star, J. A. Clausen (Eds.), Measurement and prediction (pp. 362–472).
Princeton, NJ: Princeton University Press.
Lazarsfeld, P. F., & Henry, N. W. (1968). Latent structure analysis. Boston, MA: Houghton Mifflin.
Louviere, J., & Flynn, T. N. (2010). Using best-worst scaling choice experiments to measure public
perceptions and preferences for healthcare reform in Australia. The Patient, 3, 275–283.
Magidson, J., & Vermunt, J. K. (2001). Latent class factor and cluster models, bi-plots and related graphical
displays. Sociological Methodology, 31, 223–264.
Magidson, J., & Vermunt, J. K. (2006). Use of latent class regression models with a random intercept
to remove overall response level effects in ratings data. In A. Rizzi & M. Vichi (Eds.), Proceedings in
computational statistics (pp. 351–360). Heidelberg, Germany: Springer.
Magidson, J., & Vermunt, J. K. (2002a). Latent class models for clustering: A comparison with K-means.
Canadian Journal of Marketing Research, 20, 36–43.
Magidson, J., & Vermunt, J. K. (2002b). Latent class modeling as a probabilistic extension of K-means
clustering. Quirk’s Marketing Research Review, 20, 77–80.
Magidson, J., & Vermunt, J. K. (2003). Comparing latent class factor analysis with traditional factor analysis
for datamining. In H. Bozdogan (Ed.), Statistical datamining & knowledge discovery (Chapter 22, pp.
373–383). Boca Raton, FL: Chapman & Hall/CRC Press.
Magidson, J., & Vermunt, J. K. (2007, October). Removing the scale confound in multinomial logit choice
models to obtain better estimates of preference. Sawtooth Software Conference Proceedings, Santa Rosa,
CA.
MacDonald, I. L., & Zucchini, W. (1997). Hidden Markov and other models for discrete-valued time series.
London, England: Chapman & Hall.
McLachlan, G., & Peel, D. (2004). Finite mixture models. Hoboken, NJ: John Wiley.
McCutcheon, A. L. (1987). Latent class analysis (SAGE University Paper Series on Quantitative Applications
in the Social Sciences, No. 07-064). Newbury Park, CA: SAGE.
Moors, G. (2003). Diagnosing response style behavior by means of a latent-class factor approach.
Socio-demographic correlates of gender role attitudes and perceptions of ethnic discrimination reexamined.
Quality & Quantity, 37, 277–302.
Nagelkerke, E., Oberski, D. L., & Vermunt, J. K. (2016). Goodness-of-fit measures for multilevel latent class
models. Sociological Methodology, 46, 252–282.
Oberski, D. L., & Vermunt, J. K. (2013). A model-based approach to goodness-of-fit evaluation in item
response theory. Measurement, 3, 117–122.
Pearlin, L. I., & Johnson, J. S. (1977). Marital status, life-strains and depression. American Sociological
Review, 42, 104–115.
Pearson, K. (1883). Maimonides and Spinoza. Mind, 8, 338–353.
Pearson, K. (1956). Contributions to the mathematical theory of evolution. Philosophical Transactions of the
Royal Society Series A, 185, 71–110. Reprinted in Pearson.
Popper, R., Kroll, J., & Magidson, J. (2004). Applications of latent class models to food product
development: A case study. Sawtooth Software Conference Proceedings, San Diego, CA.
Reaven, G. M., & Miller, R. G. (1979). An attempt to define the nature of chemical diabetes using a
multidimensional analysis. Diabetologia, 16, 17–24.
Uebersax, J. S. (1993). Statistical modeling of expert ratings on medical treatment appropriateness. Journal
of the American Statistical Association, 88, 421–427.
Vermunt, J. K. (2001). The use of restricted latent class models for defining and testing nonparametric and
parametric IRT models. Applied Psychological Measurement, 25, 283–294.
Vermunt, J. K. (2010). Latent class modeling with covariates: Two improved three-step approaches. Political
Analysis, 18, 450–469.
Vermunt, J. K., & Magidson, J. (2004a). Latent class analysis. In M. S. Lewis-Beck, A. Bryman, & T. F.
Liao (Eds.), The SAGE encyclopedia of social sciences research methods (pp. 580–558). Thousand Oaks,
CA: SAGE.
Vermunt, J. K., & Magidson, J. (2004b). Local independence. In M. S. Lewis-Beck, A. Bryman, & T. F. Liao
(Eds.), The SAGE encyclopedia of social sciences research methods (pp. 580–558). Thousand Oaks, CA:
SAGE.
Vermunt, J. K., & Magidson, J. (2004c). Factor analysis with categorical indicators: A comparison between
traditional and latent class approaches. In A. Van der Ark, M. A. Croon, & K. Sijtsma (Eds.), New
developments in categorical data analysis for the social and behavioral sciences (pp. 41–62). Mahwah, NJ:
Erlbaum.
Vermunt, J. K., & Magidson, J. (2013). LG-syntax user’s guide: Manual for Latent GOLD 5.0 syntax module.
Belmont, MA: Statistical Innovations Inc.
Vermunt, J. K., & Magidson, J. (2016). Technical guide for Latent GOLD 5.1: Basic, advanced, and syntax.
Belmont, MA: Statistical Innovations Inc.
Vermunt, J. K. (2011). K-means may perform as well as mixture model clustering but may also be much
worse: Comment on Steinley and Brusco (2011). Psychological Methods, 16, 82–88.
Visser, I. (2011). Seven things to remember about hidden Markov models: A tutorial on Markovian models for
time series. Journal of Mathematical Psychology, 55, 403–415.