
Tutorial on Developing a Latent Class Tree (LCT): Using Latent GOLD 6.0 to Estimate Latent Class Trees

DemoData = ‘depressOut.sav’

While Latent Class Tree (LCT) modeling is especially valuable when analyzing large datasets, in this tutorial we use a small dataset with only five dichotomous indicators of depression to illustrate some basic differences between LCT and standard LC modeling. For a general introduction to LCT modeling, see van den Bergh, Schmittmann, and Vermunt (2017) and van den Bergh et al. (2018). In this tutorial, you will use the beta version of Latent GOLD 6.0 to:

• Perform a standard latent class analysis with these data

• Generate a latent class tree (LCT) model with these data

• Compare the resulting three classes obtained from the standard and LCT approaches

• Explore the new graphical and tabular output generated for latent class tree models

Depression Data Set

The data used are responses to a checklist to identify persons who are depressed (Pearlin & Johnson, 1977; Schaeffer, 1988; Magidson & Vermunt, 2001). Respondents who reported having a symptom during the previous week were coded 1; those not reporting the symptom were coded 2. The five symptom variables, their corresponding descriptions, and data file response codes are summarized in Table 1.

Table 1. Depression survey variable names, descriptions, and response codes.

Variable   Description                          Symptom Present?
enthus     lack of enthusiasm                   1 = Yes; 2 = No
energy     low energy                           1 = Yes; 2 = No
sleep      sleeping problems                    1 = Yes; 2 = No
appetite   poor appetite                        1 = Yes; 2 = No
hopeless   feeling hopeless                     1 = Yes; 2 = No
frq        frequency of response pattern        Min = 1; Max = 272
           (separately for females and males)
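Because each record in the data file represents a response pattern together with its frequency count (frq), any case-level analysis outside Latent GOLD must weight or expand the records. A minimal pandas sketch, using the Table 1 variable names; the example patterns and counts below are invented for illustration, not taken from depressOut.sav:

```python
import pandas as pd

# Hypothetical response patterns in the Table 1 coding
# (1 = symptom present, 2 = absent); 'frq' is the number of
# respondents sharing each pattern, as in the demo data file.
patterns = pd.DataFrame({
    "enthus":   [2, 1, 1],
    "energy":   [2, 1, 1],
    "sleep":    [2, 2, 1],
    "appetite": [2, 2, 1],
    "hopeless": [2, 2, 1],
    "frq":      [272, 10, 3],
})

# Expand to one record per respondent by repeating each pattern 'frq' times.
cases = patterns.loc[patterns.index.repeat(patterns["frq"])].drop(columns="frq")
cases = cases.reset_index(drop=True)
print(len(cases))  # 285 respondents in this toy example
```

Alternatively, most modeling routines accept frq directly as a case weight, which avoids materializing the expanded file.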


The Goal

Our goal is to illustrate the LCT alternative to the standard LC modeling approach by comparing the two approaches in this confirmatory application. The research goal is to obtain a latent class model for identifying persons who are depressed based on reported symptoms.

Identification and Enumeration of Substantively Meaningful “Parent Level” Classes

Traditional exploratory LC analysis relies on information criteria such as the BIC (Bayesian Information Criterion) to determine the number of latent classes (Schwarz, 1978). In confirmatory applications, where the number of classes is dictated by theory1, it is common to find that the number of classes recommended by the BIC statistic exceeds the number dictated by theory. Using the BIC as the criterion, both the standard and LCT approaches suggest the existence of three underlying classes in this example (depressed, troubled but not depressed, healthy). However, theory suggests that there are two latent classes of primary interest: depressed and not depressed. We will see that the LCT approach is highly sensitive to the number of classes used at the root node. If we begin with two classes at the root node, the LCT also results in three end-node classes, but these classes differ from the three classes obtained with the standard LC modeling approach and yield somewhat different scoring equations for identifying the depressed class.

Step 1: Use Latent GOLD to Perform a Standard LC Analysis

➢ From the “File” menu, open the demo file depressOut.sav
➢ From the “Cluster” menu, select the five depression indicator variables (enthusiasm, energy, sleep, appetite, hopeless)
➢ Right-click on these variables and set them to “Nominal”
➢ In the “Clusters” box, type “1-4”
➢ Click Estimate

Output from the standard LC models with 1-4 classes is obtained. To view the Summary Output:

➢ Click on the data file name (see Figure 1).

1 While this is a confirmatory example, both the traditional LC and LCT approaches are also appropriate in exploratory applications, where statistics may be used instead of theory to determine the number of classes in the initial split (see, e.g., van den Bergh et al., 2018).
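The BIC-based comparison underlying the fit statistics in Figure 1 can be sketched as follows. The log-likelihood values and the sample size below are placeholders chosen for illustration (they are not the tutorial's actual estimates); only the parameter counts follow from the model structure, since with five dichotomous indicators a K-class model has (K - 1) + 5K free parameters:

```python
import math

def bic(loglik: float, n_params: int, n_cases: int) -> float:
    """Bayesian Information Criterion: smaller is better."""
    return -2.0 * loglik + n_params * math.log(n_cases)

# Hypothetical log-likelihoods for 1- to 4-class models (illustration only);
# parameter count for K classes with 5 dichotomous indicators: (K-1) + 5K.
candidates = {k: (ll, (k - 1) + 5 * k)
              for k, ll in {1: -4600.0, 2: -4430.0,
                            3: -4370.0, 4: -4365.0}.items()}

n_cases = 1711  # assumed sample size, for illustration only
bics = {k: bic(ll, p, n_cases) for k, (ll, p) in candidates.items()}
best = min(bics, key=bics.get)  # the model with the lowest BIC is preferred
print(best)
```

With these toy inputs the 3-class model attains the lowest BIC, mirroring the pattern reported in Figure 1: the extra fit of the 4-class model is too small to offset its additional parameters.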


Figure 1. Fit statistics for standard latent class models (1-, 2-, 3-, and 4-class)

Results suggest that a three-class model fits best (the 3-class model has the lowest BIC = 8788.95).2 To organize the models for additional analyses:

➢ Click on the model names to enter Edit mode and rename the 4 estimated models to “1 class model,” “2 class model,” “3 class model,” and “4 class model” (see Figure 1).

➢ Click ‘+’ to expand the output listings for the 3-class model.
➢ Select Profile to view the Profile output for this model (see Figure 2).

Figure 2. Profile output for the standard 3-class model.

2 The results presented in Magidson and Vermunt (2001) also indicate that the 3-class solution fits best. However, in that analysis, the variable Gender was included as an active covariate. For simplicity, gender is not included in this tutorial. As a result, fit statistics, class sizes, and probabilities differ somewhat from the solution reported in that publication.


As seen in Figure 2, the class with the highest probability of reporting the symptoms is class 3, representing 11.32% of the sample. This class could represent the depressed group based on the traditional approach to LC modeling.

Step 2: Perform a Latent Class Tree Analysis

Since our analysis begins with a clear theory that persons are either depressed or not depressed (2 classes), from a confirmatory modeling perspective we begin our LCT analysis by estimating a 2-class model. However, we saw that this model did not provide an adequate fit to the data. Rather than replacing the 2-class model with a 3-class model, the LCT approach views these 2 classes as Parent classes and determines whether the lack of fit can be remedied by splitting one or both of these Parent classes into 2 sub-classes called Child classes. After attempting to split these parent classes, the internal Latent GOLD 6.0 decision loop further attempts to split the newly formed Child classes. The tree-splitting process stops when no further splitting is warranted. In default mode (which this tutorial uses), the BIC criterion is used3 to determine whether or not to split a class (i.e., whether the log-likelihood associated with a 2-class split is significantly better than that of a 1-class, no-split, model).

The current implementation of LCT in Latent GOLD uses the new keyword ‘Tree’ in the Syntax module to initiate the tree analysis process. The most straightforward way to estimate an LCT model is to start by using the menu-driven graphical user interface (GUI), as we did above, to generate an initial set of syntax commands. This can be done as follows:

➢ Right-click on “2 class model” (which you renamed earlier in this tutorial).
➢ Open the Model tab and check the box preceding ‘Tree’ as shown in Figure 3 below.

3 To replace the BIC with other criteria, such as maximum BVR, AIC, or minimum sample size, an additional syntax statement would be used.
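The divisive splitting loop described above can be sketched schematically as follows. This is a conceptual illustration, not Latent GOLD's actual internals: fit_lc is a hypothetical stand-in for estimating a 1- or 2-class LC model on a node's (posterior-weighted) cases, and the canned log-likelihoods below are invented so that only node "2" splits, mirroring the tree grown in this tutorial:

```python
from collections import deque

def grow_tree(root_nodes, fit_lc, threshold):
    """Breadth-first divisive splitting: a node is split into two child
    classes only when the log-likelihood gain exceeds the critical value."""
    tree = {}
    queue = deque(root_nodes)
    while queue:
        node = queue.popleft()
        ll_diff = fit_lc(node, n_classes=2) - fit_lc(node, n_classes=1)
        if ll_diff > threshold:
            children = (node + "1", node + "2")  # e.g. node "2" -> "21", "22"
            tree[node] = children
            queue.extend(children)
        else:
            tree[node] = None  # terminal (end) node
    return tree

# Stand-in estimator returning invented log-likelihoods per (node, classes).
lls = {("1", 1): -100.0, ("1", 2): -95.0,   # gain 5: no split
       ("2", 1): -200.0, ("2", 2): -170.0,  # gain 30: split
       ("21", 1): -80.0, ("21", 2): -75.0,  # gain 5: no split
       ("22", 1): -60.0, ("22", 2): -58.0}  # gain 2: no split
def fit_lc(node, n_classes):
    return lls[(node, n_classes)]

tree = grow_tree(["1", "2"], fit_lc, threshold=22.33)
print(tree)  # node "2" splits into "21" and "22"; the rest are end nodes
```

With the 22.33 threshold reported in this tutorial, only node "2" clears the bar, so the loop reproduces the 1 / 21 / 22 end-node structure shown in Step 3.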


Figure 3. Model tab in Latent GOLD 6.0

➢ Click the Estimate button.

A LC Tree is developed that attempts to perform binary splits of each of the two root-node classes, using the BIC criterion to determine whether the log-likelihood difference (LLdiff) for the 2-class model represents a significant improvement over the 1-class model (minLLdiff, the critical value for LLdiff based on the BIC, is 22.33). The splitting process proceeds until this improvement criterion is no longer met. Note: The critical value of 22.33 can be found in the Iteration Detail output, following the model estimation.
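Under a BIC criterion, the critical value minLLdiff equals half the BIC penalty for the extra parameters of the 2-class model: minLLdiff = Δnpar · ln(N) / 2. With five dichotomous indicators, a binary split adds Δnpar = 6 free parameters (one class proportion plus five conditional probabilities). A sketch of this calculation; the sample size below is an assumption chosen for illustration, not read from the data file:

```python
import math

delta_npar = 6   # extra parameters for a 2-class vs. 1-class model
n_cases = 1711   # assumed sample size (illustration only)

# Half the BIC penalty for the additional parameters.
min_lldiff = delta_npar * math.log(n_cases) / 2
print(round(min_lldiff, 2))  # rounds to 22.33 under these assumptions
```

This makes clear why the threshold grows with both the sample size and the number of indicators: larger datasets demand a larger log-likelihood gain before a split is accepted.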


Step 3: Examine the LCT Output

After the model is estimated, tabular and graphical output is available to view. For example,

➢ Click on Tree Summary

This output (Figure 4) shows that the LL difference statistic (LLdiff) is too small to split node 1, but at 26.6 for node 2 it is large enough to be statistically significant; hence node 2 splits into nodes 21 and 22.

Figure 4. Tree Summary Output

Two views of the tree are available. To view the standard tree output,

➢ Just below the Tree Summary, click ‘Tree Graph’.

To open a new Tree window that remains visible, from the View menu,

➢ select ‘Tree Graph’ to view the Tree in a separate window.


From the Tree in the separate Window,

➢ click on Node 2

Figure 4. Clicking on Node 2 in the Tree Window positions the output listings at the Node-2 output

Note that the standard tree output is no longer visible, as the list of output files is positioned at the Node-2 output (see Figure 5). From the Output listings,

➢ Beneath ‘Node – 2’ in the output listings, select ‘Profile’ to view the Profile output for the three ‘end node’ classes.


Figure 5. Profile output for the three end-node classes of the LCT model

Compare Figure 5 with Figure 2 to explore differences between the LCT model definition of the depressed class (end node ‘22’ in Figure 5) and the standard 3-class model (Figure 2). The results of the standard latent class model and the latent class tree model are compared in Table A and Figure A below. While the two final “3-class” solutions are somewhat similar, there are clear differences. For example, the sizes of the classes and the probabilities associated with endorsement of the symptoms differ.

Figure A. Comparison of the standard 3-class LC model (left) with the LCT model (right) for the depression data.


Table A. Comparing the Profile output for the Standard 3-class LC Model with Corresponding Probabilities Obtained for the Three Terminal Nodes from the Latent Class Tree

                            Standard LC Model              LC Tree Model
                        Healthy  Troubled  Depressed  Healthy  Troubled  Depressed
Size                     0.44     0.45      0.11       0.60     0.35      0.05
enthusiasm
  lack of enthusiasm     0.24     0.81      0.95       0.35     0.91      0.94
  enthusiasm             0.76     0.19      0.05       0.65     0.09      0.06
energy
  low energy             0.03     0.59      0.95       0.10     0.78      0.97
  energy                 0.97     0.41      0.05       0.90     0.22      0.03
sleep
  sleeping problem       0.09     0.36      0.76       0.13     0.48      0.83
  sleep OK               0.91     0.64      0.24       0.87     0.52      0.17
appetite
  poor appetite          0.04     0.21      0.70       0.05     0.33      0.82
  good appetite          0.96     0.79      0.30       0.95     0.67      0.18
hopeless
  feeling hopeless       0.03     0.09      0.65       0.03     0.18      0.90
  hopeful                0.97     0.91      0.35       0.97     0.82      0.10

Continuing the Exploration of Features in the LG 6.0 LCT Modeling Implementation

Normally we would not prune a significant split, but to illustrate the ‘prune’ feature,

➢ right-click in Node 2 of the Tree diagram to reveal the popup menu
➢ select ‘Hide’ (see Figure 6)


Figure 6. Pruning the Tree

Note that Node 21 and Node 22 now disappear from view, as they are collapsed to prune the tree back to the original 2-class model consisting of nodes 1 and 2, and the bottom-left edge of Node 2 appears with a diagonal to indicate that it has been pruned (Figure 7).

Figure 7. LC Tree diagram after pruning back the split of Node 2

(To undo the pruning, repeat the selection of ‘Hide’.)

Since the end-node output is synchronized with the tree, note that the Profile output has now automatically collapsed to show the original 2-class model consisting of nodes 1 and 2 (Figure 8).


Figure 8. End-node Profile output after pruning the tree

Node-Specific and Summary Output

Each node of the tree graph summarizes the BIC (or log-likelihoods or other statistics, if selected) for the 1- and 2-class models. This output is summarized in tabular form in the Tree Summary output.

The LCT output also contains the maximum bivariate residual (BVR) associated with each 2-class model. A large value of the maximum BVR suggests that local independence has not yet been achieved, in which case that node might be split. In this example, the maximum BVR for parent class 2 is much higher than for parent class 1 (24.53 vs. 3.50), and class 2 is split into child classes 21 and 22, each of which has a relatively small (acceptable) BVR value (8.83 and 0.57, respectively). The maximum BVR can be used as an alternative to the BIC as the splitting criterion.

The information displayed in each node of the Tree Graph can be customized within the task bar, located in the upper left corner of the graph window. To do this:

➢ Select the Edit option in the Tree Graph screen to open the Tree Node Display.
➢ Select the Node Items… option.
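The BVR mentioned above for a pair of indicators is, roughly, a Pearson chi-squared statistic comparing the observed two-way cross-tabulation with the one implied by the estimated model, divided by its degrees of freedom. The sketch below follows that common definition and uses invented cell counts; it is an illustration, not Latent GOLD's exact computation:

```python
def bvr(observed, expected):
    """Bivariate residual for a pair of categorical indicators.

    observed, expected: same-shaped nested lists of cell frequencies
    for the two-way table (expected comes from the fitted LC model).
    """
    rows, cols = len(observed), len(observed[0])
    chi_sq = sum((observed[r][c] - expected[r][c]) ** 2 / expected[r][c]
                 for r in range(rows) for c in range(cols))
    dof = (rows - 1) * (cols - 1)  # 1 for a pair of dichotomous items
    return chi_sq / dof

# Hypothetical 2x2 tables for one symptom pair (counts invented):
obs = [[300.0, 120.0], [110.0, 470.0]]
exp = [[280.0, 140.0], [130.0, 450.0]]
print(round(bvr(obs, exp), 2))  # ≈ 8.25
```

A value well above 1 for some pair signals residual association between those two indicators that the current number of classes does not capture, which is why a large maximum BVR points toward splitting the node.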


To save the output for each node, so that it can be restored in a future Latent GOLD run without re-estimating the model, we first need to write the classification information to an outfile. Then, from the File menu, select Save Tree. (If you attempt to save the tree prior to writing this outfile, you will get a warning message asking ‘save without file name?’ The outfile information is needed to obtain the correct weights for restoring the tree output.)

References

Magidson, J., & Vermunt, J. K. (2001). Latent class factor and cluster models, bi-plots, and related graphical displays. Sociological Methodology, 31, 223-264.

Pearlin, L. I., & Johnson, J. S. (1977). Marital status, life-strains, and depression. American Sociological Review, 42, 104-115.

Schaeffer, N. C. (1988). An application of item response theory to the measurement of depression. In C. Clogg (Ed.), Sociological Methodology 1988. Washington: American Sociological Society.

Schwarz, G. (1978). Estimating the dimension of a model. The Annals of Statistics, 6(2), 461-464.

van den Bergh, M., Kollenburg, G. H., & Vermunt, J. K. (under review). Deciding on the starting number of classes of a latent class tree.

van den Bergh, M., Schmittmann, V. D., & Vermunt, J. K. (2017). Building latent class trees, with an application to a study of social capital. Methodology: European Journal of Research Methods for the Behavioral and Social Sciences, 13, 13-22.