
The BULIT-R: Its Reliability and Clinical Validity as a Screening Tool for DSM-III-R Bulimia Nervosa in a Female Tertiary Education Population

Garry Welch, Laurie Thompson, and Anne Hall

(Accepted 19 November 1992)

The Bulimia Test (BULIT) has been updated to accommodate the DSM-III-R criteria for bulimia nervosa. Therefore, in this study, we evaluated the psychometric properties of the BULIT-R using a sample of young women at a tertiary educational institute. The results showed all 28 BULIT-R items correlated highly with the total test score (average = .59) and the internal reliability was high (.92). In terms of its concurrent validity, the BULIT-R correlated highly (.90) with the Bulimia Investigatory Test Edinburgh (BITE), a screening measure argued to detect bulimia nervosa. In terms of criterion-related validity, the optimal cutoff for the BULIT-R as a screening measure was 98 with this sample, using a semistructured DIS-III R interview administered by experienced clinicians who specialize in eating disorders. At this cutoff, the sensitivity was 100%, the specificity 99.0%, the negative predictive value 100%, and the positive predictive value 71.3%. © 1993 by John Wiley & Sons, Inc.

There have been numerous descriptive studies and estimates of the prevalence of the bulimia syndrome since it was first introduced and operationally defined by the American Psychiatric Association (APA) in the Diagnostic and Statistical Manual (DSM-III) in 1980. Attention has recently turned to estimating the prevalence of the new bulimia nervosa syndrome, since APA modified the criteria in 1987 (in DSM-III-R) to more accurately reflect clinically significant morbidity. It has been estimated that over 50 related prevalence studies have been conducted to date (Fairburn & Beglin, 1990).

Generally speaking, these prevalence studies have focused on young female groups in tertiary educational settings, for three reasons.

Garry Welch, Ph.D., is Research Fellow in the Mental Health Unit, Joslin Diabetes Center, Boston, MA. Laurie Thompson, MB.BSc., FRANZCP, is Staff Psychiatrist at Porirua Hospital, Wellington, New Zealand. Anne Hall, MB.BSc., FRANZCP, is Associate Professor at the Department of Psychological Medicine, Wellington School of Medicine, Wellington, New Zealand. Address correspondence to the first author at the Mental Health Unit, Joslin Diabetes Center, One Joslin Place, Boston, MA 02215.

International Journal of Eating Disorders, Vol. 14, No. 1, 95-105 (1993) © 1993 by John Wiley & Sons, Inc. CCC 0276-3478/93/010095-11


Firstly, female academic groups are relatively accessible to prospective researchers and large numbers of such respondents can be surveyed in one place. Secondly, the subjects are usually receptive to the task of completing psychological test batteries, as a result of their academic training. Thirdly, it is believed that the young, middle, and upper class women who characterize such educational groups are at a higher risk of developing eating disorders and warrant closer research attention than more representative community samples (e.g., Gandour, 1984).

As part of the research interest to date in estimating the prevalence of bulimia and bulimia nervosa, many self-report measures have been developed as criterion measures. Self-report measures have been employed for their economy and speed where large numbers of respondents must be tested and because they can provide anonymity for respondents who may not respond to a face-to-face interview on such a personal and sensitive issue. Also, there is the advantage that the responses to the questionnaires can be quantified and therefore are amenable to descriptive and inferential statistical analysis. However, it has been argued that self-report questionnaires remain inferior to clinical interviews in other respects, because some aspects of the specific psychopathology of bulimia nervosa are better detected by a trained clinician in a clinical setting, particularly the central overconcern with weight and shape (Fairburn & Beglin, 1990). Despite this concern, it is clear that in large-scale epidemiological studies it is impractical to administer a comprehensive eating disorders interview to all individuals in a given population.

A practical solution to this problem is to combine a suitable sampling strategy with the use of a self-report screening measure to first identify potential cases and subsequently to employ a fuller clinical interview to confirm eating disorder caseness within this smaller, more manageable subgroup. However, it is critical to the success of this two-stage process that the reliability and discriminant validity of the chosen screening measure are known, if the measure is to provide accurate information to guide the selection of those wanted later for interview. Specifically, because considerable importance is attached to individual screening test scores obtained in the first phase, the internal reliability of the test should be high and preferably in the vicinity of 0.90 (Nunnally & Durham, 1975) to ensure that the standard error of measurement is kept relatively small. The reliability of the measure will be related to a number of factors, including the range or domain of content tapped by the measure, its test length, and the standardization of its instructions, format, and scoring. In terms of its validity as a screening instrument, the optimal cutoff score for the test should have been established beforehand on a comparable population to ensure that no true cases are missed (i.e., no false negatives at the chosen cutoff) and that the number of false positives to be given a full clinical interview is minimized to conserve human resources.
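The link between internal reliability and the standard error of measurement can be made explicit with the standard classical test theory formula. This is a textbook relation added for illustration, not a calculation reported by the authors; the symbols s_x (standard deviation of total scores) and r_xx (reliability coefficient) are notation introduced here.

```latex
\[
\mathrm{SEM} = s_x \sqrt{1 - r_{xx}}
\qquad\Longrightarrow\qquad
r_{xx} = .90 \;\text{ gives }\; \mathrm{SEM} = s_x \sqrt{0.10} \approx 0.32\, s_x
\]
```

At a reliability of .90, an individual's observed score therefore carries a measurement error of roughly one third of a standard deviation, which is why values in this vicinity are preferred when decisions rest on single scores near a cutoff.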

Numerous self-report measures are available for the task of screening for bulimia or bulimia nervosa. For example, the Bulimia Test (BULIT; Smith & Thelen, 1984), the Bulimic Investigatory Test Edinburgh (BITE; Henderson & Freeman, 1987), the Conroy-Healy Eating Questionnaire (CHEQ; Healy, Conroy, & Walsh, 1985), the Eating Attitudes Test (EAT; Garner & Garfinkel, 1979), the Binge Eating Scale (Halmi, Falk, & Schwartz, 1981), and the measures of Pope, Hudson, Yurgelun-Todd, and Hudson (1984), Pyle, Mitchell, Eckert, Halvorson, Newman, and Goff (1983), and Kurtzman, Yager, Landsverk, Wiesmeier, and Bodurka (1989) have been used. In terms of the psychometric studies of the measures produced to date, there has been an increase in the reporting of their reliability and a shift in emphasis in the validity studies from simple face and content validity (e.g., CHEQ; Healy et al., 1985), to case-control discriminant validity (e.g., BITE; Henderson & Freeman, 1987) through to the more demanding validity studies that have evaluated the discriminant validity of a given measure in a representative community sample with a full range of morbidity and explored a range of potential cutoff scores for the measure. To date, some information on the latter type of validity is available for the BITE (Henderson & Freeman, 1988) and the BULIT (Smith & Thelen, 1984), although the reliability and validity of the "gold standard" of diagnosis used in these studies was not addressed.

The original authors of the BULIT have recently updated their measure to accommodate the new criteria for bulimia nervosa (BULIT-R; Thelen, Farmer, Wonderlich, & Smith, 1991). To establish the potential value of the BULIT-R as a screening measure, we evaluated its internal reliability, its concurrent validity using two conceptually related measures, and its discriminant validity against a comprehensive clinical interview, using a sample of young women attending a tertiary educational institution.

METHOD

Subjects

The sample was drawn from the Wellington Polytechnic School of Nursing and Health Education, a tertiary educational institution serving Wellington, New Zealand. At the time of the study, the school comprised 319 females who were enrolled in 20 classes covering years 1, 2, and 3. Eleven males were enrolled at the school, but were not included in the analyses. One class (n = 15) was not available to the study, reducing the total available female population to 304. Two hundred forty-three (80%) of these 304 students volunteered to take part in the study. The mean age of this sample was 21.3 (SD = 4.4) years, the mean body mass index was 22.1 (SD = 2.6), and the mean New Zealand socioeconomic status index (Johnston, 1983) was 2.3 (SD = 1.0) on a scale from 1 (highest status) to 6 (lowest status).

The BULIT-R

This 28-item measure is a revised version of the 32-item BULIT that was designed to detect DSM-III bulimia. The BULIT has been shown to have high (i.e., above .90) internal reliability and 1-week test-retest reliability and there is some evidence to support its predictive value as a measure of DSM-III bulimia (Smith & Thelen, 1984; Thelen, Mann, Pruitt, & Smith, 1987; Welch & Hall, 1989; Welch, Hall, & Renner, 1990). There are 36 BULIT-R items in all and 28 used to make up the total score. Sixteen items have been retained from the original BULIT.

Two conceptually related measures were employed to investigate the concurrent validity of the BULIT-R. The first of these was the Symptom subscale of the BITE, a measure of bulimic behaviors that has been described by its authors as a potential diagnostic tool for both DSM-III bulimia and DSM-III-R bulimia nervosa, based on its item content and the interim findings from a validation study involving three university samples (Henderson & Freeman, 1988). The BITE was reported to have a high internal consistency. The second measure used was a 21-item subscale derived from the Eating Disorder Inventory (EDI; Garner, Olmstead, & Polivy, 1983). The EDI was first developed as an 8-subscale patient measure designed to assess the cognitive and behavioral characteristics of anorexia nervosa and bulimia. A recent study showed that the EDI has three subscales in the nonpatient setting. One of these subscales comprised 21 items from the original Drive for Thinness, Body Dissatisfaction, and Bulimia symptoms of the EDI and was found to have internal reliability above .90 (Welch, Hall, & Walkey, 1988). Although the original EDI subscales were intended to enable the exploration of distinct patient subgroups within anorexia nervosa and bulimia and not specifically designed as screening tools, the new 21-item subscale would appear to have promise as a screening measure for bulimia nervosa based on its item content and so was used in this study.

DSM-III-R Diagnostic Interview Schedule for Bulimia Nervosa

The diagnostic interview schedule used built on earlier research by one of us (Hay & Hall, 1991). It incorporated the bulimia section of the Diagnostic Interview Schedule-version III Revised (DIS-III R; Helzer & Robins, 1988) and the bulimia section of the Diagnostic Interview Schedule-version III A that had been modified for a New Zealand community prevalence study (DIS-III A; Bushnell, Wells, Hornblow, Oakley-Brown, & Joyce, 1990). The resultant interview schedule used in our study incorporated all relevant DIS-III R and DIS-III A items unchanged and DIS-III rules. The DIS-III sections employed were modified in four ways: (i) adding extra items; (ii) increasing the number of probe items; (iii) providing for semistructured use of the schedule; and (iv) using the schedule in conjunction with a checklist of diagnostic criteria (see Appendix A for details). The extra items were added to cover perceived weaknesses of the DIS-R and provide sufficient data to make diagnostic decisions at the no disorder/eating disorder Not Otherwise Specified (NOS) boundary. Although only bulimia nervosa diagnoses are reported here, a full range of past and present eating disorder diagnoses were made.

Interviews for students without an eating disorder took 15-20 minutes to complete and those with an eating disorder, 30-40 minutes. The reliability of the interview process was improved by two steps. Firstly, the interviewers carried out beforehand a number of role plays to consider potentially difficult cases (e.g., vomiting for weight control, obese dieting, binging of long duration, patients with confusion about loss of control, easily offended interviewees, and those without an eating disorder). Secondly, dual ratings were made during screened interviews of eating disorder outpatients, healthy medical students, and nursing staff. The diagnostic interviews were completed by two clinicians with considerable experience with eating disorders, one a psychiatrist with over 20 years' experience and the other a senior psychiatric registrar with a particular interest in eating disorders.

Procedure

The students first completed the study booklet containing the BULIT-R, the BITE, and the EDI in 19 class groups over a 1-week period. They were told some subjects would be asked to participate in a 20-minute interview conducted by one of two experienced clinicians and that these clinicians would be blind to the questionnaire results. Also, it was stressed that those interviewed would represent a wide range of scoring on the study measures under evaluation to help evaluate the performance of the three tests over a range of dieting behaviors. The students were further informed of the ethical procedures in place to provide confidentiality of information given. Specifically, questionnaires and background information were coded with student identification numbers up to the interview stage. Following the interviews, the data were coded by new computer numbers and the original student identification numbers and names were destroyed. The students were informed that participation in the study was voluntary and unrelated to their course work, and that the study had been approved by the Research Ethical Committee of the Wellington Area Health Board. Arrangements were made with the student health and local psychiatric services to ensure treatment for eating disorders was available for those requesting it and this was discussed at interview where appropriate.

Following the completion of the study booklets, the BULIT-R, BITE, and the EDI subscale responses were entered on an IBM 4381 disk and scored. Students were classified as a high BULIT-R scorer if they scored at or above the prescribed "caseness" threshold for bulimia nervosa of 104, a medium scorer if they scored at or above 60 but below 104, or as a low scorer if they scored below 60. All high and medium scorers were selected for interview, along with an equal number of randomly selected low scorers. Using these thresholds, 6 high scorers were selected, 53 medium scorers, and 59 randomly selected low scorers. The completion of the three questionnaires took place over 1 week and the interviews were conducted over the following 2 weeks as the classes were made available to the study. In addition to completing the BULIT-R, the BITE, and the EDI, subjects completed a section covering demographic details such as age, sex, socioeconomic status, weight history, weight perception, and meal patterns.
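The three-band classification rule described above can be sketched in a few lines. The thresholds of 104 and 60 come from the text; the function and constant names are hypothetical, chosen here for illustration, and the example scores are ones reported elsewhere in the paper.

```python
# Thresholds from the text: 104 is the published BULIT-R caseness cutoff,
# 60 is the lower bound of the medium band. Names are hypothetical.
HIGH_CUTOFF = 104
MEDIUM_CUTOFF = 60


def classify_bulit_r(score: int) -> str:
    """Return the scoring band used to select students for interview."""
    if score >= HIGH_CUTOFF:
        return "high"
    if score >= MEDIUM_CUTOFF:
        return "medium"
    return "low"


# Scores mentioned in the paper: observed minimum, the one case in the
# medium band, the one high-band noncase, and the observed maximum.
for s in (29, 98, 110, 126):
    print(s, classify_bulit_r(s))
```

Both boundaries are inclusive on the upper band ("at or above"), so a score of exactly 104 is high and a score of exactly 60 is medium.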

Statistical Analyses

Pearson's correlations were used for the item-to-total analyses. For internal reliability calculations, Cronbach's alpha was employed. For the concurrent validity analyses, Pearson's correlations among the three measures were calculated. To assess the discriminant validity of the tests, the sensitivity, positive predictive value, specificity, and negative predictive value were calculated for the designated cutoff of 104 for the BULIT-R (Thelen et al., 1991) using a diagnostic interview for bulimia nervosa and the subjects selected for interview. The discriminant validity indices of the BULIT-R were determined by its performance against the 109 diagnostic interviews. To calculate the validity indices, a 2 × 2 table was constructed cross-classifying those scoring at or above versus below the cutoff score of 104 against their interview status as true cases or true noncases. The data for the 109 interviewed were weighted to reflect the proportions of high, medium, and low scorers in the original sample of 243, with the assumption that those not interviewed were similar in the proportion of cases and noncases to those interviewed. For example, 46 of the 53 medium scorers were interviewed and it was found at interview that 1 was a case of bulimia nervosa and 45 were not. Therefore, these numbers were multiplied through by 53/46 = 1.15 to produce an estimate for the total 53. Similarly, the high scorers were multiplied by 6/5 = 1.2 and the low scorers by 184/58 = 3.17. Values for a range of alternate cutoffs were then explored to identify the best cutoff for a screening measure (i.e., the lowest cutoff at which sensitivity = 100% and positive predictive value was maximized).
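The weighting step and the four validity indices can be written out as follows. The notation is introduced here for illustration (N_s for the number of students in scoring band s, n_s for the number interviewed in that band); the weights themselves are the ones given in the text, and the index definitions are the standard ones for a 2 × 2 table.

```latex
\[
w_s = \frac{N_s}{n_s}, \qquad
w_{\text{high}} = \tfrac{6}{5} = 1.2, \quad
w_{\text{medium}} = \tfrac{53}{46} \approx 1.15, \quad
w_{\text{low}} = \tfrac{184}{58} \approx 3.17
\]
\[
\text{sensitivity} = \frac{TP}{TP + FN}, \quad
\text{specificity} = \frac{TN}{TN + FP}, \quad
\text{PPV} = \frac{TP}{TP + FP}, \quad
\text{NPV} = \frac{TN}{TN + FN}
\]
```

Here TP, FP, FN, and TN are the weighted cell counts, each obtained by multiplying the interviewed counts in a band by that band's weight and summing over bands. As a check on the weights, 5(1.2) + 46(53/46) + 58(184/58) = 6 + 53 + 184 = 243, the size of the original sample.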

RESULTS

From the 304 enrolled, 243 (80%) took part in the study. BULIT-R scores ranged from 29-126 with a mean of 51.5 and a median of 45. Six had BULIT-R scores of 104 or above (high scorers) and were approached for interview. Five of these six responded positively and one, with a score of 126, had left the school. Four of the five were found to be bulimia nervosa cases (BULIT-R for noncase = 110). Fifty-three students had BULIT-R scores that were below 104 and at or above 60 (medium scorers). They were approached for interview and 46 (87%) responded positively. One had left school, two were sick, two refused, and two did not respond to a follow-up letter. The mean BULIT-R score for the seven nonresponders was 79.3 with a range from 68-90. One of the 46 subjects interviewed was found to be a bulimia nervosa case (BULIT-R = 98). Finally, 184 subjects scored below 60 on the BULIT-R (low scorers). A random subset of this group was selected to match the number of high scorers and medium scorers chosen to be interviewed. Therefore, there were 59 low scorers selected for interview. Fifty-eight (98.3%) responded to the interview. None were found to be bulimia nervosa cases.

Therefore, the overall number of confirmed cases of bulimia nervosa was 5/243 (i.e., bulimia nervosa prevalence = 2.1%, 95% confidence interval [CI] .7-4.7%). All were considered clinical cases and offered treatment. As noted above, the one high scorer who was not interviewed had a BULIT-R score of 126, the highest found in the study (the subject also had a BITE score of 30), raising the possibility that there were in fact six cases of bulimia nervosa (i.e., prevalence = 2.5%, 95% CI .9-5.3%). For other eating disorder diagnoses, there were six (2.5%, 95% CI .9-5.3%) DSM-III bulimia and five (2.1%, 95% CI .7-4.7%) eating disorder NOS. The total number of eating disorders confirmed was therefore 16/243 (6.6%, 95% CI 3.8-10.5%).
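The paper does not state how the confidence intervals were obtained. As a sketch, the exact binomial (Clopper-Pearson) interval is assumed below, since it gives values close to those reported; the function names are hypothetical and the bounds are found by simple bisection using only the standard library.

```python
from math import comb


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _solve_decreasing(f, target: float) -> float:
    """Bisection for f(p) = target on [0, 1], where f decreases in p."""
    lo, hi = 0.0, 1.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if f(mid) > target:
            lo = mid  # f is still too large, so the root lies at larger p
        else:
            hi = mid
    return (lo + hi) / 2


def exact_ci(x: int, n: int, alpha: float = 0.05) -> tuple[float, float]:
    """Clopper-Pearson interval for x events in n trials (assumed method)."""
    # Lower bound: P(X >= x | p) = alpha/2, i.e. cdf(x - 1) = 1 - alpha/2.
    lower = 0.0 if x == 0 else _solve_decreasing(
        lambda p: binom_cdf(x - 1, n, p), 1 - alpha / 2)
    # Upper bound: P(X <= x | p) = alpha/2.
    upper = 1.0 if x == n else _solve_decreasing(
        lambda p: binom_cdf(x, n, p), alpha / 2)
    return lower, upper


# Case counts reported in the paper, out of 243 respondents.
for cases in (5, 6, 16):
    lo, hi = exact_ci(cases, 243)
    print(f"{cases}/243 = {100 * cases / 243:.1f}% "
          f"(95% CI {100 * lo:.1f}-{100 * hi:.1f}%)")
```

The intervals are asymmetric around the point estimate, as expected for a proportion this close to zero, which is consistent with the shape of the reported intervals.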

Reliability

The reliability data for the BULIT-R were based on the responses of the 243 subjects who completed all three questionnaires. An analysis of the item-to-total correlations of the BULIT-R showed that all 28 items with the exception of q26 (relating to vomiting behavior, r = .26) had a correlation of .30 or better with the total test. The average item-to-total correlation was .59 with a range from .26-.72. The internal consistency of the total test was .92.
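For readers who want to run the same kind of reliability analysis on their own data, a minimal sketch of Cronbach's alpha and item-to-total correlations follows. The item-level BULIT-R responses are not available in the paper, so the data below are made up purely for illustration. The text does not say whether the authors removed each item from the total before correlating; the uncorrected version (item against full total) is assumed here.

```python
from statistics import mean, variance


def cronbach_alpha(rows):
    """rows: one list of k item scores per respondent."""
    k = len(rows[0])
    item_vars = [variance([r[j] for r in rows]) for j in range(k)]
    total_var = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def item_total_correlations(rows):
    """Correlation of each item with the (uncorrected) total score."""
    totals = [sum(r) for r in rows]
    k = len(rows[0])
    return [pearson([r[j] for r in rows], totals) for j in range(k)]


# Made-up responses (4 respondents x 3 items), not data from the study.
toy = [[1, 2, 1], [2, 2, 3], [4, 3, 4], [5, 5, 4]]
print(round(cronbach_alpha(toy), 3))
print([round(r, 2) for r in item_total_correlations(toy)])
```

With real data, items falling below a chosen item-to-total criterion (the authors use .30) would be flagged for review, as q26 is in the paper.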

Concurrent Validity

The concurrent validity of the BULIT-R was determined from the correlations between the BULIT-R and two theoretically related measures. Its correlation with the BITE was .90 and with the EDI subscale .75. The correlation between the BITE and the EDI subscale was .73. All three correlations were highly significant (i.e., at the p < .0001 level). For the five identified cases of DSM-III-R bulimia nervosa, BITE symptom scores ranged from 22-28 with a mean of 25.

Criterion-Related Validity

At the cutoff of 104 suggested by the original authors for the measure, the sensitivity of the BULIT-R was 80%, the specificity 99.5%, the positive predictive value 80%, and the negative predictive value 99.5%. For a screening measure, there should ideally be no false negatives and the proportion of false positives among those scoring above the cutoff should be minimized (i.e., the positive predictive value should be maximized) to reduce the number of interviews to be subsequently carried out in the second phase. For these purposes, the best cutoff was a score of 98 on the BULIT-R. At this level, the sensitivity (1 - false negative rate) was 100%, the positive predictive value 71.3%, the specificity 99.0%, and the negative predictive value 100%.
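The indices at the cutoff of 104 can be approximately reconstructed from the interview counts given in the Results and the stratum weights described under Statistical Analyses. The sketch below is an illustration under those stated counts, not the authors' own program; the names are hypothetical. It covers only the cutoff of 104, because the cutoff of 98 depends on individual scores within the medium band that are not listed in the text.

```python
# Interview outcomes per scoring band, as reported in the paper:
# (students in band, number interviewed, cases among those interviewed)
STRATA = {
    "high": (6, 5, 4),      # BULIT-R >= 104
    "medium": (53, 46, 1),  # 60 <= BULIT-R < 104
    "low": (184, 58, 0),    # BULIT-R < 60
}


def weighted_indices(positive_bands):
    """Validity indices when the listed bands count as screen-positive."""
    tp = fp = fn = tn = 0.0
    for band, (n_band, n_seen, cases) in STRATA.items():
        w = n_band / n_seen  # stratum weight, e.g. 53/46 for medium
        est_cases = cases * w
        est_noncases = (n_seen - cases) * w
        if band in positive_bands:
            tp += est_cases
            fp += est_noncases
        else:
            fn += est_cases
            tn += est_noncases
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


# Cutoff of 104: only the high band is screen-positive.
for name, value in weighted_indices({"high"}).items():
    print(f"{name}: {100 * value:.1f}%")
```

Under these assumptions the specificity, positive predictive value, and negative predictive value round to the reported 99.5%, 80%, and 99.5%, and the sensitivity comes out close to the reported 80%; small differences may reflect rounding conventions that the text does not describe.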

Normative Data

Mean BULIT-R score (and standard deviation) for the interviewed noncases was 53.8 (17.2), for the eating disorder NOS group 68.2 (10.4), the bulimia group 89.0 (14.8), and the bulimia nervosa group 113.8 (9.7).

DISCUSSION

This evaluation of the psychometric properties of the BULIT-R in a selected female population showed the BULIT-R to be a well constructed screening tool that covers all DSM-III-R criteria and has a test layout that had good subject acceptability, producing few unanswered items.

The response rates achieved in this study at both the questionnaire completion (80%) and interview participation (92%) stages were good considering the voluntary nature of the exercise and the academic demands on the women at the time of the survey. In an earlier survey of the same population, students completed a bulimia nervosa questionnaire anonymously and had no prior knowledge of the subject matter (Welch & Hall, 1989). Under these conditions, 90% of students enrolled at the school completed the study measure, with one refusal. Thelen et al. (1991) reported a response rate of 73.9% to the request for a clinical interview by female students scoring at or above the cutoff of 104 in the original BULIT-R validation study. By way of comparison to these studies, the average response rate to questionnaire completion in community surveys of eating disorders has been estimated as 74.4% (with a range from 34-100%) for the 37 studies reviewed by Fairburn and Beglin (1990).

In terms of its reliability, the 28 items making up the total test correlated highly with the total score, with an average correlation of .59 and only one item scoring below our criterion of .30. This close interrelatedness of the BULIT-R item pool contributed to a high internal reliability of .92 for the total test, which met the standard suggested by Nunnally and Durham (1975) for a screening test. A similar high value was found by Thelen et al. (1991) who reported a coefficient alpha of .97.

In terms of its concurrent validity as examined here, the BULIT-R was found to correlate highly (i.e., .90) with the BITE, a screening measure of binge eating syndromes argued by Freeman and Henderson (1988) to detect DSM-III bulimia and "probably" DSM-III-R. The less strong but still substantial correlation (i.e., .75) found with a selected subscale of the EDI was expected, given that the latter subscale is clearly broader in eating disorder item content than the BITE, measuring a combination of Drive for Thinness, Body Dissatisfaction, and Bulimia symptoms. These correlations were highly significant (i.e., at the p < .0001 level). The suggested cutoff score for the BITE to identify DSM-III bulimia caseness and probable DSM-III-R bulimia nervosa caseness (Freeman & Henderson, 1988) is a symptom score of 20. All five cases of bulimia nervosa identified in this study had a BITE symptom score above 20 with a range from 22-28 and a mean of 25.

The construct validity of the BULIT-R was further evaluated against a comprehensive eating disorders interview administered by two psychiatrists who specialize in eating disorders who carefully applied all the diagnostic criteria for bulimia nervosa to each interviewee. It was intended that by combining the relevant items from the DIS-R, items from a recently modified version of the DIS-III A, and the use of a semistructured interview format, we would maximize diagnostic reliability and clinical validity of our diagnoses by combining clinical experience in the diagnostic process and the application of standardized measures of eating disorders. It should be noted at this point that Thelen et al. (1991) reported a kappa coefficient of .68 for the interrater reliability among postgraduate students and research assistants who conducted the structured clinical interviews in the original validation study for the BULIT-R. In this study, formal assessment of interrater reliability was not carried out, although the interviewers were experienced psychiatrists who specialize in the assessment and management of eating disorders. Also, role playing and dual assessments were carried out prior to the interview phase to improve this aspect of the measurement process.

The performance of the BULIT-R in discriminating caseness at the cutoff of 104 suggested by the original authors for the measure was good. Specifically, the sensitivity was found to be 80% and the specificity 99.5%. Turning to the base-rate dependent indices (i.e., positive and negative predictive value), the positive predictive value was 80% and the negative predictive value 99.5%. On this point, Shrout and Fleiss (1981) have pointed out that negative predictive value will always be close to 100% when the underlying base rate is below 10% and positive predictive value will typically drop sharply below this level. Given the low base rate of prevalence for bulimia nervosa in the community sample used in this study (i.e., 2.1%, 95% CI .7-4.7%), the ability of the BULIT-R to discriminate true positives from false positives is strong. With the exception of the positive predictive value (which is similar) these indices are higher than those found by Thelen et al. (1991) although the direct comparability of the two sets of validity indices is difficult as a result of differences in design and implementation of the two studies. For example, Thelen et al. used a much larger study sample although the response rate to their study was not reported and the response rate to the request for an interview was lower than for this study. Also, their interviewed subset contained an overrepresentation of subjects just below the cutoff which would have provided a more stringent test of the BULIT-R and lowered the validity coefficients obtained.
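The base-rate dependence of the predictive values can be illustrated with Bayes' theorem. The sketch below holds sensitivity and specificity at the values reported for the cutoff of 104 and varies prevalence; the prevalence grid and the function name are illustrative choices, not taken from the paper.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a dichotomous test via Bayes' theorem."""
    tp = sensitivity * prevalence
    fp = (1 - specificity) * (1 - prevalence)
    fn = (1 - sensitivity) * prevalence
    tn = specificity * (1 - prevalence)
    return tp / (tp + fp), tn / (tn + fn)


# Sensitivity and specificity as reported at the cutoff of 104.
# The prevalence values are hypothetical, chosen to span low base rates.
for prev in (0.005, 0.021, 0.05, 0.10, 0.25):
    ppv, npv = predictive_values(0.80, 0.995, prev)
    print(f"prevalence {prev:6.1%}  PPV {ppv:6.1%}  NPV {npv:6.1%}")
```

Running the loop shows the pattern the authors describe: negative predictive value stays near 100% across low base rates, while positive predictive value falls as prevalence falls, even with sensitivity and specificity unchanged.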

There was one noncase and one case identified at the interview stage that were misclassified by the BULIT-R at the cutoff of 104. The noncase who scored 110 on the BULIT-R (i.e., a false positive) was a 24-year-old woman who had a severe past history of anorexia nervosa and bulimia nervosa from age 17-19 years. At the time of interview she was binge eating only once a week, so she did not meet the full criteria for bulimia nervosa, although she did report extreme use of exercise to control weight and had an extreme concern with body shape and weight which contributed to her high score on the BULIT-R. The misclassified case (i.e., a false negative) scored 98 on the BULIT-R. The subject was a 25-year-old woman whose misclassification appeared to be partly due to her lack of use of laxatives or vomiting for weight control and partly due to a semantic issue related to the wording of four specific items in the BULIT-R. Specifically, she considered the words "dieting" and "diets" to apply to women who episodically practiced dietary restraint and otherwise attempted to control their weight and shape. She considered her own behavior as an ongoing characteristic of her life and not episodic, and therefore not "dieting" as such. This interpretation of these items resulted in a lowering of her BULIT-R score and her misclassification on the BULIT-R.

An exploration of the optimal cutoff for the BULIT-R as a tool for use in a two-stage screening exercise identified a cutoff of 98 as the best test score. At this cutoff, the sensitivity was 100%, specificity 99.0%, negative predictive value 100%, and positive predictive value 71.3%. Therefore, the use of a BULIT-R cutoff of 98 in a similar setting to this study would be expected to produce no false negatives, approximately 71 true cases, and 29 false positives among 100 high scorers.
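As a rough planning aid for a two-stage design of this kind, the sketch below estimates how many second-stage interviews a given screening sample would generate at the cutoff of 98. The function name `two_stage_yield` and the sample size of 1,000 are hypothetical; the base rate, sensitivity, and specificity are the rounded figures reported in this study. Because of that rounding, the implied positive predictive value (about 68%) differs slightly from the reported 71.3%.

```python
def two_stage_yield(n_screened, prevalence, sensitivity, specificity):
    """Expected counts when every respondent at or above the cutoff is interviewed."""
    cases = n_screened * prevalence
    noncases = n_screened - cases

    true_pos = sensitivity * cases               # cases correctly flagged
    false_neg = cases - true_pos                 # cases missed by the questionnaire
    false_pos = (1.0 - specificity) * noncases   # noncases flagged for interview
    interviews = true_pos + false_pos            # second-stage workload

    return {
        "interviews": interviews,
        "true_positives": true_pos,
        "false_positives": false_pos,
        "false_negatives": false_neg,
        "ppv": true_pos / interviews if interviews else float("nan"),
    }


# Hypothetical sample of 1,000 respondents; other values as reported for cutoff 98
result = two_stage_yield(n_screened=1000, prevalence=0.021,
                         sensitivity=1.00, specificity=0.990)
for name, value in result.items():
    print(f"{name}: {value:.2f}")
```

Under these assumptions, only a small fraction of respondents would need a clinical interview, and no cases would be expected to be missed at the first stage.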

It should be cautioned at this point that the analyses in this study were based on a relatively small number of bulimia nervosa cases that reflected in part the more narrowly defined syndrome of bulimia nervosa that is now used in DSM-III-R (as compared to the earlier DSM-III bulimia or other more broadly defined existing bulimic syndromes), as well as the relative size of the educational institution used in our study.

Finally, in addition to the replication of the findings presented here for a female tertiary educational group, future research on the performance of the BULIT-R as a screening measure of bulimia nervosa should estimate its reliability and validity in other specific community settings that have been of interest to date, such as family planning clinics (Cooper, Charnock, & Taylor, 1987) and general practice clinics (Ben-Tovim, 1988), as well as representative community populations (Bushnell et al., 1990). This information would provide valuable evidence to guide our two-stage questionnaire and interview strategies needed to identify putative cases of bulimia nervosa in the community. This process would provide more accurate prevalence estimation and, importantly, would facilitate closer study of the psychological and physical morbidity among these community cases compared to patients seen in eating disorder clinics and the stability of these findings over time.

APPENDIX A

Modifications Made to the DIS-III-R in This Study

Extra Items

Two items that were added concerned the perceived presence of an eating disorder and the percentage of time spent thinking about weight and shape. These were used to introduce students to the main theme of the interview and to establish rapport. Five extra questions were included from the DIS-III-A and nine questions from the modification of the DIS-III-A by Bushnell et al. (1990) as part of the Christchurch Psychiatric Epidemiological Survey. The latter has since been used by one of us in a study of psychiatric inpatients (Hay & Hall, 1991). These questions concerned weight history, binge characteristics such as frequency, duration, perceived embarrassment and abnormality of eating patterns, body image concerns such as terror of being overweight, feelings about weight loss and weight gain and dissatisfaction with body proportions, and finally, treatment options. Where appropriate, these questions were scored according to DIS-R rules. Other items added that were not from the DIS-R, the DIS-III-A, or its modification by Bushnell et al. concerned the most weight ever lost by dieting, the perception of body weight during weight loss, oral contraceptive use during a period of weight loss, and the exact food eaten during a typical binge.

Probe Questions

The DIS-R decision tree was altered to increase the number of probe questions and reduce the number of items skipped if symptoms appeared to be absent. This was to ensure that even entirely normal students were asked a minimum of questions related to eating disorders. This is in contrast to the DIS-R, where in the bulimia nervosa section, for example, no further questions are asked if the respondent denies binge eating. For the bulimia nervosa diagnosis, all respondents were asked about the presence of binging, dieting, exercise, vomiting, laxative and diuretic use, and also asked about body weight and shape concerns. If there was any doubt about the presence of a clinically significant binge eating episode, a more detailed definition was given that included perceived embarrassment about binge eating, binge frequency and duration, perceived abnormality of eating, and the extent of functional impairment from the eating disorder.

Semistructured Use of the Schedule

Interviewers did not restrict themselves solely to asking the probe questions, but asked additional questions if necessary to decide on the presence of symptoms or a diagnostic criterion. Responses to interview items were always recorded on the basis of the clinicians’ judgment and not solely on the respondents’ literal interpretation of probe questions. Where discrepancies between immediate and considered responses occurred, these were recorded manually along with responses to unstructured questions. In addition, on completion of the schedule, free questions could be asked to clarify a given diagnosis.

Use of the Interview Schedule in Conjunction with a Checklist of Diagnostic Criteria

The following present and past diagnoses were determined: DSM-III-R bulimia nervosa, bulimia, eating disorder NOS bulimia type, eating disorder NOS vomiting for weight control, anorexia nervosa, and eating disorder NOS anorexia nervosa type.

REFERENCES

American Psychiatric Association. (1980). Diagnostic and statistical manual of mental disorders (3rd ed.). Washington, DC: Author.

American Psychiatric Association. (1987). Diagnostic and statistical manual of mental disorders (3rd ed., revised). Washington, DC: Author.

Ben-Tovim, D. I. (1988). DSM-III draft, DSM-III-R and the diagnosis and prevalence of bulimia in Australia. American Journal of Psychiatry, 145, 1000-1002.

Bushnell, J. A., Wells, J. E., Hornblow, A. R., Oakley-Brown, M. A., & Joyce, P. (1990). The prevalence of three bulimic syndromes in the general population. Psychological Medicine, 20, 671-680.

Cooper, P. J., Charnock, D., & Taylor, M. J. (1987). The prevalence of bulimia nervosa: A replication study. British Journal of Psychiatry, 151, 684-686.

Fairburn, C., & Beglin, S. (1990). Studies of the epidemiology of bulimia nervosa. American Journal of Psychiatry, 147, 401-408.

Freeman, C. P., & Henderson, M. (1988). The BITE: Indices of agreement. British Journal of Psychiatry, 152, 575-576.

Gandour, M. J. (1984). Bulimia: Clinical description, assessment, aetiology and treatment. International Journal of Eating Disorders, 3, 3-36.

Garner, D. M., & Garfinkel, P. E. (1979). The Eating Attitudes Test: An index of the symptoms of anorexia nervosa. Psychological Medicine, 9, 273-279.

Garner, D. M., Olmsted, M. P., & Polivy, J. (1983). Development and validation of a multidimensional Eating Disorder Inventory for anorexia nervosa and bulimia. International Journal of Eating Disorders, 2, 15-34.

Halmi, K. A., Falk, J. R., & Schwartz, E. (1981). Binge eating and vomiting: A survey of a college population. Psychological Medicine, 11, 697-706.

Hay, P., & Hall, A. (1991). The prevalence of eating disorders in recently admitted psychiatric inpatients. British Journal of Psychiatry, 159, 652-655.

Healy, K., Conroy, R. M., & Walsh, N. (1985). The prevalence of binge eating and bulimia in 1063 college students. Journal of Psychiatric Research, 19, 161-166.

Helzer, J. E., & Robins, L. N. (1988). The Diagnostic Interview Schedule: Its development, evaluation and use. Social Psychiatry and Psychiatric Epidemiology, 23, 6-16.

Henderson, M., & Freeman, C. P. L. (1987). A self-rating scale for bulimia: The “BITE”. British Journal of Psychiatry, 150, 18-24.

Henderson, M., & Freeman, C. P. (1988). The BITE: Indices of agreement. British Journal of Psychiatry, 152, 575-576.

Johnston, R. (1983). A revision of socioeconomic indices. Wellington, New Zealand: New Zealand Council of Educational Research.

Kurtzman, F. D., Yager, J., Landsverk, J., Wiesmeier, D., & Bodurka, D. (1989). Eating disorders among selected female student populations at UCLA. Journal of the American Dietetic Association, 89, 45-53.

Nunnally, J. C., & Durham, R. L. (1975). Validity, reliability, and special problems of measurement in evaluation research. In E. L. Streuning & M. Guttentag (Eds.), Handbook of evaluation research (Vol. 1) (pp. 289-352). London: Sage Publications.

Pope, H. J., Hudson, J. I., & Yurgelum-Todd, D. (1984). Anorexia nervosa and bulimia among 300 suburban women shoppers. American Journal of Psychiatry, 141, 292-294.

Pyle, R. I., Mitchell, J. E., Eckert, E., Halvorson, P. A., Newman, P. A., & Goff, G. M. (1983). The incidence of bulimia in freshman college students. International Journal of Eating Disorders, 2, 75-85.

Shrout, P. E., & Fleiss, J. L. (1981). Reliability and case detection. In J. K. Wing, P. Bebbington, & L. N. Robins (Eds.), What is a case? The problem of definition in psychiatric surveys (pp. 117-128). London: Grant McIntyre Ltd.

Smith, M. C., & Thelen, M. H. (1984). Development and validation of a test for bulimia. Journal of Consulting and Clinical Psychology, 52, 863-872.

Thelen, M. H., Farmer, J., Wonderlich, S., & Smith, M. (1991). A revision of the Bulimia Test: The BULIT-R. Psychological Assessment, 3, 119-124.

Thelen, M. H., Mann, L., Pruitt, J., & Smith, M. (1987). Bulimia: Prevalence and component factors in college women. Journal of Psychosomatic Research, 38, 73-78.

Welch, G., Hall, A., & Renner, R. (1990). Patient subgrouping in anorexia nervosa and bulimia using psychologically based classification. International Journal of Eating Disorders, 9, 311-322.

Welch, G. W., & Hall, A. (1989). The reliability and discriminant validity of three potential measures of bulimic behaviours. Journal of Psychiatric Research, 23, 125-133.

Welch, G. W., Hall, A., & Walkey, F. H. (1988). The factor structure of the Eating Disorder Inventory. Journal of Clinical Psychology, 44, 51-65.