
ARTHRITIS & RHEUMATISM
Vol. 44, No. 5, May 2001, pp 1105–1113
© 2001, American College of Rheumatology
Published by Wiley-Liss, Inc.

Problems in the Development and Validation of Questionnaire-Based Screening Instruments for Ascertaining Cases With Symptomatic Knee Osteoarthritis

The Framingham Study

Michael LaValley, Timothy E. McAlindon, Stephen Evans, Christine E. Chaisson, and David T. Felson

Objective. To determine if screening for symptomatic knee osteoarthritis (OA) for clinical trials and epidemiologic studies could be satisfactorily done without performing knee radiographs and to develop efficient screening instruments for symptomatic knee OA based on self-reported symptoms and functional limitations.

Methods. We administered a mailed questionnaire containing many different questions on knee symptoms and functional limitations to 1,921 participants of the Framingham Study who had previously been screened for symptomatic OA with a history and knee radiographs. Recursive partitioning methods (using the Classification and Regression Trees [CART] program) were used to create a set of screening instruments for symptomatic knee OA, which was defined as knee symptoms on most days and radiographic evidence of OA. Three screening instruments were developed to maximize sensitivity, specificity, and efficiency, respectively.

Results. The sensitive instrument had 84% sensitivity and 73% specificity. The specific instrument had 46% sensitivity and 94% specificity. The efficient instrument had 56% sensitivity and 85% specificity. Sensitivity was lower and specificity was higher when these instruments were used to screen for radiographic OA. All instruments had higher sensitivity but lower specificity when used for older subjects (age >60) with greater disease prevalence. However, using any of these instruments as a single-step screening mechanism resulted in considerable misclassification.

Conclusion. We conclude that none of these instruments has adequate diagnostic test performance to serve as a single-step evaluation of the presence or absence of symptomatic knee OA.

Osteoarthritis (OA) of the knee is a major cause of pain, disability, and cost in the general population, particularly among the elderly (1–3). Currently, there are few effective preventive strategies or medical therapies for this disorder (4,5). Further research is therefore needed to study risk factors for knee OA and to develop and test medical treatments.

However, issues surrounding case ascertainment hamper OA studies (6). Strategies employing radiography as the only step in case detection are costly and expose many individuals without OA to radiation. If a set of questions could adequately identify those with OA, the cost of disease assessment would drop dramatically and many large-scale surveys and epidemiologic studies could begin to provide valid and highly useful data on OA. Further, large-scale inexpensive trials of treatments aimed at symptoms could be performed, facilitating the development of greatly needed new therapies. Some recent studies have characterized those who self-report arthritis as having either arthritis or OA (7), but the validity of this approach has not been critically evaluated. Last, federal and state health agencies are in need of a simple survey-based approach that would identify persons with OA of the knee.

Supported by NIH Arthritis Center grants AR-20613 and AG-09300 and by NIH/National Heart, Lung, and Blood Institute contract N01-HC-38038 to the Framingham Heart Study.

Michael LaValley, PhD, Timothy E. McAlindon, MD, MPH, Stephen Evans, MPH, Christine E. Chaisson, MPH, David T. Felson, MD, MPH: Boston University, Boston, Massachusetts.

Address correspondence and reprint requests to Michael LaValley, PhD, Boston University Arthritis Center, 715 Albany Street, A203, Boston, MA 02118.

Submitted for publication July 25, 2000; accepted in revised form December 19, 2000.

Some studies have used a staged screening approach composed of pain queries or self-reported diagnosis, followed by radiography or a physical examination (8–12), but this still involves the expense of performing radiographs or physical examinations on many individuals. While some investigators have explored the potential of various symptom questions to predict the presence of knee OA, these endeavors have focused on individual questions (11,13,14) rather than on developing screening instruments. Generally, individual questions have been found to have only weak predictive properties, and it remains unknown whether multi-item questionnaires would be sufficiently predictive to be used without radiologic or clinical confirmation of disease status.

Also, the trade-off between sensitivity and specificity of case ascertainment might operate differently in various settings. An instrument of high sensitivity (capturing a high percentage of cases) may be of great utility as a first step when recruiting participants with knee OA for a clinical trial, while high specificity (excluding a high percentage of noncases) may be critical in population surveys of knee OA (15).

Criteria for the classification of knee OA have been established by the American College of Rheumatology (16). However, the intent of that study was to standardize the diagnosis of idiopathic knee OA, and it employed clinical and laboratory findings (e.g., synovial fluid and erythrocyte sedimentation rate) to classify subjects. Our goal was to allow screening of large numbers of potential study subjects without recourse to clinical findings.

The primary objective of the present study was to determine whether we could develop a knee OA screening instrument of sufficient accuracy to take the place of radiography in epidemiologic studies. Our secondary objective was to develop a set of screening instruments with varying sensitivity and specificity for the following uses: as a sensitive first-step screening instrument to avoid missing true cases, as a specific instrument to exclude noncases, and as an efficient instrument to provide equal weighting of sensitivity and specificity. To accomplish these objectives, we mailed questionnaires about knee symptoms and function to subjects from the Framingham Heart Study and the Framingham Heart Study Offspring cohorts who had previously been examined for symptomatic knee OA (17,18). Questionnaire responses were used to develop and evaluate screening instruments for symptomatic knee OA.

PATIENTS AND METHODS

Study subjects. Participants in this study were drawn from 2 Framingham Heart Study cohorts that had undergone assessment for knee OA during 1992–1995. Members of the original Framingham Heart Study cohort had weight-bearing anteroposterior (AP) and lateral knee radiographs taken and answered a standardized question about knee pain during biennial examination 22 (1992–1993). The methodology for these assessments has been previously described (17,19). The children of the original Framingham cohort and their spouses were recruited in the early 1970s to form the Framingham Offspring cohort (20). A subset of the Framingham Offspring cohort had AP and lateral knee radiographs taken, and answered the same standardized question about knee pain, as part of examination 5 (1993–1995) (18).

To be eligible for this investigation, members of the cohorts had to have participated in the knee OA investigations described above. Individuals with total knee replacements, those taking second-line drugs for rheumatoid arthritis, and those with hand radiographs showing evidence of rheumatoid arthritis were excluded from the study.

Radiographic evaluation. AP radiographs were scored on the Kellgren/Lawrence scale (range 0–4) (21). Because the scale is not applicable to the patellofemoral joint, we used a previously validated definition that characterized OA as being present on the lateral view if there were either osteophytes (grade ≥2; range 0–3) or a combination of joint space narrowing (grade ≥2; range 0–3) and at least a small osteophyte (grade ≥1) or a cyst or sclerosis on the lateral knee radiographs (19). We classified an individual as having radiographic knee OA if the definition for either the tibiofemoral or patellofemoral compartments in either knee was met.

Definition of symptomatic knee OA. Participants in the study were considered to have knee OA symptoms if they answered in the affirmative to the question “On most days, do you have pain, aching, or stiffness in either of your knees?” at the Framingham examination at which their knee radiograph was obtained. We classified participants as having symptomatic knee OA if they had symptoms and had radiographic OA in the same knee.

Questionnaire assembly and mailing. We assembled a questionnaire with the goal of identifying subjects with symptomatic knee OA as defined above. Our survey comprised 12 questions, including 4 with multilevel or multi-item response options, which were intended to capture elements of knee OA symptoms (Table 1). These included questions used in previous epidemiologic studies (3,22,23) and diagnostic questionnaires (24,25), along with additional questions that we thought could have utility in detecting knee OA. Question 12, which asked about knee pain on more than 2 occasions in the previous year, provided no guidance as to what length of time constituted an occasion.

Prior to mailing, the questions were evaluated for face and content validity by showing them to rheumatologists and by testing them informally among patients with knee OA in the Rheumatology Clinic at Boston University School of Medicine. The mailing took place between March and July 1996; second and third reminders were mailed at 1-month intervals.

Table 1. Rate of response and association of individual question responses with symptomatic knee osteoarthritis in the development sample

| Candidate question | Frequency of positive findings (%) | Sensitivity (%) | Specificity (%) | Positive predictive value (%) | Odds ratio |
|---|---|---|---|---|---|
| 1. If you have knee symptoms, which words best describe your usual knee symptoms: | | | | | |
| a. Pain | 16.8 | 41.9 | 86.2 | 22.2 | 4.6 |
| b. Stiffness | 25.8 | 55.8 | 77.6 | 19.3 | 4.5 |
| c. Aching | 19.4 | 51.2 | 83.8 | 23.4 | 5.5 |
| d. Tender | 8.6 | 26.7 | 93.3 | 27.7 | 5.2 |
| e. Swollen | 8.1 | 20.9 | 93.6 | 23.8 | 3.9 |
| f. Numb | 2.0 | 7.0 | 98.5 | 31.6 | 5.0 |
| 2. Has a doctor ever told you that you have arthritis in your knees? | 15.0 | 57.0 | 77.2 | 33.8 | 13.2 |
| 3. During last month, did you ever have pain or discomfort in or around the knees? | 32.6 | 73.3 | 68.4 | 20.0 | 9.6 |
| 4. Have you ever had pain lasting at least a month in or around the knee, including the back of the knee? | 18.4 | 58.1 | 80.5 | 28.1 | 9.0 |
| 5. Press the palm of your hand over the front of your knee and slowly bend and straighten your knee. Do you feel a continuous sensation of roughness or grating from the knee as you do this? (Do not include single clicks or snaps.) | 13.8 | 31.4 | 81.4 | 20.3 | 3.9 |
| 6. On most days do you have pain, aching or stiffness in either of your knees? | 25.0 | 73.3 | 75.9 | 26.0 | 11.5 |
| If yes, is the pain, aching or stiffness: | | | | | |
| a. Mild | 10.8 | 16.3 | 85.8 | 13.5 | 4.0 |
| b. Moderate | 9.9 | 36.1 | 88.5 | 32.3 | 12.3 |
| c. Severe and extreme* | 2.2 | 14.0 | 94.8 | 57.1 | 37.6 |
| 7. During the last month, did you have any knee pain or discomfort when doing the following activities? | | | | | |
| a. Going up steps | 22.7 | 59.3 | 73.4 | 23.2 | 9.6 |
| b. Going down stairs | 19.0 | 55.8 | 76.0 | 26.1 | 9.4 |
| c. Rising from a chair without using arms | 18.5 | 58.1 | 76.9 | 27.9 | 11.6 |
| d. Walking 2–3 blocks (one-fourth of a mile) | 14.6 | 55.8 | 79.8 | 34.0 | 15.9 |
| e. Walking from room to room | 9.1 | 34.9 | 83.1 | 34.1 | 9.7 |
| f. Bending your knee or squatting down | 31.0 | 70.3 | 66.8 | 20.3 | 10.2 |
| g. Getting in/out of car | 18.3 | 51.2 | 76.7 | 24.9 | 7.8 |
| h. Putting on socks or stockings | 11.9 | 40.7 | 80.8 | 30.4 | 8.6 |
| 8. Do you currently experience any sensation of roughness or grating, “like sandpaper,” in your knees, while walking or during knee bending? | 11.1 | 37.2 | 87.0 | 29.9 | 7.3 |
| 9. The following questions concern the amount of stiffness (not pain) in your knee(s) that you have experienced in the last month. Stiffness is a sensation of restriction or slowness in the ease with which you move your joints: | | | | | |
| i. How severe is your stiffness after awakening in the morning? | | | | | |
| a. Mild | 22.5 | 30.2 | 68.0 | 11.9 | 5.3 |
| b. Moderate | 11.5 | 33.7 | 80.8 | 26.1 | 14.4 |
| c. Severe and extreme* | 2.1 | 12.8 | 88.8 | 55.0 | 47.5 |
| ii. How severe is your stiffness after sitting, lying or resting later in the day? | | | | | |
| a. Mild | 22.2 | 24.4 | 66.6 | 9.8 | 3.2 |
| b. Moderate | 11.3 | 33.7 | 79.6 | 26.6 | 10.7 |
| c. Severe and extreme* | 2.5 | 10.5 | 87.0 | 37.5 | 19.9 |
| iii. How long does this stiffness take to wear off? | | | | | |
| a. Less than 10 minutes | 30.2 | 44.2 | 71.4 | 13.0 | 4.4 |
| b. At least 10 minutes, and up to 30 minutes | 6.9 | 24.4 | 95.1 | 31.3 | 13.6 |
| c. At least 30 minutes, and up to 60 minutes | 1.0 | 2.3 | 99.1 | 20.0 | 4.8 |
| d. At least 60 minutes | 1.5 | 7.0 | 99.1 | 42.9 | 21.4 |
| 10. Do you take medication most days for knee pain, aching, or stiffness? | 11.5 | 37.2 | 87.5 | 28.8 | 7.5 |
| 11. On most days for the last 6 months, have you had pain in or around the knee, including the back of the knee? | 16.7 | 54.7 | 82.4 | 29.2 | 9.1 |
| 12. Have you had knee pain on more than two occasions in the last year? | 32.5 | 73.3 | 66.3 | 20.1 | 10.6 |

* Because of low response rates, these adjacent categories were combined.

Statistical analysis. In general, a screening instrument or classification rule does not perform as well on new data as on the data that were used in its development (26,27). Thus, to accomplish both the development and validation of screening instruments, 2 data sets were required. The first step of the analysis was to pool the cohort and offspring participants and then randomly assign each into 1 of 2 subsets of equal size, called the development sample and the validation sample. Preliminary testing of questions and development of screening instruments were performed in the development sample. The performances of the final forms of the instruments and sensitivity analyses were evaluated in the validation sample.
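The random assignment into development and validation samples can be sketched as follows. This is only an illustration under stated assumptions: the function name, the use of integer identifiers, and the fixed seed are hypothetical, and the text does not describe the randomization procedure the authors actually used.

```python
import random


def split_development_validation(participant_ids, seed=0):
    """Randomly assign pooled participants to two subsets of (near) equal size.

    The seed is a hypothetical choice made only so the sketch is repeatable.
    """
    ids = list(participant_ids)
    rng = random.Random(seed)
    rng.shuffle(ids)
    half = len(ids) // 2
    # First half is used for development, the remainder for validation.
    return ids[:half], ids[half:]


# 1,921 participants were available for analysis; IDs here are placeholders.
development, validation = split_development_validation(range(1921))
print(len(development), len(validation))
```

With an odd number of participants the two subsets differ in size by one, so "equal size" holds only approximately in this sketch.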

Development of screening instruments. The second step of the analysis was to develop screening instruments that classify participants in the development sample as being likely to have symptomatic knee OA based on their responses to questionnaire items. Association between questionnaire items and OA was measured using sensitivity (percentage of OA cases who screen positive), specificity (percentage of non-OA subjects who screen negative), positive predictive value (percentage of subjects who screen positive who are OA cases), and the odds ratio. To reduce chance effects in questions with several levels of response, we combined levels that had fewer than 5 positive responses with an adjacent level.
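The association measures defined above all follow from a 2 × 2 table of screening result against true OA status. A minimal sketch, with a hypothetical function name and made-up counts that are not taken from the study:

```python
def screening_metrics(tp, fp, fn, tn):
    """Compute association measures from 2 x 2 counts.

    tp: OA cases who screen positive     fn: OA cases who screen negative
    fp: non-OA subjects screening positive  tn: non-OA subjects screening negative
    """
    return {
        "sensitivity": tp / (tp + fn),          # share of OA cases who screen positive
        "specificity": tn / (tn + fp),          # share of non-OA subjects who screen negative
        "ppv": tp / (tp + fp),                  # share of screen-positives who are OA cases
        "accuracy": (tp + tn) / (tp + fp + fn + tn),  # share of all results that are correct
        "odds_ratio": (tp * tn) / (fp * fn),    # cross-product ratio
    }


# Hypothetical counts for illustration only.
metrics = screening_metrics(tp=40, fp=90, fn=60, tn=810)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

The odds ratio is undefined when either off-diagonal cell is zero, which is one reason sparse response levels are worth pooling before computing it.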

We used recursive partitioning (28), as implemented in the Classification and Regression Trees [CART] program (29), to find the most informative ways to classify subjects (30), by their response to questionnaire items, into successively more homogeneous groups with regard to symptomatic knee OA status. We chose to develop a set of 3 screening instruments that could be employed in various situations. By weighting false-negative (subjects with symptomatic OA screened as not having the disease) and false-positive (subjects without symptomatic OA screened as having the disease) misclassifications differently, we generated a screening instrument with enhanced sensitivity and an instrument with enhanced specificity. Weighting false-negative and false-positive misclassifications equally generated an efficient instrument that balanced sensitivity and specificity.
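The effect of weighting the two kinds of misclassification differently can be illustrated with a deliberately simplified sketch. It is not the CART program: it picks only a single yes/no question (one split, no recursion or pruning), and the records, question names, and weights below are all hypothetical.

```python
def weighted_cost(records, question, fn_weight, fp_weight):
    """Weighted misclassification cost of screening on one yes/no question."""
    false_negatives = sum(1 for r in records if r["case"] and not r[question])
    false_positives = sum(1 for r in records if not r["case"] and r[question])
    return fn_weight * false_negatives + fp_weight * false_positives


def best_question(records, questions, fn_weight, fp_weight):
    """Return the question with the lowest weighted misclassification cost."""
    return min(questions, key=lambda q: weighted_cost(records, q, fn_weight, fp_weight))


def make(case, broad, mid, narrow):
    return {"case": case, "broad": broad, "mid": mid, "narrow": narrow}


# Hypothetical data: 4 cases and 12 noncases answering three made-up questions.
records = (
    [make(True, True, True, True)]
    + [make(True, True, True, False)] * 2
    + [make(True, True, False, False)]
    + [make(False, True, True, False)]
    + [make(False, True, False, False)] * 3
    + [make(False, False, False, False)] * 8
)
questions = ["broad", "mid", "narrow"]

sensitive_choice = best_question(records, questions, fn_weight=5, fp_weight=1)
specific_choice = best_question(records, questions, fn_weight=1, fp_weight=5)
efficient_choice = best_question(records, questions, fn_weight=1, fp_weight=1)
print(sensitive_choice, specific_choice, efficient_choice)
```

Penalizing false negatives more heavily favors the question that misses the fewest cases, while penalizing false positives favors the question that flags the fewest noncases; equal weights land in between.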

Validation of screening instruments. Screening instruments were tested in the validation sample, using sensitivity, specificity, positive predictive value, and accuracy of association with symptomatic knee OA. Accuracy was defined as the percentage of screening results, both positive and negative, that were correct (31). We also considered the effect of the amount of misclassification we observed on the ability to detect risk factors for symptomatic knee OA if the instruments were used as the sole means of ascertaining symptomatic knee OA in a case–control study (32). For this, we considered a hypothetical study with an exposure of 30% prevalence (such as persons in the upper 30% of the weight distribution), symptomatic OA as the outcome, and an odds ratio of 2.0. Of interest was how far below 2.0 the sample odds ratio would be biased if the screening instruments alone were used to determine outcome status. If odds ratios were biased to be <1.5, we thought that an instrument was not adequate.
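The attenuation of an odds ratio by outcome misclassification can be sketched as below. The sketch assumes nondifferential misclassification, takes the 30% exposure prevalence to apply to true noncases, and uses hypothetical function and parameter names; the calculation in reference 32 may differ in detail, so the output need not reproduce the values reported in this paper.

```python
def biased_odds_ratio(sensitivity, specificity, disease_prevalence,
                      true_odds_ratio=2.0, exposure_prevalence_noncases=0.30):
    """Odds ratio observed when screening status replaces true disease status."""
    def odds(p):
        return p / (1.0 - p)

    # Exposure prevalence among true cases implied by the true odds ratio.
    case_odds = true_odds_ratio * odds(exposure_prevalence_noncases)
    exposure_cases = case_odds / (1.0 + case_odds)

    # Population fractions in each cell of true status by screening result.
    tp = disease_prevalence * sensitivity
    fn = disease_prevalence * (1.0 - sensitivity)
    fp = (1.0 - disease_prevalence) * (1.0 - specificity)
    tn = (1.0 - disease_prevalence) * specificity

    # Screen-positives and screen-negatives are mixtures of true cases and noncases.
    exposure_screen_pos = (tp * exposure_cases + fp * exposure_prevalence_noncases) / (tp + fp)
    exposure_screen_neg = (fn * exposure_cases + tn * exposure_prevalence_noncases) / (fn + tn)
    return odds(exposure_screen_pos) / odds(exposure_screen_neg)


perfect = biased_odds_ratio(1.0, 1.0, 0.118)
imperfect = biased_odds_ratio(0.842, 0.728, 0.118)
print(round(perfect, 2), round(imperfect, 2))
```

A perfect screen returns the true odds ratio, and any loss of sensitivity or specificity pulls the observed value toward 1.0.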

To follow up on these results, we considered the prevalence of knee pain and radiographic knee OA in subjects without symptomatic knee OA. Subjects who do not have symptomatic knee OA but who screen positive may have subclinical disease, which we operationalized as either radiographic OA without symptoms or symptoms alone. To examine this possibility, we took all subjects in the validation sample without symptomatic knee OA and, for each instrument, categorized them as a false positive (screened positive) or a true negative (screened negative). We then examined the prevalence and odds ratios of symptoms and radiographic OA in the false-positive and true-negative subjects.

Older age is known to increase the prevalence of knee OA, and disease prevalence plays an important role in the performance of screening tools. So, to determine the effect of age on the instruments’ performances, we split the validation sample into 2 groups: subjects who were ≤60 years old, and subjects who were >60 years old. The instruments were then evaluated in each group separately.
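The dependence of positive predictive value on prevalence follows from Bayes' rule. The sketch below holds sensitivity and specificity fixed at the overall values reported for the sensitive instrument (84.2% and 72.8%) and varies only the prevalence, using the two age-group prevalences from Table 3. Holding test characteristics fixed is a simplifying assumption, since the paper reports that they also differ by age group.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Share of screen-positive subjects who truly have the disease."""
    true_positive_rate = sensitivity * prevalence
    false_positive_rate = (1.0 - specificity) * (1.0 - prevalence)
    return true_positive_rate / (true_positive_rate + false_positive_rate)


# Same instrument characteristics, two different prevalences.
ppv_younger = positive_predictive_value(0.842, 0.728, prevalence=0.049)
ppv_older = positive_predictive_value(0.842, 0.728, prevalence=0.196)
print(f"PPV at 4.9% prevalence:  {ppv_younger:.3f}")
print(f"PPV at 19.6% prevalence: {ppv_older:.3f}")
```

Even with identical sensitivity and specificity, the predictive value roughly triples between the two prevalences, which is why the age split matters for interpreting a positive screen.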

Sensitivity analyses. As a sensitivity analysis, we used the same instruments to screen for radiographic knee OA. Because of the possibility that some subjects reported knee pain on most days at the Framingham examination but had a resolution or diminution of the frequency of symptoms by the time of the survey, we also performed a sensitivity analysis using knee pain from the survey and the radiograph from the Framingham examination.

RESULTS

Demographic data. We received responses from 1,944 of 2,318 persons to whom the questionnaire was mailed (83.9%). Twenty-three respondents had incomplete data for knee pain or radiographic scores from their Framingham examination. This left 1,921 participants for analysis. Participants included 1,077 women (56.1%), had mean ± SD age of 61.2 ± 12.5 years, and mean ± SD body mass index of 27.3 ± 4.86. Four hundred eleven participants were from the original Framingham Heart Study (21.4%), 496 reported pain on most days in the past month (25.8%), 376 had radiographic knee OA (19.6%), and 200 had symptomatic knee OA (10.4%).

Nonparticipants were similar to participants with regard to age, body mass index, and sex. However, the proportion from the original Framingham Heart Study was smaller for participants than nonparticipants (21.4% versus 31.2%; P = 0.001), and there was lower prevalence of radiographic (19.6% versus 31.9%; P = 0.001) and symptomatic (10.4% versus 13.9%; P = 0.106) OA in participants.

Randomization of participants into development and validation samples generated groups that were comparable in terms of age (61.4 years versus 61.1 years; P = 0.5), body mass index (27.1 versus 27.4; P = 0.2), sex (54.1% female versus 58.0% female; P = 0.09), proportion from the original Framingham Heart Study (21.0% versus 21.8%; P = 0.7), and radiographic knee OA (19.1% versus 20.2%; P = 0.5). However, the validation sample had a slightly higher prevalence of symptomatic knee OA (9.0% versus 11.8%; P = 0.04).


Univariate associations of questions with symptomatic knee OA. Associations of questionnaire items with symptomatic knee OA in the development sample are presented in Table 1. The percentage of positive responses to each item and the sensitivity, specificity, positive predictive value, and odds ratio of association with symptomatic knee OA are shown. Except for questions 9iia (P = 0.006) and 9iiic (P = 0.080), all of the odds ratios are different from 1 at the 0.001 significance level.

Fewer than 5 subjects reported extreme values of pain on most days (question 6, Table 1), early morning stiffness (question 9i), and stiffness after sitting, lying, or resting later in the day (question 9ii). For these variables, the extreme and severe responses were pooled. Simple yes/no versions of the questions about pain on most days (question 6) and ever having pain lasting at least a month (question 4) were found to perform better than using the response levels reflecting pain severity. In addition, the yes/no version of the item for physician diagnosis of arthritis (question 2) had a stronger association with symptomatic knee OA than a version (question not shown) with more levels incorporating followup questions to determine the type of arthritis. Only the simple yes/no versions of these items were included in further analyses.

Figure 1. Sensitive (A), specific (B), and efficient (C) screening instruments for osteoarthritis (OA) of the knee. SxKOA = symptomatic knee OA; PPV = positive predictive value of the instrument in the validation sample. Numbers in parentheses indicate the question number shown in Table 1. Numbers on the left of the vertical bars indicate the number of subjects without symptomatic knee OA; numbers on the right indicate the number of subjects with symptomatic knee OA. All excluded subjects had missing answers on relevant questionnaire items.

Development and properties of screening instruments. We derived 3 screening instruments with a range of sensitivity and specificity for symptomatic knee OA: the sensitive instrument is presented in Figure 1A, the specific instrument in Figure 1B, and the efficient instrument in Figure 1C. All instruments grew from a primary questionnaire item, “During the last month, did you have any knee pain or discomfort when walking 2–3 blocks (one-fourth of a mile)” (question 7d, Table 1), which is denoted as “Walking Pain?” in Figure 1. The efficient instrument consisted of this question alone. The sensitive instrument used questionnaire items 9iii and 12 in Table 1, which are denoted as “Duration of Stiffness?” and “Pain Twice?” in Figure 1A, to reduce the number of subjects incorrectly classified as not having symptomatic knee OA. The specific instrument used question 2 in Table 1, which is denoted as “Arthritis Diagnosis” in Figure 1B, to reduce the number of subjects without symptomatic OA who might be classified as having the disease.

Figures 1A–C show the numbers of subjects in the validation sample with and without symptomatic knee OA at each stage of the screening process. Sensitivity, specificity, positive predictive value, and accuracy are shown at the bottom of the figures, along with 95% confidence intervals. Sensitivity ranged from 46.2% to 84.2%, specificity from 72.8% to 94.1%, positive predictive value from 30.5% to 52.1%, and accuracy from 74.2% to 88.2%. If we were to use these instruments as the sole assessment of symptomatic OA, the sensitive and efficient instruments would bias the odds ratio from 2.0 to 1.19 or 1.20, respectively, while the specific instrument would bias the odds ratio to 1.34. Thus, because of misclassification due to the screening instruments, risk factors for symptomatic OA would be harder to detect.

Knee symptoms and radiographic OA in false-positive and true-negative OA cases. Knee symptom prevalence in subjects who were false positive and in those who were true negative is shown in Table 2. The prevalence and odds ratios for radiographic knee OA according to false positive/true negative status are also shown in Table 2. We found that subjects without symptomatic knee OA who screened positive on any of the instruments were more likely to have one of the components of symptomatic knee OA (knee pain on most days or radiographic OA) than those who screened negative. Low values for the positive predictive value and specificity are partly due to the detection of subclinical disease in subjects without full symptomatic knee OA.

Effect of age on screening instruments. Table 3 shows the sensitivity, specificity, positive predictive value, and accuracy for screening instruments by age group. The group that was >60 years old had a 4 times higher prevalence of disease, and in this group, all instruments had positive predictive values that were 2 or 3 times greater than those in the group that was ≤60 years old. Sensitivity also was higher in the older group than in the younger group. However, specificity and accuracy were slightly lower in the older group.

Odds ratios showing the bias due to misclassification are also included in Table 3. While odds ratios for the older group were less biased than those for the younger group, they were much lower than the true value of 2.0. The largest odds ratio, 1.4 for the specific instrument, is below our cutoff of 1.5 for acceptable performance.

Table 2. Association of true-negative or false-positive status with knee pain on most days and with radiographic knee OA*

| Outcome | Instrument | Prevalence in false-positive subjects | Prevalence in true-negative subjects | Odds ratio of association (95% CI) |
|---|---|---|---|---|
| Knee pain | Sensitive | 38.3 | 11.2 | 4.93 (3.31–7.34) |
| Knee pain | Specific | 60.0 | 17.5 | 7.07 (2.83–17.64) |
| Knee pain | Efficient | 41.6 | 14.5 | 4.19 (2.72–6.46) |
| Radiographic OA | Sensitive | 21.9 | 11.8 | 2.10 (1.36–3.23) |
| Radiographic OA | Specific | 25.5 | 14.4 | 1.97 (0.70–5.55) |
| Radiographic OA | Efficient | 23.9 | 13.2 | 2.06 (1.27–3.37) |

* OA = osteoarthritis; 95% CI = 95% confidence interval.

Findings of sensitivity analysis: screening for radiographic OA. In general, the instruments had lower sensitivity and accuracy as screening instruments for radiographic OA, but had slightly enhanced specificity and improved positive predictive values due to the high prevalence of radiographic OA. Using the sensitive instrument to screen for radiographic OA gave a sensitivity of 62.4%, specificity of 75.0%, positive predictive value of 45.7%, and accuracy of 71.9%. Using the specific instrument to screen for radiographic OA, we found a sensitivity of 30.0%, specificity of 95.5%, positive predictive value of 69.2%, and accuracy of 79.0%. The efficient instrument had a sensitivity of 39.9%, specificity of 86.7%, positive predictive value of 50.3%, and accuracy of 74.9%.

Findings of sensitivity analysis: use of symptoms at time of questionnaire. Of 496 subjects reporting knee pain on most days at the Framingham examination, 215 did not report pain on most days on the screening questionnaire. An additional 194 who did not report pain at the examination reported it on the questionnaire. Using a positive response to this item on the questionnaire together with radiographic OA at the time of the Framingham examination as the criteria for symptomatic OA resulted in improvements of the sensitivity and positive predictive value of 5–10% and improvements of 2–3% in the specificity and accuracy. The sensitive instrument had a sensitivity of 95.1%, specificity of 74.8%, positive predictive value of 35.0%, and accuracy of 74.8%. The specific instrument had a sensitivity of 56.0%, specificity of 95.4%, positive predictive value of 61.5%, and accuracy of 90.8%. The efficient instrument had a sensitivity of 69.3%, specificity of 86.9%, positive predictive value of 41.4%, and accuracy of 84.9%.

Odds ratios showing the effect of misclassification had reduced bias, with values of 1.27, 1.49, and 1.31 for the sensitive, specific, and efficient instruments, respectively. The specific instrument falls just short of our cutoff for acceptable bias in the odds ratio.

DISCUSSION

We developed and tested 3 screening instruments, which, on the basis of responses to 1–4 questions, can be used to ascertain cases of symptomatic knee OA. Each of these instruments starts with a question about pain while walking and adds further questions to increase either the sensitivity or the specificity. While these instruments allow investigators to screen for symptomatic knee OA, all 3 permit too much misclassification to entirely replace radiography in the screening process. The hope for a questionnaire that can accurately identify subjects with knee OA for survey or clinical trial purposes does not appear to be realistic.

On embarking on this analysis, we were concerned that having the same knee symptom question in the screening questionnaire as was used to characterize symptomatic knee OA in the Framingham examinations would create a strong association between symptomatic OA and that screening question, which would largely drive the results. In fact, we found that a different question, adapted from the Western Ontario and McMaster Universities Osteoarthritis Index (25), was best able to detect subjects with symptomatic OA and was used as the starting point for all of the instruments. This consistency between instruments of varying sensitivity and specificity provides some validity for their use in screening for symptomatic knee OA. Further validation of these instruments comes from their ability to detect radiographic OA irrespective of the presence of symptoms.

The sensitivity and specificity for all of the instruments we developed were not optimal. The most sensitive instrument detected 84% of cases but misclassified 27% of people without knee OA. Conversely, the most specific instrument generated only 6% false-positive cases, but detected only 46% of people with OA. Use of these instruments as the sole means of determining the presence of OA would considerably bias measures of association toward the null. These results have implications for how the screening instruments might be used. Studies in which maximal detection of cases is a priority, such as recruiting subjects for a clinical trial, should utilize the sensitive instrument as a first-step screen, followed by a confirmatory test such as a radiograph. Based on a 10% prevalence rate for knee OA, this would result in a 45% reduction in radiography rates compared with strategies employing radiography as a first step. In other settings where missing true cases does not threaten the validity of the study, such as a risk factor study, the specific instrument may have greater utility by considerably reducing radiography rates.

Table 3. Performance of screening instruments for symptomatic knee OA in sensitivity analyses done in the validation sample*

| | Age ≤60 | Age >60 |
|---|---|---|
| No. of subjects | 466 | 464 |
| % with symptomatic knee OA | 4.9 | 19.6 |
| Sensitive instrument | | |
| Sensitivity, % | 71.4 | 87.5 |
| Specificity, % | 74.9 | 70.2 |
| Positive predictive value, % | 13.0 | 42.7 |
| Accuracy, % | 74.8 | 73.7 |
| Biased odds ratio | 1.09 | 1.32 |
| Specific instrument | | |
| Sensitivity, % | 31.8 | 50.0 |
| Specificity, % | 96.2 | 91.3 |
| Positive predictive value, % | 30.4 | 59.2 |
| Accuracy, % | 93.1 | 82.9 |
| Biased odds ratio | 1.22 | 1.40 |
| Efficient instrument | | |
| Sensitivity, % | 40.9 | 60.7 |
| Specificity, % | 87.8 | 81.7 |
| Positive predictive value, % | 14.8 | 45.5 |
| Accuracy, % | 85.5 | 77.5 |
| Biased odds ratio | 1.09 | 1.28 |

* See Figures 1A–C for the sensitive, specific, and efficient instruments, respectively. OA = osteoarthritis.

We did find an effect of age on the performance of these instruments. Among subjects >60 years old, the sensitivity and positive predictive value of all instruments were increased compared with the younger subjects, but the specificity and accuracy were diminished. Biases in the odds ratios measuring the effect of misclassification from using the screening instruments were roughly the same for both age groups.
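Part of the age difference in positive predictive value follows from the higher prevalence of symptomatic knee OA in the older group. As a minimal sketch (the function name is illustrative, and the inputs are the rounded percentages for the sensitive instrument in Table 3), positive predictive value and accuracy can be recomputed from sensitivity, specificity, and prevalence using Bayes' theorem:

```python
def predictive_metrics(sensitivity, specificity, prevalence):
    """Return (positive predictive value, accuracy) as proportions."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    true_neg = specificity * (1.0 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    accuracy = true_pos + true_neg
    return ppv, accuracy


# Sensitive instrument, rounded inputs from Table 3.
younger = predictive_metrics(0.714, 0.749, 0.049)  # age <= 60
older = predictive_metrics(0.875, 0.702, 0.196)    # age > 60

for label, (ppv, acc) in (("<=60", younger), (">60", older)):
    print(f"age {label}: PPV {100 * ppv:.1f}%, accuracy {100 * acc:.1f}%")
```

Because the inputs are rounded, the recomputed values agree with the tabulated ones only to within about one percentage point.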

Only a limited number of questions were included in the survey that provided the basis for our screening instruments. It is possible that better performance could be obtained from screening instruments that were based on questions that we did not use. However, we did include a large number of questions about knee pain, its timing, its association with activities, and diagnosis of arthritis. Many of these questions have been used in major studies of OA.

Participants had their OA status assessed at an examination that preceded the mailing of the questionnaire by 3.5 years, on average, and this time lag could be partly responsible for the poor screening results. While radiographic changes occur slowly, the knee pain used in our criteria for symptomatic knee OA can be episodic. When a positive response to the knee pain question on the questionnaire was used together with radiographic OA at the time of the Framingham examination as the criteria for symptomatic OA, the performance of the screening instruments improved. However, even with improvement in performance, considerable bias would result from using these instruments as a single-stage screening procedure.

We also considered the impact of new cases of symptomatic knee OA occurring between the examination and the questionnaire. A previous study (17) of original Framingham cohort members found the annual incidence of symptomatic knee OA to be 1% for women and 0.5% for men. Using these rates, we would expect 23 new cases of symptomatic knee OA in the validation sample; all would have been incorrectly considered not to have OA in our analysis. If the new cases were now correctly classified as having OA, and if new cases were screened with the same sensitivity as existing OA cases, how much better could performance of the screening instruments be? All instruments would have roughly the same sensitivity and accuracy as before. However, all would have increased specificity of ~2%, positive predictive values would be 37%, 64%, and 42%, and the biased odds ratios due to misclassification would be 1.27, 1.46, and 1.27 for the sensitive, specific, and efficient instruments, respectively. The specific instrument shows the most substantial improvement, because of the small number of subjects who screen positive with that instrument. While these values are improved, they still fall short of the level of screening accuracy needed for widespread use of the instruments.
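The expected number of new cases is a product of the number of subjects at risk, the annual incidence, and the elapsed time. The sketch below shows only the form of that arithmetic. The annual incidence rates and the 3.5-year lag are taken from the text, but the numbers of women and men at risk are not given in this section, so the counts used here are hypothetical placeholders and the result is not the authors' figure.

```python
def expected_new_cases(at_risk_by_sex, annual_incidence_by_sex, years):
    """Expected incident cases, ignoring depletion of the at-risk pool."""
    return sum(
        at_risk_by_sex[sex] * annual_incidence_by_sex[sex] * years
        for sex in at_risk_by_sex
    )


# Annual incidence of symptomatic knee OA quoted in the text (reference 17).
ANNUAL_INCIDENCE = {"women": 0.010, "men": 0.005}
YEARS_BETWEEN_EXAM_AND_QUESTIONNAIRE = 3.5

# Hypothetical counts of subjects without OA at the examination.
AT_RISK = {"women": 500, "men": 300}

new_cases = expected_new_cases(
    AT_RISK, ANNUAL_INCIDENCE, YEARS_BETWEEN_EXAM_AND_QUESTIONNAIRE
)
print(f"expected new cases: {new_cases:.2f}")
```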

Another explanation for the disappointing performance of these instruments is that pain in the knee might often emanate from the hip rather than from the knee, making our symptom assessment at the Framingham examination inaccurate. In an ongoing study, one of us (DTF), a rheumatologist, examined 120 subjects from the community who reported knee pain on most days. All had findings pointing to the knee joint or periarticular area, and none had only hip disease with referred pain. This suggests that misclassification of symptoms that emanate from the hip as knee symptoms is rare in subjects from the community.

In our definition of symptomatic knee OA, we used knee pain on most days, which has been widely used as a criterion for this disease (16). This was the only knee pain question collected routinely as part of the Framingham examinations. It is possible that less restrictive definitions of symptomatic knee OA, using less frequent knee pain together with radiographic OA, would be easier to screen for. However, we suspect that these less restrictive definitions would correspond to an entity which is less consistently reflective of OA and which would less frequently cause disability or require costly care.

In summary, we developed a set of screening instruments with a range of sensitivity and specificity in the detection of symptomatic knee OA among individuals across a range of ages. None of these instruments is adequate for use as a single screen in a research study, but they can be fruitfully employed as a first step in a multistage screening process that includes a confirmatory step, such as imaging.

REFERENCES

1. Guccione AA, Felson DT, Anderson JJ, Anthony JM, Zhang Y, Wilson PW, et al. The effects of specific medical conditions on the functional limitations of elders in the Framingham Study. Am J Public Health 1994;84:351–8.

2. Yelin E. The economics of osteoarthritis. In: Brandt K, Doherty M, Lohmander LS, editors. Osteoarthritis. New York: Oxford University Press; 1998. p. 23–30.

3. McAlindon TE, Cooper C, Kirwan JR, Dieppe PA. Knee pain and disability in the community. Br J Rheumatol 1992;31:189–92.

4. Felson DT, Zhang Y. An update on the epidemiology of knee and hip osteoarthritis with a view to prevention. Arthritis Rheum 1998;41:1343–55.

5. Felson DT. Preventing knee and hip osteoarthritis. Bull Rheum Dis 1998;47:1–4.

6. Spector TD, Hochberg MC. Methodological problems in the epidemiological study of osteoarthritis. Ann Rheum Dis 1994;53:143–6.

7. Sahyoun NR, Brett KM, Hochberg MC, Pamuk ER. Estrogen replacement therapy and incidence of self-reported physician-diagnosed arthritis. Prev Med 1999;28:458–64.

8. Cooper C, McAlindon T, Coggon D, Egger P, Dieppe P. Occupational activity and osteoarthritis of the knee. Ann Rheum Dis 1994;53:90–3.

9. O’Reilly SC, Muir KR, Doherty M. Knee pain and disability in the Nottingham community: association with poor health status and psychological distress. Br J Rheumatol 1998;37:870–3.

10. Hopman-Rock M, Odding E, Hofman A, Kraaimaat FW, Bijlsma JW. Differences in health status of older adults with pain in the hip or knee only and with additional mobility restricting conditions. J Rheumatol 1997;24:2416–23.

11. March LM, Schwarz JM, Carfrae BH, Bagge E. Clinical validation of self-reported osteoarthritis. Osteoarthritis Cartilage 1998;6:87–93.

12. Oliveria SA, Felson DT, Klein RA, Reed JI, Walker AM. Estrogen replacement therapy and the development of osteoarthritis. Epidemiology 1996;7:415–9.

13. Ledingham J, Regan M, Jones A, Doherty M. Factors affecting radiographic progression of knee osteoarthritis. Ann Rheum Dis 1995;54:53–8.

14. O’Reilly SC, Muir KR, Doherty M. Screening for pain in knee osteoarthritis: which question? Ann Rheum Dis 1996;55:931–3.

15. Liang MH, Meenan RF, Cathcart ES, Schur PH. A screening strategy for population studies in systemic lupus erythematosus: series design. Arthritis Rheum 1980;23:153–7.

16. Altman R, Asch E, Bloch D, Bole G, Borenstein D, Brandt K, et al. Development of criteria for the classification and reporting of osteoarthritis: classification of osteoarthritis of the knee. Arthritis Rheum 1986;29:1039–49.

17. Felson DT, Zhang Y, Hannan MT, Naimark A, Weissman BN, Aliabadi P, et al. The incidence and natural history of knee osteoarthritis in the elderly: the Framingham Osteoarthritis Study. Arthritis Rheum 1995;38:1500–5.

18. Felson DT, Couropmitree NN, Chaisson CE, Hannan MT, Zhang Y, McAlindon TE, et al. Evidence for a Mendelian gene in a segregation analysis of generalized radiographic osteoarthritis: the Framingham Study. Arthritis Rheum 1998;41:1064–71.

19. Felson DT, McAlindon TE, Anderson JJ, Naimark A, Weissman BW, Aliabadi P, et al. Defining radiographic osteoarthritis for the whole knee. Osteoarthritis Cartilage 1997;5:241–50.

20. Feinleib M, Garrison RJ, Stallones L, Kannel WB, Castelli WP, McNamara PM. A comparison of blood pressure, total cholesterol and cigarette smoking in parents in 1950 and their children in 1970. Am J Epidemiol 1979;110:291–303.

21. Kellgren JK, Lawrence JS. Radiological assessment of osteoarthritis. Ann Rheum Dis 1957;16:494–501.

22. Hochberg MC, Lawrence RC, Everett DF, Cornoni-Huntley J. Epidemiologic associations of pain in osteoarthritis of the knee: data from the National Health and Nutrition Examination Survey and the National Health and Nutrition Examination-I Epidemiologic Follow-up Survey. Semin Arthritis Rheum 1989;18:4–9.

23. Felson DT, Naimark A, Anderson J, Kazis L, Castelli W, Meenan RF. The prevalence of knee osteoarthritis in the elderly: the Framingham Osteoarthritis Study. Arthritis Rheum 1987;30:914–8.

24. Cicuttini FM, Baker J, Hart DJ, Spector TD. Association of pain with radiological changes in different compartments and views of the knee joint. Osteoarthritis Cartilage 1996;4:143–7.

25. Bellamy N, Buchanan WW, Goldsmith CH, Campbell J, Stitt LW. Validation study of WOMAC: a health status instrument for measuring clinically important patient relevant outcomes to antirheumatic drug therapy in patients with osteoarthritis of the hip or knee. J Rheumatol 1988;15:1833–40.

26. Chatfield C. Model uncertainty, data mining, and statistical inference. J Roy Stat Soc Ser A 1995;158:419–66.

27. Harrell FE Jr, Lee KL, Mark DB. Multivariable prognostic models: issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors. Stat Med 1996;15:361–87.

28. Breiman L, Friedman JH, Olshen RA, Stone CJ. Classification and regression trees. Pacific Grove (CA): Wadsworth; 1984.

29. Steinberg D, Colla P. CART: a supplemental module for SYSTAT. San Diego (CA): Salford Systems; 1995.

30. Bloch DA, Moses LE, Michel BA. Statistical approaches to classification: methods for developing classification and other criteria rules. Arthritis Rheum 1990;33:1137–44.

31. Fletcher RH, Fletcher SW, Wagner EH. Clinical epidemiology: the essentials. Baltimore (MD): Williams & Wilkins; 1982.

32. Barron BA. The effects of misclassification on the estimation of relative risk. Biometrics 1977;33:414–8.
