1103.full.pdf

5
CONCISE REPORT Reliability of the Southampton examination schedule for the diagnosis of upper limb disorders in the general population K Walker-Bone, P Byng, C Linaker, I Reading, D Coggon, K T Palmer, C Cooper ............................................................................................................................. Ann Rheum Dis 2002;61:1103–1106 Background: Epidemiological research in the field of soft tissue neck and upper limb disorders has been hampered by the lack of an agreed system of diagnostic classification. In 1997, a United Kingdom workshop agreed consensus definitions for nine of these conditions. From these criteria, an examination schedule was developed and validated in a hospital setting. Objective: To investigate the reliability of this schedule in the general population. Methods: Ninety seven adults of working age reporting recent neck or upper limb symptoms were invited to attend for clinical examination consisting of inspection and palpation of the upper limbs, measurement of active and passive ranges of motion, and clinical provocation tests. A doctor and a trained research nurse examined each patient separately, in random order and blinded to each other’s findings. Results: Between observer repeatability of the schedule was generally good, with a median κ coefficient of 0.66 (range 0.21 to 0.93) for each of the specific diagnoses considered. Conclusion: As expected, the repeatability of tests is poorer in the general population than in the hospital clinic, but the Southampton examination schedule is sufficiently reproducible for epidemiological research in the general population. M usculoskeletal disorders of the upper limb and neck are common in people of working age and costly in terms of time lost from work. 1 Research into their causes, management, and prevention is thus important. How- ever, investigation in this field has been hampered by the lack of an agreed system of case classification. 2 As a consequence, differences in taxonomy and the use of diagnostic labels have persisted, despite several attempts at standardisation. 3–9 More- over, even the standardised systems that have been developed have been criticised on methodological grounds, in particular for failing to show satisfactory repeatability and validity. 10 To consider these concerns, a multidisciplinary British group was convened in 1997 (the Birmingham workshop) with the objective of deriving consensus criteria for the more common upper limb disorders. 11 Based on these criteria, we have developed a standardised examination protocol and evaluated its performance in 88 patients attending hospital rheumatology and orthopaedic outpatient clinics with soft tissue disorders of the upper limb. The new examination sys- tem proved to be repeatable between observers (trained research nurses), and had acceptable diagnostic accuracy when compared with the clinical diagnosis of a physician as the reference standard. 12 However, our hospital based study population would have overrepresented the more severe and clear cut cases of soft tissue musculoskeletal complaints. The objective of this study therefore, was to evaluate the utility of the new examination protocol in the general population, where abnormal findings would be expected to be more sub- tle and less frequent. METHODS The clinical examination was performed in a sample of men and women aged 25–64 years who had participated in a popu- lation based cross sectional survey of the prevalence of neck and upper limb complaints. 13 All 6660 men and women regis- tered with a general practice in Southampton were sent a postal questionnaire about recent pain in the neck or upper limb. Everyone who indicated that they had experienced neck or upper limb pain lasting a day or longer, or numbness or tin- gling lasting three minutes or longer, in the preceding week (2162 of 3991 respondents) was invited to attend for an inter- view and physical examination. Among those who attended (1334 patients (62%)), a group of 97 (7%) were asked if they were willing to be examined on a second occasion. The sample of 97 was selected on the basis of attendance on a day of the week when two observers were present (but all working days were used equally). All who were invited agreed to participate. Each patient was examined independently by a nurse and a rheumatologist, at an interval of a few minutes. Examinations were carried out in random order with each observer unaware of the other’s findings. The same rheumatologist examined every patient and the study was designed so that each of two trained research nurses performed about half of the examina- tions. Each examination took about 15 minutes to perform. The between observer repeatability for eliciting items in the physical examination was assessed by calculating a Cohen’s κ coefficient for categorical variables, 14 and mean differences with limits of agreement for continuous variables. 15 The repeatability of the diagnoses derived from the physical find- ings of each examiner was also assessed. RESULTS Ninety seven patients were included in the study (49 examined by nurse 1 and 48 by nurse 2). The mean age of par- ticipants was 48 years (range 29–65 years) and 59 (61%) were women. The prevalence of abnormal physical signs was low in this population sample. The most frequent abnormality was the presence of Heberden’s nodes, noted in 105 hands in total (by either or both observers), with the second being Dupuytren’s contracture, found in only 35 hands. Table 1 sum- marises the between observer repeatability with which the two nurses and the rheumatologist elicited individual observations (symptoms and physical signs). κ Coefficients varied widely between different elements of the schedule (-0.03 to 0.94). Values were greater than 0.50 for shoulder pain (κ=0.94); anterior shoulder pain (κ=0.66); acromiocla- vicular joint pain (κ=0.56); shoulder pain on resisted elbow 1103 www.annrheumdis.com group.bmj.com on December 24, 2014 - Published by http://ard.bmj.com/ Downloaded from

Transcript of 1103.full.pdf

  • CONCISE REPORT

    Reliability of the Southampton examination schedule forthe diagnosis of upper limb disorders in the generalpopulationK Walker-Bone, P Byng, C Linaker, I Reading, D Coggon, K T Palmer, C Cooper. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

    Ann Rheum Dis 2002;61:11031106

    Background: Epidemiological research in the field of softtissue neck and upper limb disorders has been hamperedby the lack of an agreed system of diagnosticclassification. In 1997, a United Kingdom workshopagreed consensus definitions for nine of these conditions.From these criteria, an examination schedule wasdeveloped and validated in a hospital setting.Objective: To investigate the reliability of this schedule inthe general population.Methods: Ninety seven adults of working age reportingrecent neck or upper limb symptoms were invited to attendfor clinical examination consisting of inspection andpalpation of the upper limbs, measurement of active andpassive ranges of motion, and clinical provocation tests. Adoctor and a trained research nurse examined eachpatient separately, in random order and blinded to eachothers findings.Results: Between observer repeatability of the schedulewas generally good, with a median coefficient of 0.66(range 0.21 to 0.93) for each of the specific diagnosesconsidered.Conclusion: As expected, the repeatability of tests ispoorer in the general population than in the hospital clinic,but the Southampton examination schedule is sufficientlyreproducible for epidemiological research in the generalpopulation.

    Musculoskeletal disorders of the upper limb and neckare common in people of working age and costly interms of time lost from work.1 Research into theircauses, management, and prevention is thus important. How-ever, investigation in this field has been hampered by the lackof an agreed system of case classification.2 As a consequence,differences in taxonomy and the use of diagnostic labels havepersisted, despite several attempts at standardisation.39 More-over, even the standardised systems that have been developedhave been criticised on methodological grounds, in particularfor failing to show satisfactory repeatability and validity.10

    To consider these concerns, a multidisciplinary Britishgroup was convened in 1997 (the Birmingham workshop)with the objective of deriving consensus criteria for the morecommon upper limb disorders.11 Based on these criteria, wehave developed a standardised examination protocol andevaluated its performance in 88 patients attending hospitalrheumatology and orthopaedic outpatient clinics with softtissue disorders of the upper limb. The new examination sys-tem proved to be repeatable between observers (trainedresearch nurses), and had acceptable diagnostic accuracywhen compared with the clinical diagnosis of a physician asthe reference standard.12 However, our hospital based studypopulation would have overrepresented the more severe and

    clear cut cases of soft tissue musculoskeletal complaints. Theobjective of this study therefore, was to evaluate the utility ofthe new examination protocol in the general population,where abnormal findings would be expected to be more sub-tle and less frequent.

    METHODSThe clinical examination was performed in a sample of menand women aged 2564 years who had participated in a popu-lation based cross sectional survey of the prevalence of neckand upper limb complaints.13 All 6660 men and women regis-tered with a general practice in Southampton were sent apostal questionnaire about recent pain in the neck or upperlimb. Everyone who indicated that they had experienced neckor upper limb pain lasting a day or longer, or numbness or tin-gling lasting three minutes or longer, in the preceding week(2162 of 3991 respondents) was invited to attend for an inter-view and physical examination. Among those who attended(1334 patients (62%)), a group of 97 (7%) were asked if theywere willing to be examined on a second occasion. The sampleof 97 was selected on the basis of attendance on a day of theweek when two observers were present (but all working dayswere used equally). All who were invited agreed to participate.Each patient was examined independently by a nurse and a

    rheumatologist, at an interval of a few minutes. Examinationswere carried out in random order with each observer unawareof the others findings. The same rheumatologist examinedevery patient and the study was designed so that each of twotrained research nurses performed about half of the examina-tions. Each examination took about 15 minutes to perform.The between observer repeatability for eliciting items in the

    physical examination was assessed by calculating a Cohens coefficient for categorical variables,14 and mean differenceswith limits of agreement for continuous variables.15 Therepeatability of the diagnoses derived from the physical find-ings of each examiner was also assessed.

    RESULTSNinety seven patients were included in the study (49examined by nurse 1 and 48 by nurse 2). The mean age of par-ticipants was 48 years (range 2965 years) and 59 (61%) werewomen. The prevalence of abnormal physical signs was low inthis population sample. The most frequent abnormality wasthe presence of Heberdens nodes, noted in 105 hands in total(by either or both observers), with the second beingDupuytrens contracture, found in only 35 hands. Table 1 sum-marises the between observer repeatability with which thetwo nurses and the rheumatologist elicited individualobservations (symptoms and physical signs). Coefficientsvaried widely between different elements of the schedule(0.03 to 0.94). Values were greater than 0.50 for shoulderpain (=0.94); anterior shoulder pain (=0.66); acromiocla-vicular joint pain (=0.56); shoulder pain on resisted elbow

    1103

    www.annrheumdis.com

    group.bmj.com on December 24, 2014 - Published by http://ard.bmj.com/Downloaded from

  • flexion (=0.66); tenderness over the lateral (=0.52) andmedial elbow (=0.64); lateral elbow pain on resisted wristextension (=0.52); medial elbow pain on resisted wrist flex-ion (=0.56); Dupuytrens contracture (=0.69); Heberdensnodes (=0.60); a positive Phalens test (=0.68); and abnor-mal light touch in the thumb (=0.53).Measurements of the range of shoulder and neck move-

    ment showed a high level of concordance between observers,with the most repeatable being internal rotation of the shoul-der (mean difference 0.1o, limits of agreement 0.3o to +0.1o)and the least being abduction of the shoulder (meandifference9o; limits of agreement 11o to 7o).Individual physical findings were combined to yield specific

    diagnoses using the consensus criteria of the Birminghamworkshop.11 Table 2 shows the repeatability of classificationinto these diagnostic categories using the physical findingsfrom the two observers. Coefficients again varied widely(0.21 to 0.93). Poorest agreement was found for the diagnosisof hand/wrist tenosynovitis (=0.21), rotator cuff tendinitis(=0.35), and adhesive capsulitis (=0.39). There was greaterconcordance for the diagnoses of bicipital tendinitis (=0.49),

    medial epicondylitis (=0.66), De Quervains tenosynovitis(=0.66), lateral epicondylitis (=0.75), and carpal tunnelsyndrome (=0.93).The Birmingham consensus criteria for the diagnosis of

    rotator cuff tendinitis and adhesive capsulitis specify that painmust be in the deltoid region in the presence of characteris-tic physical signs.11 Table 1 shows that the reproducibility ofreporting pain in the deltoid region was poorer than that forthe whole shoulder. We therefore reanalysed our data usingthe more general criterion of shoulder pain in the presenceof the characteristic physical signs of rotator cuff tendinitisand adhesive capsulitis (table 2). The coefficient for adhesivecapsulitis improved to 0.66 and that for rotator cuff tendinitisimproved to 0.46.Analysis was repeated separately for each nurse versus the

    rheumatologist (data not presented), but no systematicdifferences were found. In addition, analyses which adjustedfor the nesting of observations within patients were con-ducted, but thesemade only slight differences to the estimated values (

  • DISCUSSIONSound epidemiological research depends on a reliable systemof defining and classifying cases. This is particularly problem-atic for soft tissue rheumatism, given the absence of a clear cutgold standard, and those classification schemes that have beendeveloped so far39 have been criticised on methodologicalgrounds.10

    Prerequisite criteria for a satisfactory scheme include faceand content validity, repeatability, and predictive validity.10 Inthis respect the Southampton examination schedule has sev-eral advantages. Firstly, it was developed after a workshop ofexperts from many disciplines, and is supported by clinicalconsensus.11 (Similar criteria to those of the Birminghamworkshop have also been developed in The Netherlands,thereby widening the extent of consensus.16) Secondly, it hasbeen tested previously in the hospital setting and was found tobe repeatable, with an acceptable diagnostic accuracy relativeto a specialist clinics independent opinion.12 Thirdly, the prac-tical feasibility of the schedule is well established. To this listmay now be added the evidence presented in this paper on itsrepeatability in a community setting.Generally, the between observer repeatability was found to

    be poorer in the general population than in the hospital clinic.According to Fleiss criteria, a of 0.2 denotes fair, and a of0.4 to 0.7 good agreement.14 On this basis, 18 of the variablesin table 1 had good reproducibility, but for 15 it was only fair.Those who present in secondary care with specific upper limbdisorders are likely to represent a severely affected group andit might be hypothesised that physical signs in the generalpopulation would be less clear cut and more difficult to detectreliably. However, the signs were not all of equal importance todiagnosis, and at this level the performance of the schedulewas better (>0.46 for seven of eight diagnoses).The prevalence of abnormality was low in this population

    samplea situation in which it is easier for observers to agreeby chance alone. The statistic is constructed to correct forthis effect, and demands greater agreement between observersfor a given value when the prevalence is low (or high) thanwhen measuring the repeatability of an abnormality with aprevalence near 50%.The study sample was drawn from patients who attended a

    first assessment in a community survey of upper limbdisorders. Everyone who was asked to undergo a secondexamination agreed to do so (participation rate 100%), but thesampling frame represented a subset of those to whom aquestionnaire had been mailed originally. If only severelyaffected patients agreed to attend the first assessment(spectrum bias), then this circumstance would tend to favouragreement on observations between observers. But, as judgedby their postal response, patients from the community survey

    who attended for interview were no more likely to havereported disabling pain than those who were invited but didnot attend (data not presented), so biased assessment ofbetween observer repeatability is unlikely to have arisen inthis way.The advantages of the schedule need to be considered

    alongside certain limitations. Our findings on repeatabilityfollowed a period of extensive training (about 1012 sessions)and periodic checking (612 monthly between observer stud-ies) of the research nurses, to promote consistency betweenobservers. It should be remembered that agreement betweenobservers is not a property that is fixed but that it can beimproved by careful attention to induction and refreshertraining. In some areas of weak repeatability, such as diagno-sis of tenosynovitis at the wrist, the need for better trainingwas highlighted by our data. The difficulty was not found to bein the technical procedure of performing the tests, but in therecognition of abnormal findings. Both nurses had performedsignificant numbers of examinations, but had seen few casesof wrist tenosynovitis, so that they found the physical signs ofswelling of tendon sheaths and pain on resisted movementdifficult to discriminate. In this context, it is noteworthy thatnurses in training should be given enough opportunity topractise all the tests in sufficient numbers of abnormal cases.Also, there were systematic differences between the nursesand rheumatologist for some outcomes (for example, therewere no cases of Dupuytrens contracture in which the nursemade the diagnosis and the rheumatologist did not, but theconverse happened; on the other hand, the nurse was morelikely to identify a positive Phalens test than the rheumatolo-gist). These circumstances again point to the need for carefuland sufficient training.To derive a workable examination protocol, the consensus

    statements of the Birmingham workshop were converted intoa detailed diagnostic algorithm that defined anatomicalregions and specified how to perform clinical tests. To do this,selections were made (from among a range of possiblechoices) aiming to maximise agreement and repeatability, butwhich were compatible with the workshops judgment. How-ever, our data from the shoulder illustrate the potential pitfallsof this approach. The consensus definitions proposed thatrotator cuff tendinitis and adhesive capsulitis were associatedwith pain in the deltoid region. Accordingly, we developed amannequin defining the deltoid region based on the epauletposition of the deltoid muscle. However, we found thatpatients in the general population were not reliably able tolocalise their shoulder pain to such a specific site, even whenexaminations were carried out on the same day, a fewmoments apart. It has been our observation that betweenobserver repeatability of these diagnoses was considerablyimproved by loosening the criteria to shoulder pain. Interest-ingly, most other classification systems have employed the cri-terion of shoulder pain for adhesive capsulitis35 and rotatorcuff tendinitis.35 8 9 Although deltoid pain was proposed asthe anatomical site of rotator cuff tendinitis in twoclassification systems,6 7 neither study reported the reliabilityof this criterion either within or between observers.On balance, although the repeatability of the diagnostic

    schedule is poorer in the general population than in the hos-pital clinic, the Southampton examination schedule seems tobe sufficiently reproducible for epidemiological research intosoft tissue musculoskeletal disorders of the neck and upperlimb in the community. The importance of making a diagno-sis lies in its utility in distinguishing groups of patients whorequire different case management, and diagnoses that carrydifferent prognoses or different associations with modifiablerisk factors. The predictive validity of a classification scheme isthus a critical test of its usefulness, and this now needs to beevaluated for the Southampton schedule.

    Table 2 Between observer repeatability of clinicaldiagnoses using the schedule

    Diagnosis No.

    Nurse v rheumatologist

    / +/ /+ +/+

    Adhesive capsulitis 194 179 5 6 4 0.39Rotator cuff tendinitis 194 181 3 7 3 0.35Bicipital tendinitis 194 191 1 1 1 0.49Lateral epicondylitis 194 189 2 0 3 0.75Medial epicondylitis 194 190 2 0 2 0.66De Quervains tenosynovitis 194 192 0 1 1 0.66Carpal tunnel syndrome 194 168 2 1 23 0.93Tenosynovitis 194 186 2 5 1 0.21

    Revised definition of shoulder pain:*Adhesive capsulitis 194 166 9 4 15 0.66Rotator cuff tendinitis 194 173 8 6 7 0.46

    *See text.

    Southampton examination schedule for the diagnosis of upper limb disorders 1105

    www.annrheumdis.com

    group.bmj.com on December 24, 2014 - Published by http://ard.bmj.com/Downloaded from

  • ACKNOWLEDGEMENTSWe are grateful to Vanessa Cox and Ken Cox, who providedcomputer support and to the staff at the MRC Unit,Southampton who helped in data handling. This study wassupported by a grant from the Health and Safety Executiveand a project grant PO552, from the Arthritis Research Cam-paign. Infrastructure support was provided by the MedicalResearch Council. KW-B was supported by an ARC ClinicalResearch Fellowship and IR by a fellowship grant from theColt Foundation.

    . . . . . . . . . . . . . . . . . . . . .Authors affiliationsK Walker-Bone, P Byng, C Linaker, I Reading, D Coggon,K T Palmer, C Cooper, MRC Environmental Epidemiology Unit,University of Southampton, UK

    Correspondence to: Professor C Cooper, MRC EnvironmentalEpidemiology Unit, Southampton General Hospital, Tremona Road,Southampton, SO16 6YD, UK; [email protected].

    Accepted 7 May 2002

    REFERENCES1 Bureau of Labor Statistics. Occupational injuries and illnesses in the

    United States by industry1990. Bulletin 2399. Washington DC: USDepartment of Labor, 1992:426.

    2 Palmer K, Coggon D, Cooper C, Doherty M. Work related upper limbdisorders: getting down to specifics. Ann Rheum Dis 1998;57:4456.

    3 Waris P, Kuorinka I, Kurppa K, Luopajarvi T, Virolainen M, Pesonen K,et al. Epidemiologic screening of occupational neck and upper limbdisorders. Scand J Work Environ Health 1979;S3:2538.

    4 Viikari-Juntura E. Neck and upper limb disorders amongslaughterhouse workers. Scand J Work Environ Health 1983;9:28390.

    5 Ohlsson K, Attewell RG, Johnsson B, Ahlm A, Skerfving S. Anassessment of neck and upper limb upper extremity disorders byquestionnaire and physical examination. Ergonomics 1994;37:8917.

    6 Ranney D, Wells R, Moore A. Upper limb musculoskeletal disorders inhighly repetitive industries: precise anatomical physical findings.Ergonomics 1995;38:140823.

    7 Silverstein BA. The prevalence of upper extremity cumulative traumadisorders in industry, [thesis]. The University of Michigan, OccupationalHealth and Safety, 1985.

    8 McCormack RR, Inman RD, Wells A, Berntsen A, Imbus HR. Prevalenceof tendinitis and related disorders of the upper extremity in amanufacturing workforce. J Rheumatol 1990;17:95864.

    9 Kaergaard A, Andersen JH. Musculoskeletal disorders of the neck andshoulders in female sewing machine operators. Occup Environ Med2000;57:52834.

    10 Buchbinder R, Goel V, Bombardier C, Hogg-Johnson S. Classificationsystems of soft tissue disorders of the neck and upper limb: do they satisfymethodological guidelines? J Clin Epidemiol 1996;49:1419.

    11 Harrington JM, Carter JT, Birrell L, Gompertz D. Surveillance casedefinitions for work related upper limb pain syndromes. Occup EnvironMed 1998;55:26471.

    12 Palmer K, Walker-Bone K, Linaker C, Reading I, Kellingray S, CoggonD, et al. The Southampton examination schedule for the diagnosis ofmusculoskeletal disorders of the neck and upper limb. Ann Rheum Dis2000;59:511.

    13 Walker-Bone, Palmer K, Reading I, Byng P, Shipp A, Linaker C, et al. Apopulation survey of the frequency of specific and non-specific disordersof the neck and upper limbs. Arthritis Rheum 2000;43:S369.

    14 Cohen S, Karmack T, Mermelstein R. A coefficient of agreement fornominal scales. Educational and Psychological Measurement1960;20:3746.

    15 Bland JM, Altman DG. Statistical methods for assessing agreementbetween two methods of clinical measurement. Lancet 1986;i:30710.

    16 Sluiter JK, Rest KM, Frings-Dresen M. Criteria document for evaluatingthe work-relatedness of upper-extremity musculoskeletal disorders. ScandJ Work Environ Health 2001;27(suppl 1):3102.

    1106 Walker-Bone, Byng, Linaker, et al

    www.annrheumdis.com

    group.bmj.com on December 24, 2014 - Published by http://ard.bmj.com/Downloaded from

  • disorders in the general populationschedule for the diagnosis of upper limb Reliability of the Southampton examination

    CooperK Walker-Bone, P Byng, C Linaker, I Reading, D Coggon, K T Palmer and C

    doi: 10.1136/ard.61.12.11032002 61: 1103-1106 Ann Rheum Dis

    http://ard.bmj.com/content/61/12/1103Updated information and services can be found at:

    These include:

    References #BIBLhttp://ard.bmj.com/content/61/12/1103

    This article cites 12 articles, 4 of which you can access for free at:

    serviceEmail alerting

    box at the top right corner of the online article. Receive free email alerts when new articles cite this article. Sign up in the

    CollectionsTopic Articles on similar topics can be found in the following collections

    (1209)Epidemiology

    Notes

    http://group.bmj.com/group/rights-licensing/permissionsTo request permissions go to:

    http://journals.bmj.com/cgi/reprintformTo order reprints go to:

    http://group.bmj.com/subscribe/To subscribe to BMJ go to:

    group.bmj.com on December 24, 2014 - Published by http://ard.bmj.com/Downloaded from