Measurement properties of clinical assessment methods for ... · Measurement Properties of Clinical...

32
American Journal of Medical Genetics Part C (Seminars in Medical Genetics) 175C:116147 (2017) R E S E A R C H R E V I E W Measurement Properties of Clinical Assessment Methods for Classifying Generalized Joint HypermobilityA Systematic Review BIRGIT JUUL-KRISTENSEN,* KAROLINE SCHMEDLING, LIES ROMBAUT, HANS LUND, AND RAOUL H. H. ENGELBERT The purpose was to perform a systematic review of clinical assessment methods for classifying Generalized Joint Hypermobility (GJH), evaluate their clinimetric properties, and perform the best evidence synthesis of these methods. Four test assessment methods (Beighton Score [BS], Carter and Wilkinson, Hospital del Mar, Rotes- Querol) and two questionnaire assessment methods (Five-part questionnaire [5PQ], Beighton Score-self reported [BS-self]) were identied on children or adults. Using the Consensus-based Standards for selection of health Measurement Instrument (COSMIN) checklist for evaluating the methodological quality of the identied studies, all included studies were rated fairor poor.Most studies were using BS, and for BS the reliability most of the studies showed limited positive to conicting evidence, with some shortcomings on studies for the validity. The three other test assessment methods lack satisfactory information on both reliability and validity. For the questionnaire assessment methods, 5PQ was the most frequently used, and reliability showed conicting evidence, while the validity had limited positive to conicting evidence compared with test assessment methods. For BS-self, the validity showed unknown evidence compared with test assessment methods. In conclusion, following recommended uniformity of testing procedures, the recommendation for clinical use in adults is BS with cut-point of 5 of 9 including historical information, while in children it is BS with cut-point of at least 6 of 9. However, more studies are needed to conclude on the validity properties of these assessment methods, and before evidence-based recommendations can be made for clinical use on the bestassessment method for classifying GJH. © 2017 Wiley Periodicals, Inc. KEY WORDS: Beighton tests; Carter and Wilkinson; Rotes-Querol; Hospital del Mar; five-part questionnaire; self-reported; clinimetrics; quality assessment; COSMIN; best evidence synthesis How to cite this article: Juul-Kristensen B, Schmedling K, Rombaut L, Lund H, Engelbert RHH. 2017. Measurement properties of clinical assessment methods for classifying generalized joint hypermo- bilityA systematic review. Am J Med Genet Part C Semin Med Genet 175C:116147. INTRODUCTION Generalized joint hypermobility (GJH) is relatively common, occurring in about 257% of different populations [Remvig et al., 2007b]. Important reasons for this may be the use of many different clinical assessment meth- ods and criteria for classification and interpretation of GJH by these clinical assessment methods [Remvig et al., 2007a,b]. GJH is characterized by an ability to exceed the joints beyond the normal range of motion in multiple joints, either congenital or acquired [Remvig et al., 2011]. Many individuals with GJH are asymptomatic, which also makes it difficult to accurately estimate Dr. Birgit Juul-Kristensen, Associate professor, PT, Department of Sports Sciences and Clinical Biomechanics, Research Unit of Musculoskeletal Function and Physiotherapy, University of Southern Denmark, Odense, Denmark. Karoline Schmedling, M.Sc., PT, Department of Health Sciences, Institute of Occupational Therapy, Physiotherapy and Radiography, Bergen University College, Bergen, Norway. Dr. Lies Rombaut, Postdoctoral researcher, PT, Center for Medical Genetics, Ghent University Hospital, Ghent, Belgium. Dr. Hans Lund, Associate professor, PT, Department of Sports Sciences and Clinical Biomechanics, Research Unit of Musculoskeletal Function and Physiotherapy, University of Southern Denmark, Odense, Denmark; SEARCH (Synthesis of Evidence and Research), Department of Sports Science and Clinical Biomechanics, University of Southern Denmark, Odense, Denmark; Center for Evidence-Based Practice, Bergen University College, Bergen, Norway. Dr. Raoul Engelbert, Department of Rehabilitation, Academic Medical Center, University of Amsterdam, Amsterdam, the Netherlands; ACHIEVE, Faculty of Health, Center for Applied Research, University of Applied Sciences, Amsterdam, the Netherlands. Conicts of interest: There are no other nancial interests that any of the authors may have, which could create a potential conict of interest or the appearance of a conict of interest with regards to the work. *Correspondence to: Birgit Juul-Kristensen, Research Unit for Musculoskeletal Function and Physiotherapy, Institute of Sports Science and Clinical Biomechanics, University of Southern Denmark, Campusvej 55, DK-5230 Odense M, Denmark. E-mail: [email protected] DOI 10.1002/ajmg.c.31540 Article rst published online in Wiley Online Library (wileyonlinelibrary.com). ß 2017 Wiley Periodicals, Inc.

Transcript of Measurement properties of clinical assessment methods for ... · Measurement Properties of Clinical...

American Journal of Medical Genetics Part C (Seminars in Medical Genetics) 175C:116–147 (2017)

R E S E A R C H R E V I E W

Measurement Properties of Clinical AssessmentMethods for Classifying Generalized JointHypermobility—A Systematic ReviewBIRGIT JUUL-KRISTENSEN,* KAROLINE SCHMEDLING, LIES ROMBAUT, HANS LUND,AND RAOUL H. H. ENGELBERT

Dr. Birgit JuFunction and P

Karoline SchUniversity Coll

Dr. Lies RomDr. Hans Lun

Physiotherapy,Clinical BiomeNorway.

Dr. Raoul EnFaculty of Hea

Conflicts ofappearance of

*CorresponBiomechanics,

DOI 10.100Article first

� 2017 Wil

The purpose was to perform a systematic review of clinical assessment methods for classifying Generalized JointHypermobility (GJH), evaluate their clinimetric properties, and perform the best evidence synthesis of thesemethods. Four test assessment methods (Beighton Score [BS], Carter and Wilkinson, Hospital del Mar, Rotes-Querol) and two questionnaire assessmentmethods (Five-part questionnaire [5PQ], Beighton Score-self reported[BS-self]) were identified on children or adults. Using the Consensus-based Standards for selection of healthMeasurement Instrument (COSMIN) checklist for evaluating themethodological quality of the identified studies,all included studies were rated “fair” or “poor.”Most studies were using BS, and for BS the reliability most of thestudies showed limited positive to conflicting evidence, with some shortcomings on studies for the validity. Thethree other test assessment methods lack satisfactory information on both reliability and validity. For thequestionnaire assessment methods, 5PQ was the most frequently used, and reliability showed conflictingevidence, while the validity had limited positive to conflicting evidence comparedwith test assessmentmethods.For BS-self, the validity showed unknown evidence compared with test assessment methods. In conclusion,following recommended uniformity of testing procedures, the recommendation for clinical use in adults is BSwith cut-point of 5 of 9 including historical information, while in children it is BS with cut-point of at least 6 of 9.However, more studies are needed to conclude on the validity properties of these assessment methods, andbefore evidence-based recommendations can be made for clinical use on the “best” assessment method forclassifying GJH. © 2017 Wiley Periodicals, Inc.

KEYWORDS: Beighton tests; Carter andWilkinson; Rotes-Querol; Hospital delMar; five-part questionnaire; self-reported; clinimetrics; qualityassessment; COSMIN; best evidence synthesis

How to cite this article: Juul-Kristensen B, Schmedling K, Rombaut L, Lund H, Engelbert RHH. 2017.Measurement properties of clinical assessment methods for classifying generalized joint hypermo-

bility—A systematic review. Am J Med Genet Part C Semin Med Genet 175C:116–147.

INTRODUCTION

Generalized joint hypermobility (GJH)is relatively common, occurring inabout 2–57% of different populations[Remvig et al., 2007b]. Important

ul-Kristensen, Associate professor,hysiotherapy, University of Southermedling, M.Sc., PT, Department oege, Bergen, Norway.baut, Postdoctoral researcher, PT,d, Associate professor, PT, DepartmUniversity of Southern Denmark, Ochanics, University of Southern Den

gelbert, Department of Rehabilitatilth, Center for Applied Research, Uinterest: There are no other financiala conflict of interest with regards tdence to: Birgit Juul-Kristensen, ResUniversity of Southern Denmark, C2/ajmg.c.31540published online in Wiley Online Lib

ey Periodicals, Inc.

reasons for this may be the use ofmany different clinical assessment meth-ods and criteria for classification andinterpretation of GJH by these clinicalassessment methods [Remvig et al.,2007a,b]. GJH is characterized by an

PT, Department of Sports Sciences and Clinical Bion Denmark, Odense, Denmark.f Health Sciences, Institute of Occupational Thera

Center for Medical Genetics, Ghent University Hospent of Sports Sciences and Clinical Biomechanics, R

dense, Denmark; SEARCH (Synthesis of Evidence andmark, Odense, Denmark; Center for Evidence-Base

on, Academic Medical Center, University of Amsterniversity of Applied Sciences, Amsterdam, the Nethinterests that any of the authors may have, which coo the work.earch Unit for Musculoskeletal Function and Physiotampusvej 55, DK-5230 Odense M, Denmark. E-ma

rary (wileyonlinelibrary.com).

ability to exceed the joints beyond thenormal range of motion in multiplejoints, either congenital or acquired[Remvig et al., 2011]. Many individualswith GJH are asymptomatic, which alsomakes it difficult to accurately estimate

mechanics, Research Unit of Musculoskeletal

py, Physiotherapy and Radiography, Bergen

ital, Ghent, Belgium.esearch Unit of Musculoskeletal Function andResearch), Department of Sports Science and

d Practice, Bergen University College, Bergen,

dam, Amsterdam, the Netherlands; ACHIEVE,erlands.uld create a potential conflict of interest or the

herapy, Institute of Sports Science and Clinicalil: [email protected]

RESEARCH REVIEW AMERICAN JOURNAL OF MEDICAL GENETICS PART C (SEMINARS IN MEDICAL GENETICS) 117

the number of people with this condi-tion, as they are not recorded in thehealth system.

When GJH is accompanied withsymptoms, it is defined as a health-relateddisorder, for example, Joint Hypermo-bility Syndrome (JHS) or the Ehlers–Danlos Syndrome—Hypermobile Type(hEDS) with several complications asdescribed below. The two conditions(JHS and hEDS) have very close overlapto the point of being clinically indistin-guishable [Tinkle et al., 2009; Remviget al., 2011], and in the present study it isreferred to as JHS/hEDS. The conditionof JHS/hEDScan bedefined as anunder-and often misdiagnosed heritable con-nective tissue disorder, characterizedgenerally by GJH, complications of jointinstability, musculoskeletal pain, skininvolvement, and reduced quality of life[Rombaut et al., 2010; Castori et al.,2014; Scheper et al., 2016]. Until now,JHS is diagnosed by the Brighton testsand criteria [Grahame et al., 2000], andhEDS by the Villefranche criteria[Beighton et al., 1998], both includingthe Beighton scoring (BS) system of ninetests for assessment of GJH [Beightonet al., 1973].

BS consists of four bilateral testsand one test including low back andlower extremities (first finger opposi-tion, fifth finger extension, elbowextension, knee extension, and backforward bending), with scores rangingfrom 0 to 9. Influencing factors on BSare age, gender, ethnicity, and physicalfitness [Remvig et al., 2007b; Tinkleet al., 2009]. For adults, a cut-point of4/9 for GJH is included in theBrighton criteria for JHS [Grahameet al., 2000], while 5/9 for GJH is thecriteria for hEDS in the Villefranchecriteria [Beighton et al., 1998]. Forchildren, there is no consensus on aspecific cut-point for GJH, but cut-points of 5/9, 6/9, and 7/9 have beensuggested [Jansson et al., 2004]. Aprevious review has listed differenttest assessment methods of which theBeighton score [Beighton et al.,1973] was most frequently used. Thereview concluded that reproducibilityof Beighton or similar tests is good, butthere is lack of evidence for the validity

of this test assessment method [Remviget al., 2007a].

For adults, a cut-point of 4/9for GJH is included in theBrighton criteria for JHS,while 5/9 for GJH is thecriteria for hEDS in theVillefranche criteria. For

children, there is no consensuson a specific cut-point for

GJH, but cut-points of 5/9,6/9, and 7/9 have been

suggested.

Further, also questionnaire assess-ment methods are used for classifyingGJH, among which the five-part ques-tionnaire (5PQ) [Hakim and Grahame,2003; Mulvey et al., 2013]. The 5PQ, sofar used only for adults, consists of fivequestions, including actual and historicalinformation about joint hypermobility(forward bending of the back, first fingeropposition, the ability to amuse friendswith strange body shapes, dislocation ofshoulder/knee, perception of beingdouble-jointed). The 5PQ is claimedto have good reproducibility, in additionto satisfactory sensitivity and specificity[Hakim and Grahame, 2003]. However,clinimetric properties (reliability, differ-ent aspects of validity, and responsive-ness) have not been described fully forBS, 5PQ, or other potential clinicalassessment methods for classifying GJH.

Clear and valid diagnostic clinicalassessment methods and criteria forclassifying GJH with or without symp-toms are essential, both for diagnosingJHS/hEDS and measuring treatmenteffects of JHS/hEDS, in children[Scheper et al., 2013] as well in adults[Palmer et al., 2014; Smith et al., 2014;Scheper et al., 2016]. In summary, there islack of knowledge of clinimetric proper-ties on clinical assessment methods forclassifying GJH. Therefore, the purposeof this study was to perform a systematic

review for identifying the clinical assess-ment methods for classifying GJH, toevaluate their clinimetric properties(reliability and validity), and finally tosummarize the best evidence synthesis ofthese clinical assessment methods.

MATERIALS ANDMETHODS

This systematic review followed thePreferred Reporting Items for System-atic Reviews and Meta-Analysis state-ment guidelines, PRISMA, [Moheret al., 2009] and used the PICOSmethodto present the chosen research questions:Participants (humans with GJH andhealthy controls, ranging from childhoodto adults), Intervention (assessmentmethods for evaluation and classificationof GJH), Comparison (e.g., healthycontrol groups), Outcomes (reliability/validity), and Study design (e.g., reliabil-ity/case control/longitudinal studies).

The overall method used in thisreview can be divided into four steps: (1)Compile an exhaustive list of assessmentmethods for GJH on the basis of aninitial search (Search 1); (2) Additionalsearch for studies including clinimetricsof the identified assessment methodsfrom Search 1 (Search 2); (3) Criticallyappraisal of the methodological qualityof the identified measurement proper-ties in each study; and (4) synthesizing ofthe evidence as “a best evidencesynthesis.”

Selection Criteria

Search 1With restrictions on the date of publi-cation (January 01, 1965 to Decem-ber 31, 2015), humans and English, theincluded articles had to meet thefollowing criteria: (1) be originallypublished in peer-reviewed journalsinvolving human participants; (2) in-clude a clinical assessment method (testor questionnaire) to classify GJH; and (3)be reported in English. Studies wereexcluded if they: (1) contained otheradvanced assessment methods used asprimary assessment method and not as areference assessment; (2) were reviews,abstracts, theses, unpublished studies

118 AMERICAN JOURNAL OF MEDICAL GENETICS PART C (SEMINARS IN MEDICAL GENETICS) RESEARCH REVIEW

(“gray literature”); or (3) were animalstudies.

Search 2By using the names of the differentassessment methods found in Search 1,Search 2 was initiated, and the articleswere included if they: (1) explicitlyoutlined a purpose for evaluating clini-metric properties of an assessmentmethod (test or questionnaire) forclassifying GJH; and (2) included at leastone of the clinimetric properties ofreliability, validity, and responsiveness.To avoid confusion in relation to theterminology of clinimetric properties,this study relates to the COSMINterminology, including reliability (reli-ability and measurement error), validity(criterion validity and hypotheses test-ing), and responsiveness [Schellingerh-out et al., 2008; Mokkink et al., 2010].

Search Strategy and Data Sources

Search 1 (production of a list of clinicalassessment methods)The systematic review was performed byelectronic and manual searches in CI-NAHL (n¼ 153), Embase (n¼ 1,027),SportDiscus (n¼ 272), and MEDLINE(n¼ 833). Furthermore, reference lists ofrelevant articles were hand-searched foradditional literature, and the authorsconferred with experts within the fieldof GJH, in order to make sure no relevantarticles would be missing. In each of thefour databases, the following search termswere used for the Electronic Search 1 forproducing a method list: (joint�; hyper-mobility; instability; laxity; general�)AND(evaluation�; rating�; rate�; questionnaire�;test�; scale�; assess�; examin�; observ�;diagnos�; measure�) NOT (fracture�;surgical). The search terms were adjustedto the different databases where necessary.In all databases, the search fields includedtitle, abstract, and keywords.

Search 2 (identifying clinimetric properties)For the Electronic Search 2, using thesame databases as described in Search 1,and with a total of six identified assess-ment methods at this point, a total of 24searches were conducted; one in eachdatabase, on each assessment method.

The following search terms were used,for retrieving studies with clinimetricproperties on each of the six identifiedassessment methods: (psychometric�;clinimetric�; reproducibility; reliability;repeatability; responsiveness; sensitivity;specificity; validity; diagnos�; feasibility).When including the questionnaire as-sessment methods, the terms test� andtool� were left out. See PRISMA flowchart (Fig. 1) for the selection process.

Data Extraction and QualityAssessment

Two authors (KS/BJK) independentlyscreened the titles and abstracts, and agreedupon a final list of assessmentmethods tobeincluded in the current review. If therewere any disagreements, the full paper wasretrieved for detailed assessment, andconsensus was achieved. A third reviewer(RHE) was included if disagreement stillexisted. The handling of data wereperformed with the use of EndNoteWeb(https://www.myendnoteweb.com/), foreasy access and organizing of data. Screen-ing for additional references were per-formed based on the retrieved articles.

Eligible studies for each of theretrieved assessment methods were eval-uated by the Consensus-based Standardsfor the selection of health MeasurementInstruments (COSMIN) checklist forevaluating the methodological qualityon clinimetric properties—reliability,validity, and responsiveness [Mokkinket al., 2010]. The COSMIN checklist iscurrently the only recommended stan-dardized method [Terwee et al., 2012],and has been used in several differentstudies of clinical test assessmentmethods[Larsen et al., 2014; Kroman et al., 2014].The complete COSMIN checklist in-cludes 12 boxes, covering internal con-sistency, reliability, measurement error,validity, and responsiveness. The currentstudy used the reliability and validitydomains, in particular box B—reliability(14 items), box F—hypothesis testing (10items), and box H—criterion validity (7items). After inclusion of studies, thesewere grouped based on the clinimetricproperty assessed, in reliability (intra-,inter-tester and test-retest) and validity(hypothesis testing or criterion validity).

No studies on the responsiveness domainwere obtained.

A compiled list for the assessment ofthe item “minor methodological flaws”as used previously [Larsen et al.,2014] was included (no inclusion of thetarget population, only for reliabilitydomain; only one trial per measure-ment/lack of information on repetition;no random order of investigators/meas-urements; no description of any trainingphase; inadequate description on demo-graphic details). For the item “otherimportant methodological flaws” anadditional list was included (inadequatelydescribed/lacking of information aboutsubject eligibility criteria; doubt regard-ing the site of measurement). Finalscoring of the methodological qualityof each items, evaluated on a four-pointscoring system, (excellent, good, fair, andpoor methodological quality) was basedon the “worse score counts method” inthe checklist [Terwee et al., 2012].

Finally, a best evidence synthesis wasperformed, by compiling the assessmentof the methodological quality, the actualresults of the included studies, thenumber of studies, and the total samplesize, as outlined in connection to theCOSMIN evaluation [Terwee et al.,2007], and as also performed in a previousstudy [Kroman et al., 2014]. The rating ofthe best evidence synthesis ranged fromstrong, moderate, limited, positive/neg-ative, conflicting, or unknown. A notewas made whenever a study was rated“poor” due to only one single item. Inthe reliability domain this mainly con-cerned the item “only one measure-ment” (in box B), as often used in clinicalexaminations and always in questionnairestudies; in the validity domain studieswere mainly rated “poor” due to onerating based on the item “no informationon the measurement properties of thecomparator instrument(s)”; in addition, anote was made to studies rated “poor”due to “subject eligibility criteria inade-quately described/lacking.” Studies rated“poor” due to only one “poor” itemwere upgraded to “fair,” in line withprevious systematic reviews, describingthe limitations of the COSMIN whenused for clinical test assessment methods[Kroman et al., 2014; Larsen et al., 2014].

Figure 1. Flow diagram showing identification, screening, eligibility, and inclusion procedures.

RESEARCH REVIEW AMERICAN JOURNAL OF MEDICAL GENETICS PART C (SEMINARS IN MEDICAL GENETICS) 119

Studieswithmore thanone “poor”-rateditem were omitted from the final evi-dence synthesis.

RESULTS

Identification of ClinicalAssessment Methods

In total 2,285 references were identified,and after removal of duplicates 1,338references were included in the

screening procedure of titles and ab-stracts, of which 298 full-text articleswere eligible according to the inclusioncriteria. In Search 1, a total of sixprimary clinical assessment methods forclassifying GJH were identified, corre-sponding to four test assessment meth-ods (BS, Carter, and Wilkinson [CW],Hospital del Mar [HdM], Rotes-Querol[RQ]), and two questionnaire assess-ment methods (5PQ, Beighton Scoreself-reported [BS-self]). In Search 2, 163

references were identified, and afterremoval of duplicates 33 studies wereidentified describing the clinimetricproperties of the six clinical assessmentmethods (Fig. 1).

Clinimetric Properties

Methodological quality in relation toreliability of the four test assessmentmethods (BS, CW, HdM, and RQ) wasevaluated from eleven studies, and

120 AMERICAN JOURNAL OF MEDICAL GENETICS PART C (SEMINARS IN MEDICAL GENETICS) RESEARCH REVIEW

reliability of one of the two questionnaireassessment methods (5PQ)was evaluatedfrom two studies, while there were noreliability studies on BS-self (Table I).

Methodological quality in relation tovalidity of the four test assessmentmethods (BS, CW, HdM, RQ), wasevaluated fromtwenty studies, andvalidityof the two questionnaire assessmentmethods (5PQ, BS-self) was evaluatedfrom six studies (Table II).

All four tests were rated as havingpoor quality in all reliability studies, while64% (7 of 11) for BS, and 50% (1 of 2) forRQ could be upgraded to fair quality,when one rating (“only one measure-ment”) was omitted. Both studies of 5PQwere rated as having poor quality in test-retest reliability [Moraes et al., 2011;Bulbena et al., 2014], but both could beupgraded to fair. CW, RQ, and HdMwere all rated as having fair quality (3/3),while for BS 82% (14/17) were upgradedand thus rated as having fair quality. Allstudies on validity for the two question-naire assessment methods (for 5PQ: 5/5;for BS-self: 1/1) were upgraded, andtherefore, rated as fair (Table III).

As seen in Tables I and II, all testassessment methods BS, CW, HdM, andRQ, have been tested for reliability, andfor validity when compared with eachother. BS has further been tested fordifferent validity types, such as range ofmotion (ROM), associations with pain,injuries, and other diseases, whereasHdM has further been tested for validityon associations with shoulder injuries.5PQ is the only one of the questionnaireassessment methods that has been testedfor reliability, and for validity whencompared with test assessment methods,associations with pain, diseases, andanxiety. BS-self has only been testedfor validity compared with BS(Table III).

Best Evidence Synthesis: Levels ofEvidence

Test assessment methodsOf the 11 reliability studies for testassessment methods, 5 studies had poorratings on methodological quality[Bulbena et al., 1992; Mikkelsson et al.,1996; Hansen et al., 2002; Boyle et al.,

2003; Aslan et al., 2006], and since theycould not be upgraded to fair, they werenot included in the best evidence synthe-sis. For the only two studies includingintra-rater reliability on BS there waslimited positive evidence [Erkula et al.,2005; Hirsch et al., 2007] (Table IV).

For inter-rater reliability four stud-ies had limited positive evidence [Hickset al., 2003; Erkula et al., 2005; Hirschet al., 2007; Juul-Kristensen et al.,2007], while two studies had negativeevidence [Karim et al., 2011; Jungeet al., 2013], leaving the final evidence aslimited positive to conflicting evidence.A total of four out of the eleven studiesincluded children [Mikkelsson et al.,1996; Hansen et al., 2002; Erkula et al.,2005; Junge et al., 2013].

Validity of BS compared with othertest assessment methods showed limitedpositive to conflicting evidence in threestudies [Bulbena et al., 1992; Ferrariet al., 2005; Junge et al., 2013], whilecompared with ROM (trunk rotation,lower, and upper extremities) the validityin five studies showed limited negative toconflicting evidence (two on children)[Sauers et al., 2001; Erkula et al., 2005;Pearsall et al., 2006; Smits-Engelsmanet al., 2011; Naal et al., 2014]. Thevalidity for BS and the association withpain showed moderate positive to con-flicting evidence in five studies (all onchildren) [El-Metwally et al., 2004,2005, 2007, Tobias et al., 2013; Sohr-beck-Nøhr et al., 2014]. For the validityof BS and the association with injuriesthe validity showed conflicting evidencein three studies (one in children) [Rous-sel et al., 2009; Cameron et al., 2010;Junge et al., 2015]. For the validity of BSand the association with differentdiseases (Temporo-Mandibular Disor-ders, Chronic Fatigue Syndrome, Adhe-sive Capsulitis) there was limited positiveto conflicting evidence in three studies[Nijs et al., 2004; Hirsch et al., 2008;Terzi et al., 2013].

CW (almost similar to the BS), andRQ had unknown evidence for bothinter-rater reliability and validity com-pared with other test assessment methods[Bulbena et al., 1992], while HdMshowedunknown evidence for inter-raterreliability and validity in the association

with injuries, such as anterior shoulderdislocation [Bulbena et al., 1992; Chahalet al., 2010].

Questionnaire assessment methodsFor reliability 5PQ showed conflictingevidence in the two studies [Moraeset al., 2011; Bulbena et al., 2014]. Forthe validity 5PQ showed limited positiveto conflicting evidence compared withtest assessment methods (BS, HdM) inthe same two studies, while in theassociation with pain and tissue diseases(chronic widespread pain, JHS) 5PQshowed limited positive evidence in twostudies [Hakim and Grahame, 2003;Mulvey et al., 2013], and with anxiety itshowed unknown evidence [Sancheset al., 2014]. BS-self showed unknownevidence in the validity compared withBS in one study [Naal et al., 2014].

DISCUSSION

Four test assessment methods (BS, CW,HdM, RQ) and two questionnaireassessment methods (5PQ, BS-self)were identified for classifying GJH, inchildren and adults, with 33 studiesreporting their measurement properties.The four test assessment methods andone of the questionnaire assessmentmethods (5PQ) reported measurementproperties on both reliability and valid-ity. Most studies were on BS, and onlyBS and 5PQ reported aspects of validity.

Four test assessment methods(BS, CW, HdM, RQ) andtwo questionnaire assessmentmethods (5PQ, BS-self) wereidentified for classifying GJH,in children and adults, with33 studies reporting theirmeasurement properties.

The majority of the reliabilitystudies showed limited positive to con-flicting evidence for BS, and thus, mayseem acceptable to be used in clinical

TABLEI.

Inform

ationFrom

theIncluded

StudiesforCOSMIN

Sco

ringin

theReliabilityDomain

Ref/assessm

ent

metho

dStud

ysample/po

pulatio

nRegistratio

n/hand

ling

ofmissingdata

Design

Tim

einterval

Mainresult

Cut-offscore

Metho

dologicalflaws

Beighton(5

items)

Aslanet

al.(2006)

(BHJM

I)(þ

goniom

eter

for5th

finger,elbows,

knees[ly

ing])

n¼72

Asymp.

(meanage20

yrs,

range18–25)

Noinform

ationof

registratio

n/hand

ling

ofmissingdata

Intra-/inter-rater

Intra-rater:

mean12.84

(þ/�

7.41

days)

Inter-rater:

sameday

ICC

(mean)

Intra:0.92

Inter:0.82

%agreem

ent

Intra:86%

(category

scores)

43%

(com

positescores)

Inter:75%(category

scores)

42%(com

positescores)Com

posite

scores

categorized

(0–2:

cat.1)

(3–4:

cat.2)

(5–9:

cat.3)

Other

impo

rtant/minor

metho

dologicalflaws

(inadequately

described/

lackingof

info

abou

tsubject

eligibility

criteria,asym

p.,

norand

omorder)

Boyle

etal.(2003)

(BHJM

I)(þ

goniom

eter

for5th

finger,elbows,

knees[ly

ing])

n¼42

(intra)

n¼36

(inter)

Asymp.

(meanage25

yrs,

range15–45)

Noinform

ationof

registratio

n/hand

ling

ofmissingdata

Intra-

/inter-rater

Intra-rater:

1dayto

2weeks

apart

Inter-rater:sameday

orwith

in6days

%agreem

ent

Intra:81%

Inter:89%

Spearm

ansRho

(con

t.score)

Intra:0.81

Inter:0.87

(P<.0001)

Spearm

ansRho

(cat.scores)

Intra:.86

Inter:0.75

(P<.0001)

Com

posite

scores

categorized

(0–2:

cat.1)

(3–4:

cat.2)

(5–9:

cat.3)

Other

minor

metho

dological

flaws(asymp.,no

rand

omorder,lackingdemog

raph

icdetails)

Bulbena

etal.(1992)

n¼30

Symp.

(n¼20)

(meanage41

yrs)

Con

trol

(n¼10)

(meanage48

yrs)

Info

onmissingdata

provided,no

info

onthehand

lingof

the

data

Inter-rater

Noinform

ation

Kappa

(range)

0.79–0.93

NS

Other

minor

metho

dologicalflaws(lack

ofinfo

onrepetition,

norand

omorder,inadequate

demog

raph

icdetails)

Erkulaet

al.(2005)

n¼50

Asymp.

(children,

meanage10

yrs)

Noinform

ationof

registratio

n/hand

ling

ofmissingdata

Intra-

/inter-rater

Reassessm

ent

after2weeks

Spearm

ansRho

Intra:0.62

Inter:0.86

�7/9

(classified

assig

n.jointlaxity)

Other

impo

rtant/minor

metho

dologicalflaws

(sub

ject

eligibility

criteriainadequately

described,

asym

p.,

norand

omorder,no

training

phase,

noblinding

ofresults)

continued

RESEARCH REVIEW AMERICAN JOURNAL OF MEDICAL GENETICS PART C (SEMINARS IN MEDICAL GENETICS) 121

TABLEI.

(Continued)

Ref/assessm

ent

metho

dStud

ysample/po

pulatio

nRegistratio

n/hand

ling

ofmissingdata

Design

Tim

einterval

Mainresult

Cut-offscore

Metho

dologicalflaws

Hansenet

al.(2002)

(4/5

selected

tests

includ

ed,no

5th

finger)

n¼100

Asymp.

(children,

9–13

yrs)

Info

onmissingdata

provided,the

hand

lingof

thedata

canbe

dedu

ced

Inter-rater

Noinform

ation

Kappa

(range)

4tests:

0.44–0.82

(experts)

�0.40

(inexperienced/

parents)

NS

Other

minor

meth.

flaws

(asymp.,no

training

phase,

inadequate

demog

raph

icdetails,no

blinding)

Hicks

etal.(2003)

n¼63

Symp.

(meanage36

yrs,

range20-66)

Noinform

ationof

registratio

n/hand

ling

ofmissingdata

Inter-rater

15min

time-delay

betweenraters

ICC

(mean)

Pair1:

0.95

Pair2:

0.76

Pair3:

0.66

NS

Other

impo

rtant/minor

meth.

flaws

(inadequately

described/lackingof

info

abou

tsubject

eligibility

criteria,no

rand

omorder,

inadequate

demog

raph

icdetails)

Hirschet

al.(2007)

(+goniom

eter

forelbows,

knees)

n¼50

Asymp.

(meanage38

yrs,

range20–60)

(recruitedfrom

ageneraldental

practicesetting)

Info

onmissing

data

provided

(1subject),no

info

onthehand

lingof

thedata

Intra-/inter-rater

Anaverageof

24.6

days

betw

eenfirst

andfollow-up

exam

ICC

(mean)

Intra:>0.89

Inter:>0.84

Cronb

ach’salph

a(in

tra-

andinter-rater

agreem

ent)

Mean0.75

Median0.77

�4/9

Other

minor

meth.

flaws

(asymp.,no

rand

omorder)

Jungeet

al.(2013)

(2different

metho

ds)

n¼39

Asymp.

(schoo

lchildren,

aged

7–8and

10–12

yrs)

Noinform

ationof

registratio

n/hand

lingof

missing

data

Inter-rater

Approx.

30min

betweentesting

sessions

%agreem

ent

74–97%

(metho

dA)

72–97%

(metho

dB)

�5/9:82%

(A),80%

(B)

Kappa

(range)

0.49–0.94

(metho

dA)

0.30–0.84

(metho

dB)

�5/9:0.64

(A),0.59

(B)

�5/9

Noothermetho

dological

flaws

Juul-K

ristensenet

al.

(2007)

n¼40

Symp.

(meanage34

yrs)

Asymp.

Noinform

ationof

registratio

n/hand

lingof

missing

data

Inter-rater

Noinform

ation

Kappa

(range)

0.34–1.00

(curr)

0.60–1.00

(curr/hist)

�5/9:0

.66(curr),0.74

(curr/hist)

�5/9

Noothermetho

dological

flaws

continued

122 AMERICAN JOURNAL OF MEDICAL GENETICS PART C (SEMINARS IN MEDICAL GENETICS) RESEARCH REVIEW

TABLEI.

(Continued)

Ref/assessm

ent

metho

dStud

ysample/po

pulatio

nRegistratio

n/hand

ling

ofmissingdata

Design

Tim

einterval

Mainresult

Cut-offscore

Metho

dologicalflaws

(meanage46

yrs)

ICC

(mean):0.91(curr/

hist)

Karim

etal.(2011)

n¼30

Con

tempo

rary

Pro

dancers(m

eanage

24yrs,range

18–32)

Noinform

ationof

registratio

n/hand

lingof

missing

data

Inter-rater

Noinform

ation/

guidelineavailable

upon

requ

est

%agreem

ent

54–100%

Kappa

(mean)

0.60

NS

Other

minor

metho

dologicalflaws

(norand

omorder,no

descriptionof

blinding

ofresults)

Mikkelsson

etal.(1996)

n¼29

(inter)

n¼13

(intra)

Asymp.

scho

olchildren(m

eanages9

and11

yrs)

Noinform

ationof

registratio

n/hand

lingof

missing

data

Intra-/inter-rater

Intrarater:with

inthe

samelesson

(atthe

beginn

ingandat

theend)

Inter-

rater:du

ring

the

samelesson

Kappa

(mean)

Inter:0.78

Intra:0.75

ICC

(mean)

Inter:0.80

Intra:0.84

�6/9

Other

minor

metho

dologicalflaws

(asymp.,no

rand

omorder,no

training

phase,

inadequate

demog

raph

icdetails)

Carter&

Wilkinson(5

items,score0–5)

Bulbena

etal.(1992)

n¼30

Symp.

(n¼20)(m

ean

age41

yrs)

Con

trols(n¼10)

(meanage48

yrs)

Info

onmissingdata

provided,no

info

onthehand

lingof

thedata

Inter-rater

Noinform

ation

Kappa

(range)

0.68–0.92

NS

Other

minor

metho

dologicalflaws

(lack

ofinfo

onrepetition,

norand

omorder,inadequate

demog

raph

icdetails)

HospitaldelMar

(9/10items,score0–9)

Bulbena

etal.(1992)

(item

testing)

n¼30

Symp.

(n¼20)

(meanage41

yrs)

Con

trols(n¼10)

(meanage48

yrs)

Info

onmissingdata

provided,no

info

onthehand

lingof

thedata

Inter-rater

Noinform

ation

Kappa

(range)

0.61–1.00

NS

Other

minor

metho

dologicalflaws

(lack

ofinfo

onrepetition,

norand

omorder,inadequate

demog

raph

icdetails)

Rot�es-Q

u�erol(11items,score0–11)

Bulbena

etal.(1992)

n¼30

Symp.

(n¼20)

(meanage41

yrs)

Con

trols(n¼10)

(meanage48

yrs)

Info

onmissingdata

provided,no

info

onthehand

lingof

thedata

Inter-rater

Noinform

ation

Kappa

(range)

0.44–0.93

NS

Other

minor

metho

dologicalflaws

(lack

ofinfo

onrepetition,

norand

omorder,inadequate

demog

raph

icdetails)

Juul-K

ristensenet

al.

(2007)

n¼40

Symp.

Noinform

ationof

registratio

n/hand

lingof

missing

Inter-rater

Noinform

ation

Kappa

(range)

0.32–0.79

(curr)

0.31–0.80

(curr/hist)

Noothermetho

dological

flaws

continued

RESEARCH REVIEW AMERICAN JOURNAL OF MEDICAL GENETICS PART C (SEMINARS IN MEDICAL GENETICS) 123

TABLEI.

(Continued)

Ref/assessm

ent

metho

dStud

ysample/po

pulatio

nRegistratio

n/hand

ling

ofmissingdata

Design

Tim

einterval

Mainresult

Cut-offscore

Metho

dologicalflaws

(3items)

(meanage34

yrs)

Asymp.

(meanage46

yrs)

data

ICC

(mean):0.83

(curr/hist)

5-partqu

estio

nnaire

(5items,score0–5)

Bulbena

etal.(2014)

5-partqu

estio

nnaire

(þ2qu

estio

ns)

n¼33

Symp.

(anx

iety)

(meanage35

yrs)

Noinform

ationof

registratio

n/hand

lingof

missing

data

Test-retest

1week

Tau-kendallindex:

0.91

ICC

(mean):0.96

�3/7

Other

minor

metho

dologicalflaws

(norand

omorder,

inadequate

description

ondemog

raph

icdetails)

Moraeset

al.(2011)

5-partqu

estio

nnaire

n¼211

Asymp.

(ages17–24

yrs)

Info

onmissingdata

implicitprovided,

thehand

lingof

the

data

canbe

dedu

ced

Intra-grou

pagreem

ent

6mon

ths

Kappa

(mean)

Q1:

0.63

Q2:

0.70

Q3:

0.65

Q4:

0.57

Q5:

0.48

�2/5

Other

minor

metho

dologicalflaws

(asymp.,no

rand

omorder,inadequate

demog

raph

icdetails,no

blinding)

BHJM

I,BeightonandHoran

JointMob

ility

Index;

NS,

notstated;approx.,approximately,Intra,intrarater;Inter,interrater;yrs,years;asym

p.,asym

ptom

atic;symp.,sym

ptom

atic;

Q,q

uestion;

cont.,continuo

us;cat.,

category;curr,currently;h

ist,h

istorically;ICC,intraclasscorrelation;

sign.,sig

nificant;rho,

rank

correlationco-efficient.

124 AMERICAN JOURNAL OF MEDICAL GENETICS PART C (SEMINARS IN MEDICAL GENETICS) RESEARCH REVIEW

practice, provided that uniformity oftesting procedures is included in testingprocedures, in addition to historicalinformation, especially in adults. How-ever, there are shortcomings on studiesfor the validity of BS, while the threeother test assessment methods lackinformation on both reliability andvalidity. For the questionnaire assess-ment methods, 5PQ was the mostfrequently used, however, only in adultpopulation studies, and the reliabilityshowed conflicting evidence. Concern-ing the validity there were shortcomingson studies for 5PQ, while for BS-self thevalidity showed unknown evidence incomparison with BS. More studies areneeded to conclude on the measure-ment properties for BS-self.

Inter-rater reliability studies on testassessment methods were most fre-quently reported on BS, with themajority showing limited positive toconflicting evidence, and thus, may beacceptable for this assessment method.This may provide useful information forclinicians and researchers, in order toestablish uniformity in carrying out theprocedures. On the other hand, it alsoshows the need for more future com-prehensive studies of this test assessmentmethod, since unclear/vague or differ-ent descriptions of the procedures forperforming the Beighton tests were used(e.g., thumbs apposition with straight orflexed elbow, knee extension in standingor supine lying). The procedures ini-tially illustrated by photos for perform-ing the Beighton tests [Beighton et al.,1973] are recommended for futureclinical use, as described in detail inthe appendix of one of the reliabilitystudies [Juul-Kristensen et al., 2007], asthey have satisfactory reliability.

Some of the studies did not includeall nine tests as recommended, whichespecially is important when definingcut-points for classifying GJH, as dis-cussed further below. Other test assess-ment methods, such as CW, RQ, andHdM showed unknown evidence onreliability in one single study [Bulbenaet al., 1992], which is too limited toconclude on.

For the questionnaire assessmentmethods, most of the studies were

TABLEII.Inform

ationFrom

theIncluded

StudiesforCOSMIN

Sco

ringin

theValidityDomain

Ref/assessm

ent

metho

dYear

Stud

ysample/po

pulatio

nRegistratio

n/hand

ling

ofmissingdata

Con

struct

validity

hypo

theses

form

ulated/directio

n/criterion

validity

(adequ

ate“gold

standard”)

Mainresults

Cut-off

points

Metho

dologicalflaws

(explicitlystated

orlack

ofinform

ation)

BSandothertests/different

BSprocedures

Bulbena

etal.

•BS

•CW

(5items)

•RQ

(11items)

1992

n¼114

(Sym

p.)

n¼59

(con

trols)

Info

onmissingdata

provided,the

hand

lingof

data

can

bededu

ced

Agreementbetw

eenBS,

CW,

andRQ

criteriato

defin

ethe

HMS

Multip

lehypo

theses

form

ulated

apriori

Pearsoncorrelation

BSvs.RQ:0.89

BSvs.CW:0.97

CW

vs.RQ:0.86%

agreem

ent

CW

�3:96–98%

BS�5

:96–98%

RQ

�5:95–96%

Different

cutpoints

aretested

Other

impo

rtant/minor

meth.

flaws(sub

ject

eligibility

criteria

inadeq.described/

lackingforthecontrol

grou

p,1trialper

measure.,no

rand

omorder,no

training

phase,

inadequate

demog

raph

icdetails,

noblinding)

Ferrariet

al.

•BS

•Low

erLimb

Assessm

ent

Score(LLA

S)(12items)

2005

1)n¼21

(hyp.)

2)n¼88

(possib

lehyp.)

3)n¼116

(con

trols)

(range:5-16

yrs)

Noinfo

onmissing

data/handling

provided

Agreementbetw

eenBS

andLL

AS

Hypothesesvague/no

tform

ulated,bu

tpo

ssible

todedu

cewhatwas

expected

%agreem

ent

BSvs.L

LAS69%

(1),80%

(3)

Pearsoncorrelation0.43

(1),

0.79

(2),0.79

(3)

�5/9

(BS)

�7/12

(LLA

S)

Other

minor

metho

dologicalflaws

(1trialpermeasure.

session,

norand

omorder,no

training

phase,

inadequate

demog

raph

icdetails,

noblinding)

Jungeet

al.

•BSmetho

dA

•BSmetho

dB

2013

n¼103

Asymp.

(schoo

lchildren,

aged

7–8and

10–12

yrs)

Info

onmissingdata

provided,the

hand

lingof

thedata

canbe

dedu

ced

Agreementbetw

eenpreval.of

GJH

ofMetho

dsA

andB

Hypothesesvague/no

tform

ulated,bu

tpo

ssible

todedu

cewhatwas

expected

McN

emar

sign

prob

ability

test

Prevalence

onMetho

dA

andBNodifference

(P¼0.54)

�5/9

(BS

AþB)

Noothermeth.

flaws

BSandROM

Erkulaet

al.

•BS

2005

n¼1273

Noinfo

onmissing

data/handling

Agreementbetw

eenjointlaxity

andtrun

krotatio

nChi2-test

BSvs.scap.

�7/9

(BS)

Other

impo

rtant/minor

meth.

flaws co

ntinued

RESEARCH REVIEW AMERICAN JOURNAL OF MEDICAL GENETICS PART C (SEMINARS IN MEDICAL GENETICS) 125

TABLEII.(Continued)

Ref/assessm

ent

metho

dYear

Stud

ysample/po

pulatio

nRegistratio

n/hand

ling

ofmissingdata

Con

struct

validity

hypo

theses

form

ulated/directio

n/criterion

validity

(adequ

ate“gold

standard”)

Mainresults

Cut-off

points

Metho

dologicalflaws

(explicitlystated

orlack

ofinform

ation)

•Trunk

rotatio

n/asym

metry

Asymp.

(meanage10.4,

range8–15

yrs)

provided

Minim

alhypo

theses

form

ulated

apriori

asym

metry:

(P¼0.008)

BSvs.left

scap.elevation:

(P¼0.028)

(inadequately

describedsubject

eligibility

criteria,

only

1trialper

measure.session,

norand

omorder,no

training

phase,

inadequate

demograph

icdetails,

noblinding)

Naalet

al.

•BS

•Fem

oroacetabu

lar

impingem

ent

(FAI)

2014

n¼55

Symp.

(meanage28.5

yrs,SD

4.1yrs)

Noinfo

onmissing

data/handling

provided

Agreementbetw

eenBSand

Hip

ROM

Hypothesesvague/no

tform

ulated,bu

tpo

ssible

todedu

cewhatwas

expected

Spearm

anscorrelation

BSvs.

Hipflex:

0.61

Int.rot:0.56

Ext.rot:0.44

(allwith

P<0.01)

�4/9

(BS)

�6/9

(BS)

Other

minor

metho

dologicalflaws

(only1trialper

measurementsession,

norand

omorder,no

training

phase,

noblinding)

Pearsallet

al.

•BS

•Knee

arthrometer

(KT-2000)•A

nkle

arthrometer

2006

n¼57

Asymp.

(meanage20.9

yrs,SD

1.45

yrs)

Noinfo

onmissing

data/handling

provided

Agreementbetw

eenBSand

knee

andanklejointspecific

laxity

Hypothesesvague/no

tform

ulated,bu

tpo

ssible

todedu

cewhatwas

expected

Spearm

anscorrelation

BSvs.

Kneelax:0.37

(P¼0.110)

A/P

anklelax:

0.21

(P¼0.152)

Int-Ext

rot

anklelax:

0.24

(P¼0.101)

NS

Other

minor

meth.

flaws

(norand

omorder,no

training

phase,

noblinding)

Sauerset

al.

•BS(4/5

items,

forw

ardflexion

notinclud

ed)

•Sho

ulder

2001

n¼51/102

Asymp.

(health

yshou

lders)

Noinfo

onmissing

data/handling

provided

Agreementbetw

eenBSand

glenoh

umeraljointlaxity

Hypothesesvague/no

tform

ulated,bu

tpo

ssible

todedu

cewhatwas

expected

Pearsons

correlation

Bsvs.instr.

APlaxscore:

(0.23,

NS)

Clin

passROM:

NS

(BS0–8)

assumed)

Other

minor

metho

dologicalflaws

(only1trialper

measure.session,

norand

omorder,no

continued

126 AMERICAN JOURNAL OF MEDICAL GENETICS PART C (SEMINARS IN MEDICAL GENETICS) RESEARCH REVIEW

TABLEII.(Continued)

Ref/assessm

ent

metho

dYear

Stud

ysample/po

pulatio

nRegistratio

n/hand

ling

ofmissingdata

Con

struct

validity

hypo

theses

form

ulated/directio

n/criterion

validity

(adequ

ate“gold

standard”)

Mainresults

Cut-off

points

Metho

dologicalflaws

(explicitlystated

orlack

ofinform

ation)

arthrometer

•Clin

icalpassive

ROM

(meanage22

yrs,

SD2.8yrs)

(0.01–0.48,NS)

training

phase,

inadequate

demograph

icdetails,

noblinding)

Smits-Engelsm

anet

al.

•BS

•Gon

iometer

2011

n¼551

Asymp.

(meanage8yrs,

range6–12

yrs)

Noinfo

onmissing

data

provided,bu

tthehand

lingof

data

canbe

dedu

ced

Agreementbetw

eenBS

and16

ROM

Hypothesesvague/no

tform

ulated,bu

tpo

ssible

todedu

cewhatwas

expected

Variance

analysis

Sign.diffin

mean

ROM

betw

een

the3BSgrou

ps(P

<0.001,

except

knee

flexP¼0.02;

hipextP¼0.06)

BS: 0–4(not

hyp)

5–6(in

cr.

hyp)

7–9(hyp)

Other

minor

metho

dologicalflaws

(only1trialper

measure.session,

norand

omorder,no

blinding)

BSandpain

El-Metwally

etal.

•BS

•Musculoskeletal

pain

2004

n¼430

Asymp.

(meanage9.8

and11.8

yrs)

Info

onmissingdata,

andhand

lingof

data

provided

Predictivebaselin

efactorsfor

persistence/recurrenceof

MSK

pain,from

childho

odtilladolescence

Hypothesesv

ague

anddirection

notform

ulated,bu

tpo

ssible

todedu

cewhatw

asexpected

Pain

recurrence

4-yr

follow-up

(GLM

odelsþ

Risk

Ratio):

(RR¼1.35

[1.08–1.68])

�6/9

Other

minor

metho

dologicalflaws

(only1trialper

measure,no

rand

omorder,no

training

phase,

inadequate

demograph

icdetails)

El-Metwally

etal.

•BS

•Low

erlim

bpain

(LLP

)

2005

n¼1284

Asymp.

(meanage9.8

and11.8

yrs)

Noinfo

onmissing

data

provided,bu

tthehand

lingof

data

canbe

dedu

ced

Predictivebaselin

efactorsfor

persistence/recurrenceof

LLP,from

childho

odtilladolescence

Hypothesesv

ague

anddirection

notform

ulated,bu

tpo

ssible

todedu

cewhatw

asexpected

LLPrecurrence

4-yr

follow-up

(GLM

odelsþ

OddsR

atio)

(OR¼2.93

[1.13–7.70])

�6/9

Other

minor

metho

dologicalflaws

(only1trialper

measure,no

rand

omorder,inadequate

demograph

icdetails)

El-Metwally

etal.

•BS

•Musculoskeletal

pain

2007

n¼1113

Asymp.

(meanage9.8

and11.8

yrs)

Info

onmissingdata

provided,the

hand

lingof

data

can

bededu

ced

Predictivebaselin

efactorsfor

incidenceofmusculoskeletal

pain,from

childho

odtill

adolescence

Hypothesesv

ague

anddirection

Incidence4-yr

follow-up

(GLM

odelsþ

OddsR

atio)

BSvs.Non

-traum

.pain

(OR:0.83[0.44–1.56])

BSvs.Traum

atic

pain

�4/9

�6/9

Other

minor

metho

dologicalflaws

(only1trialper

measure,no

rand

omorder,inadequate

continued

RESEARCH REVIEW AMERICAN JOURNAL OF MEDICAL GENETICS PART C (SEMINARS IN MEDICAL GENETICS) 127

TABLEII.(Continued)

Ref/assessm

ent

metho

dYear

Stud

ysample/po

pulatio

nRegistratio

n/hand

ling

ofmissingdata

Con

struct

validity

hypo

theses

form

ulated/directio

n/criterion

validity

(adequ

ate“gold

standard”)

Mainresults

Cut-off

points

Metho

dologicalflaws

(explicitlystated

orlack

ofinform

ation)

notform

ulated,bu

tpo

ssible

todedu

cewhatw

asexpected

(OR:0.70

[0.16–3.04])NS

demog

raph

icdetails)

Sohrbeck-N

øhret

al.

•BS

•Pain(arthralgia)

2014

n¼301

Asymp.

(median

age14

yrs,range

13-15yrs)

Info

onmissingdata

provided,described

hand

ling

Predictivebaselin

efactorsfor

incidenceof

arthralgia,from

childho

odtilladolescence

Hypothesesv

ague

anddirection

notform

ulated,bu

tpo

ssible

todedu

cewhatw

asexpected

Arthralgiaincidence4and

6-yr

follow-up(Logistic

RegressMod

elsþ

Odds

Ratio)

BS�5

/9(O

R:3.00,[0.94–9.60])

�4/9

�5/9

�6/9

Other

minor

metho

dologicalflaws

(norand

omorder)

Tobias

etal.

•BS

•Musculoskeletal

pain

2013

n¼2901

Asymp.

(meanage

13.8

yrs)

Info

onmissingdata

provided,the

hand

lingof

data

can

bededu

ced

Predictivebaselin

efactorsfor

incidenceof

arthralgia,from

childho

odtilladolescence

Hypothesesv

ague

anddirection

notform

ulated,bu

tpo

ssible

todedu

cewhatw

asexpected

Pain

association4-yr

follow-up(Logistic

Regress

Mod

elsþ

OddsRatio)

Shou

lder

(OR1.68

[1.04–2.72])Knee

(OR

1.83

[1.10–3.02])

Ank

le/foo

t(O

R1.82

[1.05 –3.16])

�6/9

Other

minor

metho

dological

flaws(on

ly1trialper

measure,no

rand

omorder,no

blinding)

BSandinjuries

Cam

eron

etal.

•BS

•Gleno

humeral

instability

(GI,

historic

info)

2010

n¼714(soldiers,

meanage18.8

yrs,SD

1yr)

Info

onmissingdata

provided,described

hand

ling

Relationshipbetw

eenBSand

GIHypothesesvagueand

directionno

tform

ulated,b

utpo

ssible

todedu

cewhatwas

expected

GIassociation(Log

istic

RegressMod

elsþ

Odds

Ratio)

(OR

2.48

[1.19–5.20])

�2/9

Other

minor

metho

dologicalflaws:

(1trialpermeasure,

norand

omorder)

Chahalet

al.

•HdM

tests

(10items)þext.

rot>85°

(GLL

þER)

•Primaryanterior

shou

lder

dislo

catio

n(ASD

)

2010

n¼57(cases)/

92(con

trols)

(meanage24

yrs,

SD:0.32)

Noinfo

onmissing

data

provided,the

hand

lingof

data

can

bededu

ced

Predictiverisk

factor

ofGLL

þER

forAID

Minim

alnu

mberof

hypo

theses

form

ulated,expected

directionof

correlationstated

AID

association

(OddsRatio)

All(O

R:2.79

[1.27–6.09])

GLL

(OR:3.6

[1.49–8.68])

GLL

þER

Men:(O

R:7.43

[2.13–25.57])

Men:

�4/10

Wom

en:

�5/10

Other

minor

metho

dologicalflaws

(1trialpermeasure,

norand

omorder,

inadequate

demog

raph

icdetails,

noblinding) continued

128 AMERICAN JOURNAL OF MEDICAL GENETICS PART C (SEMINARS IN MEDICAL GENETICS) RESEARCH REVIEW

TABLEII.(Continued)

Ref/assessm

ent

metho

dYear

Stud

ysample/po

pulatio

nRegistratio

n/hand

ling

ofmissingdata

Con

struct

validity

hypo

theses

form

ulated/directio

n/criterion

validity

(adequ

ate“gold

standard”)

Mainresults

Cut-off

points

Metho

dologicalflaws

(explicitlystated

orlack

ofinform

ation)

GLL

(OR:6.75

[1.92–23.36])

Jungeet

al.

•BS;

Knee

hyperm

obility

(KH)

•Kneeinjuries

bySM

S-track(KI)

2015

n¼999

(schoo

lchildren,

9–14

yrs)

Info

onmissingdata

provided,described

hand

ling

Predictiverisk

factor

ofBSþKH

forKIMinim

alnu

mberof

hypo

theses

form

ulated,expected

directionof

correlationstated

KIassociation

(Logistic

Regress

mod

elsþ

OddsRatio)

Traum

atic

Inj(O

R:1.56

[0.43–5.61])BS

Traum

aticInj(O

R:2.22

[0.60–8.19])BSþKH

�5/9

Other

minor

metho

dologicalflaws

(1trialpermeasure,

norand

omorder,no

training

phase)

Rou

ssel

etal.

•BSþgoniom

eter

•Pelvicinjuries

(PI),

low

back

pain

(LBP)

2009

n¼32

(dancers)

n¼26

(students)

(mean20

yrs,

SD:2yrs)

Info

onmissingdata

provided,described

hand

ling

Predictiverisk

factor

ofBSfor

PIandLB

PHypothesesv

ague

anddirection

notform

ulated,bu

tpo

ssible

todedu

cewhatw

asexpected

PI/L

BP(Spearman

correlation:

(Rho

¼�0.03;

P¼0.89)PI,LB

P

BS: 0–3(tight)

4–6(hyp)

7–9(extr.

hyp)

Other

impo

rtant/minor

meth.

flaws(inadequately

describedsubject

eligibility

criteria,1

trialpermeasure,no

rand

omorder,no

training

phase,

inadequate

demog

raph

icdetails,

noblinding)

BSanddiseases

Hirschet

al.

•BS

•Tem

poro-

mandibu

lar

sympt

(TMs)

2008

n¼895

(meanage

40.6

yrs,SD

:11.6

yrs)

Noinfo

onmissing

data

provided,bu

tthehand

lingof

data

canbe

dedu

ced

Agreementbetw

eenBS

andTMs

Hypothesesvagueand

directionno

tform

ulated,b

utpo

ssible

todedu

cewhatwas

expected

Associatio

n(M

ultip

lelog.

Regress.

Mod

elsþ

OddsRatio)

Jointclick,

redu

ced

ROM

(OR:1.56

[1.01–2.39])

Category

0(BS¼0)

1(BS¼

1–3)

2(BS¼4–9)

Other

minor

metho

dologicalflaws

(only1trialper

measure.session,

norand

omorder,no

training

phase,

inadequate

continued

RESEARCH REVIEW AMERICAN JOURNAL OF MEDICAL GENETICS PART C (SEMINARS IN MEDICAL GENETICS) 129

TABLEII.(Continued)

Ref/assessm

ent

metho

dYear

Stud

ysample/po

pulatio

nRegistratio

n/hand

ling

ofmissingdata

Con

struct

validity

hypo

theses

form

ulated/directio

n/criterion

validity

(adequ

ate“gold

standard”)

Mainresults

Cut-off

points

Metho

dologicalflaws

(explicitlystated

orlack

ofinform

ation)

demograph

icdetails,

noblinding)

Nijs

etal.

•BS

•Chron

icfatig

uesynd

rome(C

FS)

2004

n¼44

Symp.

(CFS

)(m

eanage40.2

yrs,SD

:9.11)

Info

onmissingdata

provided,bu

tthe

hand

lingof

data

can

bededu

ced

Associatio

nbetween

widespreadpain

inpatients

with

CFS

andBS

Multip

lehypo

theses

form

ulated

apriori,d

irectio

nandbu

tmagnitude

notstated

Associatio

nbetw

eenBSand

CFS CFS

:63.6%

Non

-CFS

:36.5%

(Fischersexacttest:

P¼0.73)

CFS

-APQ

1:P¼0.75

CFS

-APQ

2:P¼0.69

(NS)

�4/9

Other

minor

metho

dologicalflaws

(only1trialper

measure,no

rand

omorder,no

training

phase,

inadequate

demograph

icdetails,

noblinding)

Terziet

al.

•BS

•Adh

esive

capsulitis(AC)

oftheshou

lder

•Sub

acromial

impingem

ent

synd

rome(SIS)

2013

n¼240

(120

with

AC,

120controls/SIS)

(meanage

53.9–54.08yrs,

SD:8

.21–10.68)

Noinfo

onmissing

data

provided,bu

tthehand

lingof

data

canbe

dedu

ced

Difference

inGJH

(BS)

inAC

vs.SIS

Minim

alnu

mberof

hypo

theses

form

ulated,expected

direction,

butno

tmagnitude

ofcorrelationstated

Chi-squ

are

GJH

inAC

vs.SIS:

(0.8%

vs.7.5%

,P¼0.010)

NS

Other

minor

metho

dologicalflaws

(only1trialper

measure,no

rand

omorder,no

training

phase,

noblinding)

Questionn

aires(5-part)andothertests

Bulbena

etal.

•5PQ

(þ2

questio

ns)

•HdM

(10items)

2014

n¼191pt’s

(potential

anxiety,mean

age35.5

yrs,

Noinfo

onmissing

data/handling

provided

Agreementbetw

een5P

Qþ2Q

andHDM

Hypothesesvagueanddirection

notform

ulated,bu

tpo

ssible

todedu

cewhatw

asexpected

Spearm

anscorrelation

5PQþ2

QandHdM

:(rho

¼0.75;P<0.001)

5PQþ2Q

and5P

Q:

(rho

¼0.86;P<0.001)

�3/7

Other

impo

rtant/minor

meth.

flaws(in

adeq.

describedsubject

eligibility

criteria,1

trialpermeasure,no

continued

130 AMERICAN JOURNAL OF MEDICAL GENETICS PART C (SEMINARS IN MEDICAL GENETICS) RESEARCH REVIEW

TABLEII.(Continued)

Ref/assessm

ent

metho

dYear

Stud

ysample/po

pulatio

nRegistratio

n/hand

ling

ofmissingdata

Con

struct

validity

hypo

theses

form

ulated/directio

n/criterion

validity

(adequ

ate“gold

standard”)

Mainresults

Cut-off

points

Metho

dologicalflaws

(explicitlystated

orlack

ofinform

ation)

SD:10.18yrs)

rand

omorder,no

training

phase,

inadeq.

demog

raph

icdetails,

noblinding)

Moraeset

al.

•5PQ

(transl.)

•BS

2011

n¼394

(university

stud

ents,age

17–24

yrs)

Info

onmissingdata

provided,the

hand

lingcanbe

dedu

ced

Agreementbetw

een5P

QandBS

Hypothesesv

ague

anddirection

notform

ulated,bu

tpo

ssible

todedu

cewhatw

asexpected

Assum

able

that

thecriterion

used

(BS)

canbe

considered

areason

able

“gold

standard”

%agreem

ent73.6%

Sensitivity

70.9

(62.1–78.6)

Specificity

77.4

(71.4–82.6)

�2/5

(5PQ

)

�4/9

(BS)

Other

minor

metho

dologicalflaws

(norand

omorder,

inadequate

demog

raph

icdetails,

noblinding)

5-partQ

andpain

Mulveyet

al.

•5-part

questio

nnaire

•Chron

icwidespreadpain

(CWP)

2013

n¼12,354

(pop

ulation,

medianage55

yrs,range

25–107yrs)

Info

onmissingdata

provided,

registratio

n/hand

lingdescribed

Associatio

nbetw

eenJH

and

musculoskeletalpain

Minim

alnu

mberof

hypo

theses

form

ulated,expected

directionof

correlationstated

Relativerisk

ratio

JHandCWP

gradeI:

(RRR

1.2;

P¼0.03

–mod

estass.)

gradeII:

(RRR

1.2,

P¼0.1–mod

estass.,

notstat.sig

n.)

gradeIII/IV:

(RRR

1.4;

P¼<0.001)

�2/5

Other

minor

metho

dologicalflaws

(1trialpermeasure.,

norand

omorder,no

training

phase,

inadequate

demog

raph

icdetails,

noblinding)

5-partQ

anddiseases

Hakim

andGrahame

•5PQ

2003

n¼269

(212

BJH

S,57

controls)

n¼220

(170

BJH

S,50

controls)

(range

Noinfo

onregistratio

n/hand

lingof

missing

data

Agreementbetw

een5P

QandBJH

SHypothesesvagueanddirection

notform

ulated,bu

tpo

ssible

todedu

cewhat

Sens.(mean)

77-85%

(1stand

2ndcoho

rt)Spec.80–89%

(1stand2n

dcoho

rt)

�2/5

Other

minor

metho

dologicalflaws

(1trialpermeasure.,

norand

omorder,no

training

phase, continued

RESEARCH REVIEW AMERICAN JOURNAL OF MEDICAL GENETICS PART C (SEMINARS IN MEDICAL GENETICS) 131

TABLEII.(Continued)

Ref/assessm

ent

metho

dYear

Stud

ysample/po

pulatio

nRegistratio

n/hand

ling

ofmissingdata

Con

struct

validity

hypo

theses

form

ulated/directio

n/criterion

validity

(adequ

ate“gold

standard”)

Mainresults

Cut-off

points

Metho

dologicalflaws

(explicitlystated

orlack

ofinform

ation)

16–80

yrs)

was

expected

Assum

able

that

thecriterion

used

(BS)

canbe

considered

areason

able

“gold

standard”

inadequate

demograph

icdetails,

noblinding)

Sancheset

al.

•5PQ

•Anx

iety

2014

n¼2300

(university

stud

ents,mean

age21

yrs,

SD:3.25)

Info

onmissingdata

provided,

registratio

n/hand

lingcanbe

dedu

ced

Associatio

nbetw

eenBeck

Anx

iety

Inventory(BAI)

and5P

QHypothesesvagueanddirection

notform

ulated,bu

tpo

ssible

todedu

cewhatw

asexpected

Pearsons

correlation

5PQ

andBAI

totalscore

(Wom

en:r¼

0.11,

P¼0.007,

weakass.)

(Men:r¼

0.04,

(P¼0.450),

notsig

n.ass.)

�2/5

Other

minor

metho

dologicalflaws

(1trialper

measurement,no

rand

omorder,no

training

phase,

inadequate

demograph

icdetails,

noblinding)

BSself-repo

rted

Naalet

al.

•BS

•BS-self(from

paperdraw

ings)

•Fem

oroacetabu

lar

impingem

ent

(FAI)

2014

n¼55

Symp.

(diagnosed

with

FAI)

(meanage

28.5

yrs,

SD:4.1yrs)

Info

onmissingdata

provided,

registratio

n/hand

lingcanbe

dedu

ced

Agreementbetw

eenBS-self

andBS/agreem

entbetw

een

BSandHip

ROM

Hypothesesvagueanddirection

notform

ulated,bu

tpo

ssible

todedu

cewhatw

asexpected

Kappa-values(L

w)

Total0.82

(0.72–0.91)

Single

testsrange:

0.61–0.96

Spearm

anscorrelationBSvs.

Hipflex:

0.61

Int.rot:0.56Ext.rot:0.44

(allwith

P<0.01)

�4/9

(BS)

�6/9

(BS)

Other

minor

metho

dologicalflaws

(1trialpermeasure,

norand

omorder,no

training

phase,

noblinding)

BS,BeightonScale;CW,C

arter&Wilkinson;

HDM,H

ospitaldelMar;R

Q,R

ot�es-Q

u�erol;LL

AS,Lo

wer

LimbAssessm

entS

core;5PQ

,5-partq

uestionn

aire;m

easure,m

easurement;

meth,

metho

dological;hyp,hyperm

obile;G

JH,generalized

jointh

ypermob

ility;H

MS,hyperm

obility

synd

rome;GLL

,generalized

ligam

entous

laxity;JH,joint

hyperm

obility;B

JHS,

benign

jointhypermob

ilitysynd

rome;MSK

,musculoskeletal;scap,scapular;lax,laxity

;ROM,range

ofmotion;clin,clin

ical;pass,passive;incr,increased;diff,difference;sign,sig

nificant;

ass,association;traum,traum

atic;transl.,translatio

n;inadeq,inadequ

ate;ER,externalrotation;ASD

,Primaryanterior

shou

lderdislo

catio

n;KH,kneehyperm

obility;K

I,kn

eeinjuries;P

I,pelvic

injuries;LB

P,low

back

pain;TMs,Tempo

romandibu

larsymptom

s;CFS,chronicfatig

uesynd

rome;

AC,adhesiv

ecapsulitis;SIS,

subacrom

ialim

pingem

entsynd

rome;

CWP,

chronicwidespreadpain;BAI,BeckAnx

iety

Inventory;

FAI,femoroacetabu

larim

pingem

ent;A/P,anterior/posterior;SD

,standard

deviation;

OR,od

dsratio

;RR,risk

ratio

;sens,

sensitivity;spec,specificity;NS,

non-sig

nificant;GLm

odel,g

enerallin

earregressio

nmod

el.

132 AMERICAN JOURNAL OF MEDICAL GENETICS PART C (SEMINARS IN MEDICAL GENETICS) RESEARCH REVIEW

TABLEIII.

COSMIN

Sco

ringoftheMethodologicalQualityofEachStudyonMeasuremen

tProperties

(Reliabilityan

dValidity)

Reliability

Validity

Assessm

ent

metho

ds/

reference

Stud

y(n)/

age

Design/result

Cut-offscore

COSM

INscore

Reference

Stud

y(n)/age

Design/result

Cut-off

score

COSM

INscore

Clin

icaltests

Carter&

Wilkinson(0–5)

(4bilateralBStestsþ

1ankletest)

Bulbena

etal.

(1992)

n¼30

(meanage

41yrs.

[sym

p.],

48yrs.

[con

trol])

Kappa:

substantial/

almostperfect

–Po

or (onlyon

emeasurement,

lackingof

info

abou

tsubject

eligibility

criteria)

Hyp.test.

(CW

vs.RQ/B

S)Pearsoncorr.

High(C

Wvs.RQ)

Very

high

(CW

vs.BS)

%agreem

ent

CW

�3:96–98%

–Po

or��

(other

imp.meth.

flaws:

subjecteligibility

criteriainadeq.

described/lackingfor

thecontrolgrou

p)

Beighton(0–9)

Erkulaet

al.

(2005)

n¼50

(meanage

10yrs)

Spearm

ansrho:

Mod

erate

(intra)

High(in

ter)

�7/9

(classif.

assig

nificant

joint

laxity)

Poor

(onlyon

emeasurement)

Hyp.test.

(BSvs.ROM)

Chi

2 -test:(P

¼0.008)

BSvs.scap.

asym

metry

(P¼0.028)

BSvs

leftscap.

elevation

�7/9

Poor (noinfo

onthe

measurementprop.of

thecomparatorinstr.,

otherim

portantmeth.

flaws-lackingof

info

abou

tsubjecteligibility

criteria)

Hirschet

al.

(2007)

n¼50

(meanage

38yrs,

range

20–60)

ICC:Excellent

(intra/inter)

–Po

or�

(onlyon

emeasurement)

Hirschet

al.

(2008)

n¼895

(meanage

40.6

yrs,

SD:11.6

yrs)

Hyp.test

(BSvs.TMDs)

OddsRatio:

Jointclick

(OR:1.56

(1.01–2.39)

Com

posite

scores

categoris.:

(0:cat.1)

(1–3:

cat.2)

(4–9:

cat.3)Fa

ir

Jungeet

al.

n¼39

Kappa:

�5/9

Poor

�Jungeet

al.

n¼109

Hyp.test

�5/9

Fair

continued

RESEARCH REVIEW AMERICAN JOURNAL OF MEDICAL GENETICS PART C (SEMINARS IN MEDICAL GENETICS) 133

TABLEIII.

(Continued)

Reliability

Validity

Assessm

ent

metho

ds/

reference

Stud

y(n)/

age

Design/result

Cut-offscore

COSM

INscore

Reference

Stud

y(n)/age

Design/result

Cut-off

score

COSM

INscore

(2013)

(2different

BS

metho

ds)

(aged

7–8

and

10–12

yrs)

mod

erate/

substantial,

substantial/

almost

perfect

(metho

dA)

Fair/m

oderate,

substantial/

almostperfect

(metho

dB)

�5/9:sub

stantial

(A),mod

erate

(B)

(onlyon

emeasurement)

(2013)

Jungeet

al.

(2015)

n¼999

(range

9–14

yrs)

(BSvs.BSAþB)

McN

emar

sign

prob

ability

test

Prevalence

onMetho

dA

andB:

Nodifference

(P¼0.54)

Hyp.test.

(BSvs.injuries)

OddsRatio:

Traum

atic

inj

(OR:1.56

[0.43–5.61])

�5/9

Poor

��

(inadequate

info

onthe

measurement

prop

ertiesof

the

comparator

instrument)

Juul- Kristensen

etal.(2007)

n¼40

(meanage

34yrs

[sym

p.],

46yrs

[asymp.])

Kappa:

substantial

(curr),

substantial

(curr/hist)

ICC:Excellent

(curr/hist)

�5/9

Poor

(onlyon

emeasurement)

Sohrbeck-

Nøh

retal.

(2014)

n¼301

(meanage

14yrs,range

13–15

yrs)

Hyp.test.

(BSandarthralgia):

OddsRatio:

BS�5

/9(O

R:3.00,

[0.94–9.60])

–Fair

Mikkelsson

etal.(1996)

n¼29 (inter)

n¼13

(intra)

(mean

Kappa:

Substantial

(intra/inter)

�6/9

Poor

(only1

measure.,

smallsample

size,

time

intervalno

tapprop

riate)

El-Metwally

etal.(2004)

n¼430

(meanage

9.8and

11.8

yrs)

Hyp.test.

(BSandpain)

Risk

Ratio:

(RR

¼1.35

[1.08–1.68])

�6/9

Fair

continued

134 AMERICAN JOURNAL OF MEDICAL GENETICS PART C (SEMINARS IN MEDICAL GENETICS) RESEARCH REVIEW

TABLEIII.

(Continued)

Reliability

Validity

Assessm

ent

metho

ds/

reference

Stud

y(n)/

age

Design/result

Cut-offscore

COSM

INscore

Reference

Stud

y(n)/age

Design/result

Cut-off

score

COSM

INscore

ages

9and11

yrs)

El-Metwally

etal.(2005)

El-Metwally

etal.(2007)

n¼1,284

(meanage

9.8and

11.8

yrs)

n¼1,113

(meanage

14yrs,

range

13–15

yrs)

OddsRatio:

(OR

¼2.93

[1.13–7.70])

OddsRatio:

BSvs.no

n-traum.

pain

(OR:0.83

[0.44–1.56])NS

Traum

atic

pain

(OR:

0.70

[0.16–3.04])NS

�6/9

�6/9

Fair

Fair

Aslanet

al.

(2006)

(BHJM

I)

n¼72

(meanage

20yrs,

range

18–25)

ICC:Excellent

(intra/inter)

Com

posite

scores

categorized

(0–2:

cat.1)

(3–4:

cat.2)

(5–9:

cat.3)

Poor

(other

impo

rtant

meth.

flaws,

inadequate

statistics

applied)

––

––

Boyle

etal.

(2003)

(BHJM

I)

n¼36/42

(meanage

25yrs,

range

15–45)

Spearm

ansrho:

Excellent

(intra/inter)

Com

posite

scores

categorized

(0–2:

cat.1)

(3–4:

cat.2)

(5–9:

cat.3)

Poor

(tim

eintervalno

tapprop

riate,

inadequate

statistics

applied)

––

––

Bulbena

etal.

(1992)

n¼30

(meanage

41yrs.

[sym

p.],

48yrs.

Kappa:

Substantial/

almostperfect

–Po

or (onlyon

emeasurement,

lackingof

info

abou

tsubject

eligibility

Hyp.test.

(BSvs.CW/R

Q)

Pearsons

corr.

Very

high

(BSvs.CW)

High(BSvs.RQ)

–Po

or��

(other

imp.

meth.

flaws:subjecteligibility

criteriainadeq.

described/lackingfor

thecontrolgrou

p)continued

RESEARCH REVIEW AMERICAN JOURNAL OF MEDICAL GENETICS PART C (SEMINARS IN MEDICAL GENETICS) 135

TABLEIII.

(Continued)

Reliability

Validity

Assessm

ent

metho

ds/

reference

Stud

y(n)/

age

Design/result

Cut-offscore

COSM

INscore

Reference

Stud

y(n)/age

Design/result

Cut-off

score

COSM

INscore

[con

trol])

criteria)

Hansenet

al.

(2002)

(4/5

tests,no

5th

finger)

n¼100

(range

9–13

yrs)

Kappa:

Mod

erate/

substantial,

substantial/

almostperfect

(experts)Fair

(inexp.)

–Po

or (only1

measurem.,

testcond

ition

sno

tsim

ilar)

––

––

Hicks

etal.

(2003)

n¼63

(meanage

36yrs,

range

20–66)

ICC:Fair/goo

d,good

/excellent

–Po

or�

(onlyon

emeasurement)

––

––

Karim

etal.

(2011)

n¼30

(meanage

24yrs,

range

18–32)

Kappa:

Mod

erate

–Po

or�

(onlyon

emeasurement)

––

––

––

––

–Cam

eron

etal.

(2010)

n¼714

(meanage

18.8

yrs,

SD:1yr)

Hyp.test.

(BSvs.injuries)

OddsRatio

GJI(O

R2.48

[1.19–

5.20])

(P¼0.16)

�2/9

(95th

percentile)

Fair

––

––

–Ferrariet

al.

(2005)

n¼21(1)/88

(2)/116

(3)

Hyp.test.

(BSvs.othertests)

%agreem

ent

BSvs.LL

AS:

�5/9

Fair

continued

136 AMERICAN JOURNAL OF MEDICAL GENETICS PART C (SEMINARS IN MEDICAL GENETICS) RESEARCH REVIEW

TABLEIII.

(Continued)

Reliability

Validity

Assessm

ent

metho

ds/

reference

Stud

y(n)/

age

Design/result

Cut-offscore

COSM

INscore

Reference

Stud

y(n)/age

Design/result

Cut-off

score

COSM

INscore

(range:

5–16

yrs)

69%

(1),80%

(3)

Pearsoncorrelation:

Low

(1),High

(2),High(3)

––

––

–Naalet

al.

(2014)

n¼55

(meanage

28.5

yrs,

SD 4.1yrs.)

Hyp.test.

(BSvs.Hip

ROM)

Spearm

anscorrelationBS

vs.Hipflex:

0.61

Int.

rot:0.56

Ext.rot:0.44

(allwith

P<0.01)

�4/9

(BS)

�6/9

(BS)

Poor

� (inadequate

info

onthemeasurementprop.

ofcomparatorinstr.)

––

––

–Nijs

etal.

(2004)

n¼44

(meanage

40.2

yrs,

SD:9.11)

Hyp.test.

(BSvs.CFS

)Associatio

nbetw

eenBS

andCFS

CFS

:63.6%

Non

-CFS

:36.5%

(Fischersexacttest:

P¼0.73)

CFS

-APQ

1:P¼0.75

CFS

-APQ

2:P¼0.69

(non

-sign)

�4/9

Fair

––

––

–Pearsallet

al.

(2006)

n¼57

(meanage

20.9

yrs,

SD 1.45

yrs)

Hyp.test

(BSvs.ROM)

Spearm

anscorrelation

BSvs.Kneelax:0.37

(P¼0.110)

AP

anklelax:

0.21,

P¼0.152IE

anklelax:

0.24,

P¼0.101

–Fair

continued

RESEARCH REVIEW AMERICAN JOURNAL OF MEDICAL GENETICS PART C (SEMINARS IN MEDICAL GENETICS) 137

TABLEIII.

(Continued)

Reliability

Validity

Assessm

ent

metho

ds/

reference

Stud

y(n)/

age

Design/result

Cut-offscore

COSM

INscore

Reference

Stud

y(n)/age

Design/result

Cut-off

score

COSM

INscore

––

––

–Rou

sseletal.

(2009)

n¼32/26

(mean20

yrs,SD

:2

yrs)

Hyp.test.

(BSvs.injuries)

PI/L

BP(Spearman

correlation:

(Rho

¼�0

.03;

P¼0.89)PI,LB

P

Com

posite

scores

subgr.:

(0–3:

gr.1,

tight)(4–6:

gr.2,h

yp.)

(7–9:gr.3,

extr.h

yp.)

Poor

(noinfo

onthe

measurementprop.of

comparatorinstr.,

otherim

portantmeth.

flaws-lackingof

info

abou

tsubjecteligibility

criteria)

––

––

–Sauerset

al.

(2001)

n¼51/102

(meanage

22yrs,SD

2.8yrs)

Hyp.test.

(BSvs.ROM)

Pearsons

correlation

BSvs.instrumented

APlaxscore:

(0.23,

Non

-S)

Clin

passROM:(0.01

–0.48,Non

-S)

–Fair

––

––

–Sm

its-

Engelsm

anet

al.

(2011)

n¼551

(meanage8

yrs,range

6–12

yrs)

Hyp.test.

(BSvs.ROM)

Variance

analysis:

Sign.diffin

mean

ROM

betw

eenthe

3BSgrou

ps(P

<0.001,

except

knee

flexP¼0.02;

hipextP¼0.06)

�7/9

Poor (nodescriptionof

the

constructsmeasuredby

thecomparatorinstr.,

noinfo

onthe

measurementprop.)

––

––

–Terziet

al.

(2013)

n¼240

Hyp.test.

(BSvs.AC)

–Po

or��

(noinfo

onthe

continued

138 AMERICAN JOURNAL OF MEDICAL GENETICS PART C (SEMINARS IN MEDICAL GENETICS) RESEARCH REVIEW

TABLEIII.

(Continued)

Reliability

Validity

Assessm

ent

metho

ds/

reference

Stud

y(n)/

age

Design/result

Cut-offscore

COSM

INscore

Reference

Stud

y(n)/age

Design/result

Cut-off

score

COSM

INscore

(meanage

53.9–

54.08yrs,

SD:8.21–

10.68yrs)

Chi-squ

areGJH

inAC

vs.SIS:

(0.8%

vs.7.5%

,P¼0.010)

measurementprop.of

thecomparatorinstr.)

––

–To

bias

etal.

(2013)

n¼2901

(meanage

13.8

yrs)

Hyp.test.(BSandpain)

OddsRatio:Sh

oulder

(OR

1.68

[1.04–

2.72])Knee(O

R1.83

[1.10–3.02])Ank

le/

foot

(OR

1.82

[1.05–

3.16])

�6/9

Poor

��

(noinfo

onthe

measurementprop.of

thecomparatorinstr.)

Rot�es-Q

u�erol(0–11)

Bulbena

etal.

(1992)

n¼30

(meanage

41yrs.

[sym

p.],

48yrs.

[con

trol])

Kappa:

Mod

erate/

substantial,

substantial/

almostperfect

–Po

or(onlyon

emeasurement,

lackingof

info

abou

tsubject

eligibility

criteria)

Hyp.test.

(RQ

vs.

CW/B

S)

PearsoncorrelationHigh

(RQ

vs.BS)

High

(RQ

vs.CW)

–Po

or��

(other

imp.

meth.flaws:

subjecteligibility

criteriainadeq.

described/lackingfor

thecontrolgrou

p)

Juul- Kristensen

etal.(2007)

(3tests)

n¼40

(meanage

34yrs

[sym

p.],

46yrs

[asymp.])

Kappa:Fair/

mod

erate,

mod

erate/

substantial

(curr)fair/

mod

erate,

substantial/

almostperfect

(curr/hist)

ICC:

Excellent

–Po

or�

(onlyon

emeasurement)

––

––

continued

RESEARCH REVIEW AMERICAN JOURNAL OF MEDICAL GENETICS PART C (SEMINARS IN MEDICAL GENETICS) 139

TABLEIII.

(Continued)

Reliability

Validity

Assessm

ent

metho

ds/

reference

Stud

y(n)/

age

Design/result

Cut-offscore

COSM

INscore

Reference

Stud

y(n)/age

Design/result

Cut-off

score

COSM

INscore

(curr/hist)

HospitaldelMar

(0–10)

Design

Bulbena

etal.

(1992)

n¼30

(meanage

41yrs.

[sym

p.],

48yrs.

[con

trol])

Kappa:

Substantial/

almostperfect

–Po

or(onlyon

emeasurement,

lackingof

info

abou

tsubject

eligibility

criteria)

––

––

Chahalet

al.

(2010)

n¼57/92

(Meanage

24yrs,

SD:0.32)

––

–Hyp.test.

(HdM

and

injuries)

OddsRatio:

AllGLL

�(O

R:2.79

[1.27–6.09])

GLL

þER�

(OR:3.6

[1.49–8.68])

Men G

LL�(O

R:7.43

[2.13–25.57])

GLL

þER�[O

R:

6.75

(1.92–23.36])

�4/10

(men)

�5/10

(wom

en)

Fair

Questionn

aires

5-partqu

estio

nnaire

(0–5)

Bulbena

etal.

(2014)

(þ2Q

)

n¼33

(meanage

35yrs)

ICC:Excellent

Tau-kendall

index:

Very

high

corr.

� 3/7

Poor

� (onlyon

emeasurement)

Hyp.test.

(5PQ

vs.

HdM

)

Spearm

anscorrelation

5PQþ2

QandHDM:

(rho

¼0.75;

P<0.001)

5PQþ2Q

and5P

Q:

(rho

¼0.86;

P<0.001)

�3/7

Poor

��

(noinfo

onthe

measurementprop.of

thecomparatorinstr.)

continued

140 AMERICAN JOURNAL OF MEDICAL GENETICS PART C (SEMINARS IN MEDICAL GENETICS) RESEARCH REVIEW

TABLEIII.

(Continued)

Reliability

Validity

Assessm

ent

metho

ds/

reference

Stud

y(n)/

age

Design/result

Cut-offscore

COSM

INscore

Reference

Stud

y(n)/age

Design/result

Cut-off

score

COSM

INscore

Moraeset

al.

(2011)

n¼211

(range

17–24

yrs)

Kappa:

Substantial

(Q1,

Q2,

Q3)

Mod

erate

(Q4,

Q5)

�2/5

Poor

(onlyon

emeasurement)

n¼394

Hyp.test.

(5PQ

vs.

BS)

Criterion

validity

%agreem

ent73.6%

Sens.(m

ean)

0.69

Spec.(m

ean)

0.75

�2/5

Fair (other

minor

meth.

flaws:inadequate

descriptionon

demog

raph

icdetails,

noblinding)

Fair (o

ther

minor)

––

––

Hakim

and

Grahame

(2003)

n¼212

(range

15–80

yrs)

Hyp.test.

(5PQ

vs.

BJH

S)

Criterion

validity

Sens.(m

ean)

77–85%

(1stand2n

dcoho

rt)

Spec. 80–89%

(1stand2n

dcoho

rt)

�2/5

Poor

��

(noinfo

onthe

measurementprop.of

thecomparatorinstr.)

Fair (o

ther

minor

meth.

flaws:inadequate

descriptionon

demog

raph

icdetails,

nodescriptionof

blinding)

––

––

Mulveyet

al.

(2013)

n¼2,354

(medianage

55yrs,

range

25–107yrs)

Hyp.test.

(5PQ

vs.

CWP)

Relativerisk

ratio

:JH

andCWP

gradeI:(R

RR

1.2;

P¼0.03

–mod

est

ass.)

gradeII:(R

RR

1.2,

P¼0.1–mod

est

ass.,

notstat.sig

n.)

gradeIII/IV:(R

RR

1.4;

P¼<0.001)

�2/5

Fair

continued

RESEARCH REVIEW AMERICAN JOURNAL OF MEDICAL GENETICS PART C (SEMINARS IN MEDICAL GENETICS) 141

TABLEIII.

(Continued)

Reliability

Validity

Assessm

ent

metho

ds/

reference

Stud

y(n)/

age

Design/result

Cut-offscore

COSM

INscore

Reference

Stud

y(n)/age

Design/result

Cut-off

score

COSM

INscore

––

––

Sancheset

al.

(2014)

n¼2300

(meanage21

yrs,SD

:3.25)

Hyp.test.

(5PQ

vs.

anxiety

disease)

Pearsons

correlation

5PQ

andBAItotal

score(W

omen:

r¼0.11,P¼0.007,

weakass)(M

en:

r¼0.04,P¼0.450,

notsig

nass)

�2/5

Fair

Beightonself-repo

rted

(0–9)

––

––

Naaletal.(2014)

n¼55

(meanage

28.5,SD

:4.1yrs)

Hyp.test.

(BSs-rvs

BS)

Kappa-values(L

w)To

tal:

0.82

(0.72–0.91)

Single

tests(range):

0.61–0.96

�4/9

Poor

��

(noinfo

onthe

measurementprop.of

thecomparatorinstr.)

BS,BeightonScore;CW,C

arter&Wilkinson;

HDM,H

ospitaldelMar;R

Q,R

ot�es-Q

u�erol;LL

AS,Lo

wer

LimbAssessm

entS

core;5PQ

,5-partq

uestionn

aire;m

easure,m

easurement;

meth,

metho

dological;prop,prop

erties;GJH

,generalized

jointhyperm

obility,HMS,

hyperm

obility

synd

rome;

GLL

,generalized

ligam

entous

laxity;JH

,jointhyperm

obility;BJH

S,benign

jointhypermob

ilitysynd

rome;hyp,hypo

thesis;

instr,instruments;scap,scapular;lax,laxity

;ROM,range

ofmotion;incr,increased;s-r,self-repo

rted;p,persistent;r,recurrent;inc,

incidence;classif,classified;cat,categorized;diff,difference;sign,sig

nificant;traum,traum

atic;G

JI,gleno

humeraljointinstability;ER,externalrotation;ASD

,Primaryanterior

shou

lder

142 AMERICAN JOURNAL OF MEDICAL GENETICS PART C (SEMINARS IN MEDICAL GENETICS) RESEARCH REVIEW

reported on 5PQ, which shows to be apromising assessment method for futurepopulation studies, where differentdomains of validity on GJH thus, canbe studied more carefully. The addi-tional questionnaire, BS-self, may alsoseem promising, as it contains illustra-tions of the test procedures for each ofthe BS tests. However, the questionnaireassessment methods need more evalua-tion before they can be used clinically,since very few studies have reportedmeasurement properties on their reli-ability and validity.

The current review covers a widerange of populations, children, andadults (for adults comprising 64%[7/11] on reliability, and 62% [16/26]on validity), men and women (threestudies on separately men or women),and different ethnic groups (mostlyCaucasian, few on American/Cana-dian/Brazilian). For children, only BShas been used, while for adults, differ-ent test and questionnaire assessmentmethods have been used, though, stillmostly BS, and 5PQ in populationstudies.

The current review demonstratescut-points varying for the differentclinical assessment methods. In theadult population, when using nine testsfor BS, mostly one cut-point was usedfor classifying GJH varying between 4and 6, but one study used 2/9[Cameron et al., 2010]. However,also two cut-points were used, with alower cut-point for “tight/not hyper-mobile” individuals varying between 1and 4, and an upper cut-point for“hypermobile/extremely hypermo-bile” individuals varying between 4and 7 [Boyle et al., 2003; Aslan et al.,2006; Hirsch et al., 2008; Rousselet al., 2009]. For the questionnaireassessment methods only one cut-pointwas used for classifying GJH, varyingbetween 2 and 3 and with a differenttotal score varying between 5 and 7[Hakim and Grahame, 2003; Bulbenaet al., 2014].

Generally, for adults, one cut-point, varying from 4 to 5 was usedin BS (4/9 and 5/9), and 2/5 in 5PQhave been used. For children, one cut-point varying from 5 to 7 was used in

TABLE IV. Levels of Evidence of Included Studies

Reliability Validity

Intra/inter Quality/(n)/pop Hypothesis testing Quality/(n)/pop

Clinical assessment toolsBeighton Aslan06: þþ P (72) vs. other tests:

Boyle03: þþ P (42/36) (CW/RQ) Bulbena92: þ P��� (173)Bulbena92: þ P (30) (LLAS) Ferrari05: þ F (225) CErkula05: þþ P� (50) C (A/B) Junge13: � F (103) CHansen02: � P (100) C (þ) to (þ/�) LimitedHicks03: þ P� (50) pos. to conflictingHirsch07: þþ P� (50)Junge13: � P� (39) C vs. ROM:Juul-Kr07: þ P� (40) (Trunkrot.) Erkula05: � P (1273) CKarim11: � P� (30) (Hip) Naal14: � P�� (55)Mikkelss.96:þþ P (29/13) C (k/a lax) Pearsall06: � F (57)

(sh lax) Sauers01: � F (51)Ia (gon) Smits-Eng11: þ P (551) C(þ) Limited pos. (�) to (þ) Limited neg. to

conflictingIe(þ) to (þ/�) Limited pos. toConflicting

vs. pain:F (430) C(p/r) El-Metwally04: þ

(LLP) El-Metwally05: þ F (1284) C(inc) El-Metwally07: � F (1113) C(inc) Sohrb.-Nøhr14: þ F (301) C(s/k/a)(inc/p)Tobias13:þ P�� (2901) C(þþ) to (þ/�)Moderate pos. to Conflicting

BS vs. injuries:(GI) Cameron10: þ(KI) Junge15: �

F (714)C

(PI) Roussel09: �P�� (999)

(þ/�) ConflictP (58)

vs. diseases:(TMs) Hirsch08: þ(CFS) Nijs04: �

F (895)

(ACprev) Terzi13: þF (44)

(þ)to (þ/�) LimitedP�� (240)

pos. to conflicting

Carter & Wilkinson Bulbena92: þ P (30) vs. other tests:P��� (173)(BS/RQ) Bulbena92: þInter: Unknown

UnknownRot�es-Qu�erol Bulbena92: þ P (30) vs. other tests:

Juul-Kr07: þ (3) P�(40) (CW/BS) Bulbena92: þ P��� (173)

Inter: Unknown Unknown

Hospital del Mar Bulbena92: þ P (30) (ASD) Chahal10: þ F (149)

Inter: Unknown Unknown

continued

RESEARCH REVIEW AMERICAN JOURNAL OF MEDICAL GENETICS PART C (SEMINARS IN MEDICAL GENETICS) 143

TABLE IV. (Continued)

Reliability Validity

Intra/inter Quality/(n)/pop Hypothesis testing Quality/(n)/pop

Questionnaires5-part Q. Bulbena14: þ

Moraes11: �Retest: (þ/�)Conflicting

P� (33)P� (211)

vs. other tests:(HdM) Bulbena14: þ(BS) deMoraes11: �/þ(þ) to (þ/�) Limitedpos. to Conflicting

P�� (191)F (394)

Criterion val.(BS) deMoraes11: �(BS) Hakim03:?Unknown

F (394)F (489)

vs. pain/tissue diseases:(CWP) Mulvey13: þ (BJHS)Hakim03: þ(þ) Limited pos.

F (2354)P�� (489)

vs. anxiety/psych. disease:(anx) Sanches14: �Unknown

F (2300)

BS-self-reported vs. tests:(BS) Naal14: þUnknown

P�� (55)

Ia, Intra-rater; Ie, Inter-rater; P, poor; F, fair; pop, population; BS, Beighton Score; CW, Carter & Wilkinson; HdM, Hospital del Mar; RQ,Rot�es-Qu�erol; 5-part Q, 5-part questionnaire; LLAS, Lower Limb Assessment Score; k/a lax, Knee/ankle laxity; sh lax, Shoulder laxity; gon,goniometry; p, persistent; r, recurrent; inc, incidence; s/k/a, shoulder/knee/ankle; GI, Glenohumeral joint instability; Y, youth; C, children,AID, Primary traumatic anterior shoulder dislocation; KI, knee injuries; PI, pelvic injuries; TMs, Temporomandibular symptoms; CFS, chronicfatigue syndrome; ACprev, Adhesive capsulitis prevalence; ASD, Anterior Shoulder Dislocation; CWP, chronic widespread pain; BJHS, benignjoint hypermobility syndrome; Anx, anxiety; 92, Superscripts shows year of publication.�Reliability studies rated poor on the basis of one single rating, based on ”only one measurement.”��Validity studies rated poor on the basis of one single rating “no information on the measurement properties of the comparator instrument(s).”

144 AMERICAN JOURNAL OF MEDICAL GENETICS PART C (SEMINARS IN MEDICAL GENETICS) RESEARCH REVIEW

BS (5/9, 6/9, and 7/9). In one testassessment method the total score was11 with a cut-point of 7/11 forclassifying GJH based on only tests inthe lower extremities [Ferrari et al.,2005], and another study used two cut-points the lower being 5/9 and theupper being 7/9 [Smits-Engelsmanet al., 2011]. Since JHS and hEDS inthe current review are recognized asone and the same condition, a specificcut-point needs to be decided, and 5/9may be suggested for future use inadults. However, since joint mobility,and therefore, BS is known to decreaseby age [Remvig et al., 2007b], there isa need for adults also to includeadditional historical information, as

described in the appendix of thereliability study using the BS, withphrasing “can you now or have youpreviously been able to . . . ” [Juul-Kristensen et al., 2007], and in thestudy describing 5PQ [Hakim andGrahame, 2003].

Generally, for adults, one cut-point, varying from 4 to 5 wasused in BS (4/9 and 5/9), and2/5 in 5PQ have been used.For children, one cut-point

varying from 5 to 7 was usedin BS (5/9, 6/9, and 7/9).

Since children have individualgrowth periods, this may be the reasonfor using two cut-points (a lower and anupper) as recently suggested [Smits-Engelsman et al., 2011], and therefore,the upper cut-point is suggested to be atleast 6/9 as used in previous populationstudies [El-Metwally et al., 2004, 2005,2007; Tobias et al., 2013].

Warming-up before performingflexibility tests may influence theoutcome of a test assessment method.However, almost no studies reportedwhether participants did warm-up, and

RESEARCH REVIEW AMERICAN JOURNAL OF MEDICAL GENETICS PART C (SEMINARS IN MEDICAL GENETICS) 145

the influence of such performance istherefore, unknown.

This review highlights a numberof areas warranting future research.Because of the limited studies on theclinical assessment methods for classi-fying GJH, more high quality studies,and especially those evaluating aspectsof validity are required (concurrent,predictive, measurement error, respon-siveness, and interpretability). Addi-tional clinical test assessment methodsmay further be considered in order tosupport and endorse the presence ofGJH in the diagnostic procedure ofheritable connective tissue disorders.Also of importance is that consensus iswarranted regarding selection of spe-cific test and questionnaire assessmentmethods for classifying GJH, the testperformance, and the cut-points bywhich age, gender, and ethnicity maybe taken into account.

Limitations of the study are thesmall amount of studies, for whichreason it was decided only to ratereliability (intra- and inter-rater) andvalidity (hypothesis testing or criterionvalidity). Use of COSMIN is recom-mended to be the best evaluationmethod until now, as has also beenused previously to evaluate clinical testassessment methods [Kroman et al.,2014; Larsen et al., 2014]. However,since the COSMIN originally wasdesigned for the evaluation of pa-tient-reported outcomes, there aresome adjustments that need to beconsidered when using COSMIN forclinical test assessment methods. Forexample, although rating of the num-ber of measurements taken may beuseful in settings with continuous scalesas in performance-based methods,rating scales in many clinical assessmentmethods (test or questionnaire) aredichotomous (positive/negative). Toadjust for this shortcoming, the presentreview adjusted the evaluation ofmethodological quality, correspondingto when “only one measurement” wasrated poor in reliability, the study wasupgraded from poor to fair, meaningthat the study thereby could beincluded in the best evidence synthesis.Furthermore, the sample size in clinical

studies is often much smaller than inquestionnaire studies, and therefore, itmay be suggested that minor samplesizes should not be rated that strictly asin questionnaire assessment methods,when studying clinical test assessmentmethods. For validity studies, one poorrating including “no information onthe measurement properties of thecomparator instrument” or “subjecteligibility criteria inadequately de-scribed/lacking,” allowed upgradingto fair, and the study could therebybe included in the best evidencesynthesis.

Strengths of this review are thesystematic and rigid use of recom-mended strategies for systematic re-views of clinical assessment methods,the evaluation of their clinimetricproperties and rating of the bestevidence synthesis.

CONCLUSION

In the current review, four test and twoquestionnaire assessment methods forclassifying GJH were found with mea-surement properties of varying method-ological strength and results of varyingweight. Most of the studies used the BS.The inter-rater reliability of this methodseems acceptable to be used in clinicalpractice, provided that uniformity oftesting procedures are included in thetesting procedures, in addition to his-torical information, especially in adults.However, shortcomings were found instudies on the validity of BS, while thethree other test assessment methods(CW, RQ, HdM) lack satisfactoryinformation on both reliability andvalidity. Regarding questionnaire assess-ment methods, 5PQ is the most fre-quently method used, however, only inadult population studies. In conclusionprovided uniformity of testing proce-dures, the recommendation for clinicaluse in adults is BS with cut-point of 5 of9 including historical information,while in children it is BS with cut-pointof at least 6 of 9. However, more studiesare needed to conclude, especially onthe validity properties of these assess-ment methods, and before evidence-based recommendations can bemade for

clinical use on the “best” assessmentmethod for classifying GJH.

In the current review, four testand two questionnaireassessment methods for

classifying GJH were foundwith measurement propertiesof varying methodological

strength and results of varyingweight. Most of the studies

used the BS.

AUTHORS’CONTRIBUTIONS

BJK contributed to conception anddesign of the study, including analysisand interpretation of data, writing of thearticle, critical revision of the article forimportant intellectual content, and finalapproval of the article. KS contributedto the collection and assembly of data,analysis and interpretation of data, andcritical revision of the article forimportant intellectual content and finalapproval of the article. RE and HLcontributed to conception and design ofthe study including analysis and inter-pretation of data, critical revision of thearticle for important intellectual con-tent, and final approval of the article. LRcontributed to interpretation of data,critically revision of the article forimportant intellectual content, and finalapproval of the article. First and lastauthors take responsibility for the integ-rity of the work as a whole, frominception to finished article.

ACKNOWLEDGMENT

Dr. JaimeBravo,Chile, andHylkeBosma,previous patient representative of theEhlers–Danlos Syndrome organization,the Netherlands, are acknowledged fortheir initial participation in this work andthe Beighton group.

146 AMERICAN JOURNAL OF MEDICAL GENETICS PART C (SEMINARS IN MEDICAL GENETICS) RESEARCH REVIEW

REFERENCES

Aslan BA, Celik E, Cavlak U, Akdag B. 2006.Evaluation of inter-rater and intra-raterreliability of BEighton and Horan JointMobility Index. Fizyoterapi Rehabil17:113–119.

Beighton P, De Paepe A, Steinmann B, TsipourasP, Wenstrup RJ. 1998. Ehlers–Danlos syn-dromes: Revised nosology, villefranche,1997. Ehlers–Danlos national foundation(USA) and Ehlers–Danlos support group(UK). Am J Med Genet 77:31–37.

Beighton P, Solomon L, Soskolne CL. 1973.Articular mobility in an African population.Annl Rheum Dis 32:413–418.

Boyle KL, Witt P, Riegger-Krugh C. 2003. Intra-rater and inter-rater reliability of theBeighton and Horan Joint Mobility Index.J Athl Train 38:281–285.

Bulbena A, Duro JC, Porta M, Faus S, Vallescar R,Martin-Santos R. 1992. Clinical assessmentof hypermobility of joints: Assemblingcriteria. J Rheumatol 19:115–122.

Bulbena A, Mallorqui-Bague N, Pailhez G,Rosado S, Gonzalez I, Blanch-Rubio J,Carbonell J. 2014. Self-reported screeningquestionnaire for the assessment of JointHypermobility Syndrome (SQ-CH), a col-lagen condition, in Spanish population. Eur JPsychiat 28:17–26.

Cameron KL, Duffey ML, DeBerardino TM,Stoneman PD, Jones CJ, Owens BD. 2010.Association of generalized joint hypermo-bility with a history of glenohumeral jointinstability. J Athl Train 45:253–258.

Castori M, Dordoni C, Valiante M, Sperduti I,Ritelli M, Morlino S, Chiarelli N, CellettiC, Venturini M, Camerota F, Calzavara-Pinton P, Grammatico P, Colombi M. 2014.Nosology and inheritance pattern(s) of jointhypermobility syndrome and Ehlers–Danlossyndrome, hypermobility type: A study ofintrafamilial and interfamilial variability in23 Italian pedigrees. Am JMed Genet Part A164A:3010–3020.

Chahal J, Leiter J, McKeeMD,Whelan DB. 2010.Generalized ligamentous laxity as a predis-posing factor for primary traumatic anteriorshoulder dislocation. J Shoulder Elbow Surg19:1238–1242.

El-Metwally A, Salminen JJ, Auvinen A,Kautiainen H, Mikkelsson M. 2004. Prog-nosis of non-specific musculoskeletal painin preadolescents: A prospective 4-yearfollow-up study till adolescence. Pain110:550–559.

El-Metwally A, Salminen JJ, Auvinen A, Kautiai-nen H, Mikkelsson M. 2005. Lower limbpain in a preadolescent population: Progno-sis and risk factors for chronicity—Aprospective 1- and 4-year follow-up study.Pediatrics 116:673–681.

El-Metwally A, Salminen JJ, Auvinen A, Macfar-lane G,MikkelssonM. 2007. Risk factors fordevelopment of non-specific musculoskele-tal pain in preteens and early adolescents: Aprospective 1-year follow-up study. BMCMusculoskelet Disord 8:46.

Erkula G, Kiter AE, Kilic BA, Er E, Demirkan F,Sponseller PD. 2005. The relation of jointlaxity and trunk rotation. J Pediatr OrthopPart B 14:38–41.

Ferrari J, Parslow C, Lim E, Hayward A. 2005.Joint hypermobility: The use of a newassessment tool to measure lower limbhypermobility. Clin Exp Rheumatol 23:413–420.

Fleiss JL. 1986. Reliability of measurement. Thedesign and analysis of clinical experiments.New York: John Wiley and Sons. pp. 1–32.

Grahame R, Bird HA, Child A. 2000. The revised(Brighton 1998) criteria for the diagnosis ofbenign joint hypermobility syndrome(BJHS). J Rheumatol 27:1777–1779.

Hakim AJ, Grahame R. 2003. A simple question-naire to detect hypermobility: An adjunct tothe assessment of patients with diffusemusculoskeletal pain. Int J Clin Pract 57:163–166.

Hansen A, Damsgaard R, Kristensen JH, BaggersJ, Remvig L. 2002. Interexaminer reliabilityof selected tests for hypermobility. J OrthopMed 25:48–51.

Hicks GE, Fritz JM, Delitto A, Mishock J. 2003.Interrater reliability of clinical examinationmeasures for identification of lumbar seg-mental instability. Arch Phys Med Rehab84:1858–1864.

Hirsch C, Hirsch M, John MT, Bock JJ. 2007.Reliability of the Beighton HypermobilityIndex to determinate the general joint laxityperformed by dentists. J Orof Orthop68:342–352.

Hirsch C, John MT, Stang A. 2008. Associationbetween generalized joint hypermobilityand signs and diagnoses of temporomandib-ular disorders. Eur J Oral Sci 116:525–530.

Jansson A, Saartok T, Werner S, Renstrom P. 2004.General joint laxity in 1845 Swedish schoolchildren of different ages: Age- and gender-specific distributions. Acta Paediatr 93:1202–1206.

Junge T, Jespersen E, Wedderkopp N, Juul-Kristensen B. 2013. Inter-tester reproduc-ibility and inter-method agreement of twovariations of the Beighton test for determin-ing Generalised Joint Hypermobility inprimary school children. BMC Pediatr13:214.

Junge T, Runge L, Juul-Kristensen B, Wedder-kopp N. 2015. The extent and risk of kneeinjuries in children aged 9–14 with Gener-alised Joint Hypermobility and kneejoint hypermobility—The CHAMPS-studyDenmark. BMC Musculoskelet Disord15:1–11.

Juul-Kristensen B, Rogind H, Jensen DV, RemvigL. 2007. Inter-examiner reproducibility oftests and criteria for generalized joint hyper-mobility and benign joint hypermobilitysyndrome. Rheumatology 46:1835–1841.

Karim A, Millet V, Massie K, Olson S, Mor-ganthaler A. 2011. Inter-rater reliability of amusculoskeletal screen as administered tofemale professional contemporary dancers.Work 40:281–288.

Kroman SL, Roos EM, Bennell KL, Hinman RS,Dobson F. 2014. Measurement properties ofperformance-based outcome measures toassess physical function in young and mid-dle-aged people known to be at high risk ofhip and/or knee osteoarthritis: A systematicreview. Osteoarthr Cartil 22:26–39.

LarsenCM, Juul-KristensenB, LundH, SogaardK.2014. Measurement properties of existingclinical assessment methods evaluating

scapular positioning and function. A system-atic review. Physiother Theory Pract30:453–482.

Landis JR, Koch GG. 1977. The measurement ofobserver agreement for categorical data.Biometrics 33:159–174.

Mikkelsson M, Salminen JJ, Kautiainen H. 1996.Joint hypermobility is not a contributingfactor to musculoskeletal pain in pre-adolescents. J Rheumatol 23:1963–1967.

Moher D, Liberati A, Tetzlaff J, Altman DG. 2009.Preferred reporting items for systematicreviews and meta-analyses: The PRISMAstatement. J Clin Epidemiol 62:1006–1012.

Mokkink LB, Terwee CB, Knol DL, Stratford PW,Alonso J, Patrick DL, Bouter LM, de VetHC. 2010. The COSMIN checklist forevaluating the methodological quality ofstudies on measurement properties: Aclarification of its content. BMC Med ResMethodol 10:22.

Moraes DA, Baptista CA, Crippa JA, Louzada-Junior P. 2011. Translation into BrazilianPortuguese and validation of the five-partquestionnaire for identifying hypermobility.Rev Bras Reumatol 51:53–69.

Mulvey MR, Macfarlane GJ, Beasley M, Sym-mons DP, Lovell K, Keeley P, Woby S,McBeth J. 2013. Modest association of jointhypermobility with disabling and limitingmusculoskeletal pain: Results from a large-scale general population-based survey. Ar-thritis Care Res (Hoboken) 65:1325–1333.

Naal FD, Hatzung G, Muller A, Impellizzeri F,LeunigM. 2014. Validation of a self-reportedBeighton score to assess hypermobility inpatients with femoroacetabular impinge-ment. Int Orthop 38:2245–2250.

Nijs J,Meirleir KD,Truyen S. 2004.Hypermobilityin patients with chronic fatigue syndrome:Preliminary observations. J MusculoskeletPain 12:9–17.

Palmer S, Bailey S, Barker L, Barney L, Elliott A.2014. The effectiveness of therapeutic exercisefor joint hypermobility syndrome: A system-atic review. Physiotherapy 100:220–227.

Pearsall AW, Kovaleski JE, Heitman RJ, GurchiekLR, Hollis JM. 2006. The relationshipsbetween instrumented measurements ofankle and knee ligamentous laxity andgeneralized joint laxity. J Sports Med PhysFitness 46:104–110.

Remvig L, Engelbert RH, Berglund B, BulbenaA, Byers PH, Grahame R, Juul-KristensenB, Lindgren KA, Uitto J, Wekre LL. 2011.Need for a consensus on the methods bywhich to measure joint mobility and thedefinition of norms for hypermobility thatreflect age, gender and ethnic-dependentvariation: Is revision of criteria for jointhypermobility syndrome and Ehlers–Danlossyndrome hypermobility type indicated?Rheumatology 50:1169–1171.

Remvig L, Jensen DV, Ward RC. 2007a. Arediagnostic criteria for general joint hyper-mobility and benign joint hypermobilitysyndrome based on reproducible and validtests? A reviewof the literature. J Rheumatol34:798–803.

Remvig L, Jensen DV,Ward RC. 2007b. Epidemi-ologyof general joint hypermobility and basisfor the proposed criteria for benign jointhypermobility syndrome: Review of theliterature. J Rheumatol 34:804–809.

RESEARCH REVIEW AMERICAN JOURNAL OF MEDICAL GENETICS PART C (SEMINARS IN MEDICAL GENETICS) 147

Rombaut L, Malfait F, Cools A, De Paepe A,Calders P. 2010. Musculoskeletal com-plaints, physical activity and health-relatedquality of life among patients with theEhlers–Danlos syndrome hypermobilitytype. Disabil Rehabil 32:1339–1345.

Roussel NA, Nijs J, Mottram S, Van Moorsel A,Truijen S, Stassijns G. 2009. Altered lumbo-pelvic movement control but not general-ized joint hypermobility is associated withincreased injury in dancers. A prospectivestudy. Man Ther 14:630–635.

Sanches SB, Osorio FL, Louzada-Junior P, MoraesD, Crippa JA, Martin-Santos R. 2014.Association between joint hypermobilityand anxiety in Brazilian university students:Gender-related differences. J PsychosomRes 77:558–561.

Sauers EL, Borsa PA, Herling DE, Stanley RD.2001. Instrumented measurement of gleno-humeral joint laxity and its relationship topassive range of motion and generalized jointlaxity. Am J Sports Med 29:143–150.

Schellingerhout JM, Verhagen AP, Thomas S, KoesBW. 2008. Lack of uniformity in diagnosticlabeling of shoulder pain: Time for a differentapproach. Man Ther 13:478–483.

Scheper MC, Engelbert RH, Rameckers EA,Verbunt J, Remvig L, Juul-Kristensen B.

2013. Children with generalised jointhypermobility and musculoskeletal com-plaints: State of the art on diagnostics,clinical characteristics, and treatment. Bi-omed Res Int 2013:121054.

Scheper MC, Juul-Kristensen B, Rombaut L,Rameckers EA, Verbunt J, Engelbert RH.2016. Disability in adolescents and adultsdiagnosed with hypermobility related disor-ders: Ameta-analysis. Arch PhysMedRehabil97:2174–2187.

Smith TO, BaconH, Jerman E, Easton V, ArmonK,Poland F, Macgregor AJ. 2014. Physiotherapyand occupational therapy interventions forpeople with benign joint hypermobilitysyndrome: A systematic review of clinicaltrials. Disabil Rehabil 36:797–803.

Smits-Engelsman B, Klerks M, Kirby A. 2011.Beighton score: A valid measure for gener-alized hypermobility in children. J Pediatr158:119–123,23 e1–4.

Sohrbeck-Nøhr O, Kristensen J, Boyle E, RemvigL, Juul-Kristensen B. 2014. Generalizedjoint hypermobility in childhood is apossible risk for the development of jointpain in adolescence: A cohort study. BMCPediatr 14:1–9.

Terwee CB, Bot SD, de Boer MR, van derWindt DA, Knol DL, Dekker J, Bouter

LM, de Vet HC. 2007. Quality criteriawere proposed for measurement propertiesof health status questionnaires. J ClinEpidemiol 60:34–42.

Terwee CB, Mokkink LB, Knol DL, OsteloRW, Bouter LM, de Vet HC. 2012.Rating the methodological quality insystematic reviews of studies on measure-ment properties: A scoring system for theCOSMIN checklist. Qual Life Res21:651–657.

Terzi Y, Akgun K, Aktas I, Palamar D, Can G.2013. The relationship between generalisedjoint hypermobility and adhesive capsulitisof the shoulder. Turk J Rheumatol28:234–241.

Tinkle BT, Bird HA, Grahame R, Lavallee M,Levy HP, Sillence D. 2009. The lack ofclinical distinction between the hyper-mobility type of Ehlers–Danlos syn-drome and the joint hypermobilitysyndrome (a.k.a. hypermobility syn-drome). Am J Med Genet Part A 149A:2368–2370.

Tobias JH, Deere K, Palmer S, Clark EM, Clinch J.2013. Hypermobility is a risk factor formusculoskeletal pain in adolescence: Find-ings from a prospective cohort study.Arthritis Rheuma 65:1107–1115.