ORIGINAL ARTICLE

Arthritis Care & Research, Vol. 62, No. 10, October 2010, pp 1481–1488. DOI 10.1002/acr.20265. © 2010, American College of Rheumatology

Converting Modified Health Assessment Questionnaire (HAQ), Multidimensional HAQ, and HAQII Scores Into Original HAQ Scores Using Models Developed With a Large Cohort of Rheumatoid Arthritis Patients

JACLYN ANDERSON,1 HARLAN SAYLES,1 JEFFREY R. CURTIS,2 FRED WOLFE,3 AND KALEB MICHAUD4

Objective. The Stanford Health Assessment Questionnaire Disability Index (HAQ) is the gold standard functional status questionnaire in rheumatology, but it is lengthy. Three shorter versions, the modified HAQ (MHAQ), the Multidimensional HAQ (MDHAQ), and the HAQII are often used in outcomes research as HAQ substitutes. We developed conversion formulas between these modified versions and the original HAQ.

Methods. Analysis was limited to the comparison of rheumatoid arthritis (RA) patients at a random observation when the HAQ was recorded in conjunction with the MHAQ (n = 29,596), the MDHAQ (n = 13,665), or the HAQII (n = 15,823). Development models were randomly limited to 80% of the data (development sample) and the remaining 20% was used for model validation.

Results. Two conversion formulas were developed for each of the MHAQ, the MDHAQ, and the HAQII: a short model and a long model inclusive of questions common to both the modified measures and the original HAQ. Short models explained 81–83%, and long models 82–86%, of the variance. Predicted HAQ values of zero were assigned to all cases with an MDHAQ or HAQII score of zero, with remaining cases used for model estimation. Bland-Altman plots demonstrated good concordance between actual and predicted values for each measure. The validation sample closely approximated the results from the development sample (0.005 < ΔR² < 0.009) for each measure.

Conclusion. We have developed and validated highly accurate conversion formulas from the MHAQ, MDHAQ, and HAQII to the original HAQ in a large sample of RA patients. The developed models are useful for conversion of measures in the research setting. Because of substantial variability at the individual patient level, application of the formulas to individual patients is inadvisable.

1Jaclyn Anderson, DO, MS, Harlan Sayles, MS: University of Nebraska Medical Center, Omaha; 2Jeffrey R. Curtis, MD, MPH: University of Alabama at Birmingham; 3Fred Wolfe, MD: National Data Bank for Rheumatic Diseases, Wichita, Kansas; 4Kaleb Michaud, PhD: University of Nebraska Medical Center, Omaha, and National Data Bank for Rheumatic Diseases, Wichita, Kansas.

Address correspondence to Kaleb Michaud, PhD, 986270 Nebraska Medical Center, Omaha, NE 68198-6270. E-mail: [email protected].

Submitted for publication September 18, 2009; accepted in revised form May 12, 2010.

INTRODUCTION

As the gold standard functional status questionnaire in rheumatology, the Stanford Health Assessment Questionnaire Disability Index (HAQ) is used in most clinical trials and observational outcome studies (1,2) and is recommended by the American College of Rheumatology for measurement of physical function (3). While originally conceived as a measurement of patient outcome in rheumatoid arthritis (RA) patients (4), the HAQ has been successfully applied to a variety of rheumatic diseases (5). Additionally, the HAQ has been shown to distinguish between placebo and treatment groups (6,7), with changes over time agreeing with and augmenting clinical and laboratory evidence of change (8–11). The HAQ is also the best predictor of mortality (12), work disability (13), joint replacement (14), and medical costs (15) as compared with other measures of RA-related disease activity.

Since the inception of the original HAQ, several modified versions have been created in an effort to improve the precision of information gained and/or to reduce the length of the original 41-item instrument. The HAQ asks the patient to rate, on a 4-point ordered-category item scale, the degree of difficulty they have experienced over the last week with each of 20 tasks grouped into 8 functional areas, with scores further adjusted based on an additional 21 questions regarding the use of companion aids or devices. Scores are then converted into an overall mean score ranging from 0–3, with 0 indicating no functional impairment and 3 indicating complete impairment (4,5,16).

The modified HAQ (MHAQ), the Multidimensional HAQ (MDHAQ), and the HAQII are the most prominent of these attempts at improvement and are often used in outcomes research as HAQ substitutes without demonstrated equivalence; these may be reported simply as “HAQ” scores without specifying which instrument was used. Additionally, some studies initially use one HAQ version and later switch to another. Although these instruments are at times used interchangeably (17), it is difficult to compare summary scores between instruments due to the variable psychometric properties each possesses. One example of such a problem is the “floor effect” phenomenon, which has been observed with varying degrees in each version of the HAQ. The floor effect is observed when a patient has a completely normal score (0.0) on the instrument despite some functional limitations (18). In other words, if an instrument has a floor effect, it cannot discriminate between individuals that have relatively good (but not perfect) function. Even the original HAQ deviates from a normal distribution at values near zero (19) and has been shown to demonstrate failure to detect clinical improvement in up to 10% of patients (19–21). Since the advent of more effective pharmacologic therapies, and the increasing use of those therapies in patients with milder disease, it is likely the pervasiveness of a floor effect has increased over time, particularly as shorter variations of the original HAQ have come into more common usage with assessment of fewer items potentially missing subtle functional impairment.

Conceived as a shortened version of the HAQ, the MHAQ asks patients to answer 8 questions, 1 in each of the 8 functional areas explored with the HAQ (22). The MHAQ assesses the degree of change in difficulty with specific tasks over the preceding 3 months, and is therefore subject to recall bias, although it has been shown that the MHAQ is correlated with HAQ change scores (23). For comparison with the original HAQ, MHAQ scores are converted to a range between 0 and 3. However, while correlated with HAQ scores, MHAQ scores lack sensitivity to change (24–26), are routinely lower than HAQ scores by ~0.3–0.5 units (21,27), and tend to cluster at the lower end of the scale, leading to a non-normal distribution of values (19,20). These observations yield the conclusion that the MHAQ also has a more pronounced “floor effect” than the HAQ, preventing numerical improvement in scores despite clinical improvement in function in as many as 25% of patients (19–21,26,28).

The MDHAQ was created as a further modification of the HAQ and was designed with 10 formally scored activity questions, as well as an additional 3 nonscored items to assess psychological status, with the resultant score again converted into an overall mean score ranging from 0–3 (29,30). The nonscored psychological status questions were added in an attempt to measure additional health dimensions (sleep, anxiety, and depression) in addition to functional status (30), and therefore they may be omitted from our comparison of physical function scales. Compared with both the HAQ and MHAQ, the MDHAQ demonstrates a less prominent floor effect. However, similar to the HAQ, the MDHAQ deviates from a normal distribution at values near zero (21,30). Additionally, the MDHAQ has more even spacing of scores than the HAQ and MHAQ, making a change of 0.5 more similar across the range of the scale (30), although outliers remain (19).

Based on the original HAQ, the HAQII is a 10-item functional questionnaire with scores ranging from 0–3. Shorter and simpler than the HAQ, the HAQII has demonstrated levels of reliability and validity similar to the original HAQ, and it has a lesser floor effect as compared with the HAQ and MHAQ, as well as potentially failing to detect clinical improvement in only 5.8% of patients (19,31,32). Like the HAQ, the HAQII deviates from normal distribution at values near zero (19). Conversion formulas between the HAQII and HAQ have been developed previously (19). A key benefit of the HAQII over the MHAQ and the MDHAQ is that it is more closely correlated with the original HAQ, with the previously derived conversion formula from HAQII to HAQ demonstrating R² = 0.821 (31). Additionally, more even spacing of item difficulties from Rasch analysis produces the greatest uniformity of change across the range of the scale as compared with the above-mentioned tools (19,33).

To our knowledge, no published formulas exist to convert between the MHAQ or the MDHAQ scores and the original HAQ. Because of substantial variability in the choice of which HAQ version each of the various rheumatic disease registries and trials uses, we sought to develop a method of conversion from MHAQ and MDHAQ to the original HAQ, as well as to confirm the previous work that provided a conversion formula between the HAQII and HAQ. Prior work has not evaluated the addition of the individual questions composing the HAQII or addition of the common questions between the HAQII and HAQ on model fit. We examined the effect that the addition of these common questions has on the model since we expected their inclusion to produce a better conversion formula.

MATERIALS AND METHODS

Since 1998, various versions of the HAQ have been completed a total of 203,041 times by 35,009 unique participants in the National Data Bank for Rheumatic Diseases (NDB) long-term outcomes study. These methods have been previously described (15,34). Utilizing previously collected NDB data from 1998–2008, analysis was limited to RA patients residing in the US, Canada, and Puerto Rico at a time in which the HAQ was asked simultaneously with at least one of the MHAQ, MDHAQ, or HAQII. Analysis was limited to questionnaires administered in English, which excluded 68 MHAQ and 70 HAQII questionnaires. To prevent bias introduced from multiple measurements on individual participants, only one pair of data per patient (HAQ and the corresponding MDHAQ, MHAQ, or HAQII) was included based on selection of a random observation.
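As an illustration only, the selection of one random observation per patient could be sketched in Python as follows. The record layout, the `patient_id` key, and the fixed seed are assumptions made for this example; the original analysis was performed in Stata and its exact sampling procedure is not described.

```python
import random
from collections import defaultdict


def pick_one_observation_per_patient(observations, seed=0):
    """Return one randomly chosen observation for each patient.

    `observations` is an iterable of dicts carrying a hypothetical
    'patient_id' key; all other keys are passed through untouched.
    """
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    by_patient = defaultdict(list)
    for obs in observations:
        by_patient[obs["patient_id"]].append(obs)
    # One uniform draw per patient removes repeated measurements.
    return {pid: rng.choice(rows) for pid, rows in by_patient.items()}


# Illustrative records only; these are not NDB data.
records = [
    {"patient_id": "A", "haq": 1.25, "mhaq": 0.50},
    {"patient_id": "A", "haq": 1.00, "mhaq": 0.38},
    {"patient_id": "B", "haq": 0.25, "mhaq": 0.00},
]
selected = pick_one_observation_per_patient(records)
```

Each patient then contributes exactly one HAQ and modified-HAQ pair, so repeated measurements on the same person cannot dominate the fit.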


Variables. Descriptive statistics were used to compare scores from the HAQ with scores from the MHAQ, MDHAQ, and HAQII, and with typical univariate transformations (e.g., base 10 log, natural log, square root, and second, third, and fourth order polynomials) of the MHAQ, MDHAQ, and HAQII. Box-Cox transformation of the MHAQ, MDHAQ, and HAQII was also evaluated. Univariable regressions were performed with each of a variety of explanatory variables and interactions between variables believed to be important indicators of the HAQ score, including domains of demographics, patient habits, comorbidities, RA-specific factors, and other health-related quality of life indicators. Adjusted R² was used to determine model fit; variables that did not contribute to the model, based on an improvement in R² of at least 0.02, were excluded from further analyses. Although sex and age were not statistically significant in the HAQII model, they were felt to be of clinically significant importance and were further evaluated during model development. We also evaluated model fit without the addition of age and sex for the MHAQ and MDHAQ models, with addition of all individual questions composing each of the measures, and with addition of individual question responses common to both the HAQ and each of the MHAQ, MDHAQ, or HAQII to each of the models. Common HAQ categories/questions are: wash (able to wash and dry your body?), dress (able to dress yourself, including shoelaces and buttons?), cup (able to lift a full cup or glass to your mouth?), faucet (able to turn faucets on and off?), bend (able to bend down and pick up clothing from the floor?), in car (able to get in and out of a car?), bed (able to get in and out of bed?), walk (able to walk outdoors on flat ground?), reach (able to reach up and get a 5-pound object [e.g., a bag of sugar] from above your head?), toilet (able to get on and off the toilet?), open car (able to open car doors?), and stand (able to stand up from a straight chair?). At least 6 of the 8 common questions for MHAQ and MDHAQ, and 4 of the 5 common questions for HAQII, were required to be present in evaluation of the longer models.

Statistical analysis. Model development was based on 80% of the data with the remaining 20% used for validating the final models. Predicted values were constrained to the 0–3 range of the HAQ. If predicted values were <0 or >3 they were replaced with the value 0 or 3, respectively. Quantile-quantile plots (not shown) were used to assist in the comparison between the predicted HAQ scores derived from the model and the actual HAQ scores obtained at the same point in time. The above analysis was repeated for each measure (MHAQ, MDHAQ, and HAQII). As visual inspection suggested slight nonlinearity at values <1 for MHAQ, MDHAQ, and HAQII, splines with knots at 0.125, 0.25, 0.5, and 0.75 were used to attempt improvement in model fit for each of the measures. Final models were chosen based on the best fit, which was based on improvement in adjusted R² of at least 0.02. Two models were developed for the conversion of each measure (MHAQ, MDHAQ, and HAQII) to the HAQ: a primary (long) model inclusive of the common questions as described above and a short model with inclusion of only the measure to be transformed with age and sex if statistically significant.
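The 80%/20% split and the constraint of predictions to the 0–3 range can be sketched as below. This is a minimal Python illustration under assumptions: the function names, the shuffle-based split, and the seed are hypothetical and are not taken from the original Stata analysis.

```python
import random


def split_development_validation(rows, development_fraction=0.8, seed=0):
    """Randomly split rows into development and validation samples."""
    rng = random.Random(seed)  # assumed seed, for reproducibility only
    shuffled = list(rows)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * development_fraction)
    return shuffled[:cut], shuffled[cut:]


def clamp_to_haq_range(predicted):
    """Replace predictions below 0 with 0 and above 3 with 3."""
    return min(3.0, max(0.0, predicted))


# Integers stand in for patient records in this toy example.
development, validation = split_development_validation(range(100))
```

Clamping is applied after prediction, so it changes only those fitted values that fall outside the range the HAQ can actually take.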

Further model modifications were investigated once it was discovered that the original linear models for predicting HAQ scores from MDHAQ and HAQII scores predicted 0.1% and 0% zeroes, respectively. Zero-inflated normal models were evaluated but showed minimal or no improvement in model fit and did little to address the issue of a lack of predicted zeroes. Models predictive of HAQ values of zero were developed by assigning predicted HAQ values of zero to all cases with an MDHAQ or HAQII score of zero and then using the remaining cases to estimate a linear regression model as before. This was not done for MHAQ because of the pronounced floor effect of the measure. Predicted HAQ values for the remaining cases were estimated using the coefficients from these models, and the squared correlation between the predicted and actual HAQ values for all cases was calculated.
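The zero-assignment procedure described above can be sketched with a single predictor. This Python example is a simplified illustration: it uses ordinary least squares with one explanatory variable and made-up data, whereas the published models also include age, sex, and individual questions. All function names are hypothetical.

```python
def fit_on_nonzero_cases(scores, haqs):
    """Fit HAQ = intercept + slope * score using only nonzero scores."""
    pairs = [(s, h) for s, h in zip(scores, haqs) if s != 0]
    xs, ys = zip(*pairs)
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def predict_with_zero_rule(score, intercept, slope):
    """Assign 0 to a zero score; otherwise use the clamped linear model."""
    if score == 0:
        return 0.0
    return min(3.0, max(0.0, intercept + slope * score))


# Made-up scores for illustration; not study data.
scores = [0, 0, 1, 2, 3]
haqs = [0.0, 0.25, 1.1, 2.1, 3.1]
intercept, slope = fit_on_nonzero_cases(scores, haqs)
```

Note that the two cases with a score of zero do not influence the fitted line; they receive a predicted HAQ of zero regardless of their observed HAQ.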

Graphical fits were presented using local polynomial curves and 95% confidence intervals (95% CIs) with bin widths of 0.1. Bland-Altman concordance statistics were used to evaluate for concordance between actual HAQ values and predicted HAQ values for each measure and were presented graphically with local polynomial curves superimposed. Bland-Altman plots allowed us to determine if the differences between the observed and predicted values exhibited any systematic bias over the range of possible values. Regression diagnostics were used to look for multicollinearity by measurement of the variance inflation factor (VIF), an index that measures how much the variance of a coefficient is increased because of collinearity. For demographic data, differences in sample means between development and validation samples for each measurement tool and differences between samples with missing and non-missing predictor variables were evaluated using 2-sample t-tests. All analyses were performed using Stata statistical software, version 10.1 (Stata).
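The quantities behind a Bland-Altman plot can be computed as in the following Python sketch. It assumes the conventional limits of agreement (mean difference ± 1.96 SD of the differences); the function name and the example values are hypothetical, and the local polynomial smoothing used in the figures is not reproduced here.

```python
from statistics import mean, stdev


def bland_altman(actual, predicted):
    """Return the points and limits of agreement for a Bland-Altman plot."""
    differences = [a - p for a, p in zip(actual, predicted)]  # y-axis
    means = [(a + p) / 2 for a, p in zip(actual, predicted)]  # x-axis
    bias = mean(differences)
    spread = 1.96 * stdev(differences)
    return {
        "means": means,
        "differences": differences,
        "bias": bias,
        "lower_limit": bias - spread,
        "upper_limit": bias + spread,
    }


# Made-up actual and predicted HAQ values for illustration.
result = bland_altman([1.0, 2.0, 0.5, 1.5], [0.9, 2.2, 0.5, 1.4])
```

A bias near zero with differences scattered evenly across the range of means indicates no systematic over- or under-prediction.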

RESULTS

Single random observations of individual patients simultaneously obtained for each of the MHAQ (n = 29,596), MDHAQ (n = 13,665), and HAQII (n = 15,823) at the same time a HAQ was completed were available. Table 1 displays the characteristics of these patients. No significant differences in patient characteristics were observed between the 80% development and the 20% validation samples for each measurement tool. There were small, but important, differences in patients with missing data, with all differences occurring when missing predictors represented >3% of the total. The effects of missing data are outside the scope of our analysis. Box-Cox transformations did not improve any of the models to a significant degree. Table 2 displays coefficients and SEs for both the short and long final models. For clarity, we have included the equations chosen as the best models for each measure within the text. We have developed 2 models for conversion back to the HAQ for each measure, a longer version as well as a more parsimonious version, with usage recommendations described below. For each measure, the predicted values for the 20% validation sample closely approximated the actual HAQ values and demonstrated a nearly identical line to the development sample (data not shown). Figure 1 shows the graphical representation of the actual versus predicted MHAQ, MDHAQ, and HAQII values using the 20% validation sample. Bland-Altman plots for long models (Figure 2) show that the actual HAQ values for each measure have slightly higher variability than the predicted HAQ values, with positive correlations for all measures: 0.244 for MHAQ, 0.245 for MDHAQ, and 0.205 for HAQII.

Table 1. Characteristics of RA patients in model and validation samples*

| Characteristic | MHAQ (n = 30,754), 80% development | MHAQ, 20% validation | MDHAQ (n = 13,764), 80% development | MDHAQ, 20% validation | HAQII (n = 15,929), 80% development | HAQII, 20% validation |
|---|---|---|---|---|---|---|
| HAQ version (range 0–3), mean ± SD† | 0.57 ± 0.53 | 0.56 ± 0.53 | 0.94 ± 0.56 | 0.95 ± 0.56 | 1.03 ± 0.68 | 1.02 ± 0.67 |
| HAQ (range 0–3), mean ± SD | 1.15 ± 0.75 | 1.14 ± 0.74 | 1.30 ± 0.72 | 1.28 ± 0.71 | 1.06 ± 0.74 | 1.05 ± 0.73 |
| Age, mean ± SD years | 59.79 ± 13.73 | 59.97 ± 13.77 | 58.39 ± 13.70 | 58.08 ± 13.46 | 61.12 ± 13.20 | 61.11 ± 13.20 |
| Female, % | 76.58 | 77.37 | 76.74 | 76.01 | 78.65 | 78.39 |
| Ethnicity: White, % | 87.40 | 88.13 | 84.98 | 86.48 | 90.24 | 90.09 |
| Ethnicity: African American, % | 6.11 | 6.26 | 7.73 | 6.96 | 4.69 | 4.38 |
| Ethnicity: Other, % | 5.61 | 6.49 | 7.29 | 6.56 | 5.07 | 5.53 |
| Married, % | 67.93 | 67.62 | ‡ | ‡ | 67.60 | 68.69 |
| High school, % | 87.59 | 87.64 | 85.29 | 86.60 | 91.08 | 91.30 |
| Disease duration, mean ± SD years | 13.30 ± 10.90 | 13.45 ± 11.02 | 12.04 ± 10.47 | 12.12 ± 10.33 | 15.18 ± 11.11 | 15.14 ± 10.95 |
| Ever smoker, % | 42.81 | 41.48 | 57.83 | 58.60 | 56.25 | 55.80 |

* For each measure, all differences of sample means had P > 0.10 between model development and validation samples. RA = rheumatoid arthritis; MHAQ = modified Health Assessment Questionnaire; MDHAQ = Multidimensional HAQ.
† HAQ version indicates corresponding MHAQ, MDHAQ, and HAQII.
‡ Marital status was not routinely collected at the time MDHAQ questionnaires were asked.

Table 2. Model coefficients (SE) for short and long models with adjusted R² for development and validation samples*

| Model | MHAQ short | MHAQ long | MDHAQ short | MDHAQ long | HAQII short | HAQII long |
|---|---|---|---|---|---|---|
| 80% development sample, no. | 23,664 | 21,787 | 10,901 | 10,047 | 12,695 | 12,370 |
| √MHAQ† | 1.542 (0.005) | 1.108 (0.016) | N/A | N/A | N/A | N/A |
| MDHAQ | N/A | N/A | 1.109 (0.006) | 0.949 (0.023) | N/A | N/A |
| HAQII | N/A | N/A | N/A | N/A | 0.998 (0.005) | 0.646 (0.010) |
| Age | 0.006 (0.000) | 0.006 (0.000) | 0.004 (0.000) | 0.004 (0.000) | – | – |
| Male | −0.239 (0.005) | −0.251 (0.005) | −0.205 (0.007) | −0.225 (0.008) | – | – |
| Constant | −0.138 (0.010) | −0.051 (0.010) | 0.054 (0.015) | 0.081 (0.015) | 0.038 (0.006) | 0.131 (0.006) |
| Wash | N/A | 0.058 (0.005) | N/A | 0.011 (0.007) | N/A | N/A |
| Dress | N/A | 0.138 (0.005) | N/A | 0.148 (0.007) | N/A | N/A |
| Cup | N/A | 0.055 (0.005) | N/A | 0.014 (0.007) | N/A | N/A |
| Faucet | N/A | −0.026 (0.005) | N/A | −0.040 (0.006) | N/A | N/A |
| Bend | N/A | 0.057 (0.004) | N/A | 0.015 (0.006) | N/A | N/A |
| In car | N/A | 0.032 (0.005) | N/A | 0.016 (0.008) | N/A | N/A |
| Bed | N/A | −0.024 (0.005) | N/A | −0.005 (0.007) | N/A | N/A |
| Walk | N/A | 0.063 (0.004) | N/A | −0.016 (0.007) | N/A | 0.019 (0.005) |
| Reach | N/A | N/A | N/A | N/A | N/A | 0.166 (0.005) |
| Toilet | N/A | N/A | N/A | N/A | N/A | −0.002 (0.006) |
| Open car | N/A | N/A | N/A | N/A | N/A | 0.075 (0.006) |
| Stand | N/A | N/A | N/A | N/A | N/A | 0.138 (0.006) |
| Adjusted R², 80% development sample | 0.810 | 0.829 | 0.806 | 0.821 | 0.834 | 0.860 |
| Adjusted R², 20% validation sample | 0.805 | 0.824 | 0.801 | 0.816 | 0.825 | 0.852 |

* Dependent variable is the Health Assessment Questionnaire (HAQ) for all models. Short indicates most parsimonious model. Long indicates model inclusive of individual question responses common to both the HAQ and each of the modified HAQ (MHAQ), Multidimensional HAQ (MDHAQ), or HAQII. For short models, 3.5%, 0.7%, and 0.2%, and for long models 11.4%, 8.3%, and 4.5%, of cases for MHAQ, MDHAQ, and HAQII, respectively, were dropped during model development because of missing data on the predictor variables. See Materials and Methods section for descriptions of variables. N/A = not applicable.
† The square root of MHAQ was used in the conversion model.
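For data sets without individual question responses, the short-model coefficients reported in Table 2 can be applied as in the Python sketch below. Several points are assumptions rather than statements from the source: sex is coded as MALE = 1 for men and 0 for women, age is in years, and the zero-assignment rule described in Materials and Methods is applied to the short MDHAQ and HAQII models. The function names are hypothetical.

```python
import math


def _clamp(value):
    """Constrain a prediction to the 0-3 range of the HAQ."""
    return min(3.0, max(0.0, value))


def haq_from_mhaq_short(mhaq, age_years, male):
    """Short MHAQ model; uses the square root of the MHAQ score."""
    return _clamp(1.542 * math.sqrt(mhaq) + 0.006 * age_years
                  - 0.239 * male - 0.138)


def haq_from_mdhaq_short(mdhaq, age_years, male):
    """Short MDHAQ model with the assumed zero-assignment rule."""
    if mdhaq == 0:
        return 0.0
    return _clamp(1.109 * mdhaq + 0.004 * age_years - 0.205 * male + 0.054)


def haq_from_haqii_short(haqii):
    """Short HAQII model with the assumed zero-assignment rule."""
    if haqii == 0:
        return 0.0
    return _clamp(0.998 * haqii + 0.038)
```

These functions are intended for group-level conversion in research data sets, in line with the authors' caution against applying the formulas to individual patients.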

MHAQ. In the development sample, the square root of MHAQ was more closely correlated with HAQ than the untransformed MHAQ variable (0.881 versus 0.857). Using the square root of MHAQ, model fit was better with the inclusion of age and sex (ΔR² = 0.03). Splines at the 0.125, 0.25, 0.50, and 0.75 levels all improved R² by 0.01; therefore, they were not used due to nonsignificant improvement in model fit. Adding individual question responses common to both the HAQ and the MHAQ to the model improved R² by 0.02. For the above reasons the long model was chosen as best:

$$
\mathrm{HAQ} = 1.108\sqrt{\mathrm{MHAQ}} + 0.006\,\mathrm{AGE} - 0.251\,\mathrm{MALE} + 0.063\,\mathrm{WALK} + 0.058\,\mathrm{WASH} + 0.138\,\mathrm{DRESS} + 0.055\,\mathrm{CUP} - 0.026\,\mathrm{FAUCET} + 0.057\,\mathrm{BEND} + 0.032\,\mathrm{IN\ CAR} - 0.024\,\mathrm{BED} - 0.051
$$
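The long MHAQ model can be expressed as a small Python function, shown here as a hedged sketch. The coefficients are those of the long MHAQ model in Table 2; the coding of item responses as 0–3, of sex as MALE = 1 for men and 0 for women, and of age in years are assumptions, as are the function and dictionary names.

```python
import math

# Coefficients of the common HAQ/MHAQ questions (long model, Table 2).
MHAQ_LONG_ITEM_WEIGHTS = {
    "walk": 0.063, "wash": 0.058, "dress": 0.138, "cup": 0.055,
    "faucet": -0.026, "bend": 0.057, "in_car": 0.032, "bed": -0.024,
}


def haq_from_mhaq_long(mhaq, age_years, male, items):
    """Predict HAQ from the MHAQ score, age, sex, and 8 item responses.

    `items` maps each key of MHAQ_LONG_ITEM_WEIGHTS to a response
    assumed to be coded 0-3.
    """
    value = 1.108 * math.sqrt(mhaq) + 0.006 * age_years - 0.251 * male - 0.051
    value += sum(w * items[name] for name, w in MHAQ_LONG_ITEM_WEIGHTS.items())
    return min(3.0, max(0.0, value))  # constrain to the HAQ range


# Hypothetical 60-year-old woman answering 1 on every item (MHAQ = 1.0).
all_ones = {name: 1 for name in MHAQ_LONG_ITEM_WEIGHTS}
example = haq_from_mhaq_long(1.0, 60, 0, all_ones)
```

Unlike the MDHAQ and HAQII models, no zero-assignment rule is applied, so an MHAQ score of zero still yields a nonzero predicted HAQ through the age term.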

The average VIF for the final model was 2.85. As expected, there was some strong, but not perfect, multicollinearity between the composite score and the individual items (maximum VIF = 10.00), but the model is still estimable. Since we are primarily interested in the model’s predictive power, rather than the meaning of the individual model coefficients, this multicollinearity is not considered to be detrimental to our purposes.

MDHAQ. Numerically transformed versions of the MDHAQ did not significantly improve correlation to HAQ scores or model fit; therefore, the untransformed variable was used for analysis. Model fit was better with inclusion of age and sex (ΔR² = 0.02). The use of splines at the 0.125, 0.25, 0.50, and 0.75 levels did not significantly improve model fit for MDHAQ (ΔR² < 0.01); therefore, they were not included in the final model. As compared with the short model, addition of all individual questions composing the MDHAQ improved R² by 0.02. Addition of the individual question responses common to both the HAQ and the MDHAQ did not significantly improve R² (ΔR² = 0.01). The average VIF for the model, including the question responses common to both the HAQ and the MDHAQ, was 3.28. Similar to the MHAQ model, the individual items in the MDHAQ model demonstrated some multicollinearity (maximum VIF = 14.86); however, we again believe that this is not detrimental to our purposes. Using models predictive of zero, the percentage automatically scored as zero was 3.6% and 4.0% for short and long models, respectively. When rounded to the nearest hundredth, the squared correlation between predicted and actual HAQ values for all cases was identical for each model (R² = 0.81 for the short model and R² = 0.82 for the long model), regardless of inclusion of zero values. For the above reasons, the long model (obtained by assigning predicted HAQ values of zero to all cases with an MDHAQ score of zero and then using the remaining cases to estimate a linear regression model) was chosen as best:

$$
\mathrm{HAQ} = 0.949\,\mathrm{MDHAQ} + 0.004\,\mathrm{AGE} - 0.225\,\mathrm{MALE} + 0.011\,\mathrm{WASH} + 0.148\,\mathrm{DRESS} + 0.014\,\mathrm{CUP} - 0.040\,\mathrm{FAUCET} + 0.015\,\mathrm{BEND} + 0.016\,\mathrm{IN\ CAR} - 0.005\,\mathrm{BED} - 0.016\,\mathrm{WALK} + 0.081
$$
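A hedged Python sketch of the long MDHAQ model follows, including the assignment of a predicted HAQ of zero when the MDHAQ score is zero. Coefficients come from the long MDHAQ model in Table 2; item responses coded 0–3, MALE = 1 for men and 0 for women, and age in years are assumptions, and all names are hypothetical.

```python
# Coefficients of the common HAQ/MDHAQ questions (long model, Table 2).
MDHAQ_LONG_ITEM_WEIGHTS = {
    "wash": 0.011, "dress": 0.148, "cup": 0.014, "faucet": -0.040,
    "bend": 0.015, "in_car": 0.016, "bed": -0.005, "walk": -0.016,
}


def haq_from_mdhaq_long(mdhaq, age_years, male, items):
    """Predict HAQ from the MDHAQ score, age, sex, and 8 item responses."""
    if mdhaq == 0:
        return 0.0  # cases with an MDHAQ of zero are assigned a HAQ of zero
    value = 0.949 * mdhaq + 0.004 * age_years - 0.225 * male + 0.081
    value += sum(w * items[name] for name, w in MDHAQ_LONG_ITEM_WEIGHTS.items())
    return min(3.0, max(0.0, value))  # constrain to the HAQ range


# Hypothetical 50-year-old man answering 1 on every common item.
all_ones = {name: 1 for name in MDHAQ_LONG_ITEM_WEIGHTS}
example = haq_from_mdhaq_long(1.0, 50, 1, all_ones)
```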

Figure 1. Predicted versus actual of the MHAQ (A), MDHAQ (B), and HAQII (C) using 20% validation sample for long models. The shaded region corresponds to the 95% confidence interval for the mean, with 95% of predicted values falling within the broken grey lines. For comparison, the straight line indicates HAQ = predicted HAQ. MHAQ = modified Health Assessment Questionnaire; MDHAQ = Multidimensional HAQ.


HAQII. Numerically transformed versions of the HAQII did not significantly improve correlation to HAQ scores or model fit; therefore, the untransformed variable was used for analysis. Inclusion of age and sex into the HAQII model produced a nonsignificant improvement (ΔR² < 0.01). The use of splines at the 0.125, 0.25, 0.50, and 0.75 levels did not significantly improve the model fit (ΔR² < 0.01). The addition of all individual questions composing the HAQII, and the addition of only the individual question responses common to both the HAQ and the HAQII, improved R² by 0.03 as compared with the short model; therefore, in favor of parsimony, the model inclusive of only the common questions was chosen. The average VIF for the final model, including the question responses common to both the HAQ and the HAQII, was 2.85, with a maximum VIF of 5.77, indicating no significant issues with multicollinearity. Using models predictive of zero, the percentage automatically scored as zero was 7.3% and 7.5% for short and long models, respectively. The squared correlation between the predicted and actual HAQ values for all cases was identical for each model (R² = 0.83 for the short model and R² = 0.86 for the long model), regardless of the inclusion of zero values. For the above reasons, the long model (obtained by assigning predicted HAQ values of zero to all cases with a HAQII score of zero and then using the remaining cases to estimate a linear regression model) was chosen as best:

$$
\mathrm{HAQ} = 0.646\,\mathrm{HAQII} - 0.002\,\mathrm{TOILET} + 0.075\,\mathrm{OPEN\ CAR} + 0.138\,\mathrm{STAND} + 0.019\,\mathrm{WALK} + 0.166\,\mathrm{REACH} + 0.131
$$
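The long HAQII model, which has no age or sex terms, can be sketched in Python as below. Coefficients are those of the long HAQII model in Table 2; the 0–3 coding of item responses and all names are assumptions made for illustration.

```python
# Coefficients of the common HAQ/HAQII questions (long model, Table 2).
HAQII_LONG_ITEM_WEIGHTS = {
    "toilet": -0.002, "open_car": 0.075, "stand": 0.138,
    "walk": 0.019, "reach": 0.166,
}


def haq_from_haqii_long(haqii, items):
    """Predict HAQ from the HAQII score and 5 common item responses."""
    if haqii == 0:
        return 0.0  # cases with a HAQII of zero are assigned a HAQ of zero
    value = 0.646 * haqii + 0.131
    value += sum(w * items[name] for name, w in HAQII_LONG_ITEM_WEIGHTS.items())
    return min(3.0, max(0.0, value))  # constrain to the HAQ range


# Hypothetical patient answering 1 on every common item.
all_ones = {name: 1 for name in HAQII_LONG_ITEM_WEIGHTS}
example = haq_from_haqii_long(1.0, all_ones)
```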

DISCUSSION

In this manuscript, we developed conversion formulas from the MHAQ, MDHAQ, and HAQII to the HAQ in a large sample of RA patients. We demonstrated that average MHAQ and MDHAQ scores were 0.58 and 0.34, respectively, lower than HAQ scores, while average HAQII scores were only minimally lower (by 0.04) than HAQ scores. Since a change of 0.22 in the HAQ is considered clinically significant (32), these differences are important and illustrate that the MHAQ and the MDHAQ are not equivalent to the original HAQ. One strength of our analysis is the comparison of simultaneously collected measurements of the HAQ with the MHAQ, MDHAQ, and HAQII scales. Although our data set includes relatively few values at the upper end of the HAQ scales, which in turn produces less certainty in the model at the upper ends of each scale, this finding is not different from other studies using these measures (20,30,33,35). Additionally, the clustering of values at the lower end of each scale does not significantly impact the model fit for any of the measures, as evidenced by the nonsignificant changes we found in our attempts to use splines to correct for this issue. We have demonstrated consistency between our models as shown by graphical representation of the 20% validation sample. Although Bland-Altman plots show that the actual HAQ values for each measure have slightly higher variability than the predicted HAQ values, this effect is minimal, and overall the plots demonstrate a good level of concordance for each measure when used for population means. While we would not expect our models to explain 100% of the variance due to moving from a 41-item to 8- and 10-item questionnaires, the lowest adjusted R² for our models was in excess of 0.80, indicating that only up to 20% of the variance remains unexplained. In contrast with the narrow 95% CI of the predicted mean values for HAQ scores, we observed relatively large 95% CIs for individual predicted values (Figure 1). This illustrates that application of our conversion formulae to the individual patient for clinical care purposes is inadvisable. Moreover, prior work also has demonstrated that the HAQ and HAQII are not interchangeable in an individual patient (19,33).

Figure 2. Plot of difference between HAQ and predicted (fitted) HAQ values versus the mean of the observed HAQ and the predicted (fitted) HAQ values for long models in usual Bland-Altman 95% confidence interval limits of agreement (broken line) formed with local polynomial smooth curves (solid line) superimposed. Zero on y-axis = line of perfect average agreement. MHAQ = modified Health Assessment Questionnaire; MDHAQ = Multidimensional HAQ.
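For readers who wish to repeat a Bland-Altman agreement check on their own converted scores, the limits of agreement can be sketched as follows. This is a minimal illustration and not the code used in this study: the function name `bland_altman_limits` is hypothetical, the sample values are invented for demonstration, and the conventional mean difference ± 1.96 SD definition of the 95% limits is assumed.

```python
from statistics import mean, stdev


def bland_altman_limits(observed, predicted):
    """Return (bias, lower limit, upper limit) for paired scores.

    bias is the mean of (observed - predicted); the limits are the
    conventional 95% limits of agreement, bias +/- 1.96 * SD of the
    differences (an assumption of this sketch).
    """
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    if len(observed) < 2:
        raise ValueError("at least two pairs are needed to estimate the SD")
    diffs = [o - p for o, p in zip(observed, predicted)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Hypothetical illustrative values only; these are NOT data from the study.
observed_haq = [0.0, 0.5, 1.0, 1.625, 2.25]
predicted_haq = [0.1, 0.45, 1.1, 1.5, 2.3]

bias, lower, upper = bland_altman_limits(observed_haq, predicted_haq)
print(f"bias={bias:.3f}, limits of agreement=({lower:.3f}, {upper:.3f})")
```

A bias near zero with wide limits would be in keeping with the pattern described above: good agreement for population means, but too much scatter for use in an individual patient.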

For each measure we have developed 2 models for conversion back to the HAQ: a longer, more explanatory version, as well as a more parsimonious version. It is likely that existing data sets will not have all the variables that were at our disposal. We believe selection of the appropriate conversion model may be based on available data (i.e., use the short model if individual question responses are not available) rather than through imputation of data, since there were only small improvements in model fit when moving to longer models. In fact, the largest improvement we found was going from the most parsimonious model to the expanded model for the HAQII, which demonstrated an improvement in R² of 0.03. For the MDHAQ and the HAQII, we found no significant improvements over models inclusive of only common questions between measures versus models inclusive of all individual questions composing the measures. Notably, all 8 MHAQ questions are taken from the original HAQ without modification or addition of other questions. We found slightly different coefficients for conversion between the HAQII and the HAQ than with prior work done in this area. Wolfe et al (19) had previously reported a conversion equation based on 14,038 observations, of which 10,916 were limited to RA (within the same database), which yielded HAQ = 0.039 + 0.989 × HAQII (the previously published model, HAQ = 0.39 + 0.989 × HAQII, is in error and should read HAQ = 0.039 + 0.989 × HAQII). Our updated formula, HAQ = 0.038 + 0.998 × HAQII, is very similar. The updated formula will lead to slightly higher values (our model gives HAQ values that are 1.0% higher), but should be more robust for RA cohorts since our analysis was based on an additional 4,862 RA-specific simultaneous observations for the HAQ and the HAQII.
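The two HAQII conversion equations above can be compared directly with a short script. This is a sketch under stated assumptions rather than the authors' code: the operators between the printed coefficients are read here as an intercept plus a slope times the HAQII score, the function and constant names are hypothetical, and the rule of assigning a predicted HAQ of zero when the HAQII score is zero follows the description in the abstract. Consistent with the discussion above, the output is intended for group-level research use and not for individual patients.

```python
# Coefficients as (intercept, slope); the additive form is an assumed
# reading of the equations in the text.
UPDATED = (0.038, 0.998)     # short HAQII model reported in this article
WOLFE_2004 = (0.039, 0.989)  # corrected equation attributed to ref. 19


def predict_haq(haqii, coefficients=UPDATED, zero_rule=True):
    """Convert a HAQII score (0 to 3 scale) into a predicted HAQ score."""
    if not 0.0 <= haqii <= 3.0:
        raise ValueError("HAQII scores are expected to lie between 0 and 3")
    if zero_rule and haqii == 0:
        # Per the abstract, cases with a HAQII of zero were assigned a
        # predicted HAQ of zero rather than the model intercept.
        return 0.0
    intercept, slope = coefficients
    return intercept + slope * haqii


# Compare the two equations across the scale (no zero rule for the older one).
for score in (0.5, 1.0, 2.0, 3.0):
    new = predict_haq(score)
    old = predict_haq(score, WOLFE_2004, zero_rule=False)
    pct = 100.0 * (new - old) / old
    print(f"HAQII={score:.1f}  updated={new:.3f}  prior={old:.3f}  diff={pct:.2f}%")
```

Keeping the coefficients as named constants makes it straightforward to substitute another published model without changing the conversion logic.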

In addition to the measures discussed above, other modified versions of the HAQ have been created; however, these measures are not widely used at this time. Paramount among such measures is the Patient-Reported Outcome Measurement Information System HAQ, also known as the improved HAQ, which is composed of modified versions of the 20 questions found in the original HAQ. It is written in present tense, uses a 5-point ordered-category item scale, and uses a scoring scale ranging between 0 and 100 with adjustment for 4 questions asking about the use of aids, devices, or assistance (36). Should additional versions of the HAQ become popular in the future, it would be appropriate at that time to develop similar models to convert values to the original HAQ in order to compare outcomes between studies. Ideally, it would be prudent to develop similar conversion models back to the original HAQ as part of a development process for any new HAQ-derived measure. Overall, we believe the models we have developed are most useful for conversion of the MHAQ, the MDHAQ, and the HAQII to the HAQ in the research setting. As we limited our analysis to RA patients, these formulas may not be applicable to other patient populations.

Although all the measures discussed above show validity in measuring function, the HAQII requires the least manipulation of data in order to compare with the original HAQ and has been shown to have the greatest uniformity between values across the range of the scale. In light of ever improving treatments for RA, which produce a greater likelihood of remission, the large floor effect of the MHAQ makes it the least desirable of the measures we evaluated. Based on the strong relationship between the HAQ and the HAQII, and on the previously reported validation studies demonstrating equivalent prediction of outcomes such as mortality and work disability (19), we recommend that future studies use the HAQII when a HAQ substitute is required. The above models will allow comparison of prior data where collections of different versions of the HAQ were used.

ACKNOWLEDGMENT

The authors thank Robin High of the University of Nebraska Medical Center’s Department of Biostatistics for his assistance with the estimation of the zero-inflated normal models.

AUTHOR CONTRIBUTIONS

All authors were involved in drafting the article or revising it critically for important intellectual content, and all authors approved the final version to be submitted for publication. Dr. Michaud had full access to all of the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.

Study conception and design. Anderson, Sayles, Curtis, Michaud.

Acquisition of data. Wolfe, Michaud.

Analysis and interpretation of data. Anderson, Sayles, Curtis, Michaud.

REFERENCES

1. Allaart CF, Goekoop-Ruiterman YP, de Vries-Bouwstra JK, Breedveld FC, Dijkmans BA, and the FARR study group. Aiming at low disease activity in rheumatoid arthritis with initial combination therapy or initial monotherapy strategies: the BeSt Study. Clin Exp Rheumatol 2006;24 Suppl 43:S77–82.

2. Grigor C, Capell H, Stirling A, McMahon AD, Lock P, Vallance R, et al. Effect of a treatment strategy of tight control for rheumatoid arthritis (the TICORA Study): a single-blind randomised controlled trial. Lancet 2004;364:263–9.

3. Felson DT, Anderson JJ, Boers M, Bombardier C, Chernoff M, Fried B, et al. The American College of Rheumatology preliminary core set of disease activity measures for rheumatoid arthritis clinical trials. Arthritis Rheum 1993;36:729–40.

4. Fries JF, Spitz P, Kraines RG, Holman HR. Measurement of patient outcome in arthritis. Arthritis Rheum 1980;23:137–45.

5. Ramey DR, Raynauld JP, Fries JF. The Health Assessment Questionnaire 1992: status and review. Arthritis Care Res 1992;5:119–29.

6. Borg G, Allander E, Lund B, Berg E, Brodin U, Pettersson H, et al. Auranofin improves outcome in early rheumatoid arthritis: results from a 2-year, double blind placebo controlled study. J Rheumatol 1988;15:1747–54.

7. Egsmose C, Lund B, Borg G, Pettersson H, Berg E, Brodin U, et al. Patients with rheumatoid arthritis benefit from early 2nd line therapy: 5 year followup of a prospective double blind placebo controlled study. J Rheumatol 1995;22:2208–13.

8. Fitzpatrick R, Newman S, Lamb R, Shipley M. A comparison of measures of health status in rheumatoid arthritis. Br J Rheumatol 1989;28:201–6.

9. Wolfe F, Hawley DJ, Cathey MA. Clinical and health status measures over time: prognosis and outcome assessment in rheumatoid arthritis. J Rheumatol 1991;18:1290–7.

10. Leigh JP, Fries JF. Predictors of disability in a longitudinal sample of patients with rheumatoid arthritis. Ann Rheum Dis 1992;51:581–7.

11. Pincus T, Sokka T. Quantitative measures for assessing rheumatoid arthritis in clinical trials and clinical care. Best Pract Res Clin Rheumatol 2003;17:753–81.

12. Wolfe F, Michaud K, Gefeller O, Choi HK. Predicting mortality in patients with rheumatoid arthritis. Arthritis Rheum 2003;48:1530–42.

13. Wolfe F, Hawley DJ. The long-term outcomes of rheumatoid arthritis: work disability. A prospective 18 year study of 823 patients. J Rheumatol 1998;25:2108–17.

14. Wolfe F, Zwillich SH. The long-term outcomes of rheumatoid arthritis: a 23-year prospective, longitudinal study of total joint replacement and its predictors in 1,600 patients with rheumatoid arthritis. Arthritis Rheum 1998;41:1072–82.

15. Michaud K, Messer J, Choi HK, Wolfe F. Direct medical costs and their predictors in patients with rheumatoid arthritis: a three-year study of 7,527 patients. Arthritis Rheum 2003;48:2750–62.

16. Department of Immunology and Rheumatology, Stanford University Department of Medicine. ARAMIS. 2003. URL: http://aramis.stanford.edu/.

17. Smolen JS, Breedveld FC, Schiff MH, Kalden JR, Emery P, Eberl G, et al. A simplified disease activity index for rheumatoid arthritis for use in clinical practice. Rheumatology (Oxford) 2003;42:244–57.

18. Bindman AB, Keane D, Lurie N. Measuring health changes among severely ill patients: the floor phenomenon. Med Care 1990;28:1142–52.

19. Wolfe F, Michaud K, Pincus T. Development and validation of the Health Assessment Questionnaire II: a revised version of the Health Assessment Questionnaire. Arthritis Rheum 2004;50:3296–305.

20. Stucki G, Stucki S, Bruhlmann P, Michel BA. Ceiling effects of the Health Assessment Questionnaire and its modified version in some ambulatory rheumatoid arthritis patients. Ann Rheum Dis 1995;54:461–5.

21. Pincus T, Swearingen C, Wolfe F. Toward a multidimensional Health Assessment Questionnaire (MDHAQ): assessment of advanced activities of daily living and psychological status in the patient-friendly health assessment questionnaire format. Arthritis Rheum 1999;42:2220–30.

22. Pincus T, Summey JA, Soraci SA Jr, Wallston KA, Hummon NP. Assessment of patient satisfaction in activities of daily living using a modified Stanford Health Assessment Questionnaire. Arthritis Rheum 1983;26:1346–53.

23. Ziebland S, Fitzpatrick R, Jenkinson C, Mowat A, Mowat A. Comparison of two approaches to measuring change in health status in rheumatoid arthritis: the health assessment questionnaire (HAQ) and modified HAQ. Ann Rheum Dis 1992;51:1202–5.

24. Serrano MA, Beltran Fabregat J, Olmedo Garzon J. Should the MHAQ ever be used? [letter]. Ann Rheum Dis 1996;55:271–2.

25. Stucki G, Stucki S, Bruhlmann P, Michel BA. Should the MHAQ ever be used? Ann Rheum Dis 1996;55:461–5.

26. Wolfe F. Which HAQ is best? A comparison of the HAQ, MHAQ and RA-HAQ, a difficult 8 item HAQ (DHAQ), and a rescored 20 item HAQ (HAQ20): analyses in 2,491 rheumatoid arthritis patients following leflunomide initiation. J Rheumatol 2001;28:982–9.

27. Uhlig T, Kvien TK, Glennas A, Smedstad LM, Forre O. The incidence and severity of rheumatoid arthritis, results from a county register in Oslo, Norway. J Rheumatol 1998;25:1078–84.

28. Martin M, Kosinski M, Bjorner JB, Ware JE Jr, Maclean R, Li T. Item response theory methods can improve the measurement of physical function by combining the modified health assessment questionnaire and the SF-36 physical function scale. Qual Life Res 2007;16:647–60.

29. Pincus T. A multidimensional health assessment questionnaire (MDHAQ) for all patients with rheumatic diseases to complete at all visits in standard clinical care. Bull NYU Hosp Jt Dis 2007;65:150–60.

30. Pincus T, Sokka T, Kautiainen H. Further development of a physical function scale on a MDHAQ (corrected) for standard care of patients with rheumatic diseases. J Rheumatol 2005;32:1432–9.

31. Wolfe F. Why the HAQ-II can be an effective substitute for the HAQ. Clin Exp Rheumatol 2005;23 Suppl 39:S29–30.

32. Wells GA, Tugwell P, Kraag GR, Baker PR, Groh J, Redelmeier DA. Minimum important difference between patients with rheumatoid arthritis: the patient’s perspective. J Rheumatol 1993;20:557–60.

33. Ten Klooster PM, Taal E, van de Laar MA. Rasch analysis of the Dutch Health Assessment Questionnaire disability index and the Health Assessment Questionnaire II in patients with rheumatoid arthritis. Arthritis Rheum 2008;59:1721–8.

34. Wolfe F, Michaud K. Heart failure in rheumatoid arthritis: rates, predictors, and the effect of anti-tumor necrosis factor therapy. Am J Med 2004;116:305–11.

35. Fries JF, Spitz PW, Young DY. The dimensions of health outcomes: the health assessment questionnaire, disability and pain scales. J Rheumatol 1982;9:789–93.

36. Stanford University School of Medicine, Division of Immunology and Rheumatology. The Health Assessment Questionnaire (HAQ) and the improved HAQ. 2009. URL: http://aramis.stanford.edu/downloads/HAQ%20Instructions%20(ARAMIS)%206-30-09.pdf.
