Comparison of Functional Status Tools Used in Post …€¦ · vidual’s ability to carry out...

12
There is a growing health policy mandate for comprehensive monitoring of functional outcomes across post-acute care (PAC) set- tings. This article presents an empirical comparison of four functional outcome instruments used in PAC with respect to their content, breadth of coverage, and measurement precision. Results illustrate limitations in the range of content, breadth of coverage, and measurement precision in each outcome instrument. None appears well-equipped to meet the challenge of mon- itoring quality and functional outcomes across settings where PAC is provided. Limitations in existing assessment method- ology has stimulated the development of more comprehensive outcome assessment systems specifically for monitoring the qual- ity of services provided to PAC patients. INTRODUCTION A fundamental barrier to fulfilling the emerging health policy mandate in the United States for monitoring the quality and outcomes of PAC is the absence of standardized, patient-centered outcome data that can provide policy officials and managers with outcome data across differ- ent diagnostic categories, over time, and across different settings where PAC ser- vices are provided (Wilkerson and Johnston, 1997). Recently, the National Committee on Vital and Health Statistics (NCVHS) (2002) made recommendations on the potential for standardizing data collection and reporting for the purposes of quality assurance as well as for setting future research and health policy priorities in the U.S. The NCVHS (2002) was unanimous in stressing two major goals: “…to put func- tional status solidly on the radar screens of those responsible for health information policy, and to begin laying the groundwork for greater use of functional status infor- mation in and beyond clinical care…” The NCVHS project used the term functional status very broadly to cover both the indi- vidual’s ability to carry out activities of daily living (ADLs) and the individual’s participation in various life situations and society. Within PAC, functional outcome instru- ments have been developed and are widely used for various applications and for use in specific settings. Examples include the functional independence measure (FIM TM ) for acute medical rehabilitation (Guide for the Uniform Data Set for Medical Rehabilitation, 1997; Hamilton, Granger, and Sherwin, 1987), the minimum data set (MDS) for skilled nursing and subacute rehabilitation programs (Morris, Murphy, and Nonemaker, 1995), the Outcome and Assessment Information Set for Home Health Care (OASIS) (Shaughnessy, Crisler, and Schlenker, 1997) and the Short Form-36 (SF-36) for ambulatory care pro- grams (Ware and Kosiniski, 2001). If one looks carefully at the content of these HEALTH CARE FINANCING REVIEW/Spring 2003/Volume 24, Number 3 13 The authors are with Boston University. The research in this article was supported by the National Institute on Aging under Grant Number 5P50AG11669 and the National Institute on Disability and Rehabilitation Research under Grant Number H133B990005. The views expressed in this article are those of the authors and do not necessarily reflect the views of Boston University, the National Institute on Aging, the National Institute on Disability and Rehabilitation, or the Centers for Medicare & Medicaid Services. Comparison of Functional Status Tools Used in Post-Acute Care Alan M. Jette, Ph.D., Stephen M. Haley, Ph.D., and Pengsheng Ni, M.P.H, M.D.

Transcript of Comparison of Functional Status Tools Used in Post …€¦ · vidual’s ability to carry out...

Page 1: Comparison of Functional Status Tools Used in Post …€¦ · vidual’s ability to carry out activities of daily living (ADLs) ... functional independence measure ... brain injury,

There is a growing health policy mandatefor comprehensive monitoring of functionaloutcomes across post-acute care (PAC) set-tings. This article presents an empiricalcomparison of four functional outcomeinstruments used in PAC with respect totheir content, breadth of coverage, andmeasurement precision. Results illustratelimitations in the range of content, breadthof coverage, and measurement precision ineach outcome instrument. None appearswell-equipped to meet the challenge of mon-itoring quality and functional outcomesacross settings where PAC is provided.Limitations in existing assessment method-ology has stimulated the development ofmore comprehensive outcome assessmentsystems specifically for monitoring the qual-ity of services provided to PAC patients.

INTRODUCTION

A fundamental barrier to fulfilling theemerging health policy mandate in theUnited States for monitoring the qualityand outcomes of PAC is the absence ofstandardized, patient-centered outcomedata that can provide policy officials andmanagers with outcome data across differ-ent diagnostic categories, over time, andacross different settings where PAC ser-vices are provided (Wilkerson and Johnston,

1997). Recently, the National Committeeon Vital and Health Statistics (NCVHS)(2002) made recommendations on thepotential for standardizing data collectionand reporting for the purposes of qualityassurance as well as for setting futureresearch and health policy priorities in theU.S. The NCVHS (2002) was unanimous instressing two major goals: “…to put func-tional status solidly on the radar screens ofthose responsible for health informationpolicy, and to begin laying the groundworkfor greater use of functional status infor-mation in and beyond clinical care…” TheNCVHS project used the term functionalstatus very broadly to cover both the indi-vidual’s ability to carry out activities ofdaily living (ADLs) and the individual’sparticipation in various life situations andsociety.

Within PAC, functional outcome instru-ments have been developed and are widelyused for various applications and for use inspecific settings. Examples include thefunctional independence measure (FIMTM)for acute medical rehabilitation (Guide forthe Uniform Data Set for MedicalRehabilitation, 1997; Hamilton, Granger,and Sherwin, 1987), the minimum data set(MDS) for skilled nursing and subacuterehabilitation programs (Morris, Murphy,and Nonemaker, 1995), the Outcome andAssessment Information Set for HomeHealth Care (OASIS) (Shaughnessy,Crisler, and Schlenker, 1997) and the ShortForm-36 (SF-36) for ambulatory care pro-grams (Ware and Kosiniski, 2001). If onelooks carefully at the content of these

HEALTH CARE FINANCING REVIEW/Spring 2003/Volume 24, Number 3 13

The authors are with Boston University. The research in thisarticle was supported by the National Institute on Aging underGrant Number 5P50AG11669 and the National Institute onDisability and Rehabilitation Research under Grant NumberH133B990005. The views expressed in this article are those ofthe authors and do not necessarily reflect the views of BostonUniversity, the National Institute on Aging, the National Instituteon Disability and Rehabilitation, or the Centers for Medicare &Medicaid Services.

Comparison of Functional Status Tools Used in Post-Acute Care

Alan M. Jette, Ph.D., Stephen M. Haley, Ph.D., and Pengsheng Ni, M.P.H, M.D.

Page 2: Comparison of Functional Status Tools Used in Post …€¦ · vidual’s ability to carry out activities of daily living (ADLs) ... functional independence measure ... brain injury,

instruments, it becomes apparent that sub-stantial variations exist in item definitions,scoring, metrics, and content coverage,resulting in fragmentation in outcome dataavailable for use across different PAC set-tings (Haley and Langmuir, 2000).Differences in conceptual frameworksused to construct each instrument, theinability to translate scores from oneinstrument to another, and the lack of out-come coverage and precision to detectmeaningful functional changes across set-tings, severely limit the field’s ability tomeasure and analyze recovery through theperiod of PAC service provision. If thePAC field is to achieve the goal of compre-hensive functional outcome assessmentand quality monitoring for different patientdiagnostic groups across different PACsettings, efforts are needed to developfunctional outcome assessments that areapplicable across a continuum of post-acute services and settings.

To our knowledge, no studies exist thathave directly compared the content andoperating characteristics of functional out-come instruments commonly used in PACto examine their relative merits for moni-toring outcomes across care settings. Inthis article, therefore, we report the resultsof a direct empirical comparison of theFIM™, OASIS, MDS, and the physicalfunction scale (PF-10) of the SF-36, focus-ing on three aspects of each: instrumentcontent, range of coverage, and measure-ment precision. The objective of this com-parative analysis is to evaluate the com-monly held assumption that there existfundamental deficiencies in the currentarmamentarium of setting-specific out-come instruments that prevents theirapplicability for more comprehensive patient-centered functional outcome assessmentacross diagnoses, over time, and across dif-ferent settings where PAC is provided. Inresponse to identified deficiencies in exist-

ing instruments, we also discuss the poten-tial utility of contemporary measurementtechniques, such as item response theory(IRT) methods and computerized adaptivetechnology (CAT), to yield functional out-come instruments better suited for out-come monitoring across PAC settings.

METHODS

Subjects

These analyses use data from a sampleof 485 PAC patients drawn from six healthprovider networks in the greater Bostonarea. All patients enrolled in the studywere recruited by study coordinators with-in their own health care facility and com-pleted informed consent prior to participat-ing in the study. Patients were recruitedfrom inpatient (199 from acute inpatientrehabilitation and 90 from transitional careunits); and community settings (90 fromambulatory services; and 106 from homecare). Eligibility criteria included: (1)adults, age 18 or over, (2) recipients ofskilled rehabilitation services (physical,occupational, or speech therapy), and (3)English speaking. The sample was strati-fied by impairment group to includeapproximately an equal number of subjectswithin three major patient groups: (1) 33.2percent neurological (e.g., stroke, multiplesclerosis, Parkinson’s disease, brain injury,spinal cord injury, neuropathy); (2) 28.4percent musculoskeletal (e.g., fractures,joint replacements, orthopedic surgery,joint or muscular pain); and (3) 38.4 per-cent medically complex (e.g., debilityresulting from illness, cardiopulmonaryconditions, or post-surgical recovery). Toassure good representation of levels offunctional severity, the sampling designwas stratified to yield a wide distribution ofsubjects representing three distinct severi-ty levels: slight (35.9 percent), moderate

14 HEALTH CARE FINANCING REVIEW/Spring 2003/Volume 24, Number 3

Page 3: Comparison of Functional Status Tools Used in Post …€¦ · vidual’s ability to carry out activities of daily living (ADLs) ... functional independence measure ... brain injury,

(44.1 percent), and severe (20.0 percent)based on scores from an adapted modifiedRankin scale (vanSwieten et al., 1988).

The sample reflects the racial and ethnicdistribution of the greater Boston metro-politan population. The sample contained awide age range (19-100 years; mean age =62.7 years.) The majority of the subjectswere female (58.8 percent), white (81.6percent), and non-married (61.1 percent).More than one-half (51.3 percent) hadbeyond a high school education. The widerange of onset from initial injury/ illness(2.0-3.9 years) characterizes different stagesof recovery within both inpatient and com-munity settings. The SF-8 Health Survey(Ware et al., 1999) data indicated that phys-ical functioning of the overall sample(mean=40.3, standard deviation (s.d.)=9.9)was below the U.S. population norms(mean=50, s.d.=10), although mental func-tioning (mean=50.2, s.d.=10.3) was consis-tent with U.S. population norms (mean=50,s.d.=10).

Data Collection Procedures

Our overall data collection strategy wasto assess items from existing functionaloutcome tools used in PAC so that theycould be combined for analysis into onecommon scale. Due to practical data col-lection considerations and potentially highpatient response burden, we did not admin-ister items from all four instruments to all485 subjects. Rather, we linked itemstogether by administering to the entiresample a core set of 58 activity items applic-able for patients in both home and commu-nity service settings. This method hasbeen referred to as a common-item testequating design, in which a core set ofitems serve as a scale anchor from whichunbiased parameters can be estimated onitems with missing data (McHorney andCohen, 2000).

We collected activity items, when avail-able, from standardized instruments admin-istered and recorded in the medical record.These included the 18-item FIMTM for per-sons in inpatient rehabilitation settings(N=108), 19 MDS items (physical function-ing and selected cognitive items) for per-sons in skilled nursing home settings(N=91), and 19 ADL/individual ADL itemsfrom the OASIS or persons receiving homecare (N=103). Since there were no consis-tent data available from charts in the out-patient setting, we administered the 10physical functioning items of the SF-36 toindividuals receiving outpatient services(N=82).

We applied specific rules for handlingmissing item data within two of the stan-dardized instruments. In accordance withFIMTM scoring rules, items that were notadministered are scored as “total depen-dence.” The MDS items have a responseoption of “activity did not occur.” We con-verted these codes to the lowest score forthat item, namely “total dependence.” Wereasoned that the most likely explanationfor the activity not occurring was that theitem could not be performed, an assump-tion that others have made when compar-ing functional instruments. (Buchanan etal., 2002). Missing data for the PF-10 andOASIS were not systematically recoded.

The 58 core activity items collected on all485 subjects were used as scale anchors. Inorder to compare items from instrumentsacross PAC settings, a common link wasneeded to provide a stable functional basefor comparison purposes. The common-item test equating design was used so thatevery subject had a similar core set ofitems. Rasch (1980) models (Wright andMasters, 1982) can be conducted with miss-ing data if a core set of items is used to linkthe setting-specific instruments into acommon scale. Core items included: 15physical functioning, 14 self-care, 12 daily

HEALTH CARE FINANCING REVIEW/Spring 2003/Volume 24, Number 3 15

Page 4: Comparison of Functional Status Tools Used in Post …€¦ · vidual’s ability to carry out activities of daily living (ADLs) ... functional independence measure ... brain injury,

routine, 11 communication, and 6 interper-sonal interaction items. Activity questionsasked, “How much difficulty do you cur-rently have (without help from another per-son or device) with the following activi-ties…?” A polytomous response choiceincluded “not at all,” “a little,” “somewhat,”“a lot,” and “can not do.” We framed theactivity questions in a general fashion with-out specific attribution to health, medicalconditions, or disabling factors. For someindividuals in the community settings, wealso collected 52 additional functional items(N=196) and added those to the commonlinking solution, however the results ofthese items are not reported here. None ofthe subjects had difficulty in completing thecore set of 58 items.

Data on items from three of the existinginstruments (e.g., FIMTM, OASIS, andMDS) were recorded from retrospectivechart review. Data on the PF-10 items (out-patient programs) and core and communi-ty items were collected via subject or proxyinterviews. Each interview (approximate-ly 45-60 minutes) was conducted in a quietatmosphere in an inpatient setting, an out-patient facility, or participant’s home. Theability of a subject to take part in the inter-view self-report was assessed by a treatingclinician who determined if the respondentcould: (1) understand the interview ques-tions, (2) sustain attention during an inter-view, and (3) reliably respond to the ques-tions. If the answer to any of these screen-ing questions was no, then the interviewwas completed by either a clinician or fam-ily member proxy participant. A proxy par-ticipant completed less than 3 percent ofthe interviews.

Interviews that included information oncore items were timed to coincide withadministration of standard outcome instru-ments. For example, FIMTM is adminis-tered in most facilities within 3 days ofadmission and prior to discharge in the

inpatient rehabilitation setting. Thus, thesubject interview was arranged to takeplace within 3 days of admission or dis-charge such that FIMTM information wascollected close to the time of the interview.Likewise, interviews of subjects in transi-tional care units were scheduled to coin-cide with the MDS assessment. Subjectsreceiving home care therapy were inter-viewed either near to admission or dis-charge such that the OASIS assessmentwas performed in close proximity to thesubject interview. The interview was fullyscripted, with standard instructions and ananswer card to help subjects identify thedesired response choice. The order of pre-sentation of test sections was randomlyassigned to mitigate the possibility of largeportions of missing data in any one sectiondue to respondent fatigue.

Interviewers, who were experiencedrehabilitation clinicians (mean years ofclinical experience=5.7 years), receivedtraining and quality assurance from: (1) aninitial 3-hour training session, (2) a proto-col manual, (3) supervision by the researchproject staff on all first interviews, and (4)acceptable completion of a proceduralchecklist and inter-rater reliability on asubsequent interview. A project staff mem-ber accompanied interviewers on approxi-mately every 10th interview to check onadherence to study protocol and to assureoverall quality control. All subjects,regardless of setting, were administeredthe SF-8 items to establish a normativedescription of the sample.

Analyses

Analyses were conducted in threestages. First, we conducted one-parameterRasch (1980) partial credit analyses for theentire item pool (instruments andcore/community items) to develop anoverall functional ability scale (Wright and

16 HEALTH CARE FINANCING REVIEW/Spring 2003/Volume 24, Number 3

Page 5: Comparison of Functional Status Tools Used in Post …€¦ · vidual’s ability to carry out activities of daily living (ADLs) ... functional independence measure ... brain injury,

Masters, 1982). The Rasch partial creditmodel allows for different number ofresponse categories across items. Forexample, the FIM™ scale has a seven-pointrating scale, while the OASIS incorporatesmultiple response categories that differ peritem. This overall scale was used to esti-mate parameters for all items in each of thefour comparison instruments. We used amethod of concurrent calibration, whichinvolves estimating item and ability para-meters in the overall subject and item poolsimultaneously. This treats items nottaken by subjects as missing, as the Raschprogram uses information available to esti-mate person and item parameters. Ourgoal was to develop a general scale forfunctional abilities, and specifically to seethe location of item content across the fourexisting instruments. We calculated inter-nal consistency estimates—Cronbachalpha (1951)—for each of the individualinstruments and for the overall functionalscale to determine the levels of consisten-cy of the items within the overall function-al scale. We also calculated goodness of fittests for all items with the overall function-al scale using the standardized infit statis-tic (+/-2.0) to determine the number ofitems with poor fit within the overall scale.

Second, we examined the relativeamount and range of activity content cov-ered by each of the four existing instru-ments and made comparisons acrossinstruments. To do this, we used theextreme (highest and lowest) item thresh-old estimate for each instrument. Athreshold represents the point of separa-tion between adjacent item response cate-gories for each item. The Rasch partialcredit model estimates thresholds for eachitem, and the minimum and maximumthreshold values per instrument were usedto establish the range of content within theoverall functional ability scale.

Third, we calculated item and test infor-mation functions (Lord, 1980; Dodd andKoch, 1987; Murnki, 1993) to estimate thelocation of optimal measurement precisionof each instrument. The item informationfunction is an index of the degree and loca-tion of information that a particular func-tional item provides for estimating a scorealong a functional ability scale. Item infor-mation functions are related to the locationand shape of the item characteristic curve(ICC), which describe the probabilities ofresponding to particular response optionsof an item. The steeper the slope of theICC, the more information about function-al ability provided by an item in the scale,thus the greater level of item discrimina-tion and precision associated with esti-mates of the score at that point along thescale. ICCs that have a broad range of cov-erage along the scale provide informationfunctions across a wide range of the scale.Therefore, the information function is therelationship of the amount of informationof an item at a particular scale level and isdescribed by the ratio of the slope of theICC and the expected measurement error.Test information functions were calculatedby summing the item information func-tions to obtain an estimate of the measure-ment precision of the entire test at differ-ent levels of the functional ability continu-um. We compared the test informationfunction with the ability levels of the sam-ple in each major setting (inpatient, com-munity). We calculated a summary scoreconverted to 0-100 metric for each personbased on the overall item pool.

RESULTS

The internal consistency values of thefour functional ability instruments were asfollows: MDS=0.97, OASIS=0.99, FIM™=0.99, and the PF-10=0.99. The internal

HEALTH CARE FINANCING REVIEW/Spring 2003/Volume 24, Number 3 17

Page 6: Comparison of Functional Status Tools Used in Post …€¦ · vidual’s ability to carry out activities of daily living (ADLs) ... functional independence measure ... brain injury,

consistency of the items within the fullfunctional ability scale was 0.85. Only fiveof the items within the existing instru-ments (7.2 percent) exceeded the good-ness of fit values. Thus, we felt it wasacceptable to combine the items from eachof the four functional outcome instrumentsinto an overall functional ability scale forthe purposes of directly comparing theirrange of functional content, breadth of cov-erage, and measurement precision.

Figure 1 illustrates and compares thelocation of items in each of the four func-tional outcome instruments along the

broad content dimension of functional abil-ity. These locations are derived as esti-mates of the average functional ability para-meter generated for items in each instru-ment included in the analysis. Across allfour instruments it can be seen that cogni-tive, communication, and bowel and blad-der continence function items achieved thelowest functional ability estimates, indicat-ing that those items were usually less diffi-cult for persons in the sample to performcompared with other items contained inthese instruments. In general, the PF-10scale contained items with the highest item

18 HEALTH CARE FINANCING REVIEW/Spring 2003/Volume 24, Number 3

OASIS

Item ScaleLess Ability More Ability

20 25 30 35 40 45 50 55 60 65 70

MDS

PF-10

FIM

Functional Ability

Inst

rum

ent

Bowel

Inco

ntine

nce

Speec

h Clar

ity

Under

stand

Oth

ers

Mak

e Self

Und

ersto

od

Mem

ory

Decisi

onm

aking

Bladde

r Con

tinen

ce

Eating

Bowel

Contin

ence

Hygein

e

Bed M

obilit

y

Toile

t Use

Mov

e With

in Unit

Tran

sfer

Bathin

g or

Dre

ssing

Expre

ssion

Social

Inte

racti

on

Compr

ehen

sion

Eating

Proble

m S

olving

Mem

ory

Groom

ing

Dress

UE

Bed-C

hair T

rans

fer

Bladde

r Man

agem

ent

Bowel

Man

agem

ent

Bathin

g

Toile

ting

Toile

t Tra

nsfer

Dress

LE

Loco

mot

ion

Stair

Climbin

g

Walk

1 B

lock

Climb

1 Flig

ht

Walk

Bloc

ks

Bend

or K

neel

Climb

Man

y Flig

hts

Walk

1 M

ile

Carry

Gro

cerie

s

Mod

erat

e Acti

vity

Vigor A

ctivit

y

Dress

ing

Walk

in R

oom

Walk

Roo

m

Bathin

g

Walk

in H

all

Walk

Off

Unit

Oral E

xpre

ssion

Cognit

ive F

uncti

on

Urinar

y Inc

ontin

ence

Eating

Toile

ting

Telep

hone

Use

Tran

sfer

Dress

UE

Oral M

edica

tions

Fix Lig

ht M

eals

Groom

ing

Dress

LE

Tran

spor

tatio

n

Loco

mot

ion

Bathin

g

House

keep

ing

Laun

dry

Shopp

ing

NOTES: Locations are derived as estimates of the average item difficulty parameter generated for each instrument. FIM™ isfunctional independent measure. PF-10 is physical function index (10 items). MDS is minimum data set. OASIS isStandardized Outcome and Assessment Information Set for Home Health. UE is upper extremity. LE is lower extremity.

SOURCES: (Hamilton, Granger, and Sherwin, 1987; Morris, Murphy, and Nonemaker, 1995; Ware and Kosinski, 2001;Shaughnessy, Crisler, and Schlenker, 1997.)

Figure 1

Rasch Model Comparison of the Four Post-Acute Care Instruments

Page 7: Comparison of Functional Status Tools Used in Post …€¦ · vidual’s ability to carry out activities of daily living (ADLs) ... functional independence measure ... brain injury,

functional ability calibrations comparedwith the other three. A substantial numberof items in the FIM™, OASIS, and MDSinstruments achieved average functionalability calibrations in the mid-point of thefunctional ability continuum. Figure 1 alsoreveals the substantial overlap of item con-tent across these four instruments.

The actual range of functional ability thatis covered by the items contained in each ofthe four functional outcome instruments ispresented in Figure 2. The content cover-age is calculated by the high and low stepestimates for each item’s response cate-gories instead of average estimates, whichwere the basis of the information presentedin Figure 1. Consistent with the generalspread of the functional ability parameters

illustrated in Figure 1, the range of cover-age shown in Figure 2 appears greatest forboth the MDS and the OASIS instrumentsas compared with either the FIM™ or PF-10scales. Because of the high step estimatefor one of the transportation items in theOASIS, the actual range of coverage of theOASIS is nearly the entire range of all fourinstruments.

Figure 3 displays the measurement pre-cision of each instrument as depicted bythe test information function curves.These curves are superimposed with theaverage score on the functional ability con-tinuum and ±2 standard deviation for theinpatients and community samples. Notethe location of the peak amount of preci-sion for each instrument in relationship to

HEALTH CARE FINANCING REVIEW/Spring 2003/Volume 24, Number 3 19

100

90

80

70

60

50

40

30

20

10

00 0.5

89

99

25

86

50

100

MDS OASIS FIMTM PF-10

Instrument

Per

cen

t

NOTES: Content coverage is calculated by the high and low step estimates for each item's response cate-gory. MDS is minimum data set. OASIS is Standardized Outcome and Assessment Information Set forHome Health. FIM™ is functional independent measure. PF-10 is physical function index (10 items).

SOURCES: (Hamilton, Granger, and Sherwin, 1987; Morris, Murphy, and Nonemaker, 1995; Ware andKosinski, 2001; Shaughnessy, Crisler, and Schlenker, 1997.)

Figure 2

Comparison of Actual Ranges Covered by the Four Post-Acute Care Instruments

Page 8: Comparison of Functional Status Tools Used in Post …€¦ · vidual’s ability to carry out activities of daily living (ADLs) ... functional independence measure ... brain injury,

each sample. Although the OASIS itemscontain a broad range of content, as wasseen in Figure 2, the OASIS items providea high degree of measurement precision atonly the very low end of the functional abil-ity dimension for both the inpatient andcommunity samples. The precision ofMDS items is also greatest at the lowerfunctional ability dimension levels,although the MDS items have a greaterspan of functional ability in which they pro-vide some levels of precision. The FIM™items peak at the low to moderate end ofthe inpatient sample. In contrast, the infor-mation function of the PF-10 items peaks atthe high end of the community outpatientsample with very poor precision for theinpatient sample.

DISCUSSION

The results of these analyses of instru-ment content, coverage, and measurementprecision provide direct evidence of whatmany have argued are the major limita-tions of existing functional outcome instru-ments currently in use within PAC. Whileeach of the four instruments compared inthis analysis appears well suited for its pri-mary application, none of them appearswell-equipped for the current policy man-date for monitoring the quality and out-come of PAC provided over time andacross different PAC settings.

If one looks at the results for the FIM™,the most widely used outcome instrumentin PAC, one can see that the FIM™ items

20 HEALTH CARE FINANCING REVIEW/Spring 2003/Volume 24, Number 3

27

22

17

12

7

2

-3

20 30 40

InpatientsOutpatients

50 60

PF-10 items

FIMTM items

OASIS items

MDS items

Inpatients

Outpatients

70 80

Functional Ability

Inform

ationFunction

NOTES: Curves are superimposed with the average score on the functional continuum and ± 2 standard deviationfor inpatients and community samples. MDS is minimum data set. OASIS is Standardized Outcome andAssessment Information Set for Home Health. FIM™ is functional independent measure. PF-10 is physical functionindex (10 items).

SOURCES: (Hamilton, Granger, and Sherwin, 1987; Morris, Murphy, and Nonemaker, 1995; Ware and Kosinski,2001; Shaughnessy, Crisler, and Schlenker, 1997.)

Figure 3

Measurement Precision of Post-Acute Care Instruments, by Test

Page 9: Comparison of Functional Status Tools Used in Post …€¦ · vidual’s ability to carry out activities of daily living (ADLs) ... functional independence measure ... brain injury,

cover a very small portion of the functionalability continuum within a narrow range offunctional content. The FIM™ is most pre-cise and relevant for PAC inpatients whosefunction is at the low end of the continuum.All of these characteristics of the FIM™ areacceptable when one considers its primaryapplication is for evaluation of inpatientrehabilitation services. These FIM™ char-acteristics become severe limitations, how-ever, if the intended application is assess-ment of outcomes or quality across PAC set-tings. An instrument such as the PF-10 suf-fers from a similar type of limitation, as doesthe FIM™. The PF-10 covers a narrowrange of functional content, although, unlikethe FIM™, the content covered by the PF-10is at the higher end of the functional activitycontinuum. The PF-10 appears most pre-cise for community dwelling outpatients,but much less so for inpatients such asthose seen in many rehabilitation facilities.When used with high functioning communi-ty patients, the PF-10 covers appropriatecontent; its content and range is severelylimited in application to patients within insti-tutions. The MDS and OASIS instruments,in comparison with the FIM™ and PF-10,cover content from the mid portion of thefunctional ability continuum with less con-tent covering the low or high ends. Patientsfunctioning at the very high or very low endof the functional continuum would not be aswell served by the OASIS and MDS.

Concerns about existing instrumentsused in PAC, such as the four examined inthis analysis, have stimulated the develop-ment of more comprehensive functionaloutcome instruments developed specifical-ly for application across diagnostic groups,and across PAC settings. An example ofthis type of work is found in the activitymeasure for PAC (AM-PAC), developed bythe Rehabilitation Research & TrainingCenter for Outcomes based at BostonUniversity (Haley and Jette, 2000). In con-

structing the AM-PAC, its developers usedthe strategy of combining functional abilityitems found in existing instruments andfrom a variety of other sources into quanti-tative scales that can be employed toassess a wide range of functional contentneeded to assess quality and outcomes ofpatients seen across PAC settings.

Although instruments such as the AM-PAC are promising, the continued use oftraditional, fixed-form measurement method-ology for constructing functional outcomeinstruments presents the researcher withtwo common problems that limit their utilityin clinical outcomes assessment. One fun-damental problem encountered with fixed-form instruments of modest length is floorand ceiling effects where large numbers ofindividuals who complete these outcomeinstruments score at either the top or thebottom of the range of possible scores.These ceiling and floor effects severelyreduce measurement precision and thus,restrict the utility of the instruments(Andresen et al., 1999; Brunet et al., 1996;DiFabio et al., 1997; Rubenstein et al.,1998). In response to concerns over inade-quate measurement precision and inade-quate coverage of important outcomedomains, some researchers develop morecomprehensive and lengthy outcome instru-ments. Lengthy instruments lead to thefrustration and fatigue faced by many sub-jects and busy clinicians overwhelmed bylarge and burdensome batteries of instru-ments (Meyers, 1999).

One promising solution offered to mea-surement problems faced by traditionalfixed form instruments is offered throughthe combined application of IRT methodol-ogy and CAT techniques (Ware et al., 1999;Ware, Bjorner, and Kosinski, 2000; Weiss,1982; Hambleton, 2000). These techniquesof test construction are currently beingapplied to the development of a new gener-ation of functional assessment instruments

HEALTH CARE FINANCING REVIEW/Spring 2003/Volume 24, Number 3 21

Page 10: Comparison of Functional Status Tools Used in Post …€¦ · vidual’s ability to carry out activities of daily living (ADLs) ... functional independence measure ... brain injury,

designed for use in PAC settings (Haleyand Jette, 2000). CAT methodology uses acomputer interface for the patient (or acomputerized interview/clinician report)that is tailored to the unique functional abil-ity level of the patient. The basic notion ofan adapted test is to mimic what an experi-enced clinician would do. A clinicianlearns most when he/she directs ques-tions at the patient’s approximate level ofproficiency. Administering outcome itemsthat are either too easy or too hard pro-vides little information. An adaptive testfirst asks questions in the middle of theability range, and then directs questions tothe level based on the patient’s responses,without asking unnecessary questions.This allows for fewer items to be adminis-tered while gaining precise informationregarding an individual’s placement alonga continuum of functional ability. CATapplications require a large set of items inany one functional area (item pools), itemsthat consistently scale along a dimension oflow to high functional ability, and rulesguiding starting, stopping, and scoringprocedures. Large item sets such as theAM-PAC can be readily adapted to a CATformat in future work.

Methods like IRT that make it possibleto calibrate questionnaire items on a stan-dard metric (ruler) also yield the algo-rithms necessary to run the engine thatpowers CAT assessments. These statisti-cal models estimate how likely a person ateach level of function is to choose eachresponse to each survey question. Thislogic is reversed to estimate the probabili-ty of each health score from a particularpattern of item responses. The resultinglikelihood function makes it possible toestimate each person’s score, along with aperson-specific confidence interval. Inprinciple, one can derive an unbiased esti-

mate of an outcome, i.e., an estimate with-out systematic error, from any subset ofitems that fits the model. The number ofitems administered can be increased toachieve the desired level of precision. Moststatistical models for estimating such itemparameters can be traced to the work ofRasch (1980) or on a second tradition—IRT (Hambleton and Swaminathan, 1985).Both models assume unidimensionality;i.e., that the items included on a particularscale measure only one concept. Whetherthese techniques, if applied to functionaloutcome assessment, will solve the prob-lems presented by traditional, fixed formmethodology, needs to be carefully evalu-ated in future research.

There are several limitations in theanalyses reprinted in this article. Toachieve a direct comparison of these fourinstruments, we used a liberal interpreta-tion of unidimensionality to combine itemsfrom all of the four comparison instru-ments into a single scale. This simplifiedthe presentation so that an instrument-based rather than a content-based compar-ison could be made. A more detailedexamination of common functional dimen-sions that underlie the item set was beyondthe scope of this article. We also point outthat, for practical reasons, we combineddata from medical records and from inter-views to develop the item calibrations forthe instrument comparisons. Although notideal, we do find only small amounts oferror from clinician and self-report inter-view modes of testing within these func-tional items (Andres, Haley, and Ni,Forthcoming). More work in this area ofcombining data across respondents andmodes of data collection will be needed asthe field advances in assessing functionalability across post-acute core settings.

22 HEALTH CARE FINANCING REVIEW/Spring 2003/Volume 24, Number 3

Page 11: Comparison of Functional Status Tools Used in Post …€¦ · vidual’s ability to carry out activities of daily living (ADLs) ... functional independence measure ... brain injury,

CONCLUSION

Results of this analysis illustrate some ofthe inherent limitations in the range of con-tent, breadth of coverage, and measure-ment precision found in functional outcomeinstruments currently in use within PAC.Limitations in existing functional outcomemethodology has stimulated the ongoingdevelopment of more comprehensive out-come assessment systems specifically formonitoring the quality of services providedfor patients with different across diagnosesand across PAC settings. The careful use ofIRT-based measurement methods coupledwith CAT outcome assessment techniquesmay hold future promise for making out-comes assessment briefer and less burden-some to patients, and thus more acceptablefor use in busy clinical settings. What isneeded is functional outcome data that isapplicable to patients treated across differ-ent clinical settings and applications, moreefficient and less costly to administer, and,sufficiently precise to detect clinicallymeaningful changes in functional outcomes.Contemporary measurement methodologymay hold considerable promise as a vehiclefor advancing PAC outcomes evaluation,thus avoiding the pitfalls of traditionalassessment methodology.

REFERENCES

Andres, P.L., Haley, S.M., and Ni, P.S.: Is Patient-Reported Function Reliable for Monitoring Post-Acute Outcomes? American Journal of PhysicalMedicine and Rehabilitation. Forthcoming.Andresen, E.M., Gravitt, G.W.J., Aydelotte M.E., etal.: Limitations of the SF-36 in a Sample of NursingHome Residents. Age and Aging 28:562-566, 1999.Brunet, D.G., Hopman, W.M., Singer, M.A., et al.:Measurement of Health-Related Quality of Life inMultiple Sclerosis Patients. Canadian Journal ofNeurological Sciences 23:99-103, 1996.Buchanan, J., Andres, P., Haley, S., et al.: FinalReport on the Assessment Instrument for PPS, MR-1501-CMS. RAND. Santa Monica, CA. 2002.

Cronbach, L.: Coefficient Alpha and the InternalStructure of Tests. Psychometrika 16(3):297-334.1951.DiFabio, R.P., Choi, T., Soderberg, J., et al.: Health-Related Quality of Life for Patients with ProgressiveMultiple Sclerosis: Influence of Rehabilitation.Physical Therapy 77:1704-1716, 1997.Dodd, B., and Koch, W.: Effects of Variations inItem Step Values on Item and Test Information inthe Partial Credit Model. Applied PsychologicalMeasurement 11:371-384, 1987.Guide for the Uniform Data Set for MedicalRehabilitation, (Including the FIM™ Instrument,Version 5.1.) State University of New York atBuffalo. Buffalo, NY. 1997.Haley, S., and Jette, A.: Extending the Frontier ofRehabilitation Outcome Measurement andResearch. Journal of Rehabilitation OutcomeMeasurement 4(4): 31-41, 2000.Haley, S.M., and Langmuir, L.: How Do CurrentPost-Acute Functional Assessments Compare Withthe Activity Dimensions of the International Classi-fication of Functioning and Disability (ICIDH-2)?Journal of Rehabilitation Outcome Measurement4(4):51-56, 2000.Hambleton, R.: Emergence of Item ResponseModeling in Instrument Development and DataAnalysis. Medical Care 38(9 Supplement II); II60-II65, 2000.Hambleton, R. and Swaminathan, H.: Item ResponseTheory: Principles and Applications. Kluwer Nijhoff.Boston, MA. 1985.Hamilton, B., Granger, C., and Sherwin, F.: AUniform National Data System for MedicalRehabilitation. In Fuhrer, M. (Ed.) RehabilitationOutcomes: Analysis and Measurement. Paul H.Brooks. Baltimore, MD. 1987.Lord, F.: Applications of Item Response Theory toPractical Testing Problems. Erlbaum Associates.Hillsdale, NJ. 1980.McHorney, C., and Cohen, A.: Equating HealthStatus Measures with Item Response Theory.Medical Care 38(9 Supplement II); II43-II59, 2000.Meyers, A.: Enabling Our Instruments: AssuringAccess to Health Care Research. Proceedings of theNational Conference on Health Statistics.Washington, DC. 1999.Morris, J.N., Murphy, K., and Nonemaker, S.: Long-Term Resident Care Assessment User’s Manual.Version 2.0. American Health Care Association.Washington, DC. 1995.Murnki, E.: Information Function of theGeneralized Partial Credit Model. AppliedPsychological Measurement 17:351-363, 1993.

HEALTH CARE FINANCING REVIEW/Spring 2003/Volume 24, Number 3 23

Page 12: Comparison of Functional Status Tools Used in Post …€¦ · vidual’s ability to carry out activities of daily living (ADLs) ... functional independence measure ... brain injury,

National Committee on Vital and Health Statistics.Classifying and Reporting Functional Status.Internet address: http://ncvhs.hhs.gov. (Accessed2002).Rasch, G.: Probabilistic Models for Some Intelligenceand Attainment Tests. University of Chicago Press.Chicago, IL. 1980.Rubenstein, L.M., Voelker, M.D., Chrischilles, E.A.,et al.: The Usefulness of the Functional StatusQuestionnaire and Medical Outcomes Study ShortForm in Parkinson’s Disease Research. Quality ofLife Research 7:279-290, 1998.Shaughnessy, P., Crisler, K., and Schlenker, R.:Medicare’s Oasis: Standardized Outcome andAssessment Information Set for Home Health Care:Oasis-B. Center for Health Services and PolicyResearch. Denver, CO. 1997. van Swieten, J., Koudstaal, P., Visser, M., et al.:Interobserver Agreement for the Assessment ofHandicap in Stroke Patients. Stroke 19:604-607,1988.Ware, J., Bjorner, J., and Kosinski, M.: PracticalImplications of Item Response Theory andComputerized Adaptive Testing. Medical Care38(Supplement II) II73-II82, 2000.

Ware, J.E., and Kosinski, M.: The SF-36® Physicaland Mental Health Summary Scales: A Manual forUser’s of Version 1, 2nd Edition. Boston, MA. 2001.Ware, J.E., Kosinski, M., Dewey, J., et al.: How toScore and Interpret Single-Item Health StatusMeasures: A Manual for Users of the SF-8™ HealthSurvey. Quality Metric. Lincoln, RI. 1999.Weiss, D.: Improving Measurement Quality andEfficiency with Adaptive Testing. AppliedPsychological Testing 6:473-492, 1982.Wilkerson, D., and Johnston, M.: Clinical ProgramMonitoring Systems: Current Capability and Futuredirections. In: Assessing Medical RehabilitationPractices: The Promise of Outcomes Research. Paul H.Brookes Publishing Company. Baltimore, MD. 1997.Wright, B.D., and Masters, G.N.: Rating ScaleAnalysis: Rasch Measurement. MESA Press.Chicago, IL. 1982.

Reprint Requests: Alan M. Jette, Ph.D., Sargent College ofHealth and Rehabilitation Sciences, Boston University, 635Commonwealth Avenue, Boston, MA 02215. E-mail: [email protected]

24 HEALTH CARE FINANCING REVIEW/Spring 2003/Volume 24, Number 3