
Journal of Clinical Child and Adolescent Psychology 2005, Vol. 34, No. 3, 506-522

Copyright © 2005 by Lawrence Erlbaum Associates, Inc.

Evidence-Based Assessment of Learning Disabilities in Children and Adolescents

Jack M. Fletcher
Department of Pediatrics and the Center for Academic and Reading Skills,

University of Texas Health Science Center at Houston

David J. Francis
Department of Psychology and the Texas Institute for Measurement, Evaluation and Statistics,

University of Houston

Robin D. Morris
Department of Psychology, Georgia State University

G. Reid Lyon
Child Development and Behavior Branch, National Institute of Child Health and Human Development

The reliability and validity of 4 approaches to the assessment of children and adolescents with learning disabilities (LD) are reviewed, including models based on (a) aptitude-achievement discrepancies, (b) low achievement, (c) intra-individual differences, and (d) response to intervention (RTI). We identify serious psychometric problems that affect the reliability of models based on aptitude-achievement discrepancies and low achievement. There are also significant validity problems for models based on aptitude-achievement discrepancies and intra-individual differences. Models that incorporate RTI have considerable potential for addressing both the reliability and validity issues but cannot represent the sole criterion for LD identification. We suggest that models incorporating both low achievement and RTI concepts have the strongest evidence base and the most direct relation to treatment. The assessment of children for LD must reflect a stronger underlying classification that takes into account relations with other childhood disorders as well as the reliability and validity of the underlying classification and resultant assessment and identification system. The implications of this type of model for clinical assessments of children for whom LD is a concern are discussed.

Grants from the National Institute of Child Health and Human Development (P50 21888, Center for Learning and Attention Disorders) and the National Science Foundation (9979968, Early Reading Development: A Cognitive Neuroscience Approach) supported this article.

We gratefully acknowledge the contributions of Rita Taylor to the preparation of this article.

Requests for reprints should be sent to Jack M. Fletcher, Department of Pediatrics, University of Texas Health Science Center at Houston, 7000 Fannin Street, UCT 2478, Houston, TX 77030. E-mail: [email protected]

Assessment methods for identifying children and adolescents with learning disabilities (LD) are multiple, varied, and the subject of heated debates among practitioners. Those debates involve issues that extend beyond the value of specific tests, often reflecting different views of how LD is best identified. These views reflect variations in the definition of LD and, therefore, variations in what measures are selected to operationalize the definition (Fletcher, Foorman, et al., 2002). Any focus on the "best tests" leads to a hopeless morass of confusion in an area such as LD that has not successfully addressed the classification and definition issues that lead to identification of who does and who does not possess characteristics of LD. Definitions always reflect an implicit classification indicating how different constructs are measured and used to identify members of the class in terms of similarities and differences relative to other entities that are not considered members of the class (Morris & Fletcher, 1988). For LD, children who are members of this class are historically differentiated from children who have other achievement-related difficulties, such as mental retardation, sensory disorders, emotional or behavioral disturbances, and environmental causes of underachievement, including economic disadvantage, minority language status, and inadequate instruction (Fletcher, Francis, Rourke, Shaywitz, & Shaywitz, 1993; Lyon, Fletcher, & Barnes, 2003). If the classification is valid, children with LD may share characteristics with other groups of underachievers, but they should also differ in ways that can be measured and that can serve to define and operationalize the class of children and adolescents with LD.

In this article, we consider evidence-based approaches to the assessment of LD in the context of different approaches to the classification and identification of LD. We argue that the measurement systems that are used to identify children and adolescents with LD are inseparable from the classifications from which the identification criteria evolve. Moreover, all measurement systems are imperfect attempts to measure a construct (LD) that operates as a latent variable that is unknowable independently of how it is measured and therefore of how LD is classified. The construct of LD is imperfectly measured simply because the measurement tools themselves are not error free (Francis et al., 2005). Different approaches to classification and definition capitalize on this error of measurement in ways that reduce or increase the reliability of the classification itself. Similarly, evaluating similarities and differences among groups of students who are identified as LD and not LD is a test of the validity of the underlying classification, so long as the variables used to assess this form of validity are not the same as those used for identification (Morris & Fletcher, 1988). As with any form of validity, adequate reliability is essential. Classifications can be reliable and still lack validity. The converse is not true; they cannot be valid and lack reliability. A valid classification of LD predicts important characteristics of the group. Consistent with the spirit of this special section, the most important characteristic is whether the classification is meaningfully related to intervention. For LD, a classification should also predict a variety of differences on cognitive skills, behavioral attributes, and achievement variables not used to form the classification, as well as developmental course, response to intervention (RTI), neurobiological variables, or prognosis (Fletcher, Lyon, et al., 2002).

To address these issues, we consider the reliability and validity of four approaches to the classification and assessment of LD: (a) IQ discrepancy and other forms of aptitude-achievement discrepancy, (b) low achievement, (c) intra-individual differences, and (d) models incorporating RTI and some form of curriculum-based measurement. We consider how each classification reflects the historically prominent concept of "unexpected underachievement" as the key construct in LD assessment (Lyon et al., 2001), that is, what many early observers characterized as a group of children unable to master academic skills despite the absence of known causes of poor achievement (sensory disorder, mental retardation, emotional disturbances, economic disadvantages, inadequate instruction). From this perspective, a valid classification and measurement system for LD must identify a unique group of underachievers that is clearly differentiated from groups with other forms of underachievement.

Defining LD

Historically, definition and classification issues have haunted the field of LD. As reviewed in Lyon et al. (2001), most early conceptualizations viewed LD simply as a form of "unexpected" underachievement. The primary approach to assessment involved the identification of intra-individual variability as a marker for the unexpectedness of LD, along with the exclusion of other causes of underachievement that would be expected to produce underachievement. This type of definition was explicitly coded into U.S. federal statutes when LD was identified as an eligibility category for special education in Public Law 94-142 in 1975; essentially the same definition is part of current U.S. federal statutes in the Individuals with Disabilities Education Act (1997).

The U.S. statutory definition of LD is essentially a set of concepts that in itself is difficult to operationalize. In 1977, recommendations for operationalizing the federal definition of LD were provided to states after passage of Public Law 94-142 to help identify children in this category of special education (U.S. Office of Education, 1977). In these regulations, LD was defined as a heterogeneous group of seven disorders (oral language, listening comprehension, basic reading, reading comprehension, math calculations, math reasoning, written language) with a common marker of intra-individual variability represented by a discrepancy between IQ and achievement (i.e., unexpected underachievement). Unexpectedness was also indicated by maintaining the exclusionary criteria present in the statutory definition that presumably lead to expected underachievement. Other parts of the regulations emphasize the need to ensure that the child's educational program provided adequate opportunity to learn. No recommendations were made concerning the assessment of psychological processes, most likely because it was not clear that reliable methods existed for assessing processing skills and because the field was not clear on what processes should be assessed (Reschly, Hosp, & Smied, 2003).

This approach to definition is now widely implemented with substantial variability across schools, districts, and states in which students are served in special education as LD (MacMillan & Siperstein, 2002; Mercer, Jordan, Allsop, & Mercer, 1996; Reschly et al., 2003). It is also the basis for assessments of LD outside of schools. Consider, for example, the definition of reading disorders in the Diagnostic and Statistical Manual of Mental Disorders (4th ed.; American Psychiatric Association, 1994), which indicates that the student must perform below levels expected for age and IQ, and specifies only sensory disorders as exclusionary:

A. Reading achievement, as measured by individually administered standardized tests of reading accuracy or comprehension, is substantially below that expected given the person's chronological age, measured intelligence, and age-appropriate education.

B. The disturbance in Criterion A significantly interferes with academic achievement or activities of daily living that require reading skills.

C. If a sensory deficit is present, the reading difficulties are in excess of those usually associated with it.

The International Classification of Diseases-10 has a similar definition. It differs largely in being more specific in requiring use of a regression-adjusted discrepancy, specifying cut points (achievement two standard errors below IQ) for identifying a child with LD, and expanding the range of exclusions.

Although these definitions are used in what are often disparate realms of practice, they lead to similar approaches to the identification of children and adolescents as LD. Across these realms, children commonly receive IQ and achievement tests. The IQ test is commonly interpreted as an aptitude measure or index against which achievement is compared. Different achievement tests are used because LD may affect achievement in reading, math, or written language. The heterogeneity is recognized explicitly in the U.S. statutory and regulatory definitions of LD (Individuals With Disabilities Education Act, 1997) and in the psychiatric classifications by the provision of separate definitions for each academic domain. However, it is still essentially the same definition applied in different domains. In many settings, this basic assessment is supplemented with tests of processing skills derived from multiple perspectives (neuropsychology, information processing, and theories of LD). The approach boils down to administration of a battery of tests to identify LD, presumably with treatment implications.

Underlying Classification Hypotheses

Implicit in all these definitions are slight variations on a classification model of individuals with LD as those who show a measurable discrepancy in some but not all domains of skill development and who are not identified into another subgroup of poor achievers. In some instances, the discrepancy is quantified with two tests in an aptitude-achievement model epitomized by the IQ-discrepancy approach in the U.S. federal regulatory definition and the psychiatric classifications of the Diagnostic and Statistical Manual of Mental Disorders (4th ed.; American Psychiatric Association, 1994) and the International Classification of Diseases-10. Here the classification model implicitly stipulates that those who meet an IQ-discrepancy inclusionary criterion are different in meaningful ways from those who are underachievers and do not meet the discrepancy criteria or criteria for one of the exclusionary conditions. Some have argued that this model lacks validity and propose that LD is synonymous with underachievement, so that it should be identified solely by achievement tests (Siegel, 1992), often with some exclusionary criteria to help ensure that the achievement problem is unexpected. Thus, the contrast is really between a two-test aptitude-achievement discrepancy and a one-test chronological age-achievement discrepancy with achievement low relative to age-based (or grade-based) expectations. If processing measures are added, the model becomes a multitest discrepancy model. Identification of a child as LD in all three of these models is typically based on assessment at a single point in time, so we refer to them as "status" models. Finally, RTI models emphasize the "adequate opportunity to learn" exclusionary criterion by assessing the child's response to different instructional efforts over time with frequent brief assessments, that is, a "change" model. The child who is LD becomes one who demonstrates intractability in learning characteristics by not responding adequately to instruction that is effective with most other students.

Dimensional Nature of LD

Each of these four models can be evaluated for reliability and validity. Unexpected underachievement, a concept critically important to the validity of the underlying construct of LD, can also be examined. The reliability issues are similar across the first three models and stem from the dimensional nature of LD. Most population-based studies have shown that reading and math skills are normally distributed (Jorm, Share, Matthews, & Matthews, 1986; Lewis, Hitch, & Walker, 1994; Rodgers, 1983; Shalev, Auerbach, Manor, & Gross-Tsur, 2000; Shaywitz, Escobar, Shaywitz, Fletcher, & Makuch, 1992; Silva, McGee, & Williams, 1985). These findings are buttressed by behavioral genetic studies, which are not consistent with the presence of qualitatively different characteristics associated with the heritability of reading and math disorders (Fisher & DeFries, 2002; Gilger, 2002). Because these are dimensional traits that exist on a continuum, there would be no expectation of natural cut points that differentiate individuals with LD from those who are underachievers but not identified as LD (Shaywitz et al., 1992).

The unobservable nature of LD makes two-test and one-test discrepancy models unreliable in ways that are psychometrically predictable but not in ways that simply equate LD with poor achievement (Francis et al., 2005; Stuebing et al., 2002). The problem is that the measurement approach is based on a static assessment model that possesses insufficient information about the underlying construct to allow for reliable classifications of individuals along what is essentially an unobservable dimension. If LD were a manifest concept that was directly observable in the behavior of affected individuals, or if there were natural discontinuities that represented a qualitative breakpoint in the distribution of achievement skills or the cognitive skills on which achievement depends, this problem would be less of an obstacle. However, like achievement or intelligence, LD is a latent construct that must be inferred from the pattern of performance on directly observable operationalizations of other latent constructs (namely, test scores that index constructs like reading achievement, phonological awareness, aptitude, and so on). The more information available to support the inference of LD, the more reliable (and valid) that inference becomes, thus supporting the fine-grained distinctions necessitated by two-test and one-test discrepancy models. To the extent that the latent construct, LD, is categorical, by which we mean that the construct indexes different classes of learners (i.e., children who learn differently) as opposed to simply different levels of achievement, then systems of identification that rely on one measurable variable lack sufficient information to identify the latent classes and assign individuals to those classes without placing additional, untestable, and unsupportable constraints on the system. It is simply not possible, from a single observed mean and standard deviation, to estimate separate means and standard deviations for two (or more) unobservable latent classes of individuals and determine the percentage of individuals falling into each class, let alone to classify specific individuals into those classes. Without constraints, such as specifying the magnitude of differences in the means of the latent classes, the ratio of standard deviations, and the odds of membership in the two (or more) classes, the system is under-identified, which simply means that there are many different solutions that cannot be distinguished from one another.
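
The under-identification problem can be made concrete with a short simulation. The sketch below is ours, not the authors'; it assumes Python with NumPy and scikit-learn, and the sample size, seeds, and mixture settings are arbitrary illustrations of the point that a single observed score distribution cannot pin down latent classes.

```python
# Sketch: fitting a two-class mixture to a single, unimodal achievement
# distribution. Different starting values can yield different "latent classes"
# with essentially the same fit, illustrating under-identification.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)
scores = rng.normal(100, 15, size=5000).reshape(-1, 1)  # one observed variable

for seed in (0, 1, 2):
    gm = GaussianMixture(n_components=2, init_params="random",
                         n_init=1, random_state=seed).fit(scores)
    print(f"seed={seed}  means={gm.means_.ravel().round(1)}  "
          f"weights={gm.weights_.round(2)}  mean loglik={gm.score(scores):.4f}")

# Solutions with quite different means and class weights can fit nearly equally
# well; without external constraints (or additional measures) there is no basis
# for preferring one partition into "LD" and "non-LD" classes over another.
```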

When the system is under-identified, the only solution is to expand the measurement system to increase the number of observed relations, which in one sense is what intra-individual difference models attempt by adding assessments of processing skills. Other criteria are necessary because it is impossible to uniquely identify a distinct subgroup of underachieving individuals consistent with the construct of LD when identification is based on a single assessment at a single time point. Adding external criteria, such as an aptitude measure or multiple assessments of processing skills, increases the dimensionality of the measurement system and makes latent classification more feasible, even when the other criteria are themselves imperfect. But the main issues for one-test, two-test, and multitest identification models involve the reliability of the underlying classifications and whether they identify a unique subgroup of underachievers. In the next section, we examine variations in reliability and validity for each of these models, focusing on the importance of reliability, as the validity of the classifications can be no stronger than their reliability.

Models Based on Two-Test Discrepancies

Although the IQ-discrepancy model is the most widely utilized approach to identifying LD, there are many different ways to operationalize the model. For example, some implementations are based on a composite IQ score, whereas others utilize either a verbal or nonverbal IQ score. Other approaches drop IQ as the aptitude measure and use a measure such as listening comprehension. In the validity section, we discuss each of these approaches. The reliability issues are similar for each example of an aptitude-achievement discrepancy.

Reliability

Specific reliability problems for two-test discrepancy models pertain to any comparison of two correlated assessments that involve the determination of a child's performance relative to a cut point on a continuous distribution. Discrepancy involves the calculation of a difference score (D) to estimate the true difference (Δ) between two latent constructs. Thus, discussions about discrepancy must distinguish problems with the manifest (i.e., observed) difference (D) as an index of the true difference (Δ) from the question of whether the true difference (Δ) reflects the construct of interest. Problems with the reliability of D based on differences between two tests are well known, albeit not in the LD context (Bereiter, 1967). However, there is nothing that fundamentally limits the applicability of this research to LD if we are willing to accept the notion of Δ as a marker for LD. There are major problems with this assumption that are reviewed in Francis et al. (2005). The most significant is regression to the mean. On average, regression to the mean indicates that scores that are above the mean will be lower when the test is repeated or when a second correlated test is used to compute D. In this example, individuals who have IQ scores above the mean will obtain achievement test scores that, on average, will be lower than the IQ test score because the achievement score will move toward the mean. The opposite is true for individuals with IQ scores below the mean. This leads to the paradox of children with achievement scores that exceed IQ, or the identification of low-achieving, higher IQ children with achievement above the average range as LD.
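
To illustrate how regression to the mean distorts a simple difference score, here is a minimal simulation sketch (assuming Python/NumPy; the correlation of .6 and the 1 SD discrepancy criterion are illustrative values, not figures from the article):

```python
# Sketch: when IQ and achievement correlate ~.6, a raw IQ-minus-achievement
# difference flags far more children with above-average IQ than below-average IQ,
# even when their achievement is unremarkable.
import numpy as np

rng = np.random.default_rng(42)
n, r = 100_000, 0.6                                      # illustrative correlation
iq = rng.normal(0, 1, n)                                 # standardized scores
ach = r * iq + np.sqrt(1 - r**2) * rng.normal(0, 1, n)

discrepant = (iq - ach) >= 1.0                           # "IQ exceeds achievement by 1 SD"
for label, mask in [("IQ >= +1 SD", iq >= 1), ("IQ <= -1 SD", iq <= -1)]:
    print(f"{label}: {discrepant[mask].mean():.1%} flagged as discrepant")
```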

Although adjusting for the correlation of IQ and achievement helps correct for regression effects (Reynolds, 1984-1985), unreliability also stems from the attempt to assess a person's standing relative to a cut point on a continuous distribution. As discussed in the following section on low achievement models, this problem makes identification with a single test—even one with small amounts of measurement error—potentially unreliable, a problem for any status model.
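
One common way such a regression adjustment is implemented is to predict achievement from IQ and evaluate the residual against the standard error of estimate; the sketch below assumes an IQ-achievement correlation of .6 and the conventional mean of 100 and SD of 15, which are illustrative choices rather than values endorsed in the article.

```python
# Sketch of a regression-adjusted discrepancy: predict achievement from IQ,
# then express the observed shortfall in standard-error-of-estimate units.
import math

def regression_discrepancy(iq, achievement, r_iq_ach=0.6, mean=100.0, sd=15.0):
    """Discrepancy in SE units (negative = achievement below prediction)."""
    predicted = mean + r_iq_ach * (iq - mean)       # prediction regresses toward the mean
    se_est = sd * math.sqrt(1 - r_iq_ach ** 2)      # standard error of estimate
    return (achievement - predicted) / se_est

# A child with IQ 120 and achievement 100: predicted achievement is 112, so the
# child is 1.0 SE below prediction, not the 20 points a raw difference implies.
print(round(regression_discrepancy(120, 100), 2))
```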

None of this discussion addresses the validity question concerning Δ. Specifically, does Δ embody LD as we would want to conceptualize it (e.g., as unexpected underachievement), or is Δ merely a convenient conceptualization of LD because it is a conceptualization that leads directly to easily implemented, operational definitions, however flawed they might be?

Validity

The validity of the IQ-discrepancy model has been extensively studied. Two independent meta-analyses have shown that effect sizes on measures of achievement and cognitive functions are in the negligible to small range (at best) for the comparison of groups formed on the basis of discrepancies between IQ and reading achievement versus poor readers without an IQ discrepancy (Hoskyn & Swanson, 2000; Stuebing et al., 2002), findings similar to studies not included in these meta-analyses (Stanovich & Siegel, 1994). Other validity studies have not found that discrepant and nondiscrepant poor readers differ in long-term prognosis (Francis, Shaywitz, Stuebing, Shaywitz, & Fletcher, 1996; Silva et al., 1985), response to instruction (Fletcher, Lyon, et al., 2002; Jimenez et al., 2003; Stage, Abbott, Jenkins, & Berninger, 2003; Vellutino, Scanlon, & Jaccard, 2003), or neuroimaging correlates (Lyon et al., 2003; but also see Shaywitz et al., 2003, which shows differences in groups varying in IQ but not IQ discrepancy). Studies of genetic variability show negligible to small differences related to IQ-discrepancy models that may reflect regression to the mean (Pennington, Gilger, Olson, & DeFries, 1992; Wadsworth, Olson, Pennington, & DeFries, 2000). Similar empirical evidence has been reported for LD in math and language (Fletcher, Lyon, et al., 2002; Mazzocco & Myers, 2003). This is not surprising given that the problems are inherent in the underlying psychometric model and have little to do with the specific measures involved in the model except to the extent that specific test reliabilities and intertest correlations enter into the equations.

Despite the evidence of weak validity for the practice of differentiating discrepant and nondiscrepant students, alternatives based on discrepancy models continue to be proposed, and psychologists outside of schools commonly implement this flawed model. However, given the reliability problems inherent in IQ-discrepancy models, it is not surprising that these other attempts to operationalize aptitude-achievement discrepancy have not met with success. In the Stuebing et al. (2002) meta-analysis, 32 of the 46 major studies had a clearly defined aptitude measure. Of these studies, 19 used Full Scale IQ, 8 used Verbal IQ, 4 used Performance IQ, and 1 study used a discrepancy of listening comprehension and reading comprehension. Not surprisingly, these different discrepancy models did not yield results that were different from those when a composite IQ measure was utilized. Neither Fletcher et al. (1994) nor Aaron, Kuchta, and Grapenthin (1988) were able to demonstrate major differences between discrepant and low achievement groups formed on the basis of listening comprehension and reading comprehension.

The differences in these models involve slight changes in who is identified as discrepant or low achieving depending on the cut point and the correlation of the aptitude and achievement measures. The changes simply reflect fluctuations around the cut point, where children are most similar. It is not surprising that effect sizes comparing poor achievers with and without IQ discrepancies are uniformly low across these different models. Current practices based on this approach to identification of LD, epitomized by the federal regulatory definition and psychiatric classifications, are fundamentally flawed.

One-Test (Low Achievement) Models

Reliability

The measurement problems that emerge when a specific cut point is used for identification purposes affect any psychometric approach to LD identification. These problems are more significant when the test score is not criterion referenced, or when the score distributions have been smoothed to create a normal univariate distribution. To reiterate, the presence of a natural breakpoint in the score distribution, typically observed in multimodal distributions, would make it simple to validate cut points. But natural breaks are not usually apparent in achievement distributions because reading and math achievement distributions are normal. Thus, LD is essentially a dimensional trait, or a variation on normal development.

Regardless of normality, measurement error attends any psychometric procedure and affects cut points in a normal distribution (Shepard, 1980). Because of measurement error, any cut point set on the observed distribution will lead to instability in the identification of class members because observed test scores will fluctuate around the cut point with repeated testing or use of an alternative measure of the same construct (e.g., two reading tests). This fluctuation is not just a problem of correlated tests or simply a matter of setting better cut scores or developing better tests. Rather, no single observed test score can capture perfectly a student's ability on an imperfectly measured latent variable. The fluctuation in identifications will vary across different tests, depending in part on the measurement error. In both real and simulated data sets, fluctuations in up to 35% of cases are found when a single test is used to identify a cut point. Similar problems are apparent if a two-test discrepancy model is used (Francis et al., 2005; Shaywitz et al., 1992).
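
The size of this instability is easy to see in a simulation. The following sketch assumes two parallel forms with reliability .90 and a 25th-percentile cut, both illustrative values; the exact percentage will shift with those assumptions.

```python
# Sketch: two parallel tests of the same construct, each with reliability ~.90,
# cut at the 25th percentile. A substantial share of the children flagged on
# either test changes classification purely through measurement error.
import numpy as np

rng = np.random.default_rng(0)
n, rel = 100_000, 0.90
true = rng.normal(0, 1, n)
err_sd = np.sqrt((1 - rel) / rel)            # gives Var(true) / Var(observed) = rel
test1 = true + rng.normal(0, err_sd, n)
test2 = true + rng.normal(0, err_sd, n)

low1 = test1 < np.percentile(test1, 25)
low2 = test2 < np.percentile(test2, 25)
flagged_either = low1 | low2
disagree = low1 != low2
print(f"Classification disagreement among children flagged on either test: "
      f"{disagree[flagged_either].mean():.1%}")
```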

This problem is less of an issue for research, which rarely hinges on the identification of individual children. Thus, it does not have great impact on the validity of a low achievement classification because, on average, children around the cut point who may be fluctuating in and out of the class of interest with repeated testing are not very different. However, the problems for an individual child who is being considered for special education placement or a psychiatric diagnosis are obvious. A positive identification in either example often carries a poor prognosis.

Validity

Models based on the use of achievement markers can be shown to have a great deal of validity (see Fletcher, Lyon, et al., 2002; Fletcher, Morris, & Lyon, 2003; Siegel, 1992). In this respect, if groups are formed such that the participants do not meet criteria for mental retardation and have achievement scores that are below the 25th percentile, a variety of comparisons show that subgroups of underachievers emerge that can be validly differentiated on external variables and help demonstrate the viability of the construct of LD. For example, if children with reading and math disabilities identified in this manner are compared to typical achievers, it is possible to show that these three groups display different cognitive correlates. In addition, neurobiological studies show that these groups differ both in the neural correlates of reading and math performance as well as the heritability of reading and math disorders (Lyon et al., 2003). These achievement subgroups, which by definition include children who meet either low achievement or IQ-discrepancy criteria, even differ in RTI, providing strong evidence for "aptitude by treatment" interactions; math interventions provided for children with reading problems are demonstrably ineffective, and vice versa.

Despite this evidence for validity, concerns emerge about definitions based solely on achievement cut points. Simply utilizing a low achievement definition, even when different exclusionary criteria are applied, does not operationalize the true meaning of unexpected underachievement. Although such an approach to identification is deceptively simple, it is arguable whether the subgroups that remain represent a unique group of underachievers. For example, how well are underachievers whose low performance is attributed to LD differentiated from underachievers whose low performance is attributed to emotional disturbance, economic disadvantage, or inadequate instruction (Lyon et al., 2001)? To use the example of word recognition, there is little evidence that these subgroups vary in terms of phonological awareness or other language tasks, RTI, or even neuroimaging correlates. In this respect, the validity is weak because the underlying construct of LD is not adequately assessed. Additional criteria are needed, but simply adding a single aptitude measure decreases reliability and does not add to the validity of a low achievement definition.

Models Based on Intra-individual Differences

A commonly proposed alternative to models based on aptitude-achievement discrepancies or low achievement involves an examination of individual differences on measures of cognitive function. Thus, for example, a recent consensus article from 10 major advocacy groups organized by the National Center for Learning Disabilities (2002) stated that "while IQ tests do not measure or predict a student's response to instruction, measures of neuropsychological functioning and information processing could be included in evaluation protocols in ways that document the areas of strength and vulnerability needed to make informed decisions about eligibility for services, or more importantly, what services are needed. An essential characteristic of LD is failure to achieve at a level of expected performance based upon the student's other abilities" (p. 4).

This statement proposes intra-individual differences as a marker for unexpected underachievement. As opposed to a single marker such as IQ discrepancy or low achievement, unexpectedness is operationalized as unevenness in scores across multiple tests. The person identified as LD (by definition) has strengths in many areas of cognitive or neuropsychological function but weaknesses in core attributes that lead to underachievement. The LD is unexpected because the weaknesses lead to selected and narrow difficulties with achievement and adaptive functions. Proponents of this view believe that such approaches identify children as LD based on profiles across tests that differentiate types of LD and also differentiate LD from other childhood disorders, such as mental retardation and behavioral disorders such as attention deficit hyperactivity disorder (ADHD). This approach leads to definitions based on inclusionary criteria in which children are identified as LD based on characteristics that relate to intra-individual differences (Lyon et al., 2001).

Reliability

In essence, the intra-individual difference model employs a multitest discrepancy approach and carries with it the problems involved with estimation of discrepancies and cut points. These problems are inherent in any attempt to identify a person as LD (Fletcher et al., 2003). However, examining patterns of test scores has long been favored by clinical neuropsychologists, largely because it seems to correspond more closely with clinical practice and because it adds information to the decision-making process (see the elegant discussion of differential test scores, discrepancies, and profiles in Rourke, 1975). The unique reliability issue involves the idea that LD is represented by unevenness in test profiles. This may be true, but does this observation mean that children with flatter profiles are not LD? Severity is correlated with the shape of a profile due to the lack of independence of different tests that might be used to construct the profile (Morris, Fletcher, & Francis, 1993). Children with increasingly severe reading problems, for example, will show increasingly flat profiles across processing measures (e.g., phonological awareness, rapid naming, and vocabulary) in direct correspondence to severity because all these measures are moderately correlated. Thus, if the inclusionary criterion for the presence of LD is evidence of a discrepancy in neuropsychological or processing skills, such an approach may exclude the most severely impaired children, irrespective of global measures such as IQ, because more severely impaired children are less likely to show skill discrepancies due to the intercorrelation of the tests (Morris et al., 1993, 1998).

Validity

A major assumption of a multitest intra-individual differences model is that identification based on performance patterns will lead to enhanced treatment of children with LD. It is commonly assumed that such tests point out areas that need intervention. However, there is little evidence that strengths and weaknesses in processing skills are related to intervention outcomes. It is well established that training in underlying processes does not usually generalize into the related academic area (Lyon & Moats, 1988; Reschly, Tilly, & Grimes, 1999; Vellutino, 1979). For example, training on phonological awareness skills without explicit transfer to a letter component produces gains in phonological awareness but not in reading (National Reading Panel, 2000). Training in auditory or visual perceptual skills does not lead to better outcomes for children identified as "auditory" or "visual" learners (Lyon, Fletcher, Fuchs, & Chhabra, in press; Vellutino, Fletcher, Scanlon, & Snowling, 2004).

There is support for the idea that intra-individual differences identify some children as LD, epitomized by the link of dyslexia with word recognition and phonological processing (Vellutino et al., 2004). Even here the intra-individual differences model focuses on skills that are only correlated with the achievement domain. Simply identifying children with LD based solely on processing skills is questionable and would likely yield many false positive identifications of children as LD without achievement difficulties (Torgesen, 2002). The reliability of many processing measures is lower than that of achievement (or IQ) measures, so such false positives should be expected. Other than the word recognition-phonological processing link, relations of processing and other forms of LD are not well established (Torgesen, 2002). Finally, what do we learn about variability in processing skills that is not apparent in profiles across achievement domains (Fletcher et al., 2003)? In fact, the model has the most validity at the level of achievement markers but simply collapses into a low achievement model in the absence of processing measures. Thus, if we accept the notion that specific discrepancies in cognitive domains are a unique marker for LD, given that the processing measures are usually linked to an achievement domain, what is unique about variations in processing skills that is not apparent in variations in achievement domains? Would we eliminate as LD students who have difficulties in reading, math, and writing? This is not viable, as impairments in all domains often occur in non-mentally retarded children with language-based difficulties.

Models Incorporating RTI

An alternative to status models that would increase their reliability is to increase the number of time points at which a child is assessed. Shepard (1980), for example, proposed that IQ discrepancies could be assessed more reliably if a child was tested four times. The impracticality of such an approach, which would require about 10 to 12 hr per child, is obvious, not to mention that even more resources would be devoted to the determination of eligibility, taking away funds and time needed for intervention.

Another approach to increasing the number of time points would involve much shorter assessments of key achievement skills over time. These approaches, or RTI models, typically involve identification practices based in part on multiple short assessment probes of knowledge and performance in a specific academic domain, such as reading or math (Fuchs & Fuchs, 1998). By linking multiple assessments to specific attempts to intervene with the child, the construct of unexpected underachievement can be operationalized, in part, on the basis of nonresponsiveness to instruction to which most other students respond (Gresham, 2002). In fact, this is still a variation of a discrepancy model, but the advantage is that the model is better identified because of multiple short assessments of a key attribute (e.g., reading, math) over time.
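
In the curriculum-based measurement literature, inadequate response is often operationalized as a "dual discrepancy" on both level and rate of growth. The sketch below is a simplified illustration of that idea, not a procedure from the article; the probe values, the use of the OLS intercept as the level estimate, and the -1 SD cut points are all assumptions.

```python
# Sketch: fit a simple growth line to each student's weekly CBM probes and flag
# students whose estimated level AND slope are both well below the group
# ("dual discrepancy").
import numpy as np

def growth_parameters(scores):
    """OLS intercept and slope for equally spaced weekly probes."""
    weeks = np.arange(len(scores))
    slope, intercept = np.polyfit(weeks, scores, 1)
    return intercept, slope

def dual_discrepancy(all_scores, level_z=-1.0, slope_z=-1.0):
    """Flag students whose level and slope both fall well below the group."""
    params = np.array([growth_parameters(s) for s in all_scores])   # shape (n, 2)
    z = (params - params.mean(axis=0)) / params.std(axis=0)
    return (z[:, 0] < level_z) & (z[:, 1] < slope_z)

# Example: 4 students, 8 weekly probes each (e.g., words read correctly per minute).
probes = [
    [20, 24, 27, 31, 35, 38, 42, 45],   # typical responder
    [18, 22, 26, 29, 33, 37, 40, 44],
    [15, 16, 16, 17, 17, 18, 18, 19],   # low level and nearly flat slope
    [22, 25, 29, 33, 36, 40, 43, 47],
]
print(dual_discrepancy(probes))          # only the third student is flagged
```

In practice the comparison distribution would come from classroom, school, or district norms rather than a handful of students.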

Such models have been proposed in several recent consensus reports that address LD identification (Bradley, Danielson, & Hallahan, 2002; President's Commission on Excellence in Special Education, 2002), most notably in a recent report of the National Research Council (Donovan & Cross, 2002). These reports suggest that one criterion for LD identification is when a student does not respond to high-quality instruction and intervention. The implementation of this approach requires frequent monitoring of progress as the student receives the intervention (Fuchs & Fuchs, 1998). Speece and Case (2001) found that serial assessments of growth and level of performance in reading fluency predicted reading problems in at-risk children better than a single assessment of fluency. This approach is anchored in a system known as curriculum-based measurement, where the assessments themselves have adequate reliability and are constantly improving (Fuchs & Fuchs, 1999; Shinn, 1998).

Reliability

Are RTI approaches that involve multiple assessments over time psychometrically more reliable than traditional approaches to LD identification? An approach based on multiple measures over time has the potential to reduce the difficulties encountered with reliance on a single assessment at a single time point. Certainly the reliability of the multiple assessment approach is greater than if a single assessment is used to form a discrepancy, because typically the discrepancy will be a poorer (i.e., less reliable) measure of the true difference than the observed measures are of their respective underlying constructs. Focusing on successive measurements over time has the effect of moving the identification process from "ability-ability" comparisons (two different abilities compared at one point in time) to "ability change" models (same ability over time). Such approaches have the potential to ameliorate the difficulties associated with ability-ability discrepancies, whether univariate or bivariate, because they involve the use of more than two assessment time points. Generally, the more information that is brought to bear on any eligibility or diagnostic decision, the more reliable the decision, although it is certainly possible to create counterexamples by combining information from irrelevant or confounding sources. Such irrelevancies are not likely to be introduced by assessing the same skill over time, as in a model that incorporates RTI, when that skill was previously deemed relevant to assess at a single time point.

Conceptually, the study of change is made more feasible by the collection of multiple assessments because the precision with which change can be measured improves as the number of time points increases (Rogosa, 1995). When more than two assessment time points are collected, the reliability of estimated change can also be estimated directly from the data, and the imprecision inherent in individual estimates can be used to provide improved estimates of growth parameters for individual students as well as for groups of students. If change is not linear, the use of four or more time points can map the form of growth. And for those who favor status models over change or learning models, it remains possible to use the intercept term in the individual growth model as an estimate of status. This intercept provides a more precise estimate of true status at any single point in time than would any single assessment.
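
The precision gain can be seen directly in the sampling variance of an ordinary least-squares slope, Var(b) = σ²/Σ(t − t̄)². The short sketch below assumes equally spaced occasions and a constant residual standard deviation; the specific values are illustrative.

```python
# Sketch: the standard error of an individual's estimated growth rate shrinks
# quickly as assessment occasions are added, for a fixed residual error.
import numpy as np

def slope_se(n_occasions, residual_sd=1.0):
    t = np.arange(n_occasions)
    return residual_sd / np.sqrt(np.sum((t - t.mean()) ** 2))

for n in (2, 4, 8, 16):
    print(f"{n:2d} occasions: SE of slope = {slope_se(n):.2f}")
# With 2 occasions the slope is as noisy as a simple difference score; with 8 or
# 16 occasions the same residual error yields a much smaller standard error.
```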

These approaches are not without difficulty. The introduction of serial assessments has not eliminated the necessity of indirect estimation of the parameters of interest. In the discrepancy model, D is used to estimate Δ. A model incorporating RTI uses a complex function of the observed data for individual i, as well as the data from many other individuals, to estimate each of the π_ij, the true learning parameters for individual i. Different approaches to this estimation problem have varying strengths and weaknesses, but all will make assumptions about the arithmetic form of the model, the distribution of the learning parameters, and the distributions of the errors. The ramifications of these assumptions for inferences about individual learning parameters must be studied in the LD context.

Models based on RTI also involve imperfect measures that include measurement error (Fletcher et al., 2003). However, this problem is reduced because of the use of multiple assessments and the borrowing of precision from the entire collection of data to provide a more precise estimate of the growth parameters of each individual. Thus, it becomes possible to estimate a child's "true" status more precisely as well as to estimate the rate of skill acquisition and to use these estimates as indicators of LD. In addition, this approach to estimation makes assumptions about the distribution of errors of measurement. In some cases, errors might be assumed to be uncorrelated. Again, this assumption must be examined in terms of its importance to inferences about individual status and rates of learning. In many cases, the inclusion of multiple assessment time points will allow this assumption to be relaxed, and the correlation among errors of measurement can be estimated and taken into account in forming inferences about individual status and rates of learning.
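
The "borrowing of precision" described here is what multilevel growth models accomplish through empirical Bayes (shrinkage) estimation. A minimal sketch of the idea follows; the function name and the variance components are illustrative assumptions (in practice the components are estimated from the full data set), not part of the authors' procedure.

```python
# Sketch: an individual's noisy OLS slope is pulled toward the group mean slope,
# with more pulling when the individual estimate is imprecise (few or noisy probes).
def empirical_bayes_slope(ols_slope, ols_se, group_mean_slope, between_sd):
    """Precision-weighted compromise between individual and group estimates."""
    tau2, v = between_sd ** 2, ols_se ** 2
    weight = tau2 / (tau2 + v)          # reliability of the individual estimate
    return weight * ols_slope + (1 - weight) * group_mean_slope

# Estimated from few probes (large SE), the slope is shrunk strongly toward the group mean:
print(empirical_bayes_slope(0.2, ols_se=1.0, group_mean_slope=1.5, between_sd=0.5))   # ~1.24
# Estimated precisely (small SE), it stays close to the individual value:
print(empirical_bayes_slope(0.2, ols_se=0.1, group_mean_slope=1.5, between_sd=0.5))   # ~0.25
```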

There still could be a need to identify individual children as LD based on cut points unless the entire process devolves to clinical judgment. Models that include RTI do not solve the issue of the dimensional versus categorical nature of LD. Determining cut points and benchmarks, for example, will continue to be an arbitrary process until cut points are linked to functional outcomes (Cizek, 2001), an issue never really addressed in LD identification for any identification model. However, models that include RTI have the promise of incorporating functional outcomes because they are tied to intervention response.


Validity

The introduction of serial assessments has an advantage beyond any statistical advantage it may confer for the estimation of an individual's true status. Specifically, the introduction of serial assessments brings learning and the measurement of change to the forefront in conceptualizations of LD. The collection of serial assessments under specified conditions of effective instruction simultaneously focuses the definition of LD on a failure to learn, where learning can be measured more directly. Moreover, the specific instructional elements and the conditions under which they are implemented can be described, thereby providing a clearer basis for the expectation of learning and the unexpectedness of any failure to learn. Finally, focusing on multiple assessments in an RTI model has the advantage of clearly tying the identification process to the most important component of the construct of LD, which is unexpected underachievement. Models that incorporate RTI may identify a unique group of children that can be clearly differentiated from other low achievers in terms of cognitive correlates, prognosis, and even neurobiological factors.

Studies of children defined using different methods as responders and nonresponders clearly show significant differences in cognitive skills. For example, Stage et al. (2003), Vellutino et al. (2003), and Vaughn, Linan-Thompson, and Hickman-Davis (2003) found that nonresponders to early intervention differed from responders in both preintervention achievement scores and preintervention cognitive tasks. Nonresponders typically had more severe deficits in both reading-related factors (e.g., phonemic awareness, fluency) and reading skills. In recent imaging studies involving both early intervention and remediation of older students (see Fletcher, Simos, Papanicolaou, & Denton, 2004), we likewise found that individuals who were nonresponders showed more severe reading difficulties prior to intervention. More dramatic were the differences in neuroimaging correlates between those who responded to intervention and those who did not. We have found that nonresponders persist with a brain activation pattern that generally demonstrates a failure to activate left hemisphere areas known to be involved in the development of reading skills. In fact, those who were nonresponders showed predominant right-hemisphere activity much like that observed in children and adults with identified reading disabilities (Fletcher et al., 2004).

Implications for Clinical Assessments

This review of classification models may seem removed from the question of how to conduct clinical assessments of children suspected of LD. In fact, when a psychologist conducts any assessment for LD, the selection of tests reflects the underlying classification model and the constructs it specifies. If the psychologist or educator adopts an aptitude-achievement discrepancy model, the primary tools will be the tests used to operationalize aptitude (e.g., IQ) and achievement. If the clinician adopts a low achievement model, aptitude will not be measured—just achievement. An intra-individual differences model will require neuropsychological or cognitive processing measures. If a model is used that incorporates RTI, assessments of the integrity of the implementation of the intervention and progress monitoring assessments are necessary.

In evaluating models, we found little evidence that supports the aptitude-achievement and intra-individual difference models. Both involve the assessment of cognitive processes that do not contribute to the identification of a unique group of underachievers with LD, and both have serious reliability problems. The low achievement model has more reliability and validity but does not identify a unique group of underachievers. RTI criteria may permit identification of a unique group of underachievers but by themselves are not sufficient for identification of LD. Combining the strengths of the low achievement and RTI models leads to a hybrid model that invokes concepts of low achievement and RTI. This model can be expanded to incorporate assessment of contextual factors and other disorders that should be evaluated because of the need for differential treatment (Fletcher, Foorman, et al., 2002).

Learning disorders attributable to mental retardation, sensory problems (blindness, deafness), language status (e.g., English as a second language), or transient factors (adjustment difficulties, disruption of the home or school environment) should not be identified as LD. We have not included economic disadvantage, comorbid emotional and behavior disorders, or established neurological disorders as exclusionary criteria and would stipulate that the only way to exclude LD in children with these associated conditions is to provide an intervention that is appropriate and evaluate RTI. A classification of LD may exclude children with emotional or neurological disorders, or those who are economically disadvantaged, from the LD category because of policy or resource issues—all are eligible for special education—but children with these associated conditions have forms of underachievement that are difficult to distinguish from those in children with LD. In the end, LD should be identified only after adequate opportunity to learn has been systematically evaluated. Those who do not respond to intervention need more specialized, individualized, and intensive treatments, as well as the probable conferment of disability status and the civil rights protections that come with identification. It is the intractability as indicated by an inadequate response to quality instruction that must be present to identify a child as LD. If a child responds, LD is not indicated. Any child who has achievement difficulties should receive intervention, whether it is tutoring with support by a college student or intensive intervention by an experienced, well-trained academic therapist.

This model is quite different from the one that child clinical and other psychologists have utilized for the past few decades, and some may respond by suggesting that this model can only be implemented in schools. In fact, we argue that in the absence of an evaluation of RTI, LD should not be identified in any setting—school, clinic, hospital, and so on. We conceptualize traditional clinical evaluations as an opportunity to identify children as "at risk" for LD and to intervene with any child who is struggling to achieve. In schools, screening for reading problems can be done on a large-scale basis in kindergarten and Grade 1, as advocated in Donovan and Cross (2002) and implemented in states such as Texas (Fletcher, Foorman, et al., 2002). Those who are identified as at risk have their progress monitored and receive increasingly intense, multitiered interventions that may eventuate in identification for special education in LD (Vaughn & Fuchs, 2003). In a multitiered intervention approach, children are screened for risk characteristics, such as weaknesses in letter-sound knowledge and phonological awareness in kindergarten and word reading in Grade 1, with immediate monitoring of progress (Torgesen, 2002). Depending on the rate of progress, interventions are intensified and modified in an effort to accelerate the rate of development of an academic skill. Children are not identified with disabilities until the final tier of the process.

Evaluations outside of schools should utilize a similar approach based on measurement of the three components of the hybrid model proposed by the consensus group in Bradley et al. (2002): (a) low achievement, (b) RTI, and (c) consideration of contextual factors and exclusions. Any psychological evaluation of a child or adolescent should consider the relevant achievement constructs (see following section) that represent the different types of LD. If there is evidence of low achievement, the focus should not be on extensive assessments of processing skills but on referral to an appropriate source for intervention. The psychologist should expect to have a working relationship with the intervention source so that RTI will be measured. This means that clinical child psychologists must be knowledgeable about educational interventions and prepared to develop a treatment plan that incorporates this form of intervention, just as they may be prepared to work with a physician around medication for problems with attention or anxiety. It is perfectly reasonable to ask the child to return every 4 to 6 months to repeat achievement tests and independently evaluate progress in conjunction with more frequent assessments of progress obtained by the intervention source.

The psychologist should also evaluate for other problems that may be associated with low achievement to adequately plan treatment. If mental retardation is suspected, IQ, adaptive behavior, and related assessments consistent with this classification can be administered. But note that if the child or adolescent has achievement scores in reading comprehension or math that are within two standard deviations of the mean (consistent with traditional legal definitions of mental retardation), or development of adaptive behavior obviously inconsistent with mental retardation, assessment of IQ is not necessary, as such levels of performance preclude mental retardation. Some children may have oral language disorders requiring speech and language intervention, which will require referral and additional evaluation. Screening with vocabulary measures and through interacting with the child will help identify these children; the vocabulary screen will also help identify children who may benefit from additional intellectual screening. Many children with achievement difficulties or LD also have comorbid difficulties with attention and both internalizing and externalizing psychopathology. These disorders need to be assessed and treated, as simply referring a child for educational intervention without addressing comorbidities will surely increase the probability of a poor RTI. We believe that no clinical evaluation of a child should be conducted without documentation of achievement levels through direct assessment or a school report of such an assessment. If achievement deficits are apparent, intervention of some sort should be provided. It is not likely that treating a child for a comorbid disorder, such as ADHD, will result in improved levels of achievement in the absence of educational intervention.

Altogether, we are suggesting that, from the perspective of LD, psychologists should perform assessments for emotional and behavioral disorders consistent with other articles in this special section. For LD, they need to administer achievement tests and evaluate RTI, regardless of subdiscipline (e.g., school psychologist, child clinical psychologist, neuropsychologist) or setting. To evaluate achievement, individualized norm-referenced assessments should be conducted. RTI requires assessments of intervention integrity and monitoring of progress.

Evaluating Achievement

Identifying specific achievement tests is not difficult, although tests for some domains are better developed than others. Lyon et al. (2003) suggested that LD represents six major achievement types, including (a) word recognition; (b) reading fluency; (c) reading comprehension; (d) mathematics computations; (e) reading and math, which is not really a comorbid association but a more severe reading problem with distinct math difficulties; and (f) written expression, which could involve spelling, handwriting, or text generation.


These patterns were drawn from the research literature (e.g., Rourke & Finlayson, 1978; Siegel & Ryan, 1988; Stothard & Hulme, 1996), but an extensive discussion of the evidence for these types is beyond the scope of this article (see Lyon et al., 2003). The assessment implications are straightforward. Many children and adolescents will have difficulties in more than one domain, so a thorough assessment of academic achievement is very important.

A set of achievement tests should be used. It is helpful to use tests from the same battery because the normative group is the same, which facilitates comparisons across tests. However, the battery from which these tests are chosen is less important than the constructs that are measured. Any single battery has strengths and weaknesses that can be supplemented with measures from other assessments. Given the suggestion that six types of LD may exist, the important constructs are word recognition, reading fluency, reading comprehension, math computations, and written expression. We usually assess spelling as a screen for written expression and handwriting difficulties, and math and writing fluency as supplemental assessments.

Table 1 outlines these constructs and how they can be assessed with the commonly used Woodcock-Johnson Achievement Battery-III (WJ; Woodcock, McGrew, & Mather, 2001) or the Wechsler Individual Achievement Test-II (WIAT; Wechsler, 2001). We use the WJ and WIAT because they meet established criteria for reliability (internal consistency and test-retest) and validity (construct and concurrent) and were developed to account for variations in ethnicity and socioeconomic status. In particular, the normative sampling took into account this type of variation, and analyses (differential item functioning) were conducted to identify items that were not comparable across these sources of normative variation. There are also other norm-referenced assessments that can be used to supplement the WJ or WIAT, which we discuss later. Few of these supplemental measures have been developed with the care of the WJ or the WIAT, particularly with regard to the adequacy of the normative base and attempts to address different forms of normative variation.

Table 1 should not be taken to indicate that there are 11 different types of LD, one for each test. To reiterate, many children have problems in multiple domains. The pattern of academic strengths and weaknesses is an important consideration (Fletcher, Foorman, et al., 2002; Rourke, 1975). Table 1 identifies constructs and core tests that would be administered to every child and supplemental tests that would be used if there were concerns about a particular academic domain. If the referral indicated concerns about a particular area, additional tests from other measures would be used. Most children with significant academic problems where LD may eventually be a concern have difficulty with word recognition and consequently tend to have problems across domains of reading. Going beyond the core tests is usually not necessary if the child has problems with word recognition. Isolated problems with reading comprehension and written expression occur infrequently. If the problem is specifically math, using assessments in addition to the WJ or WIAT is helpful in ensuring that the deficiency is not just a matter of attention difficulties.

An advantage of the WJ and WIAT is the assessment of word recognition for both real words and pseudowords, the latter permitting an assessment of the child's ability to apply phonics rules to sound out words. Most achievement batteries assess recognition of real words, which is the essential component. These measures tend to be highly intercorrelated across different assessment batteries, including the Wide Range Achievement Test-III (Wilkinson, 1993) and the Accuracy measure from the Gray Oral Reading Test-Fourth Edition (Wiederholt & Bryant, 2001).

The WJ also has a silent reading speed subtest that, in our assessments, is highly correlated with other fluency measures despite the fact that it is not simply oral reading speed, requiring the child to answer some questions while reading a series of passages for 3 min.

Table 1. Achievement Constructs in Relation to Subtests From the WJ and the WIAT

Construct                  WJ Subtest                          WIAT Subtest

Core Tests
  Word Recognition         Word Identification; Word Attack    Word Reading; Pseudoword Decoding
  Reading Fluency          Reading Fluency
  Reading Comprehension    Passage Comprehension               Reading Comprehension (a)
  Math Computations        Calculation                         Numerical Operations
  Written Expression       Spelling                            Spelling

Supplemental Tests
  Math Fluency             Math Fluency
  Writing Fluency          Writing Fluency
  Math Concepts            Quantitative Concepts
  Written Expression       Writing Samples                     Written Expression (a)

(a) Also assesses fluency.


The WIAT permits assessment of reading speed during silent-reading comprehension. Both assessments are easily supplemented with the Test of Word Reading Efficiency (Torgesen, Wagner, & Rashotte, 1999), which involves oral reading of real words and pseudowords on a list. The Test of Reading Fluency (Deno & Marston, 2001) is an option that requires text reading. Both measures are quick and efficient, and the former was designed with item analyses addressing differential item responses across ethnic groups. Whenever text is read out loud, fluency can be assessed as words read correctly per minute. The Gray Oral Reading Test-Fourth Edition includes a score for fluency of oral text reading.

Reading comprehension can only be screened with the WJ Passage Comprehension subtest, which is a cloze-based assessment in which the child reads a sentence or passage and fills in a blank with a missing word. The Reading Vocabulary subtest is used to create a reading comprehension composite, but it places such a premium on decoding that we usually do not administer it. The WIAT also does not demand much reading of text. Some children who struggle to comprehend text in the classroom do not have difficulties on these subtests because the level of complexity rarely parallels what children are expected to read on an everyday basis. Supplemental assessment using the Group Reading Assessment and Diagnostic Education (Williams, Cassidy, & Samuels, 2001), the Gray Oral Reading Test-Fourth Edition, or even one of the well-constructed reading comprehension assessments from the group-based Stanford Achievement Test-10th Edition (Harcourt Assessment, 2002), Iowa Tests of Basic Skills (Hoover, Hieronymous, Frisbie, & Dunbar, 2001), or a similar instrument is essential. Often children have had these assessments in school, and it is useful to review results as part of the overall evaluation.

Reading comprehension is a difficult construct to assess (Francis, Fletcher, Catts, & Tomblin, in press). In evaluating comprehension skills, the assessor should attend to the nature of the material the child is asked to read and the response format. Reading comprehension tests vary in what the child reads (sentences, paragraphs, pages), the response format (cloze, open-ended questions, multiple choice, think aloud), memory demands (answering questions with and without the text available), and how deeper aspects of meaning are evaluated (understanding of the essential meaning vs. literal understanding, vocabulary knowledge and elaboration, ability to infer or predict). It may be difficult to determine the source of the child's difficulties based on a single measure. Thus, if the issue is comprehension and the source is not in the child's word recognition or fluency skills, multiple measures that assess reading comprehension in different ways are needed.

For math, the Calculation subtest of the WJ and the Numerical Operations subtest of the WIAT are paper-and-pencil tests of math computations (Table 1). Low scores on this type of task predict variation in cognitive skills depending on other academic strengths and weaknesses (Rourke, 1993). However, low scores could reflect problems with fact retrieval and verbal working memory if word recognition is comparably low, as opposed to problems with procedural knowledge if word recognition is significantly higher and not deficient. Deficient scores can also reflect problems paying attention, especially in children with ADHD. The math computation subtest from the Wide Range Achievement Test-III is also frequently used and is useful because it is timed and the problems are less organized. The key is the paper-and-pencil assessment of math computations, which is how difficulties in math are typically manifested in children who do not have reading problems. As in reading, assessments of fluency are helpful, although there is no evidence suggestive of a math fluency disorder. In Table 1, the WJ Math Fluency subtest is identified as a supplemental measure, representing a timed assessment of single-digit arithmetic facts that may be helpful for identifying children who lack speed in basic arithmetic skills. Such difficulties make it difficult to master more advanced aspects of mathematics. If an assessment of math concepts is needed, which we would do only if math was an overriding concern, the Quantitative Concepts subtest of the WJ is more useful than the WJ Applied Problems or WIAT Math Reasoning subtests, which introduce word problems that are difficult for children with reading difficulties.

Written expression is the most difficult construct to assess, partly because it is not clear what constitutes a disorder of written expression: spelling, handwriting, or text generation (Lyon et al., 2003). Obviously, problems with the first two components will constrain text generation. Spelling should be assessed, as it may represent the primary source of difficulty with written expression for children, especially if they also have word-recognition difficulties. The analysis of spelling errors (Rourke, Fisk, & Strang, 1986) can be informative in understanding whether the problem is with the phonological component of language or with the visual form of letters (i.e., orthography). Spelling also permits an informal assessment of handwriting.

Table 1 identifies WJ and WIAT measures of written expression. The utility of these measures is not well established, and they do not really require significant generation of text in terms of constructing and writing passages and stories. As with reading comprehension, it may be important to supplement or even replace this assessment with a test such as the Thematic Maturity subtest of the Test of Written Language (Hammill & Larsen, 1998). Measuring fluency with a measure such as the WJ Writing Fluency subtest may also be useful.


As in reading and math, fluency of writing predicts the quality of composition.

From this type of assessment, characteristic patterns emerge that will demarcate the classification and indicate a need for specific kinds of intervention. For each of the six types of LD, there are interventions with evidence of efficacy that should be utilized in or out of a school setting (Lyon et al., in press). The goal is not to diagnose LD, which is not feasible in a one-shot evaluation for the psychometric and conceptual reasons outlined previously, but to identify achievement difficulties that can be addressed through intervention. If the assessor is knowledgeable about these patterns, very specific intervention recommendations, as well as the need for other assessments, can be made.

Table 2 summarizes achievement patterns that are well established in research (Fletcher, Foorman, et al., 2002; Lyon et al., 2003). Intervention should be considered for any child who performs below the 25th percentile on a well-established assessment, with an understanding that these are not firm cut points and should be evaluated across all the measures. We are not indicating that 25% of all children have LD, only that scores below the 25th percentile are commonly associated with low performance in school, assuming the cut point is reliably attained. In examining Table 2, the decision rules should not be rigidly applied; they are simply guidelines to assist clinicians. Identifying LD is always based on factors beyond just the test scores. The decision process should focus on what is needed for intervention, which requires an assessment of contextual variables and the presence of comorbid disorders that influence decisions about what sort of plan will be most effective for an individual child. Low achievement is related to many contextual variables, which is why the flexibility in special education guidelines allows interdisciplinary teams to base decisions on factors that go beyond test scores. The purpose of assessment is ultimately to develop an intervention plan.
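As a check on this cut point: if standard scores are assumed to be normally distributed with a mean of 100 and a standard deviation of 15, the 25th percentile corresponds to z ≈ −0.67, and 100 + 15 × (−0.67) ≈ 90, which is the standard score used as the impairment threshold in the Note to Table 2.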

Evaluating Response to Instruction

Once a child is screened or tested for achievement deficits, progress should be monitored if a problem is identified. It is astonishing that U.S. special education guidelines do not require at least yearly readministration of the achievement tests that were used to justify the placement as one method of assessing the efficacy of the intervention plan. If a child is responding to intervention, his or her rate of development should be accelerated relative to the normative population (i.e., the achievement gap is closed). As part of this assessment of RTI, progress should be monitored on a frequent basis if the problem is with word recognition or fluency, math computations, or spelling. Reading comprehension and higher forms of written expression will show less rapid change and progress, as monitoring tools for these types of problems have not been adequately developed.
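To make the gap-closing check concrete, here is a minimal Python sketch (our illustration, not a procedure from the article); the yearly scores are hypothetical. Because standard scores are already age-normed, a positive change across readministrations indicates growth faster than the normative population.

```python
# Minimal sketch (not the authors' procedure): checking whether the
# achievement gap is closing across yearly readministrations of a
# norm-referenced test. Standard scores are age-normed (mean 100, SD 15),
# so a positive change means growth faster than the normative population.
# The scores below are hypothetical.

scores_by_year = {2003: 82, 2004: 86, 2005: 91}  # word-recognition standard scores

years = sorted(scores_by_year)
gain_per_year = (scores_by_year[years[-1]] - scores_by_year[years[0]]) / (years[-1] - years[0])

print(f"Average gain: {gain_per_year:.1f} standard-score points per year")
if gain_per_year > 0:
    print("Gaining on age norms; the achievement gap is closing.")
else:
    print("No acceleration relative to age norms; revisit the intervention plan.")
```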

Most of the tests mentioned here have alternative forms, but some have been developed to permit even more frequent assessment and are referred to as "curriculum-based assessments" (Fuchs & Fuchs, 1999).

Table 2. Eight Achievement Patterns Associated With Intervention

1. Decoding and Spelling < 90; Arithmetic one half standard deviation higher than word recognition and spelling and at least 90. This is a pattern characterized by problems with single-word decoding skills and better arithmetic ability. Reading comprehension will vary depending on how it is assessed but is usually impaired. Children with this pattern have significant phonological language problems and strengths in spatial and motor skills (Rourke & Finlayson, 1978).

2. Arithmetic < 90; Decoding and Spelling > 90 and at least 7 points higher. Children with difficulties that only involve math show this pattern, which is associated with problems with motor and spatial skills, problem-solving deficiencies, and disorganization (Rourke & Finlayson, 1978). It usually represents problems with math procedures as opposed to math facts (Lyon et al., 2003).

3. Decoding, Comprehension, Spelling, and Arithmetic < 90. This pattern represents a problem with word recognition characterized by language and working memory problems more severe than in children with poor decoding and better development of math skills (Rourke & Finlayson, 1978). The math problem involves learning and retrieving math facts (Lyon et al., 2003).

4. Spelling and Arithmetic < 90; Decoding > 90 and 7 points higher. Essentially the same pattern as Number 3 except the motor (and writing) component is more severe.

5. Reading Comprehension < 90 and 7 points below Decoding. This pattern often reflects a long-term oral language disorder. Problems with receptive language, short-term memory, and attention are apparent, with strengths in phonological language skills (Stothard & Hulme, 1996).

6. Decoding skills 7 points lower than Comprehension skills and < 90. This pattern reflects a phonological language problem with usually better than average semantic language and spatial skills (Stothard & Hulme, 1996). The pattern is not apparent for reading comprehension measures that are timed or require significant amounts of text reading.

7. Reading Fluency < 90 and below Decoding by one half standard deviation. This reflects a problem where accuracy of word reading is less of a problem than automaticity of word reading (Lyon et al., 2003).

8. Spelling < 90. This pattern reflects (a) motor deficits in a young child or (b) residuals of earlier phonological language problems that have been remediated or compensated in older children and adults. The pattern is common in adults with a history of word-recognition difficulties. Fluency is often impaired.

Note: The patterns are based on relations of reading decoding, reading fluency, reading comprehension, spelling, and arithmetic. It is assumed that any score below the 25th percentile (standard score = 90) is impaired and that a difference of one half standard deviation is important (±7 standard score points). The patterns should be considered prototypes and the rules loosely applied (adapted from Fletcher, Foorman, Boudousquie, Barnes, Schatschneider, & Francis, 2002). These patterns are not related to IQ scores.
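To illustrate how such prototype rules can be expressed, the following Python sketch encodes two of the Table 2 patterns (1 and 7) using the Note's conventions (standard score below 90 as impaired, 7 points as roughly one half standard deviation). The sketch and its score profile are illustrative only; as the Note states, the rules should be applied loosely, not as rigid cut points.

```python
# Minimal sketch of two of the Table 2 patterns (1 and 7), using the
# table's conventions: standard scores below 90 (the 25th percentile) are
# treated as impaired, and 7 points approximates one half standard
# deviation. The score dictionary and its keys are illustrative only.

IMPAIRED = 90
HALF_SD = 7

def flag_patterns(scores):
    """Return the Table 2 pattern numbers suggested by a score profile."""
    patterns = []
    # Pattern 1: decoding and spelling impaired; arithmetic at least 90 and
    # at least half a standard deviation above word recognition and spelling.
    if (scores["decoding"] < IMPAIRED and scores["spelling"] < IMPAIRED
            and scores["arithmetic"] >= IMPAIRED
            and scores["arithmetic"] >= max(scores["decoding"], scores["spelling"]) + HALF_SD):
        patterns.append(1)
    # Pattern 7: reading fluency impaired and at least half a standard
    # deviation below decoding (automaticity weaker than accuracy).
    if (scores["reading_fluency"] < IMPAIRED
            and scores["reading_fluency"] <= scores["decoding"] - HALF_SD):
        patterns.append(7)
    return patterns

profile = {"decoding": 84, "spelling": 82, "arithmetic": 96, "reading_fluency": 76}
print(flag_patterns(profile))  # [1, 7] for this illustrative profile
```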


Such measures are often used by the intervener (e.g., the teacher) to document how well a child is responding to instruction. Typically a child would read a short reading passage appropriate for grade level (or do a set of math computations) for 2 to 3 min. The number of words correctly read (or math problems correctly computed) would be graphed over time and compared against grade-level benchmarks, representing a criterion-referenced form of assessment. A child may be screened with these measures, and those performing below the benchmark may be candidates for intervention, especially in schools.
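The following Python sketch illustrates this kind of criterion-referenced monitoring; the probe values, benchmark, and growth criterion are hypothetical placeholders rather than published norms.

```python
# Minimal sketch of curriculum-based progress monitoring: weekly words-
# correct-per-minute (WCPM) probes summarized as a least-squares slope and
# compared against a growth criterion and a grade-level benchmark. The
# probe values, benchmark, and growth criterion are hypothetical.

def slope_per_week(wcpm):
    """Ordinary least-squares slope of WCPM across equally spaced weekly probes."""
    weeks = range(len(wcpm))
    n = len(wcpm)
    mean_week = sum(weeks) / n
    mean_wcpm = sum(wcpm) / n
    num = sum((w - mean_week) * (y - mean_wcpm) for w, y in zip(weeks, wcpm))
    den = sum((w - mean_week) ** 2 for w in weeks)
    return num / den

probes = [38, 41, 40, 44, 47, 49, 52, 55]   # eight weekly oral reading probes
BENCHMARK = 60          # assumed end-of-period grade-level benchmark (WCPM)
GROWTH_CRITERION = 1.5  # assumed minimum gain in WCPM per week

growth = slope_per_week(probes)
print(f"Growth: {growth:.2f} WCPM per week; latest probe: {probes[-1]} (benchmark {BENCHMARK})")
if growth >= GROWTH_CRITERION:
    print("Rate of progress meets the assumed growth criterion; continue monitoring.")
else:
    print("Inadequate rate of progress; intensify or modify the intervention.")
```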

Such assessments should also be accompanied by observations of the integrity of the implementation of the intervention, including the amount of time spent on supplemental instruction, especially if the child does not appear to be making progress. School psychologists are often well prepared in this area of assessment. Although a psychologist operating outside of a school may not be in a position to do curriculum-based assessments or to personally evaluate the intervention, such assessments should be expected, especially if the referral is to a private academic therapist.

A variety of methods have been developed. The assessments with the most widespread use are the Monitoring Basic Skills Progress measures (Fuchs, Hamlett, & Fuchs, 1990), which assess reading, math, and spelling fluency, and the Dynamic Indicators of Basic Early Literacy Skills (Good, Simmons, & Kame'enui, 2001), a battery of different reading fluency measures. Some of these tools are focused primarily on the lower grades, but the norm-referenced assessments of fluency identified previously, especially if they have alternative forms, can be used with older students. These measures meet accepted psychometric criteria for reliability and validity. The curriculum-based assessment measures have not been assessed as formally for differential item functioning but have been widely employed with school populations that are quite diverse (Fuchs & Fuchs, 1999; Shinn, 1998).

Conclusions

Based on our evaluation of models, we propose a hybrid model that incorporates features of the low achievement and RTI models for the identification of children as LD. We specifically do not find evidence to support extensive assessments of cognitive, neuropsychological, or intellectual skills to identify children as LD. Although some may view this model as only for schools, we reject the idea that the routine evaluations done in the past by psychologists and educators outside of school settings are useful for LD. We find little value in the idea of evaluating a child in a single assessment and concluding that the child has LD based on an IQ-achievement discrepancy, low achievement, or profiles on neuropsychological tests, largely because such assessments are not directly related to treatment and the diagnosis itself is not reliable. As soon as it is apparent that the child has an achievement problem, a referral for intervention should be made, and the resources that might be spent on diagnosis should be spent on intervention. Children should not be diagnosed with LD until a proper attempt at instruction has been made. Assessment of achievement skills should be a routine part of any psychological evaluation of a child and cannot be seen as the province of just the schools. Serial monitoring of RTI and the integrity of instruction should be completed before children are identified as LD. There are issues involved in the intervention component, estimation of slope and intercept effects, and decisions that have to be made about cut points that will differentiate responders and nonresponders (Gresham, 2002). For these reasons alone, RTI cannot be the sole criterion for identification, and flexibility in decision making is required. At the same time, there appears to be considerable validity to this approach, implying that it is indeed possible to reliably identify nonresponders as a group with unexpected underachievement.

In addition to the evidence for validity (and the greater reliability of the underlying psychometric model), the model does not require the use of exclusionary criteria (especially emotional disturbance and economic disadvantage) to operationalize unexpected underachievement, thus capturing the construct of LD. This is an important consideration given the lack of evidence validating classifications that utilize these particular exclusions (Kavale, 1988; Lyon et al., 2001). The model does operationalize the concept of opportunity to learn, which is rarely directly assessed as part of LD identification. It is also a model that can only be implemented in an instructional setting, such as a school, or in clinical settings outside of public schools where remediation is utilized, such as an academic therapy setting. But it is not consistent with the traditional approach to LD identification based on a single administration of a test battery and consideration of a diagnosis, which we believe is an outmoded model that detracts from intervention. In the absence of an attempt to systematically instruct the child, LD cannot be "diagnosed," obviating the traditional "test and treat" model; identifying LD must be the end product of an attempt to instruct the child (i.e., "treat and test"). This is not a post hoc approach but rather an argument that, in the absence of the opportunity-to-learn exclusion, the concept of LD has no basis in evidence, and low achievement per se is not adequate evidence for LD. Such an approach ties the concept of LD to treatment, which is important. A single assessment may indicate "risk" or even an achievement disorder, but it cannot indicate a "disability" in the absence of functional criteria that would include opportunity to learn.


A final comment involves what some will see as equating LD with measurable deficits on achievement tests. Some will argue that the mere presence of a deficit on a measure of processing skills means that the person should be identified with LD, in part because of the belief that such deficits indicate a brain anomaly. The most common example is the linking of "executive function" deficits with LD. We argue that the concept of LD is empty without a focus on achievement, largely because it becomes more difficult to identify a unique subgroup representing LD that would be different from other classes of childhood disorders. Executive functions, for example, are often linked to ADHD, but classifications of ADHD based on executive function deficits as assessed by cognitive tests do not have much validity (Barkley, 1997). Moreover, executive function deficits characterize many childhood populations.

More fundamentally, consider an overarching classification of childhood learning and behavioral difficulties. For LD, achievement deficits represent markers for the underlying classification. What distinguishes the LD prototype from, for example, a behavioral disorder such as ADHD is the presence of an achievement deficit. If a child with ADHD has an achievement deficit, it usually reflects a comorbid association (Fletcher, Shaywitz, & Shaywitz, 1999). If we expand our classification to mental retardation, the key for differentiating mental retardation from LD (or ADHD) is not just the intelligence test score. Rather, the major difference is in adaptive behavior: mental retardation should reflect a pervasive deficit in adaptive behavior, and LD a relatively narrow deficit (Bradley et al., 2002). So a classification of these three major disorders requires markers for achievement, attention-related behaviors, and adaptive behavior. In the absence of these types of markers, and a focus on classification, all children with problems are simply disordered, and there is no need for assessment because they would all require the same interventions. When LD is tied to levels and patterns of achievement, an evidence base for differential interventions focused on learning in specific academic domains emerges. This is the strongest evidence for the validity of the concept of LD, its classification, and the source of evidence-based approaches to assessment and identification.

References

Aaron, P. G., Kuchta, S., & Grapenthin, C. T. (1988). Is there a thing called dyslexia? Annals of Dyslexia, 38, 33-49.

American Psychiatric Association. (1994). Diagnostic and statistical manual of mental disorders (4th ed.). Washington, DC: Author.

Barkley, R. A. (1997). ADHD and the nature of self-control. New York: Guilford.

Bereiter, C. (1967). Some persisting dilemmas in the measurement of change. In C. W. Harris (Ed.), Problems in the measurement of change. Madison: University of Wisconsin Press.

Bradley, R., Danielson, L., & Hallahan, D. P. (Eds.). (2002). Identification of learning disabilities: Research to practice. Mahwah, NJ: Lawrence Erlbaum Associates, Inc.

Christensen, C. A. (1992). Discrepancy definitions of reading disability: Has the quest led us astray? Reading Research Quarterly, 27, 276-278.

Cisek, G. J. (2001). Conjectures on the rise and call of standard setting: An introduction to context and practice. In G. J. Cisek (Ed.), Setting performance standards: Concepts, methods, and perspectives (pp. 3-18). Mahwah, NJ: Lawrence Erlbaum Associates, Inc.

Deno, S. L., & Marston, D. (2001). Test of Oral Reading Fluency. Minneapolis: Educators Testing Service.

Donovan, M. S., & Cross, C. T. (2002). Minority students in special and gifted education. Washington, DC: National Academy Press.

Fisher, S. E., & DeFries, J. C. (2002). Developmental dyslexia: Genetic dissection of a complex cognitive trait. Nature Reviews Neuroscience, 10, 767-780.

Fletcher, J. M., Foorman, B. R., Boudousquie, A. B., Barnes, M. A., Schatschneider, C., & Francis, D. J. (2002). Assessment of reading and learning disabilities: A research-based, intervention-oriented approach. Journal of School Psychology, 40, 27-63.

Fletcher, J. M., Francis, D. J., Rourke, B. P., Shaywitz, B. A., & Shaywitz, S. E. (1993). Classification of learning disabilities: Relationships with other childhood disorders. In G. R. Lyon, D. Gray, J. Kavanagh, & N. Krasnegor (Eds.), Better understanding learning disabilities (pp. 27-55). New York: Brookes.

Fletcher, J. M., Lyon, G. R., Barnes, M., Stuebing, K. K., Francis, D. J., Olson, R., et al. (2002). Classification of learning disabilities: An evidence-based evaluation. In R. Bradley, L. Danielson, & D. P. Hallahan (Eds.), Identification of learning disabilities: Research to practice (pp. 185-250). Mahwah, NJ: Lawrence Erlbaum Associates, Inc.

Fletcher, J. M., Morris, R. D., & Lyon, G. R. (2003). Classification and definition of learning disabilities: An integrative perspective. In H. L. Swanson, K. R. Harris, & S. Graham (Eds.), Handbook of learning disabilities (pp. 30-56). New York: Guilford.

Fletcher, J. M., Shaywitz, S. E., Shankweiler, D. P., Katz, L., Liberman, I. Y., Stuebing, K. K., et al. (1994). Cognitive profiles of reading disability: Comparisons of discrepancy and low achievement definitions. Journal of Educational Psychology, 85, 1-18.

Fletcher, J. M., Shaywitz, S. E., & Shaywitz, B. A. (1999). Comorbidity of learning and attention disorders: Separate but equal. Pediatric Clinics of North America, 46, 885-897.

Fletcher, J. M., Simos, P. G., Papanicolaou, A. C., & Denton, C. (2004). Neuroimaging in reading research. In N. K. Duke & M. H. Mallette (Eds.), Literacy research methods (pp. 252-286). New York: Guilford.

Francis, D. J., Fletcher, J. M., Catts, H., & Tomblin, B. (in press). Dimensions affecting the assessment of reading comprehension. In S. G. Paris & S. A. Stahl (Eds.), Current issues in reading comprehension and assessment. Mahwah, NJ: Lawrence Erlbaum Associates, Inc.

Francis, D. J., Fletcher, J. M., Stuebing, K. K., Lyon, G. R., Shaywitz, B. A., & Shaywitz, S. E. (2005). Psychometric approaches to the identification of learning disabilities: IQ and achievement scores are not sufficient. Journal of Learning Disabilities, 38, 98-108.

Francis, D. J., Shaywitz, S. E., Stuebing, K. K., Shaywitz, B. A., & Fletcher, J. M. (1996). Developmental lag versus deficit models of reading disability: A longitudinal, individual growth curves analysis. Journal of Educational Psychology, 88, 3-17.


Fuchs, L., & Fuchs, D. (1998). Treatment validity: A simplifying concept for reconceptualizing the identification of learning disabilities. Learning Disabilities Research and Practice, 4, 204-219.

Fuchs, L., & Fuchs, D. (1999). Monitoring student progress toward the development of reading competence: A review of three forms of classroom-based assessment. The School Psychology Review, 28, 659-671.

Fuchs, L. S., Hamlett, C. L., & Fuchs, D. (1990). Monitoring basic skills progress. Austin, TX: PRO-ED.

Gilger, J. W. (2002). Current issues in the neurology and genetics of learning-related traits and disorders: Introduction to the special issue. Journal of Learning Disabilities, 34, 490-491.

Good, R. H., Simmons, D. C., & Kame'enui, E. J. (2001). The importance and decision-making utility of a continuum of fluency-based indicators of foundational reading skills for third-grade high-stakes outcomes. Scientific Studies of Reading, 5, 257-288.

Gresham, F. M. (2002). Responsiveness to intervention: An alternative approach to the identification of learning disabilities. In R. Bradley, L. Danielson, & D. P. Hallahan (Eds.), Identification of learning disabilities: Research to practice (pp. 467-564). Mahwah, NJ: Lawrence Erlbaum Associates, Inc.

Hammill, D. D., & Larsen, S. C. (1996). Test of Written Language. Austin, TX: PRO-ED.

Harcourt Assessment. (2002). Stanford Achievement Test (10th ed.). New York: Author.

Hoover, A. N., Hieronymous, A. N., Frisbie, D. A., & Dunbar, S. B. (2001). Iowa Tests of Basic Skills. Itasca, IL: Riverside.

Hoskyn, M., & Swanson, H. L. (2000). Cognitive processing of low achievers and children with reading disabilities: A selective meta-analytic review of the published literature. The School Psychology Review, 29, 102-119.

Individuals With Disabilities Education Act, 20 U.S.C. 1400 et seq., 34 C.F.R. 300 (1997).

Jimenez, J. E., del Rosario Ortiz, M., Rodrigo, M., Hernandez-Valle, I., Ramirez, G., Estévez, A., et al. (2003). Do the effects of computer-assisted practice differ for children with reading disabilities with and without IQ-achievement discrepancy? Journal of Learning Disabilities, 36, 34-41.

Jorm, A. F., Share, D. L., Matthews, M., & Matthews, R. (1986). Cognitive factors at school entry predictive of specific reading retardation and general reading backwardness: A research note. Journal of Child Psychology and Psychiatry, 27, 45-54.

Kavale, K. A. (1988). Learning disability and cultural disadvantage: The case for a relationship. Learning Disability Quarterly, 11, 195-210.

Lewis, C., Hitch, G. J., & Walker, P. (1994). The prevalence of specific arithmetic difficulties and specific reading difficulties in 9- to 10-year-old boys and girls. Journal of Child Psychology and Psychiatry, 35, 283-292.

Lyon, G. R., Fletcher, J. M., & Barnes, M. C. (2003). Learning disabilities. In E. J. Mash & R. A. Barkley (Eds.), Child psychopathology (2nd ed., pp. 520-586). New York: Guilford.

Lyon, G. R., Fletcher, J. M., Fuchs, L., & Chhabra, V. (in press). Treatment of learning disabilities. In E. Mash & R. Barkley (Eds.), Treatment of childhood disorders. New York: Guilford.

Lyon, G. R., Fletcher, J. M., Shaywitz, S. E., Shaywitz, B. A., Torgesen, J. K., Wood, F. B., et al. (2001). Rethinking learning disabilities. In C. E. Finn, Jr., R. A. J. Rotherham, & C. R. Hokanson, Jr. (Eds.), Rethinking special education for a new century (pp. 259-287). Washington, DC: Thomas B. Fordham Foundation and Progressive Policy Institute.

Lyon, G. R., & Moats, L. C. (1988). Critical issues in the instruction of the learning disabled. Journal of Consulting and Clinical Psychology, 56, 830-835.

MacMillan, D. L., & Siperstein, G. N. (2002). Learning disabilities as operationally defined by schools. In R. Bradley, L. Danielson, & D. P. Hallahan (Eds.), Identification of learning disabilities: Research to practice (pp. 287-368). Mahwah, NJ: Lawrence Erlbaum Associates, Inc.

Mazzocco, M. M. M., & Myers, G. F. (2003). Complexities in identifying and defining mathematics learning disability in the primary school age years. Annals of Dyslexia, 53, 218-253.

Mercer, C. D., Jordan, L., Allsop, D. H., & Mercer, A. R. (1996). Learning disabilities definitions and criteria used by state education departments. Learning Disability Quarterly, 19, 217-232.

Morris, R., & Fletcher, J. M. (1988). Classification in neuropsychology: A theoretical framework and research paradigm. Journal of Clinical and Experimental Neuropsychology, 10, 640-658.

Morris, R. D., Fletcher, J. M., & Francis, D. J. (1993). Conceptual and psychometric issues in the neuropsychological assessment of children: Measurement of ability discrepancy and change. In I. Rapin & S. Segalovitz (Eds.), Handbook of neuropsychology (Vol. 7, pp. 341-352). Amsterdam: Elsevier.

Morris, R. D., Stuebing, K. K., Fletcher, J. M., Shaywitz, S. E., Lyon, G. R., Shankweiler, D. P., et al. (1998). Subtypes of reading disability: Variability around a phonological core. Journal of Educational Psychology, 90, 347-373.

National Center for Learning Disabilities. (2002). Achieving better outcomes—maintaining rights: An approach to identifying and serving students with specific learning disabilities. New York: Author.

National Reading Panel. (2000). Teaching children to read: An evidence-based assessment of the scientific research literature on reading and its implications for reading instruction. Washington, DC: National Institute of Child Health and Human Development.

Pennington, B. F., Gilger, J. W., Olson, R. K., & DeFries, J. C. (1992). The external validity of age- versus IQ-discrepancy definitions of reading disability: Lessons from a twin study. Journal of Learning Disabilities, 25, 562-573.

President's Commission on Excellence in Special Education. (2002). A new era: Revitalizing special education for children and their families. Washington, DC: Department of Education.

Reschly, D. J., Hosp, J. L., & Smied, C. M. (2003). And miles to go...: State SLD requirements and authoritative recommendations. National Research Center on Learning Disabilities. Retrieved July 22, 2003, from www.nrcld.org

Reschly, D. J., Tilly, W. D., & Grimes, J. P. (1999). Special education in transition: Functional assessment and noncategorical programming. Longmont, CO: Sopris West.

Reynolds, C. (1984-1985). Critical measurement issues in learning disabilities. Journal of Special Education, 18, 451-476.

Rodgers, B. (1983). The identification and prevalence of specific reading retardation. British Journal of Educational Psychology, 53, 369-373.

Rogosa, D. (1995). Myths about longitudinal research (plus supplemental questions). In J. M. Gottman (Ed.), The analysis of change (pp. 3-66). Mahwah, NJ: Lawrence Erlbaum Associates, Inc.

Rourke, B. P. (1975). Brain-behavior relationships in children with learning disabilities: A research program. American Psychologist, 30, 911-920.

Rourke, B. P. (1993). Arithmetic disabilities, specific or otherwise: A neuropsychological perspective. Journal of Learning Disabilities, 26, 214-226.

Rourke, B. P., & Finlayson, M. A. J. (1978). Neuropsychological significance of variations in patterns of academic performance: Verbal and visual-spatial abilities. Journal of Pediatric Psychology, 3, 62-66.

Rourke, B. P., Fisk, J., & Strang, J. (1986). Neuropsychological assessment of children. New York: Guilford.

Shalev, R. S., Auerbach, J., Manor, O., & Gross-Tsur, V. (2000). Developmental dyscalculia: Prevalence and prognosis. European Child and Adolescent Psychiatry, 9, 58-64.


Shaywitz, S. E., Escobar, M. D., Shaywitz, B. A., Fletcher, J. M., & Makuch, R. (1992). Distribution and temporal stability of dyslexia in an epidemiological sample of 414 children followed longitudinally. New England Journal of Medicine, 326, 145-150.

Shaywitz, S. E., Shaywitz, B. A., Fulbright, R. K., Skudlarski, P., Mencl, W. E., Constable, R. T., et al. (2003). Neural systems for compensation and persistence: Young adult outcomes of childhood reading disability. Biological Psychiatry, 54, 25-33.

Shepard, L. (1980). An evaluation of the regression discrepancy method for identifying children with learning disabilities. Journal of Special Education, 14, 79-91.

Shinn, M. R. (1998). Advanced applications of curriculum-based measurement. New York: Guilford.

Siegel, L. S. (1992). An evaluation of the discrepancy definition of dyslexia. Journal of Learning Disabilities, 25, 618-629.

Siegel, L. S., & Ryan, E. (1988). Working memory in subtypes of learning disabled children. Journal of Clinical and Experimental Neuropsychology, 10, 55.

Silva, P. A., McGee, R., & Williams, S. (1985). Some characteristics of 9-year-old boys with general reading backwardness or specific reading retardation. Journal of Child Psychology and Psychiatry, 26, 401-421.

Speece, D. L., & Case, L. P. (2001). Classification in context: An alternative approach to identify early reading disability. Journal of Educational Psychology, 93, 735-749.

Stage, S. A., Abbott, R. D., Jenkins, J. R., & Berninger, V. W. (2003). Predicting response to early reading intervention from verbal IQ, reading-related language abilities, attention ratings, and verbal IQ-word reading discrepancy: Failure to validate discrepancy method. Journal of Learning Disabilities, 36, 24-33.

Stanovich, K. E., & Siegel, L. S. (1994). Phenotypic performance profile of children with reading disabilities: A regression-based test of the phonological-core variable-difference model. Journal of Educational Psychology, 86, 24-53.

Stothard, S. E., & Hulme, C. (1996). A comparison of reading comprehension and decoding difficulties in children. In C. Cornoldi & J. Oakhill (Eds.), Reading comprehension difficulties: Processes and intervention (pp. 93-112). Mahwah, NJ: Lawrence Erlbaum Associates, Inc.

Stuebing, K. K., Fletcher, J. M., LeDoux, J. M., Lyon, G. R., Shaywitz, S. E., & Shaywitz, B. A. (2002). Validity of IQ-discrepancy classifications of reading disabilities: A meta-analysis. American Educational Research Journal, 39, 469-518.

Torgesen, J. K. (2002). Empirical and theoretical support for direct diagnosis of learning disabilities by assessment of intrinsic processing weaknesses. In R. Bradley, L. Danielson, & D. Hallahan (Eds.), Identification of learning disabilities: Research to practice (pp. 565-650). Mahwah, NJ: Lawrence Erlbaum Associates, Inc.

Torgesen, J., Wagner, R., & Rashotte, C. (1999). Test of Word Reading Efficiency. Austin, TX: PRO-ED.

U.S. Office of Education. (1977). Assistance to states for education for handicapped children: Procedures for evaluating specific learning disabilities. Federal Register, 42, 65082-65085.

Vaughn, S., & Fuchs, L. S. (2003). Redefining learning disabilities as inadequate response to instruction. Learning Disabilities Research and Practice, 18, 137-146.

Vaughn, S. R., Linan-Thompson, S., & Hickman-Davis, P. (2003). Response to treatment as a means of identifying students with reading/learning disabilities. Exceptional Children, 69, 391-409.

Vellutino, F. R. (1979). Dyslexia: Theory and research. Cambridge, MA: MIT Press.

Vellutino, F. R., Fletcher, J. M., Scanlon, D. M., & Snowling, M. J. (2004). Specific reading disability (dyslexia): What have we learned in the past four decades? Journal of Child Psychology and Psychiatry, 45, 2-40.

Vellutino, F. R., Scanlon, D. M., & Jaccard, J. (2003). Toward distinguishing between cognitive and experiential deficits as primary sources of difficulty in learning to read: A two-year follow-up of difficult to remediate and readily remediated poor readers. In B. R. Foorman (Ed.), Preventing and remediating reading difficulties: Bringing science to scale (pp. 73-120). Baltimore: York Press.

Wadsworth, S. J., Olson, R. K., Pennington, B. F., & DeFries, J. C. (2000). Differential genetic etiology of reading disability as a function of IQ. Journal of Learning Disabilities, 33, 192-199.

Wechsler, D. (2001). Wechsler Individual Achievement Test (2nd ed.). San Antonio, TX: Psychological Corporation.

Wiederholt, J. L., & Bryant, B. R. (2001). Gray Oral Reading Test (4th ed.). Austin, TX: PRO-ED.

Wilkinson, G. (1993). Wide Range Achievement Test-3. Wilmington, DE: Wide Range.

Williams, K. T., Cassidy, J., & Samuels, S. J. (2001). Group reading assessment and diagnostic education. Circle Pines, MN: American Guidance Services.

Woodcock, R. W., McGrew, K. S., & Mather, N. (2001). Woodcock-Johnson III. Itasca, IL: Riverside.

Received March 4, 2004
Accepted November 10, 2004
