Handbook of Accessible Achievement Tests for All Students: Bridging the Gaps Between Research, Practice, and Policy

Stephen N. Elliott · Ryan J. Kettler · Peter A. Beddow · Alexander Kurz
Editors

Handbook of Accessible Achievement Tests for All Students

Bridging the Gaps Between Research, Practice, and Policy

Editors

Stephen N. Elliott
Learning Sciences Institute
Arizona State University
Tempe, AZ, USA
[email protected]

Ryan J. Kettler
Department of Special Education
Vanderbilt University
Nashville, TN, USA
[email protected]

Peter A. Beddow
Department of Special Education
Vanderbilt University
Nashville, TN, USA
[email protected]

Alexander Kurz
Department of Special Education
Vanderbilt University
Nashville, TN, USA
[email protected]

ISBN 978-1-4419-9355-7        e-ISBN 978-1-4419-9356-4
DOI 10.1007/978-1-4419-9356-4
Springer New York Dordrecht Heidelberg London

Library of Congress Control Number: 2011925233

© Springer Science+Business Media, LLC 2011
All rights reserved. This work may not be translated or copied in whole or in part without the written permission of the publisher (Springer Science+Business Media, LLC, 233 Spring Street, New York, NY 10013, USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in connection with any form of information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden.
The use in this publication of trade names, trademarks, service marks, and similar terms, even if they are not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject to proprietary rights.

Printed on acid-free paper

Springer is part of Springer Science+Business Media (www.springer.com)

I dedicate this book to state assessment leaders like Sandra Berndt, Elizabeth Compton, Leola Sueoka, and Dawn McGrath who have dedicated their careers to supporting teachers and making sure all students have access to quality instruction and meaningful assessments. Without them and colleagues like them in other states, this book would be irrelevant. With them, there is hope.

- Steve

I dedicate this book to Kelly and Austin, who are always supportive of my career.

- Ryan

Special thanks to Darin Gordon, who showed me the way out of the mire, and to Austin Cagle, who helped set my feet on the rock.

- Peter

We are standing at the beginning. For this, my thanks go to Maria, Heinrich, Rebecca, Gavin, Zachary, and Doktorvater Steve. Together we can try to turn the tide.

- Alex

Preface

For several decades, the concept of access has been emphasized by educators, researchers, and policy makers in reference to the critical need for educational equity for all students. Recently, the term accessibility has been used to describe the degree to which achievement tests permit the full range of test-takers to demonstrate what they know and can do, regardless of disability status or other individual characteristics. This handbook contains the perspectives of experts in policy, research, and practice who share the common goal of defining and operationalizing accessibility to advance the development and use of tests that yield valid inferences about achievement for all students.

Tempe, Arizona        Stephen N. Elliott
Nashville, Tennessee        Ryan J. Kettler
Nashville, Tennessee        Peter A. Beddow
Nashville, Tennessee        Alexander Kurz

Contents

1 Creating Access to Instruction and Tests of Achievement: Challenges and Solutions . . . . . 1
   Stephen N. Elliott, Peter A. Beddow, Alexander Kurz, and Ryan J. Kettler

Part I Government Policies and Legal Considerations

2 U.S. Policies Supporting Inclusive Assessments for Students with Disabilities . . . . . 19
   Susan C. Weigert

3 U.S. Legal Issues in Educational Testing of Special Populations . . . . . 33
   S.E. Phillips

4 IEP Team Decision-Making for More Inclusive Assessments: Policies, Percentages, and Personal Decisions . . . . . 69
   Naomi Zigmond, Amanda Kloo, and Christopher J. Lemons

5 Australian Policies to Support Inclusive Assessments . . . . . 83
   Michael Davies and Ian Dempsey

Part II Classroom Connections

6 Access to What Should Be Taught and Will Be Tested: Students’ Opportunity to Learn the Intended Curriculum . . . . . 99
   Alexander Kurz

7 Instructional Adaptations: Accommodations and Modifications That Support Accessible Instruction . . . . . 131
   Leanne R. Ketterlin-Geller and Elisa M. Jamgochian

8 Test-Taking Skills and Their Impact on Accessibility for All Students . . . . . 147
   Ryan J. Kettler, Jeffery P. Braden, and Peter A. Beddow

Part III Test Design and Innovative Practices

9 Accessibility Theory: Guiding the Science and Practice of Test Item Design with the Test-Taker in Mind . . . . . 163
   Peter A. Beddow, Alexander Kurz, and Jennifer R. Frey

10 Validity Evidence for Making Decisions About Accommodated and Modified Large-Scale Tests . . . . . 183
   Gerald Tindal and Daniel Anderson

11 Item-Writing Practice and Evidence . . . . . 201
   Michael C. Rodriguez

12 Language Issues in the Design of Accessible Items . . . . . 217
   Jamal Abedi

13 Effects of Modification Packages to Improve Test and Item Accessibility: Less Is More . . . . . 231
   Ryan J. Kettler

14 Including Student Voices in the Design of More Inclusive Assessments . . . . . 243
   Andrew T. Roach and Peter A. Beddow

15 Computerized Tests Sensitive to Individual Needs . . . . . 255
   Michael Russell

16 The 6D Framework: A Validity Framework for Defining Proficient Performance and Setting Cut Scores for Accessible Tests . . . . . 275
   Karla L. Egan, M. Christina Schneider, and Steve Ferrara

Part IV Conclusions

17 Implementing Modified Achievement Tests: Questions, Challenges, Pretending, and Potential Negative Consequences . . . . . 295
   Christopher J. Lemons, Amanda Kloo, and Naomi Zigmond

18 Accessible Tests of Student Achievement: Access and Innovations for Excellence . . . . . 319
   Stephen N. Elliott, Ryan J. Kettler, Peter A. Beddow, and Alexander Kurz

Subject Index . . . . . 329

Contributors

Jamal Abedi Graduate School of Education, University of California, Davis, CA 95616, USA, [email protected]

Daniel Anderson University of Oregon, Eugene, OR 97403, USA, [email protected]

Peter A. Beddow Department of Special Education, Peabody College of Vanderbilt University, Nashville, TN 37067, USA, [email protected]

Jeffery P. Braden College of Humanities and Social Sciences, North Carolina State University, Raleigh, NC 27695-8101, USA, [email protected]

Michael Davies School of Education and Professional Studies, Faculty of Education, Griffith University, Brisbane, QLD 4122, Australia, [email protected]

Ian Dempsey School of Education, University of Newcastle, Callaghan, NSW 2308, Australia, [email protected]

Karla L. Egan CTB/McGraw-Hill, Monterey, CA 93940, USA, [email protected]

Stephen N. Elliott Learning Sciences Institute, Arizona State University, Tempe, AZ 85287, USA, [email protected]

Steve Ferrara CTB/McGraw-Hill, Washington, DC 20005, USA, [email protected]

Jennifer R. Frey Peabody College of Vanderbilt University, Nashville, TN 37067, USA, [email protected]

Elisa M. Jamgochian University of Oregon, Eugene, OR 97403, USA, [email protected]

Ryan J. Kettler Department of Special Education, Peabody College of Vanderbilt University, Nashville, TN 37067, USA, [email protected]

Leanne R. Ketterlin-Geller Southern Methodist University, Dallas, TX 75252, USA, [email protected]

Amanda Kloo Department of Instruction and Learning, School of Education, University of Pittsburgh, Pittsburgh, PA 15260, USA, [email protected]

Alexander Kurz Department of Special Education, Peabody College of Vanderbilt University, Nashville, TN 37067, USA, [email protected]

Christopher J. Lemons University of Pittsburgh, Pittsburgh, PA 15260, USA, [email protected]

S.E. Phillips Consultant, Mesa, AZ 85205, USA, [email protected]

Andrew T. Roach Department of Counseling and Psychological Services, Georgia State University, Atlanta, GA 30302, USA, [email protected]

Michael C. Rodriguez University of Minnesota, Minneapolis, MN 55455, USA, [email protected]

Michael Russell Boston College, Chestnut Hill, MA, USA; Nimble Innovation Lab, Measured Progress, Newton, MA 02458, USA, [email protected]

M. Christina Schneider CTB/McGraw-Hill, Columbia, SC 29205, USA, [email protected]

Gerald Tindal University of Oregon, Eugene, OR 97403, USA, [email protected]

Susan C. Weigert U.S. Department of Education, Office of Special Education Programs, Alexandria, VA, USA, [email protected]

Naomi Zigmond University of Pittsburgh, Pittsburgh, PA 15260, USA, [email protected]

About the Editors

Stephen N. Elliott, PhD, is the founding Director of the Learning Sciences Institute, a trans-university research enterprise at Arizona State University, and is the Mickelson Foundation Professor of Education. He received his doctorate at Arizona State University in 1980 and has been on the faculty at several major research universities, including the University of Wisconsin-Madison and Vanderbilt University. At Wisconsin (1987–2004), Steve was a professor of educational psychology and served as the Associate Director of the Wisconsin Center for Education Research. At Vanderbilt (2004–2010), he was the Dunn Family Professor of Educational and Psychological Assessment in the Special Education Department and directed the Learning Sciences Institute and Dunn Family Scholars Program. His research focuses on scale development and educational assessment practices. In particular, he has published articles on (a) the assessment of children’s social skills and academic competence, (b) the use of testing accommodations and alternate assessment methods for evaluating the academic performance of students with disabilities for educational accountability, and (c) students’ opportunities to learn the intended curriculum. Steve’s scholarly and professional contributions have been recognized by his colleagues in education and psychology research, as evidenced by his selection as an American Psychological Association Senior Scientist in 2009. Steve consults with state assessment leaders on the assessment and instruction of Pre-K-12 students, serves on ETS’s Visiting Research Panel, and is the Director of Research and Scientific Practice for the Society for the Study of School Psychology.

Ryan J. Kettler, PhD, is a Research Assistant Professor in Special Education at Peabody College of Vanderbilt University. He received his doctorate in Educational Psychology, with a specialization in School Psychology, from the University of Wisconsin-Madison in 2005. Ryan’s dissertation, Identifying students who need help early: Validation of the Brief Academic Competence Evaluation Screening System, won the 2006 Outstanding Dissertation award from the Wisconsin School Psychologists Association. In 2007, he was named an Early Career Scholar by the Society for the Study of School Psychology. Prior to joining Vanderbilt University, Ryan was an assistant professor at California State University, Los Angeles, and completed an APA-accredited internship at Ethan Allen School in Wales, Wisconsin. He has worked on multiple federally funded grants examining the effectiveness of alternate assessments, academic and behavioral screening systems, and testing accommodations. Ryan is the author of peer-reviewed publications and presentations within the broader area of data-based assessment for intervention, representing specific interests in academic and behavioral screening, inclusive assessment, reliability and validity issues, and rating scale technology. He currently serves as a consultant to College Board and to the Wisconsin Center for Education Research, providing expertise in the area of inclusive assessment.

Peter A. Beddow, PhD, received his doctorate in Special Education and Educational Psychology at Vanderbilt University in 2011. His research focuses on test accessibility and item writing for assessments of student achievement. He is the senior author of the Test Accessibility and Modification Inventory (TAMI) and the Accessibility Rating Matrix, a set of tools for evaluating the accessibility of test items for learners with a broad range of abilities and needs. Based on his work on accessibility theory, Peter was awarded the Bonsal Education Research Entrepreneurship Award in 2009 and the Melvyn R. Semmel Dissertation Research Award in 2010. Prior to beginning his academic career, Peter taught for 7 years in Los Angeles County, including 5 years teaching Special Education for students with emotional and behavior problems at Five Acres School, part of a residential treatment facility for children who are wards-of-the-court for reasons of abuse and neglect. Peter’s primary goal is to help children realize their infinite value and achieve their ultimate potential. Peter lives in Nashville, Tennessee.

Alexander Kurz, MEd, is a doctoral student in Special Education and the Interdisciplinary Program in Educational Psychology at Vanderbilt University. He has studied in Germany and the United States, earning degrees in Special Education and Philosophy. Upon moving to the United States, he worked as a special education teacher in Tennessee and California, designed and implemented curricula for reading intervention classes, and participated in school reform activities through the Bill and Melinda Gates Foundation. Prior to beginning his doctoral studies, Alex worked as a behavior analyst for children with autism and as an educational consultant to Discovery Education Assessment. During his graduate work at Vanderbilt, he collaborated with the Wisconsin Center for Education Research and Discovery Education Assessment, leading research efforts to examine curricular alignment and its relation to student achievement for students with and without disabilities. Alex has coauthored several peer-reviewed publications on alignment and alternate assessment. His latest scholarly contributions have reexamined the concepts of opportunity-to-learn (OTL), alignment, and access to the general curriculum in the context of curricular frameworks for general and special education. Alex is the senior author of My Instructional Learning Opportunities Guidance System, a teacher-oriented OTL measurement tool. His current interest in educational technology and innovation is aimed at identifying and creating pragmatic solutions to the problems of practice.

1 Creating Access to Instruction and Tests of Achievement: Challenges and Solutions

Stephen N. Elliott, Peter A. Beddow, Alexander Kurz, and Ryan J. Kettler

Access is a central issue in instruction and testing of all students. Most of us can remember a testing experience, whether for low or high stakes, where the test questions covered content that we had not been taught. Many of us also have had testing experiences where the test items seemed “tricky” or poorly written, thus making it difficult to show what we had learned about the content the test was intended to measure. In both types of testing situations, one’s test performance is often negatively influenced because access to the intended knowledge and skills is limited by incomplete instruction or poor test items. In addition, it is likely that these testing situations were considered unfair and engendered negative attitudes about tests. These are not desired outcomes of instruction or testing. By improving access to instruction and tests, the results of testing can be more meaningful, and potentially positive, for all users.

For many students with disabilities, testing situations like these continue to occur. Access has been affected, if not denied, as a result of limited opportunities to learn valued knowledge and skills as well as by test items that feature extraneous content and designs insensitive to persons with various sensory and cognitive disabilities.

S.N. Elliott (✉)
Learning Sciences Institute, Arizona State University, Tempe, AZ 85287, USA
e-mail: [email protected]

Access to education, and in particular the grade-level curriculum, lies at the heart of virtually all federal legislation for students with disabilities. This access to instruction is a prerequisite and necessary condition for validity claims about test scores from statewide assessments of academic achievement. Ideally, all students are provided high-quality instruction that offers the opportunity to learn the knowledge and skills in a state’s intended curriculum and assessed on the state’s achievement test. Ideally, eligible students also are provided needed testing accommodations to reduce the effects of disability-related characteristics, and thus facilitate access to tested content. Tests used to measure student achievement should be designed to provide all students optimal access to the targeted constructs without introducing variance due to extraneous test features.

Unfortunately, this ideal scenario of an unobstructed access pathway to learning and demonstrating the knowledge and skills expressed in the general curriculum is not verified by recent research or our observations of educational practices in numerous states. Too often it seems that students have not had opportunities to learn essential knowledge and skills, nor have the tests used to evaluate their achievement been highly accessible or well aligned with their instruction. When access to high-quality instruction and to the tests that measure it is limited, the test results are misleading at best and the side effects on students are potentially demoralizing.

This chapter is intended to lay the foundation for subsequent chapters by researchers with experience in teaching, testing, and advancing equitable educational accountability. Here we examine access challenges related to instruction and testing often experienced by students with special needs, and survey relevant research related to three areas: (a) opportunity to learn (OTL), (b) testing accommodations, and (c) the accessibility of achievement tests. In addition, we advance actions for overcoming barriers to access for all students. The context for this focus on access has been strongly influenced by federal legislation, inclusive assessment research, test score validity theory, and the desire to improve education for students with special needs.

Legislative Context and Key Concepts

The education of students with disabilities is strongly influenced by federal legislation that mandates physical and intellectual access to curriculum, instruction, and assessment. Key federal legislation on access for students with disabilities includes the Rehabilitation Act of 1973 and the Individuals with Disabilities Education Act (IDEA, 1975) and its subsequent reauthorizations. (See Chapter 3 by Phillips for a comprehensive review of legal issues of access for students.) These legal mandates serve as the foundation for the inclusion of students with disabilities in standards-based reform and test-based accountability under the No Child Left Behind Act (NCLB) of 2001. The reauthorization of IDEA in 1997 included the access to the general curriculum mandates, which were intended to (a) provide all students with disabilities access to a challenging curriculum; (b) yield high expectations for all students with disabilities; and (c) ensure that all students with disabilities were included in test-based accountability mechanisms such as large-scale testing, progress monitoring, and public performance reporting. The universal accountability provisions of NCLB continued to underscore and expand access for students with disabilities by mandating academic content that is aligned with the local and statewide grade-level standards of students without disabilities (Kurz & Elliott, in press).

Current federal legislation requires the application of universal design principles to the development of all state and district-wide achievement tests. Universal design (UD), as defined in the Assistive Technology Act (P.L. 105-394, 1998), is “a concept or philosophy for designing and delivering products and services that are usable by people with the widest possible range of functional capabilities, which include products and services that are directly usable (without requiring assistive technologies) and products and services that are made usable with assistive technologies” (§3(17)). This legislation provides the rationale for the use of UD principles as follows:

The use of universal design principles reduces the need for many specific kinds of assistive technology devices and assistive technology services by incorporating accommodations for individuals with disabilities before rather than after production. The use of universal design principles also increases the likelihood that products (including services) will be compatible with existing assistive technologies. These principles are increasingly important to enhance access to information technology, telecommunications, transportation, physical structures, and consumer products (P.L. 105-394, §3(10)).

Prior to the 2007 amendments to NCLB (U.S. Department of Education, 2007a, 2007b), access barriers in testing were addressed primarily by the use of testing accommodations, which are typically defined as changes in the administration procedures of a test to address the special needs of individual test-takers (Hollenbeck, 2002). More recently, test developers have begun examining tests and items with the goal of modifying them to reduce the influence of intrinsic access barriers on subsequent test scores (e.g., Kettler, Elliott, & Beddow, 2009). This process has led to the development of what Beddow (2010) has called accessibility theory. Accessibility is defined as “the degree to which a test and its constituent item set permit the test-taker to demonstrate his or her knowledge of the target construct of the test” (Beddow, 2010, see Chapter 9). Accessibility is conceptualized as the sum of interactions between features of the test and individual test-taker characteristics.

Although the term accessibility is not used in the definition of UD, the principles as applied to assessment technology clearly are intended to address issues of access. To the extent a test contains access barriers for a portion of the tested population, the validity of inferences made from test scores is affected. The validity of subsequent norms and comparisons across the population is also likely affected. In summary, the validity of test score inferences is dependent on the accessibility of the test for the entirety of the target test-taker population.

One of the final amendments to NCLB (U.S. Department of Education, 2007a, 2007b) indicates that a small group of students with disabilities is allowed to show proficiency through an alternate assessment based on modified academic achievement standards (AA-MAS). These students can take a version of the regular assessment test with modifications and may constitute up to 2% of all who are reported proficient within a school. According to a recent report, 14 states are developing AA-MASs (Lazarus, Hodgson, & Thurlow, 2010). Modifications are defined as changes to a test’s content or item format. The regulations strongly emphasize that although modifications may make a test easier, out-of-level (i.e., below grade level) testing is not acceptable, leaving developers and users of these AA-MASs to determine at which point a test is no longer within the intended level. Modifications to large-scale achievement tests, like testing accommodations, are intended to facilitate access to the assessment for students with special needs, so that their scores can be meaningfully compared with the scores of students who take the standard test. If this can be accomplished, better assessment and accountability for students with disabilities will be the result (Kettler et al., 2009).

The legislative push to develop AA-MASs for students identified with disabilities has inspired the examination of accessibility and its relation to test score validity. The resulting theory and evidence supporting its importance have indicated that accessibility affects students’ test performance across the range of the test-taker population. Indeed, the differential boost observed across the testing accommodations literature is similarly evident in the results of research on item modifications (Kettler et al., in press; Sireci, Scarpati, & Li, 2005). Much more will be reported about the effects of item modifications in Chapter 9 by Beddow.

The terms access, accommodations, and modifications all have been used for decades when discussing educational testing and the validity of resulting test scores. These terms represent key concepts in the world of testing and federal assessment and accountability policies. Thus, these terms demand attention to ensure they are understood in the context of emerging issues around tests and testing for students with disabilities.

For instruction, access is the opportunity for a student to learn the content of the intended and assessed curricula. In the current education reform framework, this means students have meaningful opportunities to acquire the knowledge and skills featured in the content standards of their state and ultimately assessed on the state’s end-of-year achievement test. Teachers are encouraged to teach to the standards, not the test, and create engaging instruction for all students to increase the chances that learning occurs.

For educational testing, access is the opportunity for test-takers to demonstrate proficiency on the target construct of a test (e.g., language arts, mathematics, or science) or item (e.g., synonyms, homonyms, and homographs). In essence, complete access is manifest when a test-taker is fully able to show the degree to which he or she knows the tested content. Access, therefore, must be understood as an interaction between individual test-taker characteristics and features of the test itself.

The purpose of both testing accommodations and modifications is to increase individuals’ access to tests. The definitions of these access-enabling strategies, however, have been the subject of debate, in part because of their inconsistent use in the Standards for Educational and Psychological Testing (Standards; American Educational Research Association, American Psychological Association, & National Council on Measurement in Education, 1999) and some states’ testing guidelines. An examination of the Standards for Testing finds the term modification cited nearly a dozen times in the index. A number of the standards refer to a modification as an accommodation. For example, in the section entitled “Testing Individuals with Disabilities” in the Standards for Testing, it is stated:

The terms accommodation and modification have varying connotations in different subfields. Here accommodation is used as the general term for any action taken in response to a determination that an individual’s disability requires a departure from established testing protocol. Depending on circumstances, such accommodation may include modification of test administration processes or modification of test content. No connotation that modification implies a change in construct(s) being measured is intended. (AERA et al., 1999, p. 101)

The implementation of testing accommodations for students with disabilities is a policy endorsed in all states; some states even allow testing accommodations for all students if they have been provided the same accommodations during instruction. Accommodations are widely recognized in state testing guidelines as changes to the setting, scheduling, presentation format, or response format of an assessment (Kettler & Elliott, 2010). The modification of test content, however, is inconsistent with the definition of a testing accommodation in the majority of state testing accommodation guidelines (Lazarus, Thurlow, Lail, Eisenbraun, & Kato, 2006). Accommodations are made to increase the validity of inferences that can be made from a student’s scores, so that those scores can be meaningfully compared to scores of students for whom testing accommodations are not needed.

The modification of test content is inconsistent with the definition of a testing accommodation in the majority of state testing accommodation guidelines (Lazarus et al., 2006). The AA-MAS policy, however, extends the notion of access and the spirit of individualized accommodations to changes made to item content. Such changes are defined as modifications by most test developers and administrators. As we have previously observed, when item and test alterations are made (e.g., by changing the layout, reducing the length of the reading passage, adding graphic support), it is not always clear without test results whether the changes affect only access to the test or, in fact, also affect the construct being measured and the subsequent inferences that are drawn from scores (Kettler et al., 2009). If the content of an item or test has been changed and evidence is not available to show that scores remain comparable to the original construct to be measured, it has been customary to consider the alteration a modification (Phillips & Camara, 2006; Koretz & Hamilton, 2006). To ensure the modifications are acceptable under AA-MAS policy, research is needed to confirm the modified test measures the same construct(s) as the original test. Indeed, the policy assumes AA-MASs for eligible students measure the same grade-level content standards and performance objectives (constructs) as general assessments for students without disabilities. It should be noted that some modifications may result in an AA-MAS that yields scores that are not comparable to scores obtained from the original test. These modifications could still be permissible for an AA-MAS, assuming the content is commensurate with the intended grade level of the test (Kettler et al., 2009).

Given the approach we recommend for developing an AA-MAS, we defined the term modification to refer “to a process by which the test developer starts with a pool of existing test items with known psychometric properties, and makes changes to the items, creating a new test with enhanced accessibility for the target population. When analyses indicate inferences made from the resulting test scores are valid indicators of grade-level achievement, the modifications are considered appropriate. Conversely, if analytic evidence suggests the inferences made from resulting scores are invalid indicators of grade-level achievement, the modifications are inappropriate” (Kettler et al., 2009, p. 531). Thus, like individualized testing accommodations, modifications must be studied to determine their appropriateness. Unlike accommodations, modifications are intended to afford access to an entire group of students, resulting in better measurement of their achieved knowledge.

Providing Access to Overcome Barriers

We believe three critical points exist for improving access for students within a test-based accountability system. These points occur during instruction, arrangement of the testing situation, and the design of tests, and include providing students meaningful opportunities to learn the intended curriculum and to show what they have learned on highly accessible tests with appropriate accommodations. The validity of some test score inferences depends on students’ opportunity to learn the intended curriculum as well as on students’ access to tests that are well aligned with the intended curriculum. When access at these points is not optimal, it is difficult to draw inferences about students’ learning in school. Figure 1.1 illustrates how the failure to provide students access to curriculum and tests creates barriers to success.

Fig. 1.1 Access barriers to the general curriculum and tests for students with disabilities

The state of research differs for each access barrier. OTL features a substantial body of theory-driven research that supports its importance for student achievement, especially in the context of test-based accountability (Airasian & Madaus, 1983; Herman, Klein, & Abedi, 2000; McDonnell, 1995; Porter, 1995). However, challenges remain regarding the conceptualization and measurement of OTL using practical tools (Kurz, Elliott, Wehby, & Smithson, 2010; Pullin & Haertel, 2008; Roach, Niebling, & Kurz, 2008). Testing accommodations research has grown significantly over the past decade with reports focusing on their appropriateness (Hollenbeck, 2002; Phillips, 1994), assignment and delivery (Elliott, 2007; Fuchs & Fuchs, 2001; Ketterlin-Geller, Alonzo, Braun-Monegan, & Tindal, 2007), and effects on student achievement (Elliott, Kratochwill, & McKevitt, 2001; Elliott & Marquart, 2004; Sireci, Scarpati, & Li, 2005). In short, strategies for increasing access via testing accommodations and their typical effects (if delivered with integrity) are available. Research concerned with designing accessible tests is relatively new and largely based on UD principles (Ketterlin-Geller, 2005; Kettler et al., 2009; Thurlow et al., 2009). More recently, researchers have applied principles of UD, cognitive load theory, and item development research to modify items and tests in an effort to increase access for students with disabilities (Elliott, Kurz, Beddow, & Frey, 2009; Kettler et al., 2009). In the remainder of this chapter, we examine critical research related to each access barrier and discuss available strategies for increasing access.

Access via Opportunity to Learn

Although the primacy of OTL in the context of accessibility may be apparent – after all, one fundamental assumption of schooling is that student learning occurs as a function of formal instruction – OTL has not yet captured a foothold in test-based educational accountability. The conceptual and methodological challenges surrounding OTL are at least partly responsible for its slow ascendancy in the realm of curriculum, instruction, and assessment (Porter, 1995). In its most general definition, OTL refers to the opportunities that schools afford their students to learn what is expected of them (Herman et al., 2000). This definition highlights two important conceptual issues: the “who” and “what” of OTL. With regard to the latter issue, standards-based reforms influenced by minimum competency testing, performance assessment, and large-scale assessment for educational accountability led to the establishment of performance expectations for students via subject-specific content standards across the grade spectrum. The content of these standards is typically referred to as the intended curriculum (Porter, 2006). Consequently, the definition by Herman et al. can be revised to OTL as referring to students’ opportunity to learn the intended curriculum (Kurz & Elliott, in press).

Numerous factors can affect OTL at the school (e.g., class size, instructional resources), teacher (e.g., instructional content, subject matter knowledge, instructional time), and student level (e.g., engagement, extracurricular activities), which has contributed to the proliferation of the OTL acronym under disparate conceptual definitions, including OTL as teachers’ opportunity to learn subject matter knowledge (Schmidt et al., 2008). These various factors notwithstanding, the most proximal variable to the instructional lives of students and their opportunity to learn the intended curriculum is the teachers’ instruction (e.g., Kurz & Elliott, in press; Rowan, Camburn, & Correnti, 2004). In the case of special education teachers, there even exists a legal precedent to provide students with disabilities access to the intended curriculum under IDEA’s access to the general curriculum mandates (Cushing, Clark, Carter, & Kennedy, 2005; Roach et al., 2008). Conceptual and methodological challenges regarding OTL, however, persist in the decomposition of classroom instruction: What aspects of classroom instruction are significant contributors to student achievement? How can these aspects be measured?

Over the last four decades, three broad strands of OTL research have emerged related to teacher instruction, focused primarily on the content of instruction (e.g., Husén, 1967; Rowan & Correnti, 2009), the time on instruction (e.g., Carroll, 1963; Vannest & Hagan-Burke, 2009), and the quality of instruction (e.g., Brophy & Good, 1986; Pianta, Belsky, Houts, Morrison, & NICHD, 2009). A merger of the various OTL conceptualizations was first articulated by Anderson (1986) and explicitly conceptualized by Stevens (1993), although some researchers have combined some of these OTL variables prior to Stevens (e.g., Cooley & Leinhardt, 1980). Researchers have further provided empirical support for the relation between each of those variables and student achievement (e.g., Gamoran, Porter, Smithson, & White, 1997; Rowan, 1998; Thurlow, Ysseldyke, Graden, & Algozzine, 1984). However, few research studies have examined all three aspects of OTL to determine their combined and unique contributions (e.g., Wang, 1998).

The measurement of OTL related to the content of instruction was initially motivated by concerns about the validity of test score inferences (e.g., Husén, 1967; Airasian & Madaus, 1983; Haertel & Calfee, 1983). Two popular approaches were item-based OTL measures and taxonomic OTL measures. To determine students’ opportunity to learn tested content, researchers adopted item-based OTL measures that required teachers to indicate the extent to which they covered the content measured by different test items using some type of rating scale (e.g., Comber & Keeves, 1973). To determine students’ opportunity to learn important content objectives, researchers developed content taxonomies that could be used to judge whether different tests covered the same objectives delineated in the taxonomy (e.g., Porter et al., 1978). The latter approach permitted content comparisons across different tests and other media such as textbooks. The flexibility of this approach contributed to its popularity. Porter and colleagues, for example, continued their work on taxonomic OTL measures and eventually developed the Surveys of the Enacted Curriculum (SEC), one of two alignment methodologies presently used by states to document the content overlap between assessments and state standards (CCSSO, 2009). Besides teacher self-report, other measurement techniques used to measure the content of instruction include direct observation and review of permanent products (see Porter, Kirst, Osthoff, Smithson, & Schneider, 1993).

The concept of OTL as time on instruction was introduced as part of John Carroll’s (1963) model of school learning. He defined OTL as the amount of time allocated for learning at the school or program level. Subsequent research examined the relation between allocated time and student achievement, continuously refining the “quantity of instruction” from allocated time (e.g., 45-min lessons) to time actually engaged in learning (e.g., Harnischfeger & Wiley, 1976; Denham & Lieberman, 1980; Gettinger & Seibert, 2002). The measurement of time on instruction initially involved the straightforward summing of allocated class time across the school year, while later measures relied on teacher report or direct observation (e.g., Ysseldyke et al., 1984; O’Sullivan, Ysseldyke, Christenson, & Thurlow, 1990). The amount of time dedicated to instruction has received substantial empirical support in predicting student achievement (e.g., Berliner, 1979; Brophy & Good, 1986; Fisher & Berliner, 1985; Scheerens & Bosker, 1997; Walberg, 1988). Vannest and Parker (2009) examined time usage related to instruction and concluded that time on instruction represents the single best documented predictor of student achievement across schools, classes, student abilities, grade levels, and subject areas.

School and classroom indicators of OTL related to the quality of instruction were first introduced as part of several models of school learning (e.g., Bloom, 1976; Carroll, 1963; Gagné, 1977; Harnischfeger & Wiley, 1976). Walberg (1986) reviewed 91 studies that examined the effect of putative quality indicators on student achievement, such as frequency of praise statements, frequency of corrective feedback, availability of instructional resources, and instructional grouping, and reported the highest mean effect sizes for praise and corrective feedback with 1.17 and 0.97, respectively. In a review of his meta-analysis, Brophy (1986) reported active teaching, effective classroom management, and teacher expectations related to the content of instruction as key quality variables of OTL with strong empirical support (e.g., Brophy & Evertson, 1976; Coker, Medley, & Soar, 1980; Fisher et al., 1980). More recently, OTL research on the quality of instruction also considered teacher expectations for the enacted curriculum (i.e., cognitive demands) and instructional resources such as access to textbooks, calculators, and computers (e.g., Boscardin, Aguirre-Munoz, Chinen, Leon, & Shin, 2004; Herman & Klein, 1997; Porter, 1991, 1993; Wang, 1998). Teacher self-report and direct observation by trained observers represent the typical measurement techniques for determining quality aspects of instruction such as expectations for student learning, instructional practices, and/or instructional resources (e.g., Rowan & Correnti, 2009; Pianta & Hamre, 2009).

In summary, the concept of OTL is concerned with students’ opportunity to learn the intended curriculum. Students access the intended curriculum indirectly via the teacher’s instruction. In the context of test-based educational accountability, access to the intended curriculum – OTL – is critical, because end-of-year testing programs sample across the content domains of the intended curriculum for purposes of measuring student achievement and the extent to which schools and teachers have contributed to achievement. Although the various strands of OTL research have converged on the teacher’s enacted curriculum, they did so by focusing on different aspects of a teacher’s instruction: its time, content, and quality. Empirical data support each OTL variable as a correlate of student achievement, yet few researchers have investigated all three OTL variables – time, content, and quality – as a set to determine their unique and combined contributions. This gap in the research literature is unfortunate, because no aspect of OTL can occur in isolation for all practical purposes. That is, instructional content enacted by a teacher always has to unfold along (at least) two additional dimensions: time and quality. For example, a teacher’s instruction is not adequately captured by referring solely to the content of instruction, such as solving algebraic equations. In actuality, a teacher decides to teach students the application of solving an algebraic equation to a context outside of mathematics for 35 min through guided practice. The different elements of this example (the content taught, the time allocated, and the instructional practice used) refer to various aspects of OTL – time, content, and quality of instruction – that have to occur in conjunction with one another whenever instruction is enacted by a teacher. Two important challenges thus have to be met before OTL can begin to play an integral part in curriculum, instruction, and assessment. First, the measurement of OTL has to be refined to allow for the efficient and reliable data collection of the enacted curriculum along time, content, and quality aspects of instruction. Second, the measurement of OTL has to move beyond measurement for measurement’s sake. Besides measuring and verifying comprehensive OTL as a significant contributor to student achievement, we envision a shift in OTL research and measurement that provides practitioners with tools that yield additional formative benefits that can enhance instruction and increase access for all students. Kurz addresses this shift in detail in Chapter 6.
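
To make the three OTL dimensions concrete, the following minimal sketch (in Python) shows how an enacted-curriculum log might record time, content, and quality for each instructional event and then aggregate them by standard. The field names, quality categories, and aggregation rule are illustrative assumptions for this sketch, not a description of any existing OTL instrument such as My Instructional Learning Opportunities Guidance System.

# Hypothetical sketch: logging the enacted curriculum along the three OTL
# dimensions discussed above (time, content, quality). All field names and
# the aggregation rule are illustrative assumptions.
from dataclasses import dataclass
from collections import defaultdict

@dataclass
class InstructionalEvent:
    standard: str           # content dimension: intended-curriculum standard covered
    minutes: int            # time dimension: instructional time devoted to the standard
    practice: str           # quality dimension: e.g., "guided practice", "lecture"
    cognitive_demand: str   # quality dimension: expectation for the enacted content

def summarize_otl(events):
    """Aggregate logged events into per-standard OTL summaries."""
    summary = defaultdict(lambda: {"minutes": 0, "practices": set(), "demands": set()})
    for e in events:
        summary[e.standard]["minutes"] += e.minutes
        summary[e.standard]["practices"].add(e.practice)
        summary[e.standard]["demands"].add(e.cognitive_demand)
    return dict(summary)

# Example: the algebra lesson described above, logged as one event among others.
log = [
    InstructionalEvent("solve linear equations", 35, "guided practice", "apply"),
    InstructionalEvent("solve linear equations", 20, "independent practice", "apply"),
    InstructionalEvent("interpret graphs", 45, "lecture", "understand"),
]
print(summarize_otl(log))

A record of this kind keeps time, content, and quality attached to the same instructional event, which is what allows their unique and combined contributions to be examined rather than any one dimension in isolation.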

Access via Testing Accommodations

Testing accommodations historically have been used with the aim of reducing construct-irrelevant variance due to a variety of access skill deficits exhibited by students with disabilities. Testing accommodations are individualized depending on each student’s access needs. Typically, accommodations involve changes in the presentation format of a test (e.g., oral delivery, paraphrasing, Braille, sign language, encouragement, permitting the use of manipulatives), the timing or scheduling of a test (e.g., extended time, delivering the test across multiple days), the recording or response format (e.g., permitting test-takers to respond in the test booklet instead of on the answer sheet, transcription), or the assessment environment (e.g., separate room, elimination of distractions) (Elliott, Kratochwill, & Gilbertson-Schulte, 1999; Kettler & Elliott, 2010).

Appropriate testing accommodations, while applied individually based on specific student needs, should not interfere with the test’s measurement of the target construct and should permit the same validity of inferences from the results of the test as those from students not receiving accommodations (Hollenbeck, Rozek-Tedesco, & Finzel, 2000). The application of accommodations should also differentially affect test results of students for whom accommodations are intended, compared to those for whom testing accommodations are not needed. Most accommodation researchers subscribe to the concept of differential boost (Fuchs & Fuchs, 2001; Fuchs et al., 2000), which is used to identify valid accommodations as those that “will lead to greater score improvements for students with disabilities than for students without disabilities” (Sireci, Scarpati, & Li, 2005, p. 481). Sireci et al. differentiated the concept of differential boost from the traditional definition of the interaction hypothesis, which states that “(a) when test accommodations are given to the SWD [students with disabilities] who need them, their test scores will improve, relative to the scores they would attain when taking the test under standard conditions; and (b) students without disabilities will not exhibit higher scores when taking the test with those accommodations” (p. 458).

A review of the research literature reveals a number of studies examining the differential boost of testing accommodations and the validity of score interpretations (e.g., Elliott, McKevitt, & Kettler, 2002; Feldman, Kim, & Elliott, 2009; Fuchs, Fuchs, & Cappizzi, 2005; Kettler et al., 2005; Kosciolek & Ysseldyke, 2000; Lang, Elliott, Bolt, & Kratochwill, 2008; McKevitt & Elliott, 2003; Pitoniak & Royer, 2001; Sireci et al., 2005). The National Research Council’s commissioned review of research on testing accommodations by Sireci et al. (2005) is one of the most comprehensive reviews of the evidence for the effects of testing accommodations on test scores. Specifically, Sireci et al. (2005) reviewed 28 experimental, quasi-experimental, and non-experimental empirical studies on the effects of testing accommodations over nearly two decades. They found the most common accommodations were reading support (39%) and extra time (24%). Aggregate results of studies on reading support (usually in the form of verbatim presentation of directions and test items) were mixed. The interaction hypothesis was upheld for five of the six studies examining the effect of the accommodation on scores from mathematics tests. For two studies on reading and two studies across multiple content areas, the interaction hypothesis was not upheld. The authors concluded that reading support, while likely increasing the validity of inferences for mathematics tests, may not have the desired effect when used with tests of other content domains. Results of five out of eight studies on extended time indicated students identified with disabilities exhibit higher score gains than students not identified with disabilities. The results of one study rejected the interaction hypothesis, and the results of two other studies did not indicate that extra time resulted in gains for either group. Based on these findings, Sireci and colleagues concluded that while the interaction hypothesis was not strictly upheld, “evidence . . . is tilted in that direction” (p. 469). Sireci and colleagues also reviewed several studies on the effects of multiple accommodations (i.e., accommodations packages). The findings of the four studies that used experimental designs supported the interaction hypothesis.

Reported effect sizes of most testing accommodations studies appear small, but there is evidence they are practically meaningful. In a survey of the accommodations literature, Kettler and Elliott (2010) reported that, in some studies, effect sizes from accommodations for students with IEPs (Individualized Education Programs) were twice as large as those for students without IEPs. In one study, effect sizes ranged from 0.13 for students without IEPs to 0.42 for students with IEPs. While conventional interpretations of effect sizes (e.g., Cohen, 1992) would suggest these effects are small, a meta-analysis conducted by Bloom, Hill, Black, and Lipsey (2008) provides evidence that they are meaningful within the context of changes in achievement test scores. Bloom et al. found that mean effect sizes across six standardized achievement tests from the spring semester of one school year to the next ranged from 0.06 to 1.52 in reading and 0.01 to 1.14 in mathematics, with larger effect sizes consistently observed for lower grades and steadily decreasing until grade 12. Further, the data suggest a steep decrease in effect sizes from grade K until grade 5, after which no effect sizes above 0.41 are observed for either reading or mathematics through grade 12. This indicates that effect sizes of 0.40 or higher for students with disabilities may denote that the impact of accommodations is meaningful. Indeed, the differential boost reported by Kettler and Elliott provides evidence of an interaction that may heretofore have been underestimated. As applied to the accommodations literature, these results suggest that for some students, appropriate accommodations may indeed reduce barriers and yield more accurate measures of achievement.
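
To illustrate the effect-size reasoning above, the brief sketch below computes a standardized mean difference (Cohen's d with a pooled standard deviation) for accommodated versus standard administrations and checks whether a differential boost is present. The score arrays are invented placeholder values used only to show the arithmetic; they are not data from Kettler and Elliott (2010), Bloom et al. (2008), or any other study.

# Illustrative sketch of the effect-size logic discussed above. The score
# arrays are invented placeholder data, not results from any study; the
# formula is the standard Cohen's d with a pooled standard deviation.
from statistics import mean, stdev
from math import sqrt

def cohens_d(accommodated, standard):
    """Standardized mean difference between accommodated and standard conditions."""
    n1, n2 = len(accommodated), len(standard)
    pooled_sd = sqrt(((n1 - 1) * stdev(accommodated) ** 2 +
                      (n2 - 1) * stdev(standard) ** 2) / (n1 + n2 - 2))
    return (mean(accommodated) - mean(standard)) / pooled_sd

# Hypothetical scores for students with and without IEPs under both conditions.
with_iep_acc, with_iep_std = [24, 27, 22, 29, 25], [21, 24, 20, 26, 23]
without_iep_acc, without_iep_std = [30, 33, 29, 35, 31], [29, 33, 28, 35, 30]

boost_with_iep = cohens_d(with_iep_acc, with_iep_std)
boost_without_iep = cohens_d(without_iep_acc, without_iep_std)

# Differential boost: the accommodation helps the students who need it more.
print(f"effect size (IEP): {boost_with_iep:.2f}")
print(f"effect size (no IEP): {boost_without_iep:.2f}")
print("differential boost" if boost_with_iep > boost_without_iep else "no differential boost")

The same arithmetic underlies the interpretation offered above: an accommodation effect of roughly 0.40 for students with disabilities can be judged against typical year-to-year growth effect sizes rather than against conventional small/medium/large benchmarks alone.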

Although testing accommodations can be helpful for many students with disabilities, there are a number of challenges associated with implementing them. First, many students are averse to testing accommodations for different reasons, including the fact that the accommodations often draw attention to student challenges (Feldman et al., 2009). Additionally, there are logistical challenges associated with their appropriate implementation, including time, personnel, and cost, which often result in poor integrity. Another challenge is the difficulty in identifying which students should receive specific accommodations, and which combination of accommodations may be appropriate. Further, little is known about the extent to which accommodations interact with each other differentially across students or packages, notwithstanding the breadth of the research based on their use. Finally, each time accommodations are used, general and comparative validity are threatened. Not only is an additional variable introduced into the test event with each accommodation, but it is also introduced for some students and not for other students who may need some of the same accommodations but are not eligible for them (Kettler & Elliott, 2010).

Access via Well-Designed Test Items

Systematically attending to the accessibility of tests and their constituent items can reduce the need for individualized accommodations, thereby reducing potential threats to validity across the test-taker population. Results of research on item accessibility suggest many achievement test items can be improved to reduce access barriers for more test-takers (see Elliott et al., 2010; Kettler et al., in press). Cognitive interviews and test-taker survey data suggest that, when presented with two similar items, students respond favorably to the item developed with a focus on accessibility (Roach et al., 2010).

Universal design (UD; Mace, 1991), originally conceptualized as a set of guiding principles for ensuring buildings permit equal access for all individuals, provides a useful framework for understanding the variety of perspectives that must be taken into consideration when undertaking the effort to ensure assessment instruments are equally accessible across the range of the test-taker population. The theory does not, however, contain guidelines that apply specifically to test and item development. In 2002, Thompson, Johnstone, and Thurlow applied the principles of UD to large-scale assessment and distilled them into seven recommendations, as follows: (a) inclusive assessment population; (b) precisely defined constructs; (c) accessible, non-biased items; (d) amenable to accommodations; (e) simple, clear, and intuitive instructions and procedures; (f) maximum readability and comprehensibility; and (g) maximum legibility. This report was a step forward in that it extended the use of UD beyond its architectural origins, but it contained few specific guidelines that informed the development of accessible tests or items.

Accessibility theory (Beddow, 2010) operationalizes test accessibility as the sum of interactions between characteristics of the individual test-taker and features of the test and its constituent item set (see also Ketterlin-Geller, 2008). Accessibility, therefore, is an attribute of the test event. Optimally accessible tests generate test events that minimize the influence of construct-irrelevant interactions on subsequent test scores. While in theory this objective is shared with testing accommodations, it must be noted that test items typically are delivered universally across the test-taker population. As such, item writers should focus on maximizing the universality of all test items, attending particularly to item features that may reduce their accessibility for some test-takers.

Locating accessibility in the test event, as opposed to defining it as an attribute of the test itself, is an important distinction insofar as a test or test item may contain features that pose access barriers for one test-taker while permitting unfettered access to the target construct for another. Accessible items, therefore, must contain little or no content that compels the test-taker to demonstrate skills that are irrelevant to the construct intended for measurement. This is of particular importance when skills that are required in addition to the target construct (referred to as prerequisite skills) are challenging for the test-taker. The clearest example of such prerequisite skills is found in the vast number of mathematics items in which the target problem is situated in text. For a test-taker with low reading ability, complex text in a mathematics test likely represents an access barrier that may preclude him or her from fully demonstrating knowledge, skills, and/or abilities in math.

The inclusion of extraneous and/or construct-irrelevant demands, therefore, must be addressed at both the test and item levels to ensure that the resulting scores represent, to the extent possible, a measure of the target construct that is free from the influence of ancillary interactions due to access barriers. To this end, cognitive load theory (CLT; Chandler & Sweller, 1991), a model for understanding the effects of various features of instructional task demands on learning outcomes, offers a useful lens through which to evaluate the accessibility of tests and items. Based on Miller's (1956) notion of the limitations of working memory, CLT disaggregates task demands into three types of cognitive demand. For optimal learning efficiency, proponents of the theory recommend that designers of instructional materials eliminate extraneous load while maximizing intrinsic load. This ensures the learner is permitted to allocate his or her cognitive resources to the primary objectives of the task. Beddow, Kurz, and Frey detail the application of CLT to the design of accessible test items in Chapter 15.

In general, there are two primary options for ensuring tests are accessible for all test-takers. The first is to use existing research and theory to guide the process of developing tests and items, with the goal of maximizing accessibility for all test-takers. The second option is to evaluate existing tests and items with the purpose of identifying potential access barriers. Evaluation data are then used to guide test and item modifications based on specific access concerns to enhance their accessibility for more test-takers.

The goals of modification are twofold. First, modifications should preserve the target construct of the original test or test item. This includes preserving essential content, depth of knowledge, grade-levelness, and intended knowledge, skills, and abilities targeted for measurement. Second, modifications should isolate the target construct by reducing, to the degree possible, the influence of ancillary interactions on the test score. These modifications may include eliminating extraneous material, reorganizing item graphics and page layouts, and highlighting essential content.

Kettler et al. (2009) described a paradigm to guide the process of developing accessible tests and items that yield scores from which inferences are equally valid for all test-takers. The model comprises five stages. The first stage consists of a systematic accessibility review of an existing pool of test items. To facilitate the comprehensive and systematic analysis of tests and items based on the principles of accessibility theory, the authors developed a set of tools called the Test Accessibility and Modification Inventory (TAMI; Beddow, Kettler, & Elliott, 2008) and the TAMI Accessibility Rating Matrix (ARM; Beddow, Elliott, & Kettler, 2009). The second stage involves collaborative modification by content area experts, assessment specialists, educators, and/or item writers with expertise in issues of accessibility. The third stage involves documenting all changes to the test and items. In the fourth stage, the new items are pilot tested with a small sample of respondents to gather information on the appropriateness and feasibility of the modifications. The final stage involves a large field test to evaluate item characteristics such as difficulty, discrimination, and reliability. Beddow and colleagues provide a detailed examination of theory and tools for designing highly accessible items in Chapter 9.

Actions and Innovations Needed to Keep Moving Forward

To ensure and expand access to instruction and the tests designed to measure what all students have learned from instruction, we believe there are a number of actions and innovations needed. The theoretical and empirical foundations for these actions and innovations for improving access have been introduced in this chapter and are thoroughly examined throughout the rest of this book.

Key actions and innovations needed to improve access to instruction for all students include the following:

• Providing teachers more support to implement curricula that are highly aligned with content standards and testing blueprints, and subsequently providing these teachers tools for getting periodic feedback about their efforts to increase students' opportunities to learn the intended content. Providing the support teachers need is certain to involve more, and more innovative, professional development activities that focus on the intended curriculum. With regard to feedback on their efforts to increase opportunities to learn, teachers will need new tools for measuring and monitoring instructional content, time, and quality.

• Providing teachers more information about what students are learning as part of instruction. To accomplish this action, instructionally sensitive, standards-aligned, and accessible interim measures of achievement are needed. These measures will be most effective if delivered online and scored immediately so that feedback about the relation between what has been taught and what has been learned is timely and salient.

Key actions and innovations needed to improve access to achievement tests for all students include the following:

• The provision of testing accommodations for students who demonstrate a need, and documentation that such accommodations were implemented with integrity. To accomplish this action, it is likely that teachers will need support from colleagues during testing events, and tools for documenting what accommodations are actually delivered will be needed. Another possible innovation to accomplish this action is to use computer-delivered tests with software that makes the accommodations possible and also records their implementation.

• The development of test accessibility review panels, similar to fairness review panels, to increase the likelihood that all test items meet high standards of accessibility. To accomplish this action, state assessment leaders simply need to recognize that the accessibility of many of the items they currently use can be enhanced. Innovative tools designed to measure test item accessibility and to provide diagnostic information for guiding both the development and the evaluation of items are needed.

• The support of ongoing research and evaluation of test results to ensure they are valid indicators of what students have been taught and have learned. To accomplish this action, state assessment leaders and staff need to conduct, or hire others to conduct, periodic validity studies that consider the effects of OTL, testing accommodations, and test item accessibility on the test scores of students with varying levels of ability.

We believe all these actions and related innovations can be accomplished on a large scale. We have the knowledge and tools today to move forward; we now need to share this information and help educators create the infrastructure for using it to ensure more access for all students to the intended and assessed curriculum.

Conclusions

Drawing inferences about whether the students in a school are successfully learning the lessons indicated in content standards, based on the instruction that teachers are providing, is a complicated process. Any accountability system that is designed to assist in making such inferences must take into consideration the behaviors and characteristics of more than one group. The system must yield information about the instructional practices of teachers, most practically obtained using teacher-rating measures, as well as information about the knowledge and skills that students have learned. This latter piece of information has historically been obtained using achievement tests. These two pieces of information – OTL and achievement – must be aligned to the same grade-level content standards and should be used together in an organized framework. One such framework would involve drawing different conclusions based on whether OTL at a school is higher or lower than one target threshold, along with whether student achievement is higher or lower than a second target threshold. Table 1.1 depicts such a framework.

When OTL is high but achievement is low (A), the school should thoroughly examine its tests and its process for assigning testing accommodations, to ensure that the assessment is accessible. It may also be important to examine the system for identifying and helping students with special needs, some of whom may be unidentified and, without proper intervention, may be achieving at a very low level. A school is only providing optimal access to grade-level content standards when both OTL and achievement are high (B).

Table 1.1 Proposed interpretive framework for student achievement by instructional quality

                             Student achievement
Instructional quality        Low percentage proficient     High percentage proficient
High mean OTL                A                             B
Low mean OTL                 C                             D


When both OTL and achievement are low (C), professional development may be needed in the delivery of high-quality instruction related to content standards. Examining tests and accommodations might also be a good step, but until high-quality instruction is established, it is difficult to know whether the scores are low due to lack of OTL or due to barriers to assessment. Lastly, when OTL is low but achievement is high (D), professional development in instruction of content standards is necessary. It may also be appropriate in this case to raise standards for achievement, in line with what could be obtained if OTL for students were higher.
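To make the decision logic of Table 1.1 concrete, the following minimal sketch classifies a school into quadrants A through D from two inputs. The function name and the threshold values are hypothetical illustrations, not part of the published framework; in practice the cutoffs would be set by the state or district using the framework.

# Hypothetical illustration of the Table 1.1 framework: classify a school into
# quadrant A, B, C, or D from mean OTL and percentage of students proficient.
# otl_cutoff and proficiency_cutoff are placeholder thresholds.

def classify_school(mean_otl: float, pct_proficient: float,
                    otl_cutoff: float = 0.75, proficiency_cutoff: float = 60.0) -> str:
    high_otl = mean_otl >= otl_cutoff
    high_achievement = pct_proficient >= proficiency_cutoff

    if high_otl and not high_achievement:
        return "A: examine test accessibility and accommodation assignment"
    if high_otl and high_achievement:
        return "B: optimal access to grade-level content standards"
    if not high_otl and not high_achievement:
        return "C: professional development in standards-based instruction"
    return "D: improve OTL; consider raising achievement targets"

# Example: a school with low mean OTL but high proficiency falls in quadrant D.
print(classify_school(mean_otl=0.55, pct_proficient=72.0))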

Opportunity to learn and accessible achievement tests are both necessary for providing optimal access for students to grade-level content standards. Testing accommodations bridge these two constructs as individualized methods of improving access, which are best identified in part by transferring instructional accommodations to the test event. Although work remains in refining the measurement of both OTL and achievement, including the development of measures that are practical, formative, and accessible, the state of the field is such that large-scale assessment systems incorporating both constructs could become the norm in the near future.

References

Airasian, P. W., & Madaus, G. F. (1983). Linking testing and instruction: Policy issues. Journal of Educational Measurement, 20(2), 103–118.
American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (1999). Standards for educational and psychological testing. Washington, DC: Author.
Anderson, L. W. (1986). Opportunity to learn. In T. Husén & T. Postlethwaite (Eds.), International encyclopedia of education: Research and studies. Oxford, UK: Pergamon.
Assistive Technology Act, 29 U.S.C. §3001 et seq. (1998).
Beddow, P. A. (2010). Beyond universal design: Accessibility theory to advance testing for all students. In M. Russell (Ed.), Assessing students in the margins: Challenges, strategies, and techniques. Charlotte, NC: Information Age Publishing.
Beddow, P. A., Elliott, S. N., & Kettler, R. J. (2009). TAMI accessibility rating matrix. Nashville, TN: Vanderbilt University.
Beddow, P. A., Kettler, R. J., & Elliott, S. N. (2008). Test accessibility and modification inventory. Nashville, TN: Vanderbilt University.
Berliner, D. C. (1979). Tempus educare. In P. L. Peterson & H. J. Walberg (Eds.), Research on teaching: Concepts, findings, and implications (pp. 120–135). Berkeley, CA: McCutchan.
Bloom, B. S. (1976). Human characteristics and school learning. New York: McGraw-Hill.
Bloom, H. S., Hill, C. J., Black, A. R., & Lipsey, M. W. (2008). Performance trajectories and performance gaps as achievement effect-size benchmarks for educational interventions. Journal of Research on Educational Effectiveness, 1, 289–328.
Boscardin, C. K., Aguirre-Muñoz, Z., Chinen, M., Leon, S., & Shin, H. S. (2004). Consequences and validity of performance assessment for English learners: Assessing opportunity to learn (OTL) in grade 6 language arts (CSE Report No. 635). Los Angeles: University of California, National Center for Research on Evaluation, Standards, and Student Testing.
Brophy, J. (1986). Teacher influences on student achievement. American Psychologist, 41(10), 1069–1077.
Brophy, J., & Good, T. L. (1986). Teacher behavior and student achievement. In M. C. Wittrock (Ed.), Handbook of research on teaching (3rd ed., pp. 328–375). New York: Macmillan.
Brophy, J. E., & Evertson, C. M. (1978). Context variables in teaching. Educational Psychologist, 12(3), 310–316.
Carroll, J. (1963). A model of school learning. Teachers College Record, 64(8), 723–733.
Chandler, P., & Sweller, J. (1991). Cognitive load theory and the format of instruction. Cognition and Instruction, 8, 293–332.
Cohen, J. (1992). A power primer. Psychological Bulletin, 112, 155–159.
Coker, H., Medley, D. M., & Soar, R. S. (1980). How valid are expert opinions about effective teaching? Phi Delta Kappan, 62(2), 131–149.
Comber, L. C., & Keeves, J. P. (1973). Science education in nineteen countries. New York: Halsted Press.
Cooley, W. W., & Leinhardt, G. (1980). The instructional dimensions study. Educational Evaluation and Policy Analysis, 2(1), 7–25.
Cushing, L. S., Clark, N. M., Carter, E. W., & Kennedy, C. H. (2005). Access to the general education curriculum for students with significant cognitive disabilities. Teaching Exceptional Children, 38(2), 6–13.
Denham, C., & Lieberman, A. (Eds.). (1980). Time to learn. Washington, DC: National Institute of Education.
Elliott, S. N. (2007). Selecting and using testing accommodations to facilitate meaningful participation of all students in state and district assessments. In L. Cook & C. Cahalan (Eds.), Large scale assessment and accommodations: What works? (pp. 1–9). Princeton, NJ: Educational Testing Service.

Elliott, S. N., Kettler, R. J., Beddow, P. A., Kurz, A., Compton, E., McGrath, D., et al. (2010). Using modified items to test students with and without persistent academic difficulties: Effects on groups and individual students. Exceptional Children, 76(4), 475–495.
Elliott, S. N., Kratochwill, T. R., & Gilbertson-Schulte, A. (1999). Assessment accommodations checklist/guide. Monterey, CA: CTB/McGraw-Hill [www.CTB.com].
Elliott, S. N., Kratochwill, T. R., & McKevitt, B. C. (2001). Experimental analysis of the effects of testing accommodations on the scores of students with and without disabilities. Journal of School Psychology, 39(1), 3–24.
Elliott, S. N., Kurz, A., Beddow, P. A., & Frey, J. R. (2009, February). Cognitive load theory: Instruction-based research with applications for designing tests. Paper presented at the meeting of the National Association of School Psychologists, Boston.
Elliott, S. N., & Marquart, A. M. (2004). Extended time as a testing accommodation: Its effects and perceived consequences. Exceptional Children, 70(3), 349–367.
Elliott, S. N., McKevitt, B. C., & Kettler, R. J. (2002). Testing accommodations research and decision making: The case of "good" scores being highly valued but difficult to achieve for all students. Measurement and Evaluation in Counseling and Development, 35(3), 153–166.
Feldman, E., Kim, J. S., & Elliott, S. N. (2009). The effects of accommodations on adolescents' self-efficacy and test performance. Journal of Special Education. Advance online publication. doi:10.1177/0022466909353791
Fisher, C. W., & Berliner, D. C. (Eds.). (1985). Perspectives on instructional time. New York: Longman.
Fisher, C. W., Berliner, D. C., Filby, N., Marliave, R., Cahen, L., & Dishaw, M. (1980). Teaching behaviors, academic learning time, and student achievement: An overview. In C. Denham & A. Lieberman (Eds.), Time to learn (pp. 7–22). Washington, DC: National Institute of Education.
Fuchs, L. S., & Fuchs, D. (2001). Helping teachers formulate sound test accommodation decisions for students with learning disabilities. Learning Disabilities Research & Practice, 16(3), 174–181.
Fuchs, L. S., Fuchs, D., & Capizzi, A. M. (2005). Identifying appropriate test accommodations for students with learning disabilities. Focus on Exceptional Children, 37(6), 1–8.
Fuchs, L. S., Fuchs, D., Eaton, S. B., Hamlett, C. L., & Karns, K. M. (2000). Supplemental teacher judgments of mathematics test accommodations with objective data sources. School Psychology Review, 29(1), 65–85.
Gagné, R. M. (1977). The conditions of learning. Chicago: Holt, Rinehart & Winston.
Gamoran, A., Porter, A. C., Smithson, J., & White, P. A. (1997). Upgrading high school mathematics instruction: Improving learning opportunities for low-achieving, low-income youth. Educational Evaluation and Policy Analysis, 19(4), 325–338.
Gettinger, M., & Seibert, J. K. (2002). Best practices in increasing academic learning time. In A. Thomas & J. Grimes (Eds.), Best practices in school psychology IV (Vol. 1, pp. 773–787). Bethesda, MD: National Association of School Psychologists.
Gibson, D., Haeberli, F. B., Glover, T. A., & Witter, E. A. (2005). The use of recommended and provided testing accommodations. Assessment for Effective Intervention, 31, 19–36.
Haertel, E., & Calfee, R. (1983). School achievement: Thinking about what to test. Journal of Educational Measurement, 20(2), 119–132.
Harnischfeger, A., & Wiley, D. E. (1976). The teaching–learning process in elementary schools: A synoptic view. Curriculum Inquiry, 6(1), 5–43.
Herman, J. L., & Klein, D. C. D. (1997). Assessing opportunity to learn: A California example (CSE Technical Report No. 453). Los Angeles: University of California, National Center for Research on Evaluation, Standards, and Student Testing.
Herman, J. L., Klein, D. C., & Abedi, J. (2000). Assessing students' opportunity to learn: Teacher and student perspectives. Educational Measurement: Issues and Practice, 19(4), 16–24.
Hollenbeck, K. (2002). Determining when test alterations are valid accommodations or modifications for large-scale assessment. In G. Tindal & T. M. Haladyna (Eds.), Large-scale assessment programs for all students: Validity, technical adequacy, and implementation (pp. 395–425). Mahwah, NJ: Lawrence Erlbaum.
Hollenbeck, K., Rozek-Tedesco, M. A., & Finzel, A. (2000, April). Defining valid accommodations as a function of setting, task, and response. Paper presented at the meeting of the Council for Exceptional Children, Vancouver, BC, Canada.
Husén, T. (1967). International study of achievement in mathematics: A comparison of twelve countries. New York: Wiley.
Individuals with Disabilities Education Act Amendments, 20 U.S.C. §1400 et seq. (1997).
Individuals with Disabilities Education Act, 20 U.S.C. §1400 et seq. (1975).
Ketterlin-Geller, L. R. (2005). Knowing what all students know: Procedures for developing universal design for assessment. Journal of Technology, Learning, and Assessment, 4(2), 1–23.
Ketterlin-Geller, L. R., Alonzo, J., Braun-Monegan, J., & Tindal, G. (2007). Recommendations for accommodations: Implications of (in)consistency. Remedial and Special Education, 28(4), 194–206.
Kettler, R. J., & Elliott, S. N. (2010). Assessment accommodations for children with special needs. In B. McGaw, E. Baker, & P. Peterson (Eds.), International encyclopedia of education (3rd ed.). Oxford: Elsevier.
Kettler, R. J., Elliott, S. N., & Beddow, P. A. (2009). Modifying achievement test items: A theory-guided and data-based approach for better measurement of what students with disabilities know. Peabody Journal of Education, 84(4), 529–551. doi:10.1080/01619560903240996

Kettler, R. J., Niebling, B. C., Mroch, A. A., Feldman, E. S., Newell, M. L., Elliott, S. N., et al. (2005). Effects of testing accommodations on math and reading scores: An experimental analysis of the performance of fourth- and eighth-grade students with and without disabilities. Assessment for Effective Intervention, 31(1), 37–48.
Kettler, R. J., Rodriguez, M. R., Bolt, D. M., Elliott, S. N., Beddow, P. A., & Kurz, A. (in press). Modified multiple-choice items for alternate assessments: Reliability, difficulty, and differential boost. Applied Measurement in Education.
Kettler, R., Russell, M., Camacho, C., Thurlow, M., Geller, L. K., Godin, K., et al. (2009, April). Improving reading measurement for alternate assessment: Suggestions for designing research on item and test alterations. Paper presented at the annual meeting of the American Educational Research Association, San Diego, CA.
Koretz, D. M., & Hamilton, L. S. (2006). Testing for accountability in K-12. In R. L. Brennan (Ed.), Educational measurement (4th ed., pp. 531–578). Westport, CT: Praeger.
Kosciolek, S., & Ysseldyke, J. E. (2000). Effects of a reading accommodation on the validity of a reading test (Technical Report 28). Minneapolis, MN: University of Minnesota, National Center on Educational Outcomes.
Kurz, A., & Elliott, S. N. (in press). Overcoming barriers to access for students with disabilities: Testing accommodations and beyond. In M. Russell (Ed.), Assessing students in the margins: Challenges, strategies, and techniques. Charlotte, NC: Information Age Publishing.
Kurz, A., Elliott, S. N., Wehby, J. H., & Smithson, J. L. (2010). Alignment of the intended, planned, and enacted curriculum in general and special education and its relation to student achievement. Journal of Special Education, 44(3), 131–145. doi:10.1177/0022466909341196
Lang, S. C., Elliott, S. N., Bolt, D. M., & Kratochwill, T. R. (2008). The effects of testing accommodations on students' performances and reactions to testing. School Psychology Quarterly, 23(1), 107–124.
Lazarus, S. S., Hodgson, J., & Thurlow, M. L. (2010). States' participation guidelines for alternate assessments based on modified academic achievement standards (AA-MAS) in 2009 (Synthesis Report 75). Minneapolis, MN: University of Minnesota, National Center on Educational Outcomes.
Lazarus, S. S., Thurlow, M. L., Lail, K. E., Eisenbraun, K. D., & Kato, K. (2006). 2005 state policies on assessment participation and accommodations for students with disabilities (Synthesis Report 64). Minneapolis, MN: University of Minnesota, National Center on Educational Outcomes.
Mace, R. L. (1991). Definitions: Accessible, adaptable, and universal design (Fact Sheet). Raleigh, NC: Center for Universal Design, NCSU.
McDonnell, L. M. (1995). Opportunity to learn as a research concept and a policy instrument. Educational Evaluation and Policy Analysis, 17(3), 305–322.
McKevitt, B. C., & Elliott, S. N. (2003). Effects and perceived consequences of using read-aloud and teacher-recommended testing accommodations on a reading achievement test. School Psychology Review, 32(4), 583–600.
Miller, G. A. (1956). The magical number seven, plus or minus two: Some limits on our capacity for processing information. Psychological Review, 63(2), 81–97.
No Child Left Behind Act, 20 U.S.C. §6301 et seq. (2001).
O'Sullivan, P. J., Ysseldyke, J. E., Christenson, S. L., & Thurlow, M. L. (1990). Mildly handicapped elementary students' opportunity to learn during reading instruction in mainstream and special education settings. Reading Research Quarterly, 25(2), 131–146.
Phillips, S. E. (1994). High-stakes testing accommodations: Validity versus disabled rights. Applied Measurement in Education, 7(2), 93–120.
Phillips, S. E., & Camara, W. J. (2006). Legal and ethical issues. In R. L. Brennan (Ed.), Educational measurement (4th ed., pp. 733–755). Westport, CT: Praeger.
Pianta, R. C., Belsky, J., Houts, R., Morrison, F., & NICHD. (2007). Opportunities to learn in America's elementary classrooms. Science, 315, 1795–1796.
Pianta, R. C., & Hamre, B. K. (2009). Conceptualization, measurement, and improvement of classroom processes: Standardized observation can leverage capacity. Educational Researcher, 38(2), 109–119.
Pianta, R. C., LaParo, K. M., & Hamre, B. K. (2008). Classroom assessment scoring system. Baltimore: Brookes.
Pitoniak, M. J., & Royer, J. M. (2001). Testing accommodations for examinees with disabilities: A review of psychometric, legal, and social policy issues. Review of Educational Research, 71(1), 53–104.
Porter, A. C. (1991). Creating a system of school process indicators. Educational Evaluation and Policy Analysis, 13(1), 13–29.
Porter, A. C. (1993). School delivery standards. Educational Researcher, 22(5), 24–30.
Porter, A. C. (1995). The uses and misuses of opportunity-to-learn standards. Educational Researcher, 24(1), 21–27.
Porter, A. C. (2006). Curriculum assessment. In J. L. Green, G. Camilli, & P. B. Elmore (Eds.), Handbook of complementary methods in education research (pp. 141–159). Mahwah, NJ: Lawrence Erlbaum.
Porter, A. C., Kirst, M. W., Osthoff, E. J., Smithson, J. L., & Schneider, S. A. (1993). Reform up close: An analysis of high school mathematics and science classrooms (Final Report). Madison, WI: University of Wisconsin, Wisconsin Center for Education Research.
Porter, A. C., Schmidt, W. H., Floden, R. E., & Freeman, D. J. (1978). Impact on what? The importance of content covered (Research Series No. 2). East Lansing, MI: Michigan State University, Institute for Research on Teaching.

Pullin, D. C., & Haertel, E. H. (2008). Assessment through the lens of "opportunity to learn". In P. A. Moss, D. C. Pullin, J. P. Gee, E. H. Haertel, & L. J. Young (Eds.), Assessment, equity, and opportunity to learn (pp. 17–41). Cambridge, MA: Cambridge University Press.
Roach, A. T., Beddow, P. A., Kurz, A., Kettler, R. J., & Elliott, S. N. (2010). Using student responses and perceptions to inform item development for an alternate assessment based on modified achievement standards. Exceptional Children, 77(1), 61–84.
Roach, A. T., Chilungu, E. N., LaSalle, T. P., Talapatra, D., Vignieri, M. J., & Kurz, A. (2009). Opportunities and options for facilitating and evaluating access to the general curriculum for students with disabilities. Peabody Journal of Education, 84(4), 511–528. doi:10.1080/01619560903240954
Roach, A. T., Niebling, B. C., & Kurz, A. (2008). Evaluating the alignment among curriculum, instruction, and assessments: Implications and applications for research and practice. Psychology in the Schools, 45(2), 158–176.
Rowan, B. (1998). The task characteristics of teaching: Implications for the organizational design of schools. In R. Bernhardt, C. N. Hedley, G. Cattaro, & V. Svolopoulus (Eds.), Curriculum leadership: Rethinking schools for the 21st century. Cresskill, NJ: Hampton.
Rowan, B., Camburn, E., & Correnti, R. (2004). Using teacher logs to measure the enacted curriculum: A study of literacy teaching in third-grade classrooms. The Elementary School Journal, 105(1), 75–101.
Rowan, B., & Correnti, R. (2009). Studying reading instruction with teacher logs: Lessons from the study of instructional improvement. Educational Researcher, 38(2), 120–131.
Scheerens, J., & Bosker, R. J. (1997). The foundations of educational effectiveness. Oxford: Pergamon.
Schmidt, W. H., Houang, R. T., Cogan, L., Blömeke, S., Tatto, M. T., Hsieh, F. J., et al. (2008). Opportunity to learn in the preparation of mathematics teachers: Its structure and how it varies across six countries. ZDM Mathematics Education, 40(5), 735–747.
Sireci, S. G., Scarpati, S. E., & Li, S. (2005). Test accommodations for students with disabilities: An analysis of the interaction hypothesis. Review of Educational Research, 75(4), 457–490.
Stevens, F. I. (1993). Applying an opportunity-to-learn conceptual framework to the investigation of the effects of teaching practices via secondary analyses of multiple-case-study summary data. Journal of Negro Education, 62(3), 232–248.
Thompson, S. J., Johnstone, C. J., & Thurlow, M. L. (2002). Universal design applied to large scale assessments (Synthesis Report 44). Minneapolis, MN: University of Minnesota, National Center on Educational Outcomes. Retrieved February 1, 2011, from http://education.umn.edu/NCEO/OnlinePubs/Synthesis44.html
Thurlow, M. L., Laitusis, C. C., Dillon, D. R., Cook, L. L., Moen, R. E., Abedi, J., et al. (2009). Accessibility principles for reading assessments. Minneapolis, MN: National Accessible Reading Assessment Projects.
Thurlow, M. L., Ysseldyke, J. E., Graden, J., & Algozzine, B. (1984). Opportunity to learn for LD students receiving different levels of special education services. Learning Disability Quarterly, 7(1), 55–67.
U.S. Department of Education. (2007a, April). Modified academic achievement standards: Non-regulatory guidance. Washington, DC: Author.
U.S. Department of Education. (2007b, revised July). Standards and assessments peer review guidance. Washington, DC: Author.
Vannest, K. J., & Hagan-Burke, S. (2009). Teacher time use in special education. Remedial and Special Education. Advance online publication. doi:10.1177/0741932508327459
Vannest, K. J., & Parker, R. I. (2009). Measuring time: The stability of special education teacher time use. Journal of Special Education. Advance online publication. doi:10.1177/0022466908329826
Walberg, H. J. (1986). Syntheses of research on teaching. In M. C. Wittrock (Ed.), Handbook of research on teaching (3rd ed., pp. 214–229). New York: Macmillan.
Walberg, H. J. (1988). Synthesis of research on time and learning. Educational Leadership, 45(6), 76–85.
Wang, J. (1998). Opportunity to learn: The impacts and policy implications. Educational Evaluation and Policy Analysis, 20(3), 137–156.
Ysseldyke, J. E., Thurlow, M. L., Mecklenburg, C., & Graden, J. (1984). Opportunity to learn for regular and special education students during reading instruction. Remedial and Special Education, 5(1), 29.


Part I

Government Policies and Legal Considerations


2 U.S. Policies Supporting Inclusive Assessments for Students with Disabilities

Susan C. Weigert

U.S. Policies Supporting Inclusive Assessment of Students with Disabilities

This overview of U.S. policies supporting inclusive assessment practices traces the policies and supporting educational contexts within the historical framework of key legislation and regulations over the past 50 years.

Assessment Policies in the 1960s and 1970s: Inclusion and 'Equal Terms'

S.C. Weigert, U.S. Department of Education, Office of Special Education Programs, Alexandria, VA, USA. E-mail: [email protected]

The views expressed in this chapter are solely those of the author in her private capacity. No official support or endorsement by the U.S. Department of Education is intended nor should it be inferred.

The history of inclusive assessment policies in the United States has been a product of many political and practical influences, but at the root of their development has been the central and fundamental tenet of equal protection. Title VI of the Civil Rights Act of 1964 prohibited discrimination on the basis of race, sex, color, or national origin, rather than on the basis of disability. Yet the spirit of the law swept students with disabilities (SWDs) into its strong political current and promised to ensure and protect the equity of their educational opportunities. The passion for equal access to education as a civil right was best characterized by Chief Justice Warren's opinion in Brown v. Board of Education in 1954:

In these days it is doubtful that any child may reasonably be expected to succeed in life if he is denied the opportunity of an education. Such an opportunity, where the State has undertaken to provide it, is a right that must be made available to all on equal terms. (Brown v. Board of Education, 347 U.S. 483 (1954), quoted in Russo & Osborne, 2008, p. 493)

While policies of inclusion constituted a popular solution to problems of inequity in public education, compulsory education statutes during the 1960s and early 1970s left the authority to school districts to decide whether SWDs could 'benefit' from instruction (Russo & Osborne, 2008). The new inclusion principles were eventually codified in PL 93-112, Section 504 of the Rehabilitation Act of 1973, which prohibited discrimination against individuals with disabilities in federally funded programs and required reasonable accommodations for students with physical or mental impairments that 'substantially limited' them in one or more major life activities, including learning (29 USC § 706 (7)(B)). In addition to physical and sensory handicaps, Section 504 of the Rehabilitation Act applied to persons with 'mental' disabilities such as mental retardation, traumatic or organic brain syndromes, emotional disturbance, specific learning disabilities, and other cognitively disabling conditions (Phillips, 1994).

The Rehabilitation Act further defined the meaning of a free, appropriate, public education (FAPE), and specified that appropriate education included educational services designed to meet the individual education needs of students with disabilities 'as adequately' as the needs of non-disabled students were met. Yet the only assessment-related provisions of the Act were those requiring that assessments be provided in a child's 'normal mode of communication' (including native language) unless it was clearly not feasible to do so.

The principle of inclusion in the Rehabilitation Act was later incorporated into the Elementary and Secondary Education Act (ESEA) amendments of 1974 (PL 93-380), which also mandated a 'free appropriate public education (FAPE)' in the 'least restrictive environment (LRE).' These provisions were codified a year later in the Education for All Handicapped Children Act (P.L. 94-142). Yet at the time, FAPE simply meant access to special education and related services in conformity with individualized academic and behavioral goals stated in the student's IEP, rather than connoting access to the general education curriculum. A new requirement for inclusive assessment of SWDs in PL 94-142 § 612 (5)(C) mandated that testing and evaluative materials used for evaluation and placement not be the sole criterion for determining an appropriate educational program for a child with a disability.

The 1977 regulations amending the Rehabilitation Act of 1973 (§104.35) required that tests and other evaluation materials meet requirements for validity for the specific purpose for which the tests were used, and that they be administered by trained personnel in conformity with the test developer's instructions. Further, the types of assessments to be used for educational evaluation of SWDs were to include those tailored to assess specific areas of educational need, not merely those designed to measure a student's I.Q. Finally, educational tests were to be selected and administered to ensure that the results of testing accurately reflected the student's educational aptitude or achievement level (or whatever educational factor was to be measured by the test), rather than merely reflecting the student's impaired sensory, manual, or speaking skills (except when those skills were the factors measured by the test). The protections against 'disparate impact' of assessments for SWDs were restricted to 'otherwise qualified' individuals with disabilities, meaning that a student who might have impaired sensory, manual, or speaking skills still had to be capable of meeting the standards required to pass the test. As U.S. Supreme Court Justice Powell commented:

Section 504 imposes no requirement upon an educational institution to lower or to effect substantial modifications of standards to accommodate a handicapped person. (Southeastern Community College v. Davis, 442 U.S. 397, 1979, Supreme Court of the United States)

The 1980s and 1990s: IEP as Curriculum

As had been the case in the 1977 regulations, regulatory changes to the Rehabilitation Act in 1980 required that assessments used for college admissions constitute validated predictors of college aptitude or college achievement, rather than merely reflecting the applicant's impaired sensory, manual, or speaking skills. Yet the 1980 regulations essentially permitted use of tests with established disproportionate adverse effects on SWDs, provided that an alternate test with a less disproportionate effect was unavailable. In 1980, when President Jimmy Carter established the Department of Education (ED) as a cabinet-level agency with a mission to ensure that educational opportunities were not denied on account of race, creed, color, national origin, or sex, disability was not included among the list of protected categories. Importantly, Section 103(a) of the Department of Education Organization Act (PL 96-88) prohibited ED from exercising any control over the curriculum, any program of instruction, or the selection of instructional materials by any school system or educational institution.


Soon after the establishment of the new agency, Education Secretary Terrel Bell created a National Commission on Excellence in Education, which produced a report on the status of American education entitled 'A Nation at Risk.' The report concluded that the country was threatened by a 'rising tide of mediocrity,' that over 10% of 17-year-olds were functionally illiterate, that SAT scores were declining across the country, and that many students required remediation courses even after entering college. It further concluded that comprehensive strategies to reform education across the country were needed (National Commission on Excellence in Education, 1983). It was believed that the needed reforms could be better comprehended after getting a fuller picture of student performance, by instituting the national assessment of all students (Ginsberg, Noell, & Plisko, 1988). During this decade and throughout the next, however, participation of SWDs in the national assessment (later, the National Assessment of Educational Progress, or NAEP) was minimal (Shriner & Thurlow, 1993). In addition, most state assessment programs based their inclusion decisions for SWDs primarily upon those specified by the NAEP or on the basis of time spent in the regular classroom (Thurlow & Ysseldyke, 1993). Among the factors underlying high exclusion rates on the NAEP cited by NCEO were unclear participation guidelines, sampling plans that systematically excluded students in separate schools or those not in graded classes, and an 'altruistic' motivation to reduce stress on students not expected to perform well (Zigmond & Kloo, 2009). Some states were simply unwilling to make accommodations to permit participation of SWDs in the NAEP (Ysseldyke & Thurlow, 1994a).

On the state assessment front, SWDs were included only slightly more often than on the NAEP. Shriner and Thurlow (1993) document that in the early 1990s, less than 10% of SWDs were being included in state assessments. As a consequence of widespread assessment exclusion policies for the first two decades after the establishment of the Office of Special Education Programs (OSEP), very little was known about the academic outcomes of SWDs (Ysseldyke, Thurlow, McGrew, & Shriner, 1994b).

The IDEA was reauthorized in 1990 as PL 101-476, and a focus remained on physical inclusion—greater inclusion in community schools, least restrictive placement of students, and transition services. Placements of SWDs in classes were to be age and grade appropriate, with a minimum of placement in self-contained classrooms. Teaching methods for including SWDs in general education classrooms began to involve cooperative learning and peer-instruction models. Yet an emphasis was placed on the physical inclusion of SWDs over the quality or effectiveness of their academic experience (Danielson, personal communication, October 22, 2009). While teaching staff were expected to 'adapt' the curricular content, in doing so they were encouraged to choose a grade-level curriculum that seemed developmentally most suited to meet each SWD's IEP objectives, rather than to ensure access to grade-level standards (Simon, Karasoff, & Smith, 1991).

IDEA 1990 also funded studies and investigations through which to collect information needed for program and system improvements by states and LEAs. The results of studies such as the National Longitudinal Transition Study began to be available to OSEP prior to the 1997 reauthorization, and later shed light on the degree to which inclusion efforts were failing to ensure effective instruction of SWDs (Danielson, personal communication, October 22, 2009).

Prior to the 1993 ESEA reauthorization, Title I funds were to be distributed to schools on the basis of the poverty level and economic needs of students rather than on the basis of performance on State assessments. But the reauthorized 1993 ESEA shifted the focus to assessing outcomes for all children, including students with special needs, in key disciplines—mathematics, science, history, geography, civics, English, the arts, and other languages. The reauthorized ESEA attempted to ensure that 'all students,' including 'special needs' students, met high academic standards, that teaching and learning improved, that government offered flexibility coupled with responsibility for student performance, that schools worked cooperatively with parents and the community, and that Federal aid went to the poorest students (U.S. Department of Education, 1993).

ESEA 1993 endeavored to improve learning through reform approaches similar to those of other countries whose students were thought to be outperforming American students, particularly in the fields of science and mathematics. Thus, closely following the ESEA reauthorization of 1993 was the Goals 2000: Educate America Act (PL 103-227), which was signed into law on March 31, 1994. The essence of Goals 2000 was that by the year 2000 all students would leave grades 4, 8, and 12 with competency in English, mathematics, science, foreign languages, civics and government, economics, arts, history, and geography. Every student was to be prepared for responsible citizenship, postsecondary learning, and productive employment. The reforms of Goals 2000 were grounded in the expectation that States develop more challenging content and performance standards, design instruction and assessments aligned to those standards, and participate in accountability reporting on the extent to which schools and students were meeting the State standards (The White House, 1990; National Academy of Education, 1998). For some states, this was the first effort at trying to develop a broad framework for a general curriculum (National Academy of Education, 1998). Ultimately, under the ESEA Title I requirement, all states were expected to have valid, reliable, and aligned assessments based on their new content standards in the four core academic subjects by school year 2000–2001.

At the same time, among disability advocates it was well understood that there was both an education gap and an 'assessment gap' for SWDs (Danielson, personal communication, October 22, 2009). The National Center on Educational Outcomes (NCEO) publicly posed the question of whether SWDs were seriously being considered in the standards-based reform movement, and pointed out that, when identifying sources of data for monitoring progress toward the national goals in 1991, the National Education Goals Panel had identified data collection programs that excluded up to 50% of SWDs (McGrew et al., 1992). An NCEO Synthesis report ended with 'Our nation in its quest to become first in the world has forgotten many of its students' (Thurlow & Ysseldyke, 1993).

The Council for Exceptional Children testified to Congress in 1992 that the standards themselves should be constructed so as to accommodate all students, and it called for an investigation into alternative forms of assessment as well as for ways to ensure that, when educators worked on standards for assessments, at least one member with expertise in working with individuals with disabilities was included (CEC testimony before the House Subcommittee on Elementary, Secondary, and Vocational Education, 1992). By the time IDEA was reauthorized in 1997, most states had established content standards in the four core content areas, yet the question of which students with disabilities could access these standards, and participate in assessments based upon them, was a subject of debate. Moreover, the types of tests being constructed posed barriers over and above the content standards. States began moving away from flexible, or 'authentic,' assessments, which held greater promise for inclusiveness across a range of student ability levels, in order to fulfill the pragmatic requirements of large-scale testing. Such 'authentic' assessments, popular during the era, were difficult to standardize across large numbers of diverse students. Moreover, while most special educators believed that performance-based assessments provided more accurate descriptions of student progress and were more helpful in informing classroom practice, their administration was expensive and overly time consuming for consideration in accountability testing.

In the midst of the standards movement, there was a pervasive concern that norm-referenced assessments were not appropriate to the goals of standards-based reforms, not just in the case of SWDs, but for all students. More importantly, norm-referenced tests were not well aligned to the curricula that students were to be taught under the new standards movement. During the mid-1990s, there had been much concern about the fairness and the validity of norm-referenced scoring approaches for use with special populations. Many states also justified the exclusion of SWDs from standardized testing on the basis of fairness—that students had not received an opportunity to learn the material assessed on general assessments—and on the basis of utility—arguing that the results of assessment scores did not provide useful information about the academic performance or educational needs of SWDs. Advocates complained that the use of norm-referenced testing, in which SWDs were usually ranked lowest, led to the perpetuation of assumptions that SWDs were incapable of making any academic progress. Yet excluding them from participation provided no information at all about their academic performance and, some argued, denied FAPE.

While many in the special education field were divided as to how SWDs should participate in the standards-based accountability movement of the 1980s, most later came to agree, as one state policymaker commented, that 'the removal of special education students from the "accountability track" resulted in their removal from the "curriculum track"' (Koehler, 1992).

Following Goals 2000, as most states embarked on a full-scale revision of their assessment systems and attempted to define 'what all students should know' in their new content standards, nearly all states shifted to the use of criterion-referenced assessments and began including a percentage of SWDs in these new assessments. In order to assist SWDs in accessing these new criterion-referenced assessments, States developed lists of 'standard accommodations.' The four classes of accommodations included the following: presentation format, which involved changes in how tests were presented, such as providing Braille versions of the tests or orally reading the directions to students; response format, which involved changes in the manner in which students gave their responses, such as having a student point to a response or use a computer for responding; setting of the test, which could be alone or in small groups; and finally, timing of the test, which could include extending the time allowed or providing more breaks during testing.

In response to questions about the attainability of performance standards for all students with disabilities, ED advised states to implement alternative assessments for a 'small number' of students and to implement accommodations to ensure an 'equal playing field' for those students, stating, 'Assessment accommodations help students show what they know without being placed at a disadvantage by their disability' (U.S. Department of Education, 1997). However, claims about the capacity of accommodations alone to overcome the disadvantages created by a student's disability were considered true for students with impaired sensory, manual, or speaking skills, but not for SWDs with cognitive impairments.

The implementation of testing participation guidelines for SWDs was the subject of considerable controversy among disability advocates across states. Policy experts maintained, often based upon the Supreme Court ruling in Southeastern Community College v. Davis, that standards could not be lowered for SWDs taking accountability assessments, even if those assessments were also to be used to guide instruction (e.g., Phillips, 2002). Yet most advocates maintained that, as students with disabilities were not included when the standards were developed, it seemed inappropriate to hold them to the standards.

Prior to the passage of the Americans with Disabilities Act of 1990, test developers were most familiar with the provision of accommodations for students with sensory impairments. However, following the passage of the ADA, advocates for the disabled argued that federal law should ensure the availability of testing accommodations and modifications for mental disabilities such as dyslexia and other learning disabilities. Yet policymakers responded again that the effect of accommodations for cognitive disabilities undermined the valid interpretation of a student's test score (e.g., Phillips, 1993).


IDEA 97 and Options for Alternate Assessment

OSEP hoped, by providing funding opportunities, to spur the research and development community to go beyond what had been considered technically feasible and to respond to the increasing demand for teaching tools and new approaches to the assessment of SWDs. The reauthorized IDEA of 1997 provided for the development of new assessments both to identify areas of academic need and to measure academic progress for children with disabilities. The new assessments were also to be used in educational program planning and for placement in special education, related services, and/or early intervention under § 641(1)(G) of the law. Funds were to be made available for the development of alternative assessments for the inclusion of non-native English speakers and other minority students, to prevent misidentification of such students as SWDs. The mandates of IDEA 97 called for assessments to meet requirements that could not be entirely met by inclusion of SWDs in large-scale state assessments alone. Scientific measurement of student progress called for new types of classroom assessments to monitor academic progress (e.g., Fuchs & Fuchs, 2004) in response to intensive and evidence-based interventions. These new 'curriculum-based' measures (CBM) were inherently inclusive, as they could be individualized for students working across a broad continuum of skill levels. The new CBM measures also represented a significant improvement over previous classroom assessments, which relied on 'mastery measurement,' or summative end-of-unit assessments, as the new measures permitted teachers to monitor incremental progress over time (Stecker, 2005).

While progress-monitoring assessments permitted inclusiveness and informed decision-making in the classroom, some advocates maintained they had the disadvantage of isolating SWDs from the general education curriculum:

These assessments frequently were conducted in isolation from the larger general education curriculum. The assessments focused on immediate and discrete skill deficits and IEPs often were a collection of isolated skill objectives that led to isolated instruction... Too often, the IEP became the curriculum for the student, instead of a tool for defining how to implement a general education curriculum. (Nolet & McLaughlin, 2000, p. 10)

Overall, the 1997 reauthorization was most significant for its endorsement of the participation of all children with disabilities in state assessments and for the requirement that alternate assessments be made available, by July 2000, for any of those students who could not meaningfully participate in the regular state assessment even with accommodations (U.S. Department of Education, 2000). The sanctioning of alternate assessments was to shift the views of federal and state policymakers on what constituted fair assessment practices by moving away from the principle that a single standard of performance on state standards, even for purposes of accountability, would by necessity apply to 'all students.'

The states had concluded that, if all students were to be included in the new accountability systems, new assessments based on the standards would have to be developed for a small percentage of students with the most severe disabilities—generally students labeled with 'severe-profound disabilities and trainable mentally handicapped' (Quenemoen, 2009). According to Browder, Wakeman, and Flowers (2009), prior to IDEA 97 there were three classes of SWDs: (a) those who pursued a general education curriculum with expectations for grade-level achievement, (b) those who required a remedial curriculum (e.g., a 7th grader working on 4th grade math), and (c) those who required functional life skills to prepare for independent living. While prior to 1997 teachers expected that only the first group of SWDs would participate in state assessments, after the 1997 reauthorization the inclusion of SWDs in the new alternate assessments (Browder et al., 2009) constituted a major shift in assessment inclusion policies, permitting all subgroups of SWDs to participate with validity. The new alternate assessments were to be 'aligned with the general curriculum standards set for all students and should not be assumed appropriate only for those students with significant cognitive impairments' (34 CFR §200).

In spite of the inclusion accomplished by the 1997 IDEA, the disability community was torn, as some advocates continued to maintain that exclusion from state assessments and substitution of measurements of progress toward IEP goals were the only appropriate responses to the new standards movement. Others contended that the substitution of IEP goals for state and national assessment participation would violate the spirit of inclusion, especially considering that IEP goals were often chosen from a list of useful skills of 'everyday life' (Browder et al., 2009), rather than being designed to provide equitable access to the full scope of the state content standards.

In response to the mandate for alternate assessments, states developed alternate assessments in a variety of forms, including teacher checklists of functional skills, reports on progress toward IEP goals, portfolios of student work, and performance tasks moderately aligned to grade-level content standards (Thompson & Thurlow, 2000).

Significant policy input into the 1997 IDEA reauthorization came through David Hoppe, an aide to Senator Trent Lott and a parent of a person with disabilities, who exercised a gentle touch in bringing diverse stakeholders to consensus. While the field grew to be unanimous in believing that accountability and assessments based entirely on IEP rubrics did not make sense, there continued to be debates about the validity of the scores of SWDs, especially those with cognitive impairments, who had not been exposed to the curriculum being assessed. Hoppe convinced policymakers to sidestep the partisanship in reauthorizing IDEA 97 and urged Congress to come up with a bill that would please both sides of the debate (Danielson, personal communication, October 22, 2009).

The critical new elements in the 1997 IDEA amendments were accountability for inclusion of SWDs in general state and district-wide assessments, with appropriate accommodations and modifications if necessary, and the establishment of performance goals for SWDs as a condition of funding under IDEA Part B. In the infrequent cases when an IEP team or Section 504 team determined that standard assessments, even with reasonable accommodations, did not provide a student with an opportunity to demonstrate his or her knowledge and skill, the State or school district was to provide an alternate assessment. Yet whatever assessment approach was taken, the scores of students with disabilities were to be included in the assessment system for purposes of public reporting and school and district accountability.

The reauthorized IDEA 97 also required the consideration of assistive technology needs for participation, as well as the communication needs of children who were deaf, hard of hearing, or those with limited English language proficiency. A further requirement for assessments used for the evaluation of SWDs was the inclusion of information that was 'instructionally relevant' in the evaluation, in order to help a child become involved in and make progress in the general education curriculum. Assessment instruments used for disability evaluation were also required to be technically sound to assess the 'relative contributions' of both cognitive and behavioral factors, in addition to physical or developmental factors, on students' academic performance.

States were, once again, required to administer assessments to SWDs in a child's native language or typical mode of communication. Importantly, any standardized tests given to a child were required to be validated for the specific purposes for which they were to be used. In addition, assessment tools and strategies that directly assisted teachers in determining the educational needs of the child were also to be provided under IDEA 97.

In response to the mandate for new alternate assessments, OSEP funded a variety of projects, such as supporting computer-adaptive assessments aligned to the state standards that would be capable of identifying, through 'dynamic' assessment techniques, learning issues of students with learning disabilities or other 'gap students' in order to uncover the instructional gaps they were manifesting in general education settings (e.g., see Tindal, 2009). It was known that the population of students with specific learning disabilities (SLD) consisted of slow learners who could eventually address all content standards, though not necessarily in the time frame required to participate fairly in the summative end-of-year assessments. OSEP struggled with how to balance the learning needs of high-incidence SLD students with the mandate to include them in state assessments, as well as how to overcome the historical problem of low expectations. The answer OSEP arrived at was to mandate that instruction be provided by skilled teachers specifically trained to work with the SLD population. OSEP later funded the Access Center to help teachers adapt and individualize instruction aligned to standards that were appropriate for the student's grade and age level, rather than 'out-of-level' standards, as had been a common teaching practice prior to 1997. Yet, to many advocates in the field, assessments based on standards that many SWDs could not master in the same time frame were also considered 'out-of-level' assessment, since such assessments required that a typical SLD student make more than a year's worth of average progress in a year to learn enough grade-level material to be fairly assessed on the full scope of material being tested (Danielson, personal communication, October 22, 2009).

The effect of the 1997 IDEA was to shift reform efforts to the IEP, envisioning it not as a guide to what SWDs were to be learning but rather as a tool to ensure inclusion and progress in the grade-level general education curriculum by defining each student's present level of performance, including how the student's disability affected his or her ability to be involved in and make progress in the general education curriculum. Additionally, the law required a statement in the IEP about the program modifications and supports to be used by school personnel to enable the child to be involved in and make progress in the general education curriculum and to participate with his or her non-disabled peers.

Subsequent to the IDEA Part B regulations in 1999, which mandated inclusion of all SWDs in standards-based reform programs, however, many SEAs did not succeed in ensuring that local education agencies (LEAs) and schools taught SWDs the grade-level curriculum.

2001 No Child Left Behind Act

Under the 1994 ESEA, States were required to test only three times during a student's tenure in the K-12 educational system. For policymakers crafting the reauthorized ESEA, this left too many intervening years in which children's academic difficulties could go unaddressed, with the result that many children were being 'left behind' academically. Under the No Child Left Behind Act (NCLB) of 2001, States were obliged to enhance their existing assessment systems to include annual assessments in reading/language arts and mathematics for all public school students in grades 3 through 8 and at least once in grades 10 through 12 by the 2005–2006 school year. Additionally, by the 2007–2008 school year, all States were to assess their students in science at least once in grades 3 through 5, once in grades 6 through 9, and once in grades 10 through 12 (U.S. Department of Education, 2003).

The NCLB required annual testing in reading and mathematics, the demonstration of 'adequate yearly progress' against state-specified performance targets, and the inclusion of all students in annual assessments. Secretary of Education Rod Paige, later succeeded by White House domestic policy advisor Margaret Spellings, emphasized that the purpose of the NCLB provisions was to ensure that every child was learning 'on grade level.' The accountability provisions for the SWD subgroup also required steps to recruit, hire, train, and retain highly qualified personnel, the use of research-based teaching methods, and the creation of improvement programs to address local systems that fell short of performance goals.

During the same year, President Bush created the President's Commission on Excellence in Special Education, a program designed to address the dropout rate among SWDs, who were leaving school at twice the rate of their peers and whose enrollment in higher education was 50% lower. Moreover, the SLD subgroup had grown over 300% since 1976, and 80% of those with SLD reportedly had never learned to read (President's Commission on Excellence in Special Education, 2002). The claim was made that few children in special education were closing the achievement gap to a point where they could read and learn like their peers. A major thrust of the Commission was that although special education was based in civil rights and legal protections, most SWDs remained at risk of being left behind. Findings of the Commission included criticisms that the reauthorized 1997 IDEA placed process above results and compliance above student achievement and outcomes. Further, special education did not appear to guarantee more effective instruction. The identification of students for special education services was criticized for being based upon a 'wait-to-fail' model. Criticism was also leveled that education had become two separate systems instructionally, when it was critical that general education and special education share responsibility for the education of SWDs. Among the recommendations of the report was a call for improved assessment policies to prevent exclusion from state and district-wide assessments, still a common practice in 2001.

2002–2003 Title I Regulations Permitting Alternate Achievement Standards in Accountability

The ESEA regulations of 2002 implementing the assessment provisions of NCLB authorized the use of alternate assessments in accountability assessments and required that states make available alternate assessments for any student unable to participate in the state's general assessments, even with accommodations. The subsequent ESEA regulations of 2003 permitted states to develop alternate achievement standards for students with the most significant cognitive disabilities. The 2003 regulations required that the alternate assessment be aligned with the state's academic content standards, promote access to the general curriculum, and reflect professional judgment of the highest achievement standards possible (34 CFR § 200.1). These regulations effectively forced most states to begin to revise the assessments they had originally created in response to the more liberal 1997 IDEA requirements. While the due date for the development of alternate assessments was 2000, there was little knowledge in the field of the appropriate academic content on which to base such tests. While there had been some early work on how to teach general education curriculum content to students with severe disabilities (e.g., Downing & Demchak, 1996), the mandate for alternate academic achievement standards aligned to the state's academic content standards was to become a critical force in transforming the curriculum for SWDs with severe disabilities. Over the next decade, problems with developing a coherent academic curriculum appropriate to the diverse population of students with significant cognitive disabilities were prevalent. Among the most significant contributing factors in the delay in developing valid and aligned alternate assessments under the NCLB was the belief among special educators charged with developing the assessments that grade-level academic content standards were not relevant to these students, and that the appropriate content for these students consisted of 'life skills' for independent living (Wallace, Ticha, & Gustafson, 2008). In response to this, ED set new standards for technical adequacy that alternate assessments were required to meet. Over the next decade, while the quality of many alternate assessments on alternate achievement standards (AA-AAS) improved, by 2010 many states still did not have technically adequate, peer-reviewed alternate assessments. At the same time, research carried out in the field indicated that students eligible for the AA-AAS could use symbolic communication systems and could learn to read and reason mathematically (Browder, Wakeman, Spooner, Ahlgrim-Delzell, & Algozzine, 2006; Kearns, Towles-Reeves, Kleinert, & Kleinert, 2009).

Students with significant cognitive disabilities continue to be inappropriately excluded from participation in alternate assessments, although one study of excluded students found that many could use augmentative and assistive technologies to speak or otherwise communicate, and that approximately 75% were learning sight words and using calculators to perform mathematical calculations (Kearns et al., 2009). Kearns recommends that, to meaningfully include the population of students with significant cognitive impairment in state assessments, future assessments must include 'authentic' demonstrations of skills and knowledge aligned to the grade-level content standards, within the assessment context such students require. Scaffolding may be required for some within this population to enable them to show what they know and can do. The inclusiveness of alternate assessments for students with significant cognitive impairment depends largely upon whether or not the teacher has an understanding of how to teach a standards-based curriculum appropriate to the students within this diverse population (Kearns et al., 2009).

An additional problem with the implementation of alternate assessments for students with the most significant cognitive impairments has been the continued difficulty of establishing appropriately challenging achievement standards for the full range of ability levels manifested in this population. While states are permitted to set more than one achievement standard in order to make an alternate assessment more inclusive, teachers and schools have been cautious about assigning students to the more challenging achievement standard on a test used for making school accountability determinations, often assigning both higher- and lower-functioning students in the 1% population to the lowest achievement standard to safeguard a favorable outcome in accountability under NCLB. As a result of these widespread practices, proficiency rates on alternate assessments have been much higher than proficiency rates of SWDs taking the general assessment across the majority of states, suggesting that alternate assessments are simply not challenging enough for the majority of students taking them.

Currently available alternate assessments vary in the extent to which they inform parents and teachers about student academic progress. While classroom-based formative assessments, such as CBM, have been widely used among students with high-incidence disabilities since IDEA 1997, comparable formative assessments have rarely if ever been available for teachers of students in the 1% population, though OSEP encouraged their development (but see Phillips et al., 2009). Assessment measures designed to measure a student's level of mastery and progress on both prerequisite and grade-level content and skills, at a level challenging for such students, would greatly assist and support instruction of students with significant cognitive disabilities and help to demonstrate their capacity for progressing in an academic curriculum. Fundamentally, the central problem with alternate assessments has arisen from the need to understand and develop an appropriate academic curriculum for eligible students: one that is aligned to the same grade-level content standards intended for all students, yet reflects content and skills at an appropriate level of difficulty for each unique student, and is also capable of indicating progress toward an expected level of knowledge and skill.

IDEA 2004 and Assessments Measuring Responsiveness to Intervention

The reauthorization of IDEA in 2004 (PL 108-446) reiterated the NCLB mandate for inclusion of all SWDs in state and district-wide assessments and clarified that IEP goals were not to be the only form of assessment. The most important assessment-related changes made in the IDEA 2004 reauthorization were those that pertained to requirements for eligibility determinations of SLD. These changes permitted states to move away from the IQ-performance discrepancy model of SLD identification that had been the subject of criticism in the Commission report. Under IDEA 2004, new assessments were to be developed for the purpose of assisting with the identification of students with SLD through the assessment of a student's response to tiered, evidence-based instructional interventions (RTI). Such assessments were to be of several types: those to screen students in basic skills (e.g., literacy or mathematics), and those to help define and monitor responsiveness to evidence-based interventions designed to help the child progress in significant areas of academic weakness and to inform a decision about the need for more intensive remedial instruction.

2007 Joint Title I IDEA Regulations Permitting Modified Academic Achievement Standards in Accountability

Nearing the end of the George W. Bush presidential term, Secretary of Education Margaret Spellings grew concerned at the slow pace with which states were advancing toward the goal of 'universal proficiency' by 2014 intended by the NCLB. States and the special education community expressed concerns that a small group of SWDs who were enrolled in general education classes were, due to the nature of their disabilities, unable to demonstrate the same extent of academic progress on their state's general assessment by the end of the school year. In 2006, the administration responded by announcing a new assessment policy option to provide states with the flexibility to include the scores of 'persistently low-performing' students with disabilities in alternate assessments based on modified academic achievement standards (AA-MAS). While states had struggled to improve teaching practices for a subgroup of low-performing students, both those with and without disabilities, it was widely accepted that some students, because of the effects of a disability, required additional time to learn content standards to mastery. State rationales for the new assessment included a need to develop academic achievement standards inclusive of this small group of SWDs, the majority of whom were enrolled in general education classes, yet had cognitive impairments such as mental retardation, autism, specific learning disability, or other health impairment for which accommodations alone could not ensure an 'equal playing field.'

After the publication of the 2007 Joint IDEA Title I regulations, the field became split over the potential consequences of permitting states to assess a portion of SWDs against a lower standard of performance. Concerns of some advocates in the disability community centered on the potential lowering of standards and educational tracking of students with SLD predominantly for the sake of making 'Adequate Yearly Progress' toward assessment performance targets specified under Title I accountability. States differed in the degree to which they included the SLD population in the new modified assessments. Some argued that the SLD population in particular had been receiving below grade-level instruction and that the availability of such a test would unnecessarily lower academic expectations for these students, who, by definition, have normal or above average intellectual abilities. Others argued that the problems manifested by this group of students in accessing general assessment items and their inability to master the expected content in the same time frame constituted the direct effects of their disabilities. They maintained that the new modified assessments, especially if used as a temporary measure, could help to illuminate the academic progress of this group of students and to help ensure that they received instruction aligned to grade-level content standards.

Subsequent to substantial investments by ED in the development of the AA-MAS, many states began investigating the population of SWDs who were persistently low achieving and conducted item accessibility and 'cognitive lab' studies to develop more accessible test items. Elliott et al. (2010) reported the results of a study in four states indicating that certain modifications made to general test items improved the accessibility of the items and improved the accuracy of measurement for SWDs eligible for the AA-MAS. Modifications to regular test items did not change the content to be tested, nor did they reduce the complexity of the test item ('depth of knowledge'). The modifications made to test items were straightforward, including removing unnecessary words, simplifying language, adding pictures and graphics, breaking long paragraphs into several shorter ones, and bolding key words. The study showed that an additional boost in student performance on the modified items occurred when reading support (audio versions of text, except for the key vocabulary words being tested) was added to item modification. Results also showed a large boost in mathematics performance for the eligible group (Elliott et al., 2010).

'Race to the Top' Assessment Initiatives

In 2009, during the initial months of the Obama administration, a 'new generation' of standards and assessments was envisioned by which policymakers hoped all American students would become more competitive in the global marketplace. Many of the goals of the Race to the Top (RTT) assessment initiative reiterated those of the Goals 2000 era described earlier in this chapter. The RTT initiative endeavors to create assessments aligned to 'fewer, higher, and clearer' state standards held in common by most states, as well as to make assessments more inclusive and informative for students who typically perform at lower achievement levels, including SWDs.

In the words of Secretary Duncan,

The majority of students with disabilities take the regular state tests based on the state's standards for all students, with appropriate accommodations to ensure that their results are valid. Students with the most significant cognitive disabilities can take alternate tests based on alternate standards and other students with disabilities may take an alternate test based on modified standards. (Duncan, 2010)

Our proposal would continue to hold schools accountable for teaching students with disabilities but will also reward them for increasing student learning. . . . The Department plans to support consortia of states, who will design better assessments for the purposes of both measuring student growth and providing feedback to inform teaching and learning in the classroom. All students will benefit from these tests, but the tests are especially important for students with disabilities. (Duncan, 2010)

Policies supportive of inclusive assessments under the new administration promise to support the development of a new generation of assessments aligned to a range of achievement levels required for inclusion of all SWDs and to include measures of growth for students who make progress at different rates. The history of inclusion in standards-based assessments shows that thoughtful inclusion decisions promote meaningful academic progress for SWDs. In the era of Goals 2000, 32% of SWDs graduated high school with a regular diploma, compared to nearly 60% of students in 2007. In 1987, only one in seven SWDs enrolled in postsecondary education, compared to a third of students in 2007 (Duncan, 2010). The challenge for growth-based accountability is to ensure that conceptions of growth are academically meaningful, in addition to being 'statistically' meaningful. To demonstrate consequential validity, the new assessments must provide results that clearly specify the knowledge and skills at each grade level that the student has attained and those requiring additional focus or individualized instructional intervention. Promising research and development since 2008 has resulted in empirically based assessment tools that can help to specify the accessibility of test items and to separate construct-relevant from irrelevant or extraneous cognitive processing demands of test items (Elliott, Kurz, Beddow, & Frey, 2009). To maximize inclusion of SWDs in the next generation of state assessments, a renewed commitment will be required by the disability community to build upon the lessons learned over the history of inclusive assessment and to methodically investigate and empirically establish expectations for their academic achievement and progress. Maximal inclusiveness in assessment will always be a function of successful inclusion in the general education curriculum and of empirically derived expectations for academic achievement and progress.

Acknowledgments The author gratefully acknowledges the contribution and support of Lou Danielson, who provided his unique historical perspective on assessment-related events during the first 40 years of OSEP.

References

Browder, D., Wakeman, S., & Flowers, C. (2009). Which came first, the curriculum or the assessment? In W. D. Schafer & R. W. Lissitz (Eds.), Alternate assessments based on alternate achievement standards: Policy, practice, and potential. Baltimore, MD: Paul H. Brookes Publishing Co.


Browder, D., Wakeman, S., Spooner, F., Ahlgrim-Delzell, L., & Algozzine, B. (2006). Research on reading instruction for individuals with significant cognitive disabilities. Exceptional Children, 72, 392–408.

Brown v. Board of Education, 347 U.S. 483 (1954).

Council for Exceptional Children. (1992). Statement prepared for testimony to the House Subcommittee on Elementary, Secondary, and Vocational Education. Reston, VA: Author.

Council for Exceptional Children. (1992). Statement prepared for testimony to the Technical Advisory Panel on Uniform National Rules for NAEP Testing for Students with Disabilities. Reston, VA: Author.

Downing, J. E., & Demchak, M. (1996). First steps: Determining individual abilities and how best to support students. In J. E. Downing (Ed.), Including students with severe and multiple disabilities in typical classrooms: Practical strategies for teachers (pp. 35–61). Baltimore, MD: Paul H. Brookes Publishing Co.

Duncan, A. (2010). Keeping the promise to all America's children. Remarks made to the Council for Exceptional Children, April 21, Arlington, VA. www.ed.gov

Elementary and Secondary Education Act of 1965, PL 89-10 (1965).

Elliott, S. N., Kurz, A., Beddow, P., & Frey, J. (2009, February 24). Cognitive load theory: Instruction-based research with applications for designing tests. Paper presented at the annual convention of the National Association of School Psychologists, Boston.

Elliott, S. N., Kettler, R. J., Beddow, P. A., Kurz, A., Compton, E., McGrath, D., et al. (2010). Effects of using modified items to test students with persistent academic difficulties. Exceptional Children, 76(4), 475–495.

Fuchs, L. S., & Fuchs, D. (2004). Determining adequate yearly progress from kindergarten through grade 6 with curriculum-based measurement. Assessment for Effective Intervention.

Ginsberg, A. L., Noell, J., & Plisko, V. W. (1988). Lessons from the wall chart. Educational Evaluation and Policy Analysis, 10(1), 1–10.

Hehir, T. (2005). New directions in special education: Eliminating ableism in policy and practice. Cambridge, MA: Harvard Education Press.

Kearns, J. F., Towles-Reeves, E., Kleinert, H. L., & Kleinert, J. (2009). Who are the children who take alternate achievement standards assessments? In W. D. Schafer & R. W. Lissitz (Eds.), Alternate assessments based on alternate achievement standards: Policy, practice, and potential. Baltimore, MD: Paul H. Brookes Publishing Co.

Koehler, P. D. (1992). Inclusion and adaptation in assessment of special needs students in Arizona. In M. L. Thurlow & J. E. Ysseldyke (Eds.) (1993), Can "all" ever really mean "all" in defining and assessing student outcomes? (Synthesis Report No. 5). Minneapolis, MN: University of Minnesota, National Center on Educational Outcomes.

McGrew, K. S., Thurlow, M. L., Shriner, J. G., & Spiegel, A. N. (1992). Inclusion of students with disabilities in national and state data-collection programs (Technical Report 2). Minneapolis, MN: University of Minnesota, National Center on Educational Outcomes.

National Academy of Education. (1998). Goals 2000: Reforming education to improve student achievement. Washington, DC: National Academy of Education.

National Commission on Excellence in Education. (1983). A nation at risk: The imperative for educational reform. Washington, DC: Government Printing Office.

Nolet, V., & McLaughlin, M. J. (2000). Accessing the general curriculum: Including students with disabilities in standards-based reform. Thousand Oaks, CA: Corwin Press.

Phillips, S. E. (1993). Testing accommodations for disabled students. Education Law Reports, 80(9).

Phillips, S. E. (1994). High stakes testing accommodations: Validity versus disabled rights. Applied Measurement in Education, 7(2), 93–120.

Phillips, S. E. (2002). Legal issues affecting special populations in large-scale assessment programs. In G. Tindal & T. M. Haladyna (Eds.), Large-scale assessment programs for all students: Validity, technical adequacy, and implementation. Mahwah, NJ: Lawrence Erlbaum Associates.

Phillips, G. W., Danielson, L., & Wright, L. (2009). Psychometric advances in alternate assessments. Washington, DC: American Institutes for Research.

President's Commission on Excellence in Special Education. (2002). A new era: Revitalizing special education for children and their families. Washington, DC: Author.

Quenemoen, R. (2009). The long and winding road of alternate assessments: Where we started, where we are now, and the road ahead. In W. D. Schafer & R. W. Lissitz (Eds.), Alternate assessments based on alternate achievement standards: Policy, practice, and potential. Baltimore, MD: Paul H. Brookes Publishing Co.

Russo, C., & Osborne, A. (Eds.). (2008). Essential concepts and school-based cases in special education law. Thousand Oaks, CA: Corwin Press.

Shriner, J., & Thurlow, M. L. (1993). State special education outcomes: A report on state activities at the end of the century. Minneapolis, MN: University of Minnesota, National Center on Educational Outcomes.

Simon, M., Karasoff, P., & Smith, A. (1991). Effective practices for inclusion programs: A technical assistance planning guide. Unpublished paper supported by U.S. Department of Education Cooperative Agreements #GOO87C3056-91 and #GOO87C3058-91.

Skrtic, T. M. (1991). The special education paradox: Equity as the way to excellence. Harvard Educational Review, 61(2), 148–206.

Southeastern Community College v. Davis, 442 U.S. 397, 413 (1979).

Stecker, P. M. (2005). Monitoring student progress in individualized educational programs using curriculum-based measurement. In U.S. Department of Education, Office of Special Education, IDEAs that Work: Toolkit on teaching and assessing students with disabilities. National Center on Student Progress Monitoring.

The White House, Office of the Press Secretary. (1990). National educational goals. Washington, DC: Author.

Thompson, S. J., & Thurlow, M. L. (2000). State alternate assessments: Status as IDEA alternate assessment requirements take effect (Synthesis Report 35). Minneapolis, MN: University of Minnesota, National Center on Educational Outcomes.

Thurlow, M. L., & Ysseldyke, J. E. (1993). Can "all" ever really mean "all" in defining and assessing student outcomes? (Synthesis Report No. 5). Minneapolis, MN: University of Minnesota, National Center on Educational Outcomes.

Tindal, G. (2009). Reflections on the alternate assessment in Oregon. In W. D. Schafer & R. W. Lissitz (Eds.), Alternate assessments based on alternate achievement standards: Policy, practice, and potential. Baltimore, MD: Paul H. Brookes Publishing Co.

U.S. Department of Education. (1993). The reauthorization of the Elementary and Secondary Education Act, executive summary. Washington, DC: U.S. Department of Education.

U.S. Department of Education. (1997). Elementary and secondary education: Guidance on standards, assessments and accountability. Retrieved October 2, 2009 from http://www2.ed.gov/policy/elsec/guid/standardsassessment/guidance_pg4.html#disabilities3

U.S. Department of Education. (1998). Goals 2000: Reforming education to improve student achievement, April 30, 1998. Washington, DC: U.S. Department of Education.

U.S. Department of Education. (2000). Summary guidance on the inclusion requirement for Title I final assessments. Retrieved March 13, 2010 from http://www2.ed.gov/policy/elsec/guid/inclusion.html

U.S. Department of Education. (2003). Standards and assessments non-regulatory guidance, March 10, 2003. Washington, DC: Author.

Wallace, T., Ticha, R., & Gustafson, K. (2008). Study of general outcome measurement (GOMs) in reading for students with significant cognitive disabilities: Year 1 (RIPM Technical Report #27). Minneapolis, MN: Research Institute on Progress Monitoring, University of Minnesota.

Ysseldyke, J., & Thurlow, M. L. (1994a). Guidelines for inclusion of students with disabilities in large-scale assessments (Policy Directions No. 1). Minneapolis, MN: University of Minnesota, National Center on Educational Outcomes.

Ysseldyke, J., Thurlow, M. L., McGrew, K. S., & Shriner, J. G. (1994b). Recommendations for making decisions about the participation of students with disabilities in statewide assessment programs: A report on a working conference to develop guidelines for statewide assessments and students with disabilities (Synthesis Report 15). Minneapolis, MN: University of Minnesota, National Center on Educational Outcomes.

Zigmond, N., & Kloo, A. (2009). The 'two percent students': Considerations and consequences of eligibility decisions. Peabody Journal of Education, 84(4), 478–495.


3 U.S. Legal Issues in Educational Testing of Special Populations

S.E. Phillips

Introduction

Legal challenges to educational testing programs have focused primarily on the issues of adverse impact, parental rights, testing irregularities, nonstandard test administrations, and testing English language learners (ELLs).1 However, once a case goes to trial, issues of validity, reliability, passing standards, and adherence to other professional standards may be raised.2 This chapter focuses on the legal, psychometric, and policy issues related to educational testing of special populations. Although prior federal cases and professional standards provide guidance, some tough policy decisions remain.

Nonstandard Test Administrations

State-mandated tests are typically administered under standard conditions that are the same for all students. The purpose of standard test administration conditions is to produce test scores that have comparable interpretations across students, classrooms, schools, and districts. However, test administrators routinely receive requests for testing alterations for students with disabilities and ELLs. For example, a reader, calculator, word processor, extended time, or a separate room may be requested for students with reading, mathematics, writing, or other learning disabilities. Tests administered under such altered conditions are referred to as nonstandard test administrations.

1 Portions of this chapter were adapted from Phillips (2010), Phillips (2006), and Phillips (2002).
2 The APA/AERA/NCME 1999 Standards for Educational and Psychological Testing. In this chapter, the document is referred to as the Test Standards, and individual Standards are referenced by number.

The term testing accommodation refers to a nonstandard test administration that provides access for persons with disabilities while preserving the content and skills intended to be measured and producing comparable scores. Access involves the removal of irrelevant factors that may interfere with valid measurement of the intended content and skills. The validity of test scores as accurate measures of the intended content and skills is improved when the effects of irrelevant factors are removed but is reduced when relevant factors are altered or eliminated. When the intended inference from a test score is the attainment of a specified domain of content and skills, nonstandard test administrations that provide access to direct assistance with the tested content and skills may increase test scores, but those scores will lack validity for their intended interpretation.

The domain of content and skills intended to be measured by a test is referred to as the tested construct. For achievement tests, the tested construct (e.g., reading, mathematics) is defined by the content standards, objectives, and test specifications used to develop the test. Typically, the test specifications for an achievement test consist of a two-dimensional matrix of content by cognitive difficulty with the corresponding number of included test items given in each cell. A nonstandard test administration preserves the content and skills intended to be measured (construct preservation) when the resulting scores can be interpreted as indicators of attainment of the same domain of content and skills as specified for a standard test administration. Construct preservation means that the same content has been measured at the same level of cognitive difficulty.
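
As a rough illustration of such a two-dimensional blueprint, the sketch below represents hypothetical test specifications as a content-by-cognitive-difficulty matrix; the content strands, difficulty levels, and item counts are invented for illustration and are not taken from any actual test.

# Illustrative sketch only: test specifications as a content-by-cognitive-
# difficulty matrix. Strands, levels, and counts are hypothetical.

blueprint = {
    # strand            difficulty level -> number of items
    "number_sense": {"recall": 6, "application": 5, "analysis": 2},
    "algebra":      {"recall": 4, "application": 6, "analysis": 3},
    "geometry":     {"recall": 3, "application": 4, "analysis": 2},
}

def total_items(spec):
    """Total number of items the blueprint calls for."""
    return sum(count for levels in spec.values() for count in levels.values())

def matches_blueprint(form, spec):
    """Check that an assembled form has the required count in every cell."""
    return all(
        form.get(strand, {}).get(level, 0) == count
        for strand, levels in spec.items()
        for level, count in levels.items()
    )

print(total_items(blueprint))   # 35 items in this hypothetical example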

In addition to construct preservation, score comparability is an essential characteristic of a testing accommodation. Comparable test scores are equivalent and interchangeable and have the same intrinsic meaning and interpretation. When test scores from standard and nonstandard test administrations are comparable, users are able to make the same inferences about the degree of attainment of the content and skills in the construct domain from identical scores. In essence, students who obtain comparable scores from standard and nonstandard test administrations know and can do the same things.

The evidence used to evaluate construct preservation and score comparability for nonstandard test administrations must be related to the purpose of the test for the intended audience. Such evidence may be logical (judgmental) and/or empirical evidence related to test score validity, reliability, and interpretation (Test Standards, p. 10; Standards 4.10 and 9.7). For achievement tests, construct preservation is typically evaluated via content validity evidence. Content validity evidence is obtained by asking content experts (usually teachers of the tested subject) to judge whether each test item is an appropriate measure of its designated content standard or objective for the intended grade level (Test Standards, pp. 10–11; Standards 1.2, 1.6, & 1.7). Positive content validity judgments support an inference that the test is measuring the intended construct at an appropriate level of cognitive difficulty.
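
One minimal way to tabulate such expert judgments is sketched below; the items, the yes/no ratings, and the 80% endorsement threshold are hypothetical illustrations of how content validity evidence of this kind might be summarized, not a prescribed procedure.

# Illustrative sketch only: summarizing content-expert alignment judgments.
# Each item maps to experts' yes/no judgments that it measures its designated
# objective at the intended cognitive level. Data and threshold are hypothetical.

judgments = {
    "item_01": [True, True, True, True],
    "item_02": [True, False, True, True],
    "item_03": [False, False, True, False],
}

THRESHOLD = 0.80   # hypothetical minimum proportion of experts endorsing an item

for item, votes in judgments.items():
    endorsement = sum(votes) / len(votes)
    status = "retain" if endorsement >= THRESHOLD else "review"
    print(f"{item}: {endorsement:.0%} endorsement -> {status}")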

For a nonstandard test administration, evidence of construct preservation can be obtained by asking content experts to judge the fidelity of the altered items and/or testing conditions to the intended domain of content and cognitive difficulty applicable to the test administered under standard conditions. Information from relevant test statutes, administrative regulations, minutes of board meetings, item-writing guides, and test administration manuals for standard test administrations may also provide helpful guidance for content experts' judgments of preservation of the intended construct.

If content experts provide content validity judgments that differ for a standard versus a nonstandard administration, these judgments are logical evidence that the tested construct is different for the two test administrations and that numerically equivalent scores for the two test administrations should be interpreted as indicating attainment of qualitatively different content and skills and/or cognitive difficulty. Alternatively, if content experts provide content validity judgments indicating that a nonstandard test administration is measuring the same content but at a different level of cognitive difficulty, comparable scores may be obtained by statistically linking the test scores from the nonstandard administration to those from a standard test administration (Standard 4.10). Empirical evidence of the reliability of standard and nonstandard test administrations and comparisons between students with disabilities and low-achieving regular education students who have tested under standard and nonstandard conditions may also be useful for evaluating construct preservation and score comparability.
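
A simple illustration of statistical linking is sketched below using a mean-sigma (linear) transformation; the score lists are hypothetical, and an operational linking study would require a defensible data collection design (for example, common items or randomly equivalent groups) rather than the convenience values shown here.

# Illustrative sketch only: mean-sigma linear linking of scores from a
# nonstandard administration onto the scale of a standard administration.
# The score lists are hypothetical examples.

from statistics import mean, pstdev

standard_scores = [38, 42, 45, 47, 50, 53, 55, 58, 61, 66]      # hypothetical
nonstandard_scores = [30, 33, 35, 38, 40, 42, 44, 47, 50, 54]   # hypothetical

# Linear transformation y = a * x + b that matches the first two moments.
a = pstdev(standard_scores) / pstdev(nonstandard_scores)
b = mean(standard_scores) - a * mean(nonstandard_scores)

def link(score):
    """Place a nonstandard-administration score on the standard scale."""
    return a * score + b

print(round(link(40), 1))   # a nonstandard score of 40 expressed on the standard scale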

Unfortunately, the term testing accommodation has been used by some educators and policymakers to refer to any testing alteration provided to students with disabilities or ELLs during test administration. This is unfortunate because some alterations in testing conditions do not fit the legal or psychometric definitions of an accommodation. These distinctions are discussed in more detail below. Relevant federal legislation is reviewed, followed by the analysis of a series of issues related to the decisions state and district testing staff must make to create a legally and psychometrically defensible nonstandard test administrations policy. These issues highlight the trade-offs between competing policy goals advocated by different constituencies. In some cases, these competing policy goals cannot all be achieved simultaneously, so policymakers must prioritize goals and make difficult decisions that are consistent with the purpose(s) of the state or district tests, the state or district content standards, and the intended score interpretations.

Federal Legislation

There are three major federal statutes with specific provisions for persons with disabilities that are relevant to decisions about nonstandard test administrations. They include Section 504 of the Rehabilitation Act (1973), the Americans with Disabilities Act (ADA, 1990), and the Individuals with Disabilities Education Act (IDEA, 1991). Congress passed these statutes to correct serious abuses brought to its attention during hearings about the treatment of people with disabilities. For example, the IDEA was intended to provide educational services to students with disabilities who had been excluded, ignored, or inappropriately institutionalized by the educational system. Section 504 addressed discrimination by recipients of federal funding who, for example, refused to hire persons with disabilities even when the disability was unrelated to the skills required for the job. The ADA extended the protection against discrimination due to a disability to private entities. When Congress passed disability legislation, it was particularly concerned about mandating barrier-free access to facilities open to the public. However, all three federal disability laws also included provisions relevant to cognitive skills testing.

Section 504 of the Rehabilitation Act

Section 504 provides that no otherwise qualified disabled person shall be denied participation in or the benefits of any federally funded program solely due to the person's disability. In Southeastern Community College v. Davis (1979), the U.S. Supreme Court defined otherwise qualified as a person who, despite the disability, can meet all educational or employment requirements. In that case, the Court held that the college was not required to modify its nursing program to exempt a profoundly hearing-impaired applicant from clinical training. The Court was persuaded that the applicant was not otherwise qualified because she would be unable to communicate effectively with all patients, might misunderstand a doctor's verbal commands in an emergency when time is of the essence, and would not be able to function in a surgical environment in which required facial masks would make lip reading impossible.

The Davis decision clearly indicated that an educational institution is not required to lower or substantially modify its standards to accommodate a disabled person, and it is not required to disregard the disability when evaluating a person's fitness for a particular educational program. The Court stated

Section 504 by its terms does not compel educational institutions to disregard the disabilities of [disabled] individuals or to make substantial modifications in their programs to allow disabled persons to participate. . . . Section 504 indicat[es] only that mere possession of a [disability] is not a permissible ground for assuming an inability to function in a particular context. (p. 413, 405)

Thus, a critical aspect in evaluating a requested nonstandard test administration turns on the interpretation of "substantial modification of standards." In a diploma testing case involving students with disabilities, Brookhart v. Illinois State Bd. of Educ. (1983), the court listed Braille, large print, and testing in a separate room as accommodations mandated by Section 504. However, paraphrasing the Davis decision, the Brookhart court provided the following additional interpretive guidance

Altering the content of the [test] to accommodate an individual's inability to learn the tested material because of his [disability] would be a "substantial modification" as well as a "perversion" of the diploma requirement. A student who is unable to learn because of his [disability] is surely not an individual who is qualified in spite of his [disability]. (p. 184, [emphasis added])


This language in the Brookhart opinion indicated that the federal courts were willing to draw a line between format changes (reasonable accommodations) and substantive changes in test questions (substantial modifications).

The meaning of otherwise qualified was further explained in Anderson v. Banks (1982). In that case, students with severe cognitive disabilities in a Georgia school district, who had not been taught the skills tested on a mandatory graduation test, were denied diplomas. The court held that when the disability is extraneous to the skills tested, the person is otherwise qualified; but when the disability itself prevents the person from demonstrating the required skills, the person is not otherwise qualified. Using this definition of otherwise qualified, the Anderson court reasoned that the special education students who had been denied diplomas were unable to benefit from general education because of their disabilities. The court further reasoned that this should not prevent the district from establishing academic standards for receipt of a diploma. The fact that such standards had a disparate impact on students with disabilities did not render the graduation test unlawful in the court's view. The court stated

[I]f the [disability] is extraneous to the activity sought to be engaged in, the [person with a disability] is "otherwise qualified." ... [But] if the [disability] itself prevents the individual from participation in an activity program, the individual is not "otherwise qualified." ... To suggest that ... any standard or requirement which has a disparate effect on [persons with disabilities] is presumed unlawful is farfetched. The repeated use of the word "appropriate" in the regulations suggests that different standards for [persons with disabilities] are not envisioned by the regulations. (pp. 510–511, emphasis in original)

In Bd. of Educ. of Northport v. Ambach (1982), a New York Supreme Court justice concluded, "The statute merely requires even-handed treatment of the [disabled and nondisabled], rather than extraordinary action to favor the [disabled]" (p. 684).

Americans with Disabilities Act (ADA)

The ADA, which uses the phrase qualified individual with a disability in place of otherwise qualified [disabled] individual, requires persons with disabilities to be given reasonable accommodations. Consistent with Section 504 cases, the legal requirement to provide reasonable accommodations for cognitive tests refers only to those variations in standard testing conditions that compensate for factors that are extraneous to the academic skills being assessed. Consistent with Section 504 case law, the ADA does not require nonstandard test administrations that substantially modify the tested skills.

Individuals with Disabilities Education Act (IDEA)

Although the IDEA and its predecessor, the Education for All Handicapped Children Act (EAHCA), clearly mandated specialized and individualized education for students with disabilities, the federal courts have held that federal law does not guarantee any particular educational outcome. Thus, under federal law, students with disabilities are guaranteed access to a free, appropriate, public education that meets their needs in the least restrictive environment, but not specific results (Bd. of Educ. v. Rowley, 1982) or a high school diploma (Brookhart, 1983). The Brookhart court also held that because students were required to earn a specified number of credits, complete certain courses mandated by the state, and pass the graduation test, the graduation test was "not the sole criterion for graduation" (p. 183).

A student with a disability who has received appropriate educational services according to an individualized education program (IEP),3 but who is unable to master the skills tested on a graduation test, may be denied a high school diploma without violating the IDEA. However, when appropriate, federal regulations do require good faith efforts by the educational agency to teach the tested skills to students with disabilities. Federal case precedents also indicate that an IDEA challenge to a graduation testing requirement for students with disabilities will be unlikely to succeed if professional testing standards have been satisfied.

3 An individualized education program (IEP) is a written document constructed by a team of professionals to address the educational needs of a special education student. The IDEA mandates a separate IEP for each special education student and includes procedural requirements for its development and implementation.

Professional Standards

Three national professional organizations, the American Educational Research Association (AERA), the American Psychological Association (APA), and the National Council on Measurement in Education (NCME), have collaborated to produce consensus Standards for Educational and Psychological Testing (1999), referred to in this chapter as the Test Standards. Courts routinely recognize the Test Standards as an appropriate source of guidance and support for professional opinions in a testing case. Although the Test Standards are aspirational rather than prescriptive, judges tend to be skeptical of expert opinions that seriously conflict with reasonable interpretations of the Test Standards. The most appropriate edition of the Test Standards for evaluating a specific test is the edition in effect at the time the test was constructed and administered. Most of the cases discussed in this chapter involved tests constructed and administered when the 1985 or 1999 editions were current. Unless otherwise indicated, references to the Test Standards are to the most recent 1999 edition.

Introductory material in the 1999 Test Standards supports the exercise of professional judgment when evaluating tests:

Evaluating the acceptability of a test or test application does not rest on the literal satisfaction of every standard in this document, and acceptability cannot be determined by using a checklist. Specific circumstances affect the importance of individual standards, and individual standards should not be considered in isolation. Therefore, evaluating acceptability involves (a) professional judgment that is based on a knowledge of behavioral science, psychometrics, and the community standards in the professional field to which the tests apply; (b) the degree to which the intent of the standard has been satisfied by the test developer and user; (c) the alternatives that are readily available; and (d) research and experiential evidence regarding feasibility of meeting the standard. (p. 4)

A construct is a skill, such as reading comprehension or mathematics computation, that is measured by a test. The Test Standards distinguish between factors intended to be measured by a test (construct-relevant factors) and factors extraneous to the construct intended to be measured (construct-irrelevant factors). When determining the appropriateness of nonstandard administrations, the Test Standards emphasize the importance of considering the construct validity of the inference from the test score the user wishes to make. The Test Standards indicate that the knowledge and skills intended to be measured (construct-relevant factors) should be preserved, but construct-irrelevant factors should be eliminated to the extent feasible. Standard 10.1 states

In testing individuals with disabilities, test developers, test administrators, and test users should take steps to ensure that the test score inferences accurately reflect the intended construct [knowledge and skills] rather than any disabilities and their associated characteristics extraneous to the intent of the measurement. (p. 106)

The Test Standards distinguish between comparable and noncomparable scores in the context of determining when it is appropriate to place an identifying notation (flag) on test scores obtained from nonstandard administrations. Standard 10.11 states

When there is credible evidence of score comparability across regular and modified administrations, no flag should be attached to a score. When such evidence is lacking, specific information about the nature of the modification should be provided, if permitted by law, to assist test users properly to interpret and act on test scores.

Comment: . . . If a score from a modified administration is comparable to a score from a [standard] administration, there is no need for a flag. Similarly, if a modification is provided for which there is no reasonable basis for believing that the modification would affect score comparability, there is no need for a flag. . . . [I]f a nonstandard administration is to be reported because evidence does not exist to support score comparability, then this report should avoid referencing the existence or nature of the [student's] disability and should instead report only the nature of the [modification] provided, such as extended time for testing, the use of a reader, or the use of a tape recorder. (p. 108)

Terminology

In the Test Standards, the terms accommodation and modification are used interchangeably to refer to nonstandard test administrations. But as indicated in Standard 10.11, the Test Standards do distinguish between nonstandard test administrations that produce comparable scores and those that do not (or for which evidence of comparability is lacking). For convenience in distinguishing between these two types of nonstandard test administrations, this author has urged testing programs to use the term accommodation for the former and modification for the latter.

Testing programs and test users routinely make decisions about whether scores obtained from nonstandard test administrations preserve the construct(s) intended to be tested and should be interpreted as comparable to scores obtained from standard test administrations. Many have found it helpful in communicating with students, parents, educators, professionals, policymakers, and the public to have different words to describe nonstandard test administrations that they judge do and do not result in comparable scores. This distinction has allowed them to explain more clearly to others why some scores count for satisfying requirements such as graduation testing while others do not.

The recommendation to use the term accommodation as a shorthand referent for nonstandard test administrations that produce comparable scores and the term modification as a shorthand referent for nonstandard test administrations that result in noncomparable scores is consistent with the plain English meaning of these terms. According to the American Heritage Dictionary, accommodate means to adapt or adjust, while modify means to change in form or character. Making an adjustment for a paraplegic student who needs a taller table to make room for a wheelchair during the administration of a cognitive test is typically judged by psychometricians to produce a comparable score. On the other hand, changing the construct of reading comprehension to listening comprehension by providing a reader for a reading test is generally viewed by psychometricians as producing noncomparable scores. Thus, the use of the term accommodation for the former nonstandard test administration and modification for the latter is consistent with prior case law and with the English meanings of those words as applied to the psychometric interpretations of score comparability described previously. This differential usage of the terms accommodation and modification is employed throughout this chapter.

Tension Between Accessibility and Construct Preservation/Score Comparability

As previously indicated, it is unfortunate that some advocates for students with disabilities have urged test administrators to classify all nonstandard test administrations as accommodations and to treat the resulting scores as comparable to scores from standard administrations. Such actions are not consistent with the legal requirement for the provision of reasonable accommodations. According to its legal definition, a reasonable accommodation must
• be needed by a disabled person to access the test
• while ensuring construct preservation and score comparability.

A testing condition variation needed to access the test means that the student with a disability is unable to meaningfully respond to the test questions without it. The phrase needed for access requires more than simply providing assistance that helps the student with a disability to obtain a higher score; the assistance must be necessary for participation in the testing program. The phrase ensuring construct preservation and score comparability means that the change in test administration conditions produces scores that are free from extraneous (content irrelevant) factors while preserving the knowledge and skills intended to be measured and producing scores that have the same interpretation and intrinsic meaning as scores from standard test administrations. Language from the testing accommodations cases discussed earlier and the Test Standards supports the interpretation of reasonable accommodations as providing accessibility for individuals with disabilities without compromising construct interpretation or score comparability (Davis, 1979; Brookhart, 1979; Ambach, 1982; Section 504 Regulations, 1997 and ADA Regulations, 1997; Rene, 2001 [discussed below]).

Labeling Nonstandard Test Administrations

The following questions should be considered when labeling a nonstandard test administration as an accommodation or a modification:
1. Will the test scores obtained under altered testing conditions have a different interpretation than scores obtained under standard test administration conditions? Are the test scores from standard and nonstandard administrations comparable?
2. Is the alteration in test format or administration conditions part of the skill or knowledge being tested? Is it construct relevant?
3. Would a nonstandard test administration also assist nondisabled, low-achieving students to demonstrate what they know and can do without changing the interpretation of their test scores?
4. Can valid and reliable decision procedures and appeals be established for determining which students will be allowed specific nonstandard test administrations?
5. Do students with disabilities included in regular education classrooms have any responsibility for adapting to standard testing conditions (and foregoing nonstandard test administrations) when feasible?4

4 Adapted from Phillips (1993, p. 27, 1994, p. 104).

Some observers have argued that any nonstandard test administration that helps a student with a disability to achieve a higher score is a reasonable accommodation whose resulting scores should be treated the same as scores obtained by students using common accessories such as eyeglasses (Fraser & Fields, 1999). Unfortunately, this view fails to distinguish between variations for extraneous factors and variations that are closely related to the cognitive skill being measured. Eyeglasses are a reasonable accommodation for a mathematics estimation test because vision is not part of the skill the test is intended to measure. Alternatively, although a calculator on the same mathematics estimation test might assist a student with a learning disability to achieve a higher score, it would be a modification because its use changes the skill being measured from application of rounding and approximation techniques to pushing the correct buttons on the calculator.5

Some observers have also questioned policy decisions that exclude modified test administrations from high-stakes decisions. Yet the lack of availability of such scores for use in making high-stakes decisions is appropriate when those scores have different interpretations than scores obtained from standard or accommodated administrations. For example, it would be misleading and unfair to treat a modified test score from a reading comprehension test obtained with a reader (a measure of listening comprehension) as having the same meaning as scores obtained from standard administrations where students read the test material silently by themselves. Alternatively, when nonstandard test administrations are limited to only those testing variations that maintain test score comparability and preserve the intended construct (i.e., accommodations), the resulting scores should be interpreted the same as scores obtained from standard administrations.

5 Prior to becoming a psychometrician and attorney, the author was a mathematics educator. Consequently, many of the content examples of testing variations analyzed in this chapter using logical evidence are drawn from mathematics. Note also that the increased efficiency rationale for allowing students with learning disabilities to use calculators on a mathematics computation test is equally applicable to low-achieving regular education students and would be universally inappropriate if the content standards intended to be measured specified application of paper-and-pencil computational algorithms.


Skill Substitution

When a student is given a modification that changes the skill intended to be measured, the student has been permitted to substitute a different skill for the one tested. The following sections discuss examples of skill substitution related to extended time, readers, and calculators.

Extended Time. Extended time is a particularly difficult nonstandard test administration to classify. Deciding whether it is an accommodation or modification depends on the degree of intentional and unintentional speededness of the test. For example, if the purpose of the test is to measure how quickly a student can copy a pattern of numbers and letters, speed is part of the skill being measured. In this situation, an extended time administration would change the skill being measured from accuracy and speed to accuracy only. This change in the skill being measured would change the interpretation of the resulting test score, so in this case the extended time administration should be classified as a modification. Examples of speeded tests commonly administered in elementary schools are math facts tests for which the student is given a sheet of addition facts (e.g., 5 + 7 = ___) or multiplication facts (e.g., 9 × 8 = ___) with a fixed amount of time to correctly answer as many items as possible.

Alternatively, an achievement test may intentionally be designed to be a power test with generous time limits. The purpose may be to measure pure academic knowledge irrespective of the time taken to demonstrate that knowledge. In this case, extended time may be judged to be an accommodation of a factor extraneous to the skill being measured. Even so, if a portion of the state test is norm referenced (e.g., Iowa Tests of Basic Skills (ITBS), American College Testing (ACT)), the corresponding normative score information would be valid only for scores obtained under the same timing conditions as the norm group.

Some educators argue that all achievement tests should be power tests with no speededness. But there are two reasons why allowing unlimited time may be counterproductive. First, if given unlimited time, some students will continue working on the test well beyond the point of productivity. This behavior wastes instructional time and may unnecessarily tire and frustrate the student. Second, one of the goals of education is to help students automate skills so that they are readily available when needed. Thus, students who take 4 h to complete a task that most students can complete in 1 h may not have the same level of skill development. Assuming all other relevant factors equal, it is unlikely that an employer would be indifferent between these two potential applicants. Huesman and Frisbie (2000) demonstrated that both students with learning disabilities and students in regular education significantly improved their scores on a standardized, norm-referenced achievement test when given extended time.

An extended time alteration may be combined with other testing condition variations such as segmented administration with extended rest breaks. This combination of changes in standard testing conditions could result in a 1-h test, typically administered in a single sitting with no breaks, being administered over a period of 4 h spread across one school week. Such a nonstandard administration is often more expensive to schedule, and it may be difficult to ensure that the student has not obtained inappropriate coaching or assistance between testing sessions. In addition, in real-world college, professional licensure, and employment testing contexts, users often interpret scores from such a nonstandard test administration differently because they judge skill automaticity, efficiency, and speed of work to be construct relevant. Thus, it is not clear that students with disabilities are well-served by policies that permit major time extensions with segmentation for school tests when it is unlikely such options will be available or judged to produce comparable scores in postsecondary contexts.

To address this concern, the combined effects of taking four times as long in actual time and completing the task in much smaller increments, possibly with review or coaching in between, must be considered. Because most students find a final exam over all the material covered in a course much more difficult than a series of tests over individual units, one might conclude that the measurement of multiple skills at a single point in time is different than measurement of a series of skills at different points in time. That is, extending testing time across multiple days in small segments appears more consistent with the characteristics of a modification. Thus, if additional time is permitted to compensate for an extraneous factor such as reading Braille, the time extension should have reasonable limits within a single day (and preferably within a single block of time) to be labeled a reasonable accommodation.

Readers. Another nonstandard test administration that is difficult to classify is a reader for a mathematics test. Again, classifying this testing variation as an accommodation or modification depends on the purpose of the test and may involve competing educational goals. On the one hand, curriculum specialists may argue for more authentic mathematics tasks (real-world problems) that require students to read text and graphs, apply mathematical reasoning, and explain their solutions in writing. On the other hand, advocates for students with learning disabilities may argue that assessing communication as part of a mathematics test penalizes students with reading/writing disabilities and does not allow them to fully demonstrate their mathematical knowledge.

Nonetheless, on real-world mathematics problems, poor performance by low-achieving students without disabilities may also be the result of poor reading or writing skills. But a reader, often available to students with learning disabilities, usually is not provided to a nondisabled student with poor reading skills. A policymaker might question why nondisabled students who read slowly and laboriously, have limited vocabularies, have difficulty interpreting symbols, respond slowly, suffer test anxiety, or have difficulty staying on task are less deserving of the opportunity to demonstrate maximum performance with a reader than is a student who has been labeled learning disabled. Perhaps no one has yet discovered the “disability” that causes the low-achieving students to experience the listed difficulties.

Some evidence suggests that students with disabilities profit more from a reader than do low-achieving students (Tindal, 1999). This may be because testing variations are usually most effective when students have had practice using them. Instructional prescriptions for students with learning disabilities often require all text to be read to the student, whereas low-achieving students are typically expected to continue struggling to read text by themselves. Therefore, students with learning disabilities may have received extended practice listening to text being read aloud and little practice reading the text themselves, while the experience of low achievers was exactly the opposite. Nevertheless, Meloy, Deville, and Frisbie (2000) found that students with and without disabilities benefited from a reader on a standardized, norm-referenced achievement test.

The policy dilemma is whether to (a) confer a benefit on students with disabilities and potentially penalize low achievers by only allowing readers for students with disabilities on mathematics tests, (b) label a reader as a modification for all students, or (c) allow a reader for any student who needs one and treat the resulting scores as comparable to scores from standard administrations. Choosing (a) creates the inconsistency of arguing that reading is part of the skill being measured for nondisabled students but not for students with disabilities. Choosing (b) preserves the intended construct of authentic problem solving but may be unpopular with advocates for students with disabilities because these students’ reading/writing disabilities would be considered part of the skill intended to be tested. Choosing (c) minimizes the communication component of mathematics and may be contrary to the test specifications and the corresponding state mathematics content standards. A more straightforward method for minimizing the impact of skills other than pure mathematics is to develop less complex items that use simplified text and/or more pictorial representations for all students. However, this option may also inappropriately alter the construct intended to be measured. Policymakers must make the final decision based on the purpose for testing, the tested content standards, the intended use(s) of the scores, and their evaluations of the relative merits of these competing policy goals.


Another important consideration in the formulation of a policy on readers for mathematics tests is their unintended effects on student and teacher behavior. If readers are allowed for all students or if mathematics items are simplified to reduce reading/writing complexity, teachers may be encouraged to engage in less real-world instruction in the classroom or may be encouraged to provide a read-aloud instructional strategy for most students who are having difficulty reading text themselves. Alternatively, disallowing readers for all students on a mathematics test might discourage teachers from providing read-aloud instructional strategies for students with disabilities and encourage them to give greater emphasis to improving students with disabilities’ abilities to read text themselves. This latter unintended outcome might provide important long-term benefits to individual students with disabilities. Consider the following actual example from a statewide graduation testing program.

A first-grade student, Joey (not his real name), was struggling in school relative to the earlier performance of his gifted older brother. His parents requested a special education referral, and Joey was tested. His performance in mathematics was about one year below grade level, and his performance in reading was about half a year below grade level. There was a small discrepancy with ability in mathematics and none in reading. The IEP developed for Joey, with significant parental input, labeled him learning disabled in mathematics and required all written material to be read aloud to him. When Joey entered high school, an enterprising teacher chose to disregard the IEP and began intensive efforts to teach Joey to read. Within about 18 months, Joey went from reading at a second-grade level to reading at a sixth-grade level. Had this teacher not intervened, Joey would have graduated from high school with only rudimentary reading skills. Although his sixth-grade reading skills were not as high as expected of high school graduates, they provided some reading independence for Joey and allowed him to eventually pass the state’s mathematics graduation test without a reader.6 (Phillips, 2010, chapter 5)

6 Incidentally, when the items on the state graduation test were divided into two categories, symbolic text and story problems, the percent of items Joey answered correctly was similar for both categories.

This example illustrates an important trade-off in providing debatable testing variations such as readers for mathematics tests. Joey’s parents pressured his teachers to read all instructional materials aloud to him so he could avoid the frustration of struggling with a difficult skill. The downside was that Joey became totally dependent on the adults in his life to read everything to him. Yet, when he finally was expected to learn reading skills, he was able to develop some functional independence. Although he still may require assistance with harder materials, there are many simpler texts that he can read solo.

Calculators. Suppose a mathematics test measures paper-and-pencil computation algorithms and estimation skills. If calculators are not permitted for a standard administration of the test but are allowed for students with the learning disability of dyscalculia (difficulty handling numerical information), the resulting scores will represent different mathematical skills and will not be comparable. For example, consider this item:

About how many feet of rope would be required to enclose a 103′′ by 196′′ garden?
A. 25    ∗B. 50    C. 75    D. 100

Students without calculators might round 103 and 196 to 100 and 200, calculate the approximate perimeter as 2(100) + 2(200) = 600 inches, and convert to feet by dividing by 12. Students with calculators would likely use the following keystrokes: 1 0 3 + 1 0 3 + 1 9 6 + 1 9 6 = ÷ 1 2 =. The calculator would display 49.833 . . . and the student would only need to choose the closest answer: 50. Clearly, use of the calculator would subvert the skills intended to be measured because the student would not have to do any estimation or computation to select the correct answer. In this case, the student’s disability is not an irrelevant factor to be eliminated but rather an indication that the student does not have the cognitive skills measured by the mathematics test. Moreover, classifying calculator use as a modification for a mathematics test with such items is based on evaluation of the test specifications as they are currently written. If critics want students tested on different skills (e.g., calculator literacy and problem solving rather than estimation and paper-and-pencil computation), they should first convince policymakers to change the test specifications and corresponding mathematics content standards.
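Written out, the two solution paths just described reach the same answer choice through very different skills; the following worked arithmetic is offered only as a sketch (the exact 598-inch perimeter is implied by the keystrokes above but is not stated in the chapter):

\[
\begin{aligned}
\text{Estimation (no calculator):}\quad & P \approx 2(100) + 2(200) = 600 \text{ inches}, & 600 \div 12 &= 50 \text{ feet}\\
\text{Exact (calculator):}\quad & P = 103 + 103 + 196 + 196 = 598 \text{ inches}, & 598 \div 12 &\approx 49.83 \text{ feet}
\end{aligned}
\]

Only the first path exercises the rounding, approximation, and unit-conversion skills the item was written to measure; the second reduces the task to keystrokes and answer matching.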

Construct Fragmentation

In some cases, when a nonstandard test administration assists with or removes a single, minor factor from the construct measured on a test, a reasonable argument may be made that it is extraneous or not essential to the essence of the construct. However, if several parts of the construct are removed concurrently by providing multiple testing variations to the same students, the remaining parts of the construct that are tested may seriously distort the intended measurement. Allowing any student or group of students to choose which parts of the construct they will be tested on and which parts of the construct will be removed or receive assistance allows these students to be tested on different definitions of the construct and produces noncomparable scores.

For example, to assist students with learning disabilities on a mathematics problem-solving test, a test administrator might (a) permit calculator use, arguing computation is not part of the intended skill; (b) read the test aloud, arguing reading the problems is not part of mathematics knowledge; (c) provide a reference sheet with formulas and measurement unit conversions, arguing memorization of that information is unimportant; (d) remove extraneous information from each problem, arguing such information is an unnecessary distraction; and (e) eliminate one of the answer choices, arguing that with fewer choices, the student will be more focused on the task. If all these parts of the construct of solving mathematics problems, each judged individually to not be part of the intended measurement (i.e., construct irrelevant), are removed from the tested skill to provide greater “access” for students with learning disabilities, the only remaining tasks are to select the correct formula, plug the numbers into the calculator, and find the matching or closest answer choice. Similar arguments can be made for administration of a reading comprehension test to ELLs with decoding, language complexity, more difficult vocabulary, and nonessential text removed from the test, and each test question relocated to directly follow the paragraph containing the answer. In both examples, the intended construct has been fragmented into pieces, selected pieces have been altered or removed, and the reassembled pieces have produced a distorted construct. If the altered and lost fragments of the construct were part of the original test specifications and corresponding content standards, content validity judgments will characterize the tested construct as qualitatively different from that measured by a standard test administration.

Construct Shift

In addition to fragmenting the construct into removable parts with multiple testing variations, there is another problem with the logic sometimes used to identify “appropriate” testing variations. According to the Test Standards, the construct which defines the skills intended to be measured should be a property of the test and should be defined by the test’s content objectives and specifications. But, for example, when a reading test is administered with a reader for students with learning disabilities but not for regular education students who are poor readers, or when language assistance is provided to ELLs with English language skill deficiencies but not to non-ELLs with similar deficiencies, the construct has been redefined by group membership. For the reading test, the construct tested for students without learning disabilities is reading comprehension but for students with learning disabilities is listening comprehension. Similarly, for non-ELLs, the construct is content (reading or mathematics) in English, but for ELLs, the construct is content only with the effects of language removed to the extent possible. Thus, the definition of the construct has shifted from a property of the test to a property of the group to which the student belongs. When tests measuring different constructs are administered to students in different subgroups, the resulting scores are not comparable and, according to the Test Standards, should not be interpreted as having the same meaning (e.g., proficient, passing) for all students.

Public Policy Exceptions

Normally, the award of diplomas, licenses, and credentials conditioned on the achievement of a specified test score should only be made when passing scores are obtained from standard administrations or nonstandard administrations that produce comparable scores. However, there may be extraordinary circumstances for which a special waiver of the testing requirement may be appropriate.

For example, suppose a student who has been taking accelerated college-preparatory courses and receiving “A” grades is involved in a tragic automobile accident just prior to the initial administration of the graduation test in eleventh grade. Suppose further that there was extensive evidence that the student’s academic achievement in reading and mathematics had already exceeded the standards tested by the graduation test and that the student had been expected to pass easily with a high score. However, due to the accident, the student is now blind and has no hands. The student can no longer read the material on the reading test visually (no sight) and is unable to learn to read it in Braille (no hands). But administering the reading test aloud via a reader would alter the tested skills from reading comprehension to listening comprehension, producing noncomparable scores.

Nonetheless, as a matter of public policy, such a case might be deserving of a waiver of the graduation test requirement in recognition of the extraordinary circumstances and compelling evidence of achievement of the tested skills. Based on the student’s medical records, transcripts, references, and a passing score on the graduation test obtained with the modification of a reader, an appeals board might determine that this student should receive a test waiver and be eligible to receive a high school diploma if all other graduation requirements are satisfied. Rather than treating this modification as if it were an accommodation, it is preferable to grant a waiver of the testing requirement when a student is otherwise qualified. Waivers granted judiciously avoid pretense and preserve the integrity of the testing program. It should be a rare event for students with disabilities to be able to document achievement of high school level competencies but be unable to access the graduation test without a modification.

Leveling the Playing Field: Access Versus Success

One of the goals of federal legislation for students with disabilities is access. A goal of access reflects an understanding that there is value in having students with disabilities participate in testing programs under any circumstances. One reason policymakers value access may be that it results in greater accountability for school districts in providing instruction and demonstrating educational progress for special-needs students. If this is the case, any nonstandard test administration that moves a student from an exclusionary status to one of being included in the test may be desirable.

However, some advocates have interpreted the goal of federal legislation for students with disabilities to be increased success (higher test scores). Those who support this view argue that students with disabilities should not be penalized for biological conditions outside their control. They believe that the intent of federal legislation was to confer a benefit on students with disabilities that would change the skills being measured from tasks the student cannot do to tasks the student is able to do. The term leveling the playing field has often been used in this context to describe testing modifications that go beyond the goal of access to a goal of success and confuse removing extraneous factors with equalizing test scores.

Increasing access by removing the effects of extraneous factors supports the validity of test score interpretations by requiring students with disabilities to demonstrate all relevant skills while not penalizing them for unrelated deficiencies. Alternatively, increasing success by providing a compensating advantage to offset relevant deficiencies so that students with disabilities have an equal opportunity to obtain a high or qualifying score relative to their nondisabled peers decreases the validity of the resulting test scores as indicators of achievement of the skills the test is intended to measure.

Leveling the playing field to equalize success is analogous to a golf handicap tournament. Based on prior performance, golfers are given a handicap that is subtracted from the final score. In a handicap tournament, a poor golfer who plays well on a given day can win over a good golfer with a lower score who has not played as well as usual that day. Handicapping prevents the good golfers from always winning.

Similarly, using a handicapping model for interpreting leveling the playing field would create a testing scenario in which the highest achieving students would not always receive the highest scores. In such a scenario, the test would be viewed as a competition for which all participants, regardless of their level of knowledge and skills, should have an equal chance of obtaining a proficient score. Under this interpretation, each student with a disability would be given whatever assistance was necessary to neutralize the effects of the student’s disability. For example, students who were not proficient at estimation would be given a calculator for those items; students who had difficulty processing complex textual material would be given multiple-choice items with only two choices rather than the usual four. Low-achieving students whose poor performance could not be linked to a specific disability would not receive any assistance; they would be required to compete on the same basis as the rest of the regular education students. The settlement of a federal case in Oregon illustrates some of the concerns with using the goal of equal opportunity for success rather than equal access to level the playing field for students with disabilities.

Oregon Case Settlement

Advocates for Special Kids v. Oregon Dep’t of Educ. (2001) involved a ban on the use of computer spell-check on a tenth-grade writing test for which 40% of the score was based on spelling, grammar, and punctuation.7 Affected students and their parents claimed the ban discriminated against students with learning disabilities. A plaintiff student with dyslexia stated that “[w]hen they test me in spelling, they’re testing my disability, which isn’t fair. It’s like testing a blind man on colors.” Unconcerned that her spelling deficit would hinder her career, she also stated that “[a] lot of things I like are hands-on, [a]nd if I become a writer, they have editors for that” (Golden, 2000, p. A6).8

In a settlement agreement between the plaintiffs and the state testing agency, the parties agreed to changes in the state testing program based on recommendations of an expert panel convened to study the issue. In the settlement, the state agreed to permit all requested nonstandard administrations on its tests unless it had empirical research proving that a specific nonstandard administration produced noncomparable scores. Contrary to the Test Standards, which create an assumption of noncomparability of scores from nonstandard administrations in the absence of credible evidence of score comparability, this agreement effectively created a default assumption that a requested nonstandard test administration produced comparable scores unless the state could produce empirical evidence that proved otherwise.

This settlement reflected the reality that negative publicity can motivate parties to settle a lawsuit contrary to accepted professional standards. Effectively, agreeing to these conditions meant that testing variations such as a reader for a reading test, a calculator for a computation or estimation test, or spell-check and/or grammar check for a writing test would automatically be provided to students with learning disabilities who regularly used them in instruction and requested them for the test, even though most psychometricians would agree that they altered the construct intended to be tested and produced noncomparable scores. In addition, because states often have few resources for research and often not enough students with a given disability requesting a particular testing variation to conduct a separate study, this settlement made it nearly impossible for the state to place limits on nonstandard test administrations for students with disabilities, even when logical evidence clearly indicated that the scores were not comparable. Moreover, if the state did impose any limits on nonstandard test administrations, the state was required to provide an alternate assessment for the test that probably would not have produced comparable scores to those obtained from standard test administrations.

7 Technically, passing the tenth grade test was not required for receipt of a diploma. Nonetheless, retests were provided, and it functioned like a graduation test of minimal skills expected of high school graduates.
8 Note, however, that a color-blind individual cannot obtain a pilot’s license.

To date, no court has ordered a state to interpret nonstandard administrations that altered the skills intended to be measured and produced noncomparable scores as equivalent to standard administrations because federal disability law does not require it. Had the Oregon case gone to trial, it is unlikely that the court would have done so. Unfortunately, despite the fact that this settlement applied only to the Oregon testing program, it was used by advocates to pressure other states to grant similar concessions.

No Child Left Behind (NCLB) Modified Tests

The issue of access versus success and its implications for the validity of the resulting test score interpretations is especially relevant to the modified tests permitted under the NCLB Act (2000) guidelines for students with disabilities who have had persistent academic difficulties. Federal guidelines require on-grade-level content but permit cognitive difficulty to be reduced. States may count up to 2% of students with disabilities as proficient on such modified tests. Many states have reduced the cognitive difficulty of test items from standard administrations by reducing the number of answer choices, decreasing language complexity, removing nonessential information, adding graphics, shortening the test, and so on. However, as described previously, the intersection of content and cognitive difficulty defines the content coverage and grade-level constructs intended to be measured by most achievement tests. Thus, if a modified test retains the content designations but changes the cognitive difficulty of the items, the measured construct will change and the resulting test scores will not be comparable to scores from the on-grade-level test unless the two tests have been statistically linked. If so, the modified test score corresponding to proficient will typically be proportionally higher than for the on-grade-level test. Moreover, recent research has suggested that some of the item modifications intended to increase access (or success) may be ineffective (Gorin, 2010).
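For readers unfamiliar with the term, one common form of statistical linking is a linear (mean-sigma) link that places modified-test scores on the on-grade-level scale; the formula below is offered only as an illustrative sketch, not as a procedure prescribed by the NCLB guidance or by any particular state:

\[
y^{*} = \mu_{Y} + \frac{\sigma_{Y}}{\sigma_{X}}\,(x - \mu_{X})
\]

where \(x\) is a score on the modified test, \(\mu_{X}\) and \(\sigma_{X}\) are the modified test’s mean and standard deviation, \(\mu_{Y}\) and \(\sigma_{Y}\) are the corresponding values for the on-grade-level test, and \(y^{*}\) is the linked score reported on the on-grade-level scale. Only when some such link has been established can a proficient cut score on the modified test be located relative to the on-grade-level scale.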

For example, consider the fifth-grade mathematics story problem and the simplified item for the same problem shown below.

Original Item: (The field test percent of students choosing each option is given in parentheses.)

Tom divided his 12 cookies equally between himself and his 4-year-old sister Sara, who then ate one-third of her share. How many cookies did Sara have left?
A. 2 (22%)
B. 3 (2%)
∗C. 4 (72%)
D. 6 (4%)

Modified Item:
Sara has 6 cookies. She eats one-third of them. How many cookies are left?
A. 3
B. 4
C. 6
[The graphic that accompanies the modified item is not reproduced in this transcript.]

In the original item, the student must analyze the problem; determine several relationships; use mathematical reasoning to understand what happened; identify “4” as an irrelevant number; translate “divided,” “one-third of,” and “left” into the operations of division, multiplication of fractions, and subtraction; set up the problem; and solve for the answer. The modified item presents the same fractional computation but with less language complexity, fewer steps required to solve the problem, and fewer answer choices. Tom is removed completely, extraneous information has been eliminated, and the problem statement is broken into two simpler sentences using present tense verbs, no relational words, fewer modifying phrases, and fractions written as numbers rather than words. There are only three answer choices, the most attractive wrong answer has been eliminated, and the graphic demonstrates how to set up the problem.

The modified item still involves the same content (fractions), but the cognitive difficulty has been reduced because there are fewer steps, no extra information, and the student can obtain the correct answer from the graphic by counting. If, as part of a content validity judgmental review of these two items, elementary mathematics teachers stated that the modified item matched the initial introduction of fractions skills described in the fourth-grade content standards and the original item matched the fractions skills described in the fifth-grade content standards, these judgments would provide logical evidence that use of the modified item on the fifth-grade mathematics test would alter the construct intended to be measured and produce noncomparable scores. Thus, as judged by the mathematics teachers, students who could correctly answer the original item would have demonstrated more mathematical skill at a higher grade level than students who could correctly answer the modified item. Further, the modified item would be easier to guess correctly with fewer answer choices and the most attractive wrong answer eliminated.9
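As a sketch of the arithmetic each version demands (the worked steps below are illustrative and are not part of the field-test data):

\[
\begin{aligned}
\text{Original item:}\quad & 12 \div 2 = 6 \text{ cookies for Sara}, & 6 - \tfrac{1}{3}(6) &= 4 \text{ left}\\
\text{Modified item:}\quad & 6 \text{ cookies given directly}, & 6 - \tfrac{1}{3}(6) &= 4 \text{ left}
\end{aligned}
\]

In addition, a blind guess succeeds 1 time in 4 on the original item but 1 time in 3 on the modified item, consistent with the observation that the modified item is easier to guess correctly.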

9 If the rationale for eliminating an answer choice is to reduce the reading load, reduce fatigue, and/or compensate for a short attention span, all of which might be considered construct-irrelevant factors for a mathematics test, this author’s opinion is that it would be more defensible to eliminate the least-often-chosen wrong answer. Eliminating the most attractive wrong answer appears to inappropriately address success rather than access.

The federal guidelines appear to intend these results, and there is no conflict with the Test Standards if scores from tests with modified items are interpreted as measuring a modified construct and as comparable only when statistically linked to the on-grade-level score scale. Some researchers who have investigated various item modifications in multiple states describe their modified items as providing greater access for students with disabilities with persistent academic difficulties (Elliott et al., 2010). However, their criteria for a useful item modification include a narrowing of the gap between average scores of students with and without disabilities. This criterion sounds more like one of success than access because it assumes that the only reason students with disabilities have performed poorly on the on-grade-level tests is due to lack of “access” to the test questions. However, some research studies have gathered evidence suggesting that the real problem for some of these students with disabilities is lack of “access” to instruction on the tested skills (Stoica, 2010). When this is the case, altering the content by modifying the items may not be addressing the real access problem faced by these students with disabilities. Thus, a greater narrowing of the achievement gap (success) for students with disabilities eligible for modified tests may be realized through a closer match between these students’ instruction and the on-grade-level tested skills (access through instruction rather than access through item modification).

Eligibility for Nonstandard Test Administrations

There is a difference between truly needing something and wanting something. Humans need food and water to survive, but they do not need to eat apple pie and drink coffee to survive. Some people may want the latter and be unhappy if they do not get it, but they can meet their survival needs with other less appealing foods and plain water. Similarly, students who are blind need an alternative method to access written text or they will be unable to provide any response to the test questions. However, a student with a learning disability does not necessarily need to have a reader. The student with a learning disability may have sufficient reading skills to respond to the test questions but may want a reader to achieve a higher score and avoid the frustration of struggling with a difficult task.

The challenge for test administrators is to separate those students who truly cannot participate without a nonstandard test administration from those who simply want the additional assistance to make the task easier. Nonetheless, even a nonstandard test administration that is truly needed for a student to access the test may invalidate the intended score interpretation. In such a case, policymakers must determine whether the benefits of inclusion with a modification outweigh the costs of losing interpretive information and being excluded from aggregate results. In addition, if a modification confers a benefit on a specific subset of students, it is important to determine who qualifies for the benefit. In this case, relevant considerations may be these: recent written documentation of the disability by a trained professional, routine provision of the modification in the student’s educational program, an appropriate relationship between the disability and the desired modification, and the availability of facilities for providing the modification.

In the past, most testing programs have relied on the student’s IEP or 504 Plan to certify a disability. In many cases, achievement below expectations has been sufficient to label a student disabled. However, two Supreme Court decisions suggest a narrower interpretation of the term disability. In these cases, the Court held that persons are disabled only if substantial life activities are affected after correction or mitigation of an impairment (Murphy v. United Parcel Service, 1999; Sutton v. United Airlines, 1997). The Court held that an employee with a corrected impairment was not disabled even if the person theoretically would have difficulties in the absence of medication or corrective apparatus (e.g., medication for high blood pressure or eyeglasses). This holding suggests a duty to take feasible corrective action for a disability and then be evaluated for the level of impairment.

In an educational context, this holding may mean that a student would qualify as having a reading disability and be eligible for a reader only if it is impossible for the student to learn to read, not merely because the student might have difficulty learning to read or because the student’s current skill level might be significantly below grade level. Nonetheless, a student who is blind and for whom reading printed text is impossible would still qualify for a Braille administration. Such an interpretation would have the advantages of reserving nonstandard test administrations for those whose conditions cannot be modified or corrected and reducing any incentive to label a student disabled whenever the student has difficulty with academic tasks.

Undue Burdens

Case law has established that an accommodation is not required if it imposes an undue burden. However, most interpretations have required extreme expense or disruption for a requested nonstandard test administration to qualify as an undue burden. Most testing programs are burdened with extra costs and scheduling difficulties for some nonstandard administrations. For example, more space and substantially increased numbers of test administrators are required when significant numbers of students are tested individually. An individual administration may be required due to frequent breaks, attention problems, readers, use of specialized equipment, medical monitoring, and so on, but such administrations may also pose a major resource allocation challenge for schools when testing windows are narrow. At issue is when such burdens rise to the level of an undue burden such that the requested administration need not be provided.

A situation in which a very expensive testing variation has been requested for a single student may qualify as an undue burden. For example, for one state graduation test, a student’s parents requested a 99-point large-print version. This size print allowed only one to two words per page and required special large-paper versions for diagrams. The cost of producing such a large-print version for a single test form was over $5,000. Because it was a graduation test, there was also the possibility that additional versions would have to be created if the student was unsuccessful on the initial attempt. In this example, the student had a visual impairment that could not be further corrected. Thus, it was impossible for the student to take the test without the requested testing variation. However, the cost was significant for the state to pay for a single student. In such a case, the requested nonstandard test administration might be judged an undue burden and a state might not be required to provide it.10

10 The stated reason for requesting the nonstandard administration was a parental desire for the student to attend a particular state university that required a high school diploma for admission. Due to the severity of the disability (the student could only work on the test for about ten minutes at a time and was tested in small segments over several weeks), some educators questioned the student’s ability to handle college level coursework even if the student passed the modified test. Note also that a computerized test administration, unavailable at the time, might provide adequate magnification at less cost.

Moreover, state policymakers must prioritize the use of available resources for nonstandard administrations. It may not be possible to provide the test in all requested formats, even when such requests are consistent with state policies. An alternative may be to allow local test administrators to decide which nonstandard administrations to provide. Such a policy would have the advantage of limiting the state cost, but would have the disadvantage of resulting in similarly situated students being treated differently depending on local policy in their geographic area. In addition, there would probably still be a duty for state policymakers to provide guidance to local policymakers regarding accommodation/modification decisions, appropriate use and interpretation of the resulting test scores, and informed consent of parents/students.

Graduation Testing Challenges

Some states with graduation tests award high school diplomas to students with disabilities who have completed their IEPs, ELLs who have taken a translated test, or students who have passed the graduation test with modifications. However, the tendency of states to tie testing modifications to specific disabilities (or lack of language proficiency for ELLs) indicates that they implicitly recognize that test scores from modified administrations are not comparable to those obtained from standard test administrations. If a state allows scores from modified test administrations to be interpreted the same as scores from standard administrations, it has conferred a benefit on those students who received the modifications. When benefits are conferred, the state must establish eligibility criteria and evaluate the qualifications of persons who seek the benefit. Effectively, the conferring of such benefits means that if a student can document a disability or ELL status, the student is permitted to satisfy the state testing requirement by substituting an alternative skill and/or being exempted from portions of the tested skill.

Instead of awarding diplomas to students who have completed IEPs, passed translated tests, or received the benefit of modifications on a graduation test, some states substitute a certificate of completion for these students. This policy, challenged in recent litigation, was influenced by the early challenges to graduation testing of special education students referenced previously and challenges to graduation tests by minority students claiming racial/ethnic discrimination. The graduation test challenges claiming racial/ethnic discrimination, Debra P. v. Turlington (1984) and GI Forum v. Texas Educ. Agency (2000), are discussed next, followed by additional graduation testing cases involving students with disabilities and ELLs.

The Debra P. and GI Forum Cases

In the Debra P. case, African-American students subject to a Florida graduation test requirement alleged that it was unconstitutional because they had received an inferior education in segregated schools and because they had too little time to prepare for it. Initially, the court announced two new requirements for graduation tests, notice and curricular validity, and ordered Florida to award diplomas to students who had failed the graduation test but satisfied all other graduation requirements. The court defined curricular validity as evidence that the tested skills were included in the official curriculum and taught by the majority of teachers. Once Florida produced sufficient evidence of the test’s curricular validity and students had been given at least 4 years’ notice of the requirement, the court held that Florida could require students to pass the graduation test to earn a high school diploma.

In the GI Forum case, Hispanic and African-American students alleged that the Texas graduation test discriminated against them because they received inadequate educational preparation and the test failed to satisfy professional standards. The GI Forum court reaffirmed the holdings of the Debra P. court and ruled in favor of the state, finding

While the [graduation test] does adversely affect minority students in significant numbers, the [state] has demonstrated an educational necessity for the test, and the Plaintiffs have failed to identify equally effective alternatives. . . . The [state] has provided adequate notice of the consequences of the exam and has ensured that the exam is strongly correlated to material actually taught in the classroom. In addition, the test is valid and in keeping with current educational norms. Finally, the test does not perpetuate prior educational discrimination. . . . Instead, the test seeks to identify inequities and to address them. (p. 684)

Notice

Notice requires the state to disseminate information about graduation test requirements to all affected students well in advance of implementation. Notice periods of less than 2 years have been found unacceptable by the courts; notice periods of 4 years in the Debra P. case and 5 years in the GI Forum case were found acceptable. The courts have not mandated a specific length for the notice period; with extensive dissemination efforts, solid curricular validity, and systematic achievement testing of prerequisite skills in earlier grades, a 3-year notice period may be adequate.

There has been some debate about whether the notice period applies to the first administration of the graduation test or to students’ scheduled graduations. The Debra P. and GI Forum cases referred to the latter, but satisfying curricular validity may require a longer notice period to allow students to complete all the coursework covering the tested skills. For example, if a graduation test administered in the spring of eleventh grade includes Algebra II content, and students must take courses covering Pre-algebra, Algebra I, and Plane Geometry prior to taking Algebra II, notice must be given by seventh grade so students can enroll in Pre-algebra no later than eighth grade. In this case, notice would occur 4 years before the first test administration and 5 years before the students’ scheduled graduations.
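Laid out year by year under the course sequence just described (a sketch assuming one mathematics course per year and spring testing), the timeline is

\[
\text{notice (grade 7)} \rightarrow \text{Pre-algebra (8)} \rightarrow \text{Algebra I (9)} \rightarrow \text{Plane Geometry (10)} \rightarrow \text{Algebra II and first test (11)} \rightarrow \text{graduation (12)},
\]

so the first administration falls \(11 - 7 = 4\) years after notice and the scheduled graduation \(12 - 7 = 5\) years after notice.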

Curricular Validity

The curricular validity requirement, also referred to as opportunity to learn (OTL), was included as Standard 8.7 in the 1985 version of the Test Standards and carried forward as Standard 13.5 in the 1999 revision. OTL means that students must be taught the skills tested on a graduation test. In practice, evidence of OTL is often gathered by examining the official curricular materials used in instruction and surveying the work of teachers to determine whether they are teaching the tested content. In the GI Forum case, the court held, on all the facts and circumstances, that the state had satisfied the curricular validity requirement with a mandated state curriculum, teacher committee item reviews that considered adequacy of preparation, remediation for unsuccessful students mandated by statute, and continuity of the graduation test with its predecessor, which was based on the same curriculum and for which an OTL survey of teachers had been completed.

Retests and Remediation

Multiple retests combined with substantial remediation efforts were persuasive in the Debra P. and GI Forum cases. The Debra P. court stated


[The state's] remedial efforts are extensive . . . . Students have five chances to pass the [graduation test] between the 10th and 12th grades, and if they fail, they are offered remedial help . . . . All [of the state's experts] agreed that the [state's remediation] efforts were substantial and bolstered a finding of [adequate opportunity to learn]. (pp. 1410–1411)

In the GI Forum case, the Texas Education Code provided: "Each school district shall offer an intensive program of instruction for students who did not [pass the graduation test]" (§ 39.024 (b)), and the court held

[A]ll students in Texas have had a reasonable opportunity to learn the subject matters covered by the exam. The State's efforts at remediation and the fact that students are given eight opportunities to pass the [graduation test] before leaving school support this conclusion. (p. 29)

Notice and Curricular Validity for Special Education Students

Contemporaneous with the Debra P. and GI Forum cases, lawsuits in Illinois and New York addressed graduation testing of students with disabilities. These cases upheld the application of graduation test requirements to students with disabilities and addressed the notice and curricular validity requirements as applied to special education students.

The Brookhart Case

The Brookhart v. Illinois State Bd. of Educ. (1983) case addressed the due process requirements for graduation tests as applied to students with disabilities. It involved a minimum-competency test mandated by a local school district in Illinois. The testing requirement was imposed in the spring of 1978 and became effective for the spring 1980 graduating class. Prior to their scheduled graduations, regular and special education students had five opportunities to pass the three-part test of reading, language arts, and mathematics skills at a passing standard of 70%. Students who failed the graduation test received certificates of completion instead of a high school diploma.

Several students with disabilities who had successfully completed their IEPs but who had failed the graduation test and been denied diplomas filed a lawsuit to challenge the testing requirement and force their districts to award them diplomas. They alleged that their constitutional rights had been violated by insufficient advance notice of the graduation test requirement. A variety of disabilities were represented among the students challenging the testing requirement, including physical disabilities, multiple impairments, mild mental retardation, and learning disabilities. The trial court ruled against the students, stating

It is certainly true that giving a blind person a test from a printed test sheet discloses only his [disability] and nothing of his knowledge. To discover a blind person's knowledge, a test must be given orally or in [B]raille . . . . This . . . does not mean that one can discover the knowledge or degree of learning of a [cognitively] impaired student by modifying the test to avoid contact with the [cognitive] deficiency. To do so would simply be to pretend that the deficiency did not exist, and to fail completely to measure the learning . . . . This position does not involve any lack of compassion or feeling for people living with serious [disabilities]; it simply involves the avoidance of pretense, the integrity of a knowledge-testing program, and reserves some meaning for a high school diploma in relation to the attainable knowledge and skills for which the schools exist. (p. 729)

The trial court also held that awarding a diploma to a student with a disability who did not receive extended notice of the graduation testing requirement

is a misunderstanding and debasement of the concept of due process of law. . . . [D]ue process of law does not require pretending that such standard has been achieved by a person whose [disability] clearly makes attainment of that standard impossible or inappropriate . . . . The means to avoid [the stigma of failing to receive a high school diploma] exists in attainment of the necessary knowledge. If capacity for such does not exist, the law does not require pretense to the contrary. (pp. 730–31)

The trial court seemed to be questioning whether extended notice would really change the learning outcomes for students with disabilities whose IEPs did not already contain provisions for academic instruction at a high school level.

The students appealed this decision, and the appeals court held that students with disabilities may be subject to a graduation test requirement. But the court also held that (1) parents and educators must make an informed decision regarding whether the tested content should be included in the students' IEPs and (2) 1 1/2 years is not sufficient notice for students with disabilities to prepare for a graduation test. The court stated

[I]n an educational system that assumes special education students learn at a slower rate than regular division students, a year and a half at most to prepare for the [test] is insufficient. . . . [P]arents and teachers may evaluate students and conclude that energies would be more profitably directed toward areas other than [test] preparation . . . . Here, however, parents had only a year to a year and a half to evaluate properly their children's abilities and redirect their educational goals. . . . [T]his was insufficient time to make an informed decision about inclusion or exclusion of training on [the tested] objectives. (p. 187)

The court also found that up to 90% of the tested skills had not been included on the students' IEPs, an 18-month notice period was not sufficient to remedy this deficiency, and expert opinion suggested inappropriate school district predictions of failure without giving the students with disabilities an adequate chance to learn the tested skills. Because a number of the plaintiffs had been out of school for several years, the court determined that it would be unreasonable to expect these students to return to school for further remediation and that they should be awarded diplomas as the remedy for the notice violation.

The Ambach Case

The Bd. of Educ. of Northport v. Ambach (1981) case was filed by the Commissioner of Education in New York against a local school district. The school district had awarded high school diplomas to two students with disabilities who had failed to pass the mandatory state graduation tests in reading and mathematics. The two students with disabilities – one with a neurologic disability affecting computation ability and the other who was trainable mentally retarded – had successfully completed their respective IEPs. The Commissioner sought to invalidate the diplomas awarded to these students with disabilities and any others who had not passed the required tests.

In challenging the Commissioner's order on behalf of these students with disabilities, the school district alleged a Section 504 violation. In holding that there was no denial of a benefit based solely on the students' disabilities, the trial court stated

Section 504 may require the construction of a ramp to afford [a person who is wheelchair bound] access to a building but it does not assure that once inside he will successfully accomplish his objective. . . . Likewise, § 504 requires that a [disabled] student be provided with an appropriate education but does not guarantee that he will successfully achieve the academic level necessary for the award of a diploma. (p. 569)

The trial court also found no constitutional violations because students with disabilities are not a protected class, and education is not a fundamental right. Nevertheless, after acknowledging that it normally does not interfere in academic decisions, the court found that "[t]he denial of a diploma could stigmatize the individual [students] and may have severe consequences on their future employability and ultimate success in life" (p. 574). Although the testing requirement became effective 3 years after its enactment, the court found that these students' constitutional rights had been violated by a lack of timely notice because written state guidelines covering the testing of students with disabilities were not issued until the year the testing requirement became effective.

The Ambach (1982) appeals court disagreed, holding

[I]t is undisputed that [the two disabled students] were performing at a level expected of elementary students and the record is clear that their mental deficits were functional, thereby depriving them of ever advancing academically to the point where they could be reasonably expected to pass the [graduation test]. Thus, the issue of whether the three-year notice [period was adequate] is irrelevant. No period of notice, regardless of length, would be sufficient to prepare [these two disabled students] for the [graduation test]. However, . . . [there may exist disabled] students with remedial mental conditions who, with proper notice, might [have] IEPs adopted to meet their needs so as to prepare them to pass the [graduation test]. As to them, . . . we hold that the three-school-year notice given here . . . was not of such a brief duration so as to prevent school districts from programming [their IEPs] to enable them to pass the [graduation test]. (pp. 687–688)

Thus, the issues of adequate notice of the graduation test (which must be at least as long for students with disabilities as for nondisabled students) and the curricular validity of the test (evidence that students with disabilities have been given the option to be taught what is tested) are important legal requirements for students with disabilities in graduation testing cases.

Recent Challenges to Graduation Tests by Students with Disabilities

Historically, special education students could receive a high school diploma by remaining in school for 12 years and completing their IEPs at whatever level of achievement the writers of those plans deemed appropriate. So there was significant resistance when graduation testing requirements were applied to special education students in states where there was no exception for students with disabilities unable to demonstrate high school level skills or when a waiver process provided for those students was not automatic. The Rene v. Reed (2001) case in Indiana and the Chapman (Kidd) v. State Bd. of Educ. (2003) case in California are examples of such challenges.

The Indiana Case

The Indiana graduation test, adopted in 1997, was a subset of the English language arts (ELA) and mathematics items from the tenth-grade state accountability test. Students with disabilities could obtain a waiver of a failed graduation subtest if their teachers certified proficiency in the tested skills supported by documentation, the principal concurred, remediation and retesting were completed as specified in their IEPs, the students maintained a "C" average, and their attendance was at least 95%.

Rene had been in special education since first grade. Her IEP indicated that she was in the diploma program and that she was to be provided with a reader and a calculator for all tests. The graduation test was administered to Rene several times without a reader or calculator and she did not pass. Rene and other special education students challenged the graduation test requirement claiming that it violated their constitutional rights because they had previously been exempted from standardized tests, had not been given sufficient notice of the graduation test requirement to adjust their curricula to include the tested skills, and would not qualify for a waiver because their teachers were unable to certify that they had achieved proficiency on the tested skills. They also alleged a violation of the IDEA because the state had refused to allow certain nonstandard test administrations provided for in their IEPs. The state countered that the 5 years' notice given to schools and the 3 years' notice given to students and their parents were sufficient, and the state was not required to provide nonstandard test administrations which altered the tested skills and produced noncomparable scores.

By the time the case was decided, Rene was the only remaining class representative of the original four because one had dropped out of school, one had received a waiver, and one had passed the graduation test. Citing the Debra P. case, the Rene trial court held there was no constitutional violation because (1) these students had received adequate notice and (2) if there was a lack of curricular validity because these students had not been taught the tested skills, the appropriate remedy was additional remedial instruction, not the award of a diploma.

The students appealed the trial court's decision. The appeals court concurred with the trial court holding that 3–5 years' notice provided adequate preparation time and that the remediation opportunities provided to students with disabilities were an adequate remedy for any prior failure of the schools to teach them the tested skills. The appeals court also held that the state was not required to provide students with disabilities modifications of the graduation test that the state had determined would fundamentally alter the tested skills and produce noncomparable scores. The appeals court also ruled that the IDEA did not require the state to honor all modifications in students' IEPs (e.g., a reader for the reading test), even though some students with disabilities were unable to pass the graduation test without those modifications. The appeals court found

We note . . . that the IDEA does not require specific results, but instead it mandates only that disabled students have access to specialized and individualized educational services. Therefore, denial of a diploma to [students with disabilities] who cannot achieve the educational level necessary to pass a standardized graduation exam is not a denial of the "free appropriate public education" the IDEA requires. Further, the imposition of such a standardized exam [without honoring all modifications provided in instruction] does not violate the IDEA where, as in the case before us, the exam is not the sole criterion for graduation. (p. 745, citations omitted)

The appeals court further distinguished the appropriateness of modifications in educational services provided to students with disabilities through their IEPs and the appropriateness of modifications on a graduation test designed to certify academic skills. The court stated

We cannot say the trial court erred to the extent it determined the [s]tate need not honor certain [modifications] called for in the Students' IEPs where those [modifications] would affect the validity of the test results. The court had evidence before it that the State does permit a number of [testing variations] typically called for in IEPs. However, the State does not permit [modifications] for "cognitive disabilities" that can "significantly affect the meaning and interpretation of the test score."

For example, the State permits accommodations such as oral or sign language responses to test questions, questions in Braille, special lighting or furniture, enlarged answer sheets, and individual or small group testing. By contrast, it prohibits [modifications] in the form of reading to the student test questions that are meant to measure reading comprehension, allowing unlimited time to complete test sections, allowing the student to respond to questions in a language other than English, and using language in the directions or in certain test questions that is reduced in complexity. (p. 746, citations omitted)

The appeals court also cited Office for Civil Rights (OCR) decisions by hearing officers that states were not required to provide readers for reading comprehension tests.

The appeals court summarized its distinction between educational services and graduation testing as follows:

The IEP represents "an educational plan developed specifically for the [student that] sets out the child's present educational performance, establishes annual and short-term objectives for improvements in that performance, and describes the specially designed instruction and services that will enable the child to meet those objectives." The [graduation test], by contrast, is an assessment of the outcome of that educational plan. We therefore decline to hold that [a modification] for cognitive disabilities provided for in a student's IEP must necessarily be observed during the [graduation test], or that the prohibition of such [a modification] during the [graduation test] is necessarily inconsistent with the IEP. (pp. 746–747, citations omitted)

The California Case

The Chapman (Kidd) case was a challenge to a graduation test requirement applied to special education students in California. It changed names from the Chapman case to the Kidd case during the course of the litigation when some of the named plaintiffs met the graduation test requirement and new named plaintiffs were added to take their places.

The California Graduation Test. The California High School Exit Examination (CAHSEE) was enacted in 2001 and was first administered to high school students early in the second semester of tenth grade. It consisted of two untimed tests, ELA and mathematics. The ELA test consisted of 72 multiple-choice items and one open-ended writing prompt that covered the ninth, tenth, and a few eighth-grade ELA content standards with reading and writing weighted approximately equally. The mathematics test consisted of 80 multiple-choice items covering the sixth- and seventh-grade content standards plus the Algebra I content standards. The passing standard was set by the State Board of Education (SBE) at 60% of the 90 ELA total score points (raw score = 54) and 55% of the 80 mathematics items (raw score = 44) on the 2004 tenth-grade census administration and the equivalent scaled score on future test forms (Cal. Dep't of Educ., 2004–2005).

Cumulative passing rates through eleventh grade for both sections of the CAHSEE (ELA and mathematics) calculated by the independent evaluator (HumRRO, 2006) for the classes of 2004 and 2006 are shown in Table 3.1.

Table 3.1 California Graduation Test Passing Rates by Class and Subgroup

CAHSEE passing rates (both tests)     Class of 2004 (%)    Class of 2006 (%)
Special education                     24                   35
Diploma track special education       43                   63
All students                          68                   78

The cumulative passing rates for special education students on a diploma track were estimated using the California Department of Education (CDE) 2005 high school graduation rate of 56% (for special education students not required to pass the CAHSEE) as a proxy for the unknown percent of special education students on a diploma track (Phillips, 2010, Chapter 5). For example, in Table 3.1, the 63% at the middle of the column labeled Class of 2006 was calculated by adjusting the 35% passing rate for all special education students (diploma track plus nondiploma track) to its corresponding value for the estimated 56% of special education students on a diploma track ((35/56) × 100 = 63%).
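As a rough illustration of this adjustment (a sketch only: the 56% diploma-track proxy and the reported percentages come from the passage above, and the function and variable names are illustrative rather than taken from any cited source), the diploma-track estimates in Table 3.1 can be reproduced as follows:

    # Sketch: rescale the passing rate for all special education students
    # to the estimated diploma-track subgroup, using the CDE 2005 graduation
    # rate (56%) as a proxy for the unknown diploma-track share.
    DIPLOMA_TRACK_PROXY = 0.56

    def diploma_track_rate(all_special_education_rate_pct):
        # Divide the subgroup passing rate by the assumed diploma-track proportion.
        return all_special_education_rate_pct / DIPLOMA_TRACK_PROXY

    print(diploma_track_rate(24))  # Class of 2004: about 42.9, reported as 43% in Table 3.1
    print(diploma_track_rate(35))  # Class of 2006: 62.5, reported as 63% in Table 3.1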

SBE Decisions. Final decisions about what the graduation test was intended to measure were made by the SBE, the entity with statutory authority to do so. The Board determined that reading, decoding, and paper-and-pencil mathematics computation were part of the skills intended to be measured (SBE, 2000). These skills were included in prerequisite standards from earlier grades. The high school standards specified that standards from all earlier grades were part of the knowledge and skills high school students were expected to have learned.

The Board also determined that students whose cognitive disabilities described the inability to learn decoding, paper-and-pencil computation, or other tested skills would not be permitted to substitute different skills for the ones intended to be measured by the graduation test. In particular, the Board determined that use of a reader for the reading test and a calculator for the mathematics test would change the construct being measured and would not produce comparable scores to those from standard test administrations (5 California Code of Regulations (CCR) § 1217 (c)).

Students with disabilities in the Class of 2006 who expected to earn a high school diploma had 7 years' notice of the requirement to learn the tested skills and at least five opportunities to pass the graduation test by the time of their scheduled graduation. Special education students could also remain in school beyond their scheduled graduation to continue remediation and work toward earning a high school diploma.

Alternate Assessments. The special education students who challenged the graduation test complained that some students with disabilities were unable to access the CAHSEE because they needed an alternate assessment. However, at that time, the NCLB and IDEA statutes requiring alternate assessments for school accountability did not require alternate assessments for graduation tests. In addition, the problem was not a lack of access resulting from a feature of the test that prevented these students with disabilities from responding to the test questions (e.g., a visually impaired student who cannot see the printed words or a quadriplegic student who cannot blacken the ovals on the answer sheet). Rather, it was a situation where the student had not learned the tested high-school-level content. The fact that the disability may have prevented the student from learning academic skills at a high school level was not an access problem that required an alternate assessment but an indication that the student was not otherwise qualified for a skills-based diploma.

Waiver Policy. The plaintiffs also claimed that some otherwise qualified diploma-track students with disabilities had been denied modifications needed to access the CAHSEE. This was apparently a misunderstanding because California had a waiver process. Students with disabilities were permitted to take the CAHSEE with any modifications specified in their IEPs. Students with disabilities who received the equivalent of a passing score on the CAHSEE with modifications that altered the skills intended to be tested could receive a waiver of the graduation test requirement through a local process specified by law (CEC § 60851 (c)). For diploma-track students with disabilities who had completed appropriate high school level courses such as Algebra I and could pass the CAHSEE with modifications, the waiver process should have been straightforward.

The plaintiffs proposed that students with disabilities in the Class of 2006 and beyond who were unable to pass the CAHSEE be awarded a high school diploma if they had met all other graduation requirements. However, the state's experts countered that exempting special education students in the Class of 2006 from the graduation test requirement while continuing to require passage of the graduation test for regular education students would lower standards for the special education students and exclude them from the benefits of remediation and the opportunity to earn a skills-based diploma provided to their nondisabled peers (Phillips, 2010). It would have in effect created a skills-based diploma for nondisabled students and a rite of passage (seat time) diploma for students with disabilities, contrary to the intent of the CAHSEE statute.

In addition, the state's experts argued that creating different graduation standards for disabled and nondisabled students would provide a substantial incentive for low-performing nondisabled students to be reclassified into special education so they could receive a diploma without passing the CAHSEE. It would also have provided an incentive for limited remedial resources to be redirected away from educating students with disabilities who did not have to pass the CAHSEE to nondisabled students who did. Moreover, creating a double standard would also have devalued the diplomas of students with disabilities in the Class of 2006 who had passed the CAHSEE. Finally, the state and its experts argued that if some students with disabilities who were on a diploma track had not been taught or had not learned all the skills covered by the graduation test, the appropriate educational remedy was additional remedial education, not the award of an unearned skills-based high school diploma (Phillips, 2010, Chapter 5).

Legislative Intervention and Settlement. Prior to trial, the legislature intervened and by law provided that students with disabilities in the Class of 2006 who met certain procedural requirements and all other graduation requirements were entitled to a high school diploma regardless of whether or not they passed the state graduation test (Senate Bill 517, Jan. 30, 2006, amending CEC § 60851 and adding CEC § 60852.3). These special conditions entitling students with disabilities to high school diplomas were extended by the legislature to the Class of 2007 with the proviso that state authorities study the issue and present a recommendation to the legislature for the Class of 2008 and beyond.

On May 30, 2008, the court approved a final settlement of the case. The settlement required the state to contract for an independent study of students with disabilities who had failed the graduation test with accommodations or modifications but satisfied all other high school graduation requirements. The recommendations from the study report were to be considered in formulating the state's response and recommendations to the legislature. In return, all claims against the state by the plaintiff class members from the Classes of 2001 through 2011 were released (see www.cde.ca.gov).

ELL Challenges to Graduation Tests

The 1999 Test Standards indicate that making a professional judgment about the validity and appropriateness of administering graduation or accountability tests to ELLs in English requires evaluation of the purpose for testing and the construct relevance of English language proficiency. Standard 9.3 from the chapter on Testing Individuals of Diverse Linguistic Backgrounds states

The test generally should be administered in the [student's] most proficient language, unless proficiency in the less proficient language is part of the assessment. (p. 98)

ELL graduation test litigation has focused on evaluating the construct relevance of English language proficiency and the appropriate remedy for any OTL deficiencies.

The dilemma for policymakers in selecting appropriate testing condition variations for ELLs is similar to that faced in deciding how to test students with disabilities. There are competing policy goals and costs on opposite sides of each argument. Policymakers must determine which of the competing goals is most consistent with the purpose of the state test, is in the best interests of its students, and has affordable costs. For example, some policymakers might argue that by the time students reach high school, they should be receiving instruction in English. An exception might be newly arrived students who may not be literate in either English or their native languages if they have not received formal schooling in their home countries. Newly arrived students, who must master academic content and achieve proficiency in English, may need to spend additional years in school to earn a high school diploma that requires passing a graduation test administered in English. Alternatives, such as translated tests and testing waivers, may reduce student frustration and allow ELLs to graduate earlier, but they have the disadvantage of awarding a credential to students who have not yet achieved the knowledge and skills in English needed for postsecondary education and employment.

Cases in Texas (GI Forum, 2000) and California (Valenzuela v. O'Connell, 2006) have dealt with the application of graduation testing requirements to ELLs. California state law established a presumption that ELLs in elementary grades would be taught in English, but Texas law established a presumption of bilingual elementary education unless the student was proficient in English. Despite these differing views on ELL education, both states successfully defended their high school graduation tests administered to ELLs in English.

The Texas ELL Case

The Texas Education Code required the provision of bilingual education for ELL elementary students and English as a second language (ESL) instruction for high school students (TEC § 29.053). At each elementary grade level, districts enrolling at least twenty ELL students with the same primary language were required to offer bilingual instruction (TEC §§ 29.055, 29.056). In schools required to offer it, bilingual instruction was expected to be a full-time, dual-language program with basic skills instruction in the primary language, "structured and sequenced English language instruction" and enrichment classes, such as art and music, with non-ELLs in regular classes. Electives could be taught in the primary language. Detailed criteria, including a home language survey, English language proficiency testing, and primary language proficiency testing, were provided for identifying eligible students. Multiple primary languages were represented in Texas districts, but the predominant one was Spanish. State accountability tests were administered in both English and Spanish at the elementary grades but in English only at the upper grades. Hispanic ELLs challenged the refusal of the state to provide a Spanish language translation for the graduation test.

State testing data indicated that ELLs had made progress on the Texas graduation test. For example, in 1994, 14% of ELLs passed all subtests of the graduation test at its initial administration in tenth grade, but by 1999, with more ELL students participating, 31% had done so. The state's ELL expert opined the following:

To suggest that students should graduate without demonstrating minimal knowledge and skills on a uniform measure is not acceptable for the current requirements of the technological and information age job market or for pursuing higher education . . . .

A policy of separating language minority students, many of whom are native born, from the rest of the student population when the [graduation test] is administered is more likely to stigmatize and negatively impact the self-esteem of these students than is their inclusion in the tests. (Porter, 2000, p. 409)

The GI Forum court upheld the graduation test administered in English for all students, including ELLs, stating

[T]he Court finds, on the basis of the evidence presented at trial, that the disparities in test scores do not result from flaws in the test or in the way it is administered. Instead, as the plaintiffs themselves have argued, some minority students have, for a myriad of reasons, failed to keep up (or catch up) with their majority counterparts. It may be, as the [State] argues, that the [graduation test] is one weapon in the fight to remedy this problem. At any rate, the State is within its power to choose this remedy. (pp. 682–683, citations omitted)

The California ELL Case

In the Valenzuela case, the California High School Exit Examination (CAHSEE), effective for the Class of 2006, was challenged in state court by a group of ELLs just prior to their scheduled graduation in the spring of 2006. The ELLs sought a court order barring implementation of the graduation test requirement for students in the Class of 2006. They argued that they had not received an adequate opportunity to learn the tested material because they attended schools lacking fully aligned curricula and fully credentialed teachers. They asserted that this lack of OTL was a denial of their (state) fundamental right of equal access to public school education. In addition, they alleged that ELLs had been disproportionately affected by the scarcity of resources in poor districts.

The state argued that it was appropriate and valid to administer the graduation test in English to ELLs, even when English was not the ELLs' most proficient language, because state law required English proficiency for receipt of a high school diploma and the purpose of the CAHSEE was to determine academic proficiency in English. The California Code of Regulations provided specific rules for the administration of the CAHSEE to ELLs with "accommodations" including supervised testing in a separate room, additional supervised breaks and extra time within a testing day, translated directions, and translation glossaries if used regularly in instruction (5 CCR § 1217).

The Valenzuela lawsuit was filed in February 2006. On May 12, 2006, the trial court granted the requested injunction barring the state from imposing the testing requirement on any student in the Class of 2006. On May 24, 2006, the California Supreme Court stayed the injunction pending review and decision by the appeals court. The ELLs' request to the appeals court for an immediate hearing was denied, and the graduation test requirement remained in force for the Class of 2006.

In August 2007, the appeals court issued its decision vacating the injunction issued by the trial court. The appeals court noted that neither the graduation test requirement nor the validity of the CAHSEE was being challenged. Further, the court held that even if some ELLs had not received an adequate opportunity to learn the tested material, the appropriate remedy was provision of the missed instruction, not removal of the test requirement or the award of diplomas by court order. The court stated

Within the borders of California, until our schools can achieve [academic parity, the CAHSEE] provides students who attend economically disadvantaged schools, but who pass the [graduation test], with the ability to proclaim empirically that they possess the same academic proficiency as students from higher performing and economically more advantaged schools. Granting diplomas to students who have not proven this proficiency debases the value of the diplomas earned by the overwhelming majority of disadvantaged students who have passed the [test]. . . . We believe the trial court's [order] erred by focusing its remedy on equal access to diplomas rather than on equal access to education (and the funding necessary to provide it). . . . The purpose of education is not to endow students with diplomas, but to equip them with the substantive knowledge and skills they need to succeed in life. (p. 18, 27)

The appeals court also found that the scope of the remedy (removal of the test requirement for all students) was overbroad because it provided a potential windfall to students who could not trace their test failure to inadequate school resources. The court stated


[T]he ostensibly interim relief of forcing the "social promotion" of [Class of 2006 students], by ordering that they be given diplomas, in fact does not maintain the status quo of the litigation, but ends it. Surely the trial court did not expect that if [the state] ultimately prevailed in the litigation, students would give back the diplomas they had received under the mandate of the court's [order]. . . . [D]irecting [the state] to give [Class of 2006 students] diplomas . . . would inadvertently have perpetuated a bitter hoax: that the [court-ordered diplomas] somehow would have equipped them to compete successfully in life, even though they had not actually acquired the basic academic skills measured by the CAHSEE. . . . Plaintiffs virtually concede[d] the overbreadth of the trial court's injunction in their argument that some [class members] "actually know the material, but do not pass the [graduation test] due to test anxiety." But plaintiffs have not argued, much less established, that there is any constitutional violation involved in depriving a student of a diploma when he or she has in fact received the educational resources required to pass the CAHSEE, but has not been able to do so because of "test anxiety." (p. 19, 28, 30)

The court's opinion concluded by urging the parties, with the active assistance of the trial court, to work together to provide all seniors in the Class of 2007 and beyond who had not yet passed the graduation test equal access to effective remedial assistance. The case was settled in October 2007 with legislation providing up to 2 years of extra help beyond high school for students unable to pass the high school graduation test (Jacobson, 2007).

Accountability Testing Challenges

Challenges to state accountability tests used to evaluate schools have focused on the appropriateness of testing ELLs in English rather than their primary language. Early challenges were based on state accountability systems, and more recent litigation has focused on state tests mandated by the No Child Left Behind (NCLB) Act (2002). A discussion of cases from California and Pennsylvania follows after a brief review of the NCLB Act and its ELL provisions, ELL testing variations, and related policy issues.

No Child Left Behind (NCLB) Act

In 2002, as a condition for receipt of Title I federal funding for remedial education for low-performing, disadvantaged students, the federal NCLB Act mandated that each state establish challenging academic content standards and annual assessments in reading and mathematics for all students in grades 3–8 and at least one grade in high school by 2006. The challenging academic content standards were required to describe what students were expected to know and do at each grade level and be the same for all students. Achievement standards applied to grade-level assessments which were aligned to the reading and mathematics state content standards were required to include at least three levels: advanced, proficient, and basic. NCLB assessments were also required to be valid, reliable, and consistent with professional standards and had to include alternate assessments for students with the most severe cognitive disabilities. Subsequently, U.S. Department of Education (U.S. DOE) regulations also permitted modified assessments (with content at grade level) for students with disabilities who did not qualify for alternate assessments but for whom the regular statewide assessments were not appropriate due to "persistent academic difficulties" (NCLB Regulations, 2006).

The NCLB Act also required each state to develop accountability measures of school progress toward the goal of all students achieving proficient test scores by 2014. The adequate yearly progress (AYP) determination for each school was required to be based on
• primarily test scores,
• plus graduation rates and at least one additional academic indicator,
• for all students,
• and for ethnic, ELL, students with disabilities (SD), and economically disadvantaged (ED) subgroups of statistically reliable and not personally identifiable size,
• using a baseline established by the higher of the 2002 subgroup with the lowest percent proficient and above or the state's 20th student percentile rank,
• and annual state proficiency targets meeting a timeline of all students proficient by 2014,
• with consequences for consistently failing schools, and
• participation in NAEP fourth- and eighth-grade reading and mathematics assessments.

States were permitted to count up to 1% of their students as proficient on alternate assessments and up to 2% of their students as proficient on modified assessments.

NCLB ELL Provisions

Under the NCLB Act and its Regulations, all ELLs were required to be tested, but states were permitted to exempt from AYP calculations the scores of ELLs in the United States less than one year. ELLs were required to be tested
• on the same grade-level content standards as all other students,
• with measures most likely to yield valid and reliable results,
• with reasonable accommodations,
• to the extent practicable, in the language and form most likely to yield accurate data until they are English proficient. (20 U.S.C. § 6311) [emphasis added]

Similar to the decisions about content standards, proficiency standards, subgroup sizes for reporting results, and annual school targets that were left to the states, with respect to ELLs, the NCLB Act and its Regulations permitted each state to decide what was practicable, the criteria for English proficiency, reasonable accommodations for ELLs, and the language and form of testing that best aligned to the content standards required of all students in the state.

The NCLB permitted, but did not require, a state to use alternative tests for ELLs. For states choosing to administer alternative tests to ELLs, the NCLB Act and its Regulations specified that such tests must be valid, reliable, and aligned to content standards at grade level. Through its peer-review process, the U.S. DOE signaled its interpretation that states administering alternative tests to ELLs for NCLB accountability purposes were required to provide evidence of alignment to grade-level content standards and comparability to the regular, on-grade-level tests administered to non-ELLs. The compliance status letters issued to states in July 2006 indicated that the U.S. DOE questioned 18 states' evidence of grade-level alignment and comparability of primary language or simplified English tests administered to ELLs.

ELL "Accommodations" and Modifications

Many states provide "accommodations" for ELL students. However, the term accommodation is probably inappropriate because lack of language proficiency is not a disability. Disabilities are generally thought to describe characteristics over which students have no control and generally are not reversible over time. However, ELL students can become proficient in English through instruction.

To the extent that a test intends to measure content skills in English, any nonstandard administration that provides assistance with English is providing help with a skill intended to be measured. Thus, the nonstandard administration is compensating for a construct-relevant factor, not an extraneous factor. This is clearly contrary to the definition of an accommodation. Therefore, if testing variations (e.g., bilingual dictionaries or responses in the native language) are provided to give ELLs greater access to a state test, they should be labeled and treated as modifications.

Majority and Minority ELLs

A few states have provided translated tests for some ELLs. However, existing resources typically support, at most, a handful of translated tests that meet professional standards. In many cases, there may be only enough resources to translate the state test for the majority ELL language group.


The equal protection clause of the U.S. Constitution requires similarly situated persons to be treated equally. Court cases based on this clause have invalidated educational programs that favored a majority ethnic group. In particular, any allocation of benefits based on race or ethnicity has been considered suspect, and the high standards required by the courts to justify such programs have rarely been met.

A testing program that provides a benefit of primary language testing to ELL students who speak one non-English language (Language 1), but denies that same benefit to ELL students who speak all other non-English languages, has treated similarly situated students unequally. In the context of the ELL classification, majority group ELLs (Language 1 speakers) would be treated differently than minority group ELLs (speakers of all other non-English languages). However, both majority and minority ELL language groups are similarly situated in that they lack proficiency in English. Using numerical dominance to justify providing translated tests in some languages and not others is unfair to the ELL students who do not receive this benefit and may constitute an equal protection violation.11

Construct Shift

The argument is often made that students should not be assessed in a language in which they are not proficient (Fraser & Fields, 1999). The intended reference is to nonnative speakers of English. However, there are native speakers who perform poorly on tests given in English because they are also not proficient in English. Yet, for the native speaker, test administrators rarely worry about the effects of language proficiency on test performance, and typically these students do not have the effects of poor language skills removed from their scores. Thus, the argument seems to be that the effects of lack of English proficiency should be removed from test scores for nonnative speakers but not native speakers, although both may need intensive additional instruction in English to achieve proficiency. This is another example of a construct shift – altering the tested construct based on group membership rather than following the guidelines in the Test Standards that refer to the tested construct as a property of the test. Cases from California and Pennsylvania considered the issue of construct relevance in deciding whether, for state accountability purposes, ELLs whose least-proficient language was English could be tested in English.

11 Psychometric standards support testing students in the language in which they receive instruction. However, in many cases, bilingual instruction may only be available for a single language. Therefore, the question still remains whether it is fair and equitable to provide primary language instruction and testing to the majority ELL language group but not to other minority ELL language groups.

California State Law Case (2000)

A 1997 California law required that an achievement test designated by the State Board of Education be administered annually to all students in grades 2 through 11. The designated achievement test was the Stanford Achievement Test Ninth Edition (SAT-9) plus an additional set of items selected to measure state standards not measured by the SAT-9. Test scores were used for school accountability with scores for students enrolled for less than 12 months excluded from the computation of a school's accountability index. There were no state-imposed consequences for individual students.

The case began when the San Francisco Unified School District (SFUSD) refused to administer the SAT-9 in English to ELLs with less than 30 months (three school years) of public school instruction (ELLs<30) unless specially designated by their teachers. The state sought a court order to enforce the testing legislation. The Oakland, Hayward, and Berkeley Unified School Districts joined SFUSD in the lawsuit claiming that administration of the SAT-9 to ELLs<30 was unfair because the test measured English language proficiency in addition to content knowledge. They argued that ELL students who lacked English proficiency should either be tested in their primary language or be exempt from testing. State law provided that ELLs with less than 12 months of public school instruction (ELLs<12) be administered a second achievement test in their primary language when available. ELLs<12 were also eligible for testing modifications when tested in English (Cal. Dep't of Educ. v. San Francisco Unified Sch. Dist., 1998).

Plaintiffs' experts opined that ELLs<30 would suffer psychological harm from taking the SAT-9 in English because their low scores would be stigmatizing, would diminish their self-esteem, and would cause them to be inappropriately placed in special education programs and portrayed as having inferior employment skills. In addition, plaintiffs' experts argued that ELLs<30 would score at the chance level, resulting in unreliable test scores. In response, the State argued that a reasonable interpretation of state law indicated an intent to measure academic skills in English, a fair accountability system required the inclusion of all students, the districts and their ELL students benefited from the receipt of state funds targeted toward the improvement of academic skills for low-scoring students, the districts failed to show that any ELLs were harmed by the test administration, and available test data demonstrated that most ELLs scored above chance and their test scores were reliable. In addition, the state argued 30 months was an arbitrary exclusion criterion and that there was significant overlap in the performance of ELLs above and below the 30-month threshold in Plaintiff Districts. Further, over the 3-year period the SAT-9 had been administered statewide, ELLs had made substantial gains in some districts (Phillips, 2010, Chapter 6).

The SFUSD case subsequently settled out of court just prior to trial. In the settlement, the districts agreed to administer the state-designated achievement test to all ELL students as provided by state law. The state agreed to clarify the rules regarding educator communications with parents about exemptions, to consider English language proficiency test scores, among other factors, when evaluating school waiver requests, and to make other minor modifications to program procedures.

NCLB Cases

The major testing cases under the NCLB Act to date have involved ELLs. Two state courts, Pennsylvania (Reading Sch. Dist. v. Pa. Dep't of Educ., 2004) and California (Coachella Valley v. California, 2007), have ruled on the mandates of this federal law. Among other rulings, both state courts held that the NCLB Act did not require states to provide primary language testing for ELLs. To some extent, the California Coachella Valley case appeared to be a reprise of the same issues litigated in the SFUSD case, only with a different set of California school districts complaining about the state requirement to test ELLs in English.

The Reading School District Case

In a challenge by a school district with 69% economically disadvantaged and 16% ELL students, a Pennsylvania court determined that the state testing agency appropriately exercised its discretion under the NCLB Act when it made psychometric decisions related to ELL testing policy. Specifically, the court upheld the state's determination that primary language testing was not practicable with 125 languages represented in Pennsylvania schools and found no NCLB violation because primary language testing was not mandatory under the NCLB.

The Coachella Valley Case

Nine California school districts enrolling large numbers of Spanish-speaking ELLs asked a state court to order the state to provide NCLB tests for ELLs in Spanish or to provide these students with simplified English versions of the tests. The districts argued that the statutory language of the NCLB Act required the state to provide primary language testing for ELLs. On May 25, 2007, the court denied the request.

The court agreed with the state argument that the NCLB provided discretionary authority to states to determine appropriate testing for ELLs and held that California's decision to test ELLs in English was not an abuse of its discretion. Therefore, the court held that it did not have the legal authority to issue an order to the state requiring a change in its ELL testing policy.

In reference to primary language testing of ELLs, the NCLB Act used the qualifying phrase "to the extent practicable." The American Heritage Dictionary defines practicable as "feasible and capable of being used for a specified purpose." The state argued that using primary language tests in Spanish as an alternative accountability test for some ELLs was not practicable in California because of the following:
• Existing Spanish language tests could not be used to assess ELLs with the same ELA and mathematics content and performance standards at grade level as non-ELLs as required by the NCLB accountability provisions and in English as provided by California law.
• It was not feasible to provide the same benefit to the significant numbers of California ELLs who spoke other primary languages due to insufficient resources to produce alternative tests in all relevant languages. Providing primary language tests for ELLs who spoke one language but not for ELLs who spoke other languages would have been contrary to the Test Standards fairness requirement that "The testing [process] should be carried out so that [students] receive comparable and equitable treatment . . ." (Standard 7.12, p. 84). Moreover, due to differences in language and culture likely to produce differential alignment to the content standards, inherent difficulties in establishing equivalent performance standards, and inconsistency with the mandates of California law requiring ELLs to be instructed primarily in English, satisfying NCLB peer review with even a single primary language test may have been unattainable in California.
• Providing primary language tests for ELLs who spoke one language but not for ELLs who spoke other languages may have been an equal protection violation because ELL students who were similarly situated (lacked English language proficiency) would have been treated differently (Phillips, 2010, Chapter 6).

As in the Valenzuela case, the State in the Coachella Valley case argued that the appropriate remedy for ineffective instruction was additional, improved, remedial instruction, not less valid test scores that indicated achievement of different skills than intended. In refusing to issue an order compelling the state to change its ELL testing policy, the court stated

[G]iven that California has determined to teach students who lack English proficiency largely in English, it cannot be said that a decision to assess these same students in English for purposes of NCLB is arbitrary and capricious.

Further, given the extensive range of possible primary languages of students lacking English proficiency, it is certainly neither arbitrary nor capricious for California to determine that translation and evaluation of assessments in multiple languages is not practicable and that, accordingly, administration of assessments will be in English, the single language confirmed by the voters through [a ballot initiative] as the "official" language of our educational system. . . .

The task for this court . . . is not to choose among competing rational alternatives and then mandate the judicially chosen one. To the contrary, decisions such as how to assess student performance for purposes of NCLB are best left to other branches of the government that are better suited to such matters and, so long as they do not act in an arbitrary, capricious, unlawful or procedurally unfair manner, great deference must be afforded to their decisions. . . .

California's manner of conducting student assessment for the purposes of NCLB does not violate any ministerial duty created by statute, nor as a matter of law does it constitute an abuse of any discretionary authority. Therefore, . . . [the districts'] motion [to compel a change in policy] is denied. (p. 24, 27)

Recommendations

The following recommendations are based on current legal requirements and psychometric principles. Specific implementation details may vary depending on the configuration of a state testing program and its purposes, implementing state legislation and administrative regulations. These recommendations can be used as a starting point for the development of state nonstandard test administration policies for students with disabilities and ELLs.

1. Require the IEP/504 committees or local ELL program directors to select one of the following exhaustive, mutually exclusive testing condition categories for each student with a disability or ELL student. Allow test administrators to provide the accommodations in Category II (below) for nondisabled students when appropriate. (A minimal code sketch of this category logic follows the list.)

I. Standard administration conditions.

II. Accommodated administration using one or more of the testing variations designated by subject matter test in the state test administration manual as accommodations that preserve the intended construct and result in comparable scores. Scores from test administrations in this category receive the same interpretive information as Category I scores and may be aggregated with Category I scores.

III. Modified administration using testing condition variations that are not on the state test accommodations list but are regularly provided in the student's instructional program. These may be modified tests as defined by U.S. DOE NCLB Regulations. Students tested with modifications should receive score reports with information corresponding to the particular test taken. If the on-grade-level state test has been modified, on-grade-level achievement levels, pass/fail designations, or associated normative information should not be reported, and Category III scores should not be aggregated with Category I and II scores unless the modified tests have been statistically linked to the regular tests.

IV. Alternate assessment for students who cannot access the regular test with modifications or have been instructed with an alternative curriculum that consists of enabling or essence skills related to the academic skills measured by the regular state test. These may be alternate assessments as defined by U.S. DOE NCLB Regulations. Category IV scores should be reported and interpreted separately.
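The four categories carry different reporting and aggregation rules, as described above. The following is a minimal sketch, in Python, of that logic; the enum, function, and field names are hypothetical and illustrative only, not part of any state's actual policy or software.

```python
from enum import Enum

class TestingCategory(Enum):
    """Testing-condition categories I-IV from Recommendation 1."""
    STANDARD = 1       # Category I: standard administration
    ACCOMMODATED = 2   # Category II: state-approved accommodations
    MODIFIED = 3       # Category III: modifications not on the state list
    ALTERNATE = 4      # Category IV: alternate assessment

def reporting_rules(category: TestingCategory, linked_to_regular_test: bool = False) -> dict:
    """Return the aggregation/interpretation rules implied by the recommendation.

    `linked_to_regular_test` matters only for Category III: modified-test scores
    may be aggregated with Category I and II scores only if the modified tests
    have been statistically linked to the regular tests.
    """
    if category in (TestingCategory.STANDARD, TestingCategory.ACCOMMODATED):
        return {"aggregate_with_regular": True,
                "interpretation": "same interpretive information as Category I"}
    if category == TestingCategory.MODIFIED:
        return {"aggregate_with_regular": linked_to_regular_test,
                "interpretation": ("report for the particular modified test taken; "
                                   "omit on-grade-level achievement levels unless linked")}
    # Category IV
    return {"aggregate_with_regular": False,
            "interpretation": "report and interpret separately (alternate assessment)"}

# Hypothetical usage
print(reporting_rules(TestingCategory.MODIFIED, linked_to_regular_test=False))
```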

2. Develop a list that classifies specific nonstandard test administrations for the state test as accommodations or modifications. Consider written state content standards, test specifications, preservation of the intended skills (construct relevance), and score comparability. Aggregate and report scores accordingly. If a school/parent requests a testing condition variation not on the list, refer them to a contact person who has been designated to receive and act on written requests. The contact person may be aided by outside consultants in making a decision on each request. Add these decisions to the appropriate list for the next state test administration.

3. Permit the parent(s)/guardian(s) of any student (with or without a disability, ELL, or non-ELL) to request a Category I, II, III, or IV state test administration by signing a written form that includes full disclosure of the options and their consequences. Require IEP/504 committees to include a form in the IEP, signed by school personnel and the parent(s)/guardian(s), documenting consideration of the available testing options and the final decision of the committee.

4. Collect information about the specific disability or ELL status and specific accommodations or modifications on the student answer sheet. Conduct separate research studies when sufficient numbers of students and resources are available.

5. Consider paying the costs of accommodated test administrations from state testing program funds and requiring local districts to pay at least some of the costs of modified test administrations (except state NCLB 1 and 2% tests). Provide detailed written guidelines to aid local districts in making accommodations/modifications decisions.


6. For ELLs, use the required NCLB English language proficiency test scores and primary language achievement test scores (where available for ELLs receiving bilingual instruction) to augment the interpretation of the state test scores. This information will assist users in judging the effects of language proficiency on the achievement of academic skills in English and assist them in determining appropriate future instructional strategies for these students.

7. If policymakers decide to allow scores from modified test administrations to be interpreted the same as scores from standard administrations, they should acknowledge that a benefit is being conferred on eligible students and adopt policies that require appropriate written documentation and evaluation to determine which students qualify for the benefit. Eligibility criteria for which written documentation should be required include (a) certification of the disability by a trained professional, (b) confirmation of regular use of the modification by the student in the classroom, (c) explanation of the rationale for and relationship of the requested modification to the specific disability, and (d) certification of the impossibility of the student accessing the test without the requested modification. School and/or department officials may be aided by outside consultants when evaluating written documentation and should follow a written policy. Consider alternate credentials such as Certificates of Completion or transcript and/or diploma notations that identify scores obtained with modifications. Possible notations include general statements such as "tested with modifications" or descriptions of the modification (e.g., reader, calculator) without identifying the specific disability.

8. Establish an appeals procedure for persons desiring to challenge denial of a nonstandard test administration. Such a procedure might begin at the local school level and be reviewable by state officials upon request.

9. Create a written state ELL testing policy that answers the following questions for the state NCLB tests, graduation tests, and other state tests, if any, by content area (e.g., ELA, mathematics, science).

A. Is English language proficiency construct relevant for the tested content? If yes, administer the regular, on-grade-level test to ELLs. If no, consider alternatives.

B. If an alternative test is to be provided to ELLs, will it be a primary language test, translated test, or simplified English version of the regular test?

C. If an alternative test is provided, what are the criteria for ELL eligibility to take the alternative test in place of the regular test (e.g., length of time in U.S. schools, level of English language proficiency)?

D. How will the state establish comparability of its ELL alternative tests to its regular, on-grade-level tests (e.g., alignment studies, linking studies)?

E. What is the timeline for developing the state's ELL alternative tests? Will the same test development procedures the state uses for its regular tests be followed? If not, how will consistency with psychometric standards be assured?

F. Has the state established expectations for annual progress of ELLs in English language proficiency? Are policies in place to identify schools where ELLs are remaining at beginning levels of English language proficiency for too long?

G. Does the state have policies and procedures for monitoring schools' provision of appropriate English language and state content standards instruction to ELLs?

H. What assistance will the state provide to educators for professional development to administer the ELL alternative tests, to identify weaknesses based on test results, and to design/develop appropriate instructional programs to correct deficiencies?

I. If the state has a graduation test, will ELLs receive the same skills-based diploma as non-ELLs, be awarded a certificate of completion if unsuccessful by their scheduled graduation, be permitted to remain in high school an extra year(s) or have other options for earning a high school diploma after their senior year, and be allowed to substitute an alternative test or be exempted from the testing requirement?

J. Have ELL state course credit requirements for a high school diploma been aligned with the skills tested on the state's graduation test, if any?

K. Will the policies for recent immigrants be different from those for other ELLs?

L. Which ELL testing modifications (e.g., extra breaks, individual administration, word glossaries, reader) will be permitted on the regular or alternative tests, and what criteria (e.g., regular use for classroom tests) must be satisfied for an ELL to qualify for a modification? Has the state (or have local schools) established an appeals procedure for persons desiring to challenge an unfavorable modifications decision?

Conclusion

Maintaining a defensible testing program is a challenging but achievable task. Tests are visible and accessible targets for those who disagree with state educational policy decisions. Challenges can come from a variety of groups on an assortment of issues. To be prepared, state testing programs must follow legal and psychometric standards, comprehensively document program decisions and activities, and work closely with legislators, administrators, and educators. Cooperation between staff members from the state agency, local school districts, and the testing contractor is essential for success. Carefully crafted policy documents covering nonstandard test administrations and ELL testing policies may assist states and school districts to provide accessible tests for special populations that satisfy legal and psychometric guidelines for construct preservation and comparable scores. When policymakers decide to treat test scores from modified test administrations the same as those from regular and accommodated test administrations, test score consumers should be alerted that the modified test scores are not comparable (or evidence of comparability is lacking) and be assisted in interpreting and using these scores appropriately.

References

Advocates for Special Kids v. Oregon Dep't of Educ., Settlement Agreement, No. CV99-263 KI (2001, February 1).

American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (1999). Standards for educational and psychological testing [Test Standards]. Washington, DC: American Psychological Association.

Americans with Disabilities Act (ADA), Pub. L. No. 101-336, 42 U.S.C. § 12101 et seq. (1990).

Americans with Disabilities Act (ADA) Regulations, 28 C.F.R. § 36.309 (1992).

Anderson v. Banks, 520 F. Supp. 472 (S.D. Ga. 1981), reh'g, 540 F. Supp. 761 (S.D. Ga. 1982).

Bd. of Educ. of Northport-E. Northport v. Ambach, 436 N.Y.S.2d 564 (S.C. N.Y. 1981), rev'd, 458 N.Y.S.2d 680 (N.Y. App. 1982).

Bd. of Educ. v. Rowley, 458 U.S. 176 (1982).

Brookhart v. Illinois State Bd. of Educ., 534 F. Supp. 725 (C.D. Ill. 1982), rev'd, 697 F.2d 179 (7th Cir. 1983).

Cal. Dep't of Educ., Interpreting CAHSEE Scores 2004–2005, available at www.cde.ca.gov

Cal. Dep't of Educ. (CDE) v. San Francisco Unified Sch. Dist. (SFUSD), No. C-98-1417 MMC (N.D. Cal. 1998); San Francisco Unified Sch. Dist. (SFUSD) v. State Bd. of Educ. (SBE), Settlement Agreement, No. 99-4049 (Cal. Sup. Ct., Nov. 2000).

Cal. State Bd. of Educ. (2000). Item 28 of December 6–7 Board Minutes; Item 31 of October 10–11 Board Minutes.

Chapman (Kidd) v. Cal. Dep't of Educ., 229 F. Supp. 2d 981 (N.D. Cal. 2002), No. C 01-01780 CRB (N.D. Cal. Mar. 12, 2003, Sept. 5, 2003).

Coachella Valley v. California, No. CPF-05-505334 (Cal. Sup. Ct., May 25, 2007).

Debra P. v. Turlington, 644 F.2d 397 (5th Cir. 1981), aff'd, 730 F.2d 1405 (11th Cir. 1984).

Elliott, S. N., Kettler, R. J., Beddow, P. A., Kurz, A., Compton, E., McGrath, D., et al. (2010). Effects of using modified items to test students with persistent academic difficulties. Exceptional Children, 76(4), 475–495.

Fraser, K., & Fields, R. (1999, February). NAGB public hearings and written testimony on students with disabilities and the proposed voluntary national test, October–November 1998, Synthesis Report.

G.I. Forum v. Texas Education Agency (TEA), 87 F. Supp. 2d 667 (W.D. Tex. 2000).

Golden, D. (2000, January 21). Evening the score: Meet Edith, 16; she plans to spell-check her state writing test. The Wall Street Journal, at A1.


Gorin, J. (2010, May). Enhanced assessment item development: An item difficulty modeling approach. Paper presented at the NCME annual meeting, Denver, CO.

Huesman, R., Jr., & Frisbie, D. A. (2000). The validity of ITBS reading comprehension test scores for learning disabled and non-learning disabled students under extended-time conditions. Paper presented at the AERA annual meeting, New Orleans, LA.

HumRRO (2006, March 28). January 2006 Update, Press Release, Cal. Dep't of Educ.

Individuals with Disabilities Education Act (IDEA), Pub. L. No. 102-119, 20 U.S.C. § 1400 et seq. (1991).

Jacobson, L. (2007, October 24). California offers long-term help with exit exams. Education Week, at 23.

Meloy, L. L., Deville, C., & Frisbie, D. (2000, April). The effect of a reading accommodation on standardized test scores of learning disabled and non-learning disabled students. Paper presented at the NCME annual meeting, New Orleans, LA.

Murphy v. United Parcel Service (UPS), Inc., 527 U.S. 516 (1999).

No Child Left Behind (NCLB) Act, 20 U.S.C. §§ 6301–6578 (2002).

No Child Left Behind (NCLB) Regulations, 34 C.F.R. § 200.1 et seq. (2006).

Phillips, S. E. (1993, March 25). Testing accommodations for disabled students. 80 Ed. Law Rep. 9.

Phillips, S. E. (1994). High-stakes testing accommodations: Validity versus disabled rights. Applied Measurement in Education, 7(2), 93–120.

Phillips, S. E. (2000). G.I. Forum v. TEA: Psychometric evidence. Applied Measurement in Education, 13, 343–385.

Phillips, S. E. (2002). Legal issues affecting special populations in large-scale testing programs. In G. Tindal & T. Haladyna (Eds.), Large-scale assessment programs for all students (pp. 109–148). Mahwah, NJ: Lawrence Erlbaum.

Phillips, S. E. (2010). Assessment law in education, Chapters 1, 2, 5, 6 and 9. Phoenix, AZ: Prisma Graphics, available at www.SEPhillips.dokshop.com

Phillips, S. E., & Camara, W. J. (2006). Legal and ethical issues. In R. L. Brennan (Ed.), Educational measurement (4th ed., pp. 733–755). Westport, CT: American Council on Education/Praeger.

Porter, R. (2000). Accountability is overdue: Testing the academic achievement of limited-English proficient (LEP) students. Applied Measurement in Education, 13(4), 403.

Reading Sch. Dist. v. Pa. Dep't of Educ., 855 A.2d 166 (Pa. Commw. Ct. 2004).

Rene v. Reed, 751 N.E.2d 736 (Ind. App. 2001).

Section 504 of the Rehabilitation Act, 29 U.S.C. § 701 et seq. (1973).

Section 504 Regulations, 45 C.F.R. § 84 et seq. (1997).

Southeastern Community College v. Davis, 442 U.S. 397, 406 (1979).

Stoica, W. M. (2010, May). Recommendations for AA-MAS item modifications: Utilizing information from focus groups and surveys with special educators. Paper presented at the NCME annual meeting, Denver, CO.

Sutton v. United Air Lines, Inc., 527 U.S. 471 (1999).

Tindal, G. (1999, June). Test accommodations: What are they and how do they affect student performance? CCSSO large scale assessment conference, Snowbird, UT.

Valenzuela v. O'Connell, No. JCCP 004468 (Cal. Super. Ct., May 12, 2006), rev'd, O'Connell v. Superior Court, 47 Cal. Rptr. 3d 147 (Cal. App. 2006).


4 IEP Team Decision-Making for More Inclusive Assessments: Policies, Percentages, and Personal Decisions

Naomi Zigmond, Amanda Kloo, and Christopher J. Lemons

IEP Team Decision Making for More Inclusive Assessments

Since the 2001–2002 school year, the accountability provisions of the No Child Left Behind Act (NCLB, 2001) have shaped much of the work of public school teachers and administrators in the United States. NCLB required each state to develop content and achievement standards in several subjects, to administer tests to measure students' attainment of those standards, to develop targets for student performance on those tests, and to impose a series of sanctions on schools and districts that did not meet the targets. Together, the standards, assessments, and consequences constitute a standards-based accountability system. State assessments are the mechanism for determining whether schools have been successful in teaching students the knowledge and skills defined by the content standards. The accountability provisions ensure that schools are held accountable for educational results. Many states had such a system in place before NCLB took effect, but since 2001–2002, every state in the United States has had to develop and implement a standards-based accountability system that meets the requirements of the law. This mandate has affected every public school student, every public school, and every district in the nation.

N. Zigmond, University of Pittsburgh, Pittsburgh, PA 15260, USA; e-mail: [email protected]

The origins of federally required accountability for educational outcomes date to the 1994 reauthorization of the Elementary and Secondary Education Act (Improving America's Schools Act, Public Law 103-382, October 20, 1994). An often overlooked stipulation of the 1994 ESEA reauthorization was that each State ensure "participation in such assessments of all students" (Improving America's Schools Act, 1994, Title I, Subpart 1, section 1111(b)(3)(F)(i)). Previously, students with disabilities had been exempt from participation in school-level or district-level standardized testing requirements. On-grade-level performance was not expected of these students (Crawford, Almond, Tindal, & Hollenbeck, 2002; Thurlow, Seyfarth, Scott, & Ysseldyke, 1997; Ysseldyke & Thurlow, 1994). And, because the primary target of the ESEA regulation was the Title 1 program in a school or district, and accountability in federal law (PL 94-142 in 1975 and the Individuals with Disabilities Education Act [IDEA] of 1990) tended to focus on procedural compliance and not on achievement outcomes, little attention was actually paid to the ESEA 1994 requirement of full participation.

The situation changed in 1997. The reauthorization of IDEA (US Department of Education, 1997) reiterated the call for full participation of students with disabilities in statewide and district-wide assessment programs, through the use of reasonable adaptations and accommodations. The IDEA amendments asserted that the historical underachievement of students with disabilities was linked to low expectations for learning and scant access to the general education curriculum (Koenig & Bachman, 2004). Mandating that students with disabilities participate in high-stakes accountability assessments would promote quality assurance in special education (Defur, 2002). The assumptions were that participation raises the stakes, in turn yielding higher expectations, leading to increased participation in general education, which promotes better teaching and results in improved academic outcomes for students with disabilities (see Fig. 4.1).

This time, the special education community took notice, and the general education community did as well, when the assessment and accountability regulations in NCLB (2001) emerged as a logical extension of these IDEA 97 provisions. NCLB explicitly prohibited schools from excluding students with disabilities from the accountability system. NCLB restated the requirement for participation of all students in statewide accountability assessments and reporting of the results for students with disabilities with everyone else's and as a disaggregated group. Furthermore, students with disabilities were to be held responsible for the same academic content and performance standards as everyone else.

IDEA 2004, Section 612, Part B echoed the NCLB call for including all students with disabilities in state and district-wide assessment programs. It called for tests to adhere to universal design principles to the extent feasible and to contain bias-free test items; simple, clear instructions and procedures; maximum readability and comprehensibility; and optimal legibility from the start. In that way, tests would be accessible for as many students as possible, and the vast majority of students, including the vast majority of those with disabilities, would participate in the regular assessment or in the regular assessment with accommodations approved and widely disseminated by the State. The general idea was that the State had a responsibility to create a testing environment that ensured that as many students as possible could take the general assessment in a way that produced valid and meaningful results. This orientation was consistent with the growing commitment in law and public policy to full inclusion of students with disabilities in general education classrooms and access of all students, even those with severe disabilities, to the general education curriculum. The orientation also made sense. By excluding no one from the accountability requirement, it would be possible to answer truthfully the fundamental accountability question: How well is each school doing in bringing all of its students up to standard? The answer to this question lay in the annual reporting of the percent of students scoring proficient or advanced on the annual accountability test.

Despite the explicit commitment to all students being assessed on the regular test, from the start the federal government introduced some flexibility. Beginning in December 2003 (Federal Register, 2003), States were given the option to develop alternate achievement standards to measure the progress of students with the most significant cognitive disabilities. A cap (1.0% of the number of students enrolled in tested grades) was set on the number of proficient and advanced scores based on alternate academic achievement standards that could be included in accountability reports. This cap not only protected the lowest performing students with disabilities but also provided a safeguard against inappropriately restricting the scope of educational opportunities for other lower-performing students. Permission to develop and use alternate achievement standards fundamentally changed the framework for assessing students with disabilities. It was no longer sufficient to ask whether a student would be assessed with the general assessment or the general assessment with a valid (or invalid) accommodation. Instead, each State needed to provide guidance to Individualized Education Program (IEP) teams to determine whether a student had a most significant cognitive disability and could be assessed using alternate achievement standards. The use of alternate achievement standards ensured that students with the most significant cognitive disabilities were appropriately included in State accountability systems and that schools and LEAs received credit for these students' achievement.

Fig. 4.1 Assumptions underlying more inclusive accountability assessments

Then, in 2005, additional flexibility for inclusion of students with disabilities in statewide assessments was introduced. In recognition of "a small group of students whose disability precluded them from achieving grade-level proficiency and whose progress was such that they did not reach grade-level proficiency in the same time frame as other students" (US Department of Education, 2007, p. 8), States were given the option to develop another alternate assessment, with proficiency defined, this time, on the basis of modified achievement standards. Before this new flexibility, students with disabilities could take either a grade-level assessment or an alternate assessment based on alternate academic achievement standards. Neither of these options was believed to provide an accurate assessment of what at least some of these students know and can do. The grade-level assessment was too difficult and did not provide data about a student's abilities or information that would be helpful in guiding instruction. The alternate assessment based on alternate academic achievement standards was too easy and not intended to assess a student's progress toward grade-level achievement.

The new regulations permitted States to develop modified grade-level achievement standards, to adopt such standards, and to develop an assessment aligned with those standards that was appropriately challenging for this particular group of students with disabilities. In the new assessment, expectations of grade-level content mastery would be modified, rather than the grade-level content standards themselves. The assessment had to cover the same grade-level content as the general assessment. The requirement that modified academic achievement standards be aligned with grade-level content standards was critical; for as many of these students as possible to have an opportunity to achieve at grade level, they must have access to and instruction in grade-level content. The regulations included a number of safeguards to ensure that students assessed based on modified academic achievement standards had access to grade-level content and were working toward grade-level achievement. These regulations also allowed teachers and schools to receive credit for the work that they did to help students with disabilities progress toward grade-level achievement.

Now there were several participation options from which to choose for students with disabilities who, even with accommodations, must be assessed with an alternate assessment: an alternate assessment based on grade-level academic achievement standards, an alternate assessment based on modified academic achievement standards, or an alternate assessment based on alternate academic achievement standards. Each of these achievement standards was an explicit definition of how students were expected to demonstrate attainment of the knowledge and skills reflected in the grade-level content standards. Grade-level achievement standards describe what most students at each achievement level know and can do. Modified achievement standards are expectations of performance that are challenging for the students for whom they are designed, but are less difficult than grade-level academic achievement standards. Alternate achievement standards reflect the "highest achievement standards possible" (US Department of Education, 2005, p. 18) for students with the most significant cognitive disabilities, no less challenging for the students than grade-level standards are for students without disabilities. Because the choice of participation methods has to fit individual needs, both IDEA and NCLB charged the IEP team with the task of assigning a student with disabilities to the appropriate assessment method.

Historic Role of the IEP Team

Throughout the history of special education, attention to the unique needs of the individual has been paramount. PL 94-142 codified into federal law the rights of school-age individuals with disabilities (and, in subsequent reauthorizations of IDEA, of infants, toddlers, and preschoolers) to specially designed instruction, special education and related services, and special consideration in the curricula to be studied and assessments to be taken. To legitimize and make transparent the educational decisions for students with disabilities, legislators wisely introduced the concept of an IEP into the landmark special education law of 1975. The IEP is a written document detailing a student with disabilities' educational needs and the steps the school plans to take to meet those needs. Developed by the special education teacher in conjunction with other school personnel and the child's parents, it documents annually the student's level of performance, his/her goals and objectives, the degree to which those can be met in regular education, and any related services the student might need. It outlines the special curriculum and specially designed instruction that will be implemented to meet this child's individual educational needs. An IEP team, whose membership is individualized for every student with a disability, agrees to what is on the IEP. The team consists, at minimum, of a representative of the educational agency, often the principal; the student's special education teacher; the student's general education teacher; the student's parent or guardian; related service personnel; other persons at the discretion of parents or the educational agency; and, for students nearing a transition, a representative of the agency that is likely to provide or pay for the transition services. The IEP that the team develops commits the education provider to focusing on the development of both academic and practical life skills that are geared toward the goals and aspirations of that individual student. It ensures a meaningful educational curriculum aligned with the individual's needs and plans. "It is this aspect of individualization that distinguishes services authorized and provided through special education from those typically provided in general education" (Kohler & Field, 2003, p. 180).

Parents have always been designated as members of the IEP Team. Their role in planning their child's educational experience was reiterated, and perhaps strengthened, in the language of the most recently authorized version of the Individuals with Disabilities Education Act (IDEA 2004). Parents help to establish, and by signing off on the IEP, agree to the goals the team sets for a child during the school year, as well as any special supports needed to help achieve those goals.

IEP Team Decisions Regarding Participation in Statewide Assessment

The IEP Team is generally given wide berth in determining what a student with a disability will learn, where that learning will take place, and how the outcomes of that learning will be evaluated. As specified through IDEA, IEP teams have ultimate responsibility for making instructional, curricular, and assessment decisions for each student with a disability. However, since passage of the 1997 amendments to the Individuals with Disabilities Education Act (IDEA, 1997), the breadth of IEP Team decision-making has been somewhat curtailed. For all but a small percentage of students with the most severe cognitive disabilities (and even for those students whenever possible), the agreed-upon IEP goals must provide access to the general education curriculum and address state-approved grade-level content standards. Furthermore, all students with disabilities must be included in state and district assessments. The IEP team cannot decide that a particular student will not participate in a statewide assessment, even if the child's parents insist. If it is the judgment of the team that the student is unable to participate in the regular assessment even with accommodations, they must assign the student to participate in an alternate assessment. Some states may specify certain conditions under which parents may refuse to permit their children's participation; these exemptions apply to all students in a given state. For example, in Pennsylvania, students may be excused from the assessment if (and only if) their parents have reviewed the test's content and have declared it to be inappropriate based on religious grounds. Beyond that, it is inconsistent with federal law for an IEP team to exempt a student from state assessments.

In making the decision about how students with disabilities will take the statewide assessment, the IEP team operates in an environment in which academic content, academic achievement standards, and assessments are set by the State; the technical qualities of the State assessments are well established; there are policies in place on the use of accommodations that do not invalidate test results; and there are State guidelines regarding eligibility for alternate assessments.1 Furthermore, each of these is subject to federal scrutiny and must meet the requirements specified in federal regulations.

1 States are not required to use every assessment method. Some states may have alternate assessments but choose not to use alternate or modified achievement standards. State guidelines should clarify which methods are available.

Federal directives regarding participation in alternate assessments. Two guidance documents issued by the US Department of Education, Alternate Assessment Standards for Students with the Most Significant Cognitive Disabilities (2005) and Modified Academic Achievement Standards (2007), have sought to clarify both the nature of the students for whom the alternate assessment based on alternate academic achievement standards and the alternate assessment based on modified academic achievement standards were intended and the role of the IEP team in deciding who should take which state accountability assessment. The two guidance documents emphasize that all students with disabilities should have access to the general curriculum; that the participation decisions must be made by each student's IEP team and on an individual student basis; and that decisions should not be based on disability category or other general qualities. Guidance documents assign to each State the responsibility to establish clear and appropriate guidelines for IEP teams to use when deciding if one or another alternate assessment is justified for an individual child. They suggest that each State's guidelines may contain the following:

• Criteria that each student must meet before participating in alternate assessments;

• Examples or case study descriptions of students who might be eligible to participate in such alternate assessments;

• Accommodations that are available for all assessments, and any special instructions that IEP teams need to know if such accommodations require special permission or materials;

• Flow charts for determining which assessment is appropriate and/or which accommodations are appropriate;

• Timelines for making the participation decisions;

• Consequences that affect a student as a result of taking an alternate assessment (e.g., eligibility for a regular high school diploma);

• Consequences that affect a test score as a result of using a particular accommodation;

• Approaches for ensuring that all students have access to the general curriculum;

• Commonly used definitions; and

• Information about how results are reported for individual student reports and in school or district report cards.

The 2005 Guidance specifies that "only students with the most significant cognitive disabilities may be assessed based on alternate achievement standards" (p. 23) and explains that these are the students with IEPs whose cognitive impairments may prevent them from attaining grade-level achievement standards, even with the very best instruction. The 2007 Guidance stipulates two primary requirements for participation in an alternate assessment based on modified academic achievement standards. The student must have a disability that precludes attainment of grade-level achievement standards. And the student's progress to date in response to appropriate instruction, special education, and related services must be such that, even if significant growth occurs, there is reasonable certainty that the student will not achieve grade-level proficiency within the year covered by the student's IEP. Both documents emphasize the importance of communicating the assessment participation decision to students' parents and of informing them of any educational consequences of the decision.

State Guidelines for IEP Team Decision Making, 2007–2009

In 2007, 2008, and 2010, NCEO published syntheses of decision-making guidelines posted by states on their State Department of Education websites (Lazarus, Rogers, Cormier, & Thurlow, 2008; Lazarus, Thurlow, Christensen, & Cormier, 2007; Lazarus, Hodgson, & Thurlow, 2010). The 2007 report was a quick snapshot of the decision-making guidelines regarding participation in the alternate assessment based on modified academic achievement standards of six states (Kansas, Louisiana, North Carolina, North Dakota, Oklahoma, and Maryland), just a few months after the 2007 regulations on the AA-MAS were finalized. One year later, the report compared and contrasted participation guidelines for nine states (the original six, plus California, Connecticut, and Texas), although none had yet successfully completed the peer review process that determines whether an accountability assessment fulfills the necessary requirements for the state to receive federal funds. By 2009, one state (Texas) had passed peer review, though the number of states with publicly available guidelines for student participation in alternate assessments was up to 14 (Arizona, Indiana, Michigan, Ohio, and Tennessee added to the nine of the previous year). In that third report, Lazarus, Hodgson, and Thurlow (2010) describe common elements within the 14 sets of guidelines and provide samples of the checklists (usually a series of yes/no questions) or flow charts and decision trees (conceptual representations of the decision-making process) recommended for use by IEP teams. Not surprisingly, the published guidelines each contain some or all of the elements specified in the federal guidance documents (see Table 4.1). Across the 14 states, elements of the guidelines to IEP teams fall into four categories: (a) indicators that qualify the student for the alternate assessment based on alternate achievement standards, instead; (b) factors that should not influence the participation decision; (c) indicators of opportunity to learn grade-level content of the general education curriculum; and (d) indicators of limited academic growth or progress.

Recommendations to IEP Teams

The decision about how a student with a disability will participate in the annual statewide accountability assessments is not made in isolation. It is part of the total plan that defines an appropriate education for the student. As such, it should be made at the annual IEP meeting, with all the important actors present, and in conjunction with the development of IEP goals. If the student is making adequate progress in the general education curriculum, the recommendation regarding participation is indisputable; the default decision is the general assessment, with or without accommodations, or, for students who cannot manage a pencil/paper test, an alternate assessment based on grade-level achievement standards (see Fig. 4.2). The discussion at the IEP team meeting would focus on the nature of the accommodations routinely used to enhance this student's typical educational experience. These accommodations would be recommended to make the assessment more accessible and to have the assessment results better reflect what the student knows and is able to do. This seems an obvious point. However, in a 2001 study of the relationship between the types of accommodations typically provided to students with IEPs during instruction and those offered on the accountability tests, Ysseldyke and colleagues (2001) reported a significantly high tendency for students with IEPs to receive testing accommodations that were not provided during instruction.
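One way an IEP team or test coordinator might screen for the instruction/testing mismatch described above is to compare the set of accommodations a student uses during instruction with the set planned for the assessment. A minimal illustrative sketch follows; the accommodation names are hypothetical and this is not an official tool.

```python
def untested_in_instruction(instructional, testing):
    """Return testing accommodations never provided during instruction,
    the kind of mismatch Ysseldyke and colleagues (2001) flagged."""
    return set(testing) - set(instructional)

# Hypothetical example
instructional_accommodations = {"extended time", "small group", "directions read aloud"}
planned_testing_accommodations = {"extended time", "scribe", "directions read aloud"}

print(untested_in_instruction(instructional_accommodations,
                              planned_testing_accommodations))
# {'scribe'}  -> flag for discussion at the IEP meeting before the test
```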

Table 4.1 Common elements in 14 state assessment participation guidelines

Number of state guidelines (n = 14) | Element to be considered in making participation decision
14 states | Student has a current IEP
12 states | Permit "combination participation": separate decisions made by subject area [3 states allow selection from all options; 9 states require selection from regular assessment and AA-MAS only]
9 states | Decision requires parental notification
8 states | Decision requires consideration of consequences for meeting graduation requirements

Indicators that qualify the student for the alternate assessment based on alternate academic achievement standards
4 states | Decision based on presence of significant cognitive disability
6 states | Decision based on whether receiving instruction on extended or alternate standards
7 states | Decision based on whether student receives specialized or individualized instruction

Factors that should not influence the participation decision
8 states | Decision not based on category label
7 states | Decision not based on excessive absences, social, cultural, language, economic, or environmental factors
6 states | Decision not based on placement setting

Indicators of opportunity to learn grade-level content of the general education curriculum
11 states | Decision based on whether student is learning grade-level content
6 states | Decision based on whether student receives accommodations during classroom instruction
9 states | Decision based on whether student has IEP goals based on grade-level content standards

Indicators of limited academic growth or progress
12 states | Decision based on previous performance on multiple measures
11 states | Decision based on evidence that student is not progressing at the rate expected to achieve grade-level proficiency
9 states | Decision based on whether student has IEP goals based on grade-level content standards
3 states | Decision based on evidence that student's performance is multiple years behind grade-level expectations
4 states | Decision based on previous performance on state assessment

The second decision is also fairly straightforward. If the student has a significant cognitive disability, has IEP goals and is receiving instruction based on extended or alternate academic content standards, and requires significant scaffolding to participate meaningfully in the general education curriculum, the team should recommend that the student participate in annual statewide accountability assessments through the alternate assessment based on alternate academic achievement standards (AA-AAS).
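The two default decisions just described, and the "middle" case taken up next, can be summarized as a simple decision order. The sketch below loosely mirrors Fig. 4.2 under stated assumptions; the boolean inputs are hypothetical simplifications of what an IEP team actually weighs, so it is illustrative only and no substitute for a state's own guidelines.

```python
def participation_recommendation(adequate_progress_in_general_curriculum: bool,
                                 significant_cognitive_disability: bool,
                                 instructed_on_alternate_standards: bool) -> str:
    """Rough ordering of the primary participation decisions (cf. Fig. 4.2)."""
    if adequate_progress_in_general_curriculum:
        # Default decision: the general assessment, with or without accommodations
        # (or an alternate assessment based on grade-level achievement standards
        #  for students who cannot manage a pencil/paper test).
        return "general assessment, with or without accommodations"
    if significant_cognitive_disability and instructed_on_alternate_standards:
        return "alternate assessment based on alternate achievement standards (AA-AAS)"
    # The harder 'middle' case: follow state guidelines (e.g., AA-MAS where available).
    return "consult state guidelines for other alternate assessment options"

# Hypothetical usage
print(participation_recommendation(False, False, False))
```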

Fig. 4.2 Primary decisions regarding participation in annual accountability assessment

Of course, if the student is not making adequate progress in the general education curriculum, but is not eligible to be assigned the AA-AAS, the IEP team may be obligated to consider recommending a different alternate assessment by reviewing and following guidelines provided by their State Department of Education, using the flow charts, decision trees, or checklists designed uniquely for use in their state. This decision is not nearly as clear-cut. Within the parameters of the state guidelines, IEP teams would be wise to consider the following:

Make certain all members of the IEP team are clear about what the participation decision for these 'middle' students is supposed to accomplish. State guidelines are (have to be) consistent with federal guidelines and regulations, but an IEP team is making a singular decision in a very specific context of district, school, and individual student. Is the student actually competent in the general education content and skills but in need of an even more accessible test than is possible with standard accommodations? Is this student so far from competent that he/she will not be proficient regardless of the definition of proficiency on the alternate assessment, so the student might as well participate in an 'easier' assessment? Is there pressure (spoken or unspoken) within the school or district to increase the number of students with disabilities who score proficient on the annual assessment by finding ways to assign them en masse to an alternate assessment? Is there a strong commitment within the school or district to maintain uncompromisingly high standards for all students, teachers, and curriculum by assigning as few students as possible to alternate assessments? Answers to these questions clarify the climate in which the participation decision needs to be made.

Do not prejudge the outcome of the decision based on the student's diagnostic label or the setting in which special education services are provided. The type of assessment recommended for a student should not be based on the student's disability classification. Most students with disabilities do not have intellectual impairments. They have learning disabilities, speech/language impairments, other health impairments, emotional/behavioral disabilities, and/or physical, visual, and hearing impairments. When given appropriate accommodations, services, supports, and specialized instruction, these students may be capable of learning the grade-level content in the general education curriculum and thus achieve proficiency on the grade-level content standards. In addition, research suggests that even a small percentage of students with disabilities who have intellectual impairments (a group that generally includes students in the categories of mental retardation and developmental delay, and some with multiple disabilities and/or autism) might also achieve proficiency when they receive high-quality instruction in the grade-level content, appropriate services and supports, and appropriate accommodations (Thurlow, 2007). There is no basis in research for defining how the specific categorical labels differentiate how students learn and demonstrate their learning, or how, as required in the regulation, a disability prevents a particular student from attainment of grade-level achievement within the current year (National Association of School Psychologists, 2002). Instead of focusing on label, the IEP team's decision about assessment should be based on the types and intensity of support the student needs to show academic learning during ongoing instruction.

The decision about the most appropriate type of assessment for students with disabilities should be based on neither current placement nor the setting in which the student receives instruction. As noted previously, every student has the right to access the general curriculum. Students also have the right to receive instruction from highly qualified teachers who are trained in the content area and the right to an appropriate education in the least restrictive environment (LRE; IDEA, 2004). LRE does not define the way a student participates in a statewide assessment. Every type of educational setting includes students with disabilities who can be recommended for any of the assessment methods.

Do not let the team decision be unduly influenced by the quality and alignment with grade-level standards of previous years' IEPs or the limited educational accomplishment of the student that resulted from those previous educational experiences. The regulations (CFR §200.1(e)(2)(ii)(A)) stipulate that a student should not be assigned to an alternate assessment based on modified academic achievement standards if the child's IEP is not of "high quality" and designed to move the child "closer to grade-level achievement" (Federal Register, 2007, p. 17749). Does that mean a struggling student with a poorly written IEP will be precluded from taking the new alternate assessment for at least a year during which time the IEP team writes a better IEP? Further, Section 200.1(e)(2)(ii)(A) stipulates that students should not be assigned to an alternate assessment based on modified academic achievement standards if they have not had the opportunity to learn grade-level content. Does that mean that a struggling student being taught reading or mathematics at his/her instructional level rather than at "grade level" would be precluded from taking the new alternate assessment for at least a year during which time he is provided less appropriate and 'special' grade-level instruction instead? Section 200.1(e)(2)(ii)(B) requires IEP teams to examine a student's progress in response to high-quality instruction over time and using multiple measures before assigning that student to an alternate assessment based on modified academic achievement standards. Does that mean that a struggling student whose teacher delivers "less than high-quality instruction" would be precluded from assignment to the assessment based on modified achievement standards until she or he gets a better teacher? And how should measuring student progress be standardized? Given the pressures of accountability and the time and effort devoted to preparing for and administering state tests, many schools have stopped administering additional standardized achievement tests. And, although Response to Intervention (RTI) and Reading First have promoted increased use of curriculum-based measurement and progress monitoring in reading and math, these models are often restricted to the elementary level. In fact, the field is only beginning to scratch the surface in developing and implementing comprehensive progress monitoring/intervention frameworks for middle and secondary students, the primary cohort for accountability testing (Mastropieri & Scruggs, 2005; Vaughn et al., 2008). How will IEP teams determine how much progress is sufficient? And for how long should an IEP team wait for signs of progress before recommending the student for the new alternate? There are currently no definitive answers to these questions.

The recommendation regarding which assessment will be used for a particular student's participation in statewide accountability should not be made in a separate meeting; this decision-making is an integral part of the entire IEP team planning and decision-making for the student for the upcoming year. The same team that is making the participation decision is writing the IEP goals and objectives for the next academic year and making decisions about where and how the student will be taught. The fact that many students with disabilities currently do not achieve at proficiency should raise questions about whether they were receiving the required high-quality instruction in the grade-level content, appropriate services and supports, and appropriate accommodations, not confirm preformed opinions about the capacity of these students to learn or perform. Anne Donnellan (1984, p. 142) concluded, using her "criterion of least dangerous assumption," that barring proof to the contrary, educators need to assume that poor performance is due to instructional deficits instead of student deficits. The IEP team should correct deficiencies in past IEPs while making assessment decisions for the future.

Craft the IEP so that students have the opportunity to learn what will be tested (i.e., this section of the IEP has ramifications for other sections of the IEP). An important principle of assessment is that students have the opportunity to learn the material on which they will be tested. English and Steffy (2001) call this the "doctrine of no surprises" (p. 88). Because the student being discussed is to be assessed based on grade-level content standards, instruction for these students with disabilities should be aligned with grade-level content in reading and math (albeit reduced in breadth, depth, and/or complexity for some students). The IEP team must ensure that the student's IEP includes goals that address the content standards for the grade in which the student is enrolled. Since the decisions about how a student will participate in State and district-wide assessments are to be made at the student's annual IEP meeting, the IEP team will have the time to also develop IEP goals that are based on grade-level content standards. This will help to ensure that the student has had an opportunity to learn grade-level content prior to taking an alternate assessment based on modified academic achievement standards.

Remember that an IEP team cannot "stop the clock" (i.e., suspend participation in the annual accountability assessments until the student has appropriately written IEP goals and access to appropriate instruction and accommodations). Participation in statewide assessment is an annual event from third grade through eighth grade (and continues in at least one grade of high school). It is true that the reality of many students with disabilities currently not achieving at proficiency raises questions of whether they have been receiving the required high-quality instruction in the grade-level content, appropriate services and supports, and appropriate accommodations. Nevertheless, participation in statewide accountability assessment cannot be postponed until past deficiencies or inappropriate educational decisions/opportunities have been corrected.

Do not be intimidated by the 'cap' on how many students' scores from alternate assessments can be counted as proficient. Selecting a method for participation in statewide assessments, like all educational decisions made by the IEP team, should focus on creating an appropriate education for each individual student with a disability. Although limits are in place on the number of students who can be reported as proficient using modified or alternate achievement standards, there is no limit on the number of students who can be assigned to these alternate assessments. In general, the Department of Education estimates that about 9% of students with disabilities (approximately 1% of all students) have significant cognitive disabilities that qualify them to participate in an assessment based on alternate achievement standards. An additional 18% of students with disabilities are eligible to take an alternate assessment based on modified academic achievement standards. The Department even suggests that "an LEA that assesses significantly more than 3% of all students with an alternate assessment based on modified academic achievement standards should prompt a review by the State of the implementation of its guidelines to ensure that the LEA was not inappropriately assigning students to take that assessment" (US Department of Education, 2007, p. 37).
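For a concrete sense of the scale involved, the arithmetic behind these figures can be sketched as below. The 1% cap on proficient scores from alternate achievement standards is stated earlier in this chapter, the 3% AA-MAS review trigger is quoted above, and the 2% cap on proficient scores from modified achievement standards is an assumption drawn from the "1 and 2% tests" referred to in the preceding chapter's recommendations; the function name and rounding convention are hypothetical.

```python
def proficiency_caps(total_enrolled_tested_grades: int) -> dict:
    """Approximate head counts implied by the percentage limits discussed above.

    - 1% cap: proficient/advanced scores from alternate achievement standards (AA-AAS)
    - 2% cap (assumed): proficient scores from modified achievement standards (AA-MAS)
    - 3%: suggested state-review trigger for the share of students assessed with an AA-MAS
    """
    return {
        "aa_aas_proficient_cap": round(0.01 * total_enrolled_tested_grades),
        "aa_mas_proficient_cap": round(0.02 * total_enrolled_tested_grades),
        "aa_mas_review_trigger": round(0.03 * total_enrolled_tested_grades),
    }

# Hypothetical district with 10,000 students enrolled in tested grades
print(proficiency_caps(10_000))
# {'aa_aas_proficient_cap': 100, 'aa_mas_proficient_cap': 200, 'aa_mas_review_trigger': 300}
```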

These percentage estimates are misleading for two reasons. First, since they actually represent the number of students that can be counted toward proficient, they presume that every student assigned to an alternate assessment will score in the proficient range. If a state, in designing the alternate state assessments and defining alternate or modified achievement standards, is sincerely setting challenging standards, that would hardly be the case. Second, every IEP team should have the flexibility to select the assessment method that is best for a particular student. The IEP team is individualized, the IEP goals are individualized, and the needs of the individual student under consideration are unique. In the spirit and letter of IDEA, an IEP team must make the assessment recommendation free of external considerations like a percentage participation cap.

IEP team members must understand that there will be intended and unintended consequences of the assessment participation decision. There is considerable research evidence that, when the stakes are high, assessment drives instruction (rather than the more desired scenario of instruction driving assessment). Without intending to teach to the test, in classrooms today, "what you test is what you get" (Mislevy, 2008, p. 5). Tests determine not only the content but also the format of instruction. Shepard (1989) reported that in response to externally mandated tests, "Teachers taught the precise content of the tests rather than underlying concepts; and skills were taught in the same format as the test rather than as they would be used in the real world. For example, teachers reported giving up essay tests because they are inefficient in preparing students for multiple-choice tests" (p. 5). Others have documented that in schools or classrooms using multiple-choice tests, instruction tends to emphasize drill and practice on decontextualized skills, reflecting the emphasis of many multiple-choice tests (see Romberg, Zarinna, & Williams, 1989).

Consequences will, and should, follow the assessment participation decision. Both the alternate assessment based on alternate academic achievement standards and the alternate assessment based on modified academic achievement standards assume that the student with disabilities has been learning the general education curriculum, but each defines very different expectations for mastery of that curriculum in terms of breadth, depth, and complexity. The reason given for developing these assessment options was the recognition that some students with disabilities should not be expected to master the full scope of the grade-level general education curriculum in the 1-year time frame of annual assessments. The type of student work that defines proficient performance on an alternate assessment based on alternate academic achievement standards is substantially different from the type of work that defines proficient performance on an alternate assessment based on modified academic achievement standards. That, in turn, is substantially different from the performance criteria associated with proficiency on the regular assessment. If definitions of proficient performance are different, it follows that the scope of instruction will be different, especially if accountability assessments are designed to assess mastery of what has been taught.

Narrowing the curriculum may be a positive consequence. If, over the course of the academic year, fewer concepts need to be mastered, more time can be spent on working to mastery. On the other hand, a narrowed curriculum at an early grade level may limit educational opportunities down the road. Assignment to a particular alternate assessment may set the trajectory of academic accomplishment too soon, before the student has had the opportunity to be exposed to, and possibly achieve proficiency in, a broader and more complex curriculum.

In the same way, changing expectations for proficiency (i.e., using alternate or modified achievement standards to define proficient performance) may be useful for some students and their families. It makes “proficiency” more attainable for students with chronic difficulties in school performance, giving the student a sense of pride in accomplishment, even if the “proficiency” designation actually carries a different meaning. On the other hand, telling a family or a student that his/her academic performance in reading, mathematics, or science is “proficient” could be harmful when the performance is judged as proficient only because the definition of proficiency has been altered. It gives the family and the student a misleading accounting of relative accomplishment, an unwarranted sense of what may be achievable in the future. With parents present at the IEP meeting, the implications of the assessment decision can be discussed openly and considered in the final assessment participation recommendation.

Concluding Comments

NCLB (2001) set the ambitious but unachievable goal of having all students score proficient on accountability assessments by the year 2014. The reason for introducing the flexibility of alternate assessments into the accountability system was to help schools, districts, and states “count more students as proficient” (Quenemoen, 2010, p. 22). The regulatory language leaves it to the states to define the target populations for these alternate assessments.

In the current standards-based, accountability-driven reform model, any option that encourages or rewards less-challenging standards for any student who could achieve at grade level (assuming they have access to the curriculum and are instructed effectively) undermines the entire system of school reform. For most students, with and without disabilities, we cannot yet predict with any accuracy whether they will achieve at grade level when instructed effectively. Thus, the least dangerous assumption requires that all students receive that effective instruction. The implementation of alternate assessments should expressly raise expectations and result in realistic but higher achievement for students who participate in the options. Ultimately, state-defined modified or alternate achievement standards should be policy statements of what are appropriately high expectations for some small, defined groups of students, and they should lead to improvement of achievement outcomes for these students, in order to be consistent with the letter and the spirit of NCLB and IDEA.

Careful monitoring of consequences of the assessment decisions is essential to ensuring that the intended positive consequences are occurring and unintended negative consequences are not. But IEP teams in annual IEP meetings are not making grand assessments of the efficacy of the standards-based accountability system. They are making recommendations and decisions to ensure an appropriate education for a particular student in a particular district, in a particular state. Each member of the team, the LEA representative, the educators, and the student’s parents, should be fully informed about the nature of the decisions that need to be made and the implications of those decisions in shaping the student’s educational experiences. There are federal, state, and perhaps even district guidelines that should influence the decision, but in the end, the decision needs to be a very personalized one, reflecting only the team consensus of what is appropriate for this particular student at this particular time in his/her educational career, until the next year when the decision gets made all over again.

References

Crawford, L., Almond, P., Tindal, G., & Hollenbeck, K. (2002). Teacher perspectives on inclusion of students with disabilities in high stakes assessments. Special Services in the Schools, 18(1/2), 95–118.

Defur, S. H. (2002). Education reform, high-stakes assessment, and students with disabilities: One state's approach. Remedial and Special Education, 23(4), 203–211.

Donnellan, A. (1984). The criterion of the least dangerous assumption. Behavioral Disorders, 9, 141–150.

English, F. W., & Steffy, B. E. (2001). Deep curriculum alignment: Creating a level playing field for all children on high-stakes tests of educational accountability. Lanham, MD: Scarecrow Press.

Federal Register (2003, December 9). Department of Education, 34 CFR Part 200, Title I – Improving the academic achievement of the disadvantaged; final rule, pp. 68698–68708.

Federal Register (2007, April 9). 34 CFR Parts 200 and 300, Title I – Improving the academic achievement of the disadvantaged; rules and regulations, pp. 17747–17781.

Improving America's Schools Act, Pub. L. No. 103–382. (1994). Retrieved September 10, 2008, from http://www.ed.gov/legislation/ESEA/toc.html

Individuals with Disabilities Education Act, Pub. L. No. 101-476. (1990). Retrieved September 10, 2008, from http://www.ed.gov/policy/speced/leg/edpicks.jhtm

Individuals with Disabilities Education Act Amendments of 1997, Pub. L. No. 105-17. (1997). Retrieved September 10, 2008, from http://www.cec.sped.org/law_res/doc/law/index.php

Individuals with Disabilities Education Improvement Act, Pub. L. No. 108-446. (2004). Retrieved September 10, 2008, from http://www.ed.gov/policy/speced/guid/idea/idea2004.htm

Koenig, J. A., & Bachman, L. F. (Eds.). (2004). Keeping score for all: The effects of inclusion and accommodation policies on large-scale educational assessments. Washington, DC: National Academies Press.

Kohler, P. D., & Field, S. (2003). Transition-focused education: Foundation for the future. Journal of Special Education, 37, 174–183.

Lazarus, S. S., Hodgson, J., & Thurlow, M. L. (2010). States' participation guidelines for alternate assessments based on modified academic achievement standards (AA-MAS) in 2009 (Synthesis Report 75). Minneapolis, MN: University of Minnesota, National Center on Educational Outcomes.

Lazarus, S. S., Rogers, C., Cormier, D., & Thurlow, M. L. (2008). States' alternate assessments based on modified achievement standards (AA-MAS) in 2008 (Synthesis Report 71). Minneapolis, MN: University of Minnesota, National Center on Educational Outcomes.

Lazarus, S. S., Thurlow, M. L., Christensen, L. L., & Cormier, D. (2007). States' alternate assessments based on modified achievement standards (AA-MAS) in 2007 (Synthesis Report 67). Minneapolis, MN: University of Minnesota, National Center on Educational Outcomes.

Mastropieri, M. A., & Scruggs, T. E. (2005). Feasibility and consequences of response to intervention: Examination of the issues and scientific evidence as a model for the identification of individuals with learning disabilities. Journal of Learning Disabilities, 38(6), 525–531.

Mislevy, R. J. (2007). What you test is what you get: Q&A with Robert J. Mislevy, Ph.D. Endeavors, 9(16), 4–5.

National Association of School Psychologists (NASP). (2002). Position statement: Rights without labels. Retrieved April 19, 2010, from http://www.nasponline.org/about_nasp/pospaper_rwl.aspx

No Child Left Behind Act, Pub. L. No. 107-110 (2001). Retrieved September 11, 2008, from http://www.ed.gov/policy/elsec/leg/eseas02/index/html

Public Law 94-142 (1975). Education of All Handicapped Children Act of 1975. Retrieved 1/09, from http://users.rcn.com/peregrin.enteract/add/94-142.txt

Quenemoen, R. (2009, July). Identifying students and considering why and whether to assess them with an alternate assessment based on modified achievement standards. In M. Perie (Ed.), Considerations for the alternate assessment based on modified achievement standards (AA-MAS) (Chapter 2, pp. 17–50). Retrieved from the New York Comprehensive Center: http://nycomprehensivecenter.org/docs/AA_MAS_part2.pdf

Romberg, T., Zarinnia, A., & Williams, S. (1989). The influence of mandated testing on mathematics instruction: Grade eight teachers' perceptions. In T. Romberg & L. Wilson (1992, September). Alignment of tests with the standards. Arithmetic Teacher, 40(1), 18–22.

Shepard, L. A. (1989, April). Why we need better assessments. Educational Leadership, 46(7), 4–9.

Thurlow, M. (2007). The challenge of special populations to accountability for all. In D. Clark (Ed.), No child left behind: A five year review (Congressional Program, Vol. 22, No. 1, pp. 39–44). Washington, DC: The Aspen Institute.

Thurlow, M. L., Seyfarth, A., Scott, D. L., & Ysseldyke, J. E. (1997). State assessment policies on participation and accommodations for students with disabilities: 1997 update (Synthesis Report No. 29). Minneapolis, MN: University of Minnesota, National Center on Educational Outcomes. Retrieved from http://education.umn.edu/NCEO/OnlinePubs/Synthesis29.html

US Department of Education. (1997). Individuals with disabilities education act. Retrieved from http://www.ed.gov/offices/OSEP/policy/IDEA/index.html

US Department of Education. (2005). Alternate assessment standards for students with the most significant cognitive disabilities: Non-regulatory guidance. Washington, DC: Author. Retrieved September 10, 2007, from http://www.ed.gov/policy/speced/guid/nclb/twopercent.doc

US Department of Education. (2007). Modified academic achievement standards: Non-regulatory guidance. Washington, DC: Author. Retrieved September 10, 2007, from http://www.ed.gov/policy/speced/guid/nclb/twopercent.doc

Vaughn, S., Fletcher, J. M., Francis, D. J., Denton, C. A., Wanzek, J., Wexler, J., et al. (2008). Response to intervention with older students with reading difficulties. Learning & Individual Differences, 18(3), 338–345.

Ysseldyke, J., & Thurlow, M. (1994). Guidelines for inclusion of students with disabilities in large-scale assessments (Policy Directions No. 1). Minneapolis, MN: University of Minnesota, National Center on Educational Outcomes.

Ysseldyke, J., Thurlow, M., Bielinski, J., House, A., Moody, M., & Haigh, J. (2001). The relationship between instructional and assessment accommodations in an inclusive state accountability system. Journal of Learning Disabilities, 34(3), 212.


5 Australian Policies to Support Inclusive Assessments

Michael Davies and Ian Dempsey

M. Davies, School of Education and Professional Studies, Faculty of Education, Griffith University, Brisbane, QLD 4122, Australia
e-mail: [email protected]

Introduction

Australia has recently adopted a National Assessment Program for Literacy and Numeracy (NAPLAN) that involves assessment of students in Years 3, 5, 7, and 9. While Australian legislation and policies are designed to support inclusive assessments for all, it is evident that in practice this is not the case. This chapter provides information about Australian legislation and ensuing policies, describes how they interact with national achievement testing in Australia, and proposes recommendations for the development of a more effective and inclusive assessment regime for Australian students.

The Australian Education Landscape

Australia is a federal parliamentary democracy comprising six states and two territories and had the eleventh-largest gross domestic product per capita in the world in 2008 (World Bank, 2009a). In that year, the population of Australia was 21.4 million people, the vast majority of whom lived in the country’s capital cities (World Bank, 2009b).

Australia follows a two-tier system of school education which includes primary education (generally till 12 years of age) and secondary education (generally till 18 years of age). Schooling is compulsory between the ages of 6 and 15 or 16, depending on the jurisdiction. Each of Australia’s states and territories has a department of education that provides free public education. Public school students comprise about two-thirds of the total student population. The remaining students attend fee-paying religious and secular private schools, which also receive substantial government funding. The Programme for International Student Assessment for 2006 ranked Australia 6th on a worldwide scale for Reading, 8th for Science, and 13th for Mathematics (Australian Council for Educational Research, 2009).

While there is no national curriculum in Australia, there is agreement across the states and territories on the broad content of curriculum (Ministerial Council for Education, Early Childhood Development and Youth Affairs, 2009a). Although the Federal government makes a relatively small direct contribution to the states’ and territories’ education budgets, conditions are often tied to this funding. This mechanism of influence has, among other things, facilitated the introduction in recent years of a national testing program and the use of a common student academic grading system.

The majority of states and territories provide three enrolment options for students with a disability: regular classes, support classes (separate classes in a regular school), and special schools. For reasons explored in the next section, Australia has a very poor record of reporting both the number of students with a disability in Australian schools and the nature of enrolment of those students. The limited evidence available suggests that over 3.5% of Australian school students have a disability (there is some variation in the definition of disability across the states and territories), that there has been a large increase in the number of students with a disability identified in regular schools, that the majority of students with a disability are educated in mainstream schools, and that, as a percentage of the total school population, there has been a recent increase in the proportion of students placed in segregated settings (Australian Government Productivity Commission, 2004; Dempsey, 2004, 2007). It is likely that these newly identified students with a disability in regular schools have always been in the regular school system; however, many have more recently been identified with a disability so as to receive funding support, and also because of improved awareness of disability and special needs by education professionals.

Australian Legislation Relevant to Students with a Disability

Disability Discrimination Act

While many Australian states and territories had legislation that directly related to discrimination against a variety of groups in the community, by the late 1980s there was increasing concern that without a law focussing on disability rights little progress could be made in addressing discrimination. Consequently, in 1992 federal legislation was passed in the Disability Discrimination Act 1992 (DDA) (Australasian Legal Information Institute, 2009) that directly addressed a range of areas, including education. In relation to education, the DDA states the following:

(1) It is unlawful for an educational authority to discriminate against a person on the ground of the person’s disability:
    (a) by refusing or failing to accept the person’s application for admission as a student; or
    (b) in the terms or conditions on which it is prepared to admit the person as a student.
(2) It is unlawful for an educational authority to discriminate against a student on the ground of the student’s disability:
    (a) by denying the student access, or limiting the student’s access, to any benefit provided by the educational authority; or
    (b) by expelling the student; or
    (c) by subjecting the student to any other detriment.
(3) It is unlawful for an education provider to discriminate against a person on the ground of the person’s disability:
    (a) by developing curricula or training courses having a content that will either exclude the person from participation, or subject the person to any other detriment; or
    (b) by accrediting curricula or training courses having such a content (Australasian Legal Information Institute, 2009).

Although this legislation effectively mandates that educational services are provided to students with a disability, there are some aspects of the original legislation that are less clear about the nature of the educational setting and the support that should be provided to Australian students with a disability. For example, the DDA includes a component that identifies unjustifiable hardship as a legal basis for discrimination. In determining whether an education provider may experience unjustifiable hardship in enrolling a student with a disability, the DDA notes that the following need to be taken into account:

(a) the nature of the benefit or detriment likely to accrue to, or to be suffered by, any person concerned;
(b) the effect of the disability of any person concerned;
(c) the financial circumstances, and the estimated amount of expenditure required to be made, by the first person (i.e., the education provider);
(d) the availability of financial and other assistance to the first person (Australasian Legal Information Institute, 2009).

By 2000, there had been one high-profile Federal Court case related to the enrolment of a student with a disability in a regular school. In Hills Grammar School v. HREOC (Australian Human Rights Commission, 2009a), a private school was deemed to have discriminated against a young student with spina bifida by refusing her enrolment in a regular class. The school did not provide a support class placement option. Enrolment was refused on unjustifiable hardship grounds, with the school arguing that the student required a range of additional supports that were beyond the capacity of the institution. The Federal Court noted that the institution was a large national education provider with substantial resources. The school was fined and ordered to provide a placement for the student, if she wished. The precedent set by this case means that the unjustifiable hardship argument is unlikely to be successfully used by large education providers.

Other aspects of the Australian DDA also warrant examination. The DDA defines disability as including a range of more traditional impairments (e.g., physical, intellectual, psychiatric, and sensory), as well as some impairments that are traditionally not recognised as disabilities in educational settings in Australia (e.g., learning disabilities, behaviour problems, attention deficit hyperactivity disorder, and physical disfigurement). In addition, the DDA recognises disabilities that individuals may presently experience, may have had in the past, or may have in the future (e.g., a disease that is yet to physically manifest itself).

Because the original DDA legislation addressed educational enrolment, and not educational services in schools, there was little incentive for school systems to broaden their definitions of disability beyond traditional, medically oriented categories. Indeed, for many state and territory education departments there was a powerful disincentive to broaden their definition: that it may place excessive financial demands on educational authorities to meet the needs of a perceived additional group of students with special needs. Such concerns were misplaced because Australian public education systems already provided a range of extensive supports to students with a disability, as defined by the DDA (e.g., services to students with reading difficulties and behaviour problems). Regardless, the unpreparedness of the states and territories to adopt a common definition of disability means that in Australia, at a national level, it is not possible to report on the total number of school students with a disability, or to determine the number of students with a disability in different educational settings.

In its present form, the DDA provides several avenues of redress for individuals, groups, and organisations. In the first instance, complainants are encouraged to lodge a grievance with the Australian Human Rights Commission, the body with responsibility for oversight of the DDA. Such complaints are reviewed by the Commission, which seeks to conciliate a settlement between the parties, usually without admission of liability (Australian Human Rights Commission, 2009b). If conciliation cannot be achieved, the matter may then be heard by the Australian Federal Court. By 2010, there had been 14 court decisions and 81 conciliated outcomes.

In 2002, a Productivity Commission review of the DDA concluded that the legislation had been “reasonably effective in reducing discrimination” (p. xxvi), although its effectiveness was highly variable across sectors of the economy and within disability groups (Australian Government Productivity Commission, 2004). Education was the third most common subject of complaints made under the DDA, after employment and the provision of goods and services. The Commission also noted that the introduction of education standards to supplement the DDA was a pressing need and that the net impact of such standards would be positive.

Education Standards

Another important feature of the DDA worthy of discussion is that it provided a mechanism to develop standards to assist organisations to understand their responsibilities in avoiding discrimination. Education standards were eventually legislated in 2005 (Australian Government Attorney-General’s Department, 2005), and they had a ‘chequered’ history. From the time of the release of a discussion document in 1996, it took 9 years and many attempts at negotiation between the Commonwealth and State and Territory governments before the standards were enacted. However, the standards address enrolment, participation, curriculum, student support services, and elimination of harassment and victimisation (Australian Human Rights Commission, 2009c).

An extract from the education standards, as they relate to participation, appears in Box 5.1.

Box 5.1. Standards for participation

(1) The education provider must take reasonable steps to ensure that the student is able to participate in the learning experiences (including assessment) of the courses or programs provided by the educational institution, and use the facilities and services provided by it, on the same basis as a student without a disability (and include measures that ensure that . . . the assessment and certification requirements of the course or program are appropriate to the needs of the student and accessible to him or her; and (f) the assessment procedures and methodologies for the course or program are adapted to enable the student to demonstrate the knowledge, skills, or competencies being assessed).

(2) The provider must:
    (a) consult the student, or an associate of the student, about whether the disability affects the student’s ability to participate in the courses or programs for which the student is enrolled and use the facilities or services provided by the provider; and
    (b) in the light of the consultation, decide whether an adjustment is necessary to ensure that the student is able to participate in the courses or programs provided by the educational institution, and use the facilities and services provided by it, on the same basis as a student without a disability; and
    (c) if:
        (i) an adjustment is necessary to achieve the aim mentioned in paragraph (b); and
        (ii) a reasonable adjustment can be identified in relation to that aim;
    make a reasonable adjustment for the student in accordance with Part 3.

(3) The provider must repeat the process set out in subsection (2) as necessary to allow for the changing needs of the student over time.

An important feature of the participation standards is the notion that students with a disability are entitled to participate on the same basis as students without a disability, and without discrimination. The ramifications of this requirement are that school courses and activities must be flexible enough to meet the student’s needs, that the school must consult with the student and/or the student’s advocate in determining those needs, and that the participation of the student with a disability should be comparable to the participation of a student without a disability. Students with a disability are to be afforded the same opportunities to participate in school or a course as other students. The opportunity to participate in assessments, including national assessment tests, should also be provided. “This may mean making adjustments to . . . how students will be assessed” (DDA Education Standards, 2009, p. 5). The Standards also provide information regarding exceptions for education providers.

Another important feature is the requirement that education providers must make ‘reasonable adjustments’ to facilitate the engagement of students with a disability in school activities. The standards note that an adjustment is reasonable in relation to a student with a disability when it balances the interests of all parties affected. For example, education authorities are not obliged to adjust courses or programs and assessment requirements to the extent that the academic integrity of the program is threatened. However, the student’s disability, the student’s wishes and the wishes of their advocate, the effect of the adjustment on the student with a disability and other students, and the cost of the adjustment should be considered in coming to a conclusion about the ‘reasonableness’ of an adjustment. That the texture of this aspect of the DDA is so open demonstrates that the standards themselves are a significant compromise, without which the Australian community would still be waiting for their enactment.

At the time of writing, some educational jurisdictions are still coming to terms with the ramifications of the education standards. However, as the next section demonstrates, the standards are already having an impact on the way in which schools support students with a disability. In addition to impelling schools to embrace a wider definition of disability, the standards now legally oblige schools to provide a minimum level of educational support to students with a disability (Nelson, 2003). Just how schools interpret this level of service will, no doubt, be open to interpretation and perhaps subject to legal challenge.

By 2009, there had been no Federal or High Court decisions that related to the participation of students with a disability in educational testing (there were several Federal Court decisions relating to other aspects of the education of students with a disability). Usually such matters are addressed through policy and administrative procedures and rarely reach courts or tribunals (Cumming, 2009a). However, several conciliated settlements had been made between the parties in this area (Australian Human Rights Commission, 2009d). Nine separate students with different disabilities at various schools complained that they had not been given sufficient adjustments by their respective schools to allow them to participate in exams. The schools were instructed to provide appropriate adjustments that included additional time, a reader, a writer, access to digital recordings of exams, a separate exam room, use of coloured examination paper, taking exams in a familiar setting, and the use of a word processor. A student who was deaf complained that he was not provided with a sufficiently qualified teacher to help interpret exams. The education authority subsequently made available a teacher with the necessary skills.

That the Australian DDA is not as prescriptive as US legislation about the rights of students with a disability can be seen as both a weakness and a strength in meeting the educational needs of these students. Supporters of stronger legislation say that laws guarantee a minimum standard and that they may create the circumstances by which attitudes to students with a disability in the general and education communities may change for the better. Those critical of stronger legislation point out that there is a difference between ‘following the letter of the law’ and improved attitudes and practice. The US experience following the mandating of individualised education programs for students with a disability is an example of how behaviour change does not always follow from compliance with the law (Sopko, 2003). Whatever the view taken on legislation, the reliance in Australia on open-textured laws and education policy to provide educational services for students with a disability is a reflection of cultural standards and historical precedent in this country.

Australian Policy Relevant to Students with a Disability

National Goals for Schooling in the Twenty-First Century

The 1999 Adelaide Declaration on national goals for schooling followed a meeting of State, Territory, and Commonwealth Ministers of Education. The Declaration provided broad directions to guide schools and education authorities and to maintain a commitment to the reporting of comparable educational outcomes for all students, including students with a disability (MCEECDYA, 2010). Of interest to this chapter is the common and agreed commitment to

• continuing to develop curriculum and related systems of assessment, accreditation, and credentialling that promote quality and are nationally recognised and valued; and to
• increasing public confidence in school education through explicit and defensible standards that guide improvement in students’ levels of educational achievement and through which the effectiveness, efficiency, and equity of schooling can be measured and evaluated.

One of the goals specifically relates to students with a disability and calls for schooling to be socially just, so that students’ outcomes from schooling are free from the effects of negative forms of discrimination based on sex, language, culture and ethnicity, religion, or disability. This Declaration was the precursor to the plan to introduce national reporting of educational outcomes in numeracy and literacy and other curriculum areas.

The Melbourne Declaration on Educational Goals for Young Australians (Ministerial Council on Education, Early Childhood Development, and Youth Affairs, 2008) was released in December 2008 and superseded the Adelaide Declaration to set the direction for Australian schooling for the next 10 years.

The MCEETYA 4 Year Plan (2009–2012) supports the Melbourne Declaration and outlines the key strategies and initiatives Australian governments will undertake in eight inter-related areas in order to support the achievement of the educational goals for young Australians. The two areas of interest to this chapter are

• promoting world-class curriculum and assessment and
• strengthening accountability and transparency.

In the area of promoting world-class curriculum and assessment, the goals of a national curriculum are identified, before stating that “assessment of student progress will be rigorous and comprehensive” (MCEETYA 4 Year Plan, 2009, p. 14). The plan also identifies the need for government to work with all school sectors to develop and enhance national and school-level assessment that focuses on

• assessment for learning—enabling teachers to use information about student progress to inform their teaching,
• assessment as learning—enabling students to reflect on and monitor their own progress to inform their future learning goals, and
• assessment of learning—assisting teachers to use evidence of student learning to assess student achievement against goals and standards.

Some of the agreed strategies and actions to support this plan are as follows:

• To establish the Australian Curriculum, Assessment and Reporting Authority (ACARA) to deliver key national reforms, including development of plans to improve the capacity of schools to assess student performance, to link assessment to the national curriculum, and to manage the National Assessment Program.
• Development of high-quality diagnostic and formative assessment tools and strategies to support teachers’ skills and understanding in the use of assessment as a tool for student learning and classroom planning and in adapting instructional practice in classrooms to focus on specific student needs.

In terms of strengthening accountability and transparency, the plan identifies the need for good-quality information on schooling. Schools and students need reliable and rich data on student performance so as to improve student outcomes. Parents and families, however, need information about the performance of their son or daughter at school, of the school he or she attends, and of the system, to help parents and families make informed choices.

Through the Council of Australian Governments (COAG), all jurisdictions agreed to a new performance-reporting framework and agreed that ACARA will be supplied with the information necessary to enable it to publish relevant, nationally comparable information on all schools to support accountability, school evaluation, collaborative policy development, and resource allocation.

While the Melbourne Declaration and associated plans and policies identify students from low socio-economic backgrounds, including indigenous youth and other disadvantaged young Australians, students with disabilities receive minimal attention. This cohort, however, does receive some recognition in the conduct of the National Assessment Program.

The National Assessment Program

In 2008 the National Assessment Program-Literacy and Numeracy (NAPLAN) commenced in all Australian schools. The program continued in 2009 with all Australian students in Years 3, 5, 7, and 9 being assessed using common national tests in reading, writing, language conventions (spelling, grammar, and punctuation), and numeracy. Samples of students must also participate in other national and international tests, such as in Information Technology, Maths, and Science, but NAPLAN is the most comprehensive national assessment process that aims to include all students in the nominated grades each year. Commonwealth legislation requires testing of all students in identified grades for federal school funding to be provided to the states.

The tests are developed collaboratively by the states and territories, the non-government education sectors, and the Australian Government. The NAPLAN tests broadly reflect aspects of literacy and numeracy within the curriculum in all states and territories, and the types of test questions and test formats are chosen for familiarity to both teachers and students across Australia. Content of the tests is also informed by the National Statement of Learning in English and Mathematics. Questions are multiple-choice type or require a short written response. National tests of literacy assessed language conventions (spelling, grammar, and punctuation), writing (knowledge and control of written language), and reading (comprehension). In numeracy, the content areas assessed were number, measurement, chance and data, space, algebra, function, and pattern. Students are expected to develop a range of important skills and strategies before sitting the tests. Resources to help students develop these skills are made available, and teachers are also able to use the preparation materials from previous NAPLAN and state tests to familiarise students with the types of questions and response formats on the tests.

Each state and territory has a test administration authority that is responsible for printing the NAPLAN tests each year, for test administration, data capture, marking, and the delivery of reports to the central Commonwealth body, the MCEETYA. National protocols for test administration ensure that the administration of the tests by the eight authorities is consistently applied. Principals and teachers are offered information on protocols by the relevant authority through a range of mediums including information sessions, written information, and web-based materials. Students are tested in their own schools, with the tests administered by their own school teachers, in mid-May of the school year. In terms of marking, the tests for reading, language conventions, and numeracy are marked using optical mark recognition software to score multiple-choice items. Writing tasks are professionally marked by expert, independent markers using well-established procedures to maintain marker consistency.

National data are collected by the relevant test administration authority, and the de-identified student data are then submitted to an independent national data contractor for analysis. Comparative data showing the performance of each state, to determine national achievement scales, national means, and achievement of the middle 60%, for each domain of reading, writing, spelling, grammar and punctuation, and numeracy, are then provided to each testing authority. The national achievement scales each span Years 3, 5, 7, and 9. The skills and understandings assessed in each domain from Year 3 to Year 9 are mapped onto achievement scales with scores that range from 0 to 1000 (MCEETYA, 2008).

The scores for individual students across the national achievement scale are provided to enable comparisons over time. Students are able to be located on a single national scale for each domain, and achievement can be assessed against national and state means and national minimum standards. Specific scale scores determine cutoffs for achievement bands from 1 to 10 for each domain. Reporting scales were constructed so that any given scaled score represents the same level of achievement over time, enabling monitoring of student achievement as students advance through year levels.

Later in the year, schools receive statements of performance of their individual students and year levels as a whole in relation to the national minimum standards and the number and percentage of children not reaching national benchmarks. While decisions regarding intervention varied across states, regions, and schools, the decision regarding intervention is clear if the student achieves a level that is identified as below the national minimum standard. Schools then provide individual students (and their parents/carers) with statements of performance in relation to the national minimum standards. The means and standard deviations for each state and territory compared to national means and standard deviations for each domain are published in the full report that is publicly available on the web (see MCEETYA, 2008; MCEECDYA, 2009b).

Student Participation

The policies on student participation are provided in more detail in materials developed by the educational authorities responsible for each state and territory. While based on the same information, these materials are presented differently. One of the states, Queensland, provided a Test Preparation Handbook that states that the tests are “structured to be inclusive of all students, within budgetary and administrative limitations. . .and all eligible students must sit for the tests, unless they are exempt or withdrawn by parents/carers” (QSA Test Preparation Handbook, 2009, section 4.0). Further, eligible students include those who are of equivalent chronological age to a “typical” Year 3, 5, 7, or 9 student and who are involved in a special education facility or program.

Exempt Students

The Queensland Handbook indicates that students may qualify for exemption from one or more of the tests because of their “lack of proficiency in the English language, or because of significant intellectual and/or functional disability”. However, students with disabilities should “be given the opportunity to participate in testing if their parent/carer prefers that they do so” (QSA Test Preparation Handbook, 2009, section 4.2). Principals must consult with parents/carers on all matters of exemption, and then use ‘professional judgment’ when making decisions about a student’s participation in the tests. Principals are required to obtain signed forms from parents/carers to allow students who meet the criteria to be exempted. In Australia, exempt students are judged to have achieved in each exempted test at the level “below national minimum standard” and are reported within this subgroup of the population of students who had participated in the tests.

Students may also be withdrawn from the testing program by their parents/carers in consultation with the school. Withdrawals are intended to address concerns such as religious beliefs and philosophical objections to testing. However, information guides from some educational authorities link withdrawal with students with disabilities. For example, the 2007 information guide for New South Wales, another Australian state, documented that “Students with confirmed disabilities or difficulties in learning are expected to participate in the testing. However, parents do have the right to withdraw their children from testing. This is classified as a parent withdrawal and not as an exemption” (p. 2).

Students who are withdrawn are not counted as part of the population. In terms of reporting, they are grouped with those students who were absent or suspended and, despite the principals’ facilitation, were unable to complete the test(s) in the days immediately following the standard national testing day.

The numbers and percentages of students who are exempted or absent/withdrawn for each of the five domains are reported within participation statistics in the NAPLAN annual report. To provide a summary, the average percentages of students in each grade who were exempted, absent, or withdrawn across the five domains for the 2009 NAPLAN across Australia are provided in Table 5.1.

Table 5.1 Percentages of Australian students exempted, absent/withdrawn, and assessed, across year levels, for the 2009 NAPLAN

                     Year 3           Year 5           Year 7           Year 9
                     (n = 260,000)    (n = 265,000)    (n = 255,000)    (n = 250,000)
Exempt               1.9              1.7              1.2              1.3
Absent/withdrawn     3.6              3.2              3.6              6.1
Assessed             94.5             95.1             95.2             92.6

In the 2009 report, some comparisons were made regarding participation levels of the 2009 cohort compared with 2008. In general, these participation rates were very similar, although some year-to-year variations among indigenous populations in some states were noted.

No details were publicly available as to the reasons for exemptions or absences and withdrawals. Testing authorities mentioned that the written parent applications for exemptions or withdrawals subsequently approved by principals were kept at the school level and were not centrally recorded at the state or national level. It is unknown how many exempted students had identified disabilities, or were students with learning disabilities or language difficulties. Similarly, the reasons for students being withdrawn or absent (or suspended), and parental philosophical objections for withdrawal, are unknown. How many of those absent or withdrawn might have had learning disabilities is also unknown.

Special Provisions/Considerations

Special provisions were to be made where these were identified as necessary to comply with the legislative requirements of the Disability Standards for Education (DDA). A range of support and differentiated resources was to be made available to enable all students to complete the NAPLAN testing. The information guide for NSW (2007) stated that special provisions were provided to students with disabilities or special needs in line with arrangements for existing state-based tests. Special provisions could be accessed by a student for all or part of a test, using more than one provision in any one test. Special provisions should also reflect the type of support the student regularly accesses in the classroom.

Queensland Special Provisions and considerations were included in the Test Preparation Handbook and also made freely available on a website: http://www.learningplace.com.au/deliver/content.asp?pid=43511 (accessed Jan 18, 2010). The stated goal of special provisions/considerations is the maximisation of student access to the NAPLAN tests. Schools were encouraged to consider the principles of equity and inclusivity in meeting the needs of all students by recognising that many students without disabilities might also require special provisions/consideration. While this extension of potential support is noteworthy, the notion of “reasonable adjustment” provides schools with the opportunity to avoid offering accommodations. After a process of consideration as to whether an adjustment was necessary, a reasonable adjustment, as previously outlined, was then to be identified and put into place. Schools were able to provide the following accommodations or reasonable adjustments, as were considered necessary (QSA Test Preparation Handbook, 2009, section 5.10):

• Reading support
• Use of a scribe
• Extra time (up to 50%), including rest breaks
• Braille and large-print test materials
• Separate supervision or a special test environment
• PCs/laptops (no spell check or speech-to-text software)
• Assistive listening devices
• Specialised equipment or alternative communication devices
• ‘Signed’ instructions

However, a number of accommodations were not permitted. In terms of reading, it was not permitted to

• read numbers or symbols in numeracy tests;
• interpret diagrams or rephrase questions;
• read questions, multiple-choice distractors, or stimulus material in the reading or language conventions tests;
• paraphrase, interpret, or give hints about questions or texts; or
• read or sign literacy questions to students with moderate/severe-to-profound hearing impairment.

Many of these disallowed accommodations would be judged by many to be quite “reasonable” in providing students with disabilities the chance of a level playing field: understanding what is required and then being able to complete the assessment tasks.

The numbers and percentages of students who are afforded special consideration or accommodations, and the types of accommodations provided, are not reported in the annual report. Some testing authorities have reported the numbers who were given special consideration on at least one test booklet. For instance, the Queensland Studies Authority (QSA, 2009) provided the details from which Table 5.2 was constructed.

Table 5.2 Queensland students afforded special consideration and exemptions, across year levels, for the 2009 NAPLAN

Year level   Total participants   Special consideration (n)   Special consideration (%)   Exempted (n)   Exempted (%)
3            56,368               7388                        13.1                        1123           1.9
5            57,467               6730                        11.7                        1068           1.8
7            58,182               6121                        10.5                        976            1.7
9            59,997               2834                        4.7                         1015           1.7

No details were publicly available as to the types of special consideration afforded to students, and the types of disability of these students. Written parent applications for special consideration subsequently approved by principals were kept at the school level and were not centrally recorded at the state or national level.

Issues Arising from Exemptions/Absences/Withdrawals

Some concerns have been expressed by politicians and others in the print media regarding an increase in NAPLAN absenteeism in some schools in some states (Sydney Morning Herald, October 20, 2009). In this article, a politician suggested that, because of the publication of school performance data, there is “a fear that the school’s average and reputation will be damaged by individual poor results”. The Council of Australian Governments (COAG) agreement to a new performance-reporting framework to publish relevant, nationally comparable information on all schools to support accountability and school evaluation is potentially threatening to poor performers. Some teachers, schools, school principals, education authorities, and governments will potentially experience ramifications if performances at their level of responsibility in the system fall below expectation and/or the national standards. To avoid such negative outcomes, strategies at each level of the system are being put into place. For example, since the first NAPLAN testing in 2008, there is anecdotal evidence that teachers are spending more time training their students on NAPLAN-type tasks from very early in the school year. School principals in some states are being offered financial and other incentives to increase the performance of their school. In the future, school funding and teacher performance pay will potentially be linked to performance on national tests (Cumming, 2009a). There is concern that, at the student level, many poor-performing students and their parents will be discouraged by teachers/schools from participating. While these issues raise the potential for negative outcomes for students, many schools and teachers are adopting a more positive approach and identifying strategies to help students learn the curriculum and ultimately perform well on assessment tasks.

The figures provided in Tables 5.1 and 5.2 point to the fact that substantial numbers of students are not participating in national achievement testing. Non-participation is an indictment of the whole system and flies in the face of the legislation (DDA, Education Standards) and policies (National Goals for Schooling in the Twenty-First Century, the Adelaide and Melbourne Declarations, the MCEETYA 4 Year Plan) that promote assessment for all.

When students are exempted from national testing, or are withdrawn or absent because the tests are deemed not to be appropriate for those students, the national assessment program fails those students. Their lack of inclusion places them and their parents in a position of obscurity. These students and their performance levels become less important to schools, and their parents do not receive meaningful performance information in terms of national standards. In terms of the school performance data, many of these students could be written off as not having the potential to achieve the current minimum national standards, and so when resources are allocated to improve performance, the learning needs of these students may be considered less important, compared to other students who might have potential to achieve these standards.

These are similar concerns to those raised in the United States more than 10 years ago in the report from the Committee on Goals 2000 and the Inclusion of Students with Disabilities for the National Research Council (McDonnell, McLaughlin, & Morison, 1997). In the Executive Summary of this report, concerns were identified about the exclusion from participation of many students with disabilities. When data on achievement levels of students with disabilities are absent, then “judgments about the effectiveness of educational policies and programs at local, state and national levels” (p. 6) are neither valid nor fair.

Issues Arising from Special Provisions/Considerations

There is evidence of special provisions and accommodations being applied to support students with additional needs. However, these sanctioned accommodations are regarded as minimal by some judges. Moreover, validity issues have been raised in relation to the limited application of accommodations and adaptations to NAPLAN among other assessment approaches (Cumming, 2009a). From a cultural perspective, while students with disabilities have diverse ways of knowing, we still frame accommodations from particular constructions of ways of knowing, “expected patterns of ‘normal’ development, and how the demonstration of knowledge should occur” (Cumming, 2009a, p. 5).

While legislation and policy call for appropriate accommodations for all students with disabilities, alternative assessment forms that allow students to demonstrate their knowledge in other ways are not provided in NAPLAN. Cumming also raises concerns about the form of some NAPLAN assessment items dominating the task and precluding the demonstration of knowledge. Such assessment formats are not uncommon in large-scale testing, but they do not allow students with disabilities to demonstrate their skills and knowledge.

There is a need to develop alternative assessments for students with additional needs. In the United States, the No Child Left Behind Act (NCLB) requires all students to participate in statewide assessments. The challenge is to design an assessment system that signals high expectations of performance but still provides useful data about the progress of students at the lower end (McDonnell et al., 1997). Participation for some students with disabilities will require some form of testing accommodation that entails non-standard forms of test administration and response. Other students will require alternative assessments to accurately measure performance at the low end of the scale. The US Department of Education has developed guidelines for states that permit alternate and flexible assessments for special education students (Cortiella, 2007). Some students will require an alternate assessment based on modified academic achievement standards (AA-MAS) that is determined, justified, and documented by the teaching or Individualised Education Plan (IEP) team. Assessment is aligned to grade-level content standards for the grade in which the student is enrolled, but may be less challenging than the grade-level achievement standard. This form of modified assessment must have at least three achievement levels. States can modify standards, design a totally different assessment, or adapt the existing regular assessments. Some states adapted the regular assessment by reducing the number of test questions; others simplified the language of test items, reduced the number of multiple-choice options, used pictures to aid understanding, and provided more white space on the test booklet. The focus of the special education service is to accelerate learning to overcome achievement gaps.

A few students will require alternate assessment based on alternate achievement standards (AA-AAS), whereby the expectation of performance is lower than the grade-level achievement standard and "usually based on a very limited sample of content that is linked to but does not fully represent grade-level content" (Cortiella, 2007, p. 7).

In essence, assessment needs to give credit for what knowledge can be demonstrated when not bound by the constraints of comparisons with other children, or a curriculum that does not reflect different ways of knowing (Cumming, 2009a). Educators and assessors will need to demonstrate clearer understandings of children and their ways of knowing before assessments can effectively meet the needs of all children.

Conclusion

The Ministerial Council for Education, Early Childhood Development and Youth Affairs (MCEECDYA), under its previous name (MCEETYA), had a facilitative role in the development of the Education Standards. Subsequent to the passing of the Education Standards into legislation, MCEECDYA has developed a national testing regime. It is concerning that a number of significant inconsistencies between the Standards (and other legislation and policies) and the application of NAPLAN have been identified.

While the legislation and policies require equity of opportunity in all school activities, many students with a disability are exempted from national testing. While the overall number of exemptions is known, the reasons for these exemptions are not known. Some students are offered adjustments and accommodations, but these are not centrally recorded or monitored for quality assurance in the states that have been reviewed. No alternative assessments for NAPLAN have been developed, and there appears to be no agenda to do so.

So while there would seem to be a national agenda of assessment of achievement for all, the practical application of these policies falls well short of the mark. It may well be that testing authorities in each state could justify their lack of alternative assessment items or tests by appealing to the DDA Education Standards' exception of "unjustifiable hardship" in not carrying out the assessment obligation because it "would be very expensive" (DDA Education Standards, 2009, p. 6).

National assessment needs to become more inclusive to meet the requirements of the DDA Education Standards. Public reporting of assessment results for students with disabilities, along with those who participate in different or modified assessments, is "key to ensuring fair and equitable comparisons among schools, districts, and states; in addition, all students should be accounted for in the public reporting of results" (McDonnell et al., 1997, p. 7). With the establishment of ACARA and the goal of alternative assessments, there is the promise that inclusive national assessment for all can take place in the near future. If alternate modes of assessment are not developed, there is an expectation of court challenges about the adequacy of current provisions to assess student achievement (Cumming, 2009b).

So the agenda is clear. Exemptions need to be minimised, and existing assessment protocols need to be adjusted; in particular, the level of expected performance needs to be reduced for each grade level. Conditions of testing also need to be adjusted, and test items need to be reviewed and modified to accommodate students with additional needs. Alternative assessment tools need to be designed to accommodate such students. While national testing has been newly introduced and is evolving within complex political and educational contexts, this testing regime needs to achieve full validity and provide equity for all. For this to be realised, it is essential that national assessments are truly inclusive.


References

Australasian Legal Information Institute. (2009). Disability Discrimination Act 1992. Retrieved on December 3, 2009 from the World Wide Web: http://www.austlii.edu.au/

Australian Council for Educational Research. (2009). Programme for international student assessment. Retrieved on December 3, 2009 from the World Wide Web: http://www.acer.edu.au/index.html

Australian Government Attorney-General's Department. (2005). Disability standards for education. Retrieved on December 4, 2009 from the World Wide Web: http://www.ag.gov.au/www/agd/agd.nsf/Page/Humanrightsandanti-discrimination_DisabilityStandardsforEducation

Australian Government Productivity Commission. (2004). Review of the Disability Discrimination Act 1992. Retrieved on December 4, 2009 from the World Wide Web: http://www.pc.gov.au/

Australian Human Rights Commission. (2009a). Court decisions. Retrieved on August 19, 2009 from the World Wide Web: http://www.hreoc.gov.au/disability_rights/decisions/court/court.html#cted

Australian Human Rights Commission. (2009b). Disability complaint outcomes. Retrieved on December 4, 2009 from the World Wide Web: http://www.hreoc.gov.au/disability_rights/index.html

Australian Human Rights Commission. (2009c). Disability standards and guidelines. Retrieved on December 3, 2009 from the World Wide Web: http://www.hreoc.gov.au/

Australian Human Rights Commission. (2009d). Disability rights. Accessed on August 19, 2009 from the World Wide Web: http://www.hreoc.gov.au/disability_rights/index.html

Cortiella, C. (2007). Learning opportunities for your child through alternate assessments: Alternate assessments based on modified academic achievement standards. Minneapolis, MN: University of Minnesota, National Center on Educational Outcomes.

Cumming, J. J. (2009a). Adjustments and accommodations in assessments that count: Needed creativity in consideration of equity. Paper presented at the 35th annual conference of the International Association for Educational Assessment, Brisbane, 13–18 September 2009. Retrieved January 19, 2010 from the World Wide Web: http://www.iaea2009.com/abstract/187.asp

Cumming, J. J. (2009b). Assessment challenges, the law and the future. In C. M. Wyatt-Smith & J. J. Cumming (Eds.), Educational assessment in the 21st century: Connecting theory and practice (pp. 157–179). Dordrecht, The Netherlands: Springer International.

DDA Education Standards. (2009). Your right to an education: A guide for students with a disability, their associates, and education providers. Retrieved January 19, 2010 from the World Wide Web: www.ddaedustandards.info

Dempsey, I. (2004). Recent changes in the proportion of students identified with a disability in Australian schools. Journal of Intellectual and Developmental Disability, 29, 172–176.

Dempsey, I. (2007). Trends in the placement of students in segregated settings in NSW government schools. Australasian Journal of Special Education, 31, 73–78.

McDonnell, L. M., McLaughlin, M. J., & Morison, P. (1997). Educating one and all: Students with disabilities and standards-based reform. Washington, DC: National Academy Press.

Ministerial Council on Education, Early Childhood Development, and Youth Affairs. (2008). National declaration on the educational goals for young Australians. Retrieved January 19, 2010 from the World Wide Web: http://www.mceecdya.edu.au/verve/_resources/National_Declaration_on_the_Educational_Goals_for_Young_Australians.pdf

Ministerial Council for Education, Early Childhood Development and Youth Affairs. (2009a). Statements of learning. Retrieved on December 3, 2009 from the World Wide Web: http://www.mceecdya.edu.au/mceecdya/

Ministerial Council for Education, Early Childhood Development and Youth Affairs. (2009b). NAPLAN 2009 report. Retrieved on January 19, 2010 from the World Wide Web: http://www.mceecdya.edu.au/mceecdya/naplan_2009_report,29487.html

Ministerial Council for Education, Early Childhood Development and Youth Affairs. (2010). The Adelaide declaration on national goals for schooling in the twenty-first century. Retrieved on January 27, 2010 from the World Wide Web: http://www.mceecdya.edu.au/mceecdya/adelaide_declaration_1999_text,28298.html

Ministerial Council on Education, Employment, Training and Youth Affairs. (2008). National assessment program: Literacy and numeracy. Achievement in reading, writing, language conventions and numeracy. Retrieved on 18 March 2009 from the World Wide Web: http://www.mceetya.edu.au

Ministerial Council on Education, Employment, Training and Youth Affairs. (2009). MCEETYA four year plan 2009–2012. Retrieved on January 14, 2010 from the World Wide Web: http://www.mceecdya.edu.au/mceecdya/action_plan,25966.html

Nelson, B. (2003). Most state and territory education ministers vote against disability standards. Media release, July 11. Retrieved on January 28, 2010 from the World Wide Web: http://www.dest.gov.au/ministers/nelson/jul_03/minco703.htm

NSW Dept of Education and Training. (2007). National assessment program: Primary newsletter. Retrieved on January 18, 2010 from the World Wide Web: http://www.curriculumsupport.education.nsw.gov.au/policies/nap/assets/nap_pri_newshi.pdf


Queensland Studies Authority (QSA). (2009). Test preparation handbook. Brisbane, Australia: QSA. Retrieved on July 10, 2009 from the World Wide Web: http://qsa.qld.edu.au/assessment/8021.html, and on January 18, 2010 on the Learning Place website: http://www.learningplace.com.au/deliver/content.asp?pid=43511

Sopko, K. M. (2003). The IEP: A synthesis of current literature since 1997. ERIC Document Reproduction Service, No. ED476559.

Sydney Morning Herald. (2009). Schools play truant to avoid bad marks. Anna Patty. Media release. Retrieved on October 20, 2009 from the World Wide Web: http://smh.com.au/national/schools-play-truant-to-avoid-bad-marks

World Bank. (2009a). Gross domestic product. Retrieved on December 3, 2009 from the World Wide Web: http://www.worldbank.org/

World Bank. (2009b). Population. Retrieved on December 3, 2009 from the World Wide Web: http://www.worldbank.org/


Part II

Classroom Connections


6 Access to What Should Be Taught and Will Be Tested: Students' Opportunity to Learn the Intended Curriculum

Alexander Kurz

The annual large-scale assessment of student achievement for accountability purposes is expected to provide reliable test scores that permit valid inferences about the extent to which students have achieved the intended academic content standards. The fact that schools are open to sanctions and rewards on the basis of student achievement also indicates that these test score interpretations include inferences about what students know and can do as a result of the learning opportunities provided by schools (Burstein & Winters, 1994). That is, teachers and school administrators are considered to contribute to student achievement and, if properly incentivized and supported, are assumed to promote student achievement more effectively (Linn, 2008). To support the validity of such inferences, test users must either account for differences in learning opportunities or provide evidence that all test-takers had access to a comparable opportunity to learn the intended and assessed curriculum (Kurz & Elliott, in press; Porter, 1995).

The issue of opportunity to learn (OTL), especially in the context of test-based accountability, has also prompted researchers and other stakeholders to raise equity concerns for subgroups such as students identified with disabilities and English language learners (Heubert, 2004; Herman & Abedi, 2004; Kurz, Elliott, Wehby, & Smithson, 2010; Porter, 1993; Pullin & Haertel, 2008; Stevens & Grymes, 1993). The differential achievement of students with and without disabilities on state and national achievement tests (Abedi, Leon, & Kao, 2008; Malmgren, McLaughlin, & Nolet, 2005; Ysseldyke et al., 1998) underscores this concern and has kept the question of access and OTL on the forefront of debates in special education: "Are students [with disabilities] receiving equitable treatment in terms of access to curriculum . . . and other inputs critical to the attainment of academic and educational outcomes important to their postschool success?" (McLaughlin, 2010, p. 274).

A. Kurz, Department of Special Education, Peabody College of Vanderbilt University, Nashville, TN 37067, USA; e-mail: [email protected]

For nearly five decades, research related to OTL has focused on unpacking the "black box" of schooling processes that translate inputs into student outcomes (McDonnell, 1995). The conceptual and methodological challenges of OTL notwithstanding (Elliott, Kurz, & Neergaard, in press; Kurz & Elliott, 2011; Roach et al., 2009), measurement related to this concept has evolved over the years, allowing stakeholders to ascertain relevant information about OTL at the system, teacher, and student level. Broadly defined, OTL refers to "the opportunities which schools provide students to learn what is expected of them" (Herman, Klein, & Abedi, 2000, p. 16). Researchers from various disciplines, however, have focused on different aspects of OTL that facilitate student learning of what is expected, which has resulted in a wide array of definitions and ways to measure OTL. Moreover, the issues of alignment and access (to the general curriculum) intersect with OTL, making it all the more difficult for stakeholders to understand and apply OTL and its measurement tools in the service of promoting student achievement for different subgroups. Hence the main objectives of this chapter: (a) situate the concept of OTL in the context of a comprehensive curriculum framework; (b) clarify the conceptual and substantive relevance of OTL; (c) synthesize aspects of OTL emphasized in the literature into a coherent and empirically based concept; (d) highlight OTL measurement options; and (e) conclude with a prospective view on OTL. Although McLaughlin (2010) recently noted that we can only "speculate" about the factors that may have contributed to the current disparate educational outcomes for students with disabilities and that we "cannot know with certainty whether greater access to rigorous courses and higher expectations would make a difference to both in school and postschool outcomes" (p. 274), I contend that the concept of OTL and its measurement, as discussed throughout this chapter, can move us beyond speculation toward actual measurement of access and the role it plays in promoting student achievement.

Student Access to the Intended Curriculum

The educational lives of students with disabilities are embedded in a legal framework based on legislation, such as the Americans with Disabilities Act (ADA, 1990), the Individuals with Disabilities Education Improvement Act (IDEA, 2004), and the No Child Left Behind Act (NCLB, 2001), which mandates physical and intellectual access at all levels of the educational environment including curriculum, instruction, and assessment (Kurz & Elliott, 2011). That is, students with disabilities are to be provided with a free and appropriate public education that meets their individual needs and offers them access to, and progress in, the general curriculum. Moreover, students with disabilities are to be fully included in test-based accountability and offered content that is maximally aligned with the grade-level content standards of their general education peers. To some, this legal framework signals "a clear presumption that all students with disabilities should have access to the general curriculum [emphasis added] and to the same opportunity to learn [emphasis added] challenging and important content that is offered to all students" (McLaughlin, 1999, p. 9). The extent to which standards-based accountability along a general curriculum for all students can be reconciled with the notion of an individualized education for students with disabilities continues to be a matter of debate (e.g., Fuchs, Fuchs, & Stecker, 2010; McDonnell, McLaughlin, & Morison, 1997; McLaughlin, 2010). To shed light on this debate and to develop the conceptual and substantive relevance of OTL, it is necessary to set the stage via a comprehensive curriculum framework.

The Intended Curriculum Model

Several researchers have introduced curriculum frameworks (e.g., Anderson, 2002; Porter, 2002; Webb, 1997) that emphasize the content of three major elements of the educational environment: the intended curriculum (i.e., the content designated by state and district standards), the enacted curriculum (i.e., the content of teacher instruction), and the assessed curriculum (i.e., the tested content of assessments). Alignment refers to the extent to which the contents of these curricula overlap and serve in conjunction with one another to promote student achievement (Webb, 2002). In a well-aligned system, the content of teachers' instructional activities covers the content intended by the standards, which in turn is measured (or rather sampled) by the tested content of state assessments. Several methods for the quantitative and qualitative analysis of alignment between these curricula are available (see Martone & Sireci, 2009; Roach, Niebling, & Kurz, 2008).

Building on prior work (Petty & Green, 2007; Porter, 2006), Kurz and Elliott (2011) extended the traditional three-part curriculum framework to explicate issues of access and OTL for students with disabilities at the system, teacher, and student level. In the context of their intended curriculum model (ICM), the authors defined OTL as being concerned with "students' opportunity to learn the intended curriculum" (p. 38). The present version of the ICM has been revised to clarify several questions that were prompted by the original model related to the conceptual relevance of OTL: Are the concepts of alignment and OTL interchangeable? Are the concepts of access to the general curriculum and OTL interchangeable? Is the intended curriculum the same for students with and without disabilities? Consequently, a general education and a special education version of the model were specified. Figure 6.1 presents the ICM for general education.

The ICM for General Education

At the system level, the ICM posits the intended curriculum as the primary target of schooling. The intended curriculum hereby represents a collection of educational objectives, which in their entirety encompass the intended purposes of schooling (i.e., what students are expected to know and be able to do). Ideally, the intended curriculum identifies all valued and expected outcomes via operationally defined and measurable objectives at different levels of aggregation such as subject and grade. Under the NCLB Act (2001), states were required to develop challenging academic content and performance standards that specify "what" and "how much" is expected of students in mathematics, reading/language arts, and science (Linn, 2008). This federal mandate was intended to compel states to define and improve the so-called "general curriculum" (Karger, 2005). NCLB further described the general curriculum as applicable to all students – hence the term "general." The statute's implementing regulations, for example, stated that NCLB requires "each State to develop grade-level academic content and achievement standards that [NCLB] expects all students – including students with disabilities – to meet" (67 F.R. 71710, 71741). Additional legislative mandates that circumscribe or augment this general curriculum are not available for students without disabilities. The academic content and performance standards that comprise the general curriculum at the system level thus signal the entirety of the intended curriculum for students without disabilities. In other words, the general curriculum is the intended curriculum in the context of general education. For students with disabilities, however, the intended curriculum is not under the exclusive purview of the general curriculum – as will be discussed shortly.

Fig. 6.1 The intended curriculum model for general education

The assessed curriculum for accountability purposes is designed at the system level in alignment with the intended curriculum. That is, the tested content of a state's large-scale assessment is used to sample exclusively across the various content domains of the intended curriculum to permit a valid test score inference about the extent to which students have achieved the intended curriculum. It would be unreasonable to expect state tests to cover all skills prescribed by the intended curriculum due to test length and time constraints. Figure 6.1 therefore displays the assessed curriculum as being slightly smaller than the intended (general) curriculum. Under the NCLB Act (2001), all states are required to document alignment between the intended and assessed curriculum (Linn, 2008). Alignment methodologies such as the Surveys of the Enacted Curriculum (SEC; Porter & Smithson, 2001) and the Webb method (Webb, 1997) allow stakeholders to provide evidence of alignment beyond a simple match of content topics using additional indices such as content emphasis and match of cognitive process expectations (see Martone & Sireci, 2009; Roach et al., 2008). Lastly, it is important to note that the uniform description of the intended curriculum via the general curriculum results in only one assessed curriculum for accountability purposes: the annual state achievement test.

At the teacher level, the ICM posits the planned curriculum as the first proxy of the intended curriculum. The planned curriculum represents a teacher's cumulative plans for covering the content prescribed by the intended curriculum. Although the intended curriculum informs what content should be covered for a particular subject and grade, a teacher's planned curriculum is likely to be constrained as a function of the teacher's subject matter knowledge or familiarity with the intended curriculum. For example, a teacher may deliberately plan to emphasize certain content domains and omit others, while a different teacher may simply be unable to plan for comprehensive coverage of the intended curriculum due to missing content expertise and professional development experiences. To date, the content of teachers' planned curriculum and its alignment with the intended curriculum have received limited research attention. As part of their alignment study, Kurz et al. (2010) adapted the SEC methodology to examine alignment between teachers' planned curriculum and the state's intended curriculum for 18 general and special education teachers. Results via the SEC's alignment index (AI), which represents content alignment along two dimensions (i.e., topics and cognitive demand) on a continuum from 0 to 1, indicated that approximately 10% of teachers' self-reported planned curriculum (for the first half of the school year) was aligned with the intended curriculum. Although more research is needed, the planned curriculum represents a viable target for professional development, because a teacher's planned curriculum directly informs and potentially constrains his or her enacted curriculum. In the Kurz et al. study, for example, alignment between the planned and enacted curriculum was significantly greater (about 45%) than between the intended and enacted curriculum (about 10%). That is, teachers appear to adhere first and foremost to their own planned curriculum (rather than the intended curriculum). Lastly, the model indicates that the planned curriculum is informed by both the intended and assessed curriculum. In the context of test-based accountability, the content of the assessed curriculum exerts a strong influence on what teachers plan to cover and ultimately implement. Under the NCLB Act (2001), the intended and assessed curriculum have to be aligned, which should allow teachers to focus their planning and teaching efforts on the intended curriculum. Misalignment, however, may pressure teachers to focus on the assessed curriculum because inferences about their effectiveness are made on the basis of test scores – in short, teachers may "teach to the test" rather than the broader intended curriculum.

The next proxy of the intended curriculum at the teacher level is the enacted curriculum, which is largely comprised of the content of classroom instruction and its accompanying materials (e.g., textbooks). Teachers also make pedagogical decisions about the delivery of this content including instructional practices, activities, cognitive demands, and time emphases related to the teaching of certain topics and skills. The enacted curriculum plays a central role in our definition and measurement of OTL (i.e., students' opportunity to learn the intended curriculum) because it is primarily through the teacher's enacted curriculum that students access the intended curriculum. The enacted curriculum consequently represents one of the key intervention targets for increasing OTL. As can be seen in Fig. 6.1, the model again illustrates the potentially attenuated uptake of the intended curriculum by each subordinate curriculum. At this level, the day-to-day realities of school instruction may prevent teachers from enacting their entire planned curriculum in response to students' rate of learning, school assemblies, absences, and so on. The extent to which students have the opportunity to learn the intended curriculum via the teacher's enacted curriculum, however, is critical to their performance on achievement tests, even after controlling for other factors (e.g., Cooley & Leinhardt, 1980; Porter, Kirst, Osthoff, Smithson, & Schneider, 1993; Stedman, 1997). Moreover, providing students with the opportunity to learn the content that they are expected to know represents a basic aspect of fairness in testing, particularly under high-stakes conditions (see American Educational Research Association [AERA], American Psychological Association [APA], & National Council on Measurement in Education [NCME], 1999). OTL also plays a role in the validity of certain test score inferences such as those that interpret assessment results as a function of teacher instruction or that explain mean test score differences between subgroups of examinees: "OTL provides a necessary context for interpreting test scores including inferences about the possible reasons underlying student achievement (e.g., teacher performance, student disability) and suggestions for remedial actions (e.g., assignment of PD training, referral to special education)" (Kurz & Elliott, p. 39).

At the student level, the engaged curriculum represents those portions of content coverage during which the student is engaged in the teacher's enacted curriculum. Considering data from the 2006 High School Survey of Student Engagement, on which 28% of over 80,000 students reported being unengaged in school (Yazzie-Mintz, 2007), it seems reasonable to suggest that some students are unlikely to engage in a teacher's entire enacted curriculum as it unfolds across the school year. Moreover, a student's engaged curriculum is likely to constrain his or her learned curriculum. That is, a student will presumably learn only those portions of the enacted curriculum during which he or she is engaged. The ICM thus indicates the potential for further attenuation as the intended curriculum reaches the student level via the teacher's enacted curriculum. At the end of the intended curriculum chain, the model posits the displayed curriculum, which represents the content of the intended curriculum that a student is able to demonstrate via classroom tasks, assignments, and/or assessments. A student's displayed curriculum may not reveal the entirety of his or her learned curriculum due to various factors including interactions between test-taker characteristics and features of the test that do not permit the student to fully demonstrate his or her knowledge of the target construct (see Chapter 9, this volume), test anxiety, or constraints directly linked to the assessed curriculum. The latter concern is essentially an issue of alignment: A student can only "display" his or her achievement of the intended curriculum to the extent to which the assessed curriculum is aligned with the intended curriculum.

So far, we have discussed how the intended curriculum unfolds across the system, teacher, and student level in general education. It is within this educational context that most states use the general curriculum (i.e., the academic content and performance standards applicable to all students) to define their students' intended curriculum. As such, it is not surprising that researchers have failed to see the need to distinguish between the general and intended curriculum at the system level: In the context of general education both curricula are indeed synonymous. However, the uncritical adoption of traditional curriculum models in the context of special education can blur important distinctions among curricula that determine the intended outcomes of schooling for students with disabilities (i.e., what students are expected to know and be able to do). In fact, an ongoing debate in special education centers around the perceived tension between two federal policies relevant to standards-based reform and questions about the extent to which the newly established standards should circumscribe the intended and assessed curriculum for students with disabilities: "There is increasing recognition of a fundamental tension between the prevailing K-12 educational policy of universal standards, assessments, and accountability as defined through [NCLB] and the entitlement to a Free Appropriate Public Education (FAPE) within IDEA" (McLaughlin, 2010, p. 265). Figure 6.2 presents the ICM for special education.

Fig. 6.2 The intended curriculum model for special education


The ICM for Special Education

In the context of special education, the ICM posits the intended curriculum as dually determined by both the general curriculum and the student's Individualized Education Program (IEP) curriculum. That is, neither the IEP curriculum nor the general curriculum exclusively represents the intended curriculum for a student with a disability. The implementing regulations for the reauthorization of IDEA in 1997 identified the intended purposes of special education as follows: "To [(a)] address the unique needs of the child that result from the child's disability; and [(b)] to ensure access of the child to the general curriculum, so that he or she can meet the educational standards within the jurisdiction of the public agency that apply to all children" (34 C.F.R. § 300.26(b)(3)). Both reauthorizations of IDEA in 1997 and 2004 further emphasized the IEP as the central mechanism for detailing a student's access, involvement, and progress in the general (education) curriculum (Karger, 2005). The IEP is used further to document educational objectives relevant to the student's present levels of performance as well as accommodations and modifications that facilitate the student's access to the enacted and assessed curriculum (Chapter 7, this volume). The IEP curriculum can thus include content that goes beyond the knowledge and skills put forth in the general curriculum. A student's IEP, for example, can include social and behavioral goals or other functional goals that are not part of subject- and grade-specific academic standards. The requirement to document a student's access, involvement, and progress in the general curriculum also has promoted the development of so-called "standards-based IEPs," which refers to the practice that links IEP objectives to a state's grade-level standards and assessments (Ahearn, 2006). As such, a student's IEP may include specific objectives that come directly from the general curriculum of his or her peers. In short, the IEP curriculum delineates the extent to which the general curriculum is part of the student's intended curriculum and includes a set of specific (intended) educational objectives, which, depending on the student's unique disability-related needs, may fall within or outside the general curriculum. To this end, the ICM depicts overlap between the IEP curriculum and the general curriculum. The degree to which both curricula overlap is specified in each individual student's IEP and thus varies from student to student. Consequently, there is no uniform intended curriculum in the context of special education: The intended curriculum for students with disabilities is student specific by law.

The possibility of individualized intended curricula has direct implications for the remaining curricula within the ICM framework. Most importantly, the notion of only one assessed curriculum fully aligned with the general curriculum and applicable to all students is no longer tenable. For purposes of the assessed curriculum, the ICM therefore reflects the three assessment options currently available to students with disabilities: the regular state assessment, the alternate assessment based on modified academic achievement standards (AA-MAS), and the alternate assessment based on alternate academic achievement standards (AA-AAS). According to the model, the varying degrees of overlap between the IEP curriculum and the general curriculum can be grouped into three broad categories of the intended curriculum that directly correspond to the three assessed curricula: regular, modified, and alternate.

For students with disabilities whose IEP curriculum largely overlaps with the general curriculum, the intended curriculum should lead to planned and enacted curricula that offer students the opportunity to learn grade-level subject matter content and progress toward predetermined NCLB achievement goals. The content of the regular achievement test thus represents the appropriate assessed curriculum. As for students without disabilities, the resulting displayed curriculum would be used to monitor educational progress.

For students with disabilities whose IEP curriculum moderately overlaps with the general curriculum, the intended curriculum should lead to planned and enacted curricula that continue to offer students the opportunity to learn grade-level subject matter content and progress toward predetermined NCLB achievement goals. However, we would expect the non-overlapping portions of the IEP curriculum to include modified outcomes for some general curriculum objectives, a set of non-academic educational objectives (e.g., social and behavioral goals), as well as more intensive and specialized accommodations and related services that support OTL. The content of the modified achievement test thus represents a more appropriate assessed curriculum. Progress monitoring via the resulting displayed curriculum would be benchmarked to modified and regular achievement standards.

For students with disabilities whose IEP curriculum barely overlaps with the general curriculum, the intended curriculum should lead to highly individualized planned and enacted curricula that offer students the opportunity to learn subject matter content that is linked to a limited and not fully representative sample of grade-level content. We would expect the non-overlapping portions of the IEP curriculum to represent alternate outcomes for most general curriculum objectives, a large set of non-academic educational objectives (e.g., social, behavioral, and functional goals), intensive and specialized accommodations and modifications, and several related services that support OTL. The content of the alternate achievement test thus represents a more appropriate assessed curriculum. Progress monitoring via the displayed curriculum would occur against highly differentiated outcomes related to functional independence and self-sufficiency.

As for students without disabilities, the intended curriculum is subject to change on an annual basis as students advance from one grade to the next. Besides subject- and grade-specific changes in the general curriculum, students with disabilities also experience an annual review and update of their IEP. Additional changes in the IEP curriculum are therefore very likely. Ongoing feedback loops from the displayed curriculum to the curricula at the teacher level (i.e., planned and enacted) and system level (i.e., intended) should further permit formative changes in the content of the intended curriculum and the planning and implementation of classroom instruction. Lastly, it should be noted that the discussed intended curriculum categories serve illustrative purposes and do not suggest separate "tracks" of intended special education curricula.

At the teacher and student level, the intended curriculum unfolds in much the same way as described previously in the general education context. However, the student-specific nature of the IEP curriculum implies that the content of a teacher's planned and enacted curricula ought to reflect each student's unique intended curriculum. Differentiated instruction according to the specific needs and abilities of each student, of course, represents the very essence of special education and summarizes much of the teacher training content for special educators. The sources of instruction responsible for implementing the intended curriculum of students with disabilities, however, are rarely comprised of only special education teachers. In most cases, general and special education teachers share the responsibility of providing a student with the opportunity to learn his or her intended curriculum, supported by paraprofessionals, teacher consultants and specialists, and other related services providers. The fragmentation of OTL sources therefore presents a significant measurement challenge – as will be discussed in a later section of this chapter.

The Relevance of OTL

Up until now, our general definition of OTL was used to establish the "who" (i.e., students) and "what" (i.e., the intended curriculum) of OTL and to pinpoint "how" this opportunity to learn the intended curriculum presents itself to students, namely through the teacher's enacted curriculum. Not surprisingly, researchers interested in measuring OTL have concentrated their efforts on the enacted curriculum (e.g., Pianta, Belsky, Houts, Morrison, & National Institute of Child Health and Human Development [NICHD], 2007; Rowan, Camburn, & Correnti, 2004; Smithson, Porter, & Blank, 1995). Within the context of the ICM, the focus has been on the content of the various curricula at the system, teacher, and student level. However, the content of the enacted curriculum (i.e., the content of teacher instruction) represents only one dimension of OTL examined in the literature. At the enacted curriculum level, two additional dimensions related to the concept of OTL – time on instruction and quality of instruction – have received sustained research attention for nearly five decades. Before tracing the three dimensions of OTL discussed in the literature, it is helpful to briefly review the "why" of OTL in light of the ICM framework.

Conceptual Relevance

The importance of OTL can be separated into its conceptual and substantive relevance. The conceptual relevance of OTL lies in the fact that it intersects and informs the concepts of alignment and access to the general curriculum. To that end, three questions were introduced earlier: Are the concepts of alignment and OTL interchangeable? Are the concepts of access to the general curriculum and OTL interchangeable? Is the intended curriculum the same for students with and without disabilities? The last question already has been answered via the two versions of the ICM. For students without disabilities, the intended curriculum is exclusively comprised of the general curriculum, which is the same for all students. For students with disabilities, the intended curriculum is comprised of a student's unique IEP curriculum and (to varying degrees) the general curriculum, which results in a different intended curriculum for each student. The answer to the second question is also "no." Access to the general curriculum is a policy mandate under IDEA (1997, 2004) and applies only to students with disabilities. As defined in this chapter, the concept of OTL is concerned with a student's intended curriculum and hence is applicable to both students with and without disabilities. According to the ICM, however, the general curriculum is always part of the intended curriculum (at least to some degree), which implies that documentation of OTL can also serve as an indicator of access to the general curriculum. In this sense, access to the general curriculum and OTL are related but not interchangeable concepts.

The first question requires further clarification before it can be answered. Alignment is always established between two curricula. With respect to OTL, researchers have suggested alignment between the enacted and intended curriculum as a possible proxy (e.g., Kurz et al., 2010; Porter, 2002; Smithson et al., 1995). The revised question thus reads as follows: Are the concepts of alignment between the enacted and intended curriculum and students' opportunity to learn the intended curriculum interchangeable? Unfortunately, an answer to this question depends on the constraints of the alignment method used. First, current alignment methodologies do not account for the IEP curriculum as part of the intended curriculum (see Martone & Sireci, 2009; Roach et al., 2008). The overlap between the content of classroom instruction and academic standards could thus represent a narrow aspect of students' opportunity to learn the intended curriculum. Second, interchangeable use of both concepts would imply that one considers the content dimension of the enacted curriculum (i.e., the degree to which its content is aligned with state content standards) as a sufficient indicator of OTL – let alone the assumption that the teachers' enacted curriculum represents the critical juncture at which to measure OTL.

Substantive Relevance

The substantive relevance of OTL underscores the need to document OTL because of its impact on four related issues: (a) validity of test score inferences (e.g., Burstein & Winters, 1994; Kurz & Elliott, 2010; Wang, 1998); (b) equity in terms of educational opportunity to learn the intended curriculum and tested content (e.g., Pullin, 2008; Yoon & Resnick, 1998); (c) compliance with federal policy for students with disabilities (e.g., Kurz et al., 2010; McLaughlin, 2010); and (d) student achievement (e.g., Gamoran, Porter, Smithson, & White, 1997; Kurz et al., 2010; Wang, 1998). First, documenting OTL can assist stakeholders in the current test-based accountability system to draw valid inferences about possible reasons for differential subgroup achievement or the extent to which students have learned subject matter content as a function of teacher instruction. Second, OTL in the context of the ICM has highlighted that equal opportunity to learn tested content for all students represents a limited view of OTL because it is not targeted toward students' intended curriculum. Moreover, the intended curriculum for students with disabilities is legally augmented via their IEP curriculum, which establishes an individualized intended curriculum reflective of their unique abilities and needs. Equal OTL across all student groups could therefore lead to unequal outcomes for students with disabilities. OTL as defined within the ICM highlights equitable OTL in the context of special education. That is, opportunity to learn the intended curriculum should not be equal across students due to the student-specific nature of the intended curriculum in special education (as attested by special education practices such as modified instructional content, additional time on task, or differentiated instruction). In short, students with disabilities should receive equitable OTL according to their individual abilities and needs. Third, documenting OTL can provide evidence of access to the general curriculum for students with disabilities as mandated under IDEA (1997, 2004) and the NCLB Act (2001), as well as access to tested content for all students as supported by court rulings (for purposes of graduation), such as Debra P. v. Turlington (1981). Fourth, researchers have used the concept of OTL and its measurement dimensions, especially at the enacted curriculum level, to identify significant predictors of student achievement. Documenting OTL, and ultimately increasing OTL, thus represents an important avenue to improve student outcomes. As mentioned earlier, research on OTL at the enacted curriculum level has focused on several different instructional dimensions, all of which were identified as contributors to student achievement.

In summary, OTL is an important concept that warrants measurement. Prior to discussing different measurement options, it is necessary to establish (a) at what level of the educational environment and via which curriculum to document OTL and (b) the respective curriculum dimensions to be measured. So far, we have used the curriculum framework of the ICM to substantiate the teacher's enacted curriculum as students' key access point to the intended curriculum. The next section thus examines measurement dimensions of OTL at the enacted curriculum level as emphasized in the literature.

Instructional Dimensions of OTL

For nearly five decades, researchers from a variety of disciplines have focused on different process indicators at the system (e.g., per-pupil expenditures), teacher (e.g., content overlap), and student level (e.g., engagement) to (a) obtain descriptions of the educational opportunities provided to students; (b) monitor the effects of school reform efforts; and (c) understand and improve students' academic achievement (Kurz & Elliott, 2011). Arguably the most proximal variable to the instructional lives of students and their opportunity to learn the intended curriculum is the teacher's enacted curriculum: "[S]tudents' opportunities to learn specific topics in the school curriculum are both the central feature of instruction and a critical determinant of student learning. The importance of curricular content to student learning has led researchers to become increasingly interested in measuring the 'enacted curriculum' . . ." (Rowan et al., 2004, pp. 75–76). Starting in the 1960s, separate OTL research strands started to form around three different instructional dimensions of the enacted curriculum: time on instruction (e.g., Carroll, 1963), content of instruction (e.g., Husén, 1967), and quality of instruction (e.g., Brophy & Good, 1986). Each strand is briefly reviewed next.

Time on Instruction

The first research strand emerged with John Carroll (1963), who introduced the concept of OTL as part of his model of school learning: "Opportunity to learn is defined as the amount of time allowed for learning, for example by a school schedule or program" (Carroll, 1989, p. 26). Carroll included OTL as one of five variables in a mathematical formula, which he used to express a student's degree of learning (i.e., the ratio of the time spent on a task to the total amount of time needed for learning the task). Subsequent research on time and school learning began to refine this OTL conceptualization from allocated time into "engaged time," "instructional time," and "academic learning time" (see Borg, 1980; Gettinger & Seibert, 2002). The concept of academic learning time (ALT) introduced by Fisher and colleagues (1980), for example, considers the amount of time a student is engaged in an academic task that he or she can perform with high success. The amount of time dedicated to instruction has received substantial empirical support in predicting student achievement (e.g., Carroll, 1989; Denham & Lieberman, 1980; Fisher & Berliner, 1985; Walberg, 1988). In a research synthesis on teaching, Walberg (1986) identified 31 studies that examined the "quantity of instruction" and its relation to student achievement. Walberg reported a median (partial) correlation of 0.35 controlling for other variables such as student ability and socioeconomic status. In a meta-analysis on educational effectiveness, Scheerens and Bosker (1997) examined the effect of (allocated) time on student achievement using 21 studies with a total of 56 replications across studies. The average Cohen's d effect size for time was 0.39 (as cited in Marzano, 2000). Both research reviews, however, provided insufficient information about the extent to which time usage was reported by special education teachers and failed to disaggregate the relation between time and student achievement for students with and without disabilities. Considering that time usage related to instruction represents one of the best documented predictors of student achievement across schools, classes, student abilities, grade levels, and subject areas (Vannest & Parker, 2010), it is not surprising that research regarding time on instruction continues across the system (i.e., allocated time), teacher (i.e., instructional time), and student level (i.e., ALT) of the ICM.
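
Carroll's degree-of-learning ratio can be written compactly. The rendering below is a common summary of his model under the description given above, not a quotation of his original notation:

% A common rendering (not Carroll's exact symbols) of the 1963 model
% of school learning summarized above: learning is a function of the
% ratio of time actually spent on a task to the time needed to learn it.
\[
  \text{degree of learning} \;=\; f\!\left(
    \frac{\text{time actually spent on the task}}
         {\text{time needed to learn the task}}
  \right)
\]
% Time spent is bounded by opportunity to learn (allocated time) and
% the learner's perseverance; time needed reflects aptitude, quality of
% instruction, and the learner's ability to understand instruction.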

Despite the fact that NCLB has posited increased time on instruction as an important avenue for improving the academic achievement of all students (Metzker, 2003), little is known about the extent to which special education teachers spend time on instruction (Vannest & Hagan-Burke, 2010). Special education has been marked by significant changes in teacher roles, settings, and instructional arrangements over the last few decades, which have increased the number of activities that require substantial amounts of teacher time such as paperwork, consultation, collaboration, assessment, and behavior management (e.g., Conderman & Katsiyannis, 2002; Kersaint, Lewis, Potter, & Meisels, 2007; Mastropieri & Scruggs, 2002). The majority of available data, however, are anecdotal. In one of the first studies that used teacher self-reports in conjunction with continuous and interval direct observation data, Vannest and Hagan-Burke (2010) reported on the results of 2,200 hours of data from 36 special education teachers. Two findings are noteworthy: (a) time use for 12 different activities ranged from 2.9 to 15.6%, which indicates that no single activity took up the majority of the hours of the day; and (b) academic instruction, instructional support, and paperwork occupied large percentages of time with 15.6, 14.6, and 12.1%, respectively. Vannest and Hagan-Burke concluded that "the sheer number of activities in which [special education] teachers engage is perhaps more of an issue than any one type of activity, although paperwork (12%) certainly reflects a rather disastrously large quantity of noninstructional time in a day" (p. 14).

In summary, time on instruction represents an important instructional dimension of the enacted curriculum and has received substantial empirical support as a strong contributor to student achievement. Unfortunately, research data on time usage for special education teachers are virtually non-existent, especially in relation to student achievement. Moreover, the limited research available for special education teachers indicates that large percentages of time are occupied by non-instructional activities, which raises concerns about the total amount of time a special education teacher can dedicate to instruction (see Vannest & Hagan-Burke, 2010).


Content of Instruction

The second research strand emerged with studies that focused on the content overlap between the enacted and assessed curriculum (e.g., Comber & Keeves, 1973; Husén, 1967). Husén, one of the key investigators for several international studies of student achievement, developed an item-based OTL measure that required teachers to report on the instructional content coverage for each assessment item via a three-point Likert scale: "Thus opportunity to learn from the Husén perspective is best understood as the match between what is taught and what is tested" (Anderson, 1986, p. 3682). To date, the International Association for the Evaluation of Educational Achievement (IEA) has conducted six comparative studies of international student achievement, the results of which have supported students' opportunity to learn the assessed curriculum as a significant predictor of systematic differences in student performance. This content overlap conceptualization of OTL remained dominant in several other research studies during the 1970s and 1980s, all of which focused on general education teachers (e.g., Borg, 1979; Mehrens & Phillips, 1986; Walker & Schaffarzick, 1974; Winfield, 1987). For their meta-analysis, Scheerens and Bosker (1997) reviewed 19 studies focused on teachers' content coverage of tested content and reported an average Cohen's d effect size of 0.18 (as cited in Marzano, 2000).

Another line of research on content overlap focused on students' opportunity to learn important content objectives (e.g., Armbruster et al., 1977; Jenkins & Pany, 1978; Porter et al., 1978). Porter et al., for instance, developed a basic taxonomy for classifying content included in mathematics curricula and measured whether different standardized mathematics achievement tests covered the same objectives delineated in the taxonomy. Porter continued his research on measuring the content of the enacted curriculum during the advent of standards-based reform (e.g., Gamoran et al., 1997; Porter, Kirst, Osthoff, Smithson, & Schneider, 1993) and developed a survey-based measure that examined the content of instruction along two dimensions: topics and categories of cognitive demand (Porter & Smithson, 2001; Porter, 2002). The findings of Gamoran et al. indicated that alignment between instruction and a test of student achievement in high school mathematics accounted for 25% of the variance among teachers. None of the mentioned studies, however, considered students' opportunity to learn the assessed or intended curriculum and academic achievement in the context of special education.

Porter's measure, now called the SEC, is presently the only method that can assess alignment among various enacted, intended, and assessed curricula via a content translation of each curriculum into individual content matrices along two dimensions (i.e., topics, cognitive demands). For purposes of the SEC, Porter (2002) developed an AI to determine the content overlap between two matrices at the intersection of topic and cognitive demand. Researchers have utilized this continuous alignment variable as an independent variable in correlational studies predicting student achievement (Smithson & Collares, 2007; Kurz et al., 2010).
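
To make the index concrete, the sketch below computes an alignment index of the kind Porter (2002) describes for two small content matrices. It is a minimal illustration, not the SEC instrument itself; the function name, matrix sizes, and example values are hypothetical.

# Minimal sketch (not the SEC instrument) of an alignment index between
# two content matrices whose cells hold proportions of emphasis at each
# topic-by-cognitive-demand intersection. The labels and values below
# are illustrative assumptions.

def alignment_index(matrix_x, matrix_y):
    """Return AI = 1 - (sum of absolute cell differences) / 2.

    Both matrices must contain non-negative proportions that each sum
    to 1.0 (e.g., proportion of instructional emphasis per cell). AI
    ranges from 0 (no shared emphasis) to 1 (identical emphasis).
    """
    cells_x = [value for row in matrix_x for value in row]
    cells_y = [value for row in matrix_y for value in row]
    if len(cells_x) != len(cells_y):
        raise ValueError("Matrices must have the same dimensions.")
    total_abs_diff = sum(abs(x - y) for x, y in zip(cells_x, cells_y))
    return 1.0 - total_abs_diff / 2.0


# Hypothetical example: rows are content topics, columns are
# cognitive-demand categories; values are proportions of emphasis.
enacted = [
    [0.20, 0.10],   # topic A: e.g., memorize, perform procedures
    [0.40, 0.30],   # topic B
]
intended = [
    [0.25, 0.25],
    [0.25, 0.25],
]
print(round(alignment_index(enacted, intended), 2))  # prints 0.8

A value near 1 would indicate that the enacted and intended matrices distribute emphasis almost identically, whereas a value near 0 would indicate little shared emphasis across topic-by-demand cells.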

Smithson and Collares (2007) used simple correlations, multiple regression, and hierarchical linear modeling to examine the relation between alignment (using the SEC's AI) and student achievement. The average correlation between alignment (of the enacted to the intended curriculum) and student achievement was 0.34 (p < 0.01). Smithson and Collares subsequently used multiple regression analyses to control for the effects of prior achievement, grade level, and socioeconomic status (SES). The results supported alignment (of the enacted to the intended curriculum) as a significant predictor of achievement with adjusted R² ranging between 0.41 and 0.70. Smithson and Collares further noted that the results of the multi-level analysis supported alignment as a significant predictor of achievement at the classroom level (Level 2) controlling for grade level and SES as well as controlling for prior achievement at the student level (Level 1). Herman and Abedi (2004) conducted analyses similar to Smithson and Collares's (2007), using their own item-based OTL measure (i.e., asking students and teachers about the extent to which tested content was covered). As such, the OTL construct related to the content of instruction was aimed at the content overlap between the teacher's instruction on 28 Algebra I content domains and an aligned mathematics assessment. The correlation between student-reported OTL (at the class level) and class achievement was 0.72 (p < 0.01), and the correlation between teacher-reported OTL (at the class level) and class achievement was 0.53 (p < 0.01). Their multi-level analyses further indicated that the proportion of English language learners in a class and OTL have significant effects on student achievement, even after controlling for students' prior achievement and background.
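
The covariate-adjusted regression approach described above can be sketched in a few lines of Python. The data, coefficients, and variable names below are simulated for illustration only and do not reproduce either study; the snippet simply shows how an alignment variable might be entered alongside prior achievement, grade level, and SES, and how an adjusted R² could be obtained.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 200
    alignment = rng.uniform(0.1, 0.6, n)          # enacted-to-intended alignment index
    prior = rng.normal(0, 1, n)                    # prior achievement (z-score)
    grade = rng.integers(3, 9, n).astype(float)    # grade level
    ses = rng.normal(0, 1, n)                      # socioeconomic status index
    achievement = 0.8 * prior + 1.5 * alignment + 0.1 * grade + 0.3 * ses + rng.normal(0, 1, n)

    # Ordinary least squares with an intercept and the three covariates.
    X = np.column_stack([np.ones(n), alignment, prior, grade, ses])
    beta, *_ = np.linalg.lstsq(X, achievement, rcond=None)
    residuals = achievement - X @ beta
    ss_res = residuals @ residuals
    ss_tot = ((achievement - achievement.mean()) ** 2).sum()
    r2 = 1 - ss_res / ss_tot
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - X.shape[1])
    print(f"alignment coefficient: {beta[1]:.2f}, adjusted R^2: {adj_r2:.2f}")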

Research data on the relation between OTL and student achievement in the context of special education are presently very limited. Roach and Elliott (2006) used student grade level, teacher reports of students' curricular access, percentage of academic-focused IEP goals, and time spent in general education settings as predictors of academic performance on a state's alternate assessment. Results indicated the model accounted for 41% of the variance in student achievement. Teacher-reported coverage of general curriculum content was the best predictor in the model, accounting for 23% of the variance in student performance. Kurz et al. (2010) used the SEC alignment methodology to examine the relation between OTL (i.e., alignment between the enacted and intended curriculum) and student achievement averages for general and special education teachers. The content of instruction delivered by general and special education teachers as measured by the SEC did not indicate significantly different alignment indices between the two groups. The correlation between OTL and (class averages of) student achievement was 0.64 (p < 0.05). When general and special education teachers were examined separately, the correlation between alignment and achievement remained significant only for the special education group with 0.77 (p < 0.05). Unfortunately, these findings cannot be generalized due to the study's limited sample size. A multi-level (re)analysis of the Kurz et al. data via hierarchical linear modeling allowed for variance decomposition of students' end-of-year achievement using predictors at the student level (i.e., prior achievement) and classroom level (i.e., classroom type, classroom alignment). The intraclass correlation coefficient (ICC) was ρ̂ = 0.34 (i.e., 34% of variance in students' end-of-year achievement was between classrooms). The final (main effects) model predicted individual student achievement as a function of overall mean classroom achievement, main effect for classroom type (i.e., general education, special education), main effect for classroom alignment, prior achievement as a covariate, and random error. All four fixed effects were significant (p < 0.001), while the random effects were not significant (p > 0.05). The results of the reanalysis thus supported classroom type and classroom alignment as significant predictors of individual student achievement even after controlling for prior achievement at the student level. In addition, both classroom type and classroom alignment accounted for virtually all variance in student achievement that was between classrooms.
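
The ICC reported above is simply the share of total variance that lies between classrooms in the unconditional two-level model. The sketch below uses hypothetical variance components chosen to reproduce the reported value of .34; it is illustrative arithmetic, not the reanalysis itself.

    # Hypothetical variance components from an unconditional two-level model.
    between_classroom_var = 0.34   # tau_00: variance of classroom means
    within_classroom_var = 0.66    # sigma^2: student-level residual variance

    icc = between_classroom_var / (between_classroom_var + within_classroom_var)
    print(f"ICC = {icc:.2f}")  # 0.34 -> 34% of achievement variance is between classrooms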

In summary, the available data support an empirical association between the content of instruction and student achievement. The quality of the data, however, is limited, which makes it difficult to generalize the findings and develop interventions. First, the measures of students' opportunity to learn instructional content vary across studies. Researchers have repeatedly employed two approaches for collecting OTL data on the content of instruction: (a) item-based OTL measures, which teachers use to report on the relative content coverage related to each test item (e.g., Herman & Abedi, 2004; Husén, 1967; Winfield, 1993); and (b) taxonomic OTL measures that provide an exhaustive list of subject-specific content topics, which teachers use to report on the relative emphases of enacted content according to different dimensions (e.g., Porter, 2002; Rowan & Correnti, 2009). Second, the quality of achievement measures used across studies is unclear. That is, little information is available on the reliability of achievement test scores and the test's alignment to the intended curriculum. The latter concern is about the extent to which the achievement test in question measured the content that teachers were supposed to teach (i.e., the content prescribed by the standards). That is, alignment between the enacted and intended curriculum cannot be expected to correlate highly with student achievement if the test fails to be aligned with the respective content standards. In addition, the instructional sensitivity of assessments used to detect the influence of OTL on achievement typically remains an unexamined assumption among researchers (D'Agostino, Welsh, & Corson, 2007; Polikoff, 2010). Another limitation in the presently available data on OTL related to the content of instruction is the paucity of research involving special education teachers and students with disabilities.

Quality of Instruction

The third and most diverse research strand related to an instructional dimension of OTL can be traced back to several models of school learning (e.g., Bloom, 1976; Carroll, 1963; Gagné, 1977; Harnischfeger & Wiley, 1976). Both Carroll's model of school learning and Walberg's (1980) model of educational productivity, for example, featured quality of instruction alongside quantity of instruction. The operationalization of instructional quality for purposes of measurement, however, resulted in a much larger set of independent variables related to student achievement than instructional time. In his research synthesis on teaching, Walberg (1986) reviewed 91 studies that examined the effect of quality indicators on student achievement, such as frequency of praise statements, corrective feedback, classroom climate, and instructional groupings. Walberg reported the highest mean effect sizes for (positive) reinforcement and corrective feedback with 1.17 and 0.97, respectively. Brophy and Good's (1986) seminal review of the process-product literature identified aspects of giving information (e.g., pacing), questioning students (e.g., cognitive level), and providing feedback as important instructional quality variables with consistent empirical support. Additional meta-analyses focusing on specific subjects and student subgroups are also available (e.g., Gersten et al., 2009; Vaughn, Gersten, & Chard, 2000). Gersten et al. (2009), for example, examined various instructional practices that enhanced the mathematics proficiency of students with learning disabilities. Gersten and colleagues thereby identified two instructional practices that provided practically and statistically important increases in effect size: teaching students the use of heuristics (i.e., general problem-solving strategy) and explicit instruction.

OTL research related to the quality of instruction also has considered teacher expectations for the enacted curriculum (i.e., cognitive demands) and instructional resources such as access to textbooks, calculators, and computers (e.g., Boscardin, Aguirre-Muñoz, Chinen, Leon, & Shin, 2004; Herman et al., 2000; Porter, 2002; Wang, 1998). Wang provided one of the first multi-level OTL studies that examined the quality of instruction alongside three other content variables (i.e., coverage, exposure, and emphasis). Wang's findings supported students' attendance rate, content coverage, content exposure, and quality of instruction as significant predictors of student achievement (controlling for ability, gender, and race). Wang further noted that content exposure (i.e., indicator of time on instruction via periods allocated to instruction) was the most significant predictor of written test scores, while quality of instruction (i.e., lesson plan completion, equipment use, textbook availability, material adequacy) was the most significant predictor of hands-on test scores. Although Wang considered the multi-dimensional nature of OTL, she did not include time on instruction and used an unconventional measure of content coverage, namely the teachers' predicted pass rate for students on each test item. The latter measure of instructional content, however, is difficult to interpret without knowing the extent to which the test covered the teachers' enacted curriculum. Moreover, questions that ask teachers to predict students' pass rates on items are likely to be confounded by their estimates of student ability. This caveat notwithstanding, Wang demonstrated that quality of instruction can serve as a significant predictor of student test scores even with other key OTL variables in the model. The empirical relation between quality of instruction and student achievement, however, is mostly based on the reports of general education teachers and the academic achievement of students without disabilities.

In summary, many researchers interested in OTL have started to consider the dimension of instructional quality. Herman et al. (2000) identified two broad categories of interest in this instructional dimension related to (a) instructional resources such as equipment use and (b) instructional practices such as working in small groups. However, as discussed earlier, numerous other indicators of quality associated with student achievement are found in the literature, including teacher expectations for student learning, progress monitoring, and corrective feedback (e.g., Brophy & Good, 1986; Porter, 2002). The wide range of instructional quality variables available underscores the importance for researchers to provide a theoretical and empirical rationale for their particular operationalization of instructional quality. Lastly, it should be acknowledged that all quality indicators used in OTL research based on teacher self-report are limited to information about frequency or emphasis. While such information is clearly useful, it cannot indicate the extent to which certain practices were implemented correctly.

The Unfolding of Instruction

So far, we have redefined the concept of OTL in the context of the current test-based accountability system as students' opportunity to learn the intended curriculum and considered the respective implications for both students with and without disabilities. We further posited the enacted curriculum as students' main access point to the intended curriculum. The subsequent literature review was used to highlight three broad research strands related to OTL that focused on different instructional dimensions of the enacted curriculum: time on instruction, content of instruction, and quality of instruction. Anderson first articulated a merger of the various OTL conceptualizations in 1986: "A single conceptualization of opportunity to learn coupled with the inclusion of the variable[s] in classroom instructional research . . . could have a profound effect on our understanding of life in classrooms" (p. 3686). In 1993, Stevens and Grymes established the first unified conceptual framework of OTL to investigate "students' access to the core curriculum" along four dimensions: content coverage (i.e., content of the general and assessed curriculum), content exposure (i.e., time spent on the general curriculum), content emphasis (i.e., skill emphasis, cognitive process expectations, instructional groupings), and quality of instructional delivery (i.e., instructional strategies). Unfortunately, Stevens and Grymes did not develop an empirical program of research on the basis of this framework. Nonetheless, Stevens (1996) identified a critical gap in the OTL literature, namely that most "opportunity to learn studies looked at [only] one variable at a time" (p. 4).

The scarcity of research studies used to investigate all three instructional dimensions of OTL is unfortunate because no aspect of OTL can occur in isolation for all practical intents and purposes. That is, instructional content enacted by a teacher always unfolds along (at least) two additional dimensions: time and quality. For example, a teacher's instruction is not adequately captured by referring solely to the content of instruction such as solving algebraic equations. In actuality, the teacher may have asked students to apply strategies related to solving algebraic equations for 15 min in small groups while providing guided feedback. The different sets of italicized words refer to various aspects of OTL – time, content, and quality of instruction – that have to occur in conjunction with one another whenever instruction is enacted by a teacher. Figure 6.3 depicts a model of the three instructional dimensions of OTL. The x-axis represents time on instruction (i.e., the amount of time spent on teaching the intended curriculum); the y-axis represents the content of instruction (i.e., the extent to which the content of instruction is aligned with the intended curriculum); and the z-axis represents the quality of instruction (i.e., the extent to which a teacher's instructional practices are evidence based). At the enacted curriculum level, almost all instruction should unfold along all three dimensions. Instruction restricted to the time–content plane would indicate that the teacher delivered content of the intended curriculum without relying on any evidence-based instructional practices. Instruction restricted to the time–quality plane would indicate that the teacher used evidence-based instructional practices to deliver content, yet the enacted content failed to be aligned with the student's intended curriculum. Lastly, the content–quality plane would only occur at the planned curriculum level. Although the suggested continua for the three different dimensions of OTL are consistent with the aforementioned literature, the model is, at least for now, intended to serve illustrative purposes. That is, the model can be used as a framework to categorize measurement options related to each instructional dimension of OTL.

Fig. 6.3 The instructional dimensions of OTL (axes: time on instruction (x), content of instruction (y), quality of instruction (z))
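
One way to make the three-dimensional model concrete is to think of a single lesson as one record with a value on each dimension. The sketch below is a minimal, hypothetical data structure for such a record; the field names and rating scheme are assumptions introduced here for illustration, not part of the chapter's framework.

    from dataclasses import dataclass
    from typing import List

    @dataclass
    class LessonOTLRecord:
        """One lesson's enacted-curriculum record along the three OTL dimensions."""
        minutes_on_intended_content: float    # time (x): minutes spent on the intended curriculum
        alignment_to_intended: float          # content (y): e.g., a 0-1 alignment index
        evidence_based_practices: List[str]   # quality (z): practices used during the lesson

    # The algebra example from the text: 15 min, aligned content, small groups with guided feedback.
    record = LessonOTLRecord(15.0, 0.8, ["small groups", "guided feedback"])
    print(record)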

Measurement of OTL

The measurement of OTL at the enacted curriculum level has historically relied on three basic methods: direct observation, teacher report, and document analysis. That is, the instructional dimensions of OTL related to time, content, and quality can be operationalized and subsequently documented using (a) observers who conduct classroom observations or code videotaped lessons, (b) teachers who self-report on their classroom instruction via annual surveys or daily logs, or (c) experts who are trained to review classroom documents such as textbooks, assessments, and other student products. Third-party observations and teacher surveys are by far the most frequently used methods, each with a unique set of advantages and challenges (Rowan & Correnti, 2009).

Third-party observations are often considered the "gold standard" for classroom research, but the high costs associated with this method limit its large-scale application outside well-funded studies for purposes of documenting OTL. Moreover, the complexity and variability of classroom instruction across the school year (Jackson, 1990; Rogosa et al., 1984) raise the question of generalizability: How many observations are necessary to generalize to a teacher's entire enacted curriculum? Annual surveys, on the other hand, are relatively inexpensive but rely exclusively on teacher memory for the accurate recall of the enacted curriculum. To address these measurement challenges, Rowan and colleagues suggested a third alternative, namely the use of frequently administered teacher logs (see Rowan et al., 2004). Teacher logs are intended to (a) reduce a teacher's response burden by focusing on a discrete set of behaviors, (b) increase accuracy of teacher recall by focusing on a recent time period (e.g., today's lesson), and (c) increase generalizability through frequent (cost-effective) administrations across the school year.

As part of their Reform Up Close study, Porter et al. (1993) used a variety of methods to collect data on teachers' enacted curriculum including daily logs, weekly surveys, classroom observations, and questionnaires. The agreement between classroom observations and teacher log data (calculated on each observation pair and averaged over all pairs) along four dimensions – broad content area (A), subskills within broad content area (AB), delivery of content (C), and cognitive demand (D) – was 0.78, 0.68, 0.67, and 0.59, respectively. Porter et al. also noted significant correlations between log data and questionnaire data on dimension (A) of 0.50 to 0.93 in mathematics and of 0.61 to 0.88 in science (as cited in Smithson & Porter, 1994). In 2002, Porter argued that a number of studies investigating the validity of survey data have confirmed that "survey data is excellent for describing quantity – for example, what content is taught and for how long – but not as good for describing quality – for example, how well particular content is taught" (p. 9). For purposes of validating teacher logs, Camburn and Barnes (2004) discussed the challenges related to reaching (interrater) agreement as one of their validation strategies, including rater background, type of instructional content, level of detail (e.g., subskills) associated with content, and frequency of occurrence. On the basis of their statistical results, Camburn and Barnes expressed confidence in teacher logs to measure instruction at grosser levels of detail and for activities that occurred more frequently. Rowan and Correnti (2009) concluded that teacher logs (a) are "far more trustworthy" than annual surveys to determine the frequency with which particular content and instructional practices are enacted and (b) yield "nearly equivalent" data to what would be gathered via trained observers. That being said, classroom observations are presently unrivaled in determining aspects of child-instruction or teacher–child interactions (e.g., Connor, Morrison, et al., 2009; Pianta & Hamre, 2009).

The measurement of the enacted curriculum has attracted much research attention in recent years, as evidenced by two special issues dedicated to "opening up the black box" of classroom instruction: the September 2004 issue of the Elementary School Journal and the March 2009 issue of Educational Researcher. A thorough review of available measurement tools, however, is beyond the scope of this chapter. Instead, I situate and discuss four guiding questions originally posed by Douglas (2009) for purposes of measuring classroom instruction in the context of OTL: What should we measure in classroom instruction? How can we best analyze data on classroom instruction? At what level should we measure classroom instruction? What tools can we use to measure classroom instruction?

The first question challenges researchers to provide a (theoretical and/or empirical) framework for selecting measurement variables of interest and for understanding their relation to the overall construct in question. With respect to OTL, the argument presented in this chapter suggests three instructional dimensions at the enacted curriculum level for purposes of documenting students' opportunity to learn the intended curriculum. The ICM framework, a review of three distinct research strands related to OTL at the enacted curriculum level, and a subsequent instructional dimensions model provided the theoretical and empirical underpinnings for this argument. The general answer to "what" should be measured for purposes of documenting OTL is thus: time on instruction, content of instruction, and quality of instruction. Depending on the specific framework, research context, and questions asked, stakeholders may define these instructional dimensions differently. The model introduced in this chapter, for example, posited three different continua: time on instruction as the amount of time spent on teaching the intended curriculum; content of instruction as the extent to which the content of instruction is aligned with the intended curriculum; and quality of instruction as the extent to which a teacher's instructional practices are evidence based.

The second question points to the nesting of classroom instruction and the importance of variance decomposition models in evaluating the effects of classroom instruction on student achievement. Scheerens and Bosker's (1997) review of the literature indicated that variance in student achievement status (without controlling for prior achievement and SES) can be decomposed as follows: About 15–20% of the variance lies among schools; another 15–20% of the variance lies among classrooms within schools; and about 60–70% of the variance lies among students within classrooms within schools. Scheerens and Bosker, however, used an unconditional model (i.e., no independent variables were used to predict student achievement). For their analyses of achievement data, Rowan, Correnti, and Miller (2002) also used a three-level hierarchical linear model but included covariates at each level (i.e., prior achievement, home and social background, social composition of schools). Their results indicated that about 4–16% of the variance in students' reading achievement and about 8–18% of the variance in students' mathematics achievement lie among classrooms (depending on grade level). Although these studies support the methodological appropriateness of using multi-level models in the measurement of OTL, which is ultimately a teacher effect, several analysts have challenged the adequacy of covariate adjustment models to model changes in student achievement (Rogosa, 1995; Stoolmiller & Bank, 1995). The evaluation of teacher effects on students' academic growth via gain score as the outcome variable, however, has its own set of unique challenges, especially when differences among students on academic growth are rather small (see Rowan et al., 2002). Nonetheless, researchers can select from many options within multi-level modeling that can account for the unique nesting of the enacted curriculum and its relation to student achievement. A cross-classified random effects model (Raudenbush & Bryk, 2002), for example, can account for a situation (common in special education) in which lower-level units are cross-classified by two or more higher-level units (e.g., a student's sources of OTL can come from different teachers nested within different classrooms). In short, multi-level analysis is an invaluable tool for evaluating the effects of OTL on student achievement by partitioning true variance from error variance and for modeling interactions across time, students, classrooms, and schools (Douglas, 2009).
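
The decomposition reported by Scheerens and Bosker can be expressed as simple arithmetic over variance components. The components below are hypothetical values chosen to fall within the ranges cited above; the sketch only illustrates how the percentage shares are formed, not any particular data set.

    # Hypothetical variance components from an unconditional three-level model.
    school_var, classroom_var, student_var = 0.18, 0.17, 0.65
    total = school_var + classroom_var + student_var

    for label, component in [("schools", school_var),
                             ("classrooms within schools", classroom_var),
                             ("students within classrooms", student_var)]:
        print(f"{label}: {component / total:.0%} of total achievement variance")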

The third question is also related to the nested nature of OTL and asks researchers to consider how to locate and sample for OTL. One of the first challenges is to decide the number of measurement points for purposes of documenting OTL at the enacted curriculum level. Rowan and Correnti (2009), who used daily teacher logs to measure different aspects of a teacher's enacted curriculum, decomposed variance in time spent on reading/language arts instruction into three levels: time on instruction on a given day (Level 1), days nested within teachers (Level 2), and teachers nested within schools (Level 3). Their results on the basis of about 2,000 teachers, who logged approximately 75,000 days, indicated that approximately 72% of the variance in instructional time lies among days, about 23% lies among teachers within schools, and about 5% lies among schools. In other words, time on instruction can vary significantly from day to day: "the average teacher in the [study] provided students with about 80 min of reading/language arts instruction per day, but the standard deviation of instructional time across days for a given teacher was 45 min, with 15% of all days including 0 min of reading/language arts instruction!" (Rowan & Correnti, 2009, p. 123). This wide variability of classroom instruction around key instructional dimensions of OTL seems to suggest a fairly large number of measurement points for purposes of reliably discriminating among teachers. Rowan and Correnti suggested that about 20 logs per year are optimal (with diminishing returns thereafter), if the measurement goal is to reliably discriminate among teachers.
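
The intuition behind the 20-log recommendation can be sketched with a standard reliability-of-a-mean calculation: averaging over more logged days shrinks the day-to-day noise relative to the stable between-teacher differences. The snippet below uses the variance shares reported by Rowan and Correnti (72% days, 23% teachers, 5% schools) and, as a simplifying assumption made here, treats teacher and school variance together as stable between-teacher variance; it is an illustrative approximation, not their analysis.

    day_var = 0.72                 # day-to-day (within-teacher) variance share
    teacher_var = 0.23 + 0.05      # between-teacher variance share (assumption: includes school variance)

    def reliability_of_teacher_mean(n_logs: int) -> float:
        """Reliability of a teacher's mean instructional time based on n logged days."""
        return teacher_var / (teacher_var + day_var / n_logs)

    for n in (1, 5, 10, 20, 40):
        print(f"{n:>3} logs: reliability ~ {reliability_of_teacher_mean(n):.2f}")  # diminishing returns past ~20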

In addition to day-to-day variability, Connor, Morrison, et al. (2009), who used an observational measure, reported that different students nested within the same class may be experiencing different amounts and types of instruction. This issue points to the appropriate measurement level of OTL: Should it be documented at the student level or the classroom level? Most of the teacher-report measures mentioned in the previous literature review were used to collect information on the enacted curriculum at the class level. Given empirical evidence of significant variation along key instructional dimensions of OTL for students within the same class and the theoretical model of the ICM, measurement of OTL restricted to the classroom level does not appear to be sufficient. That is, data on classroom-level OTL cannot be generalized to individual students and, in the case of students with disabilities, cannot yield information on the extent to which students had the opportunity to learn their specific intended curriculum (which presumably varies from student to student).

Croninger and Valli (2009) identified additional challenges related to the variability of instruction, namely the sources and boundaries of (reading) instruction. Results from their five-year longitudinal study of teaching in schools of poverty indicated that only one-third of students experienced no shared instruction. That is, the majority of students received reading instruction from multiple sources in one or more locations. Croninger and Valli noted that "the most prevalent form experienced by students was simultaneous instruction involving an instructional assistant (30%), student teacher (17%), staff developer/resource teachers (15%), and/or in-class help assigned specifically to them (8%)" (p. 105). Moreover, nearly 20% of students received additional reading instruction outside classrooms. Croninger and Valli further noted that many students experienced more reading instruction outside their scheduled reading class than during their scheduled lesson. These findings underscore an important measurement challenge, namely to account for all sources of instruction that contribute to a student's opportunity to learn his or her intended curriculum. This issue is particularly relevant for students with disabilities, who are likely to share multiple sources of (reading or mathematics) instruction such as a special education teacher, a general education teacher, a teaching assistant, and/or related services providers. Table 6.1 provides a taxonomy that illustrates several instructional sources and scenarios for students with disabilities. Clearly, measurement of a student's opportunity to learn the intended curriculum is unlikely to be adequately captured via one teacher's instruction during one class period at the class level. With all the aforementioned measurement challenges in mind, we are now ready to explore some measurement options for the three instructional dimensions of OTL.

Table 6.1 Taxonomy of instructional sources and scenarios for students with disabilities

Source of instruction: Instructional scenario
GenED: Target student receives instruction exclusively from a GenED teacher (full inclusion).
SPED: Target student receives instruction exclusively from a SPED teacher (pull-out, resource).
GenED/SPED: Target student receives instruction from a GenED and SPED teacher (co-teaching, collaboration).
GenED + SPED: Target student receives instruction separately from a GenED teacher and a SPED teacher (full inclusion plus pull-out/resource).
GenED/SPED + SPED: Target student receives instruction from a GenED and SPED teacher and additionally from a SPED teacher (co-teaching/collaboration plus pull-out/resource).
Note. GenED = instruction by general education teacher; SPED = instruction by special education teacher.

Options for Measurement

A selection of available measurement tools with information on their stated purpose, general method, and emphasized instructional dimension(s) of OTL is highlighted in Table 6.2. Additional information includes theoretical grounding, instructional target (i.e., class level, student level), psychometric support, and cost (i.e., high, medium, or low investment of time and funds). This table is intended to assist readers in their initial search for potential measures of OTL along several key criteria. The description of each measure is anchored in one main literature source; readers are therefore advised to consult the entire body of literature related to each measure for purposes of making a final selection. Each measure will be discussed briefly.

The Individualizing Student Instruction Observation and Coding System (ISI; Connor, Morrison, et al., 2009) is an observational measure designed to capture literacy instruction (K-3) and the extent to which teachers are matching their instruction to students' assessed skill levels. This observational measure is connected to a larger program of intervention research focused on Child × Instruction interactions and the effects of timing of instructional activities (e.g., Connor, Piasta, et al., 2009). Connor and colleagues have developed a comprehensive model of the learning environment that incorporates student characteristics and skills, multiple dimensions of the classroom environment (i.e., teacher characteristics), and multiple dimensions of instruction (see Connor, Morrison, et al., 2009). As indicated in the table, Connor et al. identified four simultaneously operating dimensions of instruction relevant to Child × Instruction interactions: (a) management of students' attention, (b) context of the activity, (c) content of the activity, and (d) duration of the activity. Within the context of OTL, it is important to note that the measure is restricted to literacy content at specific grade levels (K-3). Although the content codes are presumably expandable, they are currently not set up to reflect student-specific intended curricula (including state or district-specific general curricula). The ISI is targeted at the individual student level, yet context codes can account for instruction at the classroom level.

The Classroom Assessment Scoring System (CLASS; Pianta & Hamre, 2009) is a standardized observation system to assess global classroom quality via indicators in three broad domains: (a) emotional supports (e.g., affect, negativity, responsiveness, flexibility); (b) classroom organization (e.g., clear expectations, maximized time use, variety); and (c) instructional supports (e.g., creativity, feedback loops, conversation). The CLASS is focused on teacher–child interactions at the indicator level, which are operationalized for purposes of rating specific behaviors and interaction patterns within a 30-min time frame. Although the CLASS can only provide information on the quality dimension of OTL, it does so across a wide grade spectrum and without being restricted to a specific subject. However, the CLASS cannot be used to discern differences in instructional quality at the individual student level.

The Study of Instructional Improvement (SII) logs (Rowan et al., 2004) are self-report measures that require teachers to report on the time, content, and quality of their instruction for a particular student and day. The logs are available for the content areas of language arts and mathematics at the elementary school level. For the time dimension, teachers are asked to report on the number of minutes a particular student has received instruction, including in another classroom or by another teacher. For the content dimension, teachers are asked to indicate their content foci along several broad content topics (i.e., major focus, minor focus, touched on briefly, not taught). The quality dimension is addressed along specific enacted language arts (i.e., comprehension, writing, word analysis) and mathematics topics (i.e., number concepts, operations, patterns/functions/algebra) via questions related to cognitive demands, materials, and student responses. With regard to theory, the development of the SII logs is grounded in the aforementioned OTL literature, which explains their good match to the desired OTL dimensions. It should be noted, however, that the content dimension of the SII logs is not reflective of individual intended curricula. In fact, their content selection is more appropriately described as a core content selection within elementary school language arts and mathematics (see Rowan & Correnti, 2009). Lastly, the cost of administering SII logs is relatively low. Rowan and Correnti mentioned a less than $30 per log cost (not including training).

Table 6.2 Measurement options for the instructional dimensions of OTL

Individualizing Student Instruction (ISI) observation and coding system (e.g., Connor et al., 2009)
  Purpose: to capture literacy instruction (K-3); to examine whether teachers are matching the content of the literacy instruction to students' assessed skill levels
  Method: direct observation to complete field notes; videotaping to code instruction; 4–8 full-day observations/tapings
  Time: duration (sec)
  Content: content (i.e., code focused [a], meaning focused [b])
  Quality: management (i.e., teacher/child managed, peer/child managed); context (i.e., whole class, small group/pair, individual)
  Theoretical grounding: classroom environment model [c]; ecological model [d]
  Instructional target: student
  Reliability/validity evidence: interrater reliability (e.g., Kappa); predictive validity (e.g., literacy skill gains)
  Cost: high (i.e., trained observers, two video cameras, time for coding)

Classroom Assessment Scoring System (CLASS; e.g., Pianta, La Paro, & Hamre, 2008)
  Purpose: to measure global classroom quality (pre-K-12)
  Method: direct observation (i.e., time-sampled codes, global ratings); approx. 6 observations (30 min each)
  Time: N/A
  Content: N/A
  Quality: emotional supports (i.e., positive classroom climate, teacher sensitivity, regard for student perspectives); classroom organization (i.e., effective behavior management, productivity, instructional learning formats); instructional supports (i.e., concept development, quality of feedback, language modeling)
  Theoretical grounding: literature on classroom teaching and educational effectiveness
  Instructional target: class
  Reliability/validity evidence: interrater reliability; construct validity (e.g., factor analysis); criterion validity; predictive validity (e.g., achievement gains)
  Cost: high (i.e., trained observers)

Study of Instructional Improvement (SII) logs (e.g., Rowan, Camburn, & Correnti, 2004)
  Purpose: to measure frequencies of instructional activities (1–5); to evaluate school reform efforts
  Method: teacher log (about 100 items for approx. 45 min); approx. 20 logs
  Time: duration (min)
  Content: broad content strands with subskills/activities
  Quality: cognitive demand; materials, student responses
  Theoretical grounding: OTL literature
  Instructional target: student
  Reliability/validity evidence: interrater reliability; construct validity; predictive validity (e.g., achievement gains)
  Cost: low (i.e., training, teacher incentives, scoring)

Surveys of the Enacted Curriculum (SEC; Porter, 2002)
  Purpose: to measure topics and cognitive demand of instruction; to evaluate alignment between enacted, intended, and assessed curricula
  Method: teacher survey; end-of-year survey (90–120 min)
  Time: N/A
  Content: broad content strands with subskills; cognitive demand
  Quality: instructional practices survey (e.g., homework, instructional activities, technology)
  Theoretical grounding: OTL literature
  Instructional target: class
  Reliability/validity evidence: interrater reliability; construct validity; predictive validity (e.g., achievement gains)
  Cost: low (i.e., training, teacher incentives)

My Instructional Learning Opportunities Guidance System (MyiLOGS; Kurz, Elliott, & Shrago, 2009)
  Purpose: to measure opportunity to learn the intended curriculum; to evaluate alignment between planned, enacted, intended, and assessed curricula; to evaluate differentiation of instruction
  Method: teacher log; daily calendar (2–5 min); approx. 20 instruction details (2–5 min)
  Time: duration (min)
  Content: broad content strands with subskills; custom curricula; cognitive demand
  Quality: instructional practices; context; engagement; goal attainment
  Theoretical grounding: OTL literature; intended curriculum model
  Instructional target: student; class
  Reliability/validity evidence: N/A (in planning)
  Cost: low (i.e., training)

a Code-related literacy instruction (e.g., phonological awareness, phonics, letter and word fluency)
b Active extraction and construction of meaning from text (e.g., oral language, vocabulary, comprehension, text fluency, writing)
c The classroom environment model purports that students' reading outcomes are multiply determined by student characteristics (e.g., language, self-regulation, home support), classroom characteristics (e.g., warmth/responsiveness, teacher's content area knowledge), and instructional characteristics (i.e., duration, content, management, context)
d The ecological model purports that children's development is affected by both proximal (e.g., teacher) and distal sources of influence (e.g., community) and that these influences and their associations with development may change over time (Morrison, Bachman, & Connor, 2005)

The Surveys of the Enacted Curriculum (SEC; Porter, 2002) are annual teacher surveys designed to provide information on the alignment between intended, enacted, and assessed curricula. The SEC method relies on content translations by teachers (for purposes of the enacted curriculum) or curriculum experts (for purposes of the intended and assessed curriculum) who code a particular curriculum (i.e., classroom instruction, state test, or state standards) into a content framework that features a comprehensive list of topics in mathematics, English/language arts, and science (K-12). The topic list is exhaustive to accommodate different levels of content specificity. For the enacted curriculum, teachers indicate content coverage and cognitive demand for approximately 200 topics. Content coverage is indicated via relative time emphases (i.e., none, slight, moderate, sustained). Time on instruction is thus not assessed directly via the SEC. To gather additional information on instructional quality (besides cognitive demand), teachers are required to complete an additional survey on instructional practices (i.e., activities, homework, technology use, assessments, teacher opinions, and other characteristics). The SEC is available online (www.seconline.org), which allows teachers who complete the survey online to review graphical representations of their content coverage. In addition, the SEC website can calculate content overlap between two curricula via an AI (see Porter, 2002). The SEC can thus provide information on the content and quality dimensions of OTL. This information, however, is limited to the classroom level. The graphical reports further allow teachers to review their self-reported content coverage, which holds potential for formative feedback. Figure 6.4 displays the online SEC's graphical output of two content matrices taken from the Kurz et al. (2010) study.

The content map on the left represents a teacher's enacted curriculum, and the content map to the right represents a state's eighth-grade general curriculum in mathematics. The AI between the two content maps is 0.05 (see Kurz et al. for details). A review of these content maps can allow teachers to pinpoint differences between emphasized content topics and cognitive demands of the state's general curriculum and their own enacted curriculum. This teacher, for example, emphasized the topics of Number Sense and Basic Algebra. However, these topics were not taught at the intended category of cognitive demand (i.e., demonstrate understanding). Moreover, the teacher did not place the intended instructional emphases on the topics of Measurement, Geometric Concepts, and Data Displays. Although these charts are capable of delivering formative feedback, the SEC is typically completed at the end of the school year, which limits its formative benefit to teachers.

Fig. 6.4 Content map example of an enacted and general curriculum matrix from the SEC (left: teacher's enacted Grade 8 curriculum; right: Tennessee standards, Grade 8; alignment index = 0.05; rows: content topics such as Number Sense, Operations, Measurement, Basic Algebra, Geometric Concepts, and Data Displays; columns: cognitive demand categories from Memorize to Non-routine)

My Instructional Learning Opportunities Guidance System (MyiLOGS; Kurz, Elliott, & Shrago, 2009) is an online teacher tool (www.myilogs.com) designed to assist teachers with the planning and implementation of intended curricula at the class and individual student level. The development of the measure was grounded in the literature and models discussed in this chapter and thus can be used to document all three instructional dimensions of OTL. The tool features state-specific general curricula for various subjects and additional customizable skills that allow special education teachers and other related services providers to add student-specific objectives. The tool therefore permits teachers to document the extent to which their classroom instruction covers individualized intended curricula. To this end, MyiLOGS provides teachers with an instructional calendar that features an expandable sidebar, which lists the skills that comprise the intended curriculum. Teachers then drag and drop planned skills onto the respective calendar days and indicate the number of minutes attached to each skill. After the lesson, teachers confirm enacted skills and respective times at the class level. On a user-specified sample of days, teachers are further asked to report on additional time emphases (in minutes) related to the (planned and enacted) skills listed on the calendar including cognitive demand, instructional groupings, use of evidence-based instructional practices (e.g., use of heuristics, explicit instruction), engagement, and goal attainment. This detailed reporting occurs at the classroom level and student level. Teachers can further review a range of graphic reports and tables that provide detailed information on their enacted curriculum and its relation to the intended curriculum. Reports are available for the entire class and individual students. Psychometric evidence, such as predictive and criterion-related validity, is currently being collected as part of a three-state field test. Figure 6.5 provides an example of a time allocation report. This time allocation report is considered cumulative because it is based on the total amount of minutes dedicated to instruction as reported by the teacher on a daily basis.
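
A cumulative time-allocation report of this kind is essentially a roll-up of daily minute entries by content strand. The sketch below is a minimal illustration of that aggregation, not the MyiLOGS implementation; the strand names and minute values are hypothetical.

    from collections import defaultdict

    # Hypothetical daily log entries: (content strand, minutes of instruction).
    daily_entries = [
        ("Numbers/Operations", 35), ("Custom Skills/Activities", 20),
        ("Algebraic Concepts", 15), ("Numbers/Operations", 40),
        ("Geometry", 10), ("Custom Skills/Activities", 30),
    ]

    totals = defaultdict(int)
    for strand, minutes in daily_entries:
        totals[strand] += minutes

    grand_total = sum(totals.values())
    for strand, minutes in sorted(totals.items(), key=lambda kv: -kv[1]):
        print(f"{strand}: {minutes} min ({minutes / grand_total:.0%})")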

Fig. 6.5 Example of a time allocation report from MyiLOGS (top panel: cumulative time allocation in mathematics; bottom panel: subskills of M8.A Numbers/Operations)

The top graphic in Fig. 6.5 displays a teacher's time allocation across all five mathematics content strands prescribed by the state of Pennsylvania. In addition, the top graphic displays the amount of instructional time dedicated to Custom Skills/Activities, which allows teachers to document skills related to a student's IEP curriculum as well as skills or activities that fall outside the state's general curriculum for mathematics. This teacher, for example, dedicated over half of his or her instructional time to Numbers/Operations and Custom Skills/Activities. The bottom graphic of Fig. 6.5 displays the breakdown of the Numbers/Operations strand into its respective subskills. An example of a cognitive process dimensions report is shown in Fig. 6.6.

This report is based on a sample of 77 days on which the teacher decided to report on instructional details at the class and student level. The top and bottom graphics allow a teacher to review (a) his or her time emphases on different cognitive process dimensions at the class level and (b) the extent to which differentiated instruction took place for two target students. In this example, the teacher's expectations largely focused on the Understand/Apply dimension at the class level. Kayla received very comparable instruction. James, on the other hand, received a much larger emphasis on the Remember dimension. In addition, James experienced much less time on instruction (i.e., about 25% of the time was not available for instruction) than the rest of the class. Together with additional tables and reports, MyiLOGS is intended to provide teachers with formative feedback that can enhance instructional focus, alignment, and access to the general curriculum.

Fig. 6.6 Example of a cognitive process dimensions report from MyiLOGS (class-level panel "Cognitive Processes: 77 Day Sample" and student-level panels for James Fulsom and Kayla Palko; cumulative time in hours across the dimensions Remember, Understand/Apply, Analyze/Evaluate, Create, and Time Not Available)

The present selection of measurement tools related to the three instructional dimensions of OTL at the enacted curriculum level indicates a diverse set of options, each method with its unique benefits and drawbacks. The relative weights of issues such as observer reliability, reactivity, teacher memory, and cost can only be assigned after the proper research context and questions have been established. However, even after those issues are settled, it is important to remember that "all methods are to some extent limited in scope with regard to measuring the multifaceted nature of the classroom" (Pianta & Hamre, 2009).

The Future of OTL

In 1995, Schmidt and McKnight declared that "there is a story to be told . . . a story about children and also about curricula – curricula transforming national visions and aims into intentions that shape children's opportunities for learning through schooling" (p. 346). This chapter provided a new framework that delineated these curricula and their mediated relations to the intended curriculum. The concept of OTL was redefined within that framework, which posited that the transformative powers of OTL are the greatest at the enacted curriculum level where "intentions" become measurable teacher actions. The measurement of students' opportunity to learn the intended curriculum via three instructional dimensions of the enacted curriculum, however, cannot be the sole direction for future research and work on OTL. Although the conceptual and substantive relevance of OTL may be sufficient to sustain the concept and its measurement for some time, the prospects of OTL lie in the usefulness of OTL data to assist stakeholders in developing interventions and making meaningful changes in instruction that increase learning opportunities for all students.

Teacher actions at the enacted curriculum level determine the extent to which students have the opportunity to learn the intended curriculum. Empirical evidence indicates that teacher actions along key instructional dimensions – time on instruction, content of instruction, and quality of instruction – impact student outcomes. Documenting these actions thus should not be an end in itself. Rather than "admiring the problem," measurement of OTL should function as a means to an end, namely to use the gathered information as feedback to modify teaching and learning activities with the ultimate goal of increasing student outcomes. In short, the formative benefits of measuring OTL represent an exciting and promising direction for future research and work on OTL.

Shepard, Hammerness, Darling-Hammond, and Rust (2005) argued that an assessment becomes formative when "[it] is carried out during the instructional process for the purpose of improving teaching and learning" (p. 275). The extent to which measurement of OTL at the enacted curriculum can be integrated into the instructional process and improve teaching and learning remains an open empirical question. Some researchers have already begun to answer this question by using measures highlighted in the previous section in formative ways. Connor and colleagues, for example, have developed a web-based software tool that incorporates algorithms, which prescribe amount and types of instruction based on previously observed interactions via the ISI (e.g., Connor, Morrison, & Katch, 2004; Connor, Morrison, et al., 2009; Connor, Morrison, & Underwood, 2007). Porter and colleagues have used the graphical reports of the SEC for purposes of professional development, which resulted in greater alignment (between the enacted and intended curriculum) for the treatment group with an effect size of 0.36 (Porter, Smithson, Blank, & Zeidner, 2007). Lastly, Kurz and colleagues have developed an OTL measurement tool that is to be used as an ongoing part of the instructional process with continuous teacher feedback to assist with targeted instructional changes at the classroom and student level (Kurz, Elliott, & Shrago, 2009). Enhancing access to "what should be learned and will be tested" has been the main topic of this chapter. To this end, OTL and its measurement hold great promise. And yet, the future of OTL ultimately lies in our ability to deliver this promise for all students.

References

Abedi, J., Leon, S., & Kao, J. C. (2008). Examining dif-ferential item functioning in reading assessments forstudents with disabilities (CRESST Report No. 744).Los Angeles, CA: University of California, NationalCenter for Research on Evaluation, Standards, andStudent Testing.

Ahearn, E. (2006). Standards-based IEPs:Implementation in selected states. Alexandria,VA: National Association of State Directors of SpecialEducation.

American Educational Research Association, AmericanPsychological Association, & National Council onMeasurement in Education. (1999). Standards for edu-cational and psychological testing. Washington, DC:American Educational Research Association.

American with Disabilities Act, 42 U.S.C. §§ 12101 etseq. (1990).

Anderson, L. W. (1986). Opportunity to learn. In T. Husén& T. Postlethwaite (Eds.), International encyclopediaof education: Research and studies. (pp. 3682–3686).Oxford, UK: Pergamon.

Anderson, L. W. (2002). Curricular alignment: A re-examination. Theory into Practice, 41(4), 255–260.

Armbuster, B. B., Stevens, R. J., & Rosenshine, B. (1977).Analyzing content coverage and emphasis: A study ofthree curricula and two tests (Technical Report No.26). Urbana, IL: Center for the Study of Reading,University of Illinois.

Bloom, B. S. (1976). Human characteristics and schoollearning. New York: McGraw-Hill.

Borg, W. R. (1979). Teacher coverage of academic con-tent and pupil achievement. Journal of EducationalPsychology, 71(5), 635–645.

Borg, W. R. (1980). Time and school learning. In C.Denham & A. Lieberman (Eds.), Time to learn(pp. 33–72). Washington, DC: National Institute ofEducation.

Boscardin, C. K., Aguirre-Muñoz, Z., Chinen, M., Leon,S., & Shin, H. S. (2004). Consequences and valid-ity of performance assessment for English learners:Assessing opportunity to learn (OTL) in grade 6

Page 142: Handbook of Accessible Achievement Tests for All Students: Bridging the Gaps Between Research, Practice, and Policy

126 A. Kurz

language arts (CSE Report No. 635). Los Angeles:University of California, National Center for Researchon Evaluation, Standards, and Student Testing.

Brophy, J., & Good, T. L. (1986). Teacher behavior and student achievement. In M. C. Wittrock (Ed.), Handbook of research on teaching (3rd ed., pp. 328–375). New York: Macmillan.

Burstein, L., & Winters, L. (1994, June). Models for collecting and using data on opportunity to learn at the state level: OTL options for the CCSSO SCASS science assessment. Presented at the CCSSO National Conference on Large-Scale Assessment, Albuquerque, NM.

Camburn, E., & Barnes, C. A. (2004). Assessing the validity of a language arts instruction log through triangulation. Elementary School Journal, 105(1), 49–73.

Carroll, J. B. (1963). A model of school learning. Teachers College Record, 64(8), 723–733.

Carroll, J. B. (1989). The Carroll model: A 25-year retrospective and prospective view. Educational Researcher, 18(1), 26–31.

Comber, L. C., & Keeves, J. P. (1973). Science education in nineteen countries. New York: Halsted Press.

Conderman, G., & Katsiyannis, A. (2002). Instructional issues and practices in secondary special education. Remedial and Special Education, 23(3), 169–179.

Connor, C. M., Morrison, F. J., & Katch, L. E. (2004). Beyond the reading wars: Exploring the effect of child-instruction interactions on growth in early reading. Scientific Studies of Reading, 8(4), 305–336.

Connor, C. M., Morrison, F. J., & Underwood, P. S. (2007). A second chance in second grade: The independent and cumulative impact of first- and second-grade reading instruction and students' letter-word reading skill growth. Scientific Studies of Reading, 11(3), 199–233.

Connor, C. M., Morrison, F. J., Fishman, B. J., Ponitz, C. C., Glasney, S., Underwood, P. S., et al. (2009). The ISI classroom observation system: Examining the literacy instruction provided to individual students. Educational Researcher, 38(2), 85–99.

Connor, C. M., Piasta, S. B., Fishman, B., Glasney, S., Schatschneider, C., Crowe, E., et al. (2009). Individualizing student instruction precisely: Effects of child × instruction interactions on first graders' literacy development. Child Development, 80(1), 77–100.

Cooley, W. W., & Leinhardt, G. (1980). The instructional dimensions study. Educational Evaluation and Policy Analysis, 2(1), 7–25.

Croninger, R. G., & Valli, L. (2009). "Where is the action?" Challenges to studying the teaching of reading in elementary classrooms. Educational Researcher, 38(2), 100–108.

D'Agostino, J. V., Welsh, M. E., & Corson, N. M. (2007). Instructional sensitivity of a state's standards-based assessment. Educational Assessment, 12(1), 1–22.

Denham, C., & Lieberman, A. (Eds.). (1980). Time to learn. Washington, DC: National Institute of Education.

Douglas, K. (2009). Sharpening our focus in measuring classroom instruction. Educational Researcher, 38(7), 518–521.

Elliott, S. N., Kurz, A., & Neergaard, L. (in press). Large-scale assessment for educational accountability. In S. Graham, A. Bus, S. Major, & L. Swanson (Eds.), The handbook of educational psychology: Application of educational psychology to learning and teaching (Vol. 3). Washington, DC: American Psychological Association.

Fisher, C. W., & Berliner, D. C. (Eds.). (1985). Perspectives on instructional time. New York: Longman.

Fisher, C. W., Berliner, D. C., Filby, N. N., Marliave, R., Cahen, L. S., & Dishaw, M. M. (1980). Teaching behaviors, academic learning time, and student achievement: An overview. In C. Denham & A. Lieberman (Eds.), Time to learn (pp. 7–32). Washington, DC: National Institute of Education.

Fuchs, D., Fuchs, L. S., & Stecker, P. M. (2010). The "blurring" of special education in a new continuum of general education placements and services. Exceptional Children, 76(3), 301–323.

Gagné, R. M. (1977). The conditions of learning. Chicago: Holt, Rinehart & Winston.

Gamoran, A., Porter, A. C., Smithson, J., & White, P. A. (1997). Upgrading high school mathematics instruction: Improving learning opportunities for low-achieving, low-income youth. Educational Evaluation and Policy Analysis, 19(4), 325–338.

Gersten, R., Chard, D. J., Jayanthi, M., Baker, S. K., Morphy, P., & Flojo, J. (2009). Mathematics instruction for students with learning disabilities: A meta-analysis of instructional components. Review of Educational Research, 79(3), 1202–1242.

Gettinger, M., & Seibert, J. K. (2002). Best practices in increasing academic learning time. In A. Thomas & J. Grimes (Eds.), Best practices in school psychology IV (Vol. 1, pp. 773–787). Bethesda, MD: National Association of School Psychologists.

Harnischfeger, A., & Wiley, D. E. (1976). The teaching–learning process in elementary schools: A synoptic view. Curriculum Inquiry, 6(1), 5–43.

Herman, J. L., & Abedi, J. (2004). Issues in assessing English language learners' opportunity to learn mathematics (CSE Report No. 633). Los Angeles: Center for the Study of Evaluation, National Center for Research on Evaluation, Standards, and Student Testing.

Herman, J. L., Klein, D. C., & Abedi, J. (2000). Assessing students' opportunity to learn: Teacher and student perspectives. Educational Measurement: Issues and Practice, 19(4), 16–24.

Heubert, J. P. (2004). High-stakes testing in a changing environment: Disparate impact, opportunity to learn, and current legal protections. In S. H. Fuhrman & R. F. Elmore (Eds.), Redesigning accountability systems for education (pp. 220–242). New York: Teachers College Press.

Husén, T. (1967). International study of achievement in mathematics: A comparison of twelve countries. New York: Wiley.

Individuals with Disabilities Education Act Amendments, 20 U.S.C. §§ 1400 et seq. (1997).

Individuals with Disabilities Education Improvement Act, Amending 20 U.S.C. §§ 1400 et seq. (2004).

Jackson, P. W. (1990). Life in classrooms. New York: Teachers College Press.

Jenkins, J. R., & Pany, D. (1978). Curriculum biases in reading achievement tests. Journal of Reading Behavior, 10(4), 345–357.

Karger, J. (2005). Access to the general education curriculum for students with disabilities: A discussion of the interrelationship between IDEA and NCLB. Wakefield, MA: National Center on Accessing the General Curriculum.

Kersaint, G., Lewis, J., Potter, R., & Meisels, G. (2007). Why teachers leave: Factors that influence retention and resignation. Teaching and Teacher Education, 23(6), 775–794.

Kurz, A., & Elliott, S. N. (2011). Overcoming barriers to access for students with disabilities: Testing accommodations and beyond. In M. Russell (Ed.), Assessing students in the margins: Challenges, strategies, and techniques (pp. 31–58). Charlotte, NC: Information Age Publishing.

Kurz, A., Elliott, S. N., & Shrago, J. S. (2009). MyiLOGS: My instructional learning opportunities guidance system. Nashville, TN: Vanderbilt University.

Kurz, A., Elliott, S. N., Wehby, J. H., & Smithson, J. L. (2010). Alignment of the intended, planned, and enacted curriculum in general and special education and its relation to student achievement. Journal of Special Education, 44(3), 1–20.

Linn, R. L. (2008). Educational accountability systems. In K. E. Ryan & L. A. Shepard (Eds.), The future of test-based educational accountability (pp. 3–24). New York: Routledge.

Malmgren, K. W., McLaughlin, M. J., & Nolet, V. (2005). Accounting for the performance of students with disabilities on statewide assessments. The Journal of Special Education, 39(2), 86–96.

Martone, A., & Sireci, S. G. (2009). Evaluating alignment between curriculum, assessment, and instruction. Review of Educational Research, 79(4), 1332–1361.

Marzano, R. J. (2002). A new era of school reform: Going where the research takes us (REL No. #RJ96006101). Aurora, CO: Mid-continent Research for Education and Learning.

Mastropieri, M. A., & Scruggs, T. E. (2002). Effective instruction for special education (3rd ed.). Austin, TX: PRO-ED.

McDonnell, L. M. (1995). Opportunity to learn as a research concept and a policy instrument. Educational Evaluation and Policy Analysis, 17(3), 305–322.

McDonnell, L. M., McLaughlin, M. J., & Morison, P. (1997). Reform for one and all: Standards-based reform and students with disabilities. Washington, DC: National Academy of Sciences Press.

McLaughlin, M. J. (1999). Access to the general education curriculum: Paperwork and procedures or redefining "special education." Journal of Special Education Leadership, 12(1), 9–14.

McLaughlin, M. J. (2010). Evolving interpretations of educational equity and students with disabilities. Exceptional Children, 76(3), 265–278.

Mehrens, W. A., & Phillips, S. E. (1986). Detecting impacts of curricular differences in achievement test data. Journal of Educational Measurement, 23(3), 185–196.

Metzker, B. (2003). Time and learning (ED474260). Eugene, OR: University of Oregon.

No Child Left Behind Act, 20 U.S.C. §§ 6301 et seq. (2001).

Pianta, R. C., & Hamre, B. K. (2009). Conceptualization, measurement, and improvement of classroom processes: Standardized observation can leverage capacity. Educational Researcher, 38(2), 109–119.

Pianta, R. C., Belsky, J., Houts, R., Morrison, F., & the National Institute of Child Health and Human Development. (2007). Teaching: Opportunities to learn in America's elementary classrooms. Science, 315, 1795–1796.

Polikoff, M. S. (2010). Instructional sensitivity as a psychometric property of assessments. Educational Measurement: Issues and Practice, 29(4), 3–14.

Porter, A. C. (1991). Creating a system of school process indicators. Educational Evaluation and Policy Analysis, 13(1), 13–29.

Porter, A. C. (1993). School delivery standards. Educational Researcher, 22(5), 24–30.

Porter, A. C. (1995). The uses and misuses of opportunity-to-learn standards. Educational Researcher, 24(1), 21–27.

Porter, A. C. (2002). Measuring the content of instruction: Uses in research and practice. Educational Researcher, 31(7), 3–14.

Porter, A. C. (2006). Curriculum assessment. In J. L. Green, G. Camilli, & P. B. Elmore (Eds.), Handbook of complementary methods in education research (pp. 141–159). Mahwah, NJ: Lawrence Erlbaum.

Porter, A. C., & Smithson, J. L. (2001). Are content standards being implemented in the classroom? A methodology and some tentative answers. In S. Fuhrman (Ed.), From the Capitol to the classroom: Standards-based reform in the states. One Hundredth Yearbook of the National Society for the Study of Education (pp. 60–80). Chicago: University of Chicago Press.

Porter, A. C., Kirst, M. W., Osthoff, E. J., Smithson, J. L., & Schneider, S. A. (1993). Reform up close: An analysis of high school mathematics and science classrooms (Final Report). Madison, WI: University of Wisconsin, Wisconsin Center for Education Research.


Porter, A. C., Schmidt, W. H., Floden, R. E., & Freeman, D. J. (1978). Impact on what? The importance of content covered (Research Series No. 2). East Lansing, MI: Michigan State University, Institute for Research on Teaching.

Porter, A. C., Smithson, J., Blank, R., & Zeidner, T. (2007). Alignment as a teacher variable. Applied Measurement in Education, 20(1), 27–51.

Pullin, D. C. (2008). Assessment, equity, and opportunity to learn. In P. A. Moss, D. C. Pullin, J. P. Gee, E. H. Haertel, & L. J. Young (Eds.), Assessment, equity, and opportunity to learn. New York: Cambridge University Press.

Pullin, D. C., & Haertel, E. H. (2008). Assessment through the lens of "opportunity to learn". In P. A. Moss, D. C. Pullin, J. P. Gee, E. H. Haertel, & L. J. Young (Eds.), Assessment, equity, and opportunity to learn (pp. 17–41). New York: Cambridge University Press.

Raudenbush, S. W., & Bryk, A. S. (2002). Hierarchical linear models: Applications and data analysis methods (2nd ed.). Thousand Oaks, CA: Sage.

Roach, A. T., & Elliott, S. N. (2006). The influence of access to general education curriculum on alternate assessment performance of students with significant cognitive disabilities. Educational Evaluation and Policy Analysis, 28(2), 181–194.

Roach, A. T., Chilungu, E. N., LaSalle, T. P., Talapatra, D., Vignieri, M. J., & Kurz, A. (2009). Opportunities and options for facilitating and evaluating access to the general curriculum for students with disabilities. Peabody Journal of Education, 84(4), 511–528.

Roach, A. T., Niebling, B. C., & Kurz, A. (2008). Evaluating the alignment among curriculum, instruction, and assessments: Implications and applications for research and practice. Psychology in the Schools, 45(2), 158–176.

Rogosa, D. (1995). Myths and methods: Myths about longitudinal research plus supplemental questions. In J. M. Gottman (Ed.), The analysis of change (pp. 3–66). Mahwah, NJ: Lawrence Erlbaum.

Rogosa, D., Floden, R., & Willett, J. B. (1984). Assessing the stability of teacher behavior. Journal of Educational Psychology, 76(6), 1000–1027.

Rowan, B., & Correnti, R. (2009). Studying reading instruction with teacher logs: Lessons from the Study of Instructional Improvement. Educational Researcher, 38(2), 120–131.

Rowan, B., Camburn, E., & Correnti, R. (2004). Using teacher logs to measure the enacted curriculum: A study of literacy teaching in third-grade classrooms. Elementary School Journal, 105(1), 75–101.

Rowan, B., Correnti, R., & Miller, R. J. (2002). What large-scale survey research tells us about teacher effects on student achievement: Insights from the Prospects study of elementary schools. Teachers College Record, 104(8), 1525–1567.

Scheerens, J., & Bosker, R. (1997). The foundations of educational effectiveness. New York: Pergamon.

Schmidt, W. H., & McKnight, C. C. (1995). Surveying educational opportunity in mathematics and science: An international perspective. Educational Evaluation and Policy Analysis, 17(3), 337–353.

Shepard, K., Hammerness, K., Darling-Hammond, L., & Rust, F. (2005). Assessment. In L. Darling-Hammond & J. Bransford (Eds.), Preparing teachers for a changing world: What teachers should learn and be able to do (pp. 275–326). San Francisco: Jossey-Bass.

Smithson, J. L., & Collares, A. C. (2007). Alignment as a predictor of student achievement gains. Paper presented at the meeting of the American Educational Research Association, Chicago, IL.

Smithson, J. L., & Porter, A. C. (1994). Measuring classroom practice: Lessons learned from efforts to describe the enacted curriculum (CPRE Research Report No. 31). New Brunswick, NJ: Consortium for Policy Research in Education.

Smithson, J. L., Porter, A. C., & Blank, R. K. (1995). Describing the enacted curriculum: Development and dissemination of opportunity to learn indicators in science education (ED385430). Washington, DC: Council of Chief State School Officers.

Stedman, L. C. (1997). International achievement differences: An assessment of a new perspective. Educational Researcher, 26(3), 4–15.

Stevens, F. I. (1996, April). The need to expand the opportunity to learn conceptual framework: Should students, parents, and school resources be included? Paper presented at the annual meeting of the American Educational Research Association, New York City, NY.

Stevens, F. I., & Grymes, J. (1993). Opportunity to learn: Issues of equity for poor and minority students (NCES No. 93-232). Washington, DC: National Center for Education Statistics.

Stoolmiller, M., & Bank, L. (1995). Autoregressive effects in structural equation models: We see some problems. In J. M. Gottman (Ed.), The analysis of change (pp. 261–276). Mahwah, NJ: Lawrence Erlbaum.

Vannest, K. J., & Hagan-Burke, S. (2010). Teacher time use in special education. Remedial and Special Education, 31(2), 126–142.

Vannest, K. J., & Parker, R. I. (2010). Measuring time: The stability of special education teacher time use. Journal of Special Education, 44(2), 94–106.

Vaughn, S., Gersten, R., & Chard, D. J. (2000). The underlying message in LD intervention research: Findings from research syntheses. Exceptional Children, 67(1), 99–114.

Walberg, H. J. (1980). A psychological theory of educational productivity. In F. H. Farley & N. Gordon (Eds.), Psychology and education (pp. 81–110). Berkeley, CA: McCutchan.

Walberg, H. J. (1986). Syntheses of research on teaching. In M. C. Wittrock (Ed.), Handbook of research on teaching (3rd ed., pp. 214–229). New York: Macmillan Publishing Company.

Walberg, H. J. (1988). Synthesis of research on time and learning. Educational Leadership, 45(6), 76–85.


Walker, D. F., & Schaffarzick, J. (1974). Comparing curricula. Review of Educational Research, 44(1), 83–111.

Wang, J. (1998). Opportunity to learn: The impacts and policy implications. Educational Evaluation and Policy Analysis, 20(3), 137–156.

Webb, N. L. (1997). Criteria for alignment of expectations and assessments in mathematics and science education (NISE Research Monograph No. 6). Madison, WI: University of Wisconsin-Madison, National Institute for Science Education.

Winfield, L. F. (1987). Teachers' estimates of test content covered in class and first-grade students' reading achievement. Elementary School Journal, 87(4), 437–454.

Winfield, L. F. (1993). Investigating test content and curriculum content overlap to assess opportunity to learn. Journal of Negro Education, 62(3), 288–310.

Yazzie-Mintz, E. (2007). Voices of students on engagement: A report on the 2006 high school survey of student engagement. Bloomington, IN: Indiana University, Center for Evaluation and Education Policy.

Yoon, B., & Resnick, L. B. (1998). Instructional validity, opportunity to learn and equity: New standards examinations for the California mathematics renaissance (CSE Technical Report No. 484). Los Angeles: Center for the Study of Evaluation, National Center for Research on Evaluation, Standards, and Student Testing.

Ysseldyke, J., Thurlow, M., Langenfeld, K., Nelson, J. R., Teelucksingh, E., & Seyfarth, A. (1998). Educational results for students with disabilities: What do the data tell us? (Technical Report No. 23). Minneapolis, MN: University of Minnesota, National Center on Educational Outcomes.


7 Instructional Adaptations: Accommodations and Modifications That Support Accessible Instruction

Leanne R. Ketterlin-Geller and Elisa M. Jamgochian

Introduction

The purpose of this book is to describe how educational environments can be made maximally accessible for students with disabilities. In this chapter we situate the discussion of accessibility within the classroom instructional setting. We describe how the use of instructional accommodations and instructional modifications can help eliminate barriers inherent in many instructional products, environments, or systems that prevent maximum engagement by students with disabilities. Implementing these instructional adaptations may increase the accessibility of the learning environment, thereby permitting equal access for all students.

Recent National Assessment of Educational Progress (NCES, 2009) results in reading and mathematics indicate that students with disabilities are not reaching expected proficiency levels. Specifically, the 2009 reading scores identify that 12% of fourth-grade and 8% of eighth-grade students with disabilities are performing at or above proficiency as compared to 34% and 33% of their peers without disabilities, respectively. Similarly, the 2009 mathematics results show 19% of fourth-grade and 9% of eighth-grade students with disabilities are performing at or above proficiency as compared to 41% and 35% of the general education population, respectively. One possible cause of these results is inaccessible instruction.

Instructional accessibility is influenced by the interaction between individual student characteristics and features of instructional products, environments, or systems. Accessibility can be enhanced by intentionally designing and delivering instruction that aligns with the student's needs. Just as testing adaptations improve the accessibility of assessments, instructional adaptations support students' interactions through changes in the presentation, setting, timing or scheduling, and response mode of instruction. The purpose of this chapter is to describe the role of instructional adaptations in improving the accessibility of instruction.

As with testing adaptations, instructional adaptations are divided into two general classes: accommodations and modifications. Instructional accommodations are adaptations to the design or delivery of instruction and associated materials that do not change the breadth of content coverage and depth of knowledge of the grade-level content standards. The use of instructional accommodations has no impact on the performance expectations implied in the general education learning objectives. As such, most students using instructional accommodations are expected to learn the same material to the same level of proficiency as students who are not using instructional accommodations. Conversely, instructional modifications are adaptations to the design or delivery of instruction and associated materials that result in a reduction of the breadth of content coverage and/or depth of knowledge of the grade-level content standards. Instructional modifications may also result in a reduction of the performance expectations established for the general education student population. As such, students receiving instructional modifications are not expected to learn the same material to the same level of proficiency as students in the general education classroom.

Fig. 7.1 Instructional planning to support access (figure elements: student needs, instructional goals, instructional planning, differentiated instruction, accommodations, modifications, level of support, accessible instruction)

In this chapter, we highlight assumptions and implications related to the content and performance expectations of classroom instruction when making instructional adaptations. Further, we specify the distinctions underlying the use of instructional accommodations as opposed to instructional modifications and present examples to illustrate the similarities and differences in these adaptations. Additionally, we describe implications for implementing instructional adaptations, including potential consequences of misalignment between adaptations used in instruction and those used in assessment. Figure 7.1 provides an advance organizer that identifies and illustrates the interaction between the instructional planning components to support students' access to instruction. Students' needs and instructional goals are considered during instructional planning. Support features including differentiated instruction, accommodations, and modifications are implemented depending on the level of support individuals need to access instruction.

Instructional Adaptations: Important Distinctions

To understand the role of instructional adaptations in improving the accessibility of instructional environments, two important distinctions are necessary. First, at the broadest level, instructional adaptations designed to improve access for students with disabilities are differentiated from instructional planning approaches applied to classroom instruction that promote learning for all students. Second, the defining characteristics that differentiate instructional accommodations from instructional modifications are fully articulated. Specifically, the degree of alignment with grade-level content standards and the consistency across performance expectations are discussed to distinguish between these instructional adaptations.

Distinguishing Between Instructional Adaptations and Differentiated Instruction

To begin the discussion on instructional adaptations, it is important to distinguish these practices from the process of differentiating instruction. Similar to instructional adaptations, differentiated instruction is an instructional planning approach that intentionally adapts instructional design and delivery methods to support student learning of the instructional objectives. When differentiating instruction, the teacher considers the background knowledge, readiness for the instructional objective, language skills and abilities, preferences, and interests of all students in his/her classroom (Hall, 2002). These variables are matched to different instructional methods such as varied supports from the teacher and/or peers, modeling and the use of manipulatives, coaching, use of different media sources, self-reflection, and alternate review activities (Tomlinson, 1999). The process of aligning student characteristics with instructional approaches may be formally articulated in teachers' lesson plans or may be generated informally through teachers' interactions with students. Ultimately, by incorporating student variables in the instructional planning process, the teacher can adjust the learning experiences accordingly to facilitate student learning of the instructional objective.

Because both instructional adaptations and differentiated instruction focus on adapting the design and delivery of instruction to support student learning, many teachers intertwine these instructional methodologies. However, several key factors distinguish instructional adaptations from differentiated instruction. Specifically, Individualized Education Program (IEP) teams assign instructional adaptations based on the persistent needs of students with disabilities. As with other IEP decisions, the student's characteristics are considered along with performance data to identify the student's strengths and limitations. Instructional adaptations are identified that align with the student's needs in meeting the most appropriate learning objectives. In some cases, IEP teams may evaluate the effectiveness of differentiated instructional approaches on the student's learning prior to systematically assigning the approaches as instructional adaptations on the student's IEP. Because the IEP is a legally binding document between the educational service providers and the student (Individuals with Disabilities Education Act [IDEA], 2004), the instructional adaptations listed on the IEP must be systematically applied across the learning environment. Even though some instructional adaptations may mirror differentiated instructional strategies (e.g., use of a graphic organizer, additional practice opportunities), when specified on an IEP, the teacher must regularly incorporate them into the student's instructional opportunities. Consequently, the flexibility in instructional planning proffered by differentiated instruction is not afforded to teachers when implementing instructional adaptations.

It is important to note that the use of instructional adaptations does not supplant the use of differentiated instruction. In fact, students with disabilities often benefit from differentiated instruction in addition to the instructional adaptations identified on their IEP. For example, an instructional adaptation for a student with working memory limitations might include using a graphic organizer to document story elements when reading fictional text. To increase the student's motivation, a differentiated instructional strategy might involve allowing him/her to select the text based on personal interests. The graphic organizer would be formally identified as an instructional adaptation on the student's IEP; however, it is not necessary to document variations in story context. Researchers and policy makers identify the value of differentiating general education instruction as a mechanism for making the general education classroom more accessible for students with disabilities (e.g., Gersten et al., 2008). As such, these instructional approaches can be used conjunctively to support student learning.

Distinguishing Between Instructional Accommodations and Modifications

The primary distinctions between instructional accommodations and instructional modifications are based on the degree of alignment with grade-level content standards and the consistency across performance expectations. As previously introduced, most students using instructional accommodations are expected to learn the same material to the same level of performance as other students in the general education classroom. Conversely, students receiving instructional modifications are not necessarily expected to learn the same material to the same level of performance as other students. Although these distinguishing characteristics often help differentiate between accommodations and modifications, they are not as easily separated. For example, a student with significant cognitive disabilities might receive both instructional accommodations (e.g., large print or verbatim oral presentation of directions) at the same time he or she is receiving instructional modifications (e.g., using off-grade-level reading material). As such, it is often useful to consider the impact instructional adaptations have on the outcomes of instruction when classifying them as accommodations or modifications.

Alignment of Instructional Adaptations with Grade-Level Content Standards

As noted elsewhere in this volume, IDEA (2004) requires IEP teams to align students' instructional goals with the grade-level content standards that guide learning objectives for all students in the state. Because instructional goals are enacted through the design and delivery of instruction, the adapted content resulting from the use of instructional accommodations and modifications should be evaluated for alignment with grade-level content standards. When referencing alignment to grade-level content standards, two dimensions are typically considered: content coverage and depth of knowledge (also referred to as cognitive complexity) (La Marca, 2001). Alignment to grade-level content standards is evaluated by considering the degree of consistency of content coverage and depth of knowledge between the content standards and students' instructional goals as stated on their IEP.

Content coverage refers to the knowledge and skills represented in the content standards, such as facts, concepts, principles, and procedures. Federal legislation (NCLB, 2001; IDEA, 2004) requires that all students receive instruction on grade-level content, regardless of their educational classification. A key distinction between instructional accommodations and instructional modifications is the extent to which the content covered aligns with the grade-level content standards. Because instructional accommodations are not intended to change the breadth or depth of the grade-level content standards, instructional accommodations should maintain a one-to-one correspondence in the content coverage between the instructional goals and the grade-level content standards. This does not preclude accommodations such as providing extra time for learning the content or rearranging the sequence of content presented, as long as the content coverage remains consistent. On the other hand, for some students with disabilities, it may not be feasible to learn the breadth of content stated in the grade-level content standards. For these students, the IEP team may identify instructional modifications that reduce the content covered within the year. For example, the IEP team may recommend changing the scope of instruction to focus on the "big ideas" presented in the content standards. As such, these instructional modifications reduce the degree of alignment between the instructional design and delivery and the grade-level content standards.

Depth of knowledge, or cognitive complexity, refers to the level of cognitive engagement at which students are expected to interact with the content. Taxonomies of depth of knowledge (e.g., Webb's taxonomy (Webb, 1999), Bloom's taxonomy (Krathwohl, 2002)) classify engagement into categories depending on the type of interaction expected from the student, such as reiteration of facts, summarization of a concept, or application of a procedure to a novel situation. Instructional accommodations differ from instructional modifications in the degree to which they alter the cognitive complexity of the grade-level content standards. Instructional accommodations designed to support students' engagement with the content should not alter the intended cognitive complexity of the grade-level content standard. Instead, these accommodations may change the sequence of instruction to provide additional practice in prerequisite knowledge and skills, prime background knowledge, or scaffold instruction to gradually increase the level of cognitive complexity at which students interact with content. Still, instruction should culminate in the depth of knowledge that is consistent with the grade-level content standards. However, instructional modifications may reduce the cognitive complexity of instruction.

Federal legislation recognizes that the depth of knowledge in the grade-level content standards may not be appropriate for some students with disabilities (NCLB, 2001; IDEA, 2004). For these students, IEP teams may recommend instructional modifications that reduce the level of cognitive complexity implied by the grade-level content standards. For example, consider the mathematics content standard that expects students to classify and sort two-dimensional figures based on their attributes. An instructional modification may instead require students to identify two-dimensional figures. By reducing the depth of knowledge of the content standard, instructional modifications improve accessibility to instruction and associated materials for some students with disabilities. However, because instructional modifications alter alignment between grade-level content standards and instruction, students receiving instructional modifications do not have the same opportunity to learn the breadth of content and/or depth of knowledge as other students in the general education classroom.

Consistency of Performance Expectations with General Education

Performance standards or expectations are most typically expressed in terms of the degree of mastery of the instructional objectives students are expected to reach. Sometimes this is manifested in the criterion established for decision making (e.g., students must complete 80% of the homework items correctly); other times it is evident in the characteristics of the tasks students are expected to complete (e.g., students answer 20 homework problems with graduated difficulty). As previously introduced, the consistency of these performance expectations with the general education instructional objectives distinguishes instructional accommodations from instructional modifications.

Instructional accommodations are typically intended for students who are expected to learn the same material to the same level of performance expectations as students who are not receiving instructional accommodations. As such, for students with disabilities who may need additional instructional support to reach the general education performance expectations, IEP teams should consider instructional accommodations that scaffold students' development of content knowledge and skills to the expected level of proficiency, as opposed to instructional adaptations that reduce these expectations. Such instructional accommodations may include providing more guided practice on instructional activities to reach the expected performance level on independent class work or extending the instructional time in which students are expected to reach the performance level. Although these accommodations may alter the schedule for meeting the general education performance expectations, they result in no changes to the instructional objectives themselves.

Modifications to the performance expectations should be considered for students with disabilities who are not likely to reach the performance expectations established for the general education population within the academic year. Although it is often difficult to determine with certainty who these students are, their IEP teams may recommend reducing the level of proficiency expected on independent work (e.g., students must complete 60% as opposed to 80% of the homework items correctly) or reducing the number or difficulty of the tasks to be completed (e.g., students answer 10 homework problems as opposed to 20). This instructional modification may also be implemented in tandem with a reduction in the content coverage and/or depth of knowledge of the grade-level content standards. In these instances, students might be completing fewer instructional tasks that cover a narrower focus of content and involve less difficult cognitive processing skills.

In summary, instructional accommodations and instructional modifications serve the purpose of improving accessibility of the general education curriculum by changing the design and delivery of instruction and associated materials. These adaptations differ based on the extent to which these changes alter the alignment with the content coverage and/or depth of knowledge of the grade-level content standards and/or the consistency across performance expectations. In the next section, we describe possible consequences IEP teams should consider prior to assigning instructional adaptations.

Consequences of Implementing Instructional Adaptations

Prior to assigning instructional accommodations and instructional modifications, IEP teams must be fully aware of the likely consequences of these adaptations. Specifically, implementing instructional adaptations should not preclude instruction in the knowledge and skill areas that are being accommodated or modified. Additionally, implementing instructional modifications can significantly change students' immediate and subsequent instructional programming, as well as assessment participation options. Further, assigning instructional adaptations may influence the adaptations students use during testing. IEP teams should not interpret these implications as potential reasons for not assigning needed instructional adaptations; instead, IEP teams should be aware of these issues to facilitate decision making and ensure fidelity of implementation.

Consequences of Overuse of Instructional Adaptations

Instructional adaptations have the potential of being overused in classroom practices at the expense of instruction on the knowledge and skills that are being supported (Phillips, 1999). Although most often unintended, some teachers may inadvertently omit instruction on the knowledge and skills that are being accommodated or modified from the student's educational programming. For example, students struggling to master computational fluency may be allowed to use a calculator to solve situated problems. However, these students need continued instruction and practice to develop fluency with computational algorithms. As such, the use of instructional adaptations should be paired with sufficient and focused instruction on the knowledge and skills that are being supported. By developing students' knowledge and skills in these areas, students may be able to gradually reduce their use of and dependency on the instructional adaptation.

Related to overuse of instructional adaptations is misuse or misclassification of instructional changes that result in a reduction of content or expectations. For example, in a survey of 176 general and special education teachers' instruction and accommodations practices in mathematics, Maccini and Gagnon (2006) found 41% of special education teachers and 19% of general education teachers assigned fewer practice problems as an instructional strategy, and 33% of special education teachers and 17% of general education teachers assigned fewer homework problems during instruction on computational skills. Similar results were observed when the content focused on solving multi-step problems. This adaptation was perceived as an instructional strategy; however, because the expectation of performance was reduced from that of the general education instructional objectives, it should be classified as a modification. Similarly, in this same study, Maccini and Gagnon (2006) found that 54% of special education teachers and 40% of general education teachers allowed students to use calculators during instruction on basic skills or computational tasks. Because the construct of instruction was closely aligned with the knowledge and skills that were being supported through the use of a calculator, this instructional adaptation should be viewed as a modification as opposed to an instructional strategy. These results mirror those of other researchers examining teachers' classification of test adaptations (e.g., Hollenbeck, Tindal, & Almond, 1998; Ketterlin-Geller, Alonzo, Braun-Monegan, & Tindal, 2007). Assignment of instructional adaptations is beyond the scope of this chapter; however, results of these studies indicate that teachers may need additional support in distinguishing between instructional strategies, accommodations, and modifications.


Consequences of Instructional Modifications

Regardless of the type of instructional modification(s) assigned to a student, IEP teams must be aware of the implications of these adaptations. First, modifications alter the alignment between grade-level content standards and classroom instruction. Although this statement is obvious, the implications are significant: The altered alignment inherently reduces students' opportunity to learn grade-level content standards to the level of proficiency expected of the general education population. Consequently, subsequent learning experiences will be influenced by the (lack of) content coverage, (reduced) depth of knowledge, and/or (lower) expectations of proficiency. For example, consider an instructional modification for a fourth-grade student in mathematics that reduces the geometry content covered to focus only on two-dimensional and not three-dimensional figures. Subsequent instructional objectives that rely on the student's understanding of three-dimensional figures, such as understanding and calculating volume, will also need to be modified to account for this student's lack of understanding of three-dimensional figures. Similarly, consider an instructional modification that allows a seventh-grade student studying life science to use a textbook written at the fourth-grade reading level. Inherent in this instructional modification is a reduction in the student's exposure to grade-level vocabulary, which may impact subsequent instructional objectives. As such, whenever possible, instructional modifications should be limited in favor of providing accommodations that maintain alignment with the grade-level content standards.

A second implication of assigning instructional modifications is the (mis)alignment between grade-level content standards and large-scale assessments. Because grade-level content standards form the basis for most large-scale state and district achievement tests, it follows that these assessments are not appropriate for students who are not receiving instruction aligned to these standards (Browder, Spooner, Wakeman, Trela, & Baker, 2006). As such, IEP teams need to consider alternate participation options on large-scale state and district achievement assessments that might be most appropriate for students receiving instructional modifications.

Federal legislation allows for two alternate participation options on state accountability tests: alternate assessments based on alternate achievement standards (AA-AAS) and alternate assessments based on modified achievement standards (AA-MAS). Specific guidelines govern the selection of students participating in AA-AAS (e.g., students with the most significant cognitive disabilities), and states are currently in the process of providing guidance for selecting students who may participate in the AA-MAS. Federal regulations, however, stipulate that scores from only a small portion of students with disabilities (10% for the AA-AAS and 20% for the AA-MAS, which translates to 1% and 2% of the overall student population, respectively) can be included in states' reporting of proficiency for Adequate Yearly Progress (AYP) under these participation options. Consequently, these participation options should be selected only for those students for whom the general education assessment is inappropriate. As such, prior to assigning instructional modifications, IEP teams need to consider the lasting and immediate implications for students' instructional programming and assessment participation.
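To see how the two sets of figures relate, note that the translation rests on an assumption the paragraph leaves implicit: that students with disabilities make up roughly 10% of the total student population. The arithmetic below is only an illustrative sketch under that assumption; it is not language from the regulations, and the correspondence would shift in a district where students with disabilities constitute more or less than 10% of enrollment.

\[
\text{AA-AAS: } 0.10 \times N_{\text{SWD}} \approx 0.10 \times (0.10 \times N_{\text{total}}) = 0.01 \times N_{\text{total}} \quad (\text{about 1\% of all students})
\]
\[
\text{AA-MAS: } 0.20 \times N_{\text{SWD}} \approx 0.20 \times (0.10 \times N_{\text{total}}) = 0.02 \times N_{\text{total}} \quad (\text{about 2\% of all students})
\]

Here N_SWD denotes the number of students with disabilities and N_total the total tested population.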

Interdependence Between Accessible Instruction and Accessible Assessments

To support the educational needs of students with disabilities, accessible instruction is inseparable from accessible assessments. Consider, for example, an accessible instructional environment coupled with inaccessible assessments. In this setting, all students have equal opportunities to learn instructional content to specific expectations; however, as a result of inaccessible assessments, students with disabilities may not be able to demonstrate what they know and can do on classroom-based and large-scale assessments.


Consequently, poor student performance may be (mis)interpreted as a lack of progress in learning the instructional content to specific performance expectations. Because accurate representation of students' abilities was obstructed by inaccessible assessments, decisions based on inaccessible assessments are invalid. Issues about test-score validity are discussed in Chapters 10, 12, and 13 of this book.

Conversely, consider a situation in which instruction is inaccessible but the assessments are accessible. In this case, all students do not have an equal opportunity to learn the content of instruction. Although the assessments may accurately represent students' (lack of) knowledge and skills, the representation may not adequately reflect students' potential abilities. However, if interpreted appropriately, these results might highlight the lack of opportunities to learn the content. As these scenarios point out, to support student learning, accessibility must be viewed across multiple dimensions of the educational environment.

Because of the interdependence between accessible instruction and accessible assessments, instructional adaptations should form the basis for testing adaptations. As such, instructional accommodations and instructional modifications should be seamlessly integrated into classroom-based and large-scale assessment practices. In fact, in 2007, 47 states required that the use of instructional accommodations be considered when making decisions about testing accommodations (Christensen, Lazarus, Crone, & Thurlow, 2008). However, some researchers point out the discrepancy between these practices. In a study examining the IEPs for 280 students with disabilities, Ysseldyke et al. (2001) found that approximately 86% of the participants had either instructional or testing accommodations listed on their IEPs. Of these students, 84% had instructional accommodations that matched testing accommodations. The rate of agreement varied based on the prevalence of the disability: Instructional and testing accommodations matched in 98% of the IEPs for students with low-prevalence disabilities as compared to 71% for students with moderate-prevalence and 84% for students with high-prevalence disabilities. Similarly, DeStefano, Shriner, and Lloyd (2001) conducted a study on the accommodation practices of 100 urban educators. In their findings, they noted little association between accommodations used in instruction and those provided on tests; most frequently, teachers used more accommodations during testing than instruction. However, after teachers received training on accommodation assignment procedures, results indicated that teachers used more accommodations during instruction and fewer during assessments. Ultimately, students should have access to the same adaptations in instruction and assessment.

Instructional Adaptations That Support Accessible Learning Environments

In this chapter, we highlighted characteristics of instructional accommodations that distinguish these instructional adaptations from instructional modifications. This differentiation provided a basis for discussing the implications of assigning instructional adaptations on current and future instructional programming and assessment practices. The remainder of this chapter will focus on practical suggestions for implementing these instructional adaptations in classroom instruction. We describe instructional accommodations followed by instructional modifications in four categories: presentation, setting, timing or scheduling, and response modes. Although we identify possible adaptations, it is important to note that we do not discuss assignment criteria for IEP teams to consider when making instructional adaptation decisions.

Table 7.1 provides a summary of types of instructional adaptations, student characteristics to describe those who may benefit from each type of adaptation, and examples of accommodations and modifications to succinctly illustrate the differences between these instructional adaptations. It is important to note that allowable accommodations may differ from state to state, and differences may exist between state and federal policies. Because of this, teachers and other instructional decision makers need to be well informed of these guidelines when assigning student accommodations.

Table 7.1 Summary of instructional adaptations

Presentation
Benefits students: with physical, sensory, or cognitive difficulties that affect perception of visual or auditory stimulus.
Accommodation examples: Visual – Braille, large print font, copies of written material, magnification, electronic texts/materials. Auditory – read aloud, audio-/video-recorded presentations, screen readers, amplification devices, peer-assisted learning, cross-grade tutoring. Tactile – tactile prompts, physical guidance, raised-line paper, raised graphics.
Modification examples: abridged version of text or novel, reduced content coverage (fewer concepts), reduced depth of knowledge, read aloud (as a modification).

Setting
Benefits students: who are distractible; whose adaptation may be distracting to others; who need alternate physical access.
Accommodation examples: preferential seating/grouping (near teacher, away from doors and windows), individual or small group instruction, headphones, study carrel, separate location.
Modification examples: independent work on a group task, work with a partner on an individual task.

Timing and scheduling
Benefits students: who process information slowly; with physical disabilities that affect task completion; who use adaptations that require additional time.
Accommodation examples: extended time, multiple breaks, breaking a task into smaller parts, on-task reminders (e.g., timer), daily schedule posted.
Modification examples: extended time/frequent breaks on timed tasks.

Response
Benefits students: with expressive language difficulties or motoric impairments; who need additional support with organization or problem solving.
Accommodation examples: write, type, or word-process responses; point or sign to indicate response; communication devices; text-to-speech; audio-recorded responses; scribe; oral response; manipulatives; scratch paper; responding directly on the assignment; visual/graphic organizers; calculator.
Modification examples: reducing the number of items to complete or paragraphs to write, providing fewer answer choices, reducing the depth of knowledge required, dictionary or thesaurus (as a modification).

Instructional Accommodations

Instructional accommodations are adaptations to the design or delivery of instruction and associated materials that do not change the breadth of content coverage and depth of knowledge of the grade-level content standards. Additionally, the use of instructional accommodations has no impact on the level of performance students are expected to reach. As such, most students using instructional accommodations are expected to learn the same material to the same level of expectation as students who are not using instructional accommodations, although the learning process may take more time. Below is a summary of instructional accommodations categorized by type: presentation, setting, timing or scheduling, and response modes. IEP teams should reference state accommodations policies for allowable accommodations prior to assignment.

Changes in Presentation

Presentation accommodations provide students with alternate ways to access instructional material. Students with difficulties perceiving visual or auditory stimulus may benefit from visual, tactile, and/or auditory changes in presentation (Thompson, 2005). To increase the accessibility for all students, general guidelines suggest that material presented visually should include type in a standard, legible font that is no less than 12-point size for printed materials (24-point for projected material); high contrast between text and background; sufficient space between letters, words, and lines; and text that is left-aligned (with staggered right margins). Also, images and graphics should be relevant to the content of the text, have dark lines, and sufficient contrast (Thompson, Johnstone, Anderson, & Miller, 2005). Material presented auditorily should provide students with appropriate cues for understanding, including expression and stance (e.g., facing the class when speaking), repetition of questions, and summary of discussion or material covered and important points (University of Kansas, 2005). Specific examples of visual, auditory, and tactile changes in presentation are detailed below.

Visual presentation accommodations benefit students with physical, sensory, or cognitive disabilities that affect their ability to visually read standard print. Common accommodations include Braille, large print font, magnification, and providing copies of written materials. Large-print editions of textbooks and other instructional materials are often available through the publishing company. Photocopying at a size greater than 100 percent can enlarge regular print materials. Some students may use a magnifying device (handheld, eyeglass mounted, freestanding) to view materials in a larger format. Copies of presented materials (e.g., presentation slides, notes, outlines) also support students in understanding and accessing visually presented material. Some students may also benefit from using a guide to help track while reading (e.g., bookmark, index card, or ruler), colored overlays to improve contrast (e.g., Wilkins, Lewis, Smith, Rowland, & Tweedie, 2001), or revised documents with less information and/or fewer items per page.

Technology may also support students' access. As the availability of electronic texts and materials increases, students have greater control over the visual presentation of information, including font, font size, color/contrast, and graphic/image captions, and can visually track their progress through the material (e.g., Rose & Dalton, 2009).

Teachers can also make word-processed documents, presentations, or other materials available to students on a computer. By having access to the source documents, students have flexibility to individualize the material by changing visual features (e.g., font size/color, line spacing, contrast) to customize access.

Auditory changes in presentation benefit students with visual or hearing impairments or difficulty processing print or auditory information. Such changes in presentation include read-aloud accommodations, audio recordings or video presentations of written material, amplification devices, and screen readers. Read-aloud accommodations involve oral presentation of text to students to provide access to instructional content (Tindal, 1998). A teacher or educational assistant may provide this accommodation, text may be recorded and played back for the student, text may be read aloud on a computer, or the student may read aloud on his/her own. Additionally, students may work with each other to provide this support, either through peer-assisted learning strategies (e.g., Fuchs, Fuchs, Mathes, & Simmons, 1997; Fuchs, Fuchs, Hamlett, Phillips, & Karns, 1995) or cross-grade tutoring, or classroom volunteers can provide read-aloud support. It is important to note that this is an accommodation as long as the intended construct remains unchanged; for example, if the instructional objective involves understanding and using new vocabulary from a story or requires a student to formulate and propose a solution to a mathematics story problem, a read-aloud accommodation maintains the goals and expectations of instruction. In contrast, if the instructional objective requires students to use letter–sound correspondence to sound out novel words or to read aloud grade-level text fluently and accurately with appropriate intonation and expression, providing a read-aloud accommodation alters the instructional goals and expectations.

Similar to the read-aloud accommodation, providing students with audio recordings or video presentations of written material can help to improve access to instruction. An additional benefit of using recorded material is that a student has the ability to replay the recording, providing multiple exposures to the content. Also, as noted in the description of visual changes in presentation, electronic texts embed flexibility, providing students with access to auditory support, including the option to hear text spoken. Access to common technology tools may help facilitate this accommodation: Many students have cell phones or MP3 players (e.g., iPod, Zune) that support various audio files. Audio recordings of written material may be created directly on such devices or may be created, saved, and transferred from a computer-based application. In addition, many books for children are available on CDs or as downloadable audio files. Other auditory accommodations may include various assistive technologies, such as amplification devices and screen readers.

Tactile presentation accommodations benefit students with visual impairments, as well as students needing alternate representations of information. Tactile accommodations provide information to students through touch and may include tactile prompts or physical guidance (e.g., positioning a student's fingers on a keyboard, mouse, or pencil), manipulatives, writing paper with raised lines, or graphic material presented in a raised format (e.g., maps, charts, and graphs with raised lines). Tactile accommodations often require additional information, such as a verbal description, for students to fully access the content presented and understand performance expectations (O'Connell, Lieberman, & Petersen, 2006). Practically, teachers can outline maps, charts, and graphs on student worksheets in glue the day prior to their use in a lesson. When dry, the raised lines provide tactile information to help students identify the graphic.

Changes in Setting

Setting accommodations are changes to the conditions or location of an instructional setting. Students who are distractible, who receive accommodations that may distract others, or who need alternate physical access may benefit from changes in the instructional setting. To promote access for all students, the classroom setting should be well lit, well ventilated, and have a comfortable temperature, with chairs and tables that are appropriate in height and have sufficient space for students to work, and materials and equipment should be in good condition (University of Kansas, 2005).

Setting accommodations may involve a change of location either within or outside of the classroom. Within the classroom, a distractible student may be seated near the teacher, away from doors and windows, and/or with a group of students who are not distracting. In addition, students may receive instruction individually or in a small group, use headphones to buffer noise, or use a study carrel to diminish visual distractions. If a student receives accommodations that may distract other students, such as a scribe, multiple breaks, or read-aloud, he or she may receive them in a separate location. In addition, some students may require a change in setting to improve physical access or to access equipment that is not readily available in the classroom.

Changes in Timing or Scheduling

Timing and scheduling accommodations alter the amount of time allotted to complete a test, assignment, or activity, or change the way instructional time is organized or allocated (Thompson, 2005). Changes in timing and scheduling may benefit students who need additional time, frequent breaks, or multiple exposures to information, including students who process information slowly, students with physical disabilities that may impact their task completion rate, and students who use accommodations or equipment that require additional time. Timing and scheduling accommodations may also benefit students with difficulty focusing and attending to tasks for extended periods of time, students who may become easily frustrated or anxious, or those taking medication or who have health-related disabilities that affect their ability to attend.

Possible accommodations include extending the time for task completion, providing multiple breaks during a given task, breaking a larger task into smaller parts, and/or setting reminders to remain on task. For example, a teacher may set a timer for a student during independent class work to provide a visual cue that indicates expected time for engagement in a particular activity. Additionally, teachers can post the schedule for daily activities in a consistent place to provide students with structured guidelines for class work.

Changes in Response Mode

Response format accommodations change the way students express their knowledge and skills. Changes in response mode may benefit students with expressive language difficulties or motoric impairments or students who need additional support with organization or problem solving. Students who have difficulty with verbal expression may be allowed to write or word process their answers, point or sign to indicate their response, or use a communication device or text-to-speech program. Students with motor difficulties may respond orally, tape/digitally record responses, and/or have a scribe record their responses. Students who need support with organization or problem solving may benefit from the use of manipulatives to model problems, scratch paper to plan responses, writing their answers directly on a test or assignment rather than transferring to a separate paper, visual/graphic organizers, using a calculator, or highlighting important information within a given task.

Technology already in place in the classroom may facilitate implementation of many response accommodations. For example, teachers can provide classroom-based assignments electronically that include embedded mnemonic devices or scaffolded instructional prompts that structure students' responses. Additionally, students might type their responses to assignments or respond to multiple-choice questions by highlighting their answers or making them bold. Students can use software such as Inspiration® or Kidspiration® to help create an outline or visual map to organize their ideas prior to responding to assignments or as a method for highlighting relationships presented in a lesson. In addition, many classrooms have electronic input devices (commonly referred to as "clickers"), which capture student responses and can be incorporated into Microsoft PowerPoint® presentations.

Depending on the instructional objectives, some of the accommodations described above may become modifications if they are used in a way that changes the intended construct. For example, if a student is provided with a read-aloud accommodation on a decoding or reading comprehension task, the construct may change from reading comprehension to listening comprehension.

Instructional Modifications

Instructional modifications are adaptations to the design or delivery of instruction and associated materials that result in a reduction of the breadth of content coverage and/or depth of knowledge of the grade-level content standards. Instructional modifications may also result in a reduction of the performance expectations established for the general education student population. As such, students receiving instructional modifications are not necessarily expected to learn the same material to the same level of expectation as students in the general education classroom. Below is a summary of instructional modifications categorized by type: presentation, setting, timing or scheduling, and response modes (see Table 7.1).

Changes in Presentation

Presentation modifications change instructional delivery in a way that reduces the breadth of content coverage or depth of knowledge, and in turn, alters the construct of instruction. Students who continue to demonstrate difficulty perceiving or understanding visual or auditory presentations of instructional content, given appropriate accommodations, may benefit from modifications to presentation. For these students, the cognitive processing expectations of the learning environment may exceed their processing capacities, thereby causing cognitive overload (Mayer & Moreno, 2003). Specifically, students' cognitive processing might be burdened by the amount of new information presented during instruction that is associated with the learning goal. Alternatively, some students might find the context of the instructional environment excessively complex, thereby taxing their cognitive processing. Other theories of students' cognitive processing offer additional plausible explanations for cognitive overload. For students experiencing cognitive overload, instructional modifications may improve the accessibility of instruction by reducing the cognitive processing demands.

Presentation modifications may reduce the cognitive processing load in several ways. For example, reducing the complexity of information by simplifying the level of instructional material may decrease the cognitive processing demands, thereby allowing the student to focus on learning essential knowledge and skills (Elliott, Kurz, Beddow, & Frey, 2009). For instance, a student might benefit from instructional materials, such as a social studies textbook written at a lower grade level, that cover the same content as the general education textbooks. Similarly, a student may meet the learning objective by reading an abridged version of a novel. These modifications provide students access to grade-level material and content but lower performance expectations. Another way to modify instructional delivery to support students' cognitive processing is to reduce content coverage; that is, students may be instructed on a limited number of concepts that represent the essence of the grade-level content standards. By eliminating non-essential attributes of instruction, students' cognitive processing can be diverted from extraneous information to essential learning (Mayer & Moreno, 2003). Additionally, the depth of knowledge intended by the grade-level content standard might be reduced. For example, rather than presenting an analysis of historical events, the teacher may compare and contrast events to simplify the material and concepts. As noted previously, these modifications have significant implications for the student's subsequent instructional experiences.

Changes in Setting

Setting modifications are changes to the conditions or location of the instructional setting that are warranted by the need for specialized instructional supports, materials, or equipment. As often as possible, students should receive instruction in the general education classroom. However, for some students to meet their instructional objectives, modifications to the setting of instruction may be needed. Possible setting modifications that may be implemented in the general education classroom include allowing the student to work independently on group tasks or, conversely, allowing a student to work with a partner on an independent task. These changes are classified as modifications if they alter the content or performance expectations of the instructional objectives.

Changes in Timing or Scheduling

A modification to the timing or scheduling of instruction changes the amount or organization of time in a way that affects the goals of the instructional activity. These changes may be needed for students who require additional support beyond those provided through accommodations. Such students may have difficulty processing information, specific schedule requirements, or a physical disability that significantly impacts the time needed to complete tasks. An example of a scheduling accommodation may include changing the student's daily schedule while at school. A modification to the timing of a task might include providing extended time or frequent breaks on an activity or test involving fluency. Because fluency activities typically involve a rate of performance, changing the allowed time or providing multiple breaks may change the instructional objective. For example, consider a task designed to build students' fluency in mathematical computation by providing students with 2 minutes to complete as many addition problems as possible. Allowing a student additional time on this task changes grade-level proficiency expectations, but still provides the student an opportunity to demonstrate computational understanding. Another timing modification would be to eliminate the time requirement for a given task or limit timed tasks to mastered skills. By providing such timing modifications, the expectations for proficiency or fluency and completion rate change.

Changes in Response Mode

Modifications to response mode change a task in a way that reduces student expectations or the depth of knowledge required to complete the task. Students with expressive language or information processing difficulties or motoric impairments, or those with difficulty organizing information, may require more significant changes in response mode than those offered through accommodations to demonstrate their knowledge and skills. Examples of response mode modifications include reducing the number of items a student is expected to complete or the number of answer choices per item on an assignment, thus reducing proficiency expectations. A modification to response mode that changes the depth of knowledge required for an activity might include changing a task that requires students to analyze and synthesize information to one that requires them to identify the cause and effect of an event. The use of a calculator on a mathematics task that is designed to gauge students' computation skills is another example of a response mode modification that alters student expectations. Other similar examples would be the use of a dictionary or thesaurus for a vocabulary task or the use of spelling or grammar aids for a writing task that targets or measures these skills. A modification to a writing assignment that requires a five-paragraph essay might reduce the amount of writing a student is expected to do to a paragraph with three to five sentences supplemented with images to demonstrate content and sequence.

Integrating Instructional Adaptations Based on Students' Needs

Students' needs provide the foundation for making instructional planning decisions. In some cases, students with disabilities will be appropriately supported in the general education curriculum through differentiated instructional strategies. In other cases, students may need instructional accommodations and/or instructional modifications to reach their learning potential. In this chapter, we have described these approaches to supporting students with disabilities in isolation. The case example presented in Fig. 7.2, however, illustrates how these approaches can be integrated to provide a maximally accessible learning environment based on an individual's needs.

Conclusions

Accessibility of instruction can be enhanced by designing and delivering instruction that supports the interaction between an individual's characteristics and the format, sequencing, delivery method, and other features of instruction and the tasks or activities in which students engage. In this chapter, we described instructional adaptations as a method for increasing the accessibility of instruction for students with disabilities to promote student learning. We highlighted the important distinctions between instructional accommodations and instructional modifications and provided examples of how each adaptation may be implemented in the classroom. Moreover, we discussed possible consequences of assigning instructional adaptations for students' current and future educational programming and assessment participation options. This information was intended to provide special education service providers, administrators, and policy makers with a clear understanding of the role of these instructional adaptations for improving accessibility of educational environments.

It is important to note, however, that instructional adaptations should be viewed in the larger context of educational programming available to students with disabilities. Instructional adaptations can improve access to grade-level content but must be considered as an integral part of the students' educational opportunities. Instructional adaptations are neither a "silver bullet" that should be viewed as the only solution to improving access, nor should they be cast aside as supplemental resources that are only used by the special education professional.


Case example

Maria is a 15-year-old female student in the eighth grade. Maria repeated first grade because she struggled to learn basic concepts of reading. Maria continues to have difficulty with decoding and math computation, but she excels on comprehension and problem-solving tasks. Maria's poor decoding skills often impact her work in other classes, as decoding difficulties make it hard for her to understand her assignments. In addition, her teachers report that she is often off task, doesn't complete assignments, and is getting further behind in her studies. Maria works well with the teacher in the learning center, but has trouble following directions from other school staff. In addition, Maria has recently been identified as having a behavior disorder.

Possible adaptations to instruction

Differentiated instruction strategies

Provide opportunities for flexible grouping (whole class, small groups, pairs, etc.)

Explicitly state learning goals and objectives for each lesson

Support and incorporate background knowledge

Maintain clear and consistent classroom rules and expectations

Provide opportunities for student choice (materials, assignments, partners, etc.)

Accommodations

Peer-assisted learning during math problem-solving tasks; reading partner during independent reading tasks

Read aloud the directions and expectations for class assignments

Highlight key words or ideas

Provide notes or outlines for class lectures or presentations; allow student to audio-record presentations

Break large assignments into smaller tasks

Teach self-monitoring strategies for on-task behavior; scaffold with reminders

Modifications

Read aloud (for decoding/comprehension tasks)

Calculator (for computation tasks)

Allow student breaks with learning center teacher (as needed/requested)

Fig. 7.2 Case example illustrating integrated instructional supports


References

Browder, D. M., Spooner, F., Wakeman, S., Trela, K., & Baker, J. (2006). Aligning instruction with academic content standards: Finding the link. Research and Practice for Persons with Severe Disabilities, 31(4), 309–321.

Christensen, L. L., Lazarus, S. S., Crone, M., & Thurlow, M. L. (2008). 2007 state policies on assessment participation and accommodations for students with disabilities (Synthesis Report 69). Minneapolis, MN: University of Minnesota, National Center on Educational Outcomes.

DeStefano, L., Shriner, J. G., & Lloyd, C. A. (2001). Teacher decision making in participation of students with disabilities in large-scale assessment. Exceptional Children, 68(1), 7–22.

Elliott, S. N., Kurz, A., Beddow, P., & Frey, J. (2009). Cognitive load theory: Instruction-based research with applications for designing tests. Presentation at the annual convention of the National Association of School Psychologists, Boston, MA.

Fuchs, D., Fuchs, L. S., Mathes, P. G., & Simmons, D. C. (1997). Peer-assisted learning strategies: Making classrooms more responsive to diversity. American Educational Research Journal, 34, 174–206.

Fuchs, L. S., Fuchs, D., Hamlett, C. L., Phillips, N. B., & Karns, K. (1995). General educators' specialized adaptation for students with learning disabilities. Exceptional Children, 61, 440–459.

Gersten, R., Compton, D., Connor, C. M., Dimino, J., Santoro, L., Linan-Thompson, S., et al. (2008). Assisting students struggling with reading: Response to Intervention and multi-tier intervention for reading in the primary grades. A practice guide (NCEE 2009-4045). Washington, DC: National Center for Education Evaluation and Regional Assistance, Institute of Education Sciences, U.S. Department of Education. Retrieved from http://ies.ed.gov/ncee/wwc/

Hall, T. (2002). Differentiated instruction. Wakefield, MA: National Center on Accessing the General Curriculum. Retrieved on February 1, 2009 from http://www.cast.org/publications/ncac/ncac_diffinstruc.html

Hollenbeck, K., Tindal, G., & Almond, P. (1998). Teachers' knowledge of accommodations as a validity issue in high-stakes testing. The Journal of Special Education, 32(3), 175–183.

Ketterlin-Geller, L. R., Alonzo, J., Braun-Monegan, J., & Tindal, G. (2007). Recommendations for accommodations: Implications of (in)consistency. Remedial and Special Education, 28(4), 194–206.

Krathwohl, D. R. (2002). A revision of Bloom's Taxonomy: An overview. Theory into Practice, 42, 212–218.

La Marca, P. M. (2001). Alignment of standards and assessments as an accountability criterion. Practical Assessment, Research & Evaluation, 7(21). Retrieved on January 31, 2010 from http://PAREonline.net/getvn.asp?v=7&n=21

Maccini, P., & Gagnon, J. C. (2006). Mathematics instructional practices and assessment accommodations by secondary special and general educators. Exceptional Children, 72(2), 217–234.

Mayer, R. E., & Moreno, R. (2003). Nine ways to reduce cognitive load in multimedia learning. Educational Psychologist, 38, 43–52.

National Center for Education Statistics. (2009). The Nation's Report Card: Mathematics 2009 (NCES 2010–451). Washington, DC: Institute of Education Sciences, U.S. Department of Education.

O'Connell, M., Lieberman, L., & Petersen, S. (2006). The use of tactile modeling and physical guidance as instructional strategies in physical activity for children who are blind. Journal of Visual Impairment & Blindness, 100(8), 471–477.

Phillips, S. (1999, June). Needs, wants, access, success: The stone soup of accommodations. Presentation at the 29th annual Council of Chief State School Officers National Conference on Large-Scale Assessment, Snowbird, UT.

Rose, D., & Dalton, B. (2009). Learning to read in the digital age. Mind, Brain, and Education, 3(2), 74–83.

Thompson, S. J. (2005). An introduction to instructional accommodations. Special Connections. Retrieved on May 1, 2006 from http://www.specialconnections.ku.edu/cgi-bin/cgiwrap/specconn/main.php?cat=instruction&section=main&subsection=ia/main

Thompson, S. J., Johnstone, C. J., Anderson, M. E., & Miller, N. A. (2005). Considerations for the development and review of universally designed assessments (Technical Report 42). Minneapolis, MN: University of Minnesota, National Center on Educational Outcomes. Retrieved on November 1, 2006 from http://education.umn.edu/NCEO/OnlinePubs/Technical42.htm

Tindal, G. (1998). Models for understanding task comparability in accommodated testing. Washington, DC: Council of Chief State School Officers.

Tomlinson, C. A. (1999). How to differentiate instruction in mixed-ability classrooms. Alexandria, VA: ASCD.

University of Kansas. (2005). Special Connections: Connecting Teachers to Strategies that Help Students with Special Needs Successfully Access the General Education Curriculum. Retrieved on February 1, 2010 from http://www.specialconnections.ku.edu/cgi-bin/cgiwrap/specconn/index.php

U.S. Department of Education, Office of Elementary and Secondary Education. (2001). No Child Left Behind. Washington, DC: Author.

U.S. Department of Education, Office of Special Education and Rehabilitative Services. (2004). Individuals with Disabilities Education Act. Washington, DC: Author.

Webb, N. L. (1999). Alignment of science and mathematics standards and assessments in four states: Research monograph no. 18. Madison, WI: National Institute for Science Education, University of Wisconsin-Madison.

Wilkins, A. J., Lewis, E., Smith, F., Rowland, E., & Tweedie, W. (2001). Coloured overlays and their benefit for reading. Journal of Research in Reading, 24, 41–64.

Ysseldyke, J., Thurlow, M., Bielinski, J., House, A., Moody, M., & Haigh, J. (2001). The relationship between instructional and assessment accommodations in an inclusive state accountability system. Journal of Learning Disabilities, 34, 212–220.


8 Test-Taking Skills and Their Impact on Accessibility for All Students

Ryan J. Kettler, Jeffery P. Braden, and Peter A. Beddow

Do you remember the first time in school that a peer or teacher advised you to use some type of strategy, which seemed to have nothing to do with the subject being tested, but would likely improve your score? This advice, which may have been to answer all of the easy questions first and then go back to the harder questions, to select false whenever the word always or the word never appeared in a true or false question, or to choose the longest answer of the options on a multiple-choice question, was ostensibly intended to help you attain a higher score on the test. Another outcome of this advice was that it increased your wisdom in the domain of taking tests, so that you would be better able to show what you knew and could do in content areas like reading and mathematics. This chapter explains the relationship between access in educational testing, test-wiseness, and validity; provides a brief history of frameworks and findings related to test-taking skills; explores the interaction between instruction of these skills and more recently embraced methods for increasing access to tests [e.g., accommodations, modifications, computer-based tests (CBTs)]; and discusses implications of this relationship for test development and instruction.

R.J. Kettler (✉), Department of Special Education, Peabody College of Vanderbilt University, Nashville, TN 37067, USA; e-mail: [email protected]

Access and Test-Wiseness

Access in the context of educational testing refers to the opportunity for a student to demonstrate what she or he knows on a test. It is the opportunity to determine how proficient the student is on the target construct (e.g., reading, mathematics, science) and can be conceptualized as an interaction between the individual and the test (Kettler, Elliott, & Beddow, 2009). While it is often easier to think about access in terms of the barriers that block it (e.g., confusing visuals on a reading test, high reading load on a mathematics test, information spread across multiple pages on a test of anything other than working memory), increased access allows tests to be more reliable and have greater construct validity. Reliability is the consistency of scores yielded by a measure, and construct validity is the degree to which the scores represent the characteristics or traits that the test is designed to measure. Both reliability and construct validity are higher when the proportion of variance within a set of test scores that is construct relevant is maximized. Construct-relevant variance is based on variance in the actual characteristic or trait being measured. The remainder of the variance in test scores is construct-irrelevant variance, which is variance that emanates from other sources, including the aforementioned barriers to access. Reducing barriers to access decreases construct-irrelevant variance, thereby increasing the reliability and construct validity of the scores.
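The logic of this variance argument can be written compactly. The decomposition below is a standard classical test theory way of expressing the point, added here for illustration rather than taken from the chapter; the sigma-squared terms denote variance components and rho denotes the resulting reliability-like ratio.

    \sigma^2_{\text{total}} = \sigma^2_{\text{relevant}} + \sigma^2_{\text{irrelevant}}, \qquad \rho \approx \frac{\sigma^2_{\text{relevant}}}{\sigma^2_{\text{total}}}

Under this view, if 64 of 100 variance units in a set of scores were construct relevant, the ratio would be .64; removing access barriers so that only 16 irrelevant units remain raises it to 64/80 = .80, with no change in what students actually know or can do.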


One way to reduce barriers to access is to develop tests or testing situations that minimize the need for access skills. Access skills are those which are necessary for a student to take a test but which are not a part of the construct being tested. If the format of a mathematics test includes lengthy word problems, the ability to read is an access skill for that test. Having the test read aloud to a student would remove reading ability as an access skill, would decrease the variance based on reading ability that would be reflected in scores from the test, and would increase the reliability and construct validity of the test scores.

Another way to reduce barriers to access is to train students in access skills. This training is intended to decrease the degree to which students vary in their knowledge of access skills, therefore reducing the contribution of that variance to the test scores. Test-wiseness is an access skill for tests designed to measure academic achievement in a variety of content areas, and teaching test-taking skills to increase this wiseness is intended to reduce the differences among students on this construct that are reflected in scores intended to indicate abilities in reading, mathematics, and so on.

Millman, Bishop, and Ebel (1965) define test-wiseness as a respondent's "capacity to utilize the characteristics and formats of the test and/or test-taking situation to receive a higher score" (p. 707). Test-taking skills (or strategies) are taught with the intent that they will improve test-wiseness, resulting in higher scores, but more importantly resulting in scores from which more valid inferences can be drawn. Without explicit instruction, students naturally vary in their degrees of test-wiseness. While some have a very high capacity to use the test and the testing situation to show what they know, others are confused by the characteristics (e.g., timed versus untimed, penalties for guessing versus no penalties) and format of the test (e.g., multiple choice versus constructed response, online versus paper and pencil), and cannot understand how to show what they know. Instruction in test-taking skills may be able to reduce this variance, or at least reduce the degree to which this variance affects the test scores, by increasing test-wiseness until most or all respondents have reached a theoretical threshold necessary to access the test. Variance in the access skill above this threshold would not contribute to variance in the test scores. This threshold hypothesis requires further examination.

Threshold Hypothesis

To clarify the aforementioned point about a threshold ability level for an access skill, we return to the example of reading as an access skill for a mathematics test. A mathematics test with word problems written at a sixth-grade reading level and with sixth-grade mathematics content would be difficult to access for a student who reads at a fourth-grade level. A very powerful intervention that allows the student to read at an eighth-grade level would in turn allow the student to better show what she knows on the mathematics test. However, a second intervention that improves the student's reading ability from an eighth-grade level to a twelfth-grade level would not further improve her score on the math test, because she would already be beyond the threshold ability level necessary to access the test. If the student did not know how to solve the mathematics problems, no increase in reading ability beyond the threshold would help. Table 8.1 depicts the relations between reading ability and the construct being measured on the test in this hypothetical example.

Table 8.1 Relationship between reading ability and construct indicated by math test scores in example

Reading ability      Construct indicated
Above threshold      Mathematics
Below threshold      Reading

Test-wiseness is theoretically analogous to reading ability from the example, in that training students beyond a certain threshold would reduce its influence over scores on an achievement test, and therefore increase the degree to which scores on the test reflect the intended construct (e.g., reading, math, science). However, there is insufficient research to support this threshold hypothesis, and so it is not clear whether test-taking skill training would be viewed as "leveling the playing field" for students with and without disabilities, or might actually increase gaps in test performance between them. Some research using students with learning disabilities (e.g., Dunn, 1981) found that test-wiseness training moderately improved scores on reading tests and that those improvements were retained over a 2-week period, but the design of the study was unable to establish whether training "leveled the playing field" relative to students without disabilities.
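The threshold idea in the reading-and-mathematics example can be made concrete with a toy model. The sketch below is purely illustrative and uses hypothetical numbers (an 80-point math proficiency, a sixth-grade reading threshold, and an invented capping rule); it is not an empirical model from the chapter, only a way to show why gains in an access skill stop mattering once the threshold is reached.

    def accessible_score(math_proficiency, reading_grade_level, reading_threshold=6.0):
        """Toy model: above the reading threshold, the observed score reflects
        mathematics; below it, limited access depresses the observed score."""
        if reading_grade_level >= reading_threshold:
            return math_proficiency
        # Below the threshold, reading ability caps how much of the math
        # proficiency the student can demonstrate (illustrative assumption).
        return math_proficiency * (reading_grade_level / reading_threshold)

    for reading_level in (4.0, 6.0, 8.0, 12.0):
        print(reading_level, round(accessible_score(80, reading_level), 1))
    # Prints 53.3 for the fourth-grade reader and 80 for grades 6, 8, and 12:
    # raising reading ability beyond the threshold no longer changes the score.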

Whether test-taking skills influence test performance uniformly across all levels of ability, or if there is a threshold after which skills no longer improve performance, affects how one views teaching test-taking skills to students with disabilities as a method of providing access. For example, eyeglasses are widely viewed as a fair accommodation for individuals with visual acuity difficulties, because the eyeglasses provide a threshold of access equal to that enjoyed by students without visual acuity difficulties. This is because additional correction beyond the threshold needed to decode text does not provide an advantage to users; put another way, students without visual acuity problems would be welcome to use eyeglasses – but would not benefit from their use. In contrast, extra time is a controversial accommodation because there is no clear threshold after which additional time does not enhance performance. Consequently, students with and without disabilities alike benefit from extra time on a test. In fact, students without disabilities may benefit more from extra time than their peers with disabilities (Elliott & Marquart, 2004).

The question of whether test-taking skills function as a threshold or a continuous influence on performance is relevant. If test-taking skills have a threshold beyond which additional strategy application or training does not improve test performance, training students to that level of test-taking strategy proficiency would be viewed as both effective and fair. It would be effective, in that it would eliminate the influence of test-taking skills on scores for students with (and without) disabilities, thus improving the validity of the assessment for both groups. It would be fair, in that students who already have test-taking skills would not benefit from additional training once the threshold is reached, whereas those who lacked the skills would gain equal access through training.

Conversely, if test-taking skills have a continuous effect across the performance domain, the effectiveness and fairness of teaching test-taking strategies is called into question. First, teaching test-taking skills may not be effective, as students with more or better strategies may benefit more from training than students with fewer or poorer strategies (i.e., training could widen, rather than decrease, test performance differences between more and less test-wise students). Second, teaching test-taking strategies could be unethical, because there would be no clear point at which the playing field was level for students with and without disabilities (e.g., would one withhold test-taking skill instruction from some students but not others, and if so, how would one decide at which point students should or should not receive the training?).

A third possibility regarding test-taking skills training is that it might improve test scores for students with disabilities, but nonetheless produce larger lost opportunity costs for students without disabilities, compared to those with disabilities. In other words, the time taken to teach students with disabilities test-taking skills may improve their scores, but it may not improve them as much as devoting the same amount of time to instruction in test content. Some evidence (Ehrmann, 2001) suggests that this is the case for students without disabilities – that is, training using an established test-taking program lowered scores on some subject matter tests, but it appeared to improve scores on other subject matter tests. Therefore, interactions between subject matter knowledge, test design, and test-taking skills should be considered carefully in deciding whether to provide test-taking skills training to any students – especially those with disabilities.

The literature on test-taking skills suggests the answer to the threshold question is dependent on three factors: the design of the test, the specific test-taking skills, and the test-taker's knowledge of tested material. In general, the better designed the test, the less that performance is influenced by test-taking skills. For example, scores on multiple-choice tests that use only plausible distractors (incorrect response options), simple item stems, and positive wording are generally resistant to test-taking skills; performance on tests that do not incorporate these strong design features is more susceptible to test-taking skills (Rogers & Harley, 1999; Scruggs & Mastroprieri, 1988). Although there is little direct research on the subject, it is logical to infer that large-scale, professionally developed tests are more likely to incorporate strong design features, meaning the influence of test-taking skills would be limited to approaches such as time management or scanning the test for easy items before tackling harder items, and therefore would be more likely to have a threshold beyond which additional improvements in strategies would be unlikely to improve test performance.

The threshold hypothesis has received limited research but remains an important issue in the instruction of test-taking skills. Next we review other important historical frameworks and findings in the area of test-taking skills and test-wiseness.

Frameworks and Findings

Millman et al. (1965) developed the most commonly cited theoretical framework for empirical studies of test-wiseness. The authors characterized their framework as narrow in that it did not include factors related to mental or motivational state, and it was restricted to objective tests of achievement and aptitude. There were no published empirical studies of test-wiseness prior to the development of this framework. Millman et al. (1965) based their framework on principles of test construction, advice for taking tests, and the theory that test-wiseness was one source of variability in test scores that cannot be attributed to item content or random error.

Millman et al. (1965) divided their framework of test-wiseness principles into two main categories. The first category, elements which are independent of the test constructor or test purpose, included four subdivisions: (a) time-using strategy, (b) error-avoidance strategy, (c) guessing strategy, and (d) deductive reasoning strategy. The time-using strategies are specific to timed tests and are intended to avoid losing points because of poor use of time, rather than lack of knowledge of the tested content. The error-avoidance strategies are also focused on avoiding the loss of points for reasons other than lack of knowledge, with the source in this case being carelessness. The guessing strategies are intended to help respondents gain points for responses made completely randomly, and the deductive reasoning strategies are focused on helping gain points when the respondent has part of the necessary knowledge, but does not know the correct answer. Table 8.2 lists Millman et al.'s (1965) elements which are independent of the test constructor or test purpose.

Millman et al.'s (1965) second category, elements dependent upon the test constructor or purpose, includes strategies that require some knowledge of the specific test, constructor, or purpose. The two subdivisions of this category are (a) intent consideration strategy and (b) cue-using strategy. While the intent consideration strategies allow the respondent to avoid being penalized for anything other than a lack of content knowledge, the cue-using strategies are focused on the respondent gaining points when a specific answer is not known. Table 8.3 lists Millman et al.'s (1965) elements dependent upon the test constructor or test purpose.

Table 8.2 Millman et al.'s (1965) elements independent of test constructor or test purpose

A. Time-using strategy
   1. Begin to work as rapidly as possible with reasonable assurance of accuracy
   2. Set up a schedule for progress through the test
   3. Omit or guess at items (see I.C. and II.B.) which resist a quick response
   4. Mark omitted items, or items which could use further consideration, to assure easy relocation
   5. Use time remaining after completion of the test to reconsider answers

B. Error-avoidance strategy
   1. Pay careful attention to directions, determining clearly the nature of the task and the intended basis for response
   2. Pay careful attention to the items, determining clearly the nature of the question
   3. Ask examiner for clarification when necessary, if it is permitted
   4. Check all answers

C. Guessing strategy
   1. Always guess if right answers only are scored
   2. Always guess if the correction for guessing is less severe than a "correction for guessing" formula that gives an expected score of zero for random responding
   3. Always guess even if the usual correction or a more severe penalty for guessing is employed, whenever elimination of options provides sufficient chance of profiting

D. Deductive reasoning strategy
   1. Eliminate options which are known to be incorrect and choose from among the remaining options
   2. Choose neither or both of two options which imply the correctness of each other
   3. Choose neither or one (but not both) of two statements, one of which, if correct, would imply the incorrectness of the other
   4. Restrict choice to those options which encompass all of two or more given statements known to be correct
   5. Utilize relevant content information in other test items and options

Note: Adapted from Millman et al. (1965, pp. 711–712). Reprinted by permission of SAGE Publications.

Table 8.3 Millman et al.'s (1965) elements dependent on the test constructor or purpose

A. Intent consideration strategy
   1. Interpret and answer questions in view of previous idiosyncratic emphases of the test constructor or in view of the test purpose
   2. Answer items as the test constructor intended
   3. Adopt the level of sophistication that is expected
   4. Consider the relevance of specific detail

B. Cue-using strategy
   1. Recognize and make use of any consistent idiosyncrasies of the test constructor which distinguish the correct answer from incorrect options
      a. She or he makes it longer (shorter) than the incorrect options
      b. She or he qualifies it more carefully, or makes it represent a higher degree of generalization
      c. She or he includes more false (true) statements
      d. She or he places it in certain physical positions among the options (such as in the middle)
      e. She or he places it in a certain logical position among an ordered set of options (such as the middle of the sequence)
      f. She or he includes (does not include) it among similar statements, or makes (does not make) it one of a pair of diametrically opposite statements
      g. She or he composes (does not compose) it of familiar or stereotyped phraseology
      h. She or he does not make it grammatically inconsistent with the stem
   2. Consider the relevancy of specific detail when answering a given item
   3. Recognize and make use of specific determiners
   4. Recognize and make use of resemblances between the options and an aspect of the stem

Note: Adapted from Millman et al. (1965, p. 712). Reprinted by permission of SAGE Publications.

The effect of coaching on test performance has received more attention than has test-wiseness. Bangert-Drowns, Kulik, and Kulik (1983) published a meta-analysis of 30 controlled studies of the effects of coaching programs on achievement test results. Coaching programs were defined as those in which respondents "are told how to answer test questions and are given hints on how to improve their test performance" (p. 573). The authors found an average effect size of 0.25 in favor of respondents who were coached versus respondents from a control group. The average effect size for short programs (M = 3.8 h) was 0.17, compared to an average effect size of 0.43 for longer programs (M = 15.1 h). Bangert-Drowns et al. (1983) also found that a logarithmic relationship between length of coaching program and effect size was a better fit than the linear relationship. This finding indicated that although longer programs were more effective, the returns would be diminished when adding time to programs that were already fairly long.
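To see what a logarithmic (diminishing-returns) relationship implies in practice, the short sketch below solves a two-parameter log curve through the two average values reported above (0.17 at 3.8 hours, 0.43 at 15.1 hours). The resulting coefficients are for illustration only; they are not the coefficients Bangert-Drowns et al. (1983) estimated, and the curve should not be read as a prediction for any particular program.

    import math

    # Anchor an effect-size curve, ES = a + b * ln(hours), to the two reported means.
    h1, es1 = 3.8, 0.17
    h2, es2 = 15.1, 0.43
    b = (es2 - es1) / (math.log(h2) - math.log(h1))
    a = es1 - b * math.log(h1)

    for hours in (2, 4, 8, 16):
        print(f"{hours:>2} h -> ES approx {a + b * math.log(hours):.2f}")
    # Each doubling of program length adds the same increment (about 0.13 here),
    # so the gain per additional hour shrinks as programs get longer.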

Teaching of test-wiseness strategies was viewed as a subset of coaching in Bangert-Drowns et al. (1983). No significant differences were found between the average effects of the 21 studies that included a focus on test-wiseness and the 9 studies that did not. Elements that were identified as more important included drill and practice on items and direct teaching of content.

The clear impact of coaching, which often includes the teaching of test-taking skills, makes relevant any concerns about the ethical implications of spending classroom time attempting to increase test-wiseness. One such concern is that when the stakes related to a test are high, teachers are likely to focus on teaching to the test, rather than teaching the broader content area (Mehrens, 1989). Mehrens and Kaminski (1989) identified a seven-point continuum between the ideal of solely teaching content and the alternative of focusing instruction entirely on improving test performance. Figure 8.1 depicts this test preparation continuum.

Fig. 8.1 A depiction of Mehrens and Kaminski's (1989) continuum of test preparation

Point 1 on Mehrens and Kaminski's (1989) continuum, which involves giving general instruction on district objectives without referring to the test, is always considered ethical. Teaching of test-taking skills is also typically considered ethical, and Mehrens (1989) indicated that doing so is worthwhile in that it takes little instructional time and can keep students who know the content from making errors. The line between ethical and unethical practice is typically considered to be somewhere among Points 3 (providing instruction based on objectives a variety of tests measure), 4 (providing instruction specifically on the tested objectives), and 5 (providing instruction on tested objectives and in tested format). Points 6 and 7 refer to practice on parallel forms of a test or on the test itself and are never considered to be ethical practice. The danger in teaching to the test is that it limits legitimate inferences from those that are based on a broad domain to those that are based on performance on a specific test, and it is rare that a test is given for the sake of the latter form of inferences. Mehrens (1989) wrote that for the level of inference typically desired, the line between ethical and unethical test preparation is between Point 3 and Point 4.

In a comprehensive review of the issues surrounding test preparation, Crocker (2005) identified an increasingly mobile society and generalizability to real-world skills as reasons that test-taking skills have become increasingly important. With students moving across state and national borders to attend college or find employment, a common metric such as achievement tests from which valid inferences can be made is critical. Also, these tests have incorporated short essays and performance tasks, and the access skills for these types of items can be helpful in other situations. Crocker (2005) indicates that test preparation "encompasses not only study of content from the domain of knowledge sampled by the assessment, but also practicing the skills that will allow students to demonstrate their knowledge on various types of assessment exercises" (p. 161). The author recommends teaching for assessment, rather than teaching to the test, and identified four essential elements of this practice: (a) challenging core curriculum, (b) comprehensive instruction in the curriculum, (c) developing test-taking skills, and (d) adherence to ethical guidelines. The challenging core curriculum includes all of the content that is intended to be learned, rather than the subset that is likely to be sampled on the assessment. Comprehensive instructional practices involve spending most of the time teaching the content and trusting that student achievement will be reflected by the assessment. Developing test-taking skills includes many of the strategies identified by Millman et al. (1965), incorporated into year-long classroom instruction.

Crocker (2005) outlined four broad criteria for meeting the fourth essential element of teaching for assessment: adherence to ethical guidelines. The first of these criteria is academic ethics, the idea that test preparation should be consistent with the ethics of education, including that cheating and stealing the work of others is inappropriate. The second criterion is validity, indicating that students should be able to show what they know and are able to do through the test. The third of these criteria is transferability, the idea that test preparation should be in those domains that are applicable to a broad range of tests. The fourth criterion is educational value, meaning that test preparation that leads to increases in scores should also lead to increases in student mastery of the content. A test preparation practice that meets these four criteria is likely to be an appropriate method of improving test-wiseness and assessment.

Following the suggestions of Crocker (2005), as well as the strategies, frameworks, and findings that influenced this work, provides students with greater access to show their knowledge in a testing situation. Next we examine how test-wiseness and test-taking skills interact with other methods of improving access to appropriate assessment.

Test-Taking Skills and Other Methods of Increasing Access

Since the passage of the Individuals with Disabilities Education Act (IDEA) in 1997, mandating that all students be included to the greatest extent possible within large-scale accountability systems, assessment accommodations have been the most common method for helping students with disabilities overcome access barriers inherent in achievement tests. Assessment accommodations are changes in the standard assessment process made because an individual's disability requires changes for the test to be a measure from which valid inferences can be drawn (Elliott, Kratochwill, & Schulte, 1999; Kettler & Elliott, 2010). More recently, the final regulations of the No Child Left Behind (NCLB) Act of 2001 have provided for changes to the content of a test, made for a group of persistently low-performing students with disabilities, rather than on an individual basis (U.S. Department of Education, 2007a, 2007b). Modifications are the result of "a process by which a test developer starts with a pool of existing test items with known psychometric properties, and makes changes to the items, creating a new test with enhanced accessibility for the target population" (Kettler et al., in press). Both assessment accommodations and modifications can interact with test-taking skills in ways that should be considered by test developers.

Test-Taking Skills and Accommodations

Although there exists no research addressing interactions between test-wiseness and assessment accommodations, it would be wise to anticipate whether some common accommodations might have the unintended consequence of inhibiting appropriate use of test-taking skills. For example, the accommodation of extra time is intended to give respondents who work slowly the time they need to access test content. However, additional time might inadvertently encourage use of dysfunctional strategies (e.g., excessive correction or second-guessing of answers), or inhibit use of functional strategies (e.g., students might not manage their time well because they lack a sense of urgency or they fail to deploy strategies because they become physically fatigued). Likewise, changes in test item presentation (e.g., large print, reading test items aloud to students, magnification) may encourage linear progression through the test and may discourage adaptive strategies, such as scanning the test for easy items, skipping hard problems, or returning to parts of the test that were skipped. Similarly, having a mathematics or science test read aloud to a student might discourage spending more time on more difficult sections of a passage, underlining the most important parts, or returning to the passage to find an answer. In these ways, assessment accommodations may interact with test-taking skills; therefore, test-wiseness training should be deployed, practiced, and evaluated well in advance of the actual assessment session so that unintended consequences can be recognized and corrected.

Test-Taking Skills and Modifications

Many common modifications are designed to reduce the impact of variance in test-taking skills on variance reflected in scores from achievement tests. For example, Millman et al. (1965) suggested eliminating any options which are known to be incorrect, prior to answering a multiple-choice question. One common modification technique is elimination of the least-plausible distractor, so that the total number of answer choices typically changes from four to three. Because the distractor that is removed via this modification is usually one that is eliminated by most wise test-takers, the field is leveled by automatically removing that option for all students. Another skill suggested by Millman and colleagues is to look for the correct option to be longer or shorter than the incorrect options. A common modification strategy of making all distractors similar in length nullifies the advantage created by this test-taking skill. Ultimately, scores from a test that is perfectly accessible will not contain any variance based on differences in test-wiseness. Tests in the future may be closer to this ideal and are also more likely to be delivered via computer; implications of this latter shift are discussed next.
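The least-plausible-distractor modification described above can be sketched as a simple item-revision step. Everything in the example below is hypothetical, including the item, the field-test choice rates, and the helper name drop_least_plausible_distractor; it simply shows the mechanical idea of removing, for every test-taker, the option a test-wise student would have eliminated anyway.

    # Hypothetical four-option item with made-up field-test choice rates.
    item = {
        "stem": "Which of the following numbers is prime?",
        "options": {"A": "9", "B": "11", "C": "15", "D": "1,000,000"},
        "key": "B",
        "choice_rates": {"A": 0.20, "B": 0.55, "C": 0.20, "D": 0.05},
    }

    def drop_least_plausible_distractor(item):
        """Return a copy of the item without the least-chosen incorrect option."""
        distractors = {k: r for k, r in item["choice_rates"].items() if k != item["key"]}
        weakest = min(distractors, key=distractors.get)  # here "D", chosen by 5%
        modified = dict(item)
        modified["options"] = {k: v for k, v in item["options"].items() if k != weakest}
        return modified

    print(drop_least_plausible_distractor(item)["options"])
    # {'A': '9', 'B': '11', 'C': '15'} -- a three-option item for all students.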

Computer-Based Test-Wiseness

Since the inception of digital scoring in the 1930s, CBTs have become increasingly ubiquitous in the arena of student assessment. In 2009, Tucker reported over half of U.S. states used computers to deliver some portion of their federally mandated tests. Despite this upward trend, the majority of current CBTs remain mere transpositions of paper-and-pencil tests (PPTs). Most CBTs consist of item sets that look and function the same as those used in their PPT forebears, with the overlay of an interface that permits test-takers to use a mouse or keyboard to select responses. In these cases, the advantages of the computer over the traditional test booklet primarily benefit the test administrators. This is not to say that computers cannot or will not be used to improve testing in other ways, such as to reduce test length and increase score accuracy; indeed, as test developers utilize innovations in computer technology and psychometrics, the accuracy and efficiency of delivery and scoring likely will increase. Presently, however, in the arena of large-scale and/or commercial testing programs, little has changed in terms of item and test presentation, and most of the traditions of PPT have been carried forward in their CBT descendants. Thus, on the surface it appears little difference exists between the influence of test-taking skill use on PPTs compared to CBTs.

To the contrary, while the threat to inferential validity from the kinds of test-taking skills discussed elsewhere in this chapter may remain unchanged with the migration to CBTs, other aspects of CBTs may in fact transmute the problem to another theater. In Chapter 15, Russell discusses the many potential sources of error introduced by computer tests due to access issues that may not exist in other types of tests, issues that likely are particularly problematic for students with special needs. Simultaneously, he notes that certain innovations in CBTs offer opportunities for addressing access concerns that have long been problematic for users of PPTs. He describes recent CBT interfaces that successfully reduce the demand for scribes, readers, and other human testing accommodation delivery agents, interfaces that are preferred by many test-takers over their traditional alternatives. Specifically, he refers to test alterations made permissible by computers. These alterations include (a) altered presentations (e.g., changing font size, reducing the amount of content presented on a page), (b) altered interactions (e.g., pacing, content masking, scaffolding), (c) altered response modes (speech-to-text interfaces, touch screens), and (d) altered representations (tactile representations of visuals, translations into different languages).

These access tools are in no way included with the intention of permitting test-takers to increase their scores via strategy use. However, to the degree test access tools (e.g., those recommended by Russell) find their way into CBTs, the issue of test-wiseness must be translated to a new playing field. The tools that Russell recommends, particularly those that may be integrated into test interfaces and made universally available to the entirety of the test-taker population, potentially offer a new set of "opportunities" for test-wiseness to cause construct-irrelevant variation in test scores. Indeed, test-takers who learn to use these tools appropriately and efficiently may have an advantage over those who fail to learn them or those who use them improperly or inefficiently.

Commensurate with the putative accessibility problems introduced by the migration of assessment to CBTs, the potential exists for measurement error to increase as a result of efforts to promote solutions to these problems. Indeed, collinearity between access tool use and test-taker demonstration of construct-relevant skills may obscure the meanings of test scores and threaten the validity of score inferences. It is critical that efforts to eliminate access barriers for one group of test-takers do not inadvertently create advantages for others.

Suggestions for Developers of Computer-Based Tests

Developers of CBTs should ensure, to the extent possible, that test-delivery systems – including integrated access tools – are optimally accessible for all test-takers. To this end, developers should consider applying the following recommendations from Shneiderman's (1997) work on effective interface design: (a) strive for consistency, (b) cater to universal usability, (c) offer informative feedback, (d) design dialogs to yield closure, (e) prevent errors, (f) permit easy reversal of actions, (g) support internal locus of control, and (h) reduce short-term memory load. The integration of these principles into the design of test interfaces, with particular attention to features designed to reduce specific access barriers, likely will result in accessible test events for more students (see Chapter 9, this volume).

Even after integrating these principles into the test-delivery system, however, there likely will remain some test-takers for whom a user interface that integrates best practices of usability and user-friendliness will create cognitive overload based on the unfamiliarity of the interface alone. Test developers, therefore, are advised to build interface training modules into each assessment system to reduce the potential for threats to validity based on variation across the population in interface expertise. These modules, which can either be delivered prior to the test event as part of the general curriculum or included in the test event itself, should serve two purposes. Specifically, they should (a) equip all test-takers with the expertise to use the interface at the highest level possible and (b) assess the expertise of each test-taker in the use of the interface.

There are potential problems that arise with the need to incorporate a learning activity into a test. Among these is the fact that many of the test-takers for whom such an activity is necessary tend to perform poorly on tests (and often are accustomed to school failure). For these students, test anxiety is an important concern. The addition of an instructional task to a test event that already provokes negative emotions, in a potentially high-pressure environment, is a decision that should be approached delicately and with ample attention to the emotional and academic needs of this target student population.

A Lesson from Video Games

In the arena of academic learning, some educators have argued video games are the harbingers of effective computer-based learning tools. The notion that video game technologies could offer solutions to learning challenges is plausible for several reasons. First, due to the ubiquity of video games (e.g., computer games, Sony PlayStation, and Nintendo), there is a high probability most students will have some measure of experience with them. Second, the increasing complexity of gameplay mechanics in recent years has compelled developers to integrate mechanics that iteratively train the player to use the features of the game.

Much research has been conducted on the development and use of computer games for instruction (e.g., Gee, 2005; Becker, 2006; Blumberg, 2008). The similarities between instruction and assessment, however, suggest many of the strategies utilized in effective learning games can and should be applied to CBTs. One example is chain gameplay, the simple, repeatable process of subtly increasing difficulty that is found in many video games (Sivak, 2009). Sivak explained chain gameplay as follows:

A gameplay chain is any set of interlocking mechanics that must be done together in order to achieve a goal . . . [an idea] somewhat similar to Ian Bogost's theory of Unit Operations . . . Each basic mechanic is used as a building block and seamlessly connects to the next mechanic creating a complex gameplay structure that is far more interesting than the sum of the individual links . . . [helping] to initiate a state of flow for the player. Flow . . . is the feeling of total immersion in an activity (p. 284).

In the context of CBTs that utilize integrated access tool interfaces, chain mechanics may be used as part of training modules to ensure test-takers, like gameplayers, are equipped with the necessary expertise to utilize the tools with a modicum of cognition.

In Curry's (2009) chapter on the design features that comprised the classic 1985 Nintendo video game Super Mario Bros. and that have contributed to its enduring popularity among video game critics, the author detailed a number of interface aspects that may prove useful for the development of assessment training modules. Specifically, Curry described four key ingredients that led to Mario's success as a game. First, the game was instantly accessible. Curry noted that "when someone sits down with a new game, be it a board game, a word game in the newspaper, or videogame, his first question is always, 'ok, what am I trying to do here?'" (p. 14). He wrote that Super Mario Bros. immediately made the user aware of the objective of the game simply by eliminating the ability of the user to interact with any elements that did not contribute to the objective (which, in Mario's case, was to run to the right). In the context of assessment training, the user should be presented with a single button, or tool, with limited choices, such that the user invariably accomplishes the first goal and experiences firsthand the primary objective of the test (e.g., to finish the test). Second, the game was easy to control. It had clear rules that, while limiting the user's freedom, offered enough variety so as to avoid monotony and gave the user a sense of empowerment to make his or her way to the objective. Third, the game was challenging at an appropriate threshold. It utilized a mechanic similar to Sivak's gameplay chain insofar as the interface equipped the user with initial skills, followed by more emergent abilities to master. Fourth, the game was overflowing with rewards. Each advancement to a new level or the equipment with a new tool or skill was accompanied by visceral audio and video that served as rewards.

While these design features may seem a far cry from current CBTs, they should not be disregarded as trivial priorities. Indeed, as the application of universal design principles to assessment systems results in the inclusion of access tools to accommodate a wide variety of student needs, there is increased likelihood that computer interfaces will become more complex. It is critical that computer interfaces with vast libraries of "helpful tools" do not result in disparities between students who have expertise in the use of these interfaces and those who do not. If some students are able to increase their scores relative to other students on the basis of their strategic use of access tools that are inaccessible to other students, their resulting score inferences will reflect an indistinguishable combination of the level of the target construct they possess and their expertise with the test system.

Practical Implications

The theory and findings on test-wiseness presented in this chapter have numerous implications for developers and users of tests. The following are five lessons about test-taking skills and accessibility:

Design Assessments to Minimize Test-Taking Skills

Educators who employ sound principles of assessment design tests that minimize the influence of test-taking skills. In other words, rather than try to make test-takers less vulnerable to poor test design, we recommend improving assessments to reduce the construct-irrelevant variance introduced by differences in test-taking skills among respondents (see Rogers & Harley, 1999; Scruggs & Mastropieri, 1988, on how to design tests to reduce the influence of test-wiseness on test scores). This recommendation is extended to producers and users of CBTs, who need to train test-takers to the point that they are competent with the test's interface, so that differences in this familiarity are not reflected in the test scores. As CBTs become more sophisticated and commonplace in large-scale assessment, lessons can also be taken from successful video game design.

Evaluate the Influence of Test-Taking Strategies on a Case-by-Case Basis

Although teachers are good judges of relative student performance (i.e., knowing which students will do better or worse on tests), they are less effective when predicting test performance for students with disabilities than for peers without disabilities, and they are less effective when predicting student performance on standardized versus teacher-made tests (Hurwitz, Elliott, & Braden, 2007). Given that the effectiveness of test-wiseness training interacts with students' knowledge, test-taking skills, and test design, we believe it will be extremely difficult to anticipate the influence of test-wiseness training on the scores of students with and without disabilities. Therefore, we advocate single-case designs to determine whether a specific training program will work with a given student for a particular type of test. We further recommend that teachers vary at least two instructional interventions – one aimed at test-wiseness and the other aimed at the content knowledge being tested – to evaluate the relative "lost opportunity costs" of devoting instruction to test-wiseness versus instruction of academic content. If there is a consistent difference favoring the test-wiseness training over subject matter instruction, then test-wiseness training should be recommended; otherwise, instruction in academic content would offer fewer ethical issues and produce at least equal effects.
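As one illustration of the kind of comparison recommended above, the following Python sketch (ours, not part of the original chapter) summarizes probe scores collected under two alternating conditions for a single student: sessions devoted to test-wiseness training versus sessions devoted to content instruction. The probe values and the simple mean-difference summary are hypothetical placeholders; an actual evaluation would rely on the visual-analysis and effect-size conventions of single-case research rather than a single numeric difference.

# Hypothetical alternating-treatments comparison for one student.
# Each tuple is (condition, probe_score) in session order.
sessions = [
    ("test_wiseness", 62), ("content", 60),
    ("content", 65), ("test_wiseness", 63),
    ("test_wiseness", 64), ("content", 70),
    ("content", 72), ("test_wiseness", 66),
]

def condition_mean(data, condition):
    scores = [score for cond, score in data if cond == condition]
    return sum(scores) / len(scores)

tw_mean = condition_mean(sessions, "test_wiseness")
content_mean = condition_mean(sessions, "content")

print(f"Test-wiseness sessions: mean probe = {tw_mean:.1f}")
print(f"Content sessions:       mean probe = {content_mean:.1f}")
print(f"Difference (content - test-wiseness): {content_mean - tw_mean:.1f}")
# A consistent advantage for one condition across repeated probes,
# not a single difference, is what would justify favoring it.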

Spend a Lot of Time Teaching Content and a Little Time Teaching Test-Taking Skills

Teaching test-taking skills should not be controversial if the training only takes a little class time, because it can help students avoid making mistakes that might affect their scores but not accurately reflect their achievement. Research indicates that direct instruction of content, along with drilling and practice testing, results in greater effect sizes compared to instruction in test-taking skills (Bangert-Drowns et al., 1983). These findings are reflected in the countless test preparation sites on the World Wide Web (e.g., www.usatestprep.com, www.studyzone.org, www.studyisland.com), which focus on practice tests and methods of drilling content, with only minimal reference to test-taking skills. While spending 15–20 min once per year teaching test-taking skills is not likely to cause concern, including isolated instruction on test-taking skills repeatedly in one's lesson plan at the expense of grade-level content is not an appropriate strategy, and may widen the achievement gap between students without disabilities and students with disabilities.

Consider Interaction with Other Methods of Increasing Access to Tests

The applicability of test-taking skills to a test will vary greatly based on how accessible its items are. Modifications to a test may enhance it so that the test-taking skills that one would emphasize are no longer relevant. Also, the assessment accommodations that a student uses should be considered along with test-taking skills, because some of the skills may become counterproductive when combined with accommodations.

Teach for the Assessment

We recommend following Crocker's (2005) suggestion of teaching for the assessment, which includes an emphasis on instruction linked to content standards beyond the subset that is assessed, with faith in the assessment to accurately measure what students know and can do. This policy includes some instruction in test-taking skills, but ultimately prepares students for success in a number of different settings, rather than just the specific testing situation.

Conclusions

Every test has a format, including a delivery system, directions, method of responding, and other features, that if working optimally should provide respondents access to show what they know or can do related to the construct of interest. Test developers should make any and all modifications to a test that will keep construct-irrelevant variance – connected to comfort with the test's format – from being included in the variance of the final test score. Beyond that commitment, training students in test-taking skills that will bring all respondents above a threshold necessary to remove the construct-irrelevant variance is also justifiable, so long as taking the time to do so does not remove students' opportunity to learn on the construct of interest. The positive impact of teaching test-taking skills appears to be small on a group basis, but it can be meaningful for individuals who might make mistakes on tests that are not related to their achievement on the intended construct. It is for this reason that limited training in test-taking skills can ethically be included within a larger plan of teaching for an assessment and increasing access for all students.

References

Bangert-Drowns, R. L., Kulik, J. A., & Kulik, C.-L. C. (1983). Effect of coaching programs on achievement test performance. Review of Educational Research, 53, 571–585.

Becker, K. (2006). Video game pedagogy: Good games = good pedagogy. Journal of Design Research, 5, 1–47.

Blumberg, F. C., Rosenthal, S. F., & Randall, J. D. (2008). Impasse-driven learning in the context of video games. Computers in Human Behavior, 24, 1530–1541.

Crocker, L. (2005). Teaching for the test: How and why test preparation is appropriate. In R. P. Phelps (Ed.), Defending standardized testing (pp. 159–174). Mahwah, NJ: Lawrence Erlbaum Associates.

Curry, P. (2009). Everything I know about game design I learned from Super Mario Bros. In D. Davidson (Ed.), Well played 1.0: Video games, value and meaning (pp. 13–36). Halifax: ETC Press. Retrieved May 8, 2010, from http://etc.cmu.edu/etcpress/

Dunn, A. E. (1981). An investigation of the effects of teaching test-taking skills to secondary learning disabled students in the Montgomery County (Maryland) public schools learning centers. Unpublished doctoral dissertation, George Washington University, Washington, DC.

Ehrmann, C. N. (2001). The effect of test preparation on students' TerraNova scores. Unpublished master's thesis, University of Wisconsin – Madison, Madison, WI.

Elliott, S. N., Kratochwill, T. K., & Schulte, A. G. (1999). The assessment accommodations checklist: Facilitating decisions and documentation in the assessment of students with disabilities. Teaching Exceptional Children, Nov/Dec, 10–14.

Elliott, S. N., & Marquart, A. (2004). Extended time as a testing accommodation: Its effects and perceived consequences. Exceptional Children, 70(3), 349–367.

Gee, J. P. (2005). Learning by design: Good video games as learning machines. E-Learning, 2, 5–16.

Hurwitz, J. T., Elliott, S. N., & Braden, J. P. (2007). The influence of test familiarity and student disability status upon teachers' judgments of students' test performance. School Psychology Quarterly, 22(2), 115–144.

Individuals with Disabilities Education Act Amendments, 20 U.S.C. §1400 et seq. (1997).

Kettler, R. J., & Elliott, S. N. (2010). Assessment accommodations for children with special needs. In B. McGaw, E. Baker, & P. Peterson (Eds.), International encyclopedia of education (3rd ed., pp. 530–536). Oxford, UK: Elsevier.

Kettler, R. J., Elliott, S. N., & Beddow, P. A. (2009). Modifying achievement test items: A theory-guided and data-based approach for better measurement of what students with disabilities know. Peabody Journal of Education, 84(4), 529–551. doi:10.1080/01619560903240996

Kettler, R. J., Rodriguez, M. R., Bolt, D. M., Elliott, S. N., Beddow, P. A., & Kurz, A. (in press). Modified multiple-choice items for alternate assessments: Reliability, difficulty, and differential boost. Applied Measurement in Education.

Mehrens, W. A. (1989). Preparing students to take standardized achievement tests. Practical Assessment, Research & Evaluation, 1(11). Retrieved May 28, 2010, from http://PAREonline.net/getvn.asp?v=1&n=11

Mehrens, W. A., & Kaminski, J. (1989). Methods for improving standardized test scores: Fruitful, fruitless or fraudulent? Educational Measurement: Issues and Practice, 8(1), 14–22.

Millman, J., Bishop, H., & Ebel, R. (1965). An analysis of test-wiseness. Educational and Psychological Measurement, 25(3), 707–726.

Nintendo of America, Inc. (1985). Super Mario Bros. (Video game).

No Child Left Behind Act, 20 U.S.C. §6301 et seq. (2001).

Rogers, W., & Harley, D. (1999). An empirical comparison of three- and four-choice items and tests: Susceptibility to testwiseness and internal consistency reliability. Educational and Psychological Measurement, 59(2), 234–247.

Scruggs, T., & Mastropieri, M. (1988). Are learning disabled students 'test-wise'? A review of recent research. Learning Disabilities Focus, 3(2), 87–97.

Shneiderman, B. (1997). Designing the user interface: Strategies for effective human-computer interaction. Boston: Addison-Wesley Longman.

Sivak, S. (2009). Each link is a chain in a journey: An analysis of The Legend of Zelda: Ocarina of Time. In D. Davidson (Ed.), Well played 1.0: Video games, value and meaning (pp. 283–314). ETC Press. Retrieved May 8, 2010, from http://etc.cmu.edu/etcpress/

U.S. Department of Education. (2007a, April). Modified academic achievement standards: Non-regulatory guidance. Washington, DC: Author.

U.S. Department of Education. (2007b, revised July). Standards and assessments peer review guidance. Washington, DC: Author.


Part III

Test Design and Innovative Practices


9 Accessibility Theory: Guiding the Science and Practice of Test Item Design with the Test-Taker in Mind

Peter A. Beddow, Alexander Kurz, and Jennifer R. Frey

Test accessibility is defined as the extent to which a test and its constituent item set permit the test-taker to demonstrate his or her knowledge of the target construct (Beddow, Elliott, & Kettler, 2009). The principles of accessibility theory (Beddow, in press) suggest the measurement of achievement involves a multiplicity of interactions between test-taker characteristics and features of the test itself. Beddow argued achievement test results are valid to the degree the test event controls these interactions and yields scores from which inferences reflect the amount of the target construct possessed by the test-taker. Test score inferences typically are based on the assumption that the test event was optimally accessible; therefore, the validity of an achievement test result depends both on the precision of the test score and the accuracy of subsequent inferences about the test-taker's knowledge of the tested content after accounting for the influence of any access barriers. In essence, the accessibility of a test event is proportional to the validity of test results.

This chapter contains three primary sections. In the first section, we describe the current theoretical state of assessment accessibility. Specifically, we (a) describe the recent focus on universal design (UD; Mace, 1991; Rose & Meyer, 2002) in education and briefly survey the broad effort to apply universal design principles to educational assessment and (b) introduce accessibility theory (Beddow, in press) and describe how it advances the current research and theory, focusing on the influence of Chandler and Sweller's (1991) cognitive load theory (CLT) and its relevance to the development of accessible tests. In the second section of this chapter, we describe a comprehensive decision-making instrument for evaluating and quantifying the accessibility of test items and applying modifications to improve their accessibility, demonstrating an iterative strategy for modifying test items and defining the various theoretical principles that undergird the process. We conclude with an examination of the relevance of accessibility theory across the educational environment.

P.A. Beddow, Department of Special Education, Peabody College of Vanderbilt University, Nashville, TN 37067, USA. e-mail: [email protected]

Universal Design: The End or the Beginning?

When test-taker characteristics interact with the test such that access barriers are presented, issues of fairness are raised and ultimately the validity of decisions is impacted. Thus, it is critical student characteristics are considered throughout the test design process and test developers strive to integrate test features that promote accessibility during all stages of development. Principles of universal design (UD; Mace, 1997) provide a broad framework to conceptualize these considerations.


In its initial conception, UD was applied within an architectural design context and was conceptualized as designing all buildings and products to be usable by as many people as possible to reduce the need for adaptations or specialized designs for specific populations (Mace, Hardie, & Place, 1996; Mace, 1997). Within the UD framework, products, buildings, and environments should be designed in such a way that they are useful and accessible to people with different abilities and interests (Mace, 1997).

Federal legislation (e.g., The Fair Housing Amendments Act of 1988; The Americans with Disabilities Act of 1990; The Telecommunications Act of 1996; The Assistive Technology Act of 2004) has required programs and public facilities to be accessible to all individuals. The principles of UD state that designs should (a) be useful, appealing, and equitable to people with different abilities; (b) accommodate a wide range of individual preferences and abilities; (c) be easy to understand; (d) communicate necessary information through multiple modes (i.e., visual, verbal, tactile); (e) promote efficient and easy use; and (f) provide appropriate size and space (Mace, 1997). In 2004, with the reauthorization of the Individuals with Disabilities Education Act, the principles of UD were extended to the development and administration of educational assessments.

Based on the principles of UD, assessments should be developed and designed to allow a wide range of students, with varying abilities and disabilities, to participate in the same assessment. Universally designed assessments should eliminate access barriers to the test and allow for more valid inferences about a student's performance. Thompson, Johnstone, and Thurlow (2002) proposed seven elements of universally designed assessments: inclusive assessment population, clearly defined test constructs, non-biased test items, open to accommodations, clear directions/procedures, maximally readable and understandable, and maximally legible. By incorporating these elements into the development and design of an assessment, they hypothesized the test should be accessible to as many students as possible. Universally designed tests should reduce, but do not eliminate, the need for testing accommodations, as universally designed tests are not necessarily accessible to all students (Thompson, Johnstone, Anderson, & Miller, 2005).

Accessibility Theory: The Test-Taker, Not the Universe

The expectation that tests be universally designed is insufficient insofar as the broad charge by UD proponents to integrate universal access tools into assessment instruments lacks, to a large extent, a theoretical foundation with clear relevance to measurement. Figure 9.1 consists of Beddow's (in press) model of test accessibility, which the author has referred to as accessibility theory. The left side of the model represents the test event – that is, the sum of interactions between the test-taker and the test. The first column in the test event consists of the skills and abilities possessed – or at least required – by the test-taker during the test event. Each has a parallel in one or more features in a test or test item (the second column). The right side of the model illustrates the presumed causal pathway among these interactions and the measurement of the target construct, the resulting test score, inferences about the amount of the target construct possessed by the test-taker, and subsequent decisions based on these test score inferences.

Accessibility theory defines five domains of test event interactions: physical, perceptive, receptive, emotive, and cognitive (see also Ketterlin-Geller, 2008, who similarly proposed four interaction categories, including cognitive, sensory, physical, and language). Depending on the design features of a particular test, each of these interactions may require the test-taker to demonstrate an access skill in addition to – or prior to – demonstrating knowledge of the target construct of the test or test item. Tests that demand test-takers possess certain access skills (or even have certain characteristics) to demonstrate the target construct, irrespective of the portion of the test-taker population that possesses these skills or characteristics, inadvertently measure them. Thus, the design objective should not be to create tests that are designed to be universally accessible; rather, the goal is to ensure tests are accessible to the specific test-takers for whom the test is intended to be used. This is why it is critical for test developers to have familiarity with the target population of the test.

Fig. 9.1 Test accessibility theory. From Beddow (in press). (Reprinted with permission)

For instance, Russell (Chapter 15) addresses the accessibility issue of computer-based tests for test-takers who are deaf or have impaired hearing. The access barrier for these test-takers is caused by the perceptive interaction between the audio features of the test and their limited ability to hear the elements of the test that are presented through audio. In addition to the demand for the test-taker to perceive sound, the audio features of the test also generate a receptive interaction during the test event whereby the test-taker must comprehend information via audio. Russell describes a test-delivery system that presents these test-takers with avatars that present the same information in sign language. To the degree this accessibility tool reduces the influence of hearing ability on the subsequent test scores of deaf test-takers, the interactive effect of this access barrier is reduced and that particular feature of the test is equally accessible for those test-takers compared to test-takers whose hearing is not impaired.

These skills, then, represent ancillary requisite constructs (ARCs); that is, in addition to measuring the target construct, the test also measures constructs that may obstruct the stated objective of the test. This is true for the entirety of the test-taker population. Of course, if these ARCs do not present access barriers for any test-taker, their effects are nil and the validity of test score inferences remains uncompromised. Thus, the goal of accessible test design is to control all interactions (i.e., to ensure the total effect of ARCs does not influence the test score of any test-taker) such that, barring any threats to validity apart from accessibility (e.g., low reliability of test scores), the test measures only the target construct.


Cognitive Load Theory: From Teaching to Testing

Of the interaction domains defined in accessibility theory, arguably the one that warrants the most attention in the design of accessible tests and test items is the cognitive domain. In the framework of accessibility theory, we discuss cognitive interactions in terms of limited test-taker working memory (i.e., the immediate availability of useful cognitive resources; Baddeley, 2003; Miller, 1956) and the cognitive demands of the test. CLT (Chandler & Sweller, 1991) provides an intuitive model for understanding these interactions and adjusting test features to control them. Sweller (2010), like most cognitive scientists, disaggregates memory (i.e., cognitive functioning) into long-term memory and working memory. This conceptualization will facilitate our understanding of the various cognitive interactions that may occur during the test event.

CLT primarily has been used to develop instructional tasks to facilitate efficient learning (e.g., Chandler & Sweller, 1991; Clark, Nguyen, & Sweller, 2006; Mayer & Moreno, 2003; Plass, Moreno, & Brunken, 2010). Theorists discuss cognitive load in terms of three categories: intrinsic (i.e., essential cognition for engaging and/or completing the task), germane (i.e., cognition that facilitates the transfer of information into long-term memory), and extraneous (i.e., requisite cognitive demand that is irrelevant to the task). In spite of its lack of application to educational measurement, CLT offers an array of evidence-based strategies that can be applied to testing (e.g., Kettler, Elliott, & Beddow, 2009; Elliott et al., 2010).

Proponents of CLT recommend designers of instructional materials aim to eliminate extraneous load while maximizing intrinsic load and deliberately managing germane load to enhance knowledge acquisition. This ensures the learner is permitted to allocate his or her cognitive resources to the primary objectives of the task (Sweller, 2010). In testing, the inclusion of extraneous and/or construct-irrelevant demands must be addressed at both the test and item levels to ensure the test yields scores that represent, to the greatest extent possible, a measure of the target construct that is free from the influence of ancillary interactions due to access barriers. To this end, CLT offers a useful lens through which to evaluate and modify tests and their respective items to increase test-taker access to the target construct.

Clark et al. (2006) described a set of 29 principles for facilitating efficient learning according to CLT. Many of these can be directly applied to the design of accessible paper-pencil tests or computer-based tests (Elliott, Kurz, Beddow, & Frey, 2009). Table 9.1 consists of a distillation of several principles of CLT that may offer strategies for managing cognitive load in assessment instruments. These include guidance on the efficient design of visuals, including graphs, charts, tables, and pictures; text economy to reduce information overload; page organization and layout; highlighting and bolding of essential data; avoiding redundant multimodal presentation of material; textual or visual support for high-complexity material; and the use of audio for learners with low prior knowledge. Additionally, Mayer and Moreno's cognitive theory of multimedia learning (2003) integrates many cognitive load effects and applies them to the design of computer-based instructional materials, providing a useful perspective for test designers.

Table 9.1 Application of cognitive load theory guidelines to assessment

Test element: Visuals
Cognitive load theory concept:
• Eliminate unnecessary visuals, including those included to promote interest: visual elements can distract attention from essential task demands.
• Use diagrams to promote some types of learning (e.g., spatial relationships): using an apt visual can offload cognitive demand to permit the learner to utilize resources for other essential task demands.
Application to testing:
• All elements in a test item (i.e., stimulus, stem, answer choices, and visuals) should be viewable on one page.
• Avoid text when a simple visual will suffice (i.e., visuals should replace text rather than duplicate it).
• Use visuals when spatial reasoning is necessary (unless the target construct requires the test-taker to generate a visual from text).
• When necessary, use visuals to facilitate understanding.
• Use a minimal amount of text to facilitate understanding of complex visuals (e.g., labels).

Test element: Page layout
Cognitive load theory concept:
• Avoid split-attention and redundancy effects: integrating knowledge from two sources using the same modality (e.g., two visuals) is cognitively demanding; including redundant information increases cognitive load.
• Reduce the need for representational holding: maintaining information in working memory for use in another physical location, such as on another page or screen, can limit the resources available for other task demands.
Application to testing:
• Use one integrated visual rather than two similar visuals.
• Text should not be added to self-explanatory visuals.
• Information should not be presented redundantly in both text and visuals.
• Text and related visuals should not be separated on a page, on different pages, or on different screens.
• Integrate explanatory text close to related visuals on pages and screens.
• Integrate requisite reference sheets or other material into the item so the test-taker is not required to hold material from other pages in working memory to respond.

Test element: Item text
Cognitive load theory concept:
• Pare content down to essentials: complex text demands high working memory capacity; simplifying text and adding signals help prevent cognitive overload.
• Attend to the complexity of text to manage cognitive load for readers with various levels of expertise: write highly coherent texts for low-knowledge readers, requiring minimal inferences; avoid redundant information for high-knowledge readers.
Application to testing:
• Eliminate redundant but related technical content.
• Eliminate unnecessary language to reduce reading load.
• Vocabulary and sentence structure should be as simple as possible, if the target construct permits.
• Use bold font and/or underlining to highlight words that are essential for responding.
• Whenever possible, permit test-takers to access definitions of unfamiliar words.
• Item stems should be written as simply as possible.
• Only use headers and titles to cue test-takers to the content and types of items.
• Use caution in breaking up text for low-knowledge readers (e.g., by a visual).

Test element: Audio
Cognitive load theory concept:
• Avoid overloading working memory with the use of audio to support learning: the phonological loop and the visual-spatial loop are two theoretical components of the working memory system.
Application to testing:
• Do not add audio when visuals are self-explanatory.
• Do not describe visuals with words presented in both text and audio narration.
• Sequence on-screen text after audio to minimize redundancy.
• Avoid audio narration of lengthy text passages when no visual is present.
• When a visual requires further explanation, integrate text with audio to avoid the split-attention effect.
• If possible, use audio to provide explanatory words for complex visuals rather than add written definitions.

Test element: Delivery system
Cognitive load theory concept:
• Teach system components before teaching the full process: ensure learners master the steps of a procedure before they are required to perform it as a whole.
• Give learners control over pacing: when pacing must be instructionally controlled, cognitive load must be managed through the design of instruction and materials.
• Use completion examples to promote learner processing of examples: use hybrid practice problems and worked examples, in which the first step or steps is/are done for the learner; afterward, the learner completes the steps independently.
Application to testing:
• Train test-takers in the test-delivery system prior to the test date.
• Teach supporting knowledge separately from teaching procedure steps.
• Test navigation should be trained to mastery before the test-taker is required to use the test-delivery system.
• Computer-based tests should permit the test-taker to navigate to any item during the test session, save progress, and take breaks when needed. An on-screen progress indicator and/or clock is recommended.
• Test training should result in test-taker mastery of the test-delivery system. Item examples should be included.
• If several types of items are included in a test, test training should include examples of each type.

Developing Accessible Test Items: Identify, Quantify, and Modify

A set of tools has emerged from the framework of accessibility theory for examining and rating test items with a focus on increasing access, drawing from two decades of research on CLT. Specifically, two instruments were developed out of the Consortium for Alternate Assessment Validity and Experimental Studies (CAAVES) and Consortium for Modified Alternate Assessment Development and Implementation (CMAADI) projects aimed at improving accessible tests for students with special needs: the Test Accessibility and Modification Inventory (TAMI; Beddow, Kettler, & Elliott, 2008) and the TAMI Accessibility Rating Matrix (ARM; Beddow et al., 2009). Both instruments were designed with the dual purpose of documenting and evaluating features of tests and test items, and each generates information that can be used to guide strategic modifications to reduce the influence of ancillary interactions on test results. In sum, the process – which we argue is best conducted in teams – consists of three steps: the item writing/accessibility team identifies features of the items that may present access barriers for some test-takers, evaluates the item on the basis of these features, and modifies the item according to suggestions made by the team.

The TAMI ARM (Beddow et al., 2009) represents the most recent advancement in guiding the systematic improvement of test item accessibility. ARM ratings can be used to identify features of individual test items that may hinder access for some test-takers, quantify the accessibility of item elements and overall items, and offer evidence-based recommendations for increasing access for more test-takers. The ARM comprises two analytic rubrics. The first rubric consists of a matrix for rating the five basic elements of a typical multiple-choice item: the item passage and/or stimulus, the visual, the item stem, the answer choices, and the page or item layout. Raters use a 4-point Likert-type scale to rate the accessibility of each item element (see Fig. 9.2). It should be noted that when evaluating constructed response items, such as short answer or essay questions, the answer-choices element is not rated. The second rubric aggregates the element matrix to yield an overall accessibility rating for the item. Because individual item elements can disproportionately influence the overall accessibility of a test item, the overall rating is not simply an average of the matrix ratings. Table 9.2 contains the key characteristics of optimally accessible multiple-choice items.

Fig. 9.2 Accessibility levels (Beddow, Elliott, & Kettler, 2010)

Table 9.2 Key characteristics of optimally accessible multiple-choice items

Item element: Item passage/stimulus
Optimally accessible if: It contains only essential words. The text is minimal in length and written as plainly as possible. The vocabulary and sentence structure are grade-appropriate. The directions or pre-reading text are clear.
Key references: Clark, Nguyen, and Sweller (2006); Mayer, Bove, Bryman, Mars, and Tapangco (1995); Mayer and Moreno (2003)

Item element: Item stem
Optimally accessible if: The text is minimal in length, written as plainly as possible. It reflects the intended content standards and/or objectives. The target construct is evident. It is positively worded and uses the active voice.
Key references: Haladyna, Downing, and Rodriguez (2002)

Item element: Visuals
Optimally accessible if: They are necessary for responding to the item. They clearly depict the intended images and are as simple as possible. They contain only text that is necessary for responding. They are unlikely to distract test-takers.
Key references: Mayer and Moreno (2003); Mousavi, Low, and Sweller (1995); Torcasio and Sweller (2010)

Item element: Answer choices
Optimally accessible if: They are minimal in length and written as plainly as possible. The key and distractors are balanced with regard to length, order, and content. All distractors are plausible. Only one answer is correct.
Key references: Haladyna, Downing, and Rodriguez (2002); Mayer and Moreno (2003); Rodriguez (2005)

Item element: Page/item layout
Optimally accessible if: The entire item and all necessary information for responding are presented on one page or screen. All visuals are integrated with the other item elements rather than being placed off to the side. The layout is well organized and presented in a manner that facilitates responding. There is sufficient white space to facilitate comprehension of necessary item elements. The text and item elements are large and readable.
Key references: Sweller and Chandler (1994); Chandler and Sweller (1996); Moreno and Mayer (1999)
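To illustrate how element-level ratings of this kind might be recorded and aggregated, the following Python sketch (our illustration, not the published ARM scoring procedure) stores a rating on the 4-point scale for each of the five item elements and derives an overall value. Because the chapter states only that the overall rating is not a simple average, the aggregation rule shown here, capping the overall value at the weakest element, is an assumption made for demonstration purposes.

# Hypothetical sketch of recording ARM-style element ratings.
# The 4-point scale and example labels follow the chapter loosely
# (e.g., 1 = inaccessible for many test-takers); the aggregation
# rule below is our assumption, not the published ARM rubric.

ELEMENTS = ("passage_stimulus", "item_stem", "visuals",
            "answer_choices", "page_layout")

def overall_accessibility(ratings: dict) -> int:
    """Aggregate element ratings into an overall item rating.

    Illustrative rule: an item is treated as only as accessible as
    its least accessible element, so the overall value is the minimum.
    """
    missing = [e for e in ELEMENTS if e not in ratings]
    if missing:
        raise ValueError(f"Missing ratings for: {missing}")
    return min(ratings[e] for e in ELEMENTS)

# Ratings loosely matching the Grade 11 science item discussed below
# (stimulus, visual, and answer choices rated 1; stem rated 3;
# layout given a placeholder value of 2).
science_item = {
    "passage_stimulus": 1,
    "item_stem": 3,
    "visuals": 1,
    "answer_choices": 1,
    "page_layout": 2,
}

print(overall_accessibility(science_item))  # -> 1

For constructed-response items, the answer-choices entry would simply be omitted from the dictionary and from the required-element tuple, mirroring the ARM convention that this element is not rated.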

It is essential TAMI accessibility raters have (a) experience with test item writing; (b) knowledge of the domain content at the grade level(s) of the target test; and (c) familiarity with the range of abilities and needs of the target population of the test. Validity studies on the use of the ARM indicate, with minimal time (i.e., 4–6 h), assessment professionals with no prior experience can be trained to rate the accessibility of items with a high level of agreement with expert raters (Beddow, Elliott, & Kettler, 2010).

To facilitate an understanding of this "pathway to accessibility," we will model the process of identification, evaluation, and modification with a high school science item (see Fig. 9.3). The item was written to measure a particular performance objective found in the Grade 11 Content Standards from an unnamed state, as follows: "Use knowledge of biological concepts to draw conclusions from graphed data."


Fig. 9.3 Grade 11 science item, original form

Although the item is hypothetical, its content, structural features, and target construct all are derived from an actual end-of-course state test item. The item represents several features of common test items that may influence their accessibility for some test-takers. Figures 9.4 and 9.5 (see pp. 175–176) present the same item in two subsequent phases of modification. Each reflects an increasing degree of accessibility resulting from this iterative process.

The item in Fig. 9.3 ("original" form) consists, primarily, of a passage, stimulus, visual, stem, and answer choices. The item begins with a passage that describes a phenomenon whereby low levels of salinity near the equator and at the poles are caused by rainfall and low temperatures, respectively, while the water at the "mid-latitudes" (i.e., between the equator and the poles) contains a higher concentration of salt. The passage is followed by a stimulus that precedes a graph of the metabolic rates of two organisms (referred to as Organism A and Organism B) as a function of water temperature. The item stem directs the test-taker to select the answer choice that contains the best conclusion based on the data presented in the graph. Each of the answer choices refers to the degree to which one of the two organisms is better adapted to life in different locations or salinity levels than the other.

Before undertaking the effort to identify item features that may present access barriers for some test-takers, the rater must first assume the role of the test-taker and complete the item independently. It is by experiencing the item firsthand (without peeking at the item key!) that the rater is able to ascertain the various aspects of the item that demand the use of cognitive resources. The greater the degree to which the rater attends to the variety of individual differences that may impact test-takers while completing the item (i.e., using his or her knowledge of, and experience with, the target population of the test), the more information he or she may be able to use to identify features of the item that may influence accessibility.

The following represents the recorded thoughts of a TAMI-trained test-taker, describing his attempt to complete the original item as presented in Fig. 9.3:

When I began to read from the top of the question, it immediately became clear that the text was abstruse and very complex. While the vocabulary did not appear to exceed grade-level language expectations for upper-grade high school students, the block of text at the top of the item was difficult to comprehend. The graph was similarly challenging. The axis labels used font sizes that were large enough to read, but it was difficult to keep track of all of the parts of the graph to make sense of the data. Reaching the item stem and answer choices, the key verb adapted – found in each answer choice – seemed to spring out of nowhere. Indeed, no text preceding the answer choices included the words adapt, adaptation, or any variation thereof. I found words such as metabolic rate, evaporation, precipitation, and I eventually inferred the connection between metabolism and adaptation. Once I made this connection, I referred back to the graph to attempt to draw the necessary conclusion, namely that the metabolic rate of Organism A (and thus its apparent adaptability) is highest on the extremes of the X-axis. I then re-read the passage to learn that those extremes represent the poles and the equator. Re-reading the answer choices – several times over – I did not find a statement about Organism A's adaptability at the poles and equator that matched my conclusion. I then examined the data for Organism B and found that its metabolic rate is highest at the mid-latitudes. At this point, I searched the answer choices again for a statement that confirmed my latest conclusion, but again, my search was fruitless. Frustrated, I finally decided to read each of the answer choices individually and refer to the graph to determine whether it was potentially true. Thankfully, response option A – "Organism A is less well adapted for life at the mid-latitudes than Organism B" – was indeed correct. I realized that I had skimmed over the word less and wrongly assumed all of the choices concerned which organism was better adapted than the other (except choice D, which equated the two organisms). Even then I had to double-check to ensure the mid-latitudes were indeed represented by the unlabeled space between the extremes of the abscissa and that the metabolic rate of Organism A was indeed lower there than that of Organism B. Satisfied (and relieved), I moved on to the next question.

It should be noted that there are many ways a test-taker may have opted to respond to this item, and the person above did not explore every permutation in his think-aloud. It is for this reason that cognitive interview studies with representative participants from the target test-taker population (e.g., Roach, Beddow, Kurz, Kettler, & Elliott, 2010; Johnstone, Bottsford-Miller, & Thompson, 2006) can offer useful information for understanding the various solution processes attempted by students who are likely to take the test. The greater the degree to which test developers who seek to improve the accessibility of tests can assimilate these perspectives into the development process, the more likely their item generation, evaluation, and modification procedures will result in items that are free from access barriers for the intended test-takers.

Now that we have completed the item, we may begin the process of identifying potential access barriers. Following the ARM (Beddow et al., 2010), we shall begin with the topmost item element, the passage or item stimulus.

Item Stimulus

The stimulus begins as follows: "The temperature of the Pacific Ocean ranges from near freezing (32° Fahrenheit, 0° Centigrade) at the poles to 86° Fahrenheit (30° Centigrade) in close proximity to the equator."

Cognitive load theory offers a helpful perspective for understanding the mental demands of this first sentence. According to Sweller (2010), the dominant source of complexity in learning tasks is element interactivity. In this sentence, the test-taker simultaneously must hold two informational elements in working memory to integrate them into a unit of understanding that permits him or her to proceed to the next sentence. Specifically, the relation between temperature and location must be clear before the test-taker can move on to the next sentence, which begins like this: "The salinity of the water is highest at the mid-latitudes." If the test-taker has not integrated the temperature-location knowledge from the first sentence such that it is available for use in working memory, the second sentence may be difficult to follow, as the word salinity is presented for the first time without an explicit statement of how temperature and location are related to salinity.

Several additional sources of potential element interactivity can be found in the first paragraph of the item stimulus alone – namely, the test-taker must understand the relation between salinity and location to make the later connection between salinity and temperature in answer choice C, and the relation between Fahrenheit and Centigrade temperature systems likely must be understood in order to mentally dismiss one temperature system or the other to avoid confusion. In fact, it may be possible to eliminate both measurement systems altogether by describing the water using the terms warm and cold instead of numeric degrees.

Clearly, the item stimulus contains numerous features that likely pose access barriers for at least some test-takers. At the lowest accessibility level for the item stimulus, the ARM item analysis matrix contains this descriptor: "the majority of text is likely to be difficult to understand for some test-takers." Based on the factors above, the highest ARM rating the item stimulus can receive is 1, or inaccessible for many test-takers.

Item Stem

The next step is to identify features of the item stem that may contain access barriers for some test-takers. At first glance, the item stem appears to be worded plainly. However, two features of the stem could be changed to conform to professional guidance from CLT and item writing guidelines. First, the word best uses all capital letters as opposed to using bold or underline for highlighting the importance of the word (Clark et al., 2006). Second, the stem is written partially in the passive voice ("can be drawn"), which may cause confusion for some test-takers (Haladyna et al., 2002). While the accessibility of the stem is not optimal, these concerns are relatively minor; based on the item analysis matrix, the item stem receives a rating of 3, or maximally accessible for most test-takers.

Visuals

The next item element to examine is the visual. The first question to ask is, "Is the visual necessary for responding to the item?" In terms of item evaluation, visuals include any pictures, charts, tables, or graphs that appear on the page with the other item elements. Evidence indicates the inclusion of nonessential visuals may hinder reading comprehension for some learners (Torcasio & Sweller, 2010) and negatively influence student test performance (Kettler et al., in press). In its current form, the item in Fig. 9.3 requires the test-taker to use the graph to respond; therefore, the graph must be examined in terms of its clarity, simplicity, and its location with respect to the other item elements to identify features that may cause access problems for some test-takers.

The graph's Y-axis label is Metabolic Rate, which has no reference term in any other item element, and its relation to adaptation must, therefore, be deduced. The axis has an arrow symbol at the top to indicate low versus high metabolic rate. This tiny graphical element may be missed by some test-takers. Additionally, to the degree the test-taker has prior knowledge that metabolism may be used as an indicator of fitness (and that metabolic rate is the measure of metabolism), and if he or she understands that fitness or survival rate may be used as a proxy for adaptability, the label does not present an access barrier. This raises the question of the target construct of the item. Is the demand for these connections with prior knowledge an ARC, or is it a component of the target construct of the item? The performance objective (our nearest definition of the intended measurement construct) is, as stated in the content standards, "Use knowledge of biological concepts to interpret graphed data." The item stimulus includes ample discussion of the relation between temperature and salinity, the assimilation of which likely demands extensive intrinsic load. The focus should be to reduce extraneous cognitive load wherever possible. Arguably, replacing the word metabolic with survival eliminates the demand for using the transitive property (if a equals b and b equals c, then a equals c; or, if metabolic rate equals survival rate and survival rate equals adaptability, then metabolic rate equals adaptability), thus reducing cognitive load. With this modification, the test-taker is still required to understand that survival rate is a proxy for adaptability – a biological concept – so the change does not dilute the item's measure of the target construct. It should be noted that this issue supports the need for item modification to be a collaborative process consisting of interactive discussion among assessment experts, content area specialists, and educators with familiarity with the target population of a test. It could be said that this is an example of where science meets art in the arena of test item writing (Rodriguez, 1997). The X-axis label is Temperature, with the clarifier Degrees Centigrade. Removing one temperature system from the stimulus – or even eliminating both systems in favor of the warm and cold terms mentioned earlier – would permit the elimination of this clarifier and reduce the reading load of the visual.

Taken together, the features of this item visual are likely to cause access barriers for many test-takers. Therefore, the visual receives a rating of 1, or inaccessible for many test-takers. As we will observe shortly, the visual is the central feature of the item and should receive abundant attention if our goal is to improve the accessibility of this item.

Answer Choices

The next item element is the answer choices. Haladyna et al. (2002) listed a number of guidelines for writing effective answer choices, including the recommendation that all answer choices be plausible and balanced with regard to length and content. Further, based on a meta-analysis of 80 years of research, Rodriguez (2005) argued three options are optimal for multiple-choice tests. Finally, Rodriguez has argued that the term distractor is inappropriate given the purpose of incorrect options in a multiple-choice item; namely, the incorrect options should represent common errors. He argued the term attractors would be more appropriate, since the objective for item writers should be to discriminate test-takers who know the material from those who still tend to make these types of errors. Implausible distractors, on the other hand, tend to be the least selected and do not discriminate as well.

Based on this guidance, the answer choices in Fig. 9.3 are problematic for a number of reasons. First, the choices are unbalanced with regard to content. Choices A and D begin with Organism A; choices B and C begin with Organism B. Choices A and D use the term “well adapted”; choices B and C use “better adapted.” Choices A, B, and D refer to geographic locations (mid-latitudes, poles, and equator, respectively); choice C refers to water salinity. Choices A and C use the comparative term than; choice B uses compared to. All of these factors increase the cognitive demand of the element, particularly insofar as they require multiple element interactivity (salinity with location, temperature with survival, Organism A with Organism B). Second, choice D is highly implausible. Except for two specific (unlabeled) locations on the graph, the data for the two organisms are clearly disparate from each other across the range of temperatures. This is a “throw-away” response option (and is a clear candidate for elimination according to Rodriguez, 2005). Much work can be done to make the answer choices accessible for more test-takers, and according to the ARM (Beddow et al., 2010), the answer choices element receives a rating of 1, or inaccessible for many test-takers.

Page/Item Layout

The final element to examine for accessibility concerns is the layout of the page or item. Again, at first glance everything about this element appears to be in order and nothing “clangs” of access barriers. The visual is embedded in the center of the item between the item stimulus and stem, as recommended by cognitive load theorists (e.g., Clark et al., 2006), to reduce the need for representational holding. As it is currently presented, the item allows the test-taker to find all of the information needed to solve the item within the item space. Indeed, if the item visual were presented on a facing page – or worse, on a separate page that is out of immediate view – the element interactivity concerns noted earlier would be compounded by the need for the test-taker to “carry” this knowledge in his or her working memory across the gap between pages or parts of the page. However, the block of text at the top of the page may be intimidating to test-takers with poor reading skills (again, we must remember that the target domain of the item is science, not reading) and there is little space between lines of text. The graph appears cluttered by text and lines, and one rater quipped about a similar visual that “my eyes went all googly trying to figure this out.” Finally, the item number beside the item stimulus is small and, depending on the relative location of this item to the one before it, may be missed by some test-takers. Based on these concerns, the page and item layout for this item receives a rating of 2, or maximally accessible for some test-takers.

Overall Accessibility Rating

The ARM contains an Overall Analysis rubric that is designed to be completed after rating the accessibility of each of the individual item elements. Although the rater is advised to use the element ratings to inform his or her determination of the overall rating, it should be reiterated that the overall accessibility of the item may be disproportionately influenced by one or more item features over the others, so averaging across the individual item element ratings is not recommended. In the case of the item in Fig. 9.3, while the item stem may be maximally accessible for most test-takers, the extraneous cognitive load in the stimulus and visual both decrease the accessibility of the item as a whole. The opposite is possible, of course: An item can consist of an optimally accessible passage and visual, but the stem may contain so much extraneous cognitive load that the test-taker is unable to understand how to select the correct answer choice even if he or she possesses a sufficient amount of the target construct to demonstrate his or her knowledge on a more accessible item. Based on the rubric, the item in Fig. 9.3 receives a rating of 1, or inaccessible for many test-takers.

Figure 9.4 consists of the same science item following modification based on the identification and evaluation procedures described above. Using the ARM Record Form (Beddow et al., 2009), the raters who evaluated the original item suggested several modifications that may improve the accessibility of the item. For the stimulus, the raters suggested simplifying or shortening the text, reorganizing information, and changing the text formatting to reduce cognitive demand. The resulting stimulus opens with an introductory directions statement: “Use these facts about the Pacific Ocean to answer the question.” This directive is followed by two bulleted statements containing bold text for key terms or information elements (i.e., those necessary for responding; e.g., salinity, lowest, and very cold). These statements precede a simplified stimulus for the visual.

These modifications are likely to improve access for more test-takers for a number of reasons. First, the multiple element interactivity – the primary cause of cognitive overload – is reduced. The temperature systems from the original item have been removed in favor of qualitatively descriptive text labels for water temperature, as has the extraneous information about the locations where the high- and low-salinity water is found. On the latter, objections could be raised: Is the information about the high-salinity water being found at the poles and equator in fact extraneous, or is it part of the target construct of the item (i.e., the “biological knowledge”)? Unless the item is intended to measure the test-taker’s ability to use knowledge of the geographical locations on the earth where the salinity of water is highest or lowest to interpret graphed data, including this information arguably adds extraneous complexity to an item that already contains access barriers for many test-takers. If, on the other hand, a secondary purpose of the item was to teach test-takers this knowledge, then its inclusion would be necessary. In this example, however, the item was designed purely for measurement purposes and its inclusion does not facilitate measurement. Indeed, the material may occupy cognitive resources that test-takers need to respond to the actual target construct, possibly causing some to respond incorrectly to the item because of cognitive overload.

Even after eliminating this element interaction, however, the item still requires the test-taker to retain several element interactions in working memory. The subject of the stimulus is salinity; the subject of the visual is water temperature; and the subject of the answer choices is also salinity. Thus, the test-taker must be able to use reflexive logic to select the correct response option, which, it could be argued, is another ancillary requisite construct. In its modified form, the stimulus receives a rating of maximally accessible for most test-takers – an ARM rating of 3 – with the recommendation that it be simplified further to improve its accessibility.

Fig. 9.4 Grade 11 science item, modified form A

Bold font was used to facilitate identification of the stem among the other item elements, and the item stem was reworded in the active voice to read as follows: “Based on the graph, what could you conclude about the two organisms?” It should be noted that some test writers recommend avoiding the word you in questions, arguing that it makes the item sound less professional or could lead test-takers to believe the item demands an opinion rather than a correct response. Use of the second person notwithstanding, in its modified form the stem receives a rating of 4, or maximally accessible for nearly all test-takers.

Several changes made to the visual during modification reduced the cognitive demand of the item. The word Metabolic on the Y-axis label was changed to Survival to facilitate test-takers’ comprehension of the connection between the graphed data and the answer choices, since it is generally expected that the biological concept of adaptation is delivered in the elementary grades (e.g., National Center for Education Statistics, 2011), while metabolism as an indicator of organism survival is taught much later. The arrow embedded in the Y-axis was separated and increased in size, along with the addition of the words low and high to simplify the characterization of the range of survival rates. Similarly, an arrow was added to the X-axis to facilitate the test-taker’s interpretation of the water temperature progression. Finally, the two data series were labeled with the specific organism titles instead of using simply A and B. The modified visual receives an ARM rating of 3, or maximally accessible for most test-takers, with the recommendation that it be simplified even further.

The answer choices were revised significantly. To achieve balance, all three choices are positively worded (i.e., better adapted, equally well adapted). All three refer to water with low salinity. The implausible option D was eliminated. The modified answer choices element receives a rating of 4, or maximally accessible for nearly all test-takers – the highest ARM rating.

Finally, the page and item layout are moderately improved. The bulleted text at the top of the item creates more white space than in the original item. The item number is increased in size to make it easier for the test-taker to see where one item ends and the next item begins. Additionally, there is an implication that the test-taker is now permitted to respond on the actual page instead of recording his or her answers on a separate answer sheet, thus reducing the need for representational holding and eliminating the possibility of the dreaded classic “bubble sheet misalignment boo-boo,” which needs no further explanation. The modified page and item layout receives an ARM rating of 4, or maximally accessible for nearly all test-takers, with the recommendation that white space be increased to improve the accessibility of the item.

Overall, the item receives an ARM rating of 3, or maximally accessible for most test-takers. It should be noted that when the rater is able to generate ways to revise an item further to improve the accessibility of the item, the authors of the TAMI argue the item should not receive the highest accessibility rating. They argue item development, particularly when there is a focus on accessibility, is an iterative process, and items should not be used for measurement for accountability or decision making until they are deemed to be optimally accessible for the target test-taker population. Although this version of the item was rated highly using the ARM, the rater suggested changes to the item stimulus, visual, and item layout. Figure 9.5 contains the same item following a second set of modifications to improve its accessibility for more test-takers. In its final form, you will note the item has changed considerably.

Specifically, the word count of the item stimulus has been reduced by 43%. The first sentence of the item in Fig. 9.4 (situating the item in the context of the Pacific Ocean) was deemed extraneous, as was the inclusion of the relation between salinity and water temperature. By eliminating the demand for reflexive logic (i.e., the cognitive shift from the concept of salinity to temperature and then back to salinity), the item now requires the test-taker to hold one potentially novel informational element in working memory to interpret the graph; namely, the definition of salinity. In fact, the graphed data are interpretable even without this knowledge. (It should be noted, however, that were this primer removed, some test-takers may feel anxious if the word salinity is unfamiliar to them.)

Clearly, the graph is easier to interpret in its current form than in either of the two previous versions. The elimination of the temperature connection greatly simplified the data series, and the elimination of the gray boxes makes the item appear much simpler. In fact, it is likely some readers will object to these changes, since the item in its current form demands only that the test-taker know (a) that survival rate is a proxy for adaptability – a concept which, as we noted earlier, typically is covered in elementary school – and (b) how to read a fairly simple two-way graph consisting of two data series. Consider, however, the target construct as stated earlier: “Use knowledge of biological concepts to interpret graphed data.” In its final form, this item demands that test-takers use their knowledge that organisms able to adapt to their environments are more likely to survive in those environments to interpret a simple graph comparing the survival rates of two organisms under different environmental conditions. This is the target construct of the item – our stated purpose from the beginning!

Fig. 9.5 Grade 11 science item, modified form B

Technical Evidence to Verify Intended Effects: The Accessibility Proof Paradox

There are several item-level indices that may be used as indicators of the effects of modification procedures to enhance accessibility, including word count, readability, depth of knowledge ratings (DOK; e.g., Webb, 2002), item difficulty (p), item discrimination indices such as item-total correlations (r), and reliability indices such as Cronbach’s alpha (α). For many of these indicators, expected changes are mostly straightforward. Given the relation between cognitive demand and reading load, for example, word count is likely to decrease for most items following enhancement, with the occasional exception of items for which contextual support or clarifying text is added (e.g., definitions for construct-unrelated but essential terms, labels for graphs). Similarly, readability indices may indicate reduced reading load (though for some assessment instruments it may be important to ensure readability does not fall below grade-level expectations). It should be noted that typical readability indices utilize weighted formulas based on factors such as sentence length and average syllables per word. When two or more readability indices are compared, results often are highly disparate (Wright, 2009). When using readability as a proxy for the grade level of text, results should be interpreted with caution. Generally, readability is considered acceptable when the grade level of a text falls anywhere within the range of the intended grade level (e.g., 7.0–7.9 would be acceptable readability values for a grade 7 reading item).
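To make the weighting concrete, the sketch below computes the Flesch–Kincaid grade level, one common readability formula that combines average sentence length with average syllables per word. The naive syllable counter and the sample stimuli are illustrative assumptions only; they are not part of the ARM or TAMI procedures, and operational readability tools use more sophisticated syllable counts.

```python
import re

def count_syllables(word):
    # Rough heuristic: count vowel groups; production readability tools
    # use dictionaries or phonetic rules instead.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_kincaid_grade(text):
    # Grade level = 0.39 * (words per sentence) + 11.8 * (syllables per word) - 15.59
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * (len(words) / len(sentences))
            + 11.8 * (syllables / len(words)) - 15.59)

# Hypothetical original versus shortened stimulus text
original = ("The salinity of ocean water varies with its temperature, and "
            "the lowest salinity is found where the water is very cold.")
modified = "Salinity is lowest where ocean water is very cold."
print(round(flesch_kincaid_grade(original), 1))
print(round(flesch_kincaid_grade(modified), 1))
```

Because different readability formulas weight these factors differently, the same revision can shift two indices by different amounts, which is one reason comparisons across indices are often disparate.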

By contrast, determining the desired effects of item changes, as estimated by other indices, is somewhat counterintuitive and may even present logical problems. The first of these challenges is the question of item difficulty. It would seem increases in the item difficulty index p are always desirable, since they indicate a greater proportion of test-takers responded correctly to the item (which must be a good thing!). However, there is a critical set of conditional limitations that determines whether item difficulty should increase if access barriers are removed from items. For example, item difficulty is expected to increase following modification only if the accessibility of the original item was less than optimal for a portion of the tested population, and if some of those same test-takers would have been able to demonstrate their knowledge of the tested content had the item been optimally accessible. Thus, increases in item difficulty are not always the best indicators of intended effects.

Ideally, item discrimination will remain the same or increase if modifications resulted in the elimination of access barriers for a portion of the target population. It should be noted, however, that in this case, the ideal is an optimally accessible test! If other test items contain access barriers for some test-takers, then the total-score component of the item-total correlation that comprises the item discrimination index will likely reflect these access skills, thus resulting in decreased item discrimination even if the accessibility of the individual item has improved. Of course, these are challenging concerns for researchers interested in examining the effects of item modifications or enhancements, because decreases in item discrimination may indicate either (a) the modification changed the target construct of the item, while the tested construct represented by the balance of the items remained pure, or (b) the modification isolated the target construct of the item, while the tested construct represented by the balance of the items remained adulterated by ARCs.
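For readers who want to see how these indices are typically computed, the following is a minimal sketch using NumPy and a scored 0/1 response matrix; the function name and the toy data are illustrative assumptions rather than part of the procedures described in this chapter.

```python
import numpy as np

def item_statistics(scores):
    """Classical indices for a 0/1 response matrix (rows = test-takers,
    columns = items): difficulty p, corrected item-total correlation r,
    and coefficient alpha for the item set as a whole."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]
    p = scores.mean(axis=0)                      # proportion correct per item
    total = scores.sum(axis=1)
    # Corrected item-total correlation: each item against the total score
    # excluding that item, so the item does not inflate its own r.
    r = np.array([np.corrcoef(scores[:, j], total - scores[:, j])[0, 1]
                  for j in range(k)])
    alpha = (k / (k - 1)) * (1 - scores.var(axis=0, ddof=1).sum()
                             / total.var(ddof=1))
    return p, r, alpha

# Illustrative data only: 6 test-takers by 4 items
responses = [[1, 0, 1, 1], [1, 1, 1, 0], [0, 0, 1, 0],
             [1, 1, 1, 1], [0, 0, 0, 0], [1, 1, 0, 1]]
p, r, alpha = item_statistics(responses)
print(p, r, round(alpha, 2))
```

Comparing p and r for the original and modified forms, computed on comparable samples, is one way to operationalize the conditional expectations discussed above.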

The solution to both the item difficulty and item discrimination paradoxes harkens back to the essence of test validation and requires test developers to generate a validity argument based on all available data about the current test. In most cases, the reliability of an accessible test, often indicated by α, should be similar to that of validated tests of similar constructs. To wit, if the average of the item-total correlations for the universe of test items is high, the deduction is that the items map to the same construct (one that hopefully is uncontaminated by ARCs, a deduction that must be tested). One solution may be to examine point-biserial correlations with validated measures of the same construct (e.g., correlating items from the current mathematics test with total scores of a math test with established validity evidence). Additionally, test developers should examine the test results of students who have demonstrated access needs against the scores of students who are able to demonstrate their knowledge of the target construct, notwithstanding the presence of access barriers for other test-takers. If the modified version of the test or test items contains fewer access barriers than the original version, evidence should exist of a differential (greater) boost for the test-takers for whom accessibility has been improved (Kettler et al., in press). Evidence of discriminant validity (i.e., evidence that a measure correlates weakly with tests of theoretically orthogonal constructs) may include examining correlations with measures of test-taker working memory (e.g., Baddeley, 2003) or – in the case of a test of a construct other than reading – reading fluency. In essence, we recommend test developers generate a priori criteria for making a determination about whether access strategies have been successful. Additionally, we contend it is unwise to rely on one index alone for making such a determination; rather, an argument about the effectiveness of accessibility strategies must be based on the triangulation of multiple item- and test-level indices.
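The differential boost comparison can be expressed as a simple interaction: the gain from the original to the modified form should be larger for the group whose access needs the modifications target than for a comparison group. Below is a minimal sketch under that assumption; the group labels and the percent-correct values are hypothetical.

```python
import numpy as np

def differential_boost(orig_a, mod_a, orig_b, mod_b):
    """Difference between groups in mean gain from the original to the
    modified form. Group A = test-takers with identified access needs,
    Group B = comparison group; a positive value is the pattern expected
    when modifications remove access barriers rather than change the construct."""
    return (np.mean(mod_a) - np.mean(orig_a)) - (np.mean(mod_b) - np.mean(orig_b))

# Hypothetical percent-correct scores on the two forms
orig_a, mod_a = [42, 50, 38, 45], [55, 63, 52, 60]   # access-needs group
orig_b, mod_b = [70, 76, 68, 74], [72, 79, 70, 75]   # comparison group
print(differential_boost(orig_a, mod_a, orig_b, mod_b))  # positive = differential boost
```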

Conclusion

Universal Design: The End or the Beginning?

Universal design is a useful starting point for understanding the need to develop tests that are accessible for the entirety of the intended population. However, the principles of UD contain little operational guidance for developing accessible assessments. Indeed, tests are not designed to be universally accessible; rather, they are intended to be accessible for a group of test-takers with a common set of individual characteristics. Test developers should aim to design tests such that construct-irrelevant variation in this population does not influence test scores and the subsequent inferences about the target construct.

Accessibility Theory: The Test-Taker, Not the Universe

Accessibility theory extends guidance from universal design and universal design for assessment (Thompson et al., 2002) and operationalizes test accessibility as the sum of interactions between features of the test and characteristics of each test-taker. To enhance accessibility, therefore, we contend the test developer must both be grounded in effective test design strategies and be familiar with the range of abilities and needs of the population for which the test is designed. Accessibility theory integrates principles of working memory, cognitive load theory, and research on test and item development to provide specific guidance for ensuring tests and test items do not contain features that prevent test-takers from demonstrating their knowledge of the target construct.

Cognitive Load Theory: From Teaching to Testing

Cognitive load theory arguably contains the deepest wellspring of conceptual and operational guidance for the development of highly accessible tests. By attending to cognitive load during the design phase, test item writers can manage cognitive resource demands, eliminating interference from access skills and isolating the target construct of the test.

Developing Accessible Test Items: Identify, Quantify, and Modify

While accessibility theory certainly offers guidance that can be used to inform the development of accessible items from their inception, we contend the process of item writing with a focus on accessibility is iterative. We recommend test developers examine existing test items to identify features that may present access barriers for some test-takers, quantify the accessibility of the items, and modify items with poor accessibility, removing access barriers to improve their accessibility for more test-takers.

Technical Evidence to Verify Intended Effects: The Accessibility Proof Paradox

After systematically evaluating the accessibility of a test and ensuring, to the extent possible, that it is free from access barriers, evidence should be collected to verify the effectiveness of the process. Notwithstanding the complexities of establishing a test accessibility argument, it is critical that test developers generate explicit expectations and utilize multiple indices at both the item and test levels (e.g., readability indices, depth of knowledge, item difficulty, item discrimination, and relations with other measures such as working memory and reading fluency) to test those expectations.

The Access Pathway: Accessibility Across the Educational Environment

The goal of the discussed item modifications via the TAMI Accessibility Rating Matrix (Beddow et al., 2009) was to increase student access to the item’s target construct. The systematic application of such accessibility reviews (and subsequent modifications) across a test’s constituent item set is thus meant to strengthen the validity of test score inferences about a student’s knowledge and skills. That is, the test score from an optimally accessible test permits a more accurate inference about what a student knows and can do, because inaccessible tests inextricably measure an unintended nuisance dimension, namely the student’s ability to manage (extraneous) cognitive load caused by construct-irrelevant item features. Porter (2006) defined the content related to tested knowledge and skills as the assessed curriculum. As such, test accessibility is concerned with increased student access to the assessed curriculum at the time of the test event.

Besides the assessed curriculum, researchers (Anderson, 2002; Kurz & Elliott, 2011; Porter, 2006) have identified additional curricula of the educational environment, such as the intended curriculum (e.g., content expressed in state standards) and the enacted curriculum (e.g., content covered by teacher instruction). Kurz and Elliott argued that an important goal of schooling is to provide students with the opportunity to learn the intended curriculum. They further noted three major barriers to access for students with disabilities (see Fig. 1.1 in Chapter 1): insufficient opportunity to learn the intended curriculum, inappropriate testing accommodations (or lack thereof), and construct-irrelevant variance on assessments. The design and/or modification of more accessible tests using tools such as the ARM (Beddow et al., 2009) clearly supports the promotion of greater student access to the assessed curriculum. Unfortunately, we have to recognize that even the most accessible test is unable to overcome access barriers that precede the test event, such as a lack of instruction covering the tested content.

Efforts to enhance accessibility across the educational environment should focus on students’ access to the content of its various curricula, including the intended, enacted, and assessed curriculum. The teacher’s enacted curriculum is supposed to cover the knowledge and skills put forth in the intended curriculum, while the assessed curriculum is designed to sample across the various domains of the intended curriculum. Access to the assessed curriculum thus can be compromised via a teacher’s enacted curriculum that fails to cover the content prescribed by the standards. Kurz discussed this aspect of accessibility in greater detail under the concept of opportunity-to-learn (OTL) in Chapter 6. Many students, especially students with disabilities, may require additional instructional adaptations to facilitate student learning of the teacher’s enacted curriculum. The chapter by Ketterlin-Geller and Jamgochian explicated how accessible instruction can be supported through instructional accommodations and modifications (see Chapter 7), and Phillips highlighted how accessibility for students with disabilities is integrated into a legal framework that mandates their physical and intellectual access to curriculum, instruction, and assessment (see Chapter 3). In this chapter, we elaborated on the final stretch of the access pathway, the test event. While Tindal (Chapter 10) and Kettler (Chapter 13) discussed appropriate testing accommodations as an avenue to provide students with disabilities access to tested content at the time of the test event, item modifications, such as the ones discussed in this chapter, are less concerned with the test’s administration (including its presentation, timing, mode of response, or environment) and more concerned with the accessibility of the test itself.

As evidenced by the many chapters in this book, there are a number of access points to the intended curriculum that can be viewed as occurring along an access pathway, which (a) begins with physical access to buildings and classrooms; (b) continues with instruction that provides students with the opportunity to learn what is expected and measured; (c) supports student learning with needed instructional accommodations and modifications; and (d) culminates in assessments that grant students optimal access to demonstrating their achievement of the intended curriculum. The final access points for students occur during the test event, which may present access barriers for many test-takers. To overcome these access barriers, it is necessary to consider the interactions between test-taker characteristics and features of the test and its administration. Currently, two primary methods are used to reduce the influence of access barriers: testing accommodations and test or item modifications. Just as the decision to assign a particular testing accommodation should consider the interaction between a test-taker’s disability-related characteristics and certain aspects of the test administration, which may limit his or her ability to demonstrate achievement of the target construct (e.g., a scribe for a student with a physical disability), the design and/or modification of items, with a focus on accessibility, must consider the interaction between specific test-taker characteristics and certain features of the test itself (e.g., large print for a student with a visual impairment). An important distinction between the two access strategies, however, is that test-taker characteristics are considered on an individual basis in the case of testing accommodations, whereas item modifications consider the characteristics of a group of test-takers. Of course, the purpose of enhancing accessibility across the access pathway is to afford students the greatest opportunity to learn and to enable them to demonstrate their learning. The ultimate goal, however, is much grander, and infinitely elusive: namely, to provide educational services that are optimized to meet the specific needs of each child in our schools.

References

Anderson, L. W. (2002). Curricular alignment: A re-examination. Theory into Practice, 41(4), 255–260.

Baddeley, A. (2003). Working memory: Looking back and looking forward. Neuroscience, 4, 829–839.

Beddow, P. A. (2010). Beyond universal design: Accessibility theory to advance testing for all students. In M. Russell (Ed.), Assessing students in the margins: Challenges, strategies, and techniques (1st ed., pp. 383–407). New York: Information Age Publishing.

Beddow, P. A., Elliott, S. N., & Kettler, R. J. (2009). TAMI accessibility rating matrix (ARM). Nashville, TN: Vanderbilt University.

Beddow, P. A., Elliott, S. N., & Kettler, R. J. (2010). Test accessibility and modification inventory (TAMI) technical supplement. Nashville, TN: Vanderbilt University.

Beddow, P. A., Kettler, R. J., & Elliott, S. N. (2008). Test accessibility and modification inventory (TAMI). Nashville, TN: Vanderbilt University.

Chandler, P., & Sweller, J. (1991). Cognitive load theory and the format of instruction. Cognition and Instruction, 8, 293–332.

Chandler, P., & Sweller, J. (1996). Cognitive load while learning to use a computer program. Applied Cognitive Psychology, 10, 151–170.

Clark, R. C., Nguyen, F., & Sweller, J. (2006). Efficiency in learning: Evidence-based guidelines to manage cognitive load. San Francisco: Jossey-Bass.

Elliott, S. N., Kettler, R. J., Beddow, P. A., Kurz, A., Compton, E., McGrath, D., et al. (2010). Effects of using modified items to test students with persistent academic difficulties. Exceptional Children, 76, 475–495.

Elliott, S. N., Kurz, A., Beddow, P., & Frey, J. (2009, February). Cognitive load theory: Instruction-based research with applications for designing tests. Paper presented at the National Association of School Psychologists annual convention, Boston.

Haladyna, T. M., Downing, S. M., & Rodriguez, M. C. (2002). A review of multiple-choice item-writing guidelines for classroom assessment. Applied Measurement in Education, 15, 309–333.

Johnstone, C. J., Bottsford-Miller, N. A., & Thompson, S. J. (2006). Using the think aloud method (cognitive labs) to evaluate test design for students with disabilities and English language learners (Technical Report 44). Minneapolis, MN: University of Minnesota, National Center on Educational Outcomes.

Kettler, R. J., Elliott, S. N., & Beddow, P. A. (2009). Modifying achievement test items: A theory-guided and data-based approach for better measurement of what students with disabilities know. Peabody Journal of Education, 84, 529–551.

Kettler, R. J., Rodriguez, M. R., Bolt, D. M., Elliott, S. N., Beddow, P. A., & Kurz, A. (in press). Modified multiple-choice items for alternate assessments: Reliability, difficulty, and differential boost. Applied Measurement in Education.

Ketterlin-Geller, L. R. (2008). Testing students with special needs: A model for understanding the interaction between assessment and student characteristics in a universally designed environment. Educational Measurement: Issues and Practice, 27, 3–16.

Kurz, A., & Elliott, S. N. (2011). Overcoming barriers to access for students with disabilities: Testing accommodations and beyond. In M. Russell (Ed.), Assessing students in the margins: Challenges, strategies, and techniques. Charlotte, NC: Information Age Publishing.

Mace, R. L. (1991). Definitions: Accessible, adaptable, and universal design (Fact Sheet). Raleigh, NC: Center for Universal Design, NCSU.

Mace, R. (1997). The principles of universal design (2nd ed.). Raleigh, NC: Center for Universal Design, College of Design. Retrieved May 20, 2010, from http://www.design.ncsu.edu/cud/pubs_p/docs/poster.pdf

Mace, R. L., Hardie, G. J., & Place, J. P. (1996). Accessible environments: Toward universal design. Retrieved May 20, 2010, from http://www.design.ncsu.edu/cud/pubs_p/docs/ACC%20Environments.pdf

Mayer, R. E., Bove, W., Bryman, A., Mars, R., & Tapangco, L. (1995). When less is more: Meaningful learning from visual and verbal summaries of science textbook lessons. Journal of Educational Psychology, 88, 54–73.

Mayer, R. E., & Moreno, R. (2003). Nine ways to reduce cognitive load in multimedia learning. Educational Psychologist, 38, 43–52.

Miller, G. A. (1956). The magical number seven, plus or minus two: Some limits on our capacity for processing information. Psychological Review, 63, 81–97.

Moreno, R., & Mayer, R. E. (1999). Cognitive principles of multimedia learning: The role of modality and contiguity. Journal of Educational Psychology, 91, 348–368.

Mousavi, S. Y., Low, R., & Sweller, J. (1995). Reducing cognitive load by mixing auditory and visual presentation modes. Journal of Educational Psychology, 87, 319–334.

National Center for Education Statistics. (2011). The Nation’s Report Card: Science 2009 (NCES 2011–451). Washington, DC: Institute of Education Sciences, U.S. Department of Education.

Plass, J. L., Moreno, R., & Brunken, R. (Eds.). (2010). Cognitive load theory. New York: Cambridge University Press.

Porter, A. C. (2006). Curriculum assessment. In J. L. Green, G. Camilli, & P. B. Elmore (Eds.), Handbook of complementary methods in education research (pp. 141–159). Mahwah, NJ: Lawrence Erlbaum.

Roach, A. T., Beddow, P. A., Kurz, A., Kettler, R. J., & Elliott, S. N. (2010). Incorporating student input in developing alternate assessments based on modified academic achievement standards. Exceptional Children, 77, 61–80.

Rodriguez, M. C. (1997, August). The art & science of item-writing: A meta-analysis of multiple-choice item format effects. Paper presented at the annual meeting of the American Educational Research Association, Chicago, IL.

Rodriguez, M. C. (2005). Three options are optimal for multiple-choice items: A meta-analysis of 80 years of research. Educational Measurement: Issues and Practice, 24, 3–13.

Rose, D. H., & Meyer, A. (2002). Teaching every student in the digital age: Universal design for learning. Alexandria, VA: Association for Supervision and Curriculum Development.

Sweller, J. (2010). Cognitive load theory: Recent theoretical advances. In J. L. Plass, R. Moreno, & R. Brunken (Eds.), Cognitive load theory (pp. 29–47). New York: Cambridge University Press.

Sweller, J., & Chandler, P. (1994). Why some material is difficult to learn. Cognition and Instruction, 12, 185–233.

Thompson, S. J., Johnstone, C. J., Anderson, M. E., & Miller, N. A. (2005). Considerations for the development and review of universally designed assessments (Technical Report 42). Minneapolis, MN: University of Minnesota, National Center on Educational Outcomes.

Thompson, S. J., Johnstone, C. J., & Thurlow, M. L. (2002). Universal design applied to large-scale assessments (Synthesis Report 44). Minneapolis, MN: University of Minnesota, National Center on Educational Outcomes.

Torcasio, S., & Sweller, J. (2010). The use of illustrations when learning to read: A cognitive load theory approach. Applied Cognitive Psychology, 24(5), 659–672.

Webb, N. L. (2002, April). An analysis of the alignment between mathematics standards and assessments for three states. Paper presented at the American Educational Research Association annual meeting, New Orleans, LA.

Wright, N. (2009). Towards a better readability measure – The Bog index. Retrieved June 5, 2010, from http://www.clearest.co.uk/files/TowardsABetterReadabilityMeasure.pdf


10 Validity Evidence for Making Decisions About Accommodated and Modified Large-Scale Tests

Gerald Tindal and Daniel Anderson

G. Tindal, University of Oregon, Eugene, OR 97403, USA; e-mail: [email protected]

The Standards for Educational and Psychological Testing are typically referenced as the starting point for any argument about general education testing. Likewise, they should be the starting point for testing students with disabilities, whether the decision focuses on the use of an accommodation or a recommendation to participate in an alternate assessment based on modified academic achievement standards (AA-MAS). Using five different kinds of evidence, the standards provide a frame of reference focused on making inferences and decisions. “Validity refers to the degree to which evidence and theory support the interpretations of test scores entailed by proposed use of tests . . . It is the interpretations of test scores required by proposed uses that are evaluated, not the test itself” (American Educational Research Association, American Psychological Association, & National Council on Measurement in Education, 1999, p. 9).

The process is iterative, with different kinds of evidence being collected at different points in a testing program or in reference to varying inferences and decisions. For example, in standards-based testing, content-related evidence is considered essential. Usually, this type of evidence comes from an alignment study with state standards. When the focus is on the constructs being tested, evidence on internal structures tends to dominate. In large-scale testing programs, criterion-related evidence may be less valued because of the emphasis on achieving proficiency on grade-level content standards, but nevertheless may be important when considering independent information from other measures of student academic performance. In assessing students with limited English proficiency or students served in special education programs, the focus may turn to response processes, or the degree to which relevant target skills are being measured rather than superfluous access skills. Finally, in any standards-based, large-scale testing program, the consequence of assigning student proficiency arises from documenting adequate yearly progress (AYP). In the end, validity is about inferences and the support behind them.

Generally, these sources of evidence are used to counter the threats of construct misrepresentation and construct under-representation. In the former, irrelevant constructs are being conflated with those the test was designed to measure. For example, many mathematics tests require considerable reading skills and therefore misrepresent mathematics achievement. In the latter, constructs are under-represented by tapping primarily prerequisite or partial skills that do not fully reflect the construct for which the test was designed. A common example of this might be a simple mathematics story problem that reflects only computation skills, not the decision-making skills needed to distinguish relevant from irrelevant information for solving mathematical problems. These sources of evidence, along with documentation of reliability (or reproducibility), however, are really only part of the argument; indeed, they represent only the beginning of the argument.

Although proper test design (see Downing & Haladyna, 2006) is used to assemble a validity argument and organize the collection of supporting evidence, further procedural and statistical documentation is needed to document the process (test administration and scoring) and the outcome (decisions and consequences). In standards-based testing, participation is a critical issue across subgroups and performance levels. Standardization is another issue, often spilling into the use of accommodations or changes in the test so that specific subgroups have equitable access. To fully understand the outcomes, evaluations of testing programs need to include concurrent information about populations of students and implementation by teachers.

A problem exists, however, in that student groups are often more fluid than would be desired and administration and scoring are more flexible than normally assumed. For example, in the mid-1980s, the findings from the Institute for Research on Learning Disabilities (IRLD – University of Minnesota) raised serious doubts about the identification of students with learning disabilities (Ysseldyke et al., 1982). Therefore, reporting test outcomes by disability groups may be problematic. Another significant problem is that the Standards (AERA, APA, NCME, 1999) confuse critical terms in test administration for students with disabilities, treating modifications and accommodations as the same within a validity argument: “A variety of test modification strategies have been implemented in various settings to accommodate the needs of test takers with disabilities” (p. 103). As a consequence, confusion may exist in defining the (substantive) nature of changes: When is a change sufficiently substantial that it reflects a different construct and should be reserved for a different kind of decision?

In both instances, validity evidence can no longer be assembled from test design and development alone. Rather, other sources of evidence need to be collected to document the effects of testing specific groups and making specific changes. From the perspective of the testing program (not just the test), these sources of variance are embedded in practice: in the training and implementation process and in the procedures used in giving the test to students.

We turn to the evidence from quasi-experimental and experimental studies conducted with varying levels of control that seek to affirm causal inference or, at the very least, clarity of interpretation. At this point, the validation process turns to research studies in which tests and measures are used in the field under varying conditions. In the course of the study, students are assigned testing conditions, teachers are trained in testing administrations, and data are collected on the outcomes (using some type of dependent variable). This research base (experimental and quasi-experimental) also needs to be considered in addition to the evidence collected as part of proper test design and construction. Here the term validity refers to “the approximate truth of an inference” (Shadish, Cook, & Campbell, 2002, p. 34).

In considering the evidence from these research studies, different criteria are used to judge their adequacy (or, more properly, the adequacy of the causal description or explanation). Generally, four types of validity evidence can be documented, with each type focusing on a threat that counters or weakens causal inferences (Shadish et al., 2002). The first type involves internal validity and focuses on the control of the study and how false conclusions can be made between cause and effect: “The validity of inferences about whether observed co-variation between A (the presumed treatment) and B (the presumed outcome) reflects a causal relationship from A to B as those variables were manipulated or measured” (p. 38). For example, many studies involve intact groups of students without random assignment and administer tests over long periods of time. As a consequence, confounding variables may be present (from participant backgrounds, histories, settings, testing, etc.) that make any causal inferences suspect. The second type of validity is statistical conclusion validity, or “the validity of inferences about the correlation (co-variation) between treatment and outcome, including effect sizes” (p. 38). For example, accommodation studies have varying sample sizes receiving treatments of unknown integrity, use specific outcome measures, and base conclusions on various significance tests. The central question is whether the study can reflect co-variation by having sufficient power, in the presence of a known treatment, and using appropriate analytical techniques on an adequate outcome measure. The third type of validity is external validity, or “the validity of inferences about whether the causal relationships holds over variation in persons, settings, treatment variables, and measurement variables” (p. 38). Can the findings from the study be generalized to others who are similar to the participants and given similar circumstances (treatments and outcomes)? For example, in studies of students with disabilities, the manner in which they have been labeled, recruited, treated, or measured may limit generalizations. Finally, the fourth type of validity is construct validity, “defined as the degree to which inferences are warranted from the observed persons, settings, and cause and effect operations included in a study to the constructs that these inferences might represent” (p. 38). For example, with a specific accommodation administered to a population in a study with relevant quasi-experimental controls and findings (conclusions) generated, can the study provide an appropriate explanation?

In the next section of this chapter, we organize our review around these two validation perspectives. The first focuses on response processes from the Standards for Educational and Psychological Testing (1999). Have researchers adequately reflected upon the test and its demands in conjunction with the students and their characteristics? We use two areas from accommodations research (extended time and read aloud) to illustrate the research basis. The second part of the review approaches the validation process from the criteria used to judge the adequacy of quasi-experimental and experimental research (Shadish et al., 2002). We focus primarily on construct validity as it addresses the explanation in the most comprehensive manner. Again, we consider only studies within two areas of accommodations (extended time and read aloud).

Current Practice in Testing Approaches and Implications for Response Processes

With increased attention to large-scale testing and the disaggregation of outcomes by disability and English-language proficiency required by the No Child Left Behind Act (NCLB), concurrent emphasis has been placed upon participation and accommodation, which are necessary components of all large-scale testing programs (No Child Left Behind Act, 2002). Thompson, Johnstone, Thurlow, and Altman (2005) report that all states now document accommodations use on statewide testing. According to a 2007 report by the National Center on Educational Outcomes (NCEO), 51% of students with Individualized Education Programs (IEPs) receive accommodations on statewide tests (Thurlow, Altman, Cuthbert, & Moen, 2007). When accommodations are viewed as inadequate to successfully document student proficiency, some states have developed AA-MAS. These assessments are designed to be aligned with grade-level content standards but reduce the complexity of the items: “The expectations of content mastery are modified, not the grade-level content standards themselves” (U.S. Department of Education, April, 2007, p. 9). They are designed for a small group of students with a disability that would prevent their attainment of grade-level proficiency in the same time frame as their peers.

In this section, we focus on response processes using two lines of test accommodations research to explicate the relation between task demands and student characteristics in understanding response processes for (a) extended time and (b) read aloud. We use the Standards (AERA, APA, NCME, 1999) to define this construct (response processes) as “the fit between the construct and the detailed nature of the performance or response actually engaged in by examinees . . . validation may include empirical studies of how observers or judges record and evaluate data along with analyses of the appropriateness of these processes to the intended interpretation of construct definition” (pp. 12–13). In our analysis, we believe it is important to go beyond the focus on effectiveness of accommodations because the procedures are quite disparate and the findings considerably inconsistent, with little attention given to the manner in which accommodations are aligned with task demands and student characteristics.

Early research on accommodations was quite limited, with only a few published peer-reviewed studies (National Center on Educational Outcomes, undated). In 1999, 134 studies were found and described in a summary by Tindal and Fuchs (1999); this research base was further updated by the National Center on Educational Outcomes (2001). However, both reports provided only a description of the studies, including their methodologies, but without results on the outcomes of various accommodations. Finally, in the past decade, research on the effects of test accommodations has expanded, with two meta-analyses completed (Chiu & Pearson, 1999; Sireci, 2004).

Nevertheless, little research has focused on the decision-making process itself, either in the descriptive summaries of research or in the documentation of outcomes. While there have been helpful manuals and procedural documents that explicate the process for selecting accommodations (Thompson, Morse, Sharpe, & Hall, 2005), very few empirical studies have been conducted on these processes. As a consequence, we know little about how teachers actually recommend or select specific accommodations. Generally, what little research exists is not particularly supportive of the process (see Fuchs, Fuchs, Eaton, Hamlett, & Karns, 2000; Hollenbeck, Tindal, & Almond, 1998).

As we view the decision-making process for either recommending specific accommodations or assigning students to participate in the AA-MAS, the focus needs to be on (a) the task presented (i.e., the specific domain the measure is assessing) and (b) the student’s individual background, knowledge, and skills. Together, these characteristics should guide the decision to either (a) provide an accommodation to the standard test or (b) recommend the student take an AA-MAS. This decision is likely to reflect the interaction between the two; hopefully, however, it also is guided by the available research. As we noted earlier, an additional consideration should be the research basis and its quality.

In our view, however, the definition and analysis of task demands need to be specific enough to be useful. For instance, an investigator may state that they have used a “math” measure, without explicitly stating or analyzing the different sub-domains within the measure. Given that math tasks may require qualitatively different skills (e.g., algebra versus geometry), the effect of the decision may be confounded. For example, it may be inferred that the accommodation provides a “differential boost” (Fuchs, Fuchs, Eaton, Hamlett, & Karns, 2000) to students on items assessing geometric relationships but not on items assessing algebraic relationships. A similar problem exists in the way student characteristics are considered or the manner in which students have been labeled. Students with a range of academic difficulties may all be grouped under the label “learning disabled” (Morris et al., 1998). Yet the effect of the decision may vary substantially between students with the same label but different skills as well as between students with different labels. Unfortunately, students receive labels with little detail about the specific behaviors associated with the label (and not all labels are about disabilities). For instance, if a read aloud accommodation is being investigated, how do students classified as “in need” of the accommodation qualify? Is it because they have received some sort of label from the school? Or is it more specific, such as the student demonstrating low fluency? If so, how low is “low”?

Between these two variables (task demands and student characteristics) are potential interactions that may influence the outcome. Future research may examine these factors explicitly; however, paying greater attention to the task demands and student characteristics may alleviate considerable confusion. In the remainder of this chapter, we present a review of literature but limit it to experimental and quasi-experimental research in K-12 settings. Additionally, because considerably more research has been conducted on accommodations, we focus primarily on this area but argue that the same issues apply to decisions about selecting the AA-MAS for students. To keep the chapter within a reasonable length, we further limit our focus to two areas of accommodations research that have been most heavily investigated: extended time and read aloud. These two areas are among the most prevalent accommodations used in practice and are, therefore, among the most researched (Thurlow, 2007).

Although evidence of accommodation effectiveness is quite disparate, ranging from highly effective (Huesman & Frisbie, 2000; Johnson, 2000) to completely ineffective (Cohen, Gregg, & Deng, 2005; Meloy, Deville, & Frisbee, 2002), we are less concerned with the outcome than with the relation between task demands and student characteristics. For example, some research shows how accommodations provide a differential boost for students with disabilities (Tindal, Heath, Hollenbeck, Almond, & Harniss, 1998), while other studies show the effect as homogeneous across groups (Elliott & Marquart, 2004; Meloy et al., 2002; Munger & Loyd, 1991). Still others show a differential boost for students without a disability (Elbaum, 2007; Elbaum, Arguelles, Campbell, & Saleh, 2004; Fuchs, Fuchs, Eaton, Hamlett, & Karns, 2000; Tindal et al., 1998). Examining the task demands and the student characteristics may help explain these results. Unfortunately, as we document with the research, this information is often unavailable for both variables.
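To make the notion of a differential boost concrete, the following minimal sketch works through the arithmetic with hypothetical group means (the numbers and the Python snippet are ours, not results or code from any study cited here): an accommodation provides a differential boost when the gain it yields for the target group exceeds the gain for the comparison group.

# Hypothetical mean scores (percent correct), used only to illustrate the
# differential-boost comparison; these are not data from any reviewed study.
means = {
    ("LD", "standard"): 48.0,
    ("LD", "accommodated"): 58.0,
    ("non-LD", "standard"): 72.0,
    ("non-LD", "accommodated"): 75.0,
}

boost_ld = means[("LD", "accommodated")] - means[("LD", "standard")]              # 10.0
boost_non_ld = means[("non-LD", "accommodated")] - means[("non-LD", "standard")]  # 3.0

# A differential boost is a group-by-condition interaction: the accommodation
# helps the target group more than it helps the comparison group.
print(boost_ld - boost_non_ld)  # 7.0, evidence of a differential boost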

In Tables 10.1 and 10.2, we present a summary judgment of adequacy for a sample of the research literature on extended time and read aloud accommodations. The judgments come from an analysis of the study descriptions of task demands and student characteristics. Task demands are rated on a 0–2 scale. Studies describing the task as a generic "math" or "reading" test received a score of 0; studies describing, but not analyzing, sub-domains received a score of 1; and studies both describing and analyzing sub-domains received a score of 2. Student characteristics are rated on a 0–3 scale. Studies that grouped students into generic, school-assigned "LD/non-LD" categories received a score of 0; studies providing any additional distinctions received a score of 1; studies providing a specific label (e.g., dyslexia) received a score of 2; and studies using specific measures to categorize students (e.g., all students scoring below the nth percentile) received a score of 3. These criterion ratings are then totaled to provide an overall criterion rating on a 0–5 scale.
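Readers who wish to apply the same rubric to additional studies can state it directly as a small scoring function. The sketch below is a Python rendering of the two scales described above; the dictionary keys and function name are our own shorthand rather than part of any published instrument, and the example call reproduces the Fletcher et al. (2006) total of 5 reported in Table 10.1.

TASK_SCALE = {
    "generic label only": 0,                    # e.g., simply a "math" or "reading" test
    "sub-domains described": 1,                 # described but not analyzed
    "sub-domains described and analyzed": 2,
}

STUDENT_SCALE = {
    "generic LD/non-LD grouping": 0,
    "additional distinctions": 1,
    "specific label": 2,                        # e.g., dyslexia
    "measure-based classification": 3,          # e.g., below the nth percentile
}

def criteria_rating(task_description: str, student_grouping: str) -> int:
    """Total criteria rating on the 0-5 scale used in Table 10.1."""
    return TASK_SCALE[task_description] + STUDENT_SCALE[student_grouping]

# Fletcher et al. (2006) analyzed sub-domains and classified students with a
# percentile cutoff, so the study receives the maximum rating: 2 + 3 = 5.
print(criteria_rating("sub-domains described and analyzed",
                      "measure-based classification"))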

We primarily include only recent studies, just prior to and following passage of NCLB, although we reference the Munger and Loyd (1991) study because of its strong research design. We also focus on experimental and quasi-experimental studies rather than field-based descriptive studies. The Cohen et al. (2005) study was non-experimental, but it is included because the sample was large enough to be limited to only students receiving one accommodation (extended time).

Table 10.1 Descriptions for task demands and student characteristics for extended time

Article | Task description rating (0–2) | Accommodation grouping rating (0–3) | Total criteria rating (0–5)
Elliott and Marquart (2004) | 0 | 1 | 1
Munger and Loyd (1991) | 1 | 0 | 1
Fuchs, Fuchs, Eaton, Hamlett, and Karns (2000) | 2 | 0 | 2
Cohen et al. (2005) | 2 | 0 | 2
Fletcher et al. (2006) | 2 | 3 | 5
Fuchs, Fuchs, Eaton, Hamlett, Binkley, et al. (2000) | 2 | 0 | 2
Huesman and Frisbie (2000) | 2 | 0 | 2


Table 10.2 Descriptions for task demands and student characteristics for read aloud

Article | Task description rating (1–3) | Accommodation grouping rating (1–4) | Total criteria rating (2–7)
Crawford and Tindal (2004) – reading | 3 | 2 | 5
Elbaum (2007) – reading | 2 | 2 | 4
Fletcher et al. (2006) – reading | 3 | 4 | 7
Fuchs, Fuchs, Eaton, Hamlett, Binkley, et al. (2000) – reading | 3 | 1 | 4
Hale et al. (2005) – reading | 3 | 3 | 6
Elbaum et al. (2004) – math | 3 | 1 | 4
Helwig et al. (1999) – math | 2 | 4 | 6
Helwig et al. (2002) – math | 2 | 2 | 4
Johnson (2000) – math | 1 | 2 | 3
Tindal et al. (1998) – math | 1 | 1 | 2
Meloy et al. (2002) – all subjects | 3 | 2 | 5

In the Cohen et al. (2005) study, differential item functioning (DIF) analyses were then conducted to examine the difficulty of items in the accommodated condition versus a random sample of students in the un-accommodated condition.
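DIF analyses can be conducted in several ways, and the sketch below is not a reconstruction of the Cohen et al. (2005) analysis. It only illustrates, with invented counts, what such an analysis compares: whether accommodated and un-accommodated examinees of comparable overall proficiency succeed on an item at different rates, summarized here by the Mantel-Haenszel common odds ratio.

import numpy as np

# Invented 2x2 tables for one item, one table per total-score stratum.
# Rows: group (accommodated, un-accommodated); columns: (correct, incorrect).
strata = [
    np.array([[30, 20], [45, 15]]),   # low total-score stratum
    np.array([[50, 10], [70,  5]]),   # high total-score stratum
]

def mantel_haenszel_or(tables):
    """Mantel-Haenszel common odds ratio across strata; values far from 1.0
    flag the item for possible DIF between the two groups."""
    num = sum(t[0, 0] * t[1, 1] / t.sum() for t in tables)
    den = sum(t[0, 1] * t[1, 0] / t.sum() for t in tables)
    return num / den

print(f"MH odds ratio: {mantel_haenszel_or(strata):.2f}")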

Extended Time Research

Typically, extended time involves providing a reasonable amount of extra time (often up to one-half the standard amount of time) so that students may finish the test. This is an important accommodation because standards-based tests are designed to measure what students can do, not just what they did do. If students' performance is low because an arbitrary time limit is reached, no one can know what the student can do. Furthermore, it helps distinguish missing data (items left blank) from incorrect data (items with the wrong option selected). Otherwise, these two types of data are conflated (and missing items become de facto incorrect items).

Task Demands

Many studies provide exemplary descriptions of the task demands (e.g., Cohen et al., 2005; Fletcher et al., 2006; Fuchs, Fuchs, Eaton, Hamlett, Binkley, et al., 2000; Fuchs, Fuchs, Eaton, Hamlett, & Karns, 2000; Huesman & Frisbie, 2000). Rather than simply stating the task as "math" or "reading," the authors of these studies describe domains such as "plane geometry" (Cohen et al., 2005), "reading comprehension" (Fletcher et al., 2006; Huesman & Frisbie, 2000), or "concepts and applications" (Fuchs, Fuchs, Eaton, Hamlett, & Karns, 2000). Additionally, each of these studies analyzes the effect of the accommodation on each sub-domain, rather than on the test as a whole. Other studies (e.g., Munger & Loyd, 1991) provide a description of the sub-domains but provide no analysis of them; therefore, it is not possible to document any potential interaction between the accommodation and the task sub-domain. Still other studies provide detailed accounts of the test development process, while the analysis is based on a generic label (Elliott & Marquart, 2004).

Student Characteristics

While the majority of studies describe the task demands, student characteristics are often not described beyond a generic label (Cohen et al., 2005; Fuchs, Fuchs, Eaton, Hamlett, Binkley, et al., 2000; Fuchs, Fuchs, Eaton, Hamlett, & Karns, 2000; Huesman & Frisbie, 2000; Munger & Loyd, 1991). Elliott and Marquart (2004) provide a more detailed account of student characteristics than is typical, classifying students into three groups: students identified as LD, non-LD students, and students educationally at risk in math. The at-risk label came from teacher ratings on an evaluation scale. This is a helpful step in going beyond the standard labels, although, as the authors themselves highlight, students grouped under the generic LD label were heterogeneous, comprising "students with mild learning disabilities, emotional disabilities, behavioral disabilities, mild physical disabilities, speech and language disabilities, and mild cognitive disabilities" (Elliott & Marquart, 2004, p. 354). It is reasonable to expect that students within each of these disability categories respond to accommodations differently. Fletcher et al. (2006) used a detailed descriptive label (dyslexic), and theirs was the only study found that classified students as in need of the accommodation based on documented performance.

Read Aloud Research

This accommodation involves having tests read to the student (a) to provide access to a test that would otherwise be inaccessible, and (b) so that irrelevant access skills may be distinguished from relevant target skills. However, considerable variation exists in (a) which parts of the test can be read and (b) the manner in which the reading is done. For example, test directions often are read; on occasion, the prompts (but not the options) are read; finally, the subject matter often is considered in the use of read aloud accommodations, with mathematics more typically targeted than reading tests. The delivery itself also varies, depending on whether the accommodation is read by a trained person or by a controlled reader through either a compact disc or a computer.

Task Demands

Of the 12 studies reviewed on the read aloud accommodation, six describe and analyze the task demands in detail (Crawford & Tindal, 2004; Elbaum et al., 2004; Fletcher et al., 2006; Fuchs, Fuchs, Eaton, Hamlett, Binkley, et al., 2000; Hale, Skinner, Winn, Allin, & Molloy, 2005; Meloy et al., 2002); four describe the task demands in detail but conduct no analyses on the sub-domains (Elbaum, 2007; Helwig, Rozek-Tedesco, & Tindal, 2002; Helwig, Rozek-Tedesco, Tindal, Heath, & Almond, 1999); and two do not describe task demands in sufficient detail (Johnson, 2000; Tindal et al., 1998).

Reading comprehension was the most common task demand specifically described. However, the way this task was accommodated differed markedly among studies, ranging from the student reading the test aloud on their own (Fuchs, Fuchs, Eaton, Hamlett, Binkley, et al., 2000), to the test being administered with a video read aloud (Crawford & Tindal, 2004) or by a designated proctor (Elbaum, 2007; Hale et al., 2005). Still others read aloud only the proper nouns, stems, and answer options to the student (Fletcher et al., 2006). Both the Fuchs, Fuchs, Eaton, Hamlett, Binkley, et al. (2000) and the Hale et al. (2005) studies further described the item demands, stating the number and type of each item. These descriptions were not carried through to item-level analyses in either study; however, unlike differences between domains, the subtleties of item type differences within a domain are not likely to confound the observed effectiveness of a particular accommodation. Meloy et al. (2002) read aloud four sub-tests of the Iowa Test of Basic Skills (ITBS), covering Reading Comprehension, Science, Math Problem-Solving and Data Interpretation, and Usage and Expression; separate analyses were conducted for each test.

In mathematics, the task demands were often described in detail (Elbaum, 2007; Helwig et al., 2002, 1999; Johnson, 2000; Tindal et al., 1998), including tasks such as "number sense and operations, geometry, data analysis and probability, algebraic thinking, and measurement" (Elbaum, 2007, p. 221) or "number concepts, mathematic relationships, geometry, estimation, statistics, and measurement" (Helwig et al., 1999, p. 116). However, sub-domain analyses were not conducted in any of these studies.


Table 10.3 Description of research elements for extended time studies

Elliott and Marquart (2004). Participants: 97 8th graders from 4 middle schools in 4 Iowa districts; 23 students with identified disabilities, 23 students "at risk," and 51 students at or above grade level. Content area: Math. Design: Crossed. Treatment: Extended/not extended time. Measure: Two equivalent short forms of TerraNova; 18 Accommodations Survey; Academic Competence Evaluation Scale (ACES).

Elliott et al. (1999). Participants: Rhode Island 4th graders; total n: 11,273–11,429; Sped: 1,264–1,306. Content area: Math, Writing, Health. Design: Comparative/Nested. Treatment: Examined whether or not students received an accommodation and, if so, what type and its effect on scores by student groups. Measure: Rhode Island Performance Assessment.

Munger and Loyd (1991). Participants: 222 5th graders from 18 elementary schools in 6 school districts in Virginia; 6 were physically handicapped, 94 LD, 112 with no handicapping condition. Content area: Math & Language Arts. Design: Crossed. Treatment: Timed/not timed. Measure: ITBS Language usage & expression (n = 109); ITBS Math concepts (n = 113).

Fuchs, Fuchs, Eaton, Hamlett, and Karns (2000). Participants: 373 students; 190 LD (129 Grade 4, 63 Grade 5); 181 non-LD (Grade 4). Content area: Math. Design: Crossed. Treatment: Administered CBM under standard conditions and then under various accommodations. Measure: Two alternate forms of computation CBM, four alternate forms of concepts and applications CBM, and five alternate forms of problem solving CBM, all at 3rd grade level.

Cohen et al. (2005). Participants: Data obtained from the 2003 administration of the Florida Comprehensive Assessment Test; 211,601 9th graders evaluated; a random sample of 1,250 LD and 1,250 without LD evaluated. Content area: Math. Design: Comparative. Treatment: Only the students receiving extended time as their only accommodation were included in the analysis. Measure: Florida Comprehensive Assessment Test.

Huesman and Frisbie (2000). Participants: Students from two districts, 6th grade; 129 LD; 409 non-LD. Content area: Reading. Design: Crossed. Treatment: 61 LD students had both extended and standard time, 68 had only extended time; non-LD students had both conditions. Measure: ITBS reading comprehension, Form K, Level 12.


Table 10.4 Description of research elements for read aloud studies

Crawford and Tindal (2004). Participants: 338 students (89 Title 1 and 76 special education) in 4th and 5th grades. Content area: Reading (five passages with 5–8 questions in each form). Design: Crossed design (within-subjects analysis). Treatment: Videotape of items read aloud versus standard test administration. Measure: Forms A and B of each Reading Test.

Elbaum (2007). Participants: 643 students in grades 6–10 (388 with LD). Content area: Mathematics. Design: Classrooms assigned to one of four conditions (treatment and form), allowing for a 1-between, 1-within ANOVA. Treatment: Teacher read aloud with timed pacing for students to respond versus standard administration. Measure: Two author-constructed multiple-choice math tests (30 items per form) using practice items from state tests and controlled for linguistic complexity.

Fletcher et al. (2006). Participants: 3rd grade students (91 with dyslexia and 91 average). Content area: Reading. Design: Students randomly assigned to either the standard or the accommodated condition and tested in small groups. Treatment: Extended session over two blocks, with teachers reading proper nouns as well as stems and possible responses, versus standard test administration. Measure: Texas Reading Assessment of Knowledge and Skills (TAKS).

Fuchs et al. (2000). Participants: 4th and 5th grade students with LD (181) and without LD (184). Content area: Reading. Design: Crossed (using within-subjects comparisons). Treatment: Students received extended time, large print, and read the test aloud versus standard test administration. Measure: Iowa Test of Basic Skills: Reading Comprehension.

Hale et al. (2005). Participants: 4 male secondary students diagnosed as emotionally disturbed. Content area: Reading. Design: Crossed (time series design over 9 sessions). Treatment: Listening and listening-while-reading versus a silent reading control condition. Measure: Timed reading passages (number and rate in comprehension).

Helwig et al. (1999). Participants: 247 elementary-age students grouped as high and low on reading fluency and math computation. Content area: Mathematics. Design: Crossed design (using within-subject comparisons). Treatment: Videotaped read aloud of a state math test with paced presentation versus a standard administration. Measure: 21-question computational test and a statewide mathematics test.

Johnson (2000). Participants: 115 4th grade students. Content area: Mathematics. Design: Crossed design; students assigned according to education placement: (a) general education control (n = 39), (b) general education students who were poor or good readers (n = 38), and (c) special education for reading (n = 38), versus a standard administration. Treatment: Proctors read items verbatim to half of the students in group B and to students in group C. Measure: Washington statewide achievement test (WASL).

Kettler et al. (2005). Participants: 118 4th graders (49 with disabilities) and 78 8th graders (39 with disabilities). Content area: Reading and Mathematics. Design: Crossed; students with disabilities assigned to accommodations based on their IEP and each paired with a student without disabilities; administration counter-balanced with and without the accommodation. Treatment: Project assistants administered the accommodation. Measure: TerraNova Multiple Assessment Battery: two reading and two math subtests.

Meloy et al. (2002). Participants: A total of 260 students (98 6th graders, 84 7th graders, 78 8th graders); 62 students with reading disabilities and 198 without. Content area: All subject areas: Reading, Writing, Mathematics, and Science. Design: Nested; students randomly assigned to take all tests in either an accommodated or a standard administration. Treatment: The read aloud condition was scripted (with slightly more time). Measure: Iowa Test of Basic Skills: Science, Usage and Expression, Math Problem-Solving and Data Interpretation, and Reading Comprehension.

Tindal et al. (1998). Participants: 114 4th grade students with and without disabilities. Content area: Mathematics. Design: Nested; students randomly assigned to either a read aloud condition or a standard test administration (allowing use of a 1-within and 1-between ANOVA). Treatment: The test was administered in both conditions by trained graduate students. Measure: Oregon Statewide Test.


This lack of sub-domain analysis resulted in lower task-description ratings for the math studies in Table 10.2; the research elements of the reviewed studies are summarized in Tables 10.3 and 10.4. The overall lower ratings for math than for reading suggest either (a) more stringent standards used to evaluate the math articles or (b) a lack of research evaluating math sub-domains. Unlike reading comprehension, the accommodated read-aloud condition in math was quite consistent: conditions included the test being read by a trained teacher or proctor (Elbaum, 2007; Johnson, 2000; Tindal et al., 1998) or a video read aloud (Helwig et al., 2002, 1999).

Student Characteristics

Of the studies reviewed, only Fletcher et al. (2006) and Hale et al. (2005) used a measure-based percentile cutoff approach to classify students. Fletcher et al. "required that the participating students clearly demonstrate word recognition difficulties" (p. 139). To ensure this, the letter-word identification and word attack subtests of the Woodcock-Johnson III were administered. If the student scored above the 26th percentile, or the student's teacher had another assessment showing the student functioning above the 25th percentile, the student was no longer evaluated. This approach actually documented the achievement level of students who were deemed "in need" of the accommodation. Students in the Helwig et al. (1999) study took a test of oral reading fluency along with a basic math skills test. Data analyses were then conducted on four student groups: low reader/high math, medium reader/high math, high reader/high math, and low reader/low math. However, too few students were available in the medium reader/low math or high reader/low math groups for meaningful analyses to be conducted.
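Stated as a decision rule, the Fletcher et al. (2006) screen amounts to a percentile cutoff applied to whatever word-recognition evidence is available. The sketch below is a simplified paraphrase of that logic, not their actual procedure; the function name is ours, and the inputs are assumed to be percentile ranks from the Woodcock-Johnson III subtests or a teacher-supplied assessment.

from typing import Optional

def needs_read_aloud(letter_word_id_pct: float,
                     word_attack_pct: float,
                     other_assessment_pct: Optional[float] = None) -> bool:
    """Flag a student as 'in need' of the read aloud accommodation only when
    every available piece of evidence places word recognition at or below the
    25th percentile (a simplified restatement of the Fletcher et al. screen)."""
    evidence = [letter_word_id_pct, word_attack_pct]
    if other_assessment_pct is not None:
        evidence.append(other_assessment_pct)
    return max(evidence) <= 25

print(needs_read_aloud(18, 22))      # True: all evidence falls below the cutoff
print(needs_read_aloud(18, 22, 40))  # False: the extra assessment rules the student out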

Fletcher et al. (2006) also used a specific school-assigned disability label to describe students (dyslexic), as did Hale et al. (2005) with their focus on emotional disturbance. The most common approach to classifying students was to use a generic, school-assigned label (e.g., LD). Fuchs, Fuchs, Eaton, Hamlett, Binkley, et al. (2000) have noted that "the appropriate unit of analysis in accommodation decisions is the individual, not the LD label" (p. 69) and that the heterogeneity within the LD group "makes conceptual analysis of meaningful test accommodations impossible" (p. 68). Despite this, the authors did not go beyond the school-identified labels of LD and non-LD during data analysis, similar to others (Elbaum et al., 2004; Tindal et al., 1998).

Some studies classified students by generic labels but provided careful descriptions of how the school deployed the label (e.g., a discrepancy model or RTI) or provided other achievement information, such as state test results (e.g., Elbaum, 2007; Hale et al., 2005; Meloy et al., 2002). Other researchers stratified students into more than the typical two or three groups (Special Education, General Education, and Title 1; Crawford & Tindal, 2004), while still other studies used a specific label (learning disabled in reading, LD-R; Helwig et al., 2002; Johnson, 2000; Meloy et al., 2002). The LD-R label was deemed slightly more specific than a general LD label; however, students identified as LD-R are still a largely heterogeneous group. Johnson (2000) never explicitly used the LD-R label but stated that the sample "consisted of students receiving special education services for reading disabilities, as defined by Washington's Administrative Code" (pp. 262–263). Meloy et al. (2002) used the school's definition of LD-R and further qualified the sample by stating that "students with additional diagnoses or service labels such as behaviorally or emotionally disturbed were not included in the study" (p. 250).

Research Designs and Quality of Research

As noted earlier, the experimental and quasi-experimental research can be analyzed by the degree to which causal inferences are plausible from known treatments (accommodations) to documented outcomes (performance on a state test or other measure of achievement).

In this analysis, threats to internal validity focus on the co-variation of other causes associated with an effect and can arise from a number of sources: ambiguous temporal sequences, subject selection biases, history and maturation, regression, attrition, testing, or instrumentation effects. As can be seen in these sources of threats to internal validity, the focus is on causal inference in the context of study designs. Each of the threats presents a counter-explanation of why an effect was achieved irrespective of the treatment, which otherwise should provide the best explanation. Internal validity is thus concerned with clarity of causal reasoning and with the errors that can arise when an effect is attributed to a specific cause.

Threats to statistical conclusion validity reflect the strength of covariance between cause and effect and come from a number of sources: violated assumptions of the statistical tests, low statistical power, repeated analyses of data, restricted range of performance, marked heterogeneity among populations, unreliable treatments, extraneous factors in the settings, unreliability of measures, and inaccurate effect sizes. In these threats, the focus is on the degree to which co-variation can be documented; each threat weakens the ability of an experiment to ascertain the presence of an effect, given a presumed cause. Statistical conclusion validity is concerned with errors resulting from the statistical analyses used to document co-variation.

With external validity, the focus is on generalizations across students, settings, treatments, and measures. A number of threats produce errors arising from an interaction of causal relations among these four dimensions of a study, limiting inferences to only those students, settings, treatments, or measures that were used in a study. External validity thus concerns causal inferences (in size or direction) interacting with students, settings, treatments, or measures.

Finally, and perhaps most importantly, there are threats to construct validity that concern the relation between the operations used in a study and the constructs used to describe them. A number of threats are present, including inadequate explication of constructs, confounding constructs, bias from mono-method or mono-operations, constructs interacting with each other, measurement reactivity, study participation and feedback, reactivity to experimental situations, expectancies from experimenters, novelty and disruption, compensatory equalization or rivalry, demoralization, and treatment diffusion. Construct validity addresses explanatory descriptions of the study and the generalizations that can be made.

Extended Time

Of the six studies we reviewed on extended time, the internal validity of the studies reflected a clear direction of causal inference. Researchers conducted studies that were clear on the temporal sequence (of the treatment) and provided clean comparisons of subjects across treatment and control conditions. However, most studies also used convenience samples, so subject selection biases may have been present. All studies were conducted over short time periods, so history and maturation, as well as regression and attrition, were clearly not viable threats. Because of the crossed conditions, testing effects may have been present in studies that did not counterbalance the introduction of the conditions. Finally, with crossed designs, testing and instrumentation effects may have been present. In general, researchers on accommodations have produced studies that avoid many threats to internal validity, thus providing assurance of causation when (significant) effects were found.

Most of the research on extended time accommodations reflected studies with appropriate statistical conclusion validity. Researchers generally used appropriate statistical tests and appropriately mined the data files. All studies had sufficient sample size to conduct various statistical tests. With some populations (e.g., students with disabilities), restricted ranges of performance were present, perhaps lessening the ability to show the strength of co-variance between treatments and outcomes; nevertheless, the research on accommodations tended to include a wide range of (general education) students, reflecting considerable heterogeneity. Few treatments were specifically defined or monitored, although they contained few extraneous factors. Finally, reliability of the measures was seldom documented. The research on accommodations in general included few controls for threats to statistical conclusion validity, with considerable improvements needed before inferences can be made about the presence (or strength) of co-variation between accommodations and outcomes.

External validity is the area with the most threats. Only Elliott, Bielinski, Thurlow, DeVito, and Hedlund (1999) and Cohen et al. (2005) had sufficient numbers of students to generalize to large-scale test populations. At the other extreme was the study by Hale et al. (2005), which included a very small sample (n = 4). All of the accommodation studies reflected very idiosyncratic contexts, limiting generalizations to other students, settings, treatments, and measures. No studies have been replicated.

Threats to construct validity (including consideration of the treatment and the outcome measure) were present across all studies. With extended time, treatment integrity was rarely an issue as it was either defined or explicitly monitored. However, extended time in the field is rarely operationalized as a unitary construct; it is often confounded with setting and grouping to allow schools an efficient manner of implementation. Clearly, then, the research on extended time has more construct clarity than is executed in practice. Most of the researchers used crossed designs, in which students served as their own controls (e.g., they received both the experimental accommodation treatment and the control standard test administration). Because crossed designs were primarily used, a number of threats were minimized in the research, including study participation, reactivity to experimental conditions, novelty or disruption, and compensatory artifacts. Experimenter expectancies were never addressed in the research.
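Because students in these crossed designs serve as their own controls, the differential-boost question reduces to comparing within-student gains across groups. The sketch below generates hypothetical paired scores and tests the group difference in gains with an independent-samples t test; it is one common way to analyze a 1-between, 1-within layout, not the analysis used in any particular study reviewed here.

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical crossed data: every student is tested under both conditions.
standard_swd = rng.normal(48, 10, 30)
accommodated_swd = standard_swd + rng.normal(8, 5, 30)   # larger average gain
standard_gen = rng.normal(72, 10, 30)
accommodated_gen = standard_gen + rng.normal(2, 5, 30)   # smaller average gain

# Each student's boost is the within-subject gain from the accommodation.
boost_swd = accommodated_swd - standard_swd
boost_gen = accommodated_gen - standard_gen

# Comparing mean boosts across groups tests the group-by-condition interaction
# (the differential boost) in this 2 x 2 crossed layout.
result = stats.ttest_ind(boost_swd, boost_gen)
print(f"mean boost SWD = {boost_swd.mean():.1f}, general = {boost_gen.mean():.1f}, "
      f"t = {result.statistic:.2f}, p = {result.pvalue:.4f}")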

Read Aloud Accommodations

We reviewed 10 studies and found similar variation among the researchers as we found with extended time. This variation affects threats to validity in different ways, with the general effect of providing the field only a nascent accumulation of findings to guide practice.

The studies on read aloud reflect a general consistency in the manner in which threats to internal validity have been controlled. Nearly all studies provided a treatment with a clear temporal sequence (some kind of treatment was uniquely introduced). Most of the studies were conducted within a short fixed period of time (typically, a class period). Generally, students were conveniently sampled by using entire classrooms, an artifact that confounds students nested within teachers. Some studies actually used random assignment to treatments, a feature that precludes subject selection bias. For example, Tindal et al. (1998) used a nested design, but their random assignment of students controlled for threats to internal validity. Likewise, Fletcher et al. (2006) randomly assigned students to accommodated and control groups to take a state test. Other factors also were considered in selecting students. In Kettler et al. (2005), the IEP was considered and each student with a disability was then matched to a student without a disability. Performance on a curriculum-based measure administered under an accommodated condition was used to assign students to accommodations (Fuchs, Fuchs, Eaton, Hamlett, Binkley, et al., 2000). Hale et al. (2005) used a time series design and therefore may have established the best control for various threats to internal validity (history, maturation, regression, attrition, testing, and instrumentation). Elbaum (2007), as well as most other researchers, counter-balanced forms and order to control threats from testing and instrumentation. In general, because the time period for treatment was so constrained, few studies had many threats to internal validity from these latter variables. Testing and instrumentation effects were generally ignored (neither controlled nor described). Tests were either a published state test (or an approximation of a state test; Helwig et al., 2002) or a researcher-developed test (Hale et al., 2005).

Studies on read aloud accommodations addressed statistical conclusion validity in a number of ways. The most significant issue was obtaining a sufficient sample size to document co-variation between the treatment and the outcome. Because the focus has been on understanding the effects for students with disabilities, over-sampling has often been used to attain sufficient numbers for conducting parametric analyses (Elbaum, 2007; Fletcher et al., 2006). In some research, sample sizes have been determined ahead of time by considering the power and effect sizes needed (Elbaum et al., 2004). Nevertheless, as in the extended time studies, student populations have been somewhat restricted in performance (special education) as well as heterogeneous (general education); few studies have documented the various skill levels of the participating students. Probably the most significant issue in read aloud accommodations has been the unreliability of treatment implementation and the extraneous factors in the settings that appeared along with the accommodation. The problem with these latter two threats is that any effect may be partially associated with, or caused by, other factors in the procedures or setting. Few studies have addressed the reliability of measures even though considerable attention has been given to their construction.
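Determining sample size in advance, as Elbaum et al. (2004) did, is a power calculation. The sketch below uses the standard normal-approximation formula for comparing mean gain scores between two independent groups; the effect size, alpha, and power values are illustrative assumptions rather than figures drawn from any study discussed here.

import math
from scipy.stats import norm

def n_per_group(effect_size: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate sample size per group for a two-group comparison:
    n = 2 * ((z_(1 - alpha/2) + z_power) / d) ** 2 (normal approximation)."""
    z_alpha = norm.ppf(1 - alpha / 2)
    z_power = norm.ppf(power)
    return math.ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)

# About 63 students per group for a medium effect (d = 0.5) and about 393 for a
# small effect (d = 0.2); exact t-based tables give slightly larger numbers.
print(n_per_group(0.5), n_per_group(0.2))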

Threats related to external validity are prominent in research on read aloud accommodations. However, it is difficult to ascertain the degree of influence because no replications have ever appeared in this body of research. Each study represents a unique group of students, treated uniquely in both the settings and the manner of delivering the accommodation, and measured using study-specific outcomes. Ironically, the external validity of this accommodation is stronger in studies that use more generic treatments, sample a range of students by classroom, and use measures typical of large-scale testing programs. For example, the scripted directions from Elbaum (2007) may be easier to use in practice than the videotaped administration used by Crawford and Tindal (2004).

Construct validity may be the most troubling aspect of research on read aloud accommodations. Great variation existed between the study operations and controls (which may or may not have been present) and the constructs used to explain them. Furthermore, considerable variation existed in both what was read and how it was read. For example, Elbaum (2007) had teachers read aloud with paced timing, Johnson (2000) used proctors, and Crawford and Tindal (2004) used videotaped read alouds. Finally, read aloud accommodations were rarely implemented as a uniform or singular accommodation. Meloy et al. (2002) not only used a script but also implemented extra time. Likewise, Fuchs et al. (2000) assigned extended time and large print along with the read aloud. And, almost endemic to this research, little connection has been documented between the use of this accommodation and the use of assisted reading in the classroom (where students have materials read aloud to them). Therefore, the construct validity of read aloud accommodations has not been clearly established, and the very nature of its implementation has often been associated with other accommodations (e.g., a quiet and separate setting or individual test administration). In the end, several threats associated with construct validity have been present, particularly confounding constructs and constructs interacting with each other. On the other hand, most studies have used crossed designs in which students served as their own control and therefore limited possible threats associated with measurement reactivity, study participation and feedback, reactivity to experimental operations, novelty and disruption, and rivalry and demoralization. The only potential problem associated with this design has been treatment diffusion, an unfortunate side effect of the same teachers implementing both treatments (the accommodation and the comparator condition).

Conclusions on Validity Evidence

Large-scale assessment programs reflect a complex process that requires appropriate test design and development along with the procedural and statistical documentation that provides various test-focused validity evidence for making inferences on content proficiency, internal structures and invariance across subgroups, relations with other measures, or the relation between access and target skills (response processes). In addition, however, validation evidence needs to arise from testing programs in practice, using quasi-experimental and experimental research conducted in the field after measures have been introduced. This is particularly true when specific populations (e.g., students with disabilities) become the focus of test design and administration.

Although we primarily addressed accommodations to standards-based grade-level tests, our findings have a number of implications for students being assigned to alternate assessments judged against modified achievement standards. Using response processes as an important form of validity evidence, we reported on a considerable amount of research on two accommodations: extended time and read aloud. By considering both the demands of the test and the characteristics of the students, we found a number of studies with sufficient specificity to provide the field a solid research base. We also reported on the study designs and the degree to which threats to validity were adequately controlled. Again, a number of studies have been completed on these two accommodations to provide recommendations to the field.

However, a number of shortcomings also were apparent, particularly in the overall manner in which tests have been studied, warranting several recommendations for improving practice. We have organized these conclusions around five topics: (a) definitions of constructs, (b) operationalization of test design and implementation, (c) training of teachers, (d) organization of systems, and (e) validation of outcomes in an iterative fashion.

Definitions of Constructs

Measurement practices need to improve considerably in defining constructs, particularly when testing students with disabilities. Although the Standards (AERA, APA, NCME, 1999) provide a definitive source for clarifying validity evidence, they remain out of sync with the field of accommodations, with language precariously perched between definitions of test changes and the populations for whom the tests were designed. They do not directly address issues of universal design; nor do they address policy (e.g., federal regulations) in which specific populations of students are targeted. This problem becomes compounded when considering threats of construct misrepresentation, particularly in reference to response processes (i.e., analysis of task demands and consideration of student characteristics).

Operationalization of Test Design

The principles of test design have been well articulated for decades, and procedures are well established for how items should be developed and displayed, whether they use selected or constructed responses. The problem is that testing (its design and implementation) also needs to attend to administration. Any variation is considered a threat to valid inferences, particularly construct misrepresentation (Haladyna & Downing, 2004). Standardization becomes key in test administration, even though it is far more encompassing and includes test planning and development (test specifications, item development, test design, assembly, and production):

The administration of tests is the most public and visible aspect of testing. There are major validity issues associated with test administration because much of the standardization of testing conditions relates to the quality of test administration. . . . Standardization is a common method of experimental control for all tests. Every test (and each question or stimulus within each test) can be considered a mini experiment (Van der Linden & Hambleton, 1997). The test administration conditions – standard time limits, procedures to ensure no irregularities, environmental conditions conducive to test taking, and so on – all seek to control extraneous variables in the "experiment" and make conditions uniform and identical for all examinees. Without adequate control of all relevant variables affecting test performance, it would be difficult to interpret examinee test scores uniformly and meaningfully. This is the essence of the validity issue for test administration conditions. (Downing, 2006, p. 15)


Unfortunately, the research on accommodations has not led to consistent results, primarily as a result of inadequate attention to response demands interacting with student characteristics or clear (or consistent) test administration procedures in the context of experimental control.

Training of Teachers

Probably the weakest link in a validity argument on appropriate accommodations is the teacher (or, more properly, the teacher's role in making decisions and the information that is used to make those decisions). In general, very little attention has been given to teachers in the process of measuring students. In both our analysis of response demands and student characteristics and in the design and execution of research studies, insufficient attention has been devoted to training teachers on the fundamental constructs that comprise the entire measurement process. Whether the constructs are being measured as part of a large-scale testing program or manipulated as part of a specific accommodation study, teachers have not been the focus of attention.

Organization of Systems (Research and Practice)

The difficulty in developing a fully functioning large-scale testing program is that often research and practice are in conflict with each other. For example, to attain adequate control in establishing sufficient validity evidence, the research needs to be done very carefully, with all steps in the design and implementation monitored. Yet typically, this attention to detail cannot be attained when considering systems implementation with thousands of students. Most of the research has been conducted in specific ways to maximize experimental control; however, this approach is antithetical to informing the field of practice.

Validation of Outcomes

Too often, the research is a single study even though the validation process is iterative and studies should be done in relation to each other. Typically, findings from studies do not build on previous research, and application to practice is often unclear. So much variation exists in the test demands and student characteristics that it is not possible to build an adequate validity argument in the design and implementation of tests. This variation is then carried forward to the experimental and quasi-experimental research on accommodations, where even more variation exists. No two studies approach the same problem in a consistent and systematic manner. The students who are included in the research vary; the treatments (accommodations) vary in every manner possible, with little control over either their definitions or potential confounds; and though the sample sizes are often sufficient to avoid Type II errors, the generalization to large-scale testing programs is insufficient. In the end, the very construct of an accommodation remains elusive. Little consistency exists on extended time and read aloud, the two most researched areas; it is unlikely to be present in other areas of accommodation.

References

American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (1999). Standards for educational and psychological testing. Washington, DC: American Psychological Association.

Chiu, C. W. T., & Pearson, P. D. (1999). Synthesizing the effects of test accommodations for special education and limited English proficiency students. Paper presented at the Annual Large-Scale Assessment Conference of the Council of Chief State School Officers, Snowbird, UT.

Cohen, A., Gregg, N., & Deng, M. (2005). The role of extended time and item content on a high-stakes mathematics test. Learning Disabilities Research & Practice, 20, 225–233.

Crawford, L., & Tindal, G. (2004). Effects of a read-aloud modification on a standardized reading test. Exceptionality, 12, 89–106.


Downing, S. (2006). Twelve steps for effective test development. In S. Downing & T. M. Haladyna (Eds.), Handbook of test development (pp. 3–25). Mahwah, NJ: Lawrence Erlbaum Associates.

Downing, S., & Haladyna, T. M. (2006). Handbook of test development. Mahwah, NJ: Lawrence Erlbaum Associates.

Elbaum, B. (2007). Effects of an oral testing accommodation on mathematics performance of secondary students with and without learning disabilities. Journal of Special Education, 40, 218–229.

Elbaum, B., Arguelles, M., Campbell, Y., & Saleh, M. (2004). Effects of a student-reads-aloud accommodation on the performance of students with and without learning disabilities on a test of reading comprehension. Exceptionality, 12, 71–87.

Elliott, J., Bielinski, J., Thurlow, M., DeVito, P., & Hedlund, E. (1999). Accommodations and the performance of all students on Rhode Island's Performance Assessment. Minneapolis, MN: National Center on Educational Outcomes, University of Minnesota.

Elliott, S., & Marquart, A. (2004). Extended time as a testing accommodation: Its effects and perceived consequences. Exceptional Children, 70, 349–367.

Fletcher, J., Francis, D., Boudousquie, A., Copeland, K., Young, V., Kalinowski, S., et al. (2006). Effects of accommodations on high-stakes testing for students with reading disabilities. Exceptional Children, 72, 136–150.

Fuchs, L., Fuchs, D., Eaton, S., Hamlett, C., Binkley, M., & Crouch, R. (2000). Using objective data sources to supplement teacher judgment about reading test accommodations. Exceptional Children, 67, 67–82.

Fuchs, L., Fuchs, D., Eaton, S., Hamlett, C., & Karns, K. (2000). Supplementing teacher judgments of mathematics test accommodations with objective data sources. School Psychology Review, 29, 65–85.

Haladyna, T., & Downing, S. (2004). Construct-irrelevant variance in high stakes testing. Educational Measurement: Issues and Practice, 23(1), 17–27.

Hale, A., Skinner, B., Winn, R., Allin, J., & Molloy, C. (2005). An investigation of listening and listening-while-reading accommodations on reading comprehension levels and rates in students with emotional disorders. Wiley InterScience, 42, 39–51.

Helwig, R., Rozek-Tedesco, M., & Tindal, G. (2002). An oral versus a standard administration of a large-scale mathematics test. The Journal of Special Education, 36, 39–47.

Helwig, R., Rozek-Tedesco, M., Tindal, G., Heath, B., & Almond, P. (1999). Reading as an access to mathematics problem solving on multiple-choice tests for sixth-grade students. The Journal of Educational Research, 93, 113–125.

Hollenbeck, K., Tindal, G., & Almond, P. (1998). Teachers' knowledge of accommodations as a validity issue in high-stakes testing. The Journal of Special Education, 32, 175–183. doi:10.1177/002246699803200304

Huesman, R., & Frisbie, D. (2000). The validity of ITBS reading comprehension test scores for learning disabled and non learning disabled students under extended-time conditions. Paper presented at the annual meeting of the American Educational Research Association, Seattle, WA.

Johnson, E. (2000). The effects of accommodations in performance assessments. Remedial and Special Education, 21, 261–267.

Kettler, R., Neibling, B., Mroch, A., Feldman, E., Newell, M., Elliott, S., et al. (2005). Effects of testing accommodations on math and reading scores: An experimental analysis of the performance of students with and without disabilities. Assessment for Effective Intervention, 31(1), 37–48.

Meloy, L., Deville, C., & Frisbee, D. (2002). The effect of a read aloud accommodation on test scores of students with and without a learning disability in reading. Remedial and Special Education, 23, 248–255.

Morris, R., Stuebing, K., Fletcher, J., Shaywitz, S., Lyon, G., Shankweiler, D., et al. (1998). Subtypes of reading disability: Variability around a phonological core. Journal of Educational Psychology, 90, 347–373.

Munger, G., & Loyd, B. (1991). Effect of speededness on test performance of handicapped and nonhandicapped examinees. Journal of Educational Research, 85, 53–57.

National Center on Educational Outcomes. (2001). A summary of research on the effects of test accommodations: 1999 through 2001. Minneapolis, MN: University of Minnesota Institute for Research on Learning Disabilities.

National Center on Educational Outcomes. (n.d.). Testing accommodations for students with disabilities: A review of the literature. Minneapolis, MN: University of Minnesota Institute for Research on Learning Disabilities.

No Child Left Behind Act of 2001, Pub. L. No. 107-110, 115 Stat. 1425 (2002).

Sireci, S. G. (2004). Validity issues in accommodating NAEP reading tests. Washington, DC: National Assessment Governing Board.

Shadish, W. R., Cook, T. D., & Campbell, D. T. (2002). Experimental and quasi-experimental design for generalized causal inference. Boston: Houghton Mifflin.

Thompson, S., Johnstone, C., Thurlow, M., & Altman, J. (2005). 2005 State special education outcomes: Steps forward in a decade of change. Minneapolis, MN: University of Minnesota, National Center on Educational Outcomes.

Thompson, S., Morse, A., Sharpe, M., & Hall, S. (2005). Accommodations manual: How to select, administer, and evaluate use of accommodations for instruction and assessment of students with disabilities. Retrieved February 28, 2011, from http://www.osepideasthatwork.org/toolkit/accommodations_manual.asp


Thurlow, M. (2007). Research impact on state accommodation policies for students with disabilities. Paper presented at the American Educational Research Association. Retrieved February 28, 2011, from http://www.cehd.umn.edu/NCEO/Presentations/AERA07Thurlow.pdf

Thurlow, M., Altman, J., Cuthbert, M., & Moen, R. (2007). State performance plans: 2004–2005: State assessment data. Minneapolis, MN: University of Minnesota, National Center on Educational Outcomes.

Tindal, G., Heath, B., Hollenbeck, K., Almond, P., & Harniss, M. (1998). Accommodating students with disabilities on large-scale tests: An experimental study. Exceptional Children, 64, 439–450.

Tindal, G., & Fuchs, L. (1999). A summary of research on test changes: An empirical basis for defining accommodations. Lexington, KY: Mid-South Regional Resource Center.

U.S. Department of Education. (2007, April). Modified academic achievement standards – Non-regulatory guidance. Washington, DC: U.S. Department of Education.

Van der Linden, W., & Hambleton, R. (1997). Item response theory: Brief history, common models, and extensions. In W. Van der Linden & R. Hambleton (Eds.), Handbook of modern item response theory (pp. 1–28). New York: Springer.

Ysseldyke, J., Thurlow, M., Graden, J., Wesson, C., Deno, S., & Algozzine, B. (1982). Generalizations from five years of research on assessment and decision making. Minneapolis, MN: Institute for Research on Learning Disabilities.


11 Item-Writing Practice and Evidence

Michael C. Rodriguez
University of Minnesota, Minneapolis, MN 55455, USA

Four Thousand Years of Limited Accessibility

Evidence about the format and function of assessments dates back to at least 2200 BC, during the time of the performance assessments of the Chinese civil service exams. Records of oral and written exams can be traced back to the 1200s at the University of Bologna and the 1400s at Louvain University. During the 1500s, Jesuits (e.g., St. Ignatius of Loyola) adopted written tests for student placement and evaluation. By the 1800s, written exams were commonly replacing oral exams because of questions of fairness. However, these questions were concerned more with fairness in scoring and subjectivity than with accessibility or equity in access. Were questions about validity and access ever asked? There is no evidence of assessment-related research during these periods of performance assessments, even though they were all "high stakes."

Assessment in K-12 education settings became prominent during the mid-1840s as Boston public schools adopted short-answer items in district-wide tests. Once the multiple-choice (MC) item format was formally adopted in the Army Alpha and Army Beta tests in 1917 (group-administered intelligence tests modeled after the Binet-Simon Scale of school readiness; 1908, as cited in DuBois, 1970), test design was becoming a science. Thorndike (1904) was the first to introduce principles of test construction in his Introduction to the Theory of Mental and Social Measurements. Item-writing research (to which we have access in psychology and education journals) began in the 1920s, at a time when the College Board began using the new MC item types in college entrance exams. Item writing and test design quickly advanced with the landmark chapter on item writing by Ebel in the 1951 first edition of Educational Measurement. This period of item-writing advances and modern measurement theory intensified in the late 1960s with the introduction of the National Assessment of Educational Progress (NAEP) (see Jones & Olkin, 2004, for a history of NAEP), the assessment we have come to know as the nation's report card.

The history of testing is instructive, as comprehensively reviewed by DuBois (1970), providing clues to important stages in thinking and practice. One interesting note is that formal assessments were originally in the form of performance tasks, including archery, horsemanship, and accounting tasks (as in the ancient Chinese civil service exams; see DuBois for a more descriptive review). When the MC item format was introduced, it was favored over the subjective oral and written exams. The MC item offered an advancement that was greatly appreciated until the 1960s, when the fairness of MC items was questioned because of poor testing practices that led to inequitable placement of black and poor children in special education programs (Williams, 1970). There was a renewed interest in the promise of performance assessments that led to their adoption in classrooms and some large-scale testing programs from the 1970s through the 1990s, whereby performance assessments were viewed as an innovation and a panacea for addressing the achievement gap; they were, for a brief time, the new assessment model. Because of costs, poor quality and psychometric performance, and little evidence of improved equity and fairness, performance assessments have once again fallen out of favor for large-scale, high-stakes testing.

Access and Assessment as Policy Tools

Through a series of legislative and federal administrative rules and regulations (see Zigmond & Kloo, 2009, for a review), there has been intense attention to issues related to inclusion, fairness, equity, and access to instruction and assessments that support multiple educational and accountability purposes. Several research centers and federally funded research projects have devoted substantial resources to addressing the most recent regulations to accommodate students with modified and alternate academic achievement standards in NCLB (No Child Left Behind) assessments. These researchers have contributed to a sparse volume of research on item writing. Although such research has been conducted for nearly 90 years at this point, the science of item writing is relatively undeveloped compared to the science of scoring, scaling, and equating. However, psychometric attention to alternate assessments is renewing interest in item-writing research as a means of improving accessibility at the item level (Rodriguez, 2009).

This chapter will review existing guidance on item writing, the available body of item-writing research, recent research on item writing and test development with alternate assessments, and, briefly, innovations in item types and their relevance to accessibility. In the broad context of item writing and test development, ensuring test accessibility is a simple exercise of employing best practice. Good item writing, founded on what limited evidence exists and supported through the application of current item-writing guidance, is implicitly designed to ensure accessibility. Good item writing has always been about accessibility. Other promising areas, including language learning, cognitive load theory, cognition and learning models, universal design, and instructional design (many of which are reviewed in this book), support this goal of test accessibility. Since the majority of test items are in the MC format, this will be the focus of this chapter. However, similar issues related to constructed-response (CR) items and performance tasks will receive comment when appropriate.

MC Formats

There are many formats of MC items, including innovative types enabled through computer-based testing. Similarly, there are several ways to score each of these formats (more on scoring later). The most common formats for MC items include the following (based on formats presented by Haladyna, Downing, & Rodriguez, 2002):

Conventional MC
Dropping poor items from a test on the basis of item analysis data alone may result in
A. lower content-related validity.
B. lower score reliability.
C. a more heterogeneous test.

Multiple True-False
You are a psychometrician providing advice to a novice test author. The test author asks several questions about improving test score reliability. Are these true or false concepts that, done in isolation, may improve reliability?

1. Adding more test items of similar quality improves test score reliability.
2. Increasing the sample size will improve test score reliability.
3. Obtaining a sample with more test score variability improves reliability.
4. Balancing positively and negatively worded stems improves test score reliability.


Alternate-Choice
Coefficient alpha is a more appropriate index of reliability for

A. criterion-referenced test scores.
B. norm-referenced test scores.

True-False
If a distribution of raw scores is negatively skewed, transforming the raw scores to z-scores results in a normal distribution.

Matching
Match each term on the right with the description on the left.

1. score consistency            A. systematic error
2. test-wiseness                B. random error
3. score accuracy               C. item difficulty
4. proportion correct           D. item discrimination
5. point-biserial correlation   E. reliability
                                F. validity

Context-Dependent Item Set
A college class of 200 students was given a final exam. The item analysis based on students' responses to five of the multiple-choice items is reported in the table. Response frequencies to correct options are in brackets [ ]. For example, the correct option to item #1 is A, correctly answered by 125 of the 200 students. Write the item number on the blank following questions #19 to #24.

                         Number of students selecting each choice
Item number   Choice A   Choice B   Choice C   Choice D   Corrected item-total correlation
1             [125]      25         20         30          .35
2             0          15         [185]      0           .15
3             [90]       25         15         70         −.40
4             40         30         [100]      30          .14
5             35         85         5          [75]        .30

1. Which item has the best discrimination? _____
2. Which item is the most likely candidate for revision into a true-false item? _____
3. Which item is the most difficult? _____
4. Which item is the most likely to have two correct answers? _____

Note: Each item above is a 5-option MC item.

Complex Multiple-Choice (Type K)
Which are criterion-referenced interpretations of test scores?

1. John's score is three standard deviations above the class mean.
2. Mary answered 80% of the items correctly.
3. Eighty percent of the class scored above a T-score of 45.
4. The average math score for Arlington High School is equal to the district average.
5. Antonio is proficient in 5th grade reading.

A. 1 and 3.
B. 1, 3, and 4.
C. 2 and 5.
D. 2, 4, and 5.
E. All 5.

CR Formats

The many forms of CR items are less systematically organized across the literature. For the most part, CR items are distinguished from MC items because they require the examinee to generate or construct a response. Many alternate forms exist, many of which are innovative items enabled through computer testing (described below). More traditional forms of CR items were classified by Osterlind and Merz (1994), described in the next section, in an attempt to create a taxonomy of CR items. Osterlind and Merz, as well as Haladyna (1997), described over 20 formats for CR items, including more common formats, such as essays, grid-in responses, research papers, short-answer items, and oral reports, as well as fill-in-the-blank and cloze procedures, which are not recommended for use in tests (Haladyna). They also included considerably different forms of tasks that most measurement specialists would classify as performance assessments, such as portfolios, performances, exhibitions, and experiments. These tasks require much more extensive scoring rubrics and substantially more time, including planning and preparation time, so they are typically not adaptable to on-demand testing.

There is no theoretical framework or taxonomic scheme that is all-encompassing across the many forms of MC and CR items and performance assessment tasks to make meaningful distinctions in all cases. Items are not consistently differentiated by features of the items themselves, in response modes allowed for various item formats, or in the scoring processes for various item formats. In large-scale achievement tests, one typically finds grid-in items, short-answer items, and essay formats. In many cases, responses to such item formats can be objectively scored, particularly through the use of automated scoring (see, for example, Attali & Burstein, 2006). In alternate assessments (mostly for AA-AAS), alternate forms of responding to these more constrained CR items are allowable, including verbal responses, drawings, and word lists or construct maps.

Item-Writing Guidelines and Taxonomies

In the arena of accessibility, there is a renewed interest in accommodations, item writing, test modification, and assessment design. A number of resources are available, including recent special issues of journals, for example the Peabody Journal of Education (Volume 84, 2009) and Applied Measurement in Education (Volume 23, Number 2, 2010).

Every educational measurement textbook (of which there are dozens) contains one or more chapters on item writing. A number of chapters are designed to be somewhat exhaustive and instructive regarding item writing, including Chapters 12, 13, and 14 in the Handbook of Test Development (Downing, 2006; Welch, 2006; and Sireci & Zenisky, 2006, respectively) and more generally Chapters 9 and 16 in Educational Measurement (Ferrara & DeMauro, 2006; and Schmeiser & Welch, 2006). There are also a handful of books devoted to aspects of item writing, for example, Writing Test Items to Evaluate Higher Order Thinking (Haladyna, 1997) and Developing and Validating Multiple-Choice Test Items (Haladyna, 2004).

MC Item-Writing Guidelines

The first comprehensive taxonomy of MC item-writing guidelines was developed by Haladyna and Downing (1989a), with a companion piece summarizing the available empirical evidence (1989b). This taxonomy was revised to include additional empirical evidence and a meta-analytic review of some of that evidence (Haladyna et al., 2002). These reviews were largely informed by a review of item-writing advice from over two dozen textbook authors, with support from the existing empirical literature. Most of these guidelines were based on logical reasoning and good writing practices; very few were based on empirical evidence. Guidance on item writing covers four elements of MC items: content, formatting and style, writing the stem, and writing the options.

Content concerns are paramount, and the subject matter expert is the key to writing a successful item. Items should be carefully constructed to tap important relevant content and cognitive skills. These guidelines are largely based on logical argument and experience of item writers and examinee reactions. Aside from some general research on clarity and appropriate vocabulary use, there are no specific tests of these item-writing guidelines. The guidelines regarding content concerns include:

1. Every item should reflect specific content and a single specific mental behavior, as called for in test specifications (two-way grid, test blueprint).
2. Base each item on important content to learn; avoid trivial content.
3. Use novel material to test higher-level learning. Paraphrase textbook language or language used during instruction when used in a test item, to avoid testing simply for recall.
4. Keep the content of each item independent from content of other items on the test.
5. Avoid over-specific and over-general content when writing MC items.
6. Avoid opinion-based items.


7. Avoid trick items.
8. Keep vocabulary simple for the group of students being tested.

Formatting and style concerns are largely based on general good writing practice. There is some empirical evidence to support general use of most item formats (Haladyna et al., 2002), whereas others, like the complex MC format, appear to introduce greater difficulty that may be unrelated to the construct being measured in most settings. There is evidence to support the recommendation regarding the format of the MC item (guideline 9). Additional evidence is available regarding guideline 13. These guidelines include:

9. Use the question, completion, and best answer versions of the conventional MC, the alternate choice, true-false (TF), multiple true-false (MTF), matching, and the context-dependent item and item set formats, but AVOID the complex MC (Type-K) format.
10. Format the item vertically instead of horizontally.
11. Edit and proof items.
12. Use correct grammar, punctuation, capitalization, and spelling.
13. Minimize the amount of reading in each item.

Writing the stem is also an area with limited empirical evidence. With the exception of negatively worded stems, which the empirical results suggest should rarely be used, these guidelines are extensions of style concerns, specifically applied to the stem of the item. Although the work of Abedi and others provides evidence to support these guidelines, their work was not intentionally designed to test the validity of specific item-writing guidelines. The guidelines regarding writing the stem include:

14. Ensure that the directions in the stem are very clear.
15. Include the central idea in the stem instead of the choices.
16. Avoid window dressing (excessive verbiage).
17. Word the stem positively; avoid negatives such as NOT or EXCEPT. If negative words are used, use the word cautiously and always ensure that the word appears capitalized and boldface.

Writing the choices is the area with the largest volume of empirical evidence. Issues related to MC options were studied in the first published empirical study of item writing (Ruch & Stoddard, 1925). Since then, there have been dozens of published studies in this area. Nevertheless, of the 14 guidelines in the following, only 5 have been studied empirically (18, 24–27).

18. Develop as many effective choices as you can, but research suggests three is adequate.
19. Make sure that only one of these choices is the right answer.
20. Vary the location of the right answer according to the number of choices.
21. Place choices in logical or numerical order.
22. Keep choices independent; choices should not be overlapping.
23. Keep choices homogeneous in content and grammatical structure.
24. Keep the length of choices about equal.
25. None-of-the-above should be used carefully.
26. Avoid All-of-the-above.
27. Phrase choices positively; avoid negatives, such as NOT.
28. Avoid giving clues to the right answer, such as:
    a. Specific determiners including always, never, completely, and absolutely.
    b. Clang associations, choices identical to or resembling words in the stem.
    c. Grammatical inconsistencies that cue the test-taker to the correct choice.
    d. Conspicuous correct choice.
    e. Pairs or triplets of options that clue the test-taker to the correct choice.
    f. Blatantly absurd, ridiculous options.
29. Make all distractors plausible.
30. Use typical errors of students to write your distractors.


31. Use humor if it is compatible with the teacher and the learning environment.

In the context of accountability, most of these guidelines provide for clarity and efficiency in word usage, and largely employ good writing practices that should provide for the greatest accessibility to the widest audience of examinees. For those guidelines that have an evidence base, the direction of the guidance promotes the easier form of an item that supports the intended inferences (validity) compared to an alternative. This research supports the goal of accessibility and is in line with principles of universal design. This evidence is briefly reviewed below.

Constructed-Response Item-Writing Guidelines

Guidelines for writing CR items are less developed and less uniform across sources, with even less empirical evidence. An early attempt to formalize a framework for CR items was offered by Bennett, Ward, Rock, and LaHart (1990). In this work, they differentiated item types based on the degree to which openness is allowed in the response. They tested this framework by assessing the ability of judges (27 test developers) to reproduce expected item classifications (46 items) according to the framework and the ability of judges (16 test developers) to assess scoring objectivity. Their framework provided a scheme for categorizing items, in order of (0) multiple-choice, (1) selection/identification, (2) reordering/rearrangement, (3) substitution/correction, (4) completion, (5) construction, and (6) presentation/performance. Judges largely agreed in their item classifications; however, there were disagreements on every item, even among the MC items. Oddly, there was greater agreement in scoring among items with more closed scoring keys than among those with open scoring keys. The authors surmised that open scoring keys allowed for the inclusion of a wider range of responses than more structured items that provided specific correct options.

A taxonomy for CR items was proposed by Osterlind and Merz (1994), who reviewed the work of cognitive psychologists, including Hannah and Michaels (1977), Snow (1980), Sternberg (1982), and others, as a framework for the multidimensionality of cognition. This taxonomy contained three dimensions: (a) the type of reasoning competency employed, including (i) factual recall, (ii) interpretive reasoning, (iii) analytical reasoning, and (iv) predictive reasoning; (b) the nature of the cognitive continuum employed, including (i) convergent thinking and (ii) divergent thinking; and (c) the kind of response yielded, including (i) open-product and (ii) closed-product formats, producing 16 combinations. The first two dimensions address cognitive processes, whereas the third dimension addresses the kinds of responses possible. Closed-product formats are those that allow few possible response choices, possibly scored with a relatively simple scoring key; open-product formats permit many more choices, requiring scoring with more elaborate rubrics and potentially allowing unanticipated innovative responses.

Most introductory educational measurement textbook authors provide guidance for CR item writing. Hogan and Murphy (2007) reviewed advice about preparing and scoring CR items from authors of 25 textbooks and chapters on educational and psychological measurement from 1960 to 2007. The 25 sources resulted in 124 points on preparing CR items and 121 points on scoring. They referenced a handful of empirical studies on these guidelines, but suggested that most guidelines are not based on empirical evidence. They suggested that it is important to assess the degree to which the recommendations affect the psychometric quality of tests and assessments. Among those most relevant for accessibility, the most frequently cited recommendation for preparing CR items was attention to testing time and the length of the CR test (cited 20 times), followed by avoiding the use of optional items, defining questions or tasks clearly, relating content to instructional objectives, assuring items test complex processes, avoiding the use of CR items to test recall, and considering vocabulary, grammar, and syntax appropriateness for the level of the test.


Major testing companies have developed CR item-writing guides to direct the work of their item writers. Notably, the work of Educational Testing Service (ETS) in their large-scale programs that include CR items (e.g., the National Assessment of Educational Progress and Advanced Placement exams) has resulted in a large body of research on the quality of CR items, scoring, and scores. The ETS Guidelines for Constructed-Response and Other Performance Assessments (Baldwin, Fowles, & Livingston, 2005) provides a starting point. These guidelines are general, forming a basis from which more specific guidelines can be developed for specific testing programs. Most importantly, they argue that three elements must be in place prior to planning the design of CR items. Each of these elements speaks directly to equity, fairness, and accessibility (and applies more generally to all forms of assessment):

1. Make sure the people whose decisions will shape the assessment represent the demographic, ethnic, and cultural diversity of the people whose knowledge and skills will be assessed.
2. Make relevant information about the assessment available during the early stages so that those who need to know and those who wish to know can comment on the information.
3. Provide those who will take the assessment with information that explains why the assessment is being administered, what the assessment will be like, and what aspects of their responses will be considered in the scoring.

In his work on CR items and their ability to tap intended cognitive functions in the NAEP assessments, Gitomer (2007) defines CR items as "well-defined tasks that assess subject matter achievement and that ask examinees to demonstrate understanding through the generation of representations that are not prespecified and that are scored via judgments of quality" (p. 2). With respect to effective CR items and, more generally, to a wide range of performance assessment tasks, Gitomer argues that task demands are only clear in the context of the rubric, and that the meaning of the rubric is only clear in the context of the associated scoring process. Similarly, students must understand what is being asked of them in the task and understand the response requirements, and the scoring system must consistently interpret the student's response.

Both of these elements address accessibility. Gitomer argues that these requirements are typically satisfied for MC items, assuming the question is clear and students understand their task (i.e., good item writing). However, for CR tasks, these requirements are typically not satisfied, and he presents a framework for CR item design that is intended to secure these requirements in a coherent way. The intent is to ensure that students understand what is being asked of them and that scorers know how to interpret student responses appropriately. He presented seven guidelines for effective CR task development:

1. Justify the use of CR task formats; the tasks should require student thinking that cannot easily be obtained from MC or other fixed-choice formats.
2. Inferences should be explicit and construct-relevant; inferences about task demands and responses should not require implicit assumptions, such that they are a direct function of knowledge or skill on the construct of interest.
3. Distinctions should be clear across score points; clear distinctions regarding the features and quality of evidence to satisfy every score point are defined.
4. Justifications should be clear within score points; when there are different ways to accomplish a given score point, they should be cognitively equivalent.
5. Avoid over-specification of anticipated responses; combinations of response features may satisfy the required support for intended inferences even if every element required (through over-specification) to achieve a given score is not included in the response.
6. Empirically verify and modify tasks through pilot studies; empirical evaluation of student performance on the task and the sufficiency of the associated rubric needs to be considered. (Herein lies the examination of accessibility to students with varying levels of cognitive abilities.)


7. Construct measurement should be consistent across tasks; the expectations for the same cognitive aspects should be consistent across tasks within the assessment.

Choosing the Item Format

Rodriguez (2002) provided guidance regarding the choice of item format. He listed advantages and disadvantages of MC and CR items. Rodriguez argued for the primacy of the purpose of the assessment as the guiding force regarding all test design stages, but particularly regarding choice of item format. For the purpose of rank-ordering academic abilities, the two formats appear to behave more alike than not. Although carefully constructed MC items can assess a wide range of cognitive processes, CR items more directly (and perhaps more easily) measure complex processes, providing opportunities for a wider range of responses and novel solutions, particularly when the target objective requires a written response or extended explanation, reasoning, or justification. However, most CR items in operational tests behave far too much like MC items without options, making responses from the two formats nearly perfectly correlated (Rodriguez, 2003). In fact, if CR items are written like MC item stems, not really addressing different cognitive tasks, they will measure the same aspects of the construct as MC items, but with much less precision. Issues of accessibility apply equally to both formats.

Evidence

The evidence base for existing item-writing guidelines is sparse, as described earlier. Some of that evidence is briefly reviewed here. Evidence regarding item modifications to achieve the goals of accessibility is growing and is presented in the following section.

In their first comprehensive review of the empirical evidence on item writing, Haladyna and Downing (1989b) reported the frequency of support for each of their 43 item-writing guidelines, for which only seven had empirical evidence. These included

• avoid "none-of-the-above,"
• avoid "all-of-the-above,"
• use as many functional distractors as feasible,
• avoid complex item formats (Type-K),
• avoid negative phrasing,
• use the question or completion format,
• keep option lengths similar.

Except for the last three guidelines, for which there was strong support, the others had mixed support (for and against). The issue of the number of distractors was the most frequently investigated, so it will be treated separately in this review. The others were studied no more than 10 times in the 63 years of literature covered.

The largely narrative review of the item-writing guidelines was enhanced with an empirical meta-analysis of existing research by Rodriguez (1997). This analysis examined the overall effect of violating several item-writing guidelines based on published research. These results are summarized below. In a revision of the Haladyna and Downing (1989a) taxonomy of item-writing guidelines, Haladyna et al. (2002) reviewed 19 empirical studies published since 1990. They mostly found support for the existing taxonomy of item-writing guidelines and reduced the number of guidelines to 31. They also concluded that the plausibility of distractors is an area of item writing that is long overdue for empirical study.

Item-Writing Guidelines with Empirical Evidence

Not all of the seven guidelines have been investigated equally. Mueller (1975) was the only author who had examined the effects of inclusive alternatives (both none-of-the-above or NOTA and all-of-the-above or AOTA) and reported independent outcomes for AOTA. One limitation of Mueller's design was that items were not stem equivalent across formats. In order to generalize these findings, we must assume that item difficulties were randomly distributed across all formats used.


Table 11.1 Summary of average effects of rule violations

Rule violation                        Difficulty index       Discrimination index   Reliability coefficient   Validity coefficient
Using NOTA                            –.035a (.005), n = 57  –.027b (.035), n = 47  –.001b (.039), n = 21     .073 (.051), n = 11
Using negative stems                  –.03a (.010), n = 18                          –.166 (.082), n = 4
Using an open, completion-type stem    .016b (.009), n = 17  –.003b (.076), n = 6    .031b (.069), n = 10     .042b (.123), n = 4
Making the correct option longer       .057ab (.014), n = 17                                                  –.259b (.163), n = 4
Using Type-K format                   –.122ab (.011), n = 13 –.145ab (.063), n = 10 –.007b (.083), n = 4

a Average effect is significantly different than zero.
b Effects are homogeneous or consistent across studies based on the meta-analytic Q-test statistic.
Note: Standard errors are in parentheses. The number of study effects is n.
Source: Rodriguez (1997)

Mueller reported that items with AOTA were the least difficult among the formats examined (including NOTA and Type-K items), and where AOTA was the correct response, the item was very easy. The use of AOTA provides a clue to respondents with partial information, in that knowing two options are correct suggests that all are correct.

For five of the empirically studied guidelines, the results of the Rodriguez (1997) meta-analysis are summarized in Table 11.1. The empirical evidence of item-format effects was inconclusive. When rule violations resulted in statistically significant effects, they tended to be small. For example, using NOTA reduced the difficulty index overall (making items more difficult), but had no significant effect on item discrimination, score reliability, or validity. Using negatively worded stems made items more difficult and reduced score reliability, but with only four effects regarding reliability, this effect was nonsignificant. Using an open completion-type stem had no effect on any metric. Making the correct option longer (a common error made by classroom teachers) made items easier and reduced validity (although with only four effects, this was not significant). Finally, using the complex Type-K format made items much more difficult and reduced item discrimination, but had no overall effect on score reliability.
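
The homogeneity notes in Table 11.1 refer to the standard meta-analytic Q test. As a rough illustration of how such summaries are typically computed (a sketch with hypothetical effect sizes, not the actual Rodriguez, 1997, data or code), the following computes an inverse-variance weighted mean effect, its standard error, and the Q statistic for one set of study effects:

```python
# Illustrative fixed-effect meta-analytic summary: weighted mean effect,
# its standard error, and the homogeneity Q statistic (df = k - 1).
# The effects and standard errors below are hypothetical examples.
from math import sqrt
from scipy.stats import chi2

def fixed_effect_summary(effects, std_errors):
    weights = [1 / se ** 2 for se in std_errors]        # inverse-variance weights
    mean = sum(w * d for w, d in zip(weights, effects)) / sum(weights)
    se_mean = sqrt(1 / sum(weights))                     # SE of the weighted mean
    q = sum(w * (d - mean) ** 2 for w, d in zip(weights, effects))
    p_value = chi2.sf(q, df=len(effects) - 1)            # large p suggests homogeneity
    return mean, se_mean, q, p_value

effects = [-0.04, -0.02, -0.05, -0.03, -0.01]            # hypothetical study effects
std_errors = [0.010, 0.015, 0.012, 0.020, 0.018]
print(fixed_effect_summary(effects, std_errors))
```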

Three Options Are Optimal

A comprehensive synthesis of empirical evidence regarding the optimal number of options for MC items included 80 years of research (Rodriguez, 2005). The studies reviewed included tests of K-12 achievement, postsecondary exams, and professional certification exams in areas including language arts, social sciences, science, math, and other specialty topics; all but one of the researchers recommended three options for MC items (or four options if that was the minimum studied). Rodriguez found that moving from five or four options to three options did not negatively affect item or test score statistics. Negative effects were found if the number of options was reduced to two (as in alternate-choice MC items). The results were even more compelling when examining the effect of the distractor deletion method used in the studies. When distractors were randomly deleted to create forms with fewer options, a significant decrease in score reliability was observed. When ineffective distractors were removed to go from five or four options to three options, there was no effect on score reliability.

In addition to the empirical evidence promoting the use of three-option items, several additional arguments support this approach, in particular the reduction of cognitive load and greater accessibility to the item by all students. Rodriguez (2005) argued that less time is needed to prepare three-option items and more items can be administered in the same time frame compared to four- or five-option items. Rodriguez reviewed several studies that demonstrated how many large-scale assessment programs exist where five-option items function like three-option items, since few items have more than two effective distractors. The overwhelming concern regarding the probability of random guessing likely prevents test designers from employing three-option items. However, since most items function like three-option items and the chance of getting a high score by random guessing on three-option items is remote, the evidence is compelling for their adoption. For example, on a 40-item test of three-option items (with a .33 chance of randomly guessing correctly on each item), the probability of getting 60% correct (a score of 24/40) is .00033.
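
The guessing figure cited above is a straightforward binomial tail probability and can be checked directly; the short sketch below reproduces the 40-item, three-option scenario from the text.

```python
# Probability of scoring 60% or better (24+ of 40 items) by random guessing
# on three-option MC items, where each guess has a 1/3 chance of being correct.
from scipy.stats import binom

p = binom.sf(23, n=40, p=1/3)   # P(X >= 24) = 1 - P(X <= 23)
print(f"P(at least 24/40 correct by guessing) = {p:.5f}")  # about .0003
```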

The most important consideration regarding the distractors in an MC item is to make them plausible, where the distractors are based on common misconceptions or errors of students, providing diagnostic information. Distractors are effective to the degree that they attract the right students, including students with misinformation, misconceptions, or errors in thinking and reasoning. We can then examine the frequency with which each distractor is selected to provide instructionally relevant information.
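
As a concrete illustration of this kind of distractor review (a sketch with hypothetical response data, not an analysis from the chapter), the following tallies how often each option is selected and flags distractors chosen by very few examinees; such non-functioning distractors are the usual candidates for removal when reducing an item to three options.

```python
# Hypothetical distractor analysis for one MC item: tally option choices and
# flag distractors selected by fewer than 5% of examinees as non-functioning.
from collections import Counter

responses = ["A", "C", "A", "B", "A", "A", "A", "C", "A", "A",
             "B", "A", "A", "C", "A", "A", "B", "A", "A", "B"]
key = "A"                                  # keyed (correct) option
counts = Counter(responses)
n = len(responses)

for option in "ABCD":
    prop = counts.get(option, 0) / n
    role = "keyed answer" if option == key else "distractor"
    flag = "  <- candidate for removal" if role == "distractor" and prop < 0.05 else ""
    print(f"{option}: {prop:.2f} ({role}){flag}")
```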

Research on Item Modifications for Accessibility

With federal requirements for complete inclusion in state NCLB assessments, attention to alternate assessments has increased substantially. Although states define the eligibility for participation in alternate assessments differently, there are two primary forms in which these assessments exist: alternate assessments for alternate academic achievement standards (AA-AAS), generally for students with the most severe cognitive impairments, and alternate assessments for modified academic achievement standards (AA-MAS), generally for students with moderate to severe cognitive impairments (persistent academic difficulties).

AA-AAS

The technical quality and effectiveness of AA-AAS have been examined in several ways. Initially, issues related to alignment between assessments and curriculum were examined for students with the most significant impairments (see, for example, Ysseldyke & Olsen, 1997). Others examined influences on student performance on these assessments, addressing challenges to the validity of results, including technical quality, staff training, and access to the general education curriculum (see, for example, Browder, Fallin, Davis, & Karvonen, 2003). Others have also examined assessment content given state standards (see, for example, Johnson & Arnold, 2004).

Recently, Towles-Reeves, Kleinert, and Muhomba (2009) reviewed 40 empirical studies on AA-AAS, updating a similar review completed earlier by Browder, Spooner, et al. (2003). They made several recommendations, such as (a) to include both content experts and stakeholders in studies to validate performance indicators (e.g., alignment studies), and (b) to design AA-AAS to produce data to inform instructional decisions. They argued that studies of the technical quality of AA-AAS are few and limited and recommended more of this work as the first point on their suggested research agenda.

Finally, through their work developing technical documentation for states, Marion and Pellegrino (2006, 2009) offered an outline for AA-AAS technical documentation, particularly focused on the unique characteristics of AA-AAS in a validity-based framework. They created a four-volume model of technical documentation, including (a) a "nuts and bolts" volume that is similar to the typical technical manual for standard tests; (b) a validity evaluation combining the guidance from the Testing Standards (AERA, APA, & NCME, 1999), Kane (2006), Messick (1989), and others; (c) a stakeholder summary (also drawn from the first two volumes); and (d) a transition document that provides procedures to ensure effective transition through changing contractors or state personnel. Although this effort was supported by their work on two federally funded collaborations with states on their AA-AAS, they suggested that this framework could be used for purposes of documenting technical evidence for AA-MAS. The first two volumes specify the kinds of technical information relative to demonstrating the quality of the AA-AAS, which is directly a result of accessibility for students with the greatest academic challenges.

AA-MAS

There has been substantially more attention to AA-MAS, perhaps due to the larger intended test-taker population. A special issue of the Peabody Journal of Education (Volume 85) on AA-MAS recently provided several updates on the current state of affairs. Each of the articles, in some way, commented on the technical adequacy of AA-MAS as currently employed. Much of the focus has been on identifying appropriate participants, levels of participation, the impact of accommodations, access to the general education curriculum, and overall performance. Kettler, Elliott, and Beddow (2009) provided a review of their work on a theory-guided and empirically based tool for guiding test modification to enhance accessibility. The Test Accessibility and Modification Inventory (TAMI, available at http://peabody.vanderbilt.edu/tami.xml) is guided by the principles of universal design, test accessibility, cognitive load theory, test fairness, test accommodations, and item-writing research (see Chapter 9, this volume).

The TAMI provides a rating system to evaluate accessibility, based on elements of (a) the reading passage or other item stimuli, (b) the item stem, (c) visuals, (d) answer choices, (e) page and item layout, (f) fairness, and (g) several aspects of computer-based tests, if applicable. The rating system is guided by a series of rubrics covering overall accessibility ratings on each of the above elements and an opportunity to recommend item modifications to improve accessibility; several standard modifications are provided as a guide. The purposes of modifications are to ensure the use of strong item-writing guidelines to improve the item, to remove sources of construct-irrelevant variance, and to maximize accessibility for all students. They also recommend using cognitive labs to guide item and test modifications.

The TAMI has been employed in several states. Kettler et al. (in press) and Elliott et al. (2010) have reported results from multistate consortium research projects involving modification of state tests using the TAMI. These studies have found that modification preserves score reliability and improves student performance (increases scores), and that the performance of students eligible for AA-MAS increased more than that of others. Rodriguez, Elliott, Kettler, and Beddow (2009) presented a closer analysis of the effect of modification on the functioning of distractors. One common modification was a reduction of the number of options to three, typically done by removing the least functioning or least plausible distractors. In mathematics and reading items, the retained distractors became more discriminating. An important element in this line of research is that item modification typically involves multiple modifications. Because a package of modifications is employed, it is difficult to isolate the effect of any one modification. Given the large number of possible modifications and that each modification is tailored to the accessibility needs of the item, it makes little sense to uniformly apply any one modification across items (see Chapter 13, this volume, for a review of research on packages of modifications). Moreover, items should be developed initially with maximizing accessibility as a goal, where the TAMI could be used as an effective guide for item writers.

Innovations and Technological Enhancements

Computers and other assistive devices have been used as accommodations, providing a variety of means for individuals to respond to test questions. Computer-based testing also has provided an open venue for a variety of new item formats, where innovations continue to be developed. Some have argued that these innovations have led to greater accessibility, and others have argued that innovative item types have improved the degree to which items tap the target construct. Sireci and Zenisky (2006) identified 13 item formats that are enabled by computer-based testing and that also have potential for enhancing construct representation of the target of measurement and reducing construct-irrelevant variance. Although there is some evidence regarding the promise of these formats, we have not seen evidence regarding potential enhancement of accessibility. A couple of examples are provided here.

Extended MC items are typically found in items with reading passages, where each sentence in the passage provides an optional response to specific questions. The exam can contain questions about a reading passage, for example regarding the main idea of a paragraph, and the response is selected by highlighting the appropriate sentence in the reading passage. This format presents a large number of options (contrary to the recommendation of three-option items), but the options are the sentences within the reading passage itself, rather than rephrased ideas or out-of-context statements in the typical MC format. How these features interact and impact accessibility is unknown. Among the many innovative item types examined in the literature, there is little evidence regarding their ability to enhance accessibility.

Other formats include connecting ideas with various kinds of links (dragging and connecting concepts) and other tasks like sorting and ordering information. The computer environment allows for other innovative response processes, including correcting sentences with grammatical errors or mathematical statements, completing statements or equations, and producing or completing graphical models, geometric shapes, or trends in data. Computers provide a wide range of possibilities. Unfortunately, the development of these formats has been done without investigation of their effect on accessibility and fairness.

Computer-enabled innovations have been examined in postsecondary and professional exam settings. Innovations in the GRE and TOEFL have led to many studies of impact. Bennett, Morley, Quardt, and Rock (1999) investigated the use of graphical modeling for measuring mathematical reasoning in the GRE, where respondents create graphical representations. The results provided highly reliable scores that were moderately related to the GRE quantitative total score and related variables. For example, examinees may plot points on a grid and then use a tool to connect the points. Although examinees agreed that these graphing items were better indicators of potential success in graduate school, they preferred traditional MC items (which is a result commonly found when comparing MC and CR items in other settings).

Bridgeman, Cline, and Levin (2008) examined the effect of the availability of a calculator on GRE quantitative questions. They found that relatively few examinees used the calculator on any given item (generally about 20% of examinees used the calculator) and the effects on item difficulty were small overall. There were also no effects for gender and ethnic group differences. Bridgeman and Cline (2000) earlier examined response times for questions on the computer-adaptive version of the GRE, examining issues related to fairness. The issue regarding differences in item response time is critical in adaptive tests because examinees receive different items, depending on their response patterns. They found no disadvantage for students who received items with long expected response times. They also found a complication in that the item response time also depended on the point at which the item was administered in the test. Gallagher, Bennett, and Cahalan (2000) examined open-ended computerized mathematics tasks for construct-irrelevant variance. They hypothesized that experience with the computer interface might advantage some examinees. Although no evidence of construct-irrelevant variance was detected, some examinees experienced technical difficulties and expressed preference for paper forms of the test. It appeared that it took much longer to enter complex expressions as compared to simply writing them out on paper forms.


Broer, Lee, Rizavi, and Powers (2005) looked for differential item functioning (DIF) among GRE writing prompts. They found moderate to large DIF for African American and Hispanic examinees among some items. In examining these prompts more closely, they found that sentence complexity and the number of different points made in the passage increase processing load and might explain the DIF results in these cases.

In the context of accommodations (see Chapter 10, this volume, for further background), assistive technologies have been used effectively in the area of reading test accessibility. The Technology Assisted Reading Assessment (TARA) project conducts research and development to improve reading assessments for students with visual impairments or blindness. It works in conjunction with the National Accessible Reading Assessment Projects (NARAP), through the Office of Special Education Programs and the National Center for Special Education Research (http://www.narap.info/). The Accessibility Principles for Reading Assessments (Thurlow et al., 2009) presents five principles with multiple specific guidelines addressing the implementation of each principle:

1. Reading assessments are accessible to all students in the testing population, including students with disabilities.
2. Reading assessments are grounded in a definition of reading that is composed of clearly specified constructs, informed by scholarship, supported by empirical evidence, and attuned to accessibility concerns.
3. Reading assessments are developed with accessibility as a goal throughout rigorous and well-documented test design, development, and implementation procedures.
4. Reading assessments reduce the need for accommodations, yet are amenable to accommodations that are needed to make valid inferences about a student's proficiencies.
5. Reporting of reading assessment results is designed to be transparent to relevant audiences and to encourage valid interpretation and use of these results.

The accessibility principles for reading assessments overlap a great deal with the elements of the TAMI and with foundational concepts in cognitive load theory and good item writing. Guideline 1-A requires understanding of the full range of student characteristics and experiences, so that item writers produce items "that have low memory load requirements" (p. 5). Guideline 1-B requires application of universal design elements at the test development stage, including "precisely defined constructs; nonbiased items; and simple, clear, and intuitive instructions and procedures" (p. 6). Guideline 2-D suggests developing criteria to specify when visuals are used or removed, recognizing the mixed results in the research literature on the use of visuals to enhance reading assessments. Guideline 3-B addresses item development and evaluation, where the "content and format of the items or tasks may be modified, to some extent to increase accessibility for all subgroups" (p. 14). This includes the importance of conducting item analysis and think-aloud studies, examining item functioning across relevant subgroups. Guideline 3-C speaks to test assembly, examining factors including "length of a test, the way items are laid out on a page, whether the test is computer-administered" (pp. 14–15). These principles and guidelines are intended to provide direction for future test design and "a road map for improving the accessibility of current assessments" (p. 23). The principles are a compilation of existing guidance, and the first three principles largely reflect the guidance in the TAMI. The TAMI provides more detailed and explicit direction for item development and modification.

Alternative Scoring Models

Measurement specialists have argued that important information in most forms of items goes untapped. In part, alternative scoring methods are associated with alternate administration methods; the administration methods themselves can enhance scoring by providing additional information about respondents during the administration process. Attali, Powers, and Hawthorn (2008) investigated the effect of immediate feedback and revision on the quality of open-ended sentence-completion items. As expected, being able to revise answers resulted in higher scores, and the reliability of revised scores and correlations with criterion measures were higher. Writing high-quality accessible items or modifying existing items to maximize accessibility is important on the front end to secure responses that are construct-relevant, but the degree to which we obtain construct-relevant information is a function of scoring.

There are several alternate scoring methods for any given item format, including MC items. Most of these alternate methods are intended to capture partial knowledge, an idea that is underutilized when assessing students with limited access to the general curriculum and persistent academic difficulties. By ignoring information available in wrong responses, we lose what little information might be available about students with the most challenging academic learning objectives. For example, the ability to recognize that some options or responses are more correct than others should be rewarded. This may take place by allowing students to select more than one possibly correct response and obtain partial credit. Another option is to allow students to rate their confidence in the correctness of their answers, where the confidence rating can be used to weight responses.

Elimination testing was introduced in 1956 by Coombs, Milholland, and Womer, who allowed students to mark as many incorrect options as they could identify, awarding one point for each option correctly identified, with a deduction if the correct option was identified as incorrect. This process yields scores in a range from –3 to +3 (for four-option items), providing for a range from completely correct to partially correct to completely incorrect. A related method allows students to identify a subset of options that includes the correct answer, with partial credit given based on the number of options selected and whether the correct option is in the selected subset. Chang, Lin, and Lin (2007) found that elimination testing provides a strong technique to evaluate partial knowledge and yields a lower number of unexpected responses (guessing that results in responses inconsistent with overall ability, in an Item Response Theory framework) than standard number-correct scoring. Bradbard, Parker, and Stone (2004) also found that elimination testing provides scores of similar psychometric quality, reduces guessing, measures partial knowledge, and provides instructionally relevant information. They noted that in some courses, the presence of partial or full misinformation is critical (e.g., health sciences), and instructors will have to be more purposeful when developing distractors, so that they reflect common errors and misconceptions (so that they attract students with partial information and meaning can be inferred from their selection). Bush (2001) reported on a liberal MC test method that is parallel to the elimination procedure, but instead of asking students to select the wrong options, they may select more than one answer and are penalized for incorrect selections. Bush reported that the higher achieving students liked the method (because of the additional opportunities to select plausibly correct options), but the lower achieving students strongly disliked it (mostly regarding the negative markings for incorrect selections).
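
A minimal sketch of the elimination-scoring rule described above for a four-option item is shown below. It assumes the deduction for eliminating the keyed option equals the number of distractors (three), which reproduces the -3 to +3 range noted in the text; the function and example data are illustrative, not taken from Coombs, Milholland, and Womer (1956).

```python
# Sketch of elimination scoring for a four-option MC item, assuming a deduction
# equal to the number of distractors (3) when the keyed option is eliminated;
# this reproduces the -3 to +3 score range described in the text.
def elimination_score(eliminated, key, options=("A", "B", "C", "D")):
    distractors = [o for o in options if o != key]
    score = sum(1 for o in eliminated if o in distractors)   # +1 per distractor removed
    if key in eliminated:                                     # keyed option wrongly removed
        score -= len(distractors)
    return score

print(elimination_score({"B", "C", "D"}, key="A"))  #  3: full knowledge
print(elimination_score({"B", "C"}, key="A"))       #  2: partial knowledge
print(elimination_score(set(), key="A"))            #  0: no options eliminated
print(elimination_score({"A"}, key="A"))            # -3: misinformation
```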

Models allowing students to assign a probability of correctness to each option or to assign confidence to the correctness of their response have also been shown to be useful. Diaz, Rifqi, and Bouchon-Meunier (2007) argued that allowing students to assign a probability of correctness to one or more options in MC items allows "the student to involve some of the choices that otherwise he wouldn't consider, enriching his answer" (p. 63), and thus provides a closer picture of the student's knowledge, skills, and abilities.
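
One simple way to operationalize this idea (a sketch only, not the evidential model of Diaz, Rifqi, and Bouchon-Meunier, 2007) is to have the student distribute probability across the options and credit the item with the probability assigned to the keyed answer, so that spreading belief over two plausible options earns partial rather than zero credit.

```python
# Sketch of probability-weighted scoring: the student distributes 100% of
# belief across the options, and the item score is the probability assigned
# to the keyed option. (Hypothetical responses; not the Diaz et al. model.)
def probability_score(assigned_probs, key):
    total = sum(assigned_probs.values())
    if abs(total - 1.0) > 1e-6:
        raise ValueError("Assigned probabilities must sum to 1.")
    return assigned_probs.get(key, 0.0)

confident = {"A": 1.0}                 # certain and correct -> 1.0
partial = {"A": 0.6, "B": 0.4}         # partial knowledge   -> 0.6
misinformed = {"C": 0.8, "D": 0.2}     # misinformation      -> 0.0
for response in (confident, partial, misinformed):
    print(probability_score(response, key="A"))
```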

Pulling It All Together

As item writing and test design continue to improve, through the application of a wide range of accessibility principles, good item writing, and good test design practice, the full arena of assessment must work in a coherent fashion. This book covers much of this arena, from politics to classrooms to the technology of test design to obtaining student input. While item writing is only part of the equation, it is a core component, since everything else depends on the quality of the information captured by test items.


References

American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (1999). Standards for educational and psychological testing. Washington, DC: American Educational Research Association.
Attali, Y., & Burstein, J. (2006). Automated scoring with e-rater v.2.0. Journal of Technology, Learning, and Assessment, 4(3), 1–30.
Attali, Y., Powers, D., & Hawthorn, J. (2008). Effect of immediate feedback and revision on psychometric properties of open-ended sentence-completion items (ETS RR-08-16). Princeton, NJ: Educational Testing Service.
Baldwin, D., Fowles, M., & Livingston, S. (2005). Guidelines for constructed-response and other performance assessments. Princeton, NJ: Educational Testing Service.
Bennett, R. E., Morley, M., Quardt, D., & Rock, D. A. (1999). Graphical modeling: A new response type for measuring the qualitative component of mathematical reasoning (ETS RR-99-21). Princeton, NJ: Educational Testing Service.
Bennett, R. E., Ward, W. C., Rock, D. A., & LaHart, C. (1990). Toward a framework for constructed-response items. Princeton, NJ: Educational Testing Service. ED 395 032.
Bradbard, D. A., Parker, D. F., & Stone, G. L. (2004). An alternate multiple-choice scoring procedure in a macroeconomics course. Decision Sciences Journal of Innovative Education, 2(1), 11–26.
Bridgeman, B., & Cline, F. (2000). Variations in mean response times for questions on the computer-adaptive GRE general test: Implications for fair assessment (ETS RR-00-07). Princeton, NJ: Educational Testing Service.
Bridgeman, B., Cline, F., & Levin, J. (2008). Effects of calculator availability on GRE quantitative questions (ETS RR-08-31). Princeton, NJ: Educational Testing Service.
Broer, M., Lee, Y.-W., Rizavi, S., & Powers, D. (2005). Ensuring the fairness of GRE writing prompts: Assessing differential difficulty (ETS RR-05-11). Princeton, NJ: Educational Testing Service.
Browder, D. M., Fallin, K., Davis, S., & Karvonen, M. (2003). Considerations of what may influence student outcomes on alternate assessment. Education and Training in Developmental Disabilities, 38(3), 255–270.
Browder, D. M., Spooner, F., Algozzine, R., Ahlgrim-Delzell, L., Flowers, C., & Karvonen, M. (2003). What we know and need to know about alternate assessment. Exceptional Children, 70, 45–61.
Bush, M. (2001). A multiple choice test that rewards partial knowledge. Journal of Further and Higher Education, 25(2), 157–163.
Chang, S.-H., Lin, P.-C., & Lin, Z. C. (2007). Measures of partial knowledge and unexpected responses in multiple-choice tests. Educational Technology & Society, 10(4), 95–109.
Coombs, C. H., Milholland, J. E., & Womer, F. B. (1956). The assessment of partial knowledge. Educational and Psychological Measurement, 16(1), 13–37.
Diaz, J., Rifqi, M., & Bouchon-Meunier, B. (2007). Evidential multiple choice questions. In P. Brusilovsky, M. Grigoriadou, & K. Papanikolaou (Eds.), Proceedings of Workshop on Personalisation in E-Learning Environments at Individual and Group Level (pp. 61–64). 11th International Conference on User Modeling, Corfu, Greece. Retrieved September 25, 2010 from http://hermis.di.uoa.gr/PeLEIGL/program.html
Downing, S. M. (2006). Selected-response item formats in test development. In S. M. Downing & T. M. Haladyna (Eds.), Handbook of test development (pp. 287–301). Mahwah, NJ: Lawrence Erlbaum.
DuBois, P. H. (1970). A history of psychological testing. Boston: Allyn & Bacon.
Ebel, R. L. (1951). Writing the test item. In E. F. Lindquist (Ed.), Educational measurement (1st ed., pp. 185–249). Washington, DC: American Council on Education.
Elliott, S. N., Kettler, R. J., Beddow, P. A., Kurz, A., Compton, E., McGrath, D., et al. (2010). Effects of using modified items to test students with persistent academic difficulties. Exceptional Children, 76(4), 475–495.
Ferrara, S., & DeMauro, G. E. (2006). Standardized assessment of individual achievement in K-12. In R. L. Brennan (Ed.), Educational measurement (4th ed., pp. 324–345). Westport, CT: Praeger Publishers.
Gallagher, A., Bennett, R. E., & Cahalan, C. (2000). Detecting construct-irrelevant variance in an open-ended, computerized mathematics task (ETS RR-00-18). Princeton, NJ: Educational Testing Service.
Gitomer, D. H. (2007). Design principles for constructed response tasks: Assessing subject-matter understanding in NAEP (ETS Unpublished Research Report). Princeton, NJ: Educational Testing Service.
Haladyna, T. M. (1997). Writing test items to evaluate higher order thinking. Boston: Allyn & Bacon.
Haladyna, T. M. (2004). Developing and validating multiple-choice test items (3rd ed.). Mahwah, NJ: Lawrence Erlbaum.
Haladyna, T. M., & Downing, S. M. (1989a). A taxonomy of multiple-choice item-writing rules. Applied Measurement in Education, 1, 37–50.
Haladyna, T. M., & Downing, S. M. (1989b). The validity of a taxonomy of multiple-choice item-writing rules. Applied Measurement in Education, 1, 51–78.
Haladyna, T. M., Downing, S. M., & Rodriguez, M. C. (2002). A review of multiple-choice item-writing guidelines for classroom assessment. Applied Measurement in Education, 15, 309–334.
Hannah, L. S., & Michaels, J. U. (1977). A comprehensive framework for instructional objectives. Reading, MA: Addison-Wesley.
Hogan, T. P., & Murphy, G. (2007). Recommendations for preparing and scoring constructed-response items: What the experts say. Applied Measurement in Education, 20(4), 427–441.
Johnson, E., & Arnold, N. (2004). Validating an alternate assessment. Remedial and Special Education, 25(5), 266–275.
Jones, L. V., & Olkin, I. (Eds.). (2004). The nation's report card: Evolution and perspectives. Bloomington, IN: Phi Delta Kappa Educational Foundation.
Kane, M. T. (2006). Validation. In R. L. Brennan (Ed.), Educational measurement (4th ed., pp. 17–64). New York: American Council on Education, Macmillan.
Kettler, R. J., Elliott, S. N., & Beddow, P. A. (2009). Modifying achievement test items: A theory-guided and data-based approach for better measurement of what students with disabilities know. Peabody Journal of Education, 84, 529–551.
Kettler, R. J., Rodriguez, M. C., Bolt, D. M., Elliott, S. N., Beddow, P. A., & Kurz, A. (in press). Modified multiple-choice items for alternate assessments: Reliability, difficulty, and the interaction paradigm. Applied Measurement in Education.
Marion, S. F., & Pellegrino, J. W. (2006). A validity framework for evaluating the technical quality of alternate assessment. Educational Measurement: Issues and Practice, 25(4), 47–57.
Marion, S. F., & Pellegrino, J. W. (2009, April). Validity framework for evaluating the technical quality of alternate assessments based on alternate achievement standards. Paper presented at the annual meeting of the National Council on Measurement in Education, San Diego, CA.
Messick, S. (1989). Validity. In R. L. Linn (Ed.), Educational measurement (3rd ed., pp. 13–103). New York: American Council on Education, Macmillan.
Mueller, D. J. (1975). An assessment of the effectiveness of complex alternatives in multiple choice achievement test items. Educational and Psychological Measurement, 35, 135–141.
Osterlind, S. J., & Merz, W. R. (1994). Building a taxonomy for constructed-response test items. Educational Assessment, 2(2), 133–147.
Rodriguez, M. C. (1997, March). The art & science of item writing: A meta-analysis of multiple-choice item format effects. Paper presented at the annual meeting of the American Educational Research Association, Chicago, IL.
Rodriguez, M. C. (2002). Choosing an item format. In G. Tindal & T. M. Haladyna (Eds.), Large-scale assessment programs for all students (pp. 213–231). Mahwah, NJ: Lawrence Erlbaum.
Rodriguez, M. C. (2003). Construct equivalence of multiple-choice and constructed-response items: A random effects synthesis of correlations. Journal of Educational Measurement, 40(2), 163–184.
Rodriguez, M. C. (2005). Three options are optimal for multiple-choice items: A meta-analysis of 80 years of research. Educational Measurement: Issues and Practice, 24(2), 3–13.
Rodriguez, M. C. (2009). Psychometric considerations for alternate assessments based on modified academic achievement standards. Peabody Journal of Education, 84, 595–602.
Rodriguez, M. C., Elliott, S. N., Kettler, R. J., & Beddow, P. A. (2009, April). The role of item response attractors in the modification of test items. Paper presented at the annual meeting of the National Council on Measurement in Education, San Diego, CA.
Ruch, G. M., & Stoddard, G. D. (1925). Comparative reliabilities of objective examinations. Journal of Educational Psychology, 16, 89–103.
Schmeiser, C. B., & Welch, C. J. (2006). Test development. In R. L. Brennan (Ed.), Educational measurement (4th ed., pp. 324–328). Westport, CT: Praeger Publishers.
Sireci, S. G., & Zenisky, A. L. (2006). Innovative item formats in computer-based testing: In pursuit of improved construct representation. In S. M. Downing & T. M. Haladyna (Eds.), Handbook of test development (pp. 329–347). Mahwah, NJ: Lawrence Erlbaum.
Snow, R. E. (1980). Aptitude and achievement. In W. B. Schrader (Ed.), Measuring achievement: Progress over a decade. New Directions for Testing and Measurement (Vol. 5, pp. 39–59). San Francisco: Jossey-Bass.
Sternberg, R. J. (1982). Handbook of human intelligence. Cambridge, MA: Cambridge University Press.
Thorndike, E. L. (1904). An introduction to the theory of mental and social measurements. New York: Teachers College, Columbia University.
Thurlow, M. L., Laitusis, C. C., Dillon, D. R., Cook, L. L., Moen, R. E., Abedi, J., et al. (2009). Accessibility principles for reading assessments. Minneapolis, MN: National Accessible Reading Assessment Projects. Retrieved September 25, 2010.
Towles-Reeves, E., Kleinert, H., & Muhomba, M. (2009). Alternate assessment: Have we learned anything new? Exceptional Children, 75(2), 233–252.
Welch, C. (2006). Item and prompt development in performance testing. In S. M. Downing & T. M. Haladyna (Eds.), Handbook of test development (pp. 303–327). Mahwah, NJ: Lawrence Erlbaum.
Williams, R. I. (1970). Black pride, academic relevance & individual achievement. The Counseling Psychologist, 2(1), 18–22.
Ysseldyke, J. E., & Olsen, K. R. (1997). Putting alternate assessments into practice: What to measure and possible sources of data (Synthesis Report No. 28). Minneapolis, MN: University of Minnesota, National Center on Educational Outcomes.
Zigmond, N., & Kloo, A. (2009). The "two percent students": Considerations and consequences of eligibility decisions. Peabody Journal of Education, 84, 478–495.

Page 233: Handbook of Accessible Achievement Tests for All Students: Bridging the Gaps Between Research, Practice, and Policy

12Language Issues in the Designof Accessible Items

Jamal Abedi

Issues concerning assessments of English lan-guage learner (ELL) students are among the topnational priorities as the number of these studentsis rapidly increasing in the nation. Accordingto a recent report by the U.S. GovernmentAccountability Office, about 5 million ELL stu-dents are currently enrolled in schools, represent-ing approximately 10% of all U.S. public schoolstudents (GAO, 2006). Nationally, ELL enroll-ment has grown 57% since 1995, whereas thegrowth rate for all students has been at less than4% (Flannery, 2009). This rapid growth demandsthat we consistently and accurately determinewhich students require English language services(Abedi & Gándara, 2006).

However, literature on the assessment ofEnglish language learners clearly and consis-tently shows a large performance gap betweenELL and non-ELL students in all content areas(Abedi, 2006; Abedi & Gandara, 2006; Solano-Flores & Trumbull, 2003; Wolf et al., 2008). Theperformance gap decreases as the level of lan-guage demand of test items decreases (Abedi,Leon, & Mirocha, 2003). Thus, the unneces-sary linguistic complexity of assessments mayinterfere with ELL students’ ability to presenta valid picture of what they know and are ableto do.

J. Abedi (�)Graduate School of Education, University of California,Davis, CA 95616, USAe-mail: [email protected]

Language is an important component of anyform of assessments. However, a distinctionmust be made between language that is partof the assessments, and therefore relevant tothe construct being measured, and languagethat may be irrelevant to the focal construct.Unnecessary linguistic complexity may affectthe overall accessibility of assessments, particu-larly for students who are non-native speakers ofEnglish. For example, English language learnersmay have the content knowledge, but may havedifficulty understanding the complex linguisticstructure of content-based assessments such asthose measuring student’s mathematics and sci-ence content knowledge. Therefore, the measureof a student’s content knowledge may be con-founded with their level of English language pro-ficiency. To make assessments more accessiblefor ELL students, test items must clearly com-municate the tasks that students are supposed toperform.

The Concept of Linguistic Accessibilityin Content-Based Assessmentsin English

Researchers have shown that large performancegaps exist between ELL and non-ELL studentsthroughout content areas. These performancegaps widen as the language demand of the testitems increase. In subject areas with high levelsof language demand, such as reading/language

217S.N. Elliott et al. (eds.), Handbook of Accessible Achievement Tests for All Students,DOI 10.1007/978-1-4419-9356-4_12, © Springer Science+Business Media, LLC 2011

Page 234: Handbook of Accessible Achievement Tests for All Students: Bridging the Gaps Between Research, Practice, and Policy

218 J. Abedi

arts, the performance gap between ELL andnon-ELL students is the largest. The gap deceasesin content areas such as math and science, as thelanguage demand of the items decrease withinthese subjects (Abedi, 2010; Abedi, 2006; Abedi,2008; Solano-Flores, 2008).

The main issue with language on assessmentsis relevancy of the language to the content beingassessed. In content areas where the focal con-struct is not language, the concept of languageaccessibility could be applied. Assessments couldbe modified to make them linguistically accessi-ble to all students, particularly those at the lowerlevel of English proficiency spectrum.

Assessments are linguistically accessible ifthey do not contain unnecessary linguistic com-plexity. Some of the linguistic features that arediscussed in this chapter affect understandingof test items but may not be content relatedand may not be relevant to the content beingmeasured. Linguistically accessible assessmentsare less impacted by such features. For exam-ple, as Abedi (2010) explained, accessible read-ing assessments use: (1) familiar or frequentlyused words, (2) words that are generally shorter,(3) short sentences with limited prepositionalphrases, (4) concrete rather than abstract orimpersonal presentations, (5) fewer or no com-plex conditional or adverbial clauses, and (6)fewer or no passive voices. In general, as indi-cated above, these assessments avoid any linguis-tic complexities that are not part of the focalconstruct.

The main focus of this chapter is the impactthat language factors have on the assessment ofEnglish language learners and on the linguis-tic accessibility of content-based assessments. Ifirst discuss the research evidence that points tothe impact that language has on content-basedassessments. I then introduce the linguistic fea-tures that have been shown to have major impactson ELL student performance in content-basedassessments. Finally, I discuss research-basedapproaches to control for the impact of unneces-sary linguistic complexity on the assessment ofELL students.

The Nature and Impact of LanguageFactors on Content-Based AssessmentOutcomes

Performance outcomes of ELL students in con-tent areas such as math and science may be con-founded with their level of English proficiency.That is, ELL students may not have the languagefacilities to understand the assessment questionsor express their content knowledge within open-ended questions written in English. The mainproblem behind the language factors on theseassessments is the fact that these factors may dif-ferentially impact the performance of ELL andnon-ELL students. For many non-ELLs,1 the lan-guage of assessment may not be a key issue asit is for ELL students. Therefore, the compara-bility of assessment outcomes for ELL and non-ELL students may seriously impact the validityof assessments. In situations like these whereassessment outcomes are differentially impactedby a source or sources of construct-irrelevantvariance, the outcomes from the different groupscannot be combined. Aggregating outcomes fromdifferent subgroups seriously violates the compa-rability assumption across the subgroups and itmay also violate assumptions underlying the clas-sical measurement theory. The source of variancedue to measurement error is assumed to be ran-dom within the classical theory of measurement(Thorndike, 2005), but in the assessment of ELLsthe linguistic and cultural factors may be a sys-tematic source of error or bias (see, Abedi, 2006for a detailed discussion of the violation of theassumption underlying the classical test for ELLstudents). To develop accessible assessments forELL students, it is imperative to identify sourcesof linguistic biases that pose threats to the validityof assessment outcomes for these students.

1 Non-ELLs are those who are considered as proficient inEnglish, which includes native speakers of English, non-native speakers who are identified as initially fluent inEnglish and ELL students who were reclassified as fluentEnglish speakers.

Page 235: Handbook of Accessible Achievement Tests for All Students: Bridging the Gaps Between Research, Practice, and Policy

12 Language Issues in the Design of Accessible Items 219

Language factors unrelated to the constructbeing measured strongly affect the academicassessment of ELL students. However, reasonedconclusions about the relevancy of language fac-tors within an assessment are difficult to for-mulate. A team of experts may be able tomake such judgments. This team would includecontent, language, and assessment experts whowould base their decisions on their expert judg-ment and research evidence from studies onthe assessment and accommodation of ELL stu-dents (see Abedi, 2006). The following sectionsof this chapter define linguistic complexity ofthe assessments, illustrate the impact of unnec-essary linguistic complexity on content-basedassessments, and show how a distinction canbe made between linguistic features that arerelated and those that are unrelated to the focalconstruct. In the next sections, I provide anexample of an original version of a math itemand a revised version of the item in which thelevel of unnecessary linguistic complexity wasreduced.

Creating More LinguisticallyAccessible Assessments

Research has shown that language factors sig-nificantly impact student performance, particu-larly performance of students with more chal-lenging academic careers, such as English lan-guage learners and students with disabilities(Abedi, 2010). The impact language carries onperformance becomes more critical when lan-guage is not the target of assessment and thelinguistic complexity of assessment becomes asource of construct-irrelevant variance (Abedi,2006). In this section, I discuss the concept of“unnecessary” linguistic complexity in assess-ments and introduce the linguistic features thatcould affect the accessibility of assessments. Ithen discuss research-based approaches that areused to revise test items to reduce linguisticcomplexity.

Unnecessary Linguistic Complexityin Assessments

Language is an important part of any assess-ments without which the assessment system(test questions and student responses) would beincomplete. For example, when presented withmath word problems, students have to read andcomprehend the math questions and share thedetails of how they propose to solve a particu-lar problem. Language may also be present as anunessential part of the assessment, but may helpfacilitate understanding of assessment items. Forexample, test items presented in a rich languagecontext would be more interesting for test takers.However, sometimes the language used for con-text in the assessment questions may be exces-sively complex and may cause misunderstand-ing and confusion. Therefore, it is of paramountimportance to distinguish between the languagethat is a natural part of the assessment and essen-tial to the assessment process and the languagethat is unrelated to the assessment process.

The first step in this process would be toidentify linguistic features in the test items thatcould potentially impact the assessment out-comes and then provide strategies to control forthese sources of systematic measurement error(bias). In the next section I discuss sources oflinguistic complexity of assessments.

Linguistic Features That May AffectAccessibility of Assessments

Studies focusing on the assessment of, and theaccommodations for, ELL students have identi-fied linguistic features that may not be relatedto the focal construct and may seriously impactstudent performance outcomes for those studentswho are challenged by excessive linguistic com-plexity within assessments (for a more detaileddiscussion of these features see Abedi, 2006;Abedi, Lord, & Plummer, 1997). These featuresslow down the reader, increase the likelihood of

Page 236: Handbook of Accessible Achievement Tests for All Students: Bridging the Gaps Between Research, Practice, and Policy

220 J. Abedi

misinterpretation, and add to the reader’s cogni-tive load, thus interfering with concurrent tasks.

Linguistic features that affect performance out-comes of ELL students include the following:

Linguistic feature Definition

Word frequency/familiarity Words that are high on a general frequency list for English are likely tobe familiar to readers.

Word length Longer words are more likely to be morphologically complex.

Sentence length Sentence length is an index for syntactic complexity and comprehensiondifficulty.

Voice of verb phrase Passive voice constructions are more difficult to process than activeconstructions and can pose a challenge for non-native speakers ofEnglish.

Length of nominals A reader’s comprehension of long nominal compounds may be impairedor delayed by problems in interpreting them.

Complex question phrases Longer question phrases occur with lower frequency than short questionphrases, and low-frequency expressions are harder to read.

Comparative structures Comparative constructions have been identified as potential sources ofdifficulty for ELLs.

Prepositional phrases Languages such as English and Spanish may differ in the ways thatmotion concepts are encoded using verbs and prepositions.

Sentence and discourse structure Two sentences may have the same number of words, but one may bemore difficult due to the syntactic structure.

Subordinate clauses Subordinate clauses may contribute more to complexity than coordinateclauses.

Conditional clauses Separate sentences, rather than subordinate “if” clauses, may be easier tounderstand.

Relative clauses Since relative clauses are less frequent in spoken English than in writtenEnglish, some students may have had limited exposure to them.

Concrete vs abstract presentations Information presented in narrative structures tends to be understoodbetter than information presented in expository text.

Negation Sentences containing negations (e.g., no, not, none, never) are harder tocomprehend than affirmative sentences.

For a detailed description of the features,along with research support of these features, seeAbedi et al. (1997).

Procedures for Linguistic Modificationof Test Items

The identification of the linguistic sources thatraise accessibility issues for ELL students ledto the development of a linguistic modifica-tion methodology for ELL assessments. Usingthe linguistic features presented in Fig. 12.1,two rubrics were developed to judge, identify,and quantify these features (see Abedi, 2006;

Abedi et al., 1996). The first rubric was usedto assess the level of linguistic complexity ofeach test item on each of the 14 linguisticfeatures. This rubric provides an analytic rat-ing for each of the test items (Abedi, 2006,Figure 12.1 of this Chapter). For example, basedon this rubric, the “word frequency/familiarity”in the test item is judged on a 5-pointLikert-Scale ranging from 1 (words that arefrequently used and are quite familiar to thetest takers) to 5 (words that are least fre-quently used and are quite unfamiliar to the testtakers).

The second rubric provides an assessmentof the overall complexity of the test items

Page 237: Handbook of Accessible Achievement Tests for All Students: Bridging the Gaps Between Research, Practice, and Policy

12 Language Issues in the Design of Accessible Items 221

2

ADEQUATE ITEM Sample Features:

Familiar or frequently used words; short to moderate word lengthModerate sentence length with a few prepositional phrases Concrete item No subordinate, conditional, or adverbial clauses No passive voice or abstract or impersonal presentations

3

WEAK ITEM Sample Features:

Relatively unfamiliar or seldom used words Long sentence(s) Abstract concept(s) Complex sentence/conditional tense/adverbial clause A few passive voice or abstract or impersonal presentations

4

ATTENTION ITEM Sample Features:

Unfamiliar or seldom used words Long or complex sentence Abstract item Difficult subordinate, conditional, or adverbial clause Passive voice/ abstract or impersonal presentations

5

PROBLEMATIC ITEM Sample Features:

Highly unfamiliar or seldom used words Very Long or complex sentence Abstract item Very difficult subordinate, conditional, or adverbial clause Many passive voice and abstract or impersonal presentations

LEVEL QUALITY

1

EXEMPLARY ITEM Sample Features:

Familiar or frequently used words; word length generally shorterShort sentences and limited prepositional phrases Concrete item and a narrative structure No complex conditional or adverbial clauses No passive voice or abstract or impersonal presentations

Fig. 12.1 Holistic item ratingrubric. Adapted from Abedi(2006)

(Abedi, 2006). A 5-point Likert-Scale ratingis used to provide a holistic rating of thelinguistic complexity of the test items. TheLikert-Scale ratings range between 1 (exem-plary items with simple linguistic structure) to 5(problematic) on items that have very complexlinguistic structures. For example, items with anoverall rating of 5 use words that are highly unfa-miliar or rarely being used; have very long sen-tences or complex sentences; are abstract; may

have very difficult subordinate, conditional, oradverbial clauses; and have many passive voiceand abstract or impersonal presentations.

After providing linguistic ratings, both interms of individual linguistic features andoverall linguistic complexity, items are groupedinto two categories: those that are linguisticallyaccessible and those that need serious revisionsand restructuring. Items in the second group(need revisions) can then be put through a

Page 238: Handbook of Accessible Achievement Tests for All Students: Bridging the Gaps Between Research, Practice, and Policy

222 J. Abedi

linguistic modification process. The linguis-tic modification process involves revisingthe language features that are judged tobe unrelated to the construct being mea-sured. However, the decision process ofwhat is language related and unrelatedis a challenging task and requires an expert’sjudgment. Therefore, a team of experts workstogether to first identify the complex languagestructure in the item that are not content relatedand then to propose linguistic revisions to makeitems more accessible for ELL students.

To illustrate the process of linguistic modi-fication, a released test item from the NationalAssessment of Educational Progress (NAEP) isused. For illustration purposes only sources ofgrammatical complexity of the item are identifiedand modified. Although grammatical complexityis not a clearly defined construct (Rimmer, 2006),researchers have recommended investigating cat-egories of words – verbs, conjunctions, nouns,etc. – to identify contributors to complex syntaxused in the academic language of school (Hall,White, & Guthrie, 1986). Based on a review ofliterature and expert opinion, six features wereexamined in this question as possible indica-tors of grammatical complexity (Abedi et al.,2010):• Passive verbs (PV)• Complex verbs other than passive (CV)• Relative clauses (RC)• Subordinate clauses other than relative clauses

(SC)• Noun phrases (NP)• Entities (EC)

Using these features provides a systematicmeans of analyzing text for grammatical com-plexity. I first present the original item (Fig. 12.2)with ratings on its grammatical complexity on allof the above six complexity features (Table 12.1)and then present the linguistically revised versionof the item (Fig. 12.3) along with the ratings onthe six features (Table 12.2).

The original test item contains a relative clausethat begins with the potentially awkward com-bination of words “in which,” followed by theremainder of the lengthy clause “the chanceof landing on blue will be twice the chance

Number of blues: _________________

Number of reds: _________________

Fig. 12.2 Original Item Luis wants to make a game spin-ner in which the chance of landing on blue will be twicethe chance of landing on red. He is going to label eachsection either red (R) or blue (B)

of landing on red.” The directive, “Show howhe could label his spinner” contains a com-plex verb (“could label”), but is not in the formof a question: a possible source of confusionfor elementary-age students who use a questionmark as a cue for what the problem requires.This NAEP-released test item also contains threenoun phrases (“game spinner” – noun plus noun,and “the chance of landing on red” and “thechance of landing on blue,” both having the struc-ture noun plus two prepositional phrases). Thethree entities are “Luis,” “the chance,” and “you”(understood in “You show how he could label hisspinner”).

The grammatical complexity in this prob-lem can be reduced by eliminating the relativeand subordinate clauses and the complex verbform; syntax elements are often identified assources of difficulty in reading (Larsen, Parker, &Trenholme, 1978; Haladyna & Downing, 2004).Although using character names in reading hasnot been specifically identified as a complexfeature, presenting the problem as a simple direc-tive to the student (You make a . . .spinner. . .)is a more straightforward presentation of whatis required of the test taker. The require-ment to count the red and blue sections is in

Table 12.1 Linguistic complexity rating for the originalitem

PV CV RC SC NP EC Total

1 1 1 3 3 9

Page 239: Handbook of Accessible Achievement Tests for All Students: Bridging the Gaps Between Research, Practice, and Policy

12 Language Issues in the Design of Accessible Items 223

Make a red and blue game spinner and follow this rule:

The chance of landing on a blue section is two times

the chance of landing on a red section.

How many blue sections and how many red sections do you make?

Number of blues: _________________

Number of reds: _________________

Fig. 12.3 Linguisticallymodified item

Table 12.2 Linguistic complexity rating for the reviseditem

PV CV RC SC NP EC Total

3 3 6

question form, possibly bringing attention towhat is required. The linguistically modifiedproblem, shown below, is reduced from ninegrammatically complex features to six.

Linguistic Modification: PracticalImplications

The linguistic modification approach has beenintroduced as a language-based accommoda-tion to make assessments more accessible forELL students. As discussed in this chapter, thisapproach has been supported by research evi-dence as an effective and valid accommodationfor ELL students. We will discuss the feasibil-ity and logistics surrounding the implementationof this approach as an effective, valid, and rel-evant accommodation for ELL students. Unlikemany other language-based accommodationsthat are implemented during the actual testing

process (such as providing a dictionary or glos-sary), the linguistic modification approach isimplemented during the test development pro-cess. A team of experts carefully examines lin-guistic features of the newly developed tests andrecommends changes in the linguistic structureof the items, if necessary. These recommenda-tions are based on two important assumptions:(a) the recommended changes are not contentrelated and do not alter the focal construct, and(b) they make the assessment more accessible toELL students.

Research and Methodological IssuesRelated to the Linguistic Accessibilityof Assessments

There are several methodological issues relatedto the linguistic modification approach for testingELLs. I present a summary of research focusingon three areas: (a) research on the effectiveness ofthe linguistic modification approach for improv-ing accessibility of assessments for ELLs, (b)research findings on the issues concerning reli-ability of linguistically modified tests, and (c)

Page 240: Handbook of Accessible Achievement Tests for All Students: Bridging the Gaps Between Research, Practice, and Policy

224 J. Abedi

research evidence on the validity of linguisticallymodified tests.

Research on the Effectivenessof Linguistic Modification Approachfor Improving Accessibility ofAssessments for ELLs

Findings from studies on the assessment andaccommodations of ELL students suggest thatthe linguistic modification of test items can bean effective and valid accommodation for ELLstudents. In fact, among many accommodationsused in several experimentally controlled studies,the linguistic modification accommodation wasthe only accommodation that reduced the perfor-mance gap between ELL and non-ELL studentswithout compromising the validity of assess-ments (Abedi, Hofstetter, & Lord, 2004; Abedi,Lord, & Hofstetter, 1998; Abedi, Hofstetter,Lord, & Baker, 2000; Maihoff, 2002).

The effects of some of the linguistic featuresdiscussed previously were examined on a sam-ple of 1,031 eighth-grade students in SouthernCalifornia (Abedi et al., 1997). In this study, themath items for eighth-grade students were mod-ified to reduce the complexity of sentence struc-tures and to replace potentially unfamiliar vocab-ulary with more familiar words without changingthe content-related terminologies (mathematicalterms were not changed). The results showedsignificant improvements in the scores of ELLstudents and also non-ELLs in low- and average-level mathematics classes, but changes did notaffect scores of higher performing non-ELL stu-dents. Low-frequency vocabulary and passivevoice verb constructions were among the lin-guistic features that appeared to contribute tothe differences. These features contributed to thelinguistic complexity of the text and made theassessment more linguistically complex for ELLstudents.

The findings of this study were cross-validatedin another study in which Abedi et al. (1998)examined the impact of linguistic modification onthe mathematics performance of English learn-ers and non-English learners using a sample of

1,394 eighth graders from schools with a highenrollment of Spanish speakers. Results wereconsistent with those of the earlier studies andshowed that the linguistic modification of itemscontributed to improved performance on 49% ofthe items, with the ELL students generally scor-ing higher on shorter/less linguistically complexproblem statements. The results of this study alsosuggested that lower-performing native speak-ers of English also benefited from the linguisticmodification of assessment.

Other investigations also provided cross-validation evidence on the effectiveness of lan-guage modification approach in improving thevalidity of assessments for 1,574 eighth-gradeELL students. The effects of the language modifi-cation approach on reducing the performance gapbetween ELL and non-ELL students were exam-ined in another study (Abedi, Courtney, & Leon,2003) using items from the NAEP and the ThirdInternational Math and Science Study (TIMSS).Students were provided with either a customizedEnglish dictionary (words were selected directlyfrom test items), a bilingual glossary, a linguisti-cally modified test version, or the standard testitems. Only the linguistically modified versionimproved performance of ELL students with-out affecting performance of non-ELL students.Maihoff (2002) also found linguistic modificationof content-based test items to make assessmentsmore accessible for ELL students. Kiplinger,Haug, and Abedi (2000) found that the linguis-tic modification of math items improved perfor-mance of ELL students in math without affectingperformance of non-ELL students. Rivera andStansfield (2001) compared ELL performance onregular and simplified fourth- and sixth-grade sci-ence items. Results of this study showed thatlinguistic modification approach did not affectscores of English-proficient students, indicatingthat linguistic modification is not a threat to scorecomparability.

In general, the research evidence shows lin-guistic complexity as a major source of mea-surement error on the assessment outcomes forELL students. Research findings also suggestthat reducing the level of unnecessary linguis-tic complexity of assessments helps to improve

Page 241: Handbook of Accessible Achievement Tests for All Students: Bridging the Gaps Between Research, Practice, and Policy

12 Language Issues in the Design of Accessible Items 225

assessment validity and reliability for these stu-dents and makes content-based assessments moreaccessible for ELL students.

Research Findings on Languageas a Source of Measurement Errorin Assessments

In classical measurement theory, “Reliabilityrefers to the accuracy or precision of a mea-surement procedure” (Thorndike, 2005, pp.109–110) or “consistency of measurement; thatis, how consistent test scores or other assessmentresults are from one measurement to another”(Linn & Gronlund, 1995, p. 81). Reliabilityis directly affected by measurement error andthere are many sources of measurement errorthat can affect reliability of test scores. Sourcesof measurement errors in paper-and-pencil testscan be from sources such as test format, pag-ination, font size, complex charts and graphs,and crowded pages. Unclear test instructions andproblems with test administration and scoringmay also be sources of measurement error andmay impact reliability of assessments. Sourcesof measurement error often have random distri-bution and may directly or indirectly affect thevalidity of assessment, since reliability puts alimit on the validity of assessment (see, Allen &Yen, 1979; Thorndike, 2005, pp. 191, 192).However, there are sources of measurement errorthat may systematically affect reliability of atest which are often referred to as sources of“bias.” Unnecessary linguistic complexity withinassessments is a major source of measurementerror which systematically affects the reliabilityof assessments for ELL students.

To examine the impact of linguistic complex-ity on the assessment of ELLs, we comparedreliability coefficients for ELL and non-ELL stu-dents on state assessments across several contentareas, including mathematics and science. Sincelanguage factors may differentially impact per-formance of ELL and non-ELL students, wecomputed reliability coefficients separately foreach of the two groups (ELLs and non-ELLs)using internal consistency approach. The main

limitation of the internal consistency approachin classical test theory, however, is the assump-tion of unidimensionality. That is, the inter-nal consistency approach can only be appliedto assessments that measure a single construct(see, for example, Abedi, 1996; Cortina, 1993).Unnecessary linguistic complexity, however, mayintroduce another dimension into the assessment,making the assessment multidimensional. Thehigher the level of impact of linguistic com-plexity of assessment, the more serious is themultidimensionality issue.

To examine the pattern of differences of theinternal consistency coefficient between ELLand non-ELL students, data from several loca-tions nationwide were analyzed (Abedi et al.,2003). Access to the item-level data provided anopportunity to examine item-total correlation andunique contribution of each item to the overallreliability of the test.

The results of internal consistency analysesshowed a large gap in the reliability coeffi-cients (alpha) between ELL and non-ELL stu-dents. The gap in the reliability between thesetwo groups decreased as the level of languagedemand of the assessment reduced. The reliabil-ity coefficients for non-ELL students ranged from0.81 for science and social science to 0.90 formath. For ELL students, however, alpha coef-ficients differed considerably across the contentareas. In math, where language factors mightnot have as great an influence on performance,the alpha coefficient for ELL (0.80) was onlyslightly lower than the alpha for non-ELL stu-dents (0.90). However, in English language arts,science, and social science, the gap on alphabetween non-ELL and ELL students was large.Averaging across English language arts, science,and social science, the alpha for non-ELL was0.81 as compared to an average alpha of 0.60for ELL students. Thus, as elaborated earlier, lan-guage factors introduce a source of measurementerror, negatively affecting ELL students’ test out-comes while their impact on students who arenative or fluent speakers of English is limited (fora more detailed discussion of reliability differ-ences between ELL and non-ELL students, seeAbedi et al., 2003).

Page 242: Handbook of Accessible Achievement Tests for All Students: Bridging the Gaps Between Research, Practice, and Policy

226 J. Abedi

Research Findings on the Impactof Linguistic Complexity on the Validityof Assessments for ELL Students

“Validity refers to the degree to which evi-dence and theory support the interpretations oftest scores entailed by proposed uses of tests”(American Educational Research Association,American Psychological Association, & NationalCouncil on Measurement in Education, 1999,p. 9). If the proposed use of content-based assess-ments as a measure of students’ knowledge incontent areas such as mathematics and science isconfounded by unnecessary linguistic complex-ity of assessments, then the outcomes of the testsmay not be valid. For example, if a test thatis intended to measure algebra in grade 8 hasa linguistic structure so complex that ELL stu-dents have difficulty understanding the questions,then the test actually measures more than whatis intended to measure. In this example, alge-bra content is the relevant or target construct andunnecessary linguistic complexity is the unin-tended or irrelevant construct. The Standardsfor Educational and Psychological Testing(American Educational Research Association[AERA], American Psychological Association[APA], & National Council on Measurement inEducation [NCME], 1999) caution interpretingthe test outcomes for individuals who have notsufficiently acquired the language of the test.

The impact of unnecessary linguistic complex-ity as a source of construct-irrelevant variance,which may differentially impact the validity ofstandardized achievement tests between non-ELLand ELL students, was examined in a study byAbedi et al. (2003). A multiple-group confirma-tory factor analysis model was used to illustratehow linguistic complexity may impact the struc-tural relationships between test items and the totaltest scores as well as between test scores andexternal criteria. In this study, test items weregrouped into parcels, and those parcel scoreswere used to create latent variables. The results ofanalyses indicated major difference in the struc-tural relationship between ELL and non-ELLstudents on relationship between test items andto total test scores, and between test scores and

external criteria. The correlations of item parcelswith the latent factors were consistently lowerfor ELL students than they were for non-ELLstudents. This finding was true for all parcelsregardless of which grade or which sample ofthe population was tested. For example, amongninth-grade ELL students, the correlation for thefour reading parcels ranged from 0.72 to 0.78,across the two samples. In comparison, amongnon-ELL students, the correlations for the fourreading parcels were higher, ranging from 0.83to 0.86, across the two samples. The item parcelcorrelations were also larger for non-ELL stu-dents than for ELL students in math and science.Again these results were consistent across the dif-ferent samples (see Abedi et al., for a detaileddescription of the study; see also Kane, 2006, fora discussion of criterion-related validity).

The data presented above clearly suggest thatELL students perform far behind their non-ELLpeers. Studies have shown that the distortioncaused by construct-irrelevant or “nuisance” vari-ables, such as linguistic biases, may mainly beresponsible for such performance gaps (Abedi,2006).

Practical Steps for ImprovingAssessments for ELL Students

Formative Assessments to Help IdentifyLanguage Issues in the Assessmentsof ELL Students

Formative assessments can serve as useful toolsfor teachers and assessment developers to iden-tify areas where ELL students may need lan-guage assistance (Abedi, 2009). The outcomes ofsummative assessments (e.g., state standardizedassessments) can also be used formatively andcan inform curriculum planning and instructionalpractices for ELL students. However, such out-comes may have limited utility for ELL studentsand their parents when it comes to understand-ing the language issues. Yet most studies onthe impact of language factors on the assess-ments of ELL students are conducted on the datafrom end-of-year state standardized summative

Page 243: Handbook of Accessible Achievement Tests for All Students: Bridging the Gaps Between Research, Practice, and Policy

12 Language Issues in the Design of Accessible Items 227

assessments. While these data are helpful toidentify sources of linguistic factors affect-ing performance of ELL students in general,they may not be helpful for individual studentcases. Assessment outcomes would be too lit-tle too late for teachers to identify sources ofproblem and adjust their instructional plans tohelp ELL students become proficient enoughin English to meaningfully participate in themainstream instruction and assessments (Abedi,2009).

Outcomes of formative assessments, ifdesigned properly, can provide diagnostic infor-mation for teachers to assess areas that the ELLstudents need language assistance in. Such infor-mation may also help state assessment officialsand test publishers to examine the accessibilityissues in the current assessments and to plannecessary revisions to the current assessmentsto make them more linguistically accessible forthese students.

Accessible Assessments at theClassroom Level Versus AccessibleAssessments at the Stateor National Level

Language factors impact all assessments, whetherused locally (in the classrooms), at the districtor state level, or at the national level. Reducing

the level of unnecessary linguistic complexity isshown to make those assessments more accessi-ble and helps provides a more clear interpretationof the assessment outcome. However, due to themuch larger population of test takers at the stateand national levels, more attention is usuallypaid to the large-scale assessment instruments,which has a larger impact on students population.Therefore, it would be extremely helpful to solicitinput from teachers and students in the processof linguistic modification and incorporate thosesuggestions into the process along with the inputfrom the linguistic modification team.

Guidelines and Recommendationsfor Creating More LinguisticallyAccessible Assessments for Students

As discussed earlier in this chapter, the practiceof linguistically modifying assessments starts atthe beginning of the item development process.A linguistic rubric, which has been developedby researchers, can guide the process (for anexample of linguistic rubric, see Abedi, 2006,p. 392). For example, a test item can be ratedon its linguistic complexity in a 5-point LikertScale, 1 being less linguistically complex and 5being complex. Below is the scoring rubric takenfrom Abedi, 2006, page 392, for rating “1”, leastlinguistic complex:

1 Exemplary ItemSample features:• Familiar or frequently used words; word length generally shorter• Short sentences and limited prepositional phrases• Concrete item and a narrative structure• No complex conditional or adverbial clauses• No passive voice or abstract or impersonal presentations

However, the process can be enhanced byintroducing additional checks and balances.Some recommendations to improve the level ofeffectiveness of linguistic modification approachin making assessments more linguistically acces-sible for all students, particularly for ELL stu-dents, include the following:

1. Use a series of cognitive labs and focus groupson the target population for the newly devel-oped assessments to identify linguistic struc-tures that are considered as complex basedon the input from the cognitive lab and focusgroup participants.

Page 244: Handbook of Accessible Achievement Tests for All Students: Bridging the Gaps Between Research, Practice, and Policy

228 J. Abedi

2. Ask a group of experts (linguist, content andmeasurement experts, and a teacher) to iden-tify which of those linguistically complexstatements are content related and which areunnecessary in measuring the focal construct.

3. Ask the group of experts to provide sugges-tions on how to linguistically modify the testitems without altering the focal construct.

4. Ask content experts to review the linguisticrevisions to make sure the linguistic structuresrelated to the focal construct have not beenaltered. In their review of linguistic revisions,experts can use the linguistic modificationrubric that has been discussed above.

Summary and RecommendationsThere are many different factors that mayaffect accessibility of assessments for ELLstudents; among them linguistic factors playa major role on the accessibility of assess-ments for ELL students. Research on theassessment of ELL students has clearly linkedthe outcome of assessments with a student’slanguage background. Students with a lowerlevel of English proficiency may performpoorly in content-based assessments, mainlydue to problems in understanding assessmentquestions and a difficulty in presenting theircontent knowledge in open-ending questions,rather than a lack of content knowledge. Thus,assessment outcomes are confounded with thestudent’s level of English proficiency.

Researchers have illustrated that the lin-guistic modification of test items is a viableoption to make assessments more accessi-ble to ELL students. Under this approach,the linguistic structures of test items that arejudged to be unnecessarily complex are mod-ified based on a rubric that identifies differ-ent features of linguistic complexities. Thedistinction between language that is relatedand language unrelated to the focal con-struct is made by a team of experts includinga linguist, a content expert, and a teacher. Afterall confounding linguistic features are identi-fied, the team of experts provides suggestions

for revisions of the test items to make themmore linguistically accessible to ELL students.

Research on the impact of the linguisticmodification of test items discussed in thischapter has demonstrated that this approach iseffective in making assessments more accessi-ble to ELL students. ELL students who havereceived linguistically modified versions oftests have performed significantly and sub-stantially higher than ELL students who weretested under the original version of test.That is, linguistic modification of test itemsimproved the validity of assessments for ELLstudents. The performance of the non-ELLstudents under linguistically modified versionof assessment remained the same, which is agood indication of the validity of linguisticmodification approach since it does not alterthe construct being measured.

The linguistic modification of assessmentsalso helps improve the psychometric qualityof the assessments. A major source of sys-tematic error can be controlled by reducingthe linguistic complexity of items, and thereliability of the assessment can be substan-tially improved. Summaries of the studies pre-sented in this chapter demonstrate the impactlanguage factors have on the reliability ofassessments and how such factors can be con-trolled in order to improve the reliability ofassessments. Similarly, results of studies pre-sented in this chapter showed how linguisticmodification improves the validity of tests byreducing the impact of unnecessary linguisticcomplexity as a source of construct-irrelevantvariance.

To summarize, standardized achievementtests that are used for assessment and account-ability purposes must control for the culturaland linguistic biases that threaten the valid-ity of interpretation for ELL students. Thischapter discussed the sources of threats due tolinguistic complexities and approaches of howto deal with them. Finally, recommendationson how to make assessments more accessibleto ELL students were provided.

Page 245: Handbook of Accessible Achievement Tests for All Students: Bridging the Gaps Between Research, Practice, and Policy

12 Language Issues in the Design of Accessible Items 229

References

Abedi, J. (1996). The interrater/test reliability system(ITRS). Multivariate Behavioral Research, 31(4),409–417.

Abedi, J. (2006). Language issues in item-development. InS. M. Downing & T. M. Haladyna (Eds.), Handbookof test development (pp. 377–398). Mahwah, NJ:Lawrence Erlbaum.

Abedi, J. (2008). Measuring students’ level of englishproficiency: Educational significance and assessmentrequirements. Educational Assessment, 13, 2–3.

Abedi, J. (2010). Linguistic factors in the assessment ofEnglish language learners. In G. Walford, E. Tucker& M. Viswanathan (Eds.), The sage handbook ofmeasurement (pp. 377–398). Oxford: Sage.

Abedi, J., Courtney, M., & Leon, S. (2003). Effectivenessand validity of accommodations for English lan-guage learners in large-scale assessments (CSETechnical Report No. 608). Los Angeles: Universityof California, National Center for Research onEvaluation, Standards, and Student Testing.

Abedi, J., & Herman, J. L. (2010). Assessing Englishlanguage learners’ opportunity to learn mathemat-ics: Issues and limitations. Teachers College Record,112(3), 723–746.

Abedi, J., Hofstetter, C. H., & Lord, C. (2004).Assessment accommodations for english languagelearners: Implications for policy-based empiricalresearch. Review of Educational Research, 74(1), 1–28.

Abedi, J., Kao, J., Leon, S., Bayley, R., Ewers, N.,Herman, J., et al. (2010). Accessible reading assess-ments for students with disabilities: The role of cog-nitive, linguistic, and textual features. Manuscript inpreparation.

Abedi, J., Leon, S., & Mirocha, J. (2003). Impactof students’ language background on content-basedassessment: Analyses of extant data. Los Angeles:University of California, National Center for Researchon Evaluation, Standards, and Student Testing.

Abedi, J., & Lord, C. (2001). The language factor inmathematics tests. Applied Measurement in Education,14(3), 219–234.

Abedi, J., Lord, C., & Hofstetter, C. (1998). Impactof selected background variables on students’ NAEPmath performance (CSE Technical Report No. 478).Los Angeles: University of California, National Centerfor Research on Evaluation, Standards, and StudentTesting.

Abedi, J., Lord, C., Hofstetter, C., & Baker, E. (2000).Impact of accommodation strategies on englishlanguage learners’ test performance. EducationalMeasurement: Issues and Practice, 19(3), 16–26.

Abedi, J., Lord, C., & Plummer, J. (1997). Languagebackground as a variable in NAEP mathematics per-formance. Los Angeles: University of California,National Center for Research on Evaluation,Standards, and Student Testing.

Adams, M. J. (1990). Beginning to read: Thinkingand learning about print. Cambridge, MA: MITPress.

Allen, M. J., & Yen, W. M. (1979). Introduction tomeasurement theory. Monterey, CA: Brooks/Cole.

American Educational Research Association, AmericanPsychological Association, & National Council onMeasurement in Education. (1999). Standards for edu-cational and psychological Testing. Washington, DC:American Educational Research Association.

Baugh, J. (1988). Review of the article twice as less: BlackEnglish and the performance of black students in math-ematics and science. Harvard Educational Review,58(3), 395–404.

Bormuth, J. R. (1966). Readability: A new approach.Reading Research Quarterly, 1(3), 79–132.

Botel, M., & Granowsky, A. (1972). A formula formeasuring syntactic complexity: A directional effort.Elementary English, 49, 513–516.

Celce-Murcia, M., & Larsen-Freeman, D. (1983). Thegrammar book: An ESL/EFL teacher’s book. Rowley,MA: Newbury House.

Chall, J. S., Jacobs, V., & Baldwin, L. (1990). The readingcrisis: Why poor children fall behind. Cambridge, MA:Harvard University Press.

Cortina, J. M. (1993). What is coefficient alpha? An exam-ination of theory and applications. Journal of AppliedPsychology, 78(1), 98–104.

Finegan, E. (1978, December). The significance of syntac-tic arrangement for readability. Paper presented at thepaper presented to the linguistic society of America,Boston.

Freeman, G. G. (1978). Interdisciplinary evaluation ofchildren’s primary language skills. ERIC Microfiche,ED157341.

Gathercole, S. E., & Baddeley, A. D. (1993). Workingmemory and language: Hillsdale, NJ: Erlbaum.

Haladyna, T. M., & Downing, S. M. (2004). Construct-irrelevant variance in high-stakes testing. EducationalMeasurement: Issues and Practice, 23(1), 17–27.

Hall, W. S., White, T. G., & Guthrie, L. (1986). Skilledreading and language development: Some key issues.In J. Orasanu (Ed.), Reading comprehension fromresearch to practice (pp. 89–111). Hillsdale, NJ:Erlbaum.

Halliday, M. A. K., & Martin, J. R. (1994). Writing sci-ence: Literacy and discursive power. Pittsburgh, PA:University of Pittsburgh Press.

Hunt, K. W. (1965). Grammatical structures written atthree grade levels (NCTE Research Report No. 3).Urbana, IL: National Council of Teachers of English.

Hunt, K. W. (1977). Early blooming and late bloom-ing syntactic structures. In C. R. Cooper & L. Odell(Eds.), Evaluating writing: Describing, measuring,judging (pp. 91–106). Urbana, IL: National Council ofTeachers of English.

Jones, P. L. (1982). Learning mathematics in a secondlanguage: A problem with more and less. EducationalStudies in Mathematics, 13(3), 269–287.

Page 246: Handbook of Accessible Achievement Tests for All Students: Bridging the Gaps Between Research, Practice, and Policy

230 J. Abedi

Just, M. A., & Carpenter, P. A. (1980). A theoryof reading: From eye fixation to comprehension.Psychological Review, 87, 329–354.

Kane, M. T. (2006). Validation. In R. L. Brennan(Ed.), Educational measurement (4th ed., pp.17–64). Westport, CT: American Council onEducation/Praeger.

King, J., & Just, M. A. (1991). Individual differ-ences in syntactic processing: The role of work-ing memory. Journal of Memory and Language, 30,580–602.

Kiplinger, V. L., Haug, C. A., & Abedi, J. (2000).Measuring math – not reading – on a math assessment:A language accommodations study of English lan-guage learners and other special populations. Paperpresented at the presented at the annual meeting ofthe American Educational Research Association, NewOrleans, LA.

Larsen, S. C., Parker, R. M., & Trenholme, B. (1978).The effects of syntactic complexity upon arithmeticperformance. Educational Studies in Mathematics, 21,83–90.

Lemke, J. L. (1986). Using language in classrooms.Victoria, Australia: Deakin University Press.

Linn, R. L., & Gronlund, N. E. (1995). Measuring andassessment in teaching (7th ed.). Englewood Cliffs,NJ: Prentice-Hall.

Lord, C. (2002). Are subordinate clauses more difficult?In J. Bybee & M. Noonan (Eds.), Subordination indiscourse. Amsterdam: John Benjamins.

MacDonald, M. C. (1993). The interaction of lexicaland syntactic ambiguity. Journal of Memory andLanguage, 32, 692–715.

MacGinitie, W. H., & Tretiak, R. (1971). Sentence depthmeasures as predictors of reading difficulty. ReadingResearch Quarterly, 6, 338–363.

Maihoff, N. A. (2002, June). Using Delaware data in mak-ing decisions regarding the education of LEP students.

Paper presented at the paper presented at the coun-cil of chief state school officers 32nd annual nationalconference on large-scale assessment, Palm Desert,CA.

Messick, S. (1994). The interplay of evidence and conse-quences in the validation of performance assessments.Educational Researcher, 23(2), 13–23.

Mestre, J. P. (1988). The role of language comprehen-sion in mathematics and problem solving. In R. R.Cocking & J. P. Mestre (Eds.), Linguistic and culturalinfluences on learning mathematics (pp. 200–220).Hillsdale, NJ: Erlbaum.

Orr, E. W. (1987). Twice as less: Black English and theperformance of black students in mathematics andscience. New York: W. W. Norton.

Pauley, A., & Syder, F. H. (1983). Natural selection insyntax: Notes on adaptive variation and change in ver-nacular and literary grammar. Journal of Pragmatics,7(5), 551–579. doi:10.1016 0378-2166(83)90081-4.

Rimmer, W. (2006). Measuring grammatical complexity:The Gordian knot. Language Testing, 23(1), 497–519.

Rivera, C., & Stansfield, C. W. (2001, April). The effectsof linguistic simplification of science test items on per-formance of limited English proficient and monolin-gual English-speaking students. Paper presented at thepaper presented at the annual meeting of the Americaneducational research association, Seattle, WA.

Schachter, P. (1983). On syntactic categories.Bloomington, IN: Indiana University LinguisticsClub.

Thorndike, R. M. (2005). Measurement and evaluation inpsychology and education. Upper Saddle River, NJ:Pearson, Merrill.

Wang, M. D. (1970). The role of syntactic complexity asa determiner of comprehensibility. Journal of VerbalLearning and Verbal Behavior, 9(4), 398–404.

Page 247: Handbook of Accessible Achievement Tests for All Students: Bridging the Gaps Between Research, Practice, and Policy

13Effects of Modification Packagesto Improve Test and ItemAccessibility: Less Is More

Ryan J. Kettler

The final regulations for the No Child Left Behind(NCLB) (U.S. Department of Education, 2007a,2007b) Act indicated that some students with dis-abilities may take an alternate assessment basedon modified academic achievement standards(AA-MAS), a version of the general assessmenttest that has been modified to increase the valid-ity of test score inferences for students identifiedwith a disability. The AA-MAS policy is intendedto help those students with disabilities who areexposed to grade level material but who becauseof their disabilities persistently fail to obtain pro-ficiency, to better show what they know and cando. This assessment may be used for all eligiblestudents, but only 2% of the entire student pop-ulation may take the assessment and be reportedproficient within a district or state.

Modifications is the term that refers to changes made to a test's content or item format when evidence has not yet been gathered indicating whether the construct measured by the test has been preserved (Kettler, Elliott, & Beddow, 2009; Koretz & Hamilton, 2006; Phillips & Camara, 2006). Most states that are developing AA-MASs are doing so by using packages of modifications (e.g., removing one incorrect answer choice, removing unnecessary words, and adding white space), rather than individual modifications in isolation, on items from the general examination in order to improve access to the exam for the eligible population. (The reader is directed to Chapter 9, this volume, or to Beddow [in press] for examples of items prior to and following the application of a package of modifications.) The need to gather validity evidence for achievement tests that have been modified using these packages has inspired a growing body of research (Elliott et al., 2010; Kettler et al., in press; Roach, Beddow, Kurz, Kettler, & Elliott, 2010). Kettler (in press) described a framework for collecting and interpreting such evidence, and applied it to early research studies in the area; this chapter serves as an update of that research, with a narrower focus on those studies which involve packages of modifications. While research (Russell & Famularo, 2009; Smith & Chard, 2007) has been completed on individual modifications, this work is less representative of the process in which developers of state assessments currently are engaging.

R. J. Kettler, Department of Special Education, Peabody College of Vanderbilt University, Nashville, TN 37067, USA. e-mail: ryan.j.kettler@vanderbilt.edu

The AA-MAS Policy

The AA-MAS policy is intended to help those students for whom the general achievement test would be too difficult, but for whom an alternate assessment based on alternate academic achievement standards (AA-AAS) would be too easy. The AA-AASs are designed for students with significant cognitive disabilities who are unable to meaningfully complete tests, and are, therefore, typically assessed using rating scales, portfolios, or performance assessments (Elliott & Roach, 2007). Students for whom AA-MASs are intended are able to complete tests in standard formats, often with the support of modifications, but cannot be accurately assessed using a general assessment.

The new policy includes three criteria for identifying students for whom an AA-MAS would be appropriate: (a) each student must have an individualized education program (IEP) with content standards for the grades in which she or he is enrolled, (b) each student's disability must have precluded her or him from achieving proficiency, as demonstrated on the state assessment or on another assessment that can validly document academic achievement, and (c) each student's progress in response to appropriate instruction, based on multiple measurements, is such that even if significant growth occurs, the IEP team is reasonably certain that she or he will not achieve grade-level proficiency within the year (U.S. Department of Education, 2007a). Measurement of statewide proficiency, historically, has not been very precise for students with disabilities who would be eligible (SWD-Es) for an AA-MAS, theoretically because (a) their disabilities increase their sensitivity to barriers to assessment (e.g., long reading passages, confusing graphics, single items spread across multiple pages) that often are present on such tests and (b) the items are too difficult, on average, to give maximum information on these students (Kettler, in press).

The final regulations allow that AA-MASs may be less difficult, so long as they remain linked to grade-level content standards. Modifications resulting in less difficult items may yield tests that are more informative for SWD-Es, but evidence for such tests must go beyond changes in mean scores and item difficulty to more direct indicators of measurement precision. The Standards for Educational and Psychological Testing (the Standards; American Educational Research Association [AERA], American Psychological Association [APA], & National Council on Measurement in Education [NCME], 1999) present an approach to evaluating tests based on evidence regarding test content, response processes, relations to other variables, internal structure, and consequences of testing. (The reader is directed to Kettler [in press] for a review of these forms of evidence as they relate to research that has been done, or may be done in the future, to evaluate AA-MASs.) The approach and evidence types described in the Standards should be a guide in determining whether modified tests are more precise measures of the same constructs that are targeted by general achievement tests.

This chapter reviews research from (a) state technical manuals (Louisiana Department of Education, 2009b; Poggio, Yang, Irwin, Glasnapp, & Poggio, 2006 [Kansas]; Texas Student Assessment Program, 2007) and (b) experimental studies (Elliott et al., 2009, 2010; Kettler et al., in press) on the effects of packages of modifications to items and tests. Each of these studies addressed a different set of modification packages, question types, grade levels, and content areas; Table 13.1 depicts the characteristics addressed by the three state test technical manuals and three experimental studies that were primary sources of evidence for this review. Estimates from the general assessment are included for comparison, because the measurement of knowledge and skills using the general assessment taken by students who do not qualify for an alternate assessment remains the "gold standard" by which AA-MASs should be judged (Kettler, in press). Table 13.2 depicts coefficient alpha across the three states by test, content area, and grade level or grade band.

State-Modified Achievement Tests

By October of 2010, three states – Kansas, Louisiana, and Texas – had AA-MASs that had been approved by the federal government. All three states had technical data for their AA-MASs posted on the websites of the departments of education. Data related to these examinations are discussed next.


Table 13.1 Characteristics of AA-MAS studies

AA-MAS technical manuals
KAMM – Question types: MC. Example modifications studied: remove a response option; select fewer items and less complex items. Grades: reading/ELA 3–8, 11; mathematics 3–8, 10.
LAA 2 – Question types: MC, CR. Example modifications studied: select items from the general pool that are less cognitively complex. Grades: reading/ELA 4–10; mathematics 4–10; science 4, 8, 11.
TAKS-M – Question types: MC, CR. Example modifications studied: remove a response option; increase font; change layout; simplify text. Grades: reading/ELA 3–8, 10; mathematics 3–8, 10; science 5, 8, 10.

Experimental studies
CAAVES – Question types: MC. Example modifications studied: remove a response option; simplify language; add graphic; change layout. Grades: reading/ELA 8; mathematics 8.
CMAADI – Question types: MC. Example modifications studied: remove a response option; embed questions in passages; simplify graphics. Grades: reading/ELA 7; mathematics 7.
OAASIS – Question types: MC. Example modifications studied: remove a response option; voice-over; simplify language; simplify graphics. Grades: science 9–12.

Note: ELA = English/language arts; AA-MAS = alternate assessment based on modified academic achievement standards; CAAVES = Consortium for Alternate Assessment Validity and Experimental Studies; CMAADI = Consortium for Modified Alternate Assessment Development and Implementation; KAMM = Kansas Assessments of Multiple Measures; LAA 2 = Louisiana Educational Assessment Program Alternate Assessment, Level 2; OAASIS = Operationalizing Alternate Assessment for Science Inquiry Skills; TAKS-M = Texas Assessment of Knowledge and Skills – Modified; MC = multiple choice; CR = constructed response

Kansas Assessments of Multiple Measures

The Kansas Assessments of Multiple Measures (KAMM) are modified assessments in reading and mathematics developed with the intention that items would be less complex than those used on the Kansas General Assessments (Poggio et al., 2006). In both reading and mathematics, the KAMM include fewer items than do the general assessments, and all of the items are in a multiple-choice format with three response options. The reading test is composed of items independent from the general assessment. The mathematics test is composed of items selected from the general pool and modified, and students are allowed to use a calculator throughout it.

The technical manual for the KAMM includes reliability estimates in the form of coefficient alpha, as well as classification accuracy and classification consistency. These estimates tend to be similar in value to those observed on the general assessment. Coefficient alphas are in an acceptable range (i.e., 0.80 or higher) and are lower than, but close to, those reported for the general assessment at most grade levels, with the exception of mathematics at the higher grades. The only validity evidence included in the KAMM technical manual is a description of the connection between grade-level indicators and the items selected for the mathematics test.
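For readers who want the computation behind these estimates, coefficient alpha for a k-item test is conventionally defined from the item and total-score variances; the display below is the standard formula rather than anything reproduced from the KAMM technical manual:

\[ \alpha \;=\; \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k}\sigma_{i}^{2}}{\sigma_{X}^{2}}\right) \]

where \( \sigma_{i}^{2} \) is the variance of item i and \( \sigma_{X}^{2} \) is the variance of the total score.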

Louisiana Educational Assessment Program Alternate Assessment, Level 2

The Louisiana Educational Assessment Program Alternate Assessment, Level 2 (LAA 2) is composed of multiple-choice and constructed response items selected from the general assessment pool based on data from the general education and special education populations. The LAA 2 covers reading, mathematics, science, and social studies across various content areas, using a smaller number of items than are used for the general assessment. Teams of teachers reviewed the items for appropriateness for the eligible population, potential for bias, content relevance, cognitive complexity, and format (Louisiana Department of Education, 2009b).


Table 13.2 Coefficient alphas across three states by test, content area, and grade level or band

                         Kansas                    Louisiana                  Texas
Grade level/band         AA-MAS   General (a)      AA-MAS (b)     General     AA-MAS (c)   General

Reading/English/language arts
  Third                  0.86     0.88–0.90        –              0.93        0.82         0.89
  Fourth                 0.88     0.91–0.92        0.74 (0.83)    0.89        0.84         0.88
  Fifth                  0.89     0.88–0.92        0.68 (0.89)    0.92        0.87         0.87
  Sixth                  0.86     0.92–0.93        0.76 (0.92)    0.93        0.85         0.88
  Seventh                0.86     0.92–0.94        0.78 (0.94)    0.93        0.86         0.91
  Eighth                 0.86     0.92–0.94        0.74 (0.84)    0.88        0.88         0.88
  Ninth                  –        –                0.76 (0.95)    0.95
  Tenth                  0.90     0.92–0.93        0.81 (0.89)    0.87        0.86         0.91

Mathematics
  Third                  0.87     0.91–0.93        –              0.90        0.84         0.88
  Fourth                 0.87     0.91–0.92        0.86 (0.90)    0.92        0.82         0.89
  Fifth                  0.87     0.91–0.92        0.79 (0.81)    0.86        0.85         0.89
  Sixth                  0.86     0.93–0.95        0.74 (0.80)    0.93        0.74         0.92
  Seventh                0.82     0.94–0.95        0.71 (0.77)    0.89        0.76         0.92
  Eighth                 0.81     0.94–0.95        0.80 (0.85)    0.92        0.76         0.91
  Ninth                  –        –                0.69 (0.76)    0.91
  Tenth                  0.75     0.94–0.95        0.72 (0.79)    0.92        0.69         0.94

Science
  Elementary school      –        –                0.82 (0.86)    0.86        0.84         0.84
  Middle school          –        –                0.76 (0.80)    0.88        0.81         0.90
  High school            –        –                0.76 (0.80)    0.86        0.79         0.91

(a) A range of reliabilities was reported on the Kansas general assessment, because multiple forms were used.
(b) Values in parentheses are based on a Spearman-Brown estimate and the length of the general assessment.
(c) On Texas's AA-MAS, English/language arts was measured at the high school level, and a stratified coefficient alpha was used.
Note: AA-MAS = alternate assessment based on modified academic achievement standards.


Reliability evidence for the LAA 2 is reported as coefficient alpha, via an estimate based on the Spearman-Brown formula that projects what alpha would be if the test were the length of the general assessment, and also using a stratified alpha that may more appropriately characterize a test that includes multiple item types. In all 17 comparisons at various grades and content areas, coefficient alpha for the LAA 2 is lower than coefficient alpha for the general assessment, with the difference exceeding 0.10 in 12 comparisons (Louisiana Department of Education, 2009a, 2009b, 2009c). The technical summary includes three reasons for this difference: (a) the LAA 2 tests are shorter, (b) the LAA 2 population is smaller, and (c) the LAA 2 population is more homogeneous (Louisiana Department of Education, 2009b).
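The Spearman-Brown projection referred to here is the standard prophecy formula. Assuming a form is lengthened by a factor m with items comparable to those already on it, the projected reliability is

\[ \rho^{*} \;=\; \frac{m\,\rho}{1 + (m-1)\,\rho} \]

where \( \rho \) is the reliability of the existing form. As an illustrative example (the item counts are hypothetical, not taken from the Louisiana technical summaries), a form with \( \alpha = 0.74 \) projected to twice its length (m = 2) would yield a projected alpha of approximately 0.85.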

While it is unclear how a smaller population would systematically result in lower reliability estimates, it is well established that reliability estimates tend to be lower when computed on a more homogeneous population. It is also true that the LAA 2 tests are shorter, and that shorter tests tend to be less reliable, which highlights a critical issue with the strategy that many states are using to develop an AA-MAS. While the policy is intended to improve measurement for eligible students, it is unlikely that the reliability of scores representing these students' knowledge and skills can be increased as the number of items is decreased. The Spearman-Brown estimate is, therefore, not appropriate for comparing reliabilities of shorter AA-MASs with those of longer general examinations, because it obscures the fact that the AA-MASs are less reliable at the lengths that are actually used. However, it is informative with regard to the types of reliabilities that could be obtained using AA-MASs that have as many items as the corresponding general assessments. The coefficient alphas for the LAA 2 that were calculated using the Spearman-Brown formula are much closer to the coefficient alphas for the general assessment, and only tend to differ and fall below the acceptable range in mathematics and science at the higher grade levels. The stratified alpha in every case across general assessments and AA-MASs is equal to or greater than the coefficient alpha, but is within 0.03 of it. Validity evidence presented in the technical manual is limited to a description of the iterative process intended to maintain content validity, which featured a diverse group of experts and an emphasis on the statewide content standards.

Texas Assessment of Knowledge and Skills – Modified

The Texas Assessment of Knowledge and Skills – Modified (TAKS-M) is composed of multiple-choice and constructed response items, which originated from the general assessment and were modified by the Texas Education Agency based on feedback from teachers and from a special task force (Texas Student Assessment Program, 2007). The agency used a list of modification strategies specified by content area to develop AA-MASs in reading/English language arts (e.g., simplify difficult vocabulary, divide sections into meaningful units), mathematics (e.g., delete extraneous information, simplify graphics), and science (e.g., delete one part of a compound answer, provide appropriate formulas). The items were then subjected to reviews for appropriateness of modification, maintenance of the original construct, and connection to classroom instruction.

Reliability for the TAKS-M is reported as the Kuder-Richardson 20, which is the special case of coefficient alpha for dichotomously scored items, and as a stratified coefficient alpha, where appropriate. Values for these coefficients are lower in all cases than corresponding values from the general assessment, but tend to be similar and in the acceptable range on reading/English language arts tests and on all content areas at the elementary grade levels (Texas Education Agency, 2009; Texas Student Assessment Program, 2007). Reliability estimates tend to diverge more, with coefficients for the AA-MAS falling below the acceptable range, on mathematics and science tests at higher grade levels. The technical manual for the TAKS-M includes extensive information on the process used to ensure alignment to the content standards, as a method of maintaining content validity. The manual also includes evidence based on relations with other variables, indicating that the reading and mathematics tests are measures of related but separate constructs, and that neither measure was inappropriately related to demographic variables such as gender or ethnicity.
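For reference, the two indices named here have standard forms; the displays below are textbook definitions rather than formulas quoted from the Texas technical documentation. For k dichotomously scored items with proportions correct \( p_i \),

\[ KR\text{-}20 \;=\; \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k} p_i(1-p_i)}{\sigma_{X}^{2}}\right), \]

and for a test built from strata j (e.g., a multiple-choice stratum and a constructed response stratum) with subscore variances \( \sigma_{j}^{2} \) and reliabilities \( \alpha_{j} \),

\[ \alpha_{strat} \;=\; 1 - \frac{\sum_{j} \sigma_{j}^{2}\,(1-\alpha_{j})}{\sigma_{X}^{2}}. \]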

In addition to research to evaluate state AA-MASs, experimental studies of the effects of modification packages on items and tests have been completed as part of three federally funded projects. These studies are discussed next.

Experimental Studies of Modifications

The first large-scale, experimental research study on the validity of packages of test modifications was completed as part of the Consortium for Alternate Assessment Validity and Experimental Studies (CAAVES¹). The study influenced future studies by:



(a) using packages of modifications designed by teams of educators to yield original and modified conditions of items without changing the connection to grade-level content standards [the reader is directed to Kettler, Elliott, and Beddow (2009) for a description of this Modification Paradigm, as well as for a list of theory-based and research-supported modifications],

(b) including groups of eligible students and control groups identified by special educators based on federal criteria,

(c) incorporating a within-subjects design that counterbalanced the order of conditions and of equal-length forms, and

(d) analyzing the impact of modifications on reliability and student performance on tests and items.

Replication studies have been conducted as part of the Consortium for Modified Alternate Assessment Development and Implementation (CMAADI²) and Operationalizing Alternate Assessment for Science Inquiry Skills (OAASIS³) projects.

¹ CAAVES was a U.S. Department of Education Enhanced Assessment grant codirected by Elizabeth Compton and Stephen N. Elliott. Several studies on item modification were conducted within this multistate project during 2007–2009.
² CMAADI was a U.S. Department of Education General Supervision Enhancement grant codirected by Stephen N. Elliott, Michael C. Rodriguez, Andrew T. Roach, and Ryan J. Kettler. Several studies on item modification were conducted within this multistate project during 2007–2010.
³ OAASIS was a U.S. Department of Education Enhanced Assessment grant directed initially by Courtney Foster and ultimately by John Payne. Several studies on item modification were conducted within this multistate project during 2008–2010.

Consortium for Alternate Assessment Validity and Experimental Studies

As part of the CAAVES project, Elliott et al. (2010) administered short sets of items in reading and mathematics to eighth-grade students (n = 755) in four states. Each participant was part of one of three groups: students without disabilities (SWODs), students with disabilities who would not be eligible for an AA-MAS (SWD-NEs), and students with disabilities who would be eligible (SWD-Es). Each student completed sets of equal length in a condition with original items, in which each student read unmodified items silently from a computer screen, and in two conditions with modified items: (a) Modified and (b) Modified with Reading Support. In the Modified condition students read modified items silently from a computer screen, and in the Modified with Reading Support condition some elements of the items (i.e., the directions, stem, and answer choices, but not passages or essential vocabulary) were read aloud to the students using voice-over technology. The two modified conditions in reading had only 72% of the total number of words, when compared to the Original condition, and the modified conditions in mathematics had only 74% of the total number of words.

The researchers used the Spearman-Brown prophecy formula to estimate the reliability coefficients of hypothetical test forms with 39 items, a length more representative of a state achievement test. Table 13.3 includes a summary of coefficient alpha estimates across studies and conditions. All reliabilities across groups, conditions, and content areas were acceptable, ranging between 0.85 and 0.94. The estimates were not meaningfully higher or lower for any group in the Original condition versus the two modified conditions, indicating that the modifications did not change the reliability of the test.
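A minimal sketch of this kind of projection in Python, using the prophecy formula shown earlier; the alpha value below is an illustrative placeholder rather than a statistic from the CAAVES data set.

    def spearman_brown(alpha, n_items_old, n_items_new):
        """Project reliability when a form is lengthened with comparable items."""
        m = n_items_new / n_items_old  # lengthening factor
        return (m * alpha) / (1 + (m - 1) * alpha)

    # Example: a 13-item set with an observed alpha of 0.75 projected to 39 items.
    print(round(spearman_brown(0.75, 13, 39), 2))  # prints 0.9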

While large differences were not found in reliabilities across groups and conditions in the CAAVES study, it is possible that items in the Original condition were more accessible than those that are typically found on state general assessments (Kettler, in press). The average p-values (i.e., percent of items correct) in the Original condition (SWODs = 64%, SWD-NEs = 51%, and SWD-Es = 34%; Elliott et al., 2010) indicate that the difficulty of the test may have been more appropriate for a group of students with disabilities. An average p-value of 50% for the intended population of test-takers is ideal when trying to maximize discrimination. Therefore, the results of the CAAVES study may have overestimated the reliability of general achievement tests for SWD-Es, and consequently underestimated the potential improvement in reliability through modification.
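One way to see why a mean p-value near 50% is desirable, under classical assumptions and ignoring guessing: the score variance of a dichotomously scored item is p(1 − p), which is largest when p = .50, and items with more score variance have more room to discriminate among examinees.

\[ \sigma_{i}^{2} = p_i(1-p_i), \qquad \max_{p_i}\; p_i(1-p_i) = 0.25 \;\text{ at }\; p_i = 0.50 \]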


Table 13.3 Coefficient alpha estimates by condition, study, content area, and group

Group                    Original      Modified      Modified with reading support

CAAVES reading (13 items projected to 39 items)
  SWODs                  0.91–0.92     0.93–0.94     0.93–0.94
  SWD-NEs                0.90–0.91     0.90–0.91     0.91–0.92
  SWD-Es                 0.88–0.89     0.88–0.89     0.88–0.89

CAAVES mathematics (13 items projected to 39 items)
  SWODs                  0.88–0.89     0.89–0.90     0.89–0.90
  SWD-NEs                0.86–0.87     0.86–0.87     0.86–0.88
  SWD-Es                 0.86–0.88     0.86–0.88     0.86–0.87

CMAADI reading (20 items projected to 40 items)
  SWODs                  0.80          0.66          –
  SWD-Es                 –             0.52          –

CMAADI mathematics (20 items projected to 40 items)
  SWODs                  0.84          0.66          –
  SWD-Es                 0.77          0.77          –

OAASIS science (20 items projected to 60 items)
  SWODs                  0.88          –             0.83–0.87
  SWD-NEs                0.65–0.84     –             0.79–0.86
  SWD-Es                 0.65–0.85     –             0.76–0.78

Note: CAAVES = Consortium for Alternate Assessment Validity and Experimental Studies; CMAADI = Consortium for Modified Alternate Assessment Development and Implementation; OAASIS = Operationalizing Alternate Assessment for Science Inquiry Skills; SWODs = students without disabilities; SWD-NEs = students with disabilities – not eligible; SWD-Es = students with disabilities – eligible.


Kettler et al. (in press) used an item response theory (IRT) framework on the same data set to test for differential boost based on modifications. Differential boost is a concept that has historically been used to evaluate testing accommodations, based on the idea that an appropriate accommodation should help students who need it more than those who do not (Fuchs et al., 2000; Kettler & Elliott, 2010). Differential boost fits within the domain of evidence based on relations with other variables, and in modification studies would be apparent when the boost for eligible students is greater than the boost for one or more control groups. In statistical terms, a boost is expressed as an effect size, which reflects the magnitude of the relationship between variables – in this case, the condition (original versus a modified condition) and the test scores. A differential boost is found when the magnitude of boosts is significantly different for different groups. In the CAAVES study, the researchers found significant differential boosts between the Original condition and the Modified condition, indicating that SWD-Es benefited more from the modifications than did SWODs, when differences in ability level among groups were controlled in a Rasch model. Significant differential boosts were also apparent in this model when comparing SWD-Es with SWD-NEs in reading, or when comparing the Original condition with the Modified with Reading Support condition in mathematics.

Fig. 13.1 Boosts by group across modification conditions, studies, and content areas (vertical axis: Cohen's d). rs = reading support

Elliott et al. (2010) analyzed the same data within a classical test theory framework, finding boosts that were significant for all groups between the Original condition and the two modified conditions, but not finding any boosts that were differentially significant. Figure 13.1 depicts the boosts experienced by each group across modification conditions and content areas for the three experimental studies reviewed in this chapter. All effect sizes in Fig. 13.1 were computed using Cohen's d, an expression of the difference between mean scores in the two conditions, divided by the pooled standard deviation. Effect sizes for SWD-Es tended to be in the medium or high ranges, while effect sizes for SWODs tended to be in the small range. Collectively, results from the CAAVES study indicate that measurement precision can be maintained and scores can be improved through a modification process that keeps items connected to grade-level content standards.
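A brief sketch of how a boost and a differential boost can be quantified in this way; the score vectors below are invented for illustration, and simply comparing two d values is not a substitute for the model-based significance tests the authors report.

    import numpy as np

    def cohens_d(modified, original):
        """Boost effect size: mean difference divided by the pooled standard deviation."""
        diff = np.mean(modified) - np.mean(original)
        pooled_sd = np.sqrt((np.var(modified, ddof=1) + np.var(original, ddof=1)) / 2)
        return diff / pooled_sd

    # Hypothetical numbers of items correct for each condition and group.
    swd_e_original = np.array([4, 5, 6, 5, 7])
    swd_e_modified = np.array([7, 8, 8, 7, 9])
    swod_original = np.array([9, 10, 11, 10, 12])
    swod_modified = np.array([10, 10, 12, 11, 12])

    boost_swd_e = cohens_d(swd_e_modified, swd_e_original)
    boost_swod = cohens_d(swod_modified, swod_original)
    # A larger boost for SWD-Es than for SWODs is consistent with differential boost.
    print(round(boost_swd_e, 2), round(boost_swod, 2))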

Consortium for Modified Alternate Assessment Development and Implementation

As part of the CMAADI project, Elliott and colleagues (2009) conducted a replication of the CAAVES study using original and modified sets of items from the Arizona Instrument to Measure Standards. The participants included seventh-grade students representing two groups: SWODs (n = 106) and SWD-Es (n = 46). Each student completed two 20-item sets, which they read silently to themselves in both conditions, in either reading or mathematics. A MAZE measure of reading was administered to all students in order to determine whether low reading ability is a likely barrier that keeps SWD-Es from accessing achievement tests. The SWD-E group correctly completed a significantly smaller number of choices on the MAZE; the effect size of the difference between the groups was in the medium range (d = 0.72).

The modification process in the CMAADI study appeared to yield better measurement for students who would be eligible for an AA-MAS (Elliott et al., 2009). Estimated coefficient alpha for a 40-item set improved for reading, although it remained below the acceptable range, and stayed nearly equal in mathematics. Item-to-total correlations increased by an average of 0.11 in reading, but only increased by an average of 0.01 in mathematics. A differential boost was observed in both content areas, as the effect of the modification process was significantly larger for SWD-Es than it was for SWODs in both reading and mathematics. These findings provide strong support for the impact of the modification process on reading, while the results for mathematics were mixed.

Operationalizing Alternate Assessment for Science Inquiry Skills

The OAASIS project replicated the design of the CAAVES and CMAADI studies in the context of a high school biology test. Three groups of students – SWODs (n = 168), SWD-NEs (n = 156), and SWD-Es (n = 76) – were recruited across three states. Each student completed two 20-item sets of online multiple-choice items. The items were read silently by the students in the Original condition, but were read aloud by voice-over technology in the Modified condition. Items in the Modified condition contained, on average, only 70% of the total number of words included in the Original condition. A MAZE measure was administered to a subset of students, with SWODs correctly completing a significantly larger number of choices, compared to SWD-NEs (d = 0.66) and SWD-Es (d = 0.83). This finding replicated the finding from the CMAADI study, indicating that SWD-Es as a group do not read as quickly or as accurately as do SWODs.

The modification process in the OAASIS study yielded mixed results with regard to measurement precision for eligible students. Coefficient alpha for the two sets of items used in the study differed greatly in the Original condition, but alpha was reduced for one set through the modification process and increased for the other set. The projected coefficient alphas for the modified sets based on 60-item test forms (the median length of the general achievement tests in the participating states) were both slightly lower than the acceptable range. Boosts in total scores for both SWD-Es and SWODs were significant, but the boost for SWD-Es was not significantly larger than the boost for SWODs. These results indicate that the scores of SWD-Es are increased through the modification process, but that measurement precision is not systematically increased.

What Do We Know About Modified Achievement Tests?

The research completed on packages of modifications used to make a test more accessible has provided test developers and special education leaders a number of lessons:

(a) The federal criteria for AA-MAS eligibility identify a population of students that attains lower scores on tests across content areas, and reads less quickly and less accurately, compared to the general population of students without disabilities.

(b) Packages of modifications designed to improve the accessibility of tests result in scores that are higher for both eligible and non-eligible groups of students across content areas.

(c) This boost is sometimes significantly different in favor of the eligible group, and is sometimes relatively equal across groups, but does not typically favor the non-eligible groups (i.e., data indicate modifications do not increase the performance gap).

(d) The reliability of AA-MASs for the eligible group tends to be similar to the reliability of the general achievement test at the elementary school grade band across content areas and in reading at higher grade bands. The reliability estimates, however, tend to diverge in mathematics and science at the middle and high school grade bands, with AA-MASs not being as reliable as general achievement tests and falling below the standard for acceptability.

(e) A package approach to modifications appears to help improve the reliability of those reading tests which do not work well for SWD-Es in their original form; results are mixed in mathematics and science.

Collectively, these findings yield the conclusion that in the process of modifying items to make them more accessible, less is often more. The modification process is typically aimed at reducing the content of an item, often resulting in items that have fewer words, less cognitive load, and fewer answer choices, and which likely take less time for eligible students to complete. This is an important factor for a group of students who read slowly and therefore likely take longer to complete examinations. The modified tests allow students to be more successful, as quantified by higher scores, and to have an experience that is more similar to the experience of students without disabilities completing a general assessment. The tests developed using packages of modifications often have equal or greater measurement precision for eligible students. Therefore, it is surely the case that when less is equal (in terms of measurement precision), and simpler (in terms of reading or cognitive load), and allows eligible students to have a more typical testing experience, less is better in the context of item and test accessibility.

Future Needs

Research on packages of modifications made to items and tests to improve accessibility has only recently been emphasized. The movement has been inspired by the final regulations of NCLB (U.S. Department of Education, 2007a, 2007b), which have pushed states that often do not have the funds necessary to purchase new pools of items to instead develop tests by improving items they already own. The findings are likely to become more positive as ineffective modifications are removed from consideration, yielding more effective packages of modifications overall. For example, exploratory analyses conducted as part of the CAAVES and OAASIS studies have indicated that adding graphics to reading tests or changing passages to bulleted lists on science tests may reduce the measurement precision of these tests when used with the eligible population. Additional replications of the studies that have been completed to date, as well as similar studies on grade levels and content areas that have not been addressed, are critical to developing more accessible tests.

Subsequent studies on the effects of item and test modifications should also incorporate more information about the eligible population and their experience during testing. One example of information that should be collected on the eligible population is an indicator of working memory. The modifications made in the reviewed studies were often aimed at reducing cognitive load, based on the assumption that working memory limitations are one barrier to success on typical achievement tests. Test data on working memory should be gathered on students identified as belonging to eligible groups, in order to determine whether this assumption is true. An example of data that should be collected on the testing experience is the amount of time taken to complete forms in original versus modified conditions. While it is highly likely that the modified versions, consisting of fewer words and fewer answer choices, take less time to complete, this should be verified and the magnitude of the difference should be quantified.

Conclusions

Tests developed using packages of modifications can already make the testing experience of students eligible for an AA-MAS more similar to the experience of students who take the general examination. While this experience does not appear to come at a cost in terms of reliability in reading, mathematics, or science at the elementary grade band, it may result in less precise measurement in mathematics and science at the middle and high school grade bands. Experimental research findings indicate that measurement precision can be maintained in these content areas and at these grade bands, and perhaps even improved. These findings are positive, and hold promise for greater improvement in the near future, resulting in even better methods that allow all students to show their knowledge and skills through accessible tests.

References

American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (1999). Standards for educational and psychological testing. Washington, DC: Author.

Beddow, P. A. (in press). Beyond universal design: Accessibility theory to advance testing for all students. In M. Russell (Ed.), Assessing students in the margins: Challenges, strategies, and techniques. Charlotte, NC: Information Age Publishing.

Elliott, S. N., Kettler, R. J., Beddow, P. A., Kurz, A., Compton, E., McGrath, D., et al. (2010). Effects of using modified items to test students with persistent academic difficulties. Exceptional Children, 76(4), 475–495.

Elliott, S. N., & Roach, A. T. (2007). Alternate assessments of students with significant cognitive disabilities: Alternative approaches, common technical challenges. Applied Measurement in Education, 20(3), 301–333.

Elliott, S. N., Rodriguez, M. C., Roach, A. T., Kettler, R. J., Beddow, P. A., & Kurz, A. (2009). AIMS EA 2009 pilot study. Nashville, TN: Learning Sciences Institute, Vanderbilt University.

Fuchs, L. S., Fuchs, D., Eaton, S. B., Hamlett, C., Binkley, E., & Crouch, R. (2000). Using objective data sources to enhance teacher judgments about test accommodations. Exceptional Children, 67(1), 67–81.

Kettler, R. J. (in press). Holding modified assessments accountable: Applying a unified reliability and validity framework to the development and evaluation of AA-MASs. In M. Russell (Ed.), Assessing students in the margins: Challenges, strategies, and techniques. Charlotte, NC: Information Age Publishing.

Kettler, R. J., & Elliott, S. N. (2010). Assessment accommodations for children with special needs. In E. Baker, P. Peterson, & B. McGaw (Eds.), International encyclopedia of education (3rd ed., pp. 530–536). Oxford, UK: Elsevier Limited.

Kettler, R. J., Elliott, S. N., & Beddow, P. A. (2009). Modifying achievement test items: A theory-guided and data-based approach for better measurement of what students with disabilities know. Peabody Journal of Education, 84, 529–551.

Kettler, R. J., Rodriguez, M. C., Bolt, D. M., Elliott, S. N., Beddow, P. A., & Kurz, A. (in press). Modified multiple-choice items for alternate assessments: Reliability, difficulty, and differential boost. Applied Measurement in Education.

Koretz, D. M., & Hamilton, L. S. (2006). Testing for accountability in K-12. In R. L. Brennan (Ed.), Educational measurement (4th ed., pp. 531–578). United States of America: American Council on Education and Praeger Publishers.

Louisiana Department of Education. (2009a). iLEAP 2009 technical summary. Baton Rouge, LA: Author.

Louisiana Department of Education. (2009b). 2009 LAA 2 technical summary. Baton Rouge, LA: Author.

Louisiana Department of Education. (2009c). LEAP GEE 2009 technical summary. Baton Rouge, LA: Author.

Phillips, S. E., & Camara, W. J. (2006). Legal and ethical issues. In R. L. Brennan (Ed.), Educational measurement (4th ed., pp. 733–757). United States of America: American Council on Education and Praeger Publishers.

Poggio, A. J., Yang, X., Irwin, P. M., Glasnapp, D. R., & Poggio, J. P. (2006). Kansas assessments in reading and mathematics: Technical manual. The University of Kansas, Center for Educational Testing and Evaluation. Retrieved November 2, 2009, from http://www.ksde.org

Roach, A. T., Beddow, P. A., Kurz, A., Kettler, R. J., & Elliott, S. N. (2010). Incorporating student input in developing alternate assessments based on modified academic achievement standards. Exceptional Children, 77(1), 61–80.

Russell, M., & Famularo, L. (2009). Testing what students in the gap can do. Journal of Applied Testing Technology, 9(4).

Smith, R. L., & Chard, L. (2007). A study of item format and delivery mode from the California Modified Assessment (CMA) pilot test. Educational Testing Service: Statistical Analysis and Research.

Texas Education Agency. (2009, updated May 7). Technical digest 2007–2008. Austin, TX: Author. Retrieved November 5, 2009, from http://www.tea.state.tx.us

Texas Student Assessment Program. (2007). Texas assessment of knowledge and skills – modified: Technical report. Austin, TX: Author. Retrieved November 2, 2009, from http://www.tea.state.tx.us

U.S. Department of Education. (2007a). Modified academic achievement standards: Non-regulatory guidance. Washington, DC: Author.

U.S. Department of Education. (2007b). Standards and assessments peer review guidance. Washington, DC: Author.


14 Including Student Voices in the Design of More Inclusive Assessments

Andrew T. Roach and Peter A. Beddow

We hardly know anything about what students think about educational change because no one ever asks them. . . . The information is negligible as to what students think of specific innovations that affect them. To say that students do not have feelings and opinions about these matters is to say that they are objects, not humans (Fullan, 2001, pp. 182–189).

Due, in part, to changes in federal policy (e.g., Individuals with Disabilities Education Act [IDEA] and No Child Left Behind [NCLB] Act), the past two decades have seen a dramatic change in the number of students included in state and district accountability systems. According to the National Center on Educational Outcomes (NCEO), in the early 1990s most states reported that fewer than 10% of students with disabilities participated in their states' large-scale assessment. By the year 2000, the average percentage of students with disabilities in the general assessment had risen to 84%, and by 2008 that number had risen to above 95% (i.e., the participation rate required by NCLB).

There are a number of potentially positive consequences of increased participation in state accountability systems: (a) higher expectations and improved performance in core academic areas; (b) increased access to the general grade-level curriculum; and (c) additional professional development and material resources allocated to improving the performance of students with disabilities.

A. T. Roach, Department of Counseling and Psychological Services, Georgia State University, Atlanta, GA 30302, USA. e-mail: aroach@gsu.edu

Portions of this chapter appeared previously in Roach, A. T., Beddow, P. A., Kurz, A., Kettler, R. J., & Elliott, S. N. (2010). Incorporating student input in developing alternate assessments based on modified achievement standards. Exceptional Children, 77, 61–80.

State assessment data, however, suggest that there has been only modest movement toward "closing the gap" in proficiency between students with disabilities and their peers, and the differences remain very large – ranging from 31.4 percentage points in elementary school reading to 39.7 points in middle school mathematics (Albus, Thurlow, & Bremer, 2009). Surely, these numbers cannot tell the whole story of how inclusive accountability has influenced the educational experiences of students with disabilities. More research is needed to "break open the black box" of inclusive assessments and accountability systems, and to understand the ways in which implementing these programs influences students' knowledge, perceptions, and behaviors.

Concurrent with the movement to include students with disabilities in state tests, a number of researchers have undertaken efforts to apply universal design principles to assessment development (Dolan, Hall, Banerjee, Chun, & Strangman, 2005; Thompson, Johnstone, Anderson, & Miller, 2005). The goal of these efforts has been to design accessible assessment tasks and delivery systems that support, rather than inhibit, students' ability to show what they know and can do.

In April 2007, the U.S. Department of Education (USDOE) revised regulations under the No Child Left Behind Act and codified the pursuit of accessible assessment strategies by allowing states to develop alternate assessments based on modified academic achievement standards (AA-MAS). According to the USDOE Non-Regulatory Guidance (2007), features included in AA-MAS items should facilitate the understanding of students with disabilities, or provide background information and support in ways that do not "compromise the validity and reliability of the test results" (p. 26).

In essence, test developers and policymakers expect students' experiences with, and cognitions while, completing accessibility-enhanced assessments to be different from what happens when the same students take existing items on the general large-scale tests. Some information to support this assumption can be gathered from statistical analyses of test results (e.g., differential item functioning), but these methods can only provide quantitative evidence to support test item development. To understand the effects of universal design features intended to enhance item accessibility and students' performance, test developers and researchers must use a variety of methods that tap students' cognitions, problem-solving behaviors, and opinions.

Asking for student perspectives and feedback makes intuitive sense if test developers' objective is to create the most effective and useful assessment instrument. In other industries, manufacturers regularly conduct consumer-focused evaluations to ensure that their products are viewed as meeting customers' needs. Yet, as Fullan states in the epigraph to this chapter, we "hardly know anything about what students think about" educational assessments because test developers, researchers, and policymakers have seldom paused to ask them.

Epistemological and Methodological Frameworks for Including Student Voices

Integrating student voice in the development and validation of inclusive and accessible assessment strategies requires an expansion of the epistemological and methodological frameworks that dominate traditional measurement research. Research in measurement and psychometrics generally has been conducted from a positivist epistemology, which values precision and standardization in data collection, control of external and contextual influences, and limits on researcher subjectivity. Because of this epistemological stance, measurement research emphasizes the use of quantitative methods (i.e., statistical analyses) with large samples of participants (Gallagher, 2009).

Moss (1996) called for inclusion of research methods built on an interpretative framework in measurement research and practice. She suggested "(interpretative approaches) would enable us to support a wide range of sound assessment practices, including those less standardized forms of assessment that honor the purposes teachers and students bring to their work; (and) to theorize more productively about the complex and subtle ways that assessments work within the local contexts in which they are developed and used. . . ." (p. 20). Moss contrasts interpretative methods with a "naturalist" view of social science, which posits that social scientists should use methods similar to those employed by biological and physical scientists to study the natural world, and she suggests a variety of reasons why interpretative frameworks might be more appropriate for investigating different measurement and assessment programs and strategies (e.g., including inclusive and accessible assessments):

1. Assessment data generally consist of symbolic constructions – texts, products, performances, and actions – that reflect the behaviors, understandings, and interpretations of test takers and users.

2. Measurement researchers must inquire about the intentions and perspectives of test takers and users because they cannot fully understand these from the position of an outside, objective (i.e., distant) observer.

3. Interpretations (e.g., test scores, performance descriptions) that measurement researchers and test developers construct can be, and often are, reinterpreted by and subsequently influence the behaviors and beliefs of the students and teachers they describe.

4. A primary goal of measurement research should be to understand meaning in the context in which it is produced and received.

The National Research Council's Knowing What Students Know (Pellegrino, Chudowsky, & Glaser, 2001) also called for an expansion of theoretical and methodological approaches to educational assessment. Pellegrino and colleagues indicated that many widely used tests of educational achievement were built on outdated psychometric and cognitive models that did not reflect the most recent advances in either field. Moreover, they suggested assessments "should focus on making students' thinking visible to both their teachers and themselves so that instructional strategies can be selected to support an appropriate course for future learning" (p. 4). Achieving this goal is particularly relevant for students with (or at risk for experiencing) learning difficulties. Research methods that elicit students' written and verbal responses to tasks are necessary in order to "unpack" student cognition, identify supportive as well as confusing task features, and produce data that support current and future instructional decision making.

In 2001, The Spencer Foundation funded a series of conversations and ongoing collaboration between prominent psychometric and sociocultural researchers on the topic of educational assessment. The goal of this project was "two-fold: (a) to provoke examination of the influence of psychometrics on assessment and, in turn, on concepts of learning and teaching, educational reform, fairness, and 'scientifically based evidence'; and (b) to imagine how the processes of assessment might be better served or even reshaped through collaborative, cross-disciplinary efforts" (Moss, Pullin, Gee, & Haertel, 2005). This group's examination of how scholarship regarding situated and distributed cognition might influence assessment research and design is particularly applicable to development and validation of inclusive and accessible assessments. In essence, researchers working from a situated/distributed framework view cognition as "not locked privately inside individual heads" but spread across interactions with other individuals and with tools and technologies. "What someone can do in one situation or with other people or tools and technologies can be quite different than what a person can do alone or across a wide array of contexts" (p. 71). These questions about what students can do with different levels and types of support are of the utmost importance in constructing more inclusive and accessible assessments. Unfortunately, traditional psychometric research is not well suited to investigating these sorts of context-bound behaviors; interpretative/qualitative methods (e.g., interviews, participant observation) are needed to develop "thick descriptions" of the effects of accommodations, universal design features, and other supports.

Uses of Student Response Data in the Standards for Testing

Support for integrating student input in developing and validating accessible assessments can be found in test standards as well as various professions' standards for ethics and professional practice. For example, the Standards for Educational and Psychological Testing (AERA, APA, & NCME, 1999) are intended to guide the development and validation of testing practices in education and psychology and also include a relatively comprehensive overview of the rights and responsibilities of various stakeholder groups, including test developers and test users. The value of information regarding student responses and perceptions in supporting the development of inclusive and accessible assessments is addressed at multiple points in the Standards for Educational and Psychological Testing.


Standard 10.3, in the chapter on testing individuals with disabilities, indicates: "Where feasible, tests that have been modified for use with individuals with disabilities should be pilot tested on individuals who have similar disabilities to investigate the appropriateness and feasibility of the modifications" (p. 106). Because pilot testing often occurs with a smaller sample of participants, collecting information regarding student behaviors and cognitions during testing and their perceptions of assessment tasks may be more manageable than during actual implementation. Gathering student response data during pilot testing allows test developers to identify items with features students perceive as confusing. Identifying items and item features that may unintentionally influence and inhibit the performance of students with and/or without identified disabilities during pilot testing can reduce the unnecessary costs required to "retrofit" test forms and procedures during "live" testing.

The Standards for Educational and Psychological Testing suggest information about student response processes and test-taking behaviors can provide evidence to support the construct validity of an assessment. "Questioning test takers about their performance strategies can yield evidence that enriches the definition of a construct. . ." (AERA, APA, & NCME, 1999, p. 12). When attempting to develop accessible test items, student response data can provide important information about the reasons for observed differences in performance across item types (unmodified vs. modified) and student groups (students with and without identified disabilities). The use of concurrent think-aloud protocols and follow-up questioning may allow researchers to "unpack" unexpected results. For example, differential item functioning may indicate a particular item was difficult for students with identified disabilities in comparison to their peers. Recording students' concurrent verbalizations while solving the item in question, and questioning students following completion of the task, may illuminate item features that contributed to the observed results. The Standards for Educational and Psychological Testing identify the latter as an important potential contribution of student response data: "Process studies involving examinees from different subgroups can assist in determining the extent to which capabilities irrelevant or ancillary to the construct may be differentially influencing (student) performance" (p. 12). This type of evidence is central in the development of more accessible tests, where test items may include features that are intended to reduce or eliminate construct-irrelevant influences on student outcomes.
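As one illustration of the kind of quantitative screen that such follow-up questioning can help interpret, a Mantel-Haenszel index of differential item functioning compares item performance for a reference and a focal group after matching examinees on total score. The sketch below is generic and uses invented response data; it is not drawn from any of the studies discussed in this chapter.

    import numpy as np

    def mantel_haenszel(item, group, total):
        """Common odds ratio for one item across total-score strata (reference vs. focal)."""
        num = den = 0.0
        for stratum in np.unique(total):
            in_stratum = total == stratum
            ref = in_stratum & (group == "ref")
            foc = in_stratum & (group == "focal")
            a, b = item[ref].sum(), ref.sum() - item[ref].sum()  # reference correct / incorrect
            c, d = item[foc].sum(), foc.sum() - item[foc].sum()  # focal correct / incorrect
            n = in_stratum.sum()
            num += a * d / n
            den += b * c / n
        alpha_mh = num / den
        return alpha_mh, -2.35 * np.log(alpha_mh)  # ETS delta metric; values far from 0 flag potential DIF

    # Invented responses to one item, group labels, and matching total scores.
    item = np.array([1, 0, 1, 1, 0, 1, 0, 0, 1, 0])
    group = np.array(["ref", "ref", "ref", "ref", "ref", "focal", "focal", "focal", "focal", "focal"])
    total = np.array([10, 10, 20, 20, 20, 10, 10, 20, 20, 20])
    print(mantel_haenszel(item, group, total))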

Test developers also may collect data about student perceptions to provide consequential validity evidence. One desired outcome of the inclusion of universal design features in test development is that the resulting tests will be more accessible and comprehensible, leading to improved student motivation and sense of efficacy. The Standards for Educational and Psychological Testing address this claim, indicating "Educational tests. . . may be advocated on the grounds that their use will improve student motivation. . . . Where such claims are central to the rationale of testing, the direct examination of testing consequences necessarily assumes even greater importance" (AERA, APA, & NCME, 1999, p. 17). Follow-up questioning and surveys can provide important information about the influence of accessible tests on student motivation and efficacy.

Ethics and Standards for Professional Practice

Developing and implementing inclusive and accessible assessments also could be conceptualized as a form of educational intervention. In this case, information about student experiences and perceptions is essential evidence about the acceptability of these inclusive assessment strategies. Acceptability refers to an individual's perceptions regarding the appropriateness, fairness, and reasonableness of an intervention (Kazdin, 1981). Evaluating the acceptability of testing accommodations, test item modifications, or the integration of universal design features in test forms and delivery requires surveys or interviews to understand the perceptions of students who qualify for these inclusive assessment strategies.

As the professionals most likely to make decisions about the provision of inclusive assessment strategies (e.g., testing accommodations) and more accessible alternate forms of assessment, special educators and school psychologists have an ethical responsibility to determine the acceptability of these supports. In What Every Special Educator Must Know: Ethics, Standards, and Guidelines, the Council for Exceptional Children (CEC) emphasized the need to consider how students' experiences, perceptions, and beliefs influence their learning and academic performance. For example, CEC's Initial Content Standard III (2008) states:

Special educators are active and resourceful in seeking to understand . . . individuals' academic and social abilities, attitudes, values, interests, and career options. The understanding of these learning differences and their possible interactions provides the foundation upon which special educators individualize instruction to provide meaningful and challenging learning for individuals with exceptional learning needs (p. 51).

Integrating information regarding students' social skills, perceptions, or occupational aspirations into decisions about their participation in large-scale testing calls for an expanded conceptualization of acceptability. Yet, it seems clear that these factors can have a dramatic influence on engagement and achievement on large-scale tests. For example, students with disabilities may be less likely to view postsecondary education (or, in some cases, high school graduation) as feasible, leading to reduced motivation and effort on state-mandated high school graduation examinations. Within this context, educators' efforts to provide accommodations or more accessible test forms may not result in passing scores and may be (incorrectly) coded as ineffective because the acceptability and perceived utility of these efforts were never gauged with the targeted students.

Similarly, school psychologists are expected to collaborate with students to develop supports and interventions like testing accommodations and accessible assessments. According to the National Association of School Psychologists' Principles for Professional Ethics:

"School psychologists discuss with students the recommendations and plans for assisting them. To the maximum extent appropriate, students are invited to participate in selecting and planning interventions" (NASP, 2010, II 3.11). Currently, decisions about participation in state assessment programs are made by students' IEP teams. In practice, these decisions often are made by special educators with limited consultation with school psychologists, family members, general educators, and students themselves. Although the ethics codes suggest that educators have a responsibility to include students with disabilities in decision making about how to be included in large-scale assessments, we unfortunately know very little about students' ability to select appropriate supports for taking and succeeding on large-scale assessments. Additional research is needed to determine if students can successfully identify and use accommodations and modifications that result in improved assessment performance.

Beyond these professional standards of practice and ethics, many educational researchers have been influenced by the United Nations Convention on the Rights of the Child (UNCRC) Article 12, which indicates educators and researchers should:

. . . Assure to the child who is capable of forming his or her own views the right to express those views in all matters affecting the child, the views of the child being given due weight in accordance with the age and maturity of the child.

Indeed, UNCRC Article 12 may provide the strongest endorsement for including student perspectives and opinions in the development and validation of inclusive and accessible assessment strategies, as it clearly recognizes children's and adolescents' agency and right to self-determination.

Existing Assessment Research That Integrates Student Voice

Several research groups have conducted examinations of student responses to testing, using three primary methodologies: student drawings (e.g., Wheelock et al., 2000), surveys (e.g., Roach et al., 2010), and various interview strategies (Roderick & Engel, 2001; Triplett & Barksdale, 2005; Roach et al., 2010; Johnstone, Liu, Altman, & Thurlow, 2007; Johnstone, Bottsford-Miller, & Thompson, 2006). Taken together, this body of research highlights the importance of integrating student voices in the design of assessments for students with a broad range of abilities and needs. In this section, we discuss the procedures and results of a set of recently published studies of students' perspectives on assessment. Our review focuses on two data collection methods – student drawings and interviews – that we believe hold the most promise for providing inclusive and accessible forums for students' voices to be heard and honored. We encourage researchers who are interested in student perspectives on inclusive assessments to consider, in designing studies, the potential barriers to communication that may be inherent in some data collection methods (e.g., reading survey items, composing open-ended written responses).

Research Using Student Drawings: Two Examples

To understand students' experiences with and opinions of large-scale tests, Wheelock, Bebell, and Haney (2000) asked a sample of high school students to respond by drawing their reactions after participating in the Massachusetts Comprehensive Assessment System (MCAS). The MCAS is a series of paper-and-pencil tests consisting of multiple-choice and constructed response items. The MCAS is considered a high-stakes assessment system in that students are required to pass the test in order to receive a high school diploma. At the time of the study, the MCAS required students to sit for up to 13 one-hour test sessions in English/language arts, mathematics, science and technology, and history/social studies. In all grades, the authors reported the test times exceeded those recommended by the Massachusetts Department of Education.

The authors gathered participants by sending electronic mail to a Massachusetts teachers' online listserv, asking for their help in a study examining student perceptions about the MCAS. They sent a follow-up e-mail to respondents, directing them to ask their students to follow a simple prompt: "Draw a picture of yourself taking the MCAS." The researchers subsequently received 411 student drawings from 18 teachers in grades 4, 8, and 10, across eight school districts. Drawings were disaggregated by the type of community (i.e., urban, suburban, rural). They coded the drawings according to student postures, testing materials, and the inclusion of other students' or teachers' figures, as well as affective features (e.g., facial expressions that clearly denoted emotion). Additionally, the researchers collected comments written in speech or thought bubbles and captions. Interrater agreement on data coding exceeded 90%. The drawings "provided a variegated picture of how students...view high stakes testing" (Wheelock et al., 2000, p. 6). The majority of drawings depicted the test event as a solitary experience (i.e., 70% of the drawings did not include other figures). Nearly two-thirds of the drawings contained indicators of students' emotions or perceptions while taking the test. Of these, the frequency of students who indicated the test was difficult was four times greater than those who indicated the test was easy. Some depicted a combination of hard and easy items. Question marks were depicted in 8.5% of student drawings; they were typically included in thought bubbles. In some, students pictured themselves asking for help from a teacher. For example, one drawing depicted the student asking, "Who was Socrates? Who was Socrates? What kind of question is that?" (Wheelock et al., 2000, p. 8). A greater percentage of urban students reported the MCAS as difficult (15.6%) when compared to rural students (5.7%).

A larger percentage of urban students (16.5%) depicted themselves taking a test that was "too long" compared to suburban and rural students (0 and 3.3%, respectively). One eighth grader drew a picture of herself with steam coming out of her ears as she sat before a test booklet containing 6,021,000 pages. One fourth grader drew herself taking an even longer test, containing 1,000,956,902 pages, which she labeled "Stinkn' [sic] test." Other students' drawings depicted themselves thinking or saying things such as "Five pages more," "TO [sic] MUCH TESTING," "Is it over yet?" and "Not MCAS again!" Several student drawings depicted the student feeling tired or rushed to complete the test.

The researchers coded 13.4% of the drawings as depicting the student experiencing anxiety. One student depicted himself thinking the test was "nerve-wracking." Others drew themselves praying or wishing for help. Several drawings showed students who feared failing the test and having to go to summer school. Few discrepancies were observed in anxiety drawings across grade levels or community types. Ten percent of the drawings portrayed students feeling anger. One drawing depicted the student marching on City Hall. Another showed the student setting fire to the MCAS. Yet another depicted the student thinking the test was designed to reveal what he had failed to learn. Greater percentages of students in grades 8 and 10 (19.4%) depicted themselves as angry compared to students in grade 4 (6.6%). Four times as many urban students compared to rural students depicted themselves as angry.

Given the importance of test-taker motivation, it was notable that many drawings (5.3%) portrayed the student languishing throughout the course of a day, daydreaming about things unrelated to the test, or sleeping. Similarly, 3.9% of the drawings depicted students as relieved at the conclusion of testing. Some student drawings, by contrast, contained positive depictions (18%). A greater percentage of students in grade 4 (21.5%) depicted themselves as "thinking, solving problems, confident, or working hard" (p. 9) compared to students in grades 8 and 10 (8.3%). Greater percentages of urban and suburban students (21.1 and 20.1%, respectively) compared to rural students (9.7%) depicted themselves as diligent.

Triplett and Barksdale (2005) conducted a similar study in which they asked students in grades 3 through 6 to "draw a picture about your recent testing experience." Students then responded in writing to the prompt "tell me about your picture" (p. 237). Their rationale for the study was to discover "what they could learn about the test milieu from the primary stakeholders – the children" (p. 237).

For the total sample, no drawings depicted the student smiling. The majority of the drawings (56%) depicted the student in isolation. Emotions were depicted in 32% of the drawings, with nervous being the most frequent emotion depicted or discussed. Students frequently indicated their nervousness was the result of time pressure, not being able to find the correct answer, and failing the test. One student wrote, "I felt as if time was slipping through my fingers. I tried to stay calm....There was only 55 min to complete it. I thought I was a goner!" (p. 253). Another common emotion was anger, primarily over the length of the test, its difficulty, social isolation during the test event, and the possibility of failure. Fifteen of the drawings included depictions of fire, including one student who noted, "The school is gonna burn, we're saved!" (Triplett & Barksdale, 2005, p. 251). Question marks were also a common feature in the drawings, usually with the purpose of indicating confusion. One student drew a detailed battle scene between question marks and light bulbs, centered around an answer sheet. He wrote, "I felt like there was a war going on in my head. The light bulbs won and the question marks lost!" (p. 250). Fourteen drawings included positive statements, using the words happy, glad, or liked; in most cases, the positive statements were connected with the completion or termination of the test event. No student drawings contained positive statements about the test itself.

Research Using Student Interviews: Four Examples

A number of researchers have examined student perceptions of large-scale testing through the use of various interview strategies. Roderick and Engel (2001) interviewed 102 low-achieving students in grades 6 and 8 about Chicago Public Schools' policies ending social promotion (i.e., the implementation of standards-based tests to guide the determination of students' passage to the next grade). Roderick and Engel found students mostly understood the main purposes of tests; indeed, some reported tests were useful and argued they should be used to determine which students "deserve to be retained" (p. 197). Only 4% of students interviewed, however, reported "not worrying" (p. 204) about passing the test. No students in the high-risk portion of the sample reported they did not worry about the test. By contrast, 52% of students in the high-risk group reported "worrying a lot," whereas the percentages were lower for the moderate-risk group (34%) and the low-risk group (6%). Nine percent of the sample reported they spent time outside of class working on skills to help them pass the test. While a majority of the total sample (53%) reported working hard in school, there was large variation across groups; 60% of the moderate- and low-risk students reported working hard, while only 37% of the high-risk students reported working hard. Few differences were observed in any of the results across grades, genders, or race/ethnicity.

Beyond the information on students' opinions of Chicago Public Schools' retention policies, Roderick and Engel (2001) reported several student comments that provided some indication of their perceptions of the accessibility of tests and test items. For instance, when asked if these tests are hard or tricky for him, one student responded, "Hard...there's a lot of hard stuff on there that's tricky – that you've got to know" (p. 209). Another student anticipated that the test itself probably was more difficult than the class work leading up to it: "I got bad grades on [the class work] and it's too hard. If it's going to be hard just doing the class work, it's going to be real hard on the test" (p. 210). Another noted the importance of teachers' preparing students by providing practice tests: "(The teacher) plays around saying she doesn't want to see us again next year, that it's time for us to leave . . . . She cares about all the children . . . . She shows us by teaching us more stuff and giving us examples of the test" (p. 214).

Based on their analyses, Roderick and Engel concluded creating incentives for low-achieving students through rewards and feedback may improve school effort (but not necessarily test scores). They also identified a group of students for whom incentives appeared to be ineffective. They cited two primary reasons for this: (a) some students, even those who valued promotion, felt the goal was unattainable; and (b) students with low motivation were less likely to find support or encouragement in or out of school, making it easier for these students to maintain their present trajectories of performance.

A cognitive interview study conducted by researchers at the National Center on Educational Outcomes aimed to use student input to inform the development of "comprehensible and readable" test items (Johnstone et al., 2007, p. 1). Students completed items in original or modified form (e.g., decreased word count, simplified vocabulary, bold font for key words) while being encouraged to think aloud and then participated in videotaped interviews about their experiences. The authors found students did not perceive any difference in item difficulty as a result of decreasing the number of words in stems and adding bold font to key words, although students preferred the bold font. They reported vocabulary of item stems and answer choices was perceived as important, particularly when non-construct-relevant words were included, as well as words with negative prefixes. It should be noted that the authors based their conclusions on a total participant sample consisting of eight students.

A recent study conducted by researchers from the five-state Consortium for Alternate Assessment Validity and Experimental Studies (CAAVES) project examined the effects of item modification with students in grade 8. Discovery Education Assessment provided grade 8 mathematics and reading items with extensive operational histories. Assessment experts, content-area specialists, educators, and researchers then collaborated to modify these items using a framework that was subsequently released as the Test Accessibility and Modification Inventory (TAMI; Beddow, Kettler, & Elliott, 2008). The TAMI guides item writers to isolate the target construct of test items by reducing the reading load of passages and stimuli, clarifying the question or directive in item stems, simplifying visuals, and managing the cognitive demands of item elements.

Following the item modification session, the CAAVES team conducted a cognitive interview study with a small sample of eighth-grade students (n = 9). The purpose of the cognitive interviews was to examine how students process the kinds of test items that were to be included on the CAAVES field tests. Specifically, the investigators asked students to think aloud as they completed a set of eight test items in either original or modified form (i.e., some students completed the original item while others completed the modified form). Student responses were audio- and video-recorded for later transcription. These transcripts were analyzed with the aim of understanding student perceptions about test items and of identifying further changes that might be needed to enhance the items' accessibility, including reverting to their original forms in some areas if the results of modifications were determined to be suboptimal. The cognitive interviews revealed several patterns in student views. It should be noted that statistical analyses were not feasible due to the small sample size.

As noted previously, the data were intended to provide indicators of student views to facilitate item development, as opposed to yielding empirical results that were generalizable to a large population. First, students required fewer researcher prompts (e.g., "keep talking," "tell me what you're thinking") when engaging the modified items compared to the originals. Analyses of reading fluency indicated the group of students without disabilities (SWODs) read more fluently than the group of students with disabilities who would be eligible (SWD-Es) (158.3 words correct per minute compared to 86.3 words correct per minute). Further, students across groups spent less time on the modified items compared to the original items.

Most students in the SWD group indicated visuals were helpful and supported their comprehension of reading passages and questions. Most students in the SWOD group, by contrast, reported the visuals made no difference. Most students in both groups reported visuals and graphs were helpful in mathematics.

As per Rodriguez's (2005) conclusion that three choices are optimal for multiple-choice items, each of the modified items contained three answer choices instead of the four choices found in the original items. Students who completed the modified items were asked which version they preferred. With one exception, students in the SWD group perceived no difference in the difficulty of the items due to the number of answer choices. By contrast, the majority of students in the SWOD group indicated three answer choices made the items easier. One student reflected on his preference as follows: "If you didn't get the answer right the first time, you...only had three choices to go back and look at...instead of four" (Roach et al., 2010, p. 10). Additionally, a number of the modified items used bold font to emphasize key terms. When asked to discuss their thoughts about this feature, the majority of students reported the bold font was helpful (though one student noted that while the bold font helped him find the answer, it didn't make the reading passage easier to comprehend).

Feldman, Kim, and Elliott (in press) investigated students' perspectives on testing, specifically focusing on testing accommodations (which remains the most commonly used strategy to increase the accessibility of tests for students with special needs). Study participants were eighth graders: 24 students with disabilities and 24 without a disability. Although SWDs reported levels of pre-test anxiety and positive self-regard similar to those of SWODs, results indicated SWDs reported significantly lower test self-efficacy compared with their peers without disabilities. SWDs who received testing accommodations, however, showed decreases in post-test self-efficacy compared to students who did not receive testing accommodations; this result was not observed for SWODs.

In addition, Feldman and colleagues (in press) found significant positive correlations between pre-test self-efficacy and test performance (r = 0.34 for the accommodated condition and r = 0.45 for the non-accommodated test). A significant negative correlation also was observed between pre-test anxiety and test performance (r = –0.33), but only for students in the non-accommodated condition.

Incorporating Student Voices in Future Assessment Research

In our research on inclusive and accessible assessment strategies, we have found it essential to involve students with disabilities more directly and actively in our work. Given limited research on universally designed assessments, advances in cognitive load theory and mental load analyses (Clark et al., 2006), and ongoing concerns about improved accountability for students with disabilities, it seems logical and appropriate to invite students to be actively involved in the development and evaluation of more accessible tests. The involvement of students is not required by policy, but we believe that it is an essential component of assessment development, and that it will lead to more accessible items and tests. To that end, we propose some "next steps" for research that include the perspectives of students in support of the development of more accessible assessment systems.

There is a need for additional research that illuminates the effects of stress and test anxiety on the test performance of students with disabilities. Nicaise (1995) defined test anxiety as an individual's physiological, cognitive, and behavioral responses that produce negative feelings in a test situation. An emerging body of research suggests that students with LD experience test anxiety at rates equal to or above those of their peers. By comparing students' test performance to self-reported test anxiety, Bryan, Sonnefeld, and Grabowski (1983) found that (a) students with LD reported more test anxiety than their peers; and (b) the level of reported anxiety was a significant predictor of subsequent achievement test scores. Using the Test Anxiety Inventory for Children and Adolescents (TAICA), Sena, Lowe, and Lee (2007) found that students with LD had higher scores than their peers on two TAICA subscales (Cognitive Obstruction/Inattention and Worry) that are believed to interfere with test performance, while also reporting lower scores on a third subscale (Performance Enhancement/Facilitation Anxiety) that is related to improved achievement. These reports of increased levels of test anxiety on the part of students with LD are troublesome because they represent both (a) a negative unintended consequence of large-scale testing (i.e., increased emotional distress) for students with LD; and (b) a potential source of construct-irrelevant variance that undermines score-based inferences regarding the achievement of students with LD (Salvia, Ysseldyke, & Bolt, 2007). Along with our efforts to create more accessible assessments, research is needed to identify effective interventions that will help students with LD manage test anxiety so they can fully demonstrate their skills during high-stakes testing. In addition, investigations of the anxiety-reducing effects of accessibility-enhancing test and item features would provide additional support for their utility and social validity.

Similarly, there is a need for additional research on the effects of student effort and motivation on test performance. Wise and Cotton (2009) suggested valid interpretation of test results requires "both a well-designed test and the student's willingness to respond with effort to that test. Without adequate effort, test performance is likely to suffer, resulting in the test score underestimating the student's actual level of performance" (p. 189). This is an area of concern for the implementation of inclusive and accessible assessments because lack of motivation and effort can undermine the effects of test features intended to facilitate student performance. In fact, the effect of decreased effort can be quite large; Wise and DeMars's (2005) review of the research suggested the test performance of under-motivated students is over half a standard deviation lower than that of their peers. Research on students' test effort and motivation generally has been conducted using self-report questionnaires that may be prone to response bias, but computerized testing provides opportunities to collect additional data (e.g., time spent per item, random guessing) that can corroborate these reports. Test item modifications are often created under the assumption that they will facilitate students' efforts in responding to inaccessible items that would be de-motivating in their unmodified form; research is needed that provides support for this claim.

Conclusions

Accessible assessments are intended to facilitate the inclusion of students with disabilities in testing programs and support these students in demonstrating their knowledge. Therefore, it is essential to examine the relationship between accessible assessments and students' perceptions of, and experiences with, testing. The Standards for Educational and Psychological Testing (AERA, APA, & NCME, 1999) support this idea, stating "certain educational testing programs have been advocated on the grounds that they would...clarify students' understanding of the kind or level of achievement they were expected to attain. To the extent that such claims enter into the justification for a testing program, they become part of the argument for test use and should be examined as part of the validation effort" (p. 23). Certainly, setting higher achievement standards for students and expecting improved efforts on the part of students (as well as educators and parents) to meet these standards is a key component of the theory of action underlying standards-based accountability programs. Understanding whether (and how) this claim holds for students with disabilities and English language learners is important, but it has seldom been examined.

When students' perspectives regarding assessments and modifications are not considered, educators, policymakers, and test developers may work from a "paternalistic" assumption – that is, "acting upon (our) own idea of what's best for another person without consulting that other person" (Marchewaka, cited in Smart, 2001, p. 200). Although there are some cases in which ascertaining student perspectives and preferences would be difficult (e.g., students with significant cognitive disabilities with no reliable mode of communication), we believe most students with and without disabilities are fully capable of expressing their opinions regarding the accessibility and acceptability of testing practices.

Many researchers and policymakers (including the authors included in this book) are engaged in efforts to improve the quality of assessments for students with disabilities and English language learners. Although the development and implementation of more accessible tests may not result in improved scores for every student, it appears test design practices that affect educational prospects and planning for many students can be improved. Given the omnipresence of high-stakes assessments in our nation's schools, it is important to give the students most likely to be affected by such assessments a voice in the development process. We encourage educators and test developers to include these data as part of their test development and validation efforts.

References

Albus, D., Thurlow, M., & Bremer, C. (2009). Achieving transparency in the public reporting of 2006–2007 assessment results (Technical Report 53). Minneapolis, MN: University of Minnesota, National Center on Educational Outcomes.

American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (1999). Standards for educational and psychological testing. Washington, DC: Author.

Beddow, P. A., Kettler, R. J., & Elliott, S. N. (2008). Test accessibility and modification inventory (TAMI). Nashville, TN: Vanderbilt University; Boston: Houghton Mifflin.

Bryan, J. H., Sonnefeld, L. J., & Grabowski, B. (1983). The relationship between fear of failure and learning disabilities. Learning Disabilities Quarterly, 6, 217–222.

Council for Exceptional Children. (2008). What every special educator must know: Ethics, standards, and guidelines (6th ed.). Arlington, VA: Author.

Dolan, R. P., Hall, T. E., Banerjee, M., Chun, E., & Strangman, N. (2005). Applying principles of universal design to test delivery: The effect of computer-based read-aloud on test performance of high school students with learning disabilities. The Journal of Technology, Learning, and Assessment, 3(7). Retrieved November 7, 2010, from http://www.jtla.org

Elliott, S. N. (1986). Children's ratings of the acceptability of classroom interventions for misbehavior: Findings and methodological considerations. Journal of School Psychology, 24, 23–35.

Feldman, E., Kim, J., & Elliott, S. N. (in press). The effects of accommodations on adolescents' self-efficacy and test performance. Journal of Special Education.

Fullan, M. (2001). The new meaning of educational change. New York: Teachers College Press.

Gallagher, M. (2009). Data collection and analysis. In E. K. Tisdal, J. M. Davis, & M. Gallagher (Eds.), Researching with children and young people: Research design, methods, & analysis (pp. 65–88). Thousand Oaks, CA: Sage.

Johnstone, C., Liu, K., Altman, J., & Thurlow, M. (2007). Student think aloud reflections on comprehensible and readable assessment items: Perspectives on what does and does not make an item readable (Technical Report 48). Minneapolis, MN: National Center on Educational Outcomes.

Johnstone, C. J., Bottsford-Miller, N. A., & Thompson, S. J. (2006). Using the think aloud method (cognitive labs) to evaluate test design for students with disabilities and English language learners (Technical Report 44). Minneapolis, MN: University of Minnesota, National Center on Educational Outcomes.

Kazdin, A. E. (1981). Acceptability of child treatment techniques: The influence of treatment efficacy and adverse side effects. Behavior Therapy, 12, 493–506.

Moss, P. (1996). Enlarging the dialogue in educational measurement: Voices from interpretive research traditions. Educational Researcher, 25(1), 20–28.

Moss, P. A., Pullin, D., Gee, J. P., & Haertel, E. H. (2005). The idea of testing: Psychometric and sociocultural perspectives. Measurement: Interdisciplinary Research and Perspectives, 3(2), 63–83.

National Association of School Psychologists. (2010). Principles for professional ethics. Bethesda, MD: Author.

Nicaise, M. (1995). Treating test anxiety: A review of three approaches. Teacher Education and Practice, 11, 65–81.

Pellegrino, J. W., Chudowsky, N., & Glaser, R. (2001). Knowing what students know: The science and design of educational assessment. National Research Council, Division of Behavioral and Social Sciences and Education, Committee on the Foundations of Assessment. Washington, DC: National Academy Press.

Roach, A. T., Beddow, P. A., Kurz, A., Kettler, R. J., & Elliott, S. N. (2010). Incorporating student input in developing alternate assessments based on modified academic achievement standards. Exceptional Children, 77, 61–80.

Roderick, M., & Engel, M. (2001). The grasshopper and the ant: Motivational responses of low-achieving students to high-stakes testing. Educational Evaluation and Policy Analysis, 23(3), 197.

Rodriguez, M. C. (2005). Three options are optimal for multiple-choice items: A meta-analysis of 80 years of research. Educational Measurement: Issues and Practice, 24(2), 3–13.

Salvia, J., Ysseldyke, J. E., & Bolt, S. (2007). Assessment in special and inclusive education. Boston: Houghton Mifflin.

Sena, W. J. D., Lowe, P. A., & Lee, S. W. (2007). Significant predictors of test anxiety among students with and without learning disabilities. Journal of Learning Disabilities, 40, 360–376.

Smart, J. (2001). Disability, society, and the individual. Gaithersburg, MD: Aspen.

Thompson, S. J., Johnstone, C. J., Anderson, M. E., & Miller, N. A. (2005). Considerations for the development and review of universally designed assessments (Technical Report 42). Minneapolis, MN: University of Minnesota, National Center on Educational Outcomes. Retrieved June 1, 2009, from http://education.umn.edu/NCEO/OnlinePubs/Technical42.htm

Triplett, C. F., & Barksdale, M. A. (2005). Third through sixth graders' perceptions of high-stakes testing. Journal of Literacy Research, 37(2), 237–260.

United States Department of Education. (2007, April). Modified academic achievement standards: Non-regulatory guidance. Washington, DC: Author.

Wheelock, A., Haney, W., & Bebell, D. (2000). What can student drawings tell us about high-stakes testing in Massachusetts? The Teachers College Record, ID Number: 10634.

15 Computerized Tests Sensitive to Individual Needs

Michael Russell

The presence of computers in our schools creates valuable opportunities to enhance educational testing. The number of computers in schools has changed dramatically over the last three decades. In 1983, schools had, on average, only one computer for every 125 students. Today, that ratio has dropped to one computer for every 3.8 students, and nearly every school across the nation has a high-speed Internet connection (Bausel, 2008). In addition, an increasing number of schools are adopting 1:1 computing programs that provide a laptop for every student.

A decade ago, Bennett (1999) predicted that computer-based testing would pass through three evolutionary stages before reaching its full potential. First, Bennett predicted that computers would be used to increase the efficiency of testing. Second, multimedia would be integrated into tests to increase the authenticity of items and tasks presented to students. Finally, computers would be used to deliver tests anywhere and at any time, so that testing becomes more integrated with instruction.

Today, many states are exploring or have begun to transition their testing programs to computer (Bennett et al., 2008). In most cases, however, testing programs that have embraced computer-based testing have done so solely to increase efficiency. Their goals are simple – improve the efficiency with which tests are distributed, decrease the time required to score multiple-choice answers, and increase the speed with which results are reported. While achieving these goals saves time and money, it does not harvest the full benefits of computer-based testing.

M. Russell, Boston College, Chestnut Hill, MA, USA; Nimble Innovation Lab, Measured Progress, Newton, MA 02458, USA. e-mail: [email protected]

This chapter focuses specifically on the benefits computer-based testing can bring to increasing accessibility, and thus test validity, by customizing the delivery of, interaction with, and response to test items based on the specific needs of each individual student. Given the capability of computers to customize delivery and presentation of tests, this chapter also explores the emergence of a new approach to framing accommodations as adaptations designed to meet four categories of access needs. The opportunity to develop an access needs profile for each student and to then tailor the delivery of tests based on this profile is also discussed. Finally, this chapter shares evidence that is emerging from one testing program that is capitalizing on these benefits.

Background

Over the past decade, several events have created a critical need for computer-based assessments that flexibly meet the accessibility and accommodation needs of individual students. The Individuals with Disabilities Education Improvement Act (IDEA) requires that students with disabilities and special needs receive appropriate accommodations during instruction and that they receive similar accommodations during testing. Universal Design for Learning (UDL) emphasizes the importance of designing curricular materials in a way that increases access for students with a variety of needs. The importance of careful design and consideration for multiple ways of accessing information by students with disabilities are at the foundation of the National Instructional Materials Accessibility Standards (NIMAS; for more information see nimas.cast.org). Response to intervention (RTI) places strong emphasis on the collection and analysis of data to examine the extent to which specific interventions are having a positive effect on a student. The No Child Left Behind (NCLB) Act places strong emphasis on assessment as an accountability tool and requires that students with disabilities and special needs participate in state assessment programs. Finally, in the field of testing, there has been a move toward computer-based test (CBT) delivery, in part, to increase the efficiency with which information is collected and returned to educators.

Together, NCLB and IDEA require testing programs to provide appropriate accommodations for students during testing. Appropriate accommodations vary with each individual student's disability, but include such things as Braille copies of reading materials for students who are blind, the reading aloud of written materials for students with dyslexia or other reading-related disabilities, the magnification of materials for students with visual impairments, the use of tools that isolate (or mask) information on a page for students with information processing disabilities, and the use of oversized writing materials or a keyboard for students with fine motor skill disabilities. In a testing situation, these types of accommodations are required so that students are able to comprehend what a test question is asking of them and so that they can better demonstrate their understanding of tested knowledge and concepts. Without access to appropriate accommodations, students are placed at a severe disadvantage in terms of demonstrating their achievement on the tests they are required to take and, in many cases, must pass to move to the next grade or to graduate.

Focusing specifically on state assessment programs, there is ample evidence that these programs struggle to provide appropriate accessibility and accommodations for students with disabilities and special needs. For example, in February of 2002, a Federal District Court issued a temporary injunction that effectively halted the administration of California's High School Exit Exam to students with disabilities and special needs because there was compelling evidence that appropriate accommodations were not being provided to all students with disabilities and special needs across the state (Chapman v. California Department of Education).

Some states have written guidelines regarding the roles and responsibilities of people who assist in the administration of accommodations (e.g., readers, scribes, and sign language interpreters); however, there is great variability in both the breadth and depth of these guidelines (Clapper, Morse, Lazarus, Thompson, & Thurlow, 2005a). A study that included a focus group with blind and visually impaired adults highlights the ramifications of this (Landau, Russell, Gourgey, Erin, & Cowan, 2003). Study participants who previously had received read aloud accommodations provided by a human reader pointed out several problems with this type of accommodation, which included: (a) the quality of the readers varied widely; (b) readers occasionally mispronounced or misread words; (c) readers provided intentional as well as unintentional hints to the correct answer; and (d) participants were sometimes reluctant to ask proctors to re-read parts of an item. In other words, while the students were provided with a read aloud accommodation, the accommodation itself was not delivered in a standardized or equitable manner, and likely did not provide students with an appropriate opportunity to demonstrate their achievement. The problem of inappropriate accommodations for students with sensory disabilities was also documented by Bowen and Ferrel (2003), who wrote, "few tests are valid for use with students with sensory disabilities, and the adaptations made by uninformed professionals can result in both over- and under-estimates of an individual student's potential" (p. 10).

A 2003 Rhode Island Department of Education study also revealed difficulties schools face in providing accommodations. For example, schools had problems providing some of the most basic accommodations due to the large number of students requiring them combined with a lack of space, equipment, and staff. Observations and teacher surveys revealed that some schools "bundle" accommodations for groups of students rather than follow individual IEP recommendations. In particular, the accommodations provided to high school students were sometimes found to be "less than ideal" (Gibson, Haeberli, Glover, & Witter, 2003, p. 3). In addition, the study found significant differences between the daily accommodations that were provided to students during instruction and the accommodations that were available during assessments. Students who were provided with certain instructional accommodations such as one-on-one reading assistance or shorter assignments were not provided with comparable testing accommodations. Furthermore, accommodations frequently recommended for instruction, such as computers and other assistive devices, were rarely used during assessments.

In short, the struggle to provide appropriate and valid test accommodations results from three factors: (a) it is currently too expensive to provide different versions (Braille, magnified versions, etc.) of paper-based test materials that satisfy students' accommodation needs; (b) it is too expensive to provide individual test proctors who can read or sign a test; and (c) it is not possible for testing programs to provide accommodations in a standardized manner or to monitor the quality with which accommodations are provided by schools. While issues of standardization apply more to large-scale tests, these issues are relevant for both classroom-based and state assessment programs.

The fundamental problem encountered when attempting to meet federal accommodation requirements stems from the need to adapt standard assessment materials and administration procedures used for most students who do not need accommodations to meet the unique needs of students who do need accommodations. In the fixed medium of paper, meeting the accommodation needs of students requires the development of multiple versions of test materials. Given the limited space and personnel available in schools, meeting the accommodation needs of students who need to work individually with a test proctor is often not feasible.

The sections that follow explore the tremendous potential computer-based test delivery holds for resolving these issues.

Rethinking Test Accommodations

From a universal design perspective, instructional and test accommodations are intended to support each student's access to instructional or test content, interactions with content, and response to content. Accessing content requires information presented in a given form to be internalized by the student. Interactions with content require students to process, assimilate, manipulate, and/or interpret content that has been internalized. Responding requires students to produce an observable product that is the outcome of their interaction with content. During each of these three stages, other constructs operating within a student can interfere with the student's ability to access, interact, and respond in a manner that allows the student to demonstrate the target construct.

Over the past 30 years, discussions about accommodations have generally focused on the specific method used to meet a need. While methods are important, the essential aspect of a given instructional or test accommodation is the specific need that must be met to support the development of a construct (i.e., learning) or the measurement of that construct (i.e., testing). For example, one commonly used instructional and test accommodation is "read aloud."

Read aloud is a method that presents an auditory representation of print-based content. Most often, the read aloud accommodation is associated with meeting the need of a student who has difficulty accessing print-based text, either due to a visual impairment or a reading-related disability. The read aloud accommodation, however, can meet a variety of needs, including decoding text presented in a narrative form, visually perceiving text presented in a narrative form, visually perceiving information presented in graphical form, visually perceiving information presented in tabular form, processing information presented in tabular form, and pacing students as they access and process content.

Depending on which of these needs is being met, the way in which the read aloud accommodation is delivered may vary. For example, when the need that must be met focuses on decoding content presented in narrative form, a student only needs text presented in an auditory form, and does not need descriptions of graphics or tables presented in an auditory form. In contrast, a student with a visual impairment may require an auditory presentation of narrative text, graphics, and tables. While both students may be said to have used a read aloud accommodation, in reality the needs of each student that are being accommodated differ in important ways.
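
To make this distinction concrete, the sketch below shows how a delivery system might encode two different read aloud configurations behind the same accommodation label. It is a minimal illustration under stated assumptions, not a description of any particular testing program's software; the name ReadAloudConfig and its fields are hypothetical.

    # A minimal sketch (hypothetical names) of two read aloud configurations
    # that meet different needs behind the same accommodation label.
    from dataclasses import dataclass

    @dataclass
    class ReadAloudConfig:
        read_narrative_text: bool  # speak directions, prompts, and answer options
        describe_graphics: bool    # speak alternate descriptions of images
        describe_tables: bool      # speak table contents
        paced_delivery: bool       # advance the audio to pace the student

    # Student who needs support decoding narrative text only
    decoding_support = ReadAloudConfig(
        read_narrative_text=True, describe_graphics=False,
        describe_tables=False, paced_delivery=False)

    # Student with a visual impairment who needs the full auditory presentation
    visual_access = ReadAloudConfig(
        read_narrative_text=True, describe_graphics=True,
        describe_tables=True, paced_delivery=False)

Both students would be recorded as having received a "read aloud" accommodation, yet the delivered experiences differ in exactly the ways described above.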

Categories of Accommodations

Traditionally, accommodations have been classified into five categories, each of which captures the type of change made to instructional or test content, or the conditions under which instruction is provided or a test instrument is administered. These five categories include changes in (a) presentation; (b) equipment and/or materials; (c) response methods; (d) schedule and timing; and (e) setting (Clapper et al., 2005a). When viewed from the perspectives of universal design and accessibility, accommodations can be reclassified into four categories: adapted presentation, adapted interactions, adapted response modes, and alternate representations.

Adapted presentation focuses on changes to the way in which instructional or test content is presented to a student. Examples of adapted presentation include changing the font size used to present text-based content, altering the contrast of text and images, increasing white space, and reducing the amount of content presented on a page. For paper-based tests, adapted presentation of test items often requires test developers or teachers to create special versions of these materials. As described in detail below, a computer-based test delivery system could build tools directly into the delivery interface that allow students to alter the presentation of content without requiring the development of additional versions of test materials.

Adapted interactions focus on changes to the way in which students engage with test content. Examples of adapted interactions include assisting students with pacing, masking content, and scaffolding. For paper-based tests, adapted interactions often require students to work directly with an adult and/or with additional materials, such as templates or masks. For a digital-content delivery system, adapted presentation and interaction tools can be built into the delivery interface and do not require any additional information, versions, or formats for instructional content or a test item.

Adapted response modes focus on the method a student uses to provide responses to instructional activities or assessment tasks. Examples of adapted response modes include producing text orally to a scribe or using speech-to-text software; pointing to answers or using a touch screen instead of circling, clicking, or bubbling; and using assistive communication devices to produce responses. For paper-based instructional activities and assessment tasks, adapted response modes may require students to interact with a scribe to produce a permanent record of their response. For digital instructional activities and assessment tasks, a computer-based test delivery system could allow students to use a variety of assistive technologies connected to the computer (touch screen, single switch devices, alternate keyboards such as Intellikeys, speech-to-text software, eye-tracking software, etc.) that enable students to produce responses. In some cases, however, additional methods that allow an examinee to respond to test content may also be required.

The final aspect of accessibility focuses on alternate representations. As Mislevy and his colleagues (2010) explain, alternate representations change the form in which instructional or test content is presented to a student. Unlike adapted presentations, which manipulate the way in which the same content is presented to an examinee, alternate representations present students with different versions of the test content. Reading content aloud; presenting text-based content in sign language or Braille; tactile representations of graphical images; symbolic representations of text-based information; narrative representations of chemical compounds (e.g., "sodium chloride" instead of "NaCl") or mathematical formulas; and translation into a different language are all forms of alternate representations. For paper-based instructional and test materials, alternate representations often require the development of different versions or forms of the materials, or the use of translators or interpreters who present altered representations to the student. As described below, a computer-based test delivery system could tailor which representational forms are presented to a student based on his or her individual need or preference.

Thinking about designing and implementing accommodations using this four-category framework is helpful. First, by focusing on the specific need, it is possible to develop more nuanced methods and guidelines for meeting that need. As discussed above, read aloud can be used to meet several different needs, but it can only do so if the information presented orally matches the student's specific need. Reading alternate descriptions of graphics for a sighted student with a reading-related disability does not meet his or her need and may create distractions. Similarly, reading aloud text without providing oral descriptions of graphics for a visually impaired student also does not meet the student's need. By focusing on the specific need, specific instructions or methods for meeting that need can be specified and implemented with high degrees of integrity via computer.

Second, focusing on the specific need first, and then on potential methods for meeting that need, simplifies the process of determining whether the resulting accommodation may violate the measurement of a given construct. In some cases, adapted presentation, alternate representations, and/or adapted response methods may conflict with the valid measurement of a construct. For example, the validity of inferences one makes, based on a set of science items that are designed to measure a student's ability to read and interpret information presented in scientific notation, would decrease if students were able to engage with representational forms that provided narrative descriptions of the scientific notation. Similarly, providing tactile representations for items designed to measure spatial and visualization skills might decrease the validity of inferences based on the tactile representational forms of content contained in those items. However, when the adapted presentation or alternate representational form is independent of the construct being measured, there are a number of ways in which advances in technology may permit the delivery of test items using alternate formats, without sacrificing the validity of the results.
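
One way a delivery system might operationalize this four-category framework is as a per-student access profile whose adaptations are withheld for any item where they would conflict with the construct being measured. The sketch below is a hypothetical illustration of that idea only; the class names, field names, and example values (AccessProfile, Adaptation, "audio narrative of notation") are assumptions, not features of an existing system.

    # Hypothetical sketch: a student access profile organized by the four categories,
    # with construct-conflicting adaptations withheld on a per-item basis.
    from dataclasses import dataclass, field
    from enum import Enum

    class Category(Enum):
        ADAPTED_PRESENTATION = "adapted presentation"          # e.g., magnification, contrast
        ADAPTED_INTERACTION = "adapted interaction"            # e.g., masking, pacing
        ADAPTED_RESPONSE = "adapted response mode"             # e.g., speech-to-text
        ALTERNATE_REPRESENTATION = "alternate representation"  # e.g., Braille, audio

    @dataclass
    class Adaptation:
        category: Category
        name: str

    @dataclass
    class AccessProfile:
        student_id: str
        adaptations: list = field(default_factory=list)

        def applicable(self, construct_conflicts: set) -> list:
            # Withhold any adaptation that conflicts with what the item measures.
            return [a for a in self.adaptations if a.name not in construct_conflicts]

    profile = AccessProfile("S-001", [
        Adaptation(Category.ADAPTED_PRESENTATION, "magnification"),
        Adaptation(Category.ALTERNATE_REPRESENTATION, "audio narrative of notation"),
    ])

    # For an item measuring the ability to read scientific notation, the audio
    # narrative of that notation is construct-relevant and is therefore withheld.
    print([a.name for a in profile.applicable({"audio narrative of notation"})])

Keeping the need (the profile) separate from the method (how each adaptation is rendered) mirrors the argument above that the specific need, rather than the delivery mechanism, should drive accommodation decisions.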

In this section, we explore potential uses of computer-based technology to meet a variety of needs, including audio, signed, alternate language, and Braille access to knowledge representations. In addition, we explore methods for altering the presentation of knowledge representations through magnification, alternate contrast, and masking. Before doing so, however, it is important to recognize two additional advantages that computer-based technology provides, namely standardizing the provision of accessible test instruments and monitoring the use of accessibility features.

As noted above, many accommodations provided for paper-based tests require interactions between the examinee and a test proctor. For example, a read aloud accommodation requires a proctor to read content to a student and a signing accommodation requires a proctor to translate content to a signed representation and to then present that signed representation to the examinee. Similarly, students who require assistance in recording responses in the correct location, maintaining focus, or monitoring their pace may have a proctor assist with these functions.

In all of these cases, the interaction between the examinee and a test proctor may result in undesired interpretations of content, unintentional or intentional cues or hints, or undesired assistance. Computer-based technology provides opportunities to eliminate these undesirable interactions and allows test developers to make careful and thoughtful decisions about the exact manner in which text is to be read or interpreted and presented in sign. Test developers can also provide multiple methods for recording responses, allowing students to select the method that allows them to record responses in an accurate manner. Finally, tools can be built into a delivery system to provide structured assistance with focus or pacing without introducing unintended assistance. While the end product of a well-designed computer-based test delivery system may provide educators and students with greater flexibility in the presentation, interaction, and response to test content, it also allows the way in which each option is provided to be standardized across students who make use of a given option and for this standardization to be based on careful decisions made during the test development process.

A second challenge that results when accommodations are provided on paper is accurate reporting of the provision of accommodations. For all paper-based tests, the provision of accommodations is recorded by a test proctor or another adult in the school. The degree to which accommodations are reported accurately, however, varies widely. In addition, because the provision of test accommodations for a paper-based test is reported at the test level, testing programs do not have any information about the actual use of an accommodation for each item. For example, a student may be provided a read aloud accommodation, but may only ask to have text read aloud for a subset of items on a test. A computer-based test delivery system, however, allows a testing program to collect accurate information about the use of each access tool or feature for each individual item on a test. This level of detail holds potential to improve the accuracy of information about the provision of accommodations while also allowing testing programs to conduct detailed analyses about the items for which accessibility features were used more or less frequently. This level of detail allows test developers to examine features of items that may cause confusion for some students or may interfere with the measurement of a given construct due to construct-irrelevant elements of the item. Collectively, the improved accuracy with which computer-based test delivery can provide an accommodation holds potential to improve test validity, while the increased accuracy and level of detail about the actual use of access tools hold potential to assist test developers in improving the quality of test items.
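
As a simple illustration of the per-item records such a system might keep, the sketch below logs which accessibility features a student actually invoked on each item and tallies usage by item. It is a hypothetical example under the assumptions described above; the field names and feature labels are not drawn from any operational delivery platform.

    # Hypothetical sketch: logging actual use of accessibility features per item.
    from collections import Counter
    from dataclasses import dataclass

    @dataclass
    class ItemAccessLog:
        student_id: str
        item_id: str
        features_used: tuple  # e.g., ("read_aloud", "masking")

    logs = [
        ItemAccessLog("S-001", "MATH-04", ("read_aloud",)),
        ItemAccessLog("S-001", "MATH-05", ()),  # feature assigned but not used
        ItemAccessLog("S-002", "MATH-04", ("read_aloud", "masking")),
    ]

    # Items on which a feature is invoked unusually often may contain
    # construct-irrelevant barriers worth flagging for item review.
    usage_by_item = Counter(
        (log.item_id, feature) for log in logs for feature in log.features_used
    )
    print(usage_by_item.most_common())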

Audio Access to Knowledge Representations

For paper-based test administration, read aloud is one of the most frequently provided test accommodations (Bielinski, Thurlow, Ysseldyke, Freidebach, & Freidebach, 2001). In essence, the read aloud accommodation presents test content in an audio form. Depending upon the test administration guidelines, readers are instructed to present only text-based information and to do so by reading text verbatim. For a mathematics test, these guidelines mean that narrative text associated with directions, prompts, or answer options can be read, but any numbers, formulas, or equations cannot be read. Other programs allow readers to present text, numbers, and mathematical and scientific nomenclature in audio form. Still other programs allow alternate descriptions of charts, diagrams, and other visual content to be provided in audio form. In practice, the read aloud accommodation is sometimes provided individually to each student, while, in other cases, an audio representation is presented to a small group of students.

These different guidelines and practices translate into many different forms of the read aloud accommodation. Importantly, each read aloud form meets a different and distinct need. Interestingly, these multiple forms may partially explain the varied findings regarding the effect of read aloud accommodations on students' test performance (Sireci, Li, & Scarpati, 2003).

Needs Met by an Audio Presentation

In a test situation, it is important that each examinee understands the problem presented by an item, the information provided by the item with which s/he is expected to work, and, for multiple-choice items, the responses from which s/he is expected to select an answer. Presenting this information in a text-based representation can present challenges for some students with disabilities and special needs.

For example, students with dyslexia or other disabilities related to processing text may experience difficulty decoding information presented in a text-based form. For these students, reading text aloud reduces challenges to accessing information that result from difficulties decoding text. Some students with dyscalculia may struggle processing numbers and mathematical expressions presented in print-based form. For these students, audio representations that accompany the print-based form can help them to accurately internalize information presented in numerical form or using mathematical nomenclature.

Other students with vision-related needs have difficulty perceiving information presented in print-based form. Depending on the level of the vision need, these students may have difficulty viewing information presented in narrative, tabular, or graphic form. To enable these students to access item content, all text-based information may need to be presented in audio form, and descriptions of graphics and tables may also be needed. Other students who experience difficulty processing information presented in tabular or graphical form may also benefit from audio descriptions of tables, charts, and diagrams.

Still other students who are developing their English language skills may have difficulty recognizing words presented in text-based form. These students may also benefit from having text-based information read aloud. Finally, students who experience difficulty pacing themselves as they perform a test may also benefit from having a proctor pace them through the test by reading content aloud. While these students may not have difficulty accessing the item content, reading content aloud may assist them in maintaining a pace that prevents them from becoming distracted or from working too rapidly through test content.

Clearly, some of these needs overlap. For example, students with reading-related or vision-related needs both benefit from having text-based information presented in audio form. However, there are also important differences among these needs that prevent a single method of providing read aloud support from meeting each of these specific needs. For example, providing descriptions of charts or diagrams for students with a reading-related need may be distracting if they are able to access and process information presented in graphical form. However, these students may benefit from having text that appears in charts or diagrams read to them without a more general description of the image itself. Conversely, providing only audio representations of text contained in an image may be insufficient for enabling a student with a vision-related need to access the concept(s), relationships, or information presented in that image.

Similarly, presenting text contained in each cell of a table in audio form may be sufficient for a student with a reading-related disability or who is developing fluency in English. But, for a student with information processing needs, who has difficulty seeing relationships among information presented in tabular form, more detailed descriptions of the table and its content may be required. For example, consider the table presented in Fig. 15.1. A student with a reading-related disability may benefit from having the column labels and the student names read aloud verbatim ("Student," "Friday," "Bill," etc.), but may not need numbers read aloud. Conversely, a student with dyscalculia may benefit from having the numbers contained in the table read aloud ("one hundred three," "one hundred fifty-seven," "ninety-eight," etc.), but may not need the column labels or student names presented in audio form. Meanwhile, a student with a vision-related or information processing–related need may benefit from a fuller description of the information presented in the table. For example, rather than simply reading the contents of each cell ("Friday," "Bill," "one hundred three," etc.), the relationships represented through the table may be presented ("Bill picked one hundred three apples on Friday," "Bill picked one hundred fifty-seven apples on Saturday," etc.). Presenting information in this manner may enable these students to access both the individual words and numbers, as well as the relationships among each of those elements of the item's content. Finally, for a student with a low vision need, additional information about the table that orients the student to the table's purpose and design may need to be presented in audio form before presenting the actual information and relationships contained in the table. For example, the student might be presented with the following audio overview of the table: "This table is titled 'number of apples picked.' The table contains four columns and four rows of data. The first column is labeled Student and contains the names of four students who include Bill, Mary, Steve, and Jane. The three remaining columns are labeled Friday, Saturday, and Sunday. Each of these columns contains information about the number of apples picked by each student on each day." Once oriented to the table's design and general contents, these students may then be presented with audio descriptions that present the relationships among information presented in each cell of the table.

Fig. 15.1 Table associated with an item

Number of Apples Picked

Student   Friday   Saturday   Sunday
Bill         103        157       98
Mary         118        198      134
Steve         87        134      113
Jane         178        243       54

How many more apples did Mary pick on Saturday than on Sunday?
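
To make the contrast among these read aloud forms concrete, the sketch below generates three different audio scripts from the data behind the table in Fig. 15.1. It is illustrative only and is not drawn from any delivery system described in this chapter; the function names, the stored table data, and the exact wording of the generated scripts are assumptions.

```python
# Illustrative sketch: three audio scripts generated from one table (see Fig. 15.1).
# The data, function names, and script wording are hypothetical.

TABLE_TITLE = "Number of Apples Picked"
COLUMNS = ["Student", "Friday", "Saturday", "Sunday"]
ROWS = [
    ["Bill", 103, 157, 98],
    ["Mary", 118, 198, 134],
    ["Steve", 87, 134, 113],
    ["Jane", 178, 243, 54],
]

def verbatim_script(include_labels=True, include_numbers=True):
    """Read cell contents verbatim, optionally limited to labels or numbers."""
    parts = []
    if include_labels:
        parts.extend(COLUMNS)
    for row in ROWS:
        if include_labels:
            parts.append(row[0])
        if include_numbers:
            parts.extend(str(value) for value in row[1:])
    return ", ".join(parts)

def relational_script():
    """Describe the relationships the table encodes, one sentence per cell."""
    sentences = []
    for name, *counts in ROWS:
        for day, count in zip(COLUMNS[1:], counts):
            sentences.append(f"{name} picked {count} apples on {day}.")
    return " ".join(sentences)

def overview_script():
    """Orient the listener to the table's purpose and layout before its contents."""
    names = ", ".join(row[0] for row in ROWS)
    return (
        f"This table is titled '{TABLE_TITLE}'. It contains {len(COLUMNS)} columns "
        f"and {len(ROWS)} rows of data. The first column is labeled {COLUMNS[0]} and "
        f"contains the names {names}. The remaining columns are labeled "
        f"{', '.join(COLUMNS[1:])} and give the number of apples picked on each day."
    )

if __name__ == "__main__":
    print(verbatim_script(include_numbers=False))   # reading-related need
    print(verbatim_script(include_labels=False))    # dyscalculia-related need
    print(overview_script() + " " + relational_script())  # vision-related need
```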

The audio representations used to meet each of these needs differ noticeably with respect to the content that is presented to each student. As noted above, presenting a student with an audio representation that is not aligned with the student's need may result in either distracting the student with unnecessary information or providing the student with inadequate access to the content with which the student is expected to work. For this reason, it is important to align the information presented in audio form with the specific need for a given student. As will be discussed in greater detail next, it is also important for the test developer to carefully consider the construct being measured by an item and consider whether the audio representation provided for an item interferes with the measurement of that construct. For example, if an item containing a table is intended to measure a student's ability to read and interpret information presented in tabular form, providing an audio representation that describes the relationships among information presented in the table will interfere with the measurement of the intended construct. However, if the item is measuring a student's ability to perform another mathematical function, such as calculating a mean, using the information presented in the table, an audio representation will likely not interfere with the measurement of the intended construct.

In a computer-based environment, matching audio-based representations of item content with individual student needs requires two important steps. First, the full range of audio representations for each element of an item's content must be developed for the item. For example, for the item depicted in Fig. 15.1, audio-based representations must be generated for each narrative and numerical element. In addition, an audio description that provides an overview of the table associated with the item must be generated. Finally, audio representations that describe the relationships among information presented in the table cells must be generated.

Second, once these multiple audio representations are established, the test delivery system must be able to tailor the presentation of the audio representations to the specific need of each student. For example, for a student with a reading-related disability, the system may limit the audio presentations to only words presented in text-based form. For a student with dyscalculia, the system may limit audio presentations to only the numbers contained in the table. For a student with information processing needs, the system may limit the audio presentations to descriptions that capture the relationships among information contained in the table cells. Finally, for a student with vision-related needs, all forms of audio representations may be presented.

To accomplish this tailored audio content delivery, a test delivery system may require users to establish an access needs profile that specifies the specific needs that must be met while the student interacts with item content. The profile would then drive the representational form that is available for each student as each item is presented to the student.
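
One way such a profile might be represented is sketched below. This is a hypothetical data structure rather than the design of any particular delivery system; the field names and the mapping from needs to audio forms are assumptions chosen to mirror the examples in the preceding paragraphs.

```python
# Hypothetical access needs profile driving which audio forms are offered per item.
from dataclasses import dataclass

@dataclass
class AccessNeedsProfile:
    reads_below_grade_level: bool = False   # e.g., dyslexia or related needs
    numeric_processing_need: bool = False   # e.g., dyscalculia
    information_processing_need: bool = False
    vision_related_need: bool = False

def audio_forms_for(profile: AccessNeedsProfile) -> set[str]:
    """Return the audio representations the delivery system should expose."""
    forms: set[str] = set()
    if profile.reads_below_grade_level:
        forms.add("text_verbatim")          # narrative text read aloud
    if profile.numeric_processing_need:
        forms.add("numbers_verbatim")       # numbers and nomenclature read aloud
    if profile.information_processing_need:
        forms.add("table_relationships")    # relational descriptions of tables
    if profile.vision_related_need:
        # Vision-related needs receive every available audio form.
        forms |= {"text_verbatim", "numbers_verbatim",
                  "table_relationships", "table_overview", "image_descriptions"}
    return forms

# Example: a student with a reading-related disability only
print(audio_forms_for(AccessNeedsProfile(reads_below_grade_level=True)))
```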

Signed Access to Knowledge Representation

Students who communicate in sign language may read below grade level. For items that measure constructs unrelated to reading and contain information presented in narrative form, reading skills may interfere with the student's ability to access item content. As described above, the read aloud accommodation is intended to reduce this challenge by presenting text-based information in audio form. For students who are deaf or hard of hearing, however, audio-based representations are insufficient for providing access to text-based information. For these students, signed representations are required.

Similar to read aloud, there are multiple needs to consider when creating signed representations of test content. Of primary importance is the signed representation itself. Depending on a student's prior experience, content might be presented in American Sign Language (ASL) or as Signed English. ASL and Signed English both employ hand gestures and symbols to represent words, phrases, and expressions. American Sign Language, however, is a unique form of communication that has its own set of rules for grammar and often employs word order that differs noticeably from spoken English. In contrast, Signed English employs the same grammatical rules as spoken English and effectively represents words through hand gestures. Although most people who communicate in ASL can also communicate in Signed English, people who communicate in Signed English often are not familiar with ASL.

In addition to the form of sign that is employed, it is important to consider whether a student also relies on "mouthing" and/or partial auditory input to access content. Mouthing is the movement of the mouth to either say or mimic the pronunciation of words as they are signed. For students who developed oral language skills prior to developing a hearing-related need, mouthing may assist in interpreting or processing information presented in sign. Similarly, partial auditory input occurs when a person simultaneously presents information in sign and speaks that information aloud. Again, for students who communicate in sign and have partial hearing, auditory input may assist in accessing and processing information presented in sign. Finally, while both ASL and Signed English employ a variety of hand gestures to represent knowledge, facial expressions and body movement are often employed to provide supplementary information such as emotion, dialogue, and time.

To best meet the access needs of students who communicate in sign and read below grade level, it is important to match the specific way in which a signed representation is provided to the method with which the student is accustomed to communicating. When a student works individually with an interpreter, this matching can be accomplished by assigning an interpreter who is accustomed to communicating information in the same manner with which the student is familiar. Research, however, provides evidence that the use of human interpreters presents several challenges to the provision of a signed accommodation and for test validity.

For example, the use of assistants who vary in their signing ability and in their familiarity with the test content results in the inequitable provision of the signing accommodation to students across a testing program. Johnson, Kimball, and Brown (2001) found that many interpreters were unfamiliar with mathematics vocabulary, especially at the higher grades. In turn, the variability in proficiency of the signing assistants is likely to have a negative effect on test validity and the comparability of test scores for students who use an access assistant (Clapper, Morse, Thompson, & Thurlow, 2005b).

One strategy employed by a few states, such as South Carolina and Massachusetts, that overcomes the problems presented by the use of signing assistants is the provision of the signing accommodation in a videotaped format. For this form of the accommodation, a single signer is used to present the test content to all students. Prior to producing a videotaped recording of the signer, the testing program works with the signer and other experts familiar with the signing of the test content to assure that the material is presented accurately and clearly. A single recording is then made and distributed to all students requiring a signing accommodation. In turn, the provision of high-quality signing is standardized across the entire testing program.

While standardization is highly desirable for a standardized testing program, the use of videotaped recordings is unsatisfactory for four reasons. First, it creates a physical separation between the test content and the presentation of that content. In most cases, the recorded accommodation is played on a television screen that sits in front of the classroom while the student works on a test booklet at his desk. This physical separation makes it difficult for students to move between the test item presented on paper and the signing of the text. Second, the use of a VHS or DVD machine to play the signed version requires a teacher or the student to use a remote control device to reverse or fast-forward through the tape in order to replay specific portions of the tape. This inefficiency increases testing time and causes the focus to move away from the tested content onto the use of the controller. Third, in some cases, the video is played to a group of students simultaneously, requiring individual students to make a request in front of their peers to have a portion of video replayed and then forcing all students in the room to re-watch that section. Fourth, when played for a group of students, the distance between the video and the student can make it difficult for students to clearly view the signed delivery.

Computer-based delivery of a test, however, holds promise to overcome problems associated with providing the signing accommodation either by a human or through a VHS or DVD machine displayed on a television screen. By embedding signed video directly in a computer-based test, the video and the test content can be presented in a consistent manner to all students, and the signed representation can be thoroughly reviewed and approved prior to the day of testing. As shown in Fig. 15.2, embedding signed video in a computer-based test also allows the examinee to view the signed presentation and the test content in very close proximity to each other (e.g., the video can be displayed within inches of the text-based representation of the item). Students can also be provided with individualized control over the size of the video displayed on their computer screen. Segments of video can be linked to blocks of text or portions of an item (e.g., each answer option) such that a student can click on the text and the associated video is played automatically, thus eliminating the need to use a controller to fast-forward or reverse through video. Finally, because the video is played on each individual computer, students may view portions of a video as many times as needed without affecting the test experience of other students.
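
The linkage between blocks of text and segments of signed video described above could be stored as simple metadata, as in the hedged sketch below. The identifiers, file names, and time stamps are illustrative assumptions, not an actual item bank.

```python
# Hypothetical linkage between item content blocks and signed-video segments.
# Identifiers, file names, and the lookup function are illustrative assumptions.

SIGNED_SEGMENTS = {
    "item12_prompt":   {"file": "item12_prompt_asl.mp4",  "start": 0.0,  "end": 14.2},
    "item12_option_a": {"file": "item12_options_asl.mp4", "start": 0.0,  "end": 5.1},
    "item12_option_b": {"file": "item12_options_asl.mp4", "start": 5.1,  "end": 10.4},
    "item12_option_c": {"file": "item12_options_asl.mp4", "start": 10.4, "end": 15.8},
}

def play_signed_segment(block_id: str) -> str:
    """Return a description of the clip to play when a student clicks a text block."""
    segment = SIGNED_SEGMENTS[block_id]
    return (f"Playing {segment['file']} from {segment['start']:.1f}s "
            f"to {segment['end']:.1f}s")

# Clicking answer option B replays only its clip, as often as needed.
print(play_signed_segment("item12_option_b"))
```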

Fig. 15.2 Signed presentation of a test item by a recorded human and an avatar

Computer-based delivery of signed accommodations also provides an opportunity to employ signing avatars to present the signed accommodation. Avatars are human-like digital figures that can be programmed to move like a human. Using avatars provides opportunities for students to customize specific aspects of the signed accommodation. Avatars base their movements on a script. A single script can be used to make multiple avatar characters move in exactly the same manner. This, then, allows students to select the character they like best from among a set of characters. Thus, a student could select a character that is male or female, short or tall, brown haired or blond haired, dark skinned or light skinned, and so on. Giving students the choice of character may help increase their connection with the character and, in turn, their motivation during testing. Similarly, students could also control the background color to optimize the contrast between the avatar and the background so that they can best view the signed presentation of the test content.

In addition, students can be given control over specific features of the signed presentation. For example, a student who reads lips while communicating via Signed English could activate mouthing, which allows the avatar's lips to move in a manner that imitates the speaking of words. Similarly, a student with partial hearing who both views Signed English and obtains verbal cues could activate sound that works in conjunction with the signing. Finally, a student who is equally fluent in ASL and Signed English could switch between the two at any time during testing. Each of these features of avatars may help students tailor the provision of signing accommodations to meet their specific individualized needs.
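
Taken together, these avatar features amount to a small set of student-controlled settings. The sketch below is a hypothetical representation of such settings; the field names and default values are assumptions rather than the configuration of any existing system.

```python
# Hypothetical student-controlled settings for a signing avatar.
from dataclasses import dataclass

@dataclass
class AvatarSettings:
    character: str = "character_1"      # chosen from a set of look-alike avatars
    background_color: str = "#FFFFFF"   # adjustable to optimize contrast
    sign_form: str = "ASL"              # "ASL" or "SignedEnglish", switchable mid-test
    mouthing: bool = False              # animate lips to mimic spoken words
    partial_audio: bool = False         # play speech alongside the signing

# A student who reads lips while using Signed English and has partial hearing:
settings = AvatarSettings(sign_form="SignedEnglish", mouthing=True, partial_audio=True)
print(settings)
```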

Alternate Language Access to Knowledge Representation

As noted above, audio presentation can assist some students who are developing English language proficiency in accessing text-based content. In some cases, however, unfamiliar terms, phrases, or expressions are difficult for these students to access even when presented in audio form. In these cases, item content may need to be presented in an alternate language.

Like read aloud and signing, alternate language presentation of content may take different forms. One option is to allow students to access translations of individual words or phrases. On paper, this typically occurs by allowing a student to use an English-to-heritage-language dictionary to look up words. One shortcoming with this strategy is that many words have multiple meanings, and the burden is placed on the student to select the appropriate translation given the context in which the word is being used. On computer, key words in an item that may be unfamiliar to a student might be linked using hyperlinks to the appropriate translated word, given the context in which the word or phrase is used. This computer-based method may be advantageous for two reasons. First, it greatly reduces the amount of time a student would otherwise spend searching for a word using a heritage language dictionary. Second, it assures that the translated word appropriately conveys the same meaning given the context in which the word is used.
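
The context-specific hyperlinking described above can be pictured as a lookup keyed on both the item and the word, so that the same English word can map to different translations in different items. The sketch below is illustrative; the item identifiers, words, and Spanish translations are assumptions.

```python
# Hypothetical context-specific glossary links for translated key words.
# The item IDs, words, and translations are illustrative assumptions.

GLOSSARY_LINKS = {
    # (item_id, word) -> translation appropriate for how the word is used in that item
    ("item07", "table"): "tabla",      # a table of data
    ("item21", "table"): "mesa",       # a piece of furniture in a word problem
    ("item07", "mean"):  "media",      # the arithmetic mean
}

def translation_for(item_id: str, word: str) -> str | None:
    """Return the pre-selected translation for a hyperlinked word, if one exists."""
    return GLOSSARY_LINKS.get((item_id, word))

print(translation_for("item07", "table"))  # -> "tabla"
print(translation_for("item21", "table"))  # -> "mesa"
```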

A second strategy is to provide students access to a translated version of the entire item. On paper, this can be cumbersome because multiple versions of a test must be created, distributed, and matched to a student. On computer, however, a student needs profile can be used to connect the student to the appropriate version. In addition, the student can be provided the option to switch between the original language and translated version of an item.

When presenting translated versions of items on computer, it is important to consider whether the student may also benefit by having other elements of the test delivery interface presented in a translated version. These elements may include navigation buttons, menu options, and directions.

Tactile Access to Knowledge Representation

Students who are blind or have low vision may benefit by accessing information in tactile form. For text-based information, content might be presented in Braille for students who are Braille readers. While some graphical content might be conveyed through audio descriptions, other content may be difficult to describe accurately. In such cases, tactile representations may provide greater access.

For paper-based tests, both Braille versions of text-based content and tactile versions of graphics are commonly provided. Computer-based technologies, however, also allow Braille representations of text-based content to be presented using refreshable Braille displays. Similarly, recent advances in peripheral devices also provide opportunities to present graphics in tactile form with supplemental information presented in audio form. Specifically, the Talking Tactile Tablet (TTT) allows students to place sheets containing tactile representations of images onto a touch-sensitive screen (Landau et al., 2003). While exploring the tactile representation, the student can press various features of the image and the computer can provide information about that feature in audio form. For example, Fig. 15.3 displays an item containing a graphic in its original form and in its tactile form. When using the tactile representation on the TTT, the student may press on the topmost angle of the figure and the computer could play an audio file that says, "Angle A, 45 degrees." In this way, regardless of whether the student is a Braille reader, he could use his sense of touch to explore the image while also accessing all text-based information intended to provide additional information about the image.

Fig. 15.3 Original and tactile representation of an image
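
The behavior described for the Talking Tactile Tablet can be pictured as a mapping from touch regions on the tactile overlay to short audio descriptions. The sketch below is a simplified illustration in that spirit, not the TTT's actual interface; the coordinates and descriptions are assumptions.

```python
# Hypothetical mapping from touch regions on a tactile overlay to audio descriptions.

TOUCH_REGIONS = [
    # (x_min, y_min, x_max, y_max) in overlay coordinates -> audio text
    ((40, 10, 60, 30), "Angle A, 45 degrees"),
    ((10, 70, 30, 90), "Angle B, 90 degrees"),
    ((70, 70, 90, 90), "Angle C, 45 degrees"),
]

def audio_for_press(x: float, y: float) -> str | None:
    """Return the audio text for the region the student pressed, if any."""
    for (x_min, y_min, x_max, y_max), text in TOUCH_REGIONS:
        if x_min <= x <= x_max and y_min <= y <= y_max:
            return text
    return None

print(audio_for_press(50, 20))  # -> "Angle A, 45 degrees"
```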

Adapted Presentation of Item Content

The preceding sections focused on providing alternate representations of item content. This section focuses on altering the presentation of content to make it more accessible. When focusing on adapted presentation, it is important to recognize that the representational form of the content is not being changed. Instead, the presentation of a given representational form is being altered to meet a given need. In many cases, the needs met by altering presentation relate to vision or information processing needs. In this section, three types of adapted presentation will be explored: magnification, altered contrast, and masking. Similar to alternate representations, however, within each category of adapted presentation, there are multiple options that must be considered in order to meet each student's specific need.

Magnification. Students with vision-related needs may benefit from working with larger versions of text and graphics. On paper, this is typically accomplished by providing a large-print copy of test materials that presents text in a larger font and larger versions of images. An alternate approach is to allow students to view content using a magnifying glass. On computer, a test delivery system could provide students with several methods for magnifying content. One method is to allow students to adjust the size of the font used to present text-based content. While this method provides students flexibility in determining the font size, there are potential problems that occur when one relies on changing font sizes to increase access for low-vision students. First, changing font size often does not affect text that appears in charts, diagrams, and other images. As a result, students may experience difficulty accessing this text. Second, changing font size often alters the layout of text. This is particularly problematic when an item refers to text in a specific line of a passage or when text is presented in tables. Third, mathematical and scientific nomenclature is often presented using embedded images, the size of which is not affected by changes in font size. Similarly, graphical content is also not affected by font size changes.

Another method for increasing magnification is to provide students with a digital magnifying glass. Like its physical counterpart, a digital magnifying glass could be placed over any content displayed on the screen to render a larger image. A digital magnifying glass might also be resized to cover a larger portion of the screen and/or its magnification level could be adjusted. Unlike adjusting font size, a digital magnifier allows all content to be enlarged. However, for a student who requires all content to be magnified at all times while taking a test, a digital magnifier may not be a good option because the magnifying glass would need to be repositioned repeatedly to view all content.

An alternative to a digital magnifying glass is to magnify the entire testing interface. For low-vision students, this is advantageous for several reasons. First, rather than manipulating a magnifying glass, they are able to focus on a single area of the screen and move content into that area for viewing. Second, magnifying the entire interface provides the same level of access to all elements of a test item as well as navigation buttons and menu options. Third, a significantly larger amount of magnification can be provided because the entire screen real estate is used to display content. One downside of enlarging the entire interface, however, is that content may be pushed off screen, so methods that allow students to easily move content on- and off-screen are required.

In each case, the method used and the level of magnification should be matched with the student's individual need. Like read aloud, this can be accomplished by establishing a user profile that defines the specific need for each student, and then tailors the tools and environment to those needs.

Alternate Contrast. Students with vision needs and reading-related needs may benefit from presenting content with alternate contrasts. For example, students with low vision may be able to better perceive content when contrast is increased. This can be accomplished by presenting text using a yellow font on a black background. As a second example, some students with Irlen's Syndrome experience difficulty perceiving text. While the cause of Irlen's Syndrome is unknown, individuals with the syndrome see words that are blurry or that appear to move. Placing a colored overlay on top of text helps stabilize and improve the clarity of text.

There are three primary methods for altering contrast. First, the foreground and background colors of text and images can be altered. Yellow text on a black background is one combination commonly used to increase contrast. Several other color combinations, however, are effective for some students. Second, reverse contrast changes the color pattern with which content is presented. For black text on a white background, reverse contrast presents white text on a black background. For other colors, reverse contrast changes the color to its counterpart at the opposite end of the color spectrum. For example, content presented with a yellow hue would be presented with a blue hue. Third, a color overlay could be placed over all content to change the tint with which content is displayed. For example, placing a red overlay over white text and blue-colored images would make all text appear with a pink hue and images with a purple hue. While altering contrast is effective for meeting many types of access needs, altering color or contrast must be done with care when items contain colored images and these colors are referenced in the item prompt or options. For example, an item that refers to a specific colored section in a pie chart, with each section colored differently, might become problematic if a colored overlay or reverse contrast alters the colors used in the pie chart. Thus, like magnification, the specific method and color combinations used to improve access for each student should be matched to the student's needs profile.
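
The reverse contrast and overlay methods can be pictured as simple color transforms, as in the hedged sketch below. The RGB arithmetic is a simplification made for illustration; actual rendering engines and physical overlay materials behave differently.

```python
# Illustrative color transforms for the contrast-alteration methods described above.
# The blending arithmetic is an assumption, not a rendering specification.

def reverse_contrast(rgb):
    """Swap each channel for its opposite: black becomes white, yellow becomes blue."""
    r, g, b = rgb
    return (255 - r, 255 - g, 255 - b)

def apply_overlay(rgb, tint, strength=0.3):
    """Blend a colored overlay onto content, like a tinted sheet laid over the page."""
    return tuple(round((1 - strength) * c + strength * t) for c, t in zip(rgb, tint))

WHITE, BLACK, YELLOW, RED = (255, 255, 255), (0, 0, 0), (255, 255, 0), (255, 0, 0)
print(reverse_contrast(BLACK))    # black text rendered as white
print(reverse_contrast(YELLOW))   # a yellow hue becomes a blue hue
print(apply_overlay(WHITE, RED))  # a white background takes on a pink hue
```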

Masking. Some students with information processing needs may benefit by limiting the amount of information that is presented to them at a given time (Bahar & Hansell, 2000; Messick, 1976; Riding, 1997; Tinajero & Paramo, 1998; Witkin, Moore, Goodenough, & Cox, 1977; Richardson & Turner, 2000). One technique for limiting the amount of presented information is masking. On paper, masking involves placing an object over content that is not the current focus of the student. In some cases, sticky notes, paper, or a hand is used to mask content. Masking templates are also available and allow students to create a frame through which content is viewed.

On a computer, digital tools can be used to mask content, and the delivery of items can be designed to present blocks of information in a predefined manner. For example, some students benefit by reading a prompt and working on a problem before they view answer options for a multiple-choice item. A computer-based test delivery interface can support this need in a couple of ways. First, an answer-masking tool could block response options until the student opts to reveal them. In essence, when a student first views the item, the prompt is shown, but a solid box is placed over each answer option. At the student's choosing, each answer mask can be toggled on or off to reveal or obscure that option. Alternatively, a delivery interface could present the student with the item prompt and then require the student to enter an open-ended response before revealing the multiple-choice answer options. While this method is more prescriptive, it helps guide students through the process of forming their own answer before engaging with answer options.
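
The first approach, an answer-masking tool that the student toggles, can be pictured with a minimal sketch such as the one below. The class and method names are hypothetical.

```python
# Minimal sketch of per-option masking state for a multiple-choice item.

class AnswerMask:
    def __init__(self, option_ids):
        # Every option starts covered by a solid mask.
        self.masked = {option_id: True for option_id in option_ids}

    def toggle(self, option_id: str) -> None:
        """Reveal or re-cover a single answer option at the student's choosing."""
        self.masked[option_id] = not self.masked[option_id]

    def visible_options(self):
        return [oid for oid, masked in self.masked.items() if not masked]

mask = AnswerMask(["A", "B", "C", "D"])
mask.toggle("A")               # student reveals option A after working the problem
print(mask.visible_options())  # -> ['A']
```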

An alternate method for masking content focuses on separating content based on its representational form. For example, some items contain a combination of text, mathematical or scientific nomenclature, and graphics. To assist students in focusing on and processing the content conveyed through each of these representational forms, masking can separate the presentation of each form. This can be accomplished by displaying narrative content and blocking other representational forms when the student first views an item. The student could then choose to reveal or hide each representational form of content at any time while working on the item.

A third method of masking is to provide tools that enable students to create custom masks. These tools can take the form of electronic sticky notes that the student can place anywhere on the screen and resize as desired. The notes can then be repositioned or toggled on or off as desired. A digital framing tool could also be used to create a custom mask through which content is viewed. In essence, a digital frame covers the entire screen. The student is then able to define the size and shape of the window through which content is viewed. The window can then be positioned anywhere on the screen by the student as s/he works on the item. When sized so that the window is one line high and the width of a passage, the digital frame can serve as a line-by-line reader. In some cases, a digital frame can be combined with magnification to create a large-print single-line mask. As touch-sensitive screens become more commonplace, there is also potential to capitalize on this technology to allow students to create custom masks dynamically by dragging their fingertip over sections of an item they would like to temporarily hide.

Like magnification and alternate contrast tools, the masking tools available to the student can be tailored based on the student's needs profile. Unlike magnification and alternate contrast, however, the use of masking tools often requires training and multiple practice opportunities prior to use during an operational test.

Implications for Test Validity

Test accommodations are intended to increase test validity by reducing the influence of non-tested constructs on the measure of the tested construct. As discussed throughout this chapter, a variety of access and interaction needs can interfere with the measure of a given construct. By tailoring the availability and use of access tools and alternate representations, the influence of these non-tested constructs may be reduced, thus improving the validity of resulting test scores.

When deciding which access and interaction tools and which alternate representations should be made available to students during testing, it is essential to consider carefully the construct that is intended to be measured. When the intended construct overlaps with the construct addressed by an access tool or representational form, then the provision of that tool or representational form will have a negative effect on test score validity. However, when the measured construct and the construct addressed by an access tool or representational form are independent, test score validity should either be unaffected or positively affected.

For example, the validity of inferences based on a set of items designed to measure whether a student can read, interpret, and use information presented in a table to solve a problem should not be negatively affected by the use of magnification, alternate contrast, or masking tools. Similarly, validity should not be negatively affected by reading aloud the text or numerical elements contained in tables. However, providing audio descriptions that express the relationships among elements in tables would decrease validity because such descriptions remove the need for students to recognize and understand relationships among data in the table.

Fig. 15.4 Sample item stems containing scientific nomenclature

Item 1: CO2 + H2O →

Item 2: Increasing amounts of CO2 in the atmosphere is believed to contribute to global warming…

Similarly, depending upon the construct being measured, providing an audio representation of scientific nomenclature may or may not reduce test validity. For example, consider the two items in Fig. 15.4. Both items include the expression CO2, which employs scientific nomenclature to represent carbon dioxide. The first item is intended to measure whether students can interpret scientific nomenclature to complete a chemical equation. The second item is intended to measure students' knowledge and understanding of global warming. The ability to read and understand scientific nomenclature is an essential component of the construct measured by the first item, but is not a component of the construct measured by the second item. For this reason, providing an audio-based narrative representation that presents CO2 as "carbon dioxide" would be problematic for the first item, but not the second.

To determine whether or not a given access tool or representational form has potential to violate the construct being measured, two issues are important. First, it is essential that the construct being measured is carefully and fully defined. As part of the definition process, it is important to describe what is and what is not intended to be measured. Second, it is essential that the construct being addressed by a given access tool or representational form is clearly specified. In doing so, it is useful to focus on the access need met by a given tool or representational form. Upon comparing the construct being measured and the construct addressed by an access tool or representational form, one can determine whether the use of that tool or representational form will negatively impact test validity.
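
The comparison described in this paragraph can be pictured as a simple overlap check between the construct an item measures and the construct an access tool addresses. The sketch below is illustrative only; the construct labels and the item and tool definitions are assumptions.

```python
# Hedged sketch: flag an access tool when the construct it addresses overlaps
# the construct an item is intended to measure. All labels are hypothetical.

ITEM_CONSTRUCTS = {
    "item_03": {"interpret_tabular_data"},   # reading and using a table is the target
    "item_08": {"compute_mean"},             # the table is incidental to the construct
}

TOOL_CONSTRUCTS = {
    "table_relationship_audio": {"interpret_tabular_data"},
    "magnification": set(),                  # addresses no measured construct
}

def tool_threatens_validity(item_id: str, tool: str) -> bool:
    """True if the tool removes a skill the item is supposed to measure."""
    return bool(ITEM_CONSTRUCTS[item_id] & TOOL_CONSTRUCTS[tool])

print(tool_threatens_validity("item_03", "table_relationship_audio"))  # True
print(tool_threatens_validity("item_08", "table_relationship_audio"))  # False
```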

The Future Is Today

Over the past decade, several events have created conditions that allow us to reconceptualize test accommodations. The growing presence of computers in schools makes the delivery of tests on computer a realistic option for many schools and their teachers. Once a program commits to computer-based test delivery, digital technologies allow test developers, whether they be classroom teachers or large-scale testing programs, to capitalize on the flexibility of computers to tailor the test-taking experience to maximize each student's opportunity to demonstrate what they know and can do. To be clear, the goal of flexible, tailored test delivery is not to produce higher test scores or to increase the number of students who score above a certain point (e.g., proficient). Instead, the aim is to tailor conditions so that test scores provide a measure of student knowledge and skill that accurately reflects their current achievement. However, since many construct-irrelevant factors are believed to negatively affect the test performance of many students, more accurate measures for these students will often result in higher scores.

As described in detail earlier, computer-based test delivery provides an opportunity for us to shift away from thinking of test accommodations as changes made by test developers or test proctors for a small percentage of students with disabilities, and instead to think about altering the presentation, interaction, response modes, and representations used to engage all students in tasks designed to reveal what each student knows and can do. In other words, rather than thinking of test accommodations as changes required for a small percentage of the test-taking population, computer-based test delivery coupled with principles of universal design permits us to flexibly tailor the test-taking experience for all students.

Providing the level of flexibility and individual tailoring described above in a paper-based world would be unaffordable and create an administrative nightmare. By investing in careful planning and development, however, flexibly tailored test-taking experiences on computer are achievable. For example, through funding from the U.S. Department of Education, a consortium of states led by the New Hampshire Department of Education is currently implementing tailored computer-based test delivery for increasing numbers of students (http://nimbletools.com/necap/index.htm). This work started in 2006, when New Hampshire began working with software developers to develop a universally designed test delivery system. The first version of the test delivery software, employed to deliver a mathematics test to high school students, provided only three types of support: access to narrative content in audio form, magnified presentation of content, and navigation using assistive communication devices. To assure that the audio representations employed for the audio form did not violate the construct measured by a given item, considerable effort was invested in developing detailed scripts that prescribed the exact way in which text-based content was read aloud. As part of this process, scripts were reviewed by experts in mathematics and by item writers, and all audio versions of items were deemed appropriate given the constructs measured. Students who might benefit from these tailored supports were given the option of taking the state test on computer. In response, the state experienced nearly a tripling of students who opted to exercise their right to test accommodations. In addition, students who used the tailored delivery system for a given accommodation performed significantly higher than students who opted for a paper-based version of an accommodation (Russell, Higgins, & Hoffmann, 2009).

This work has since been expanded to include several additional methods for tailoring the presentation of, and interaction with, representational forms of test content. In the spring of 2009, New Hampshire, Vermont, and Rhode Island gave students the option of taking their eleventh-grade science test on paper or on computer using a universally designed test delivery interface. The computer-based interface provided a variety of access and interaction supports, including audio access to narrative content, audio access to graphical content, audio access to all interface elements, navigation and response using any assistive communication peripheral device, magnification, multiple methods of altered contrast (including color overlays, reverse contrast, and alternate font and background colors), multiple masking tools, auditory calming, and scheduled breaks. In the 92 schools that opted to deliver the test on computer to their students, the states again saw nearly a tripling in the number of students who opted to employ one or more tailored test delivery tools. In addition, regression analyses indicate that students who used one or more of the tailored delivery tools performed better than students who opted to receive a matched accommodation on paper.

While these findings are from just two early implementations of tailored computer-based test delivery, they provide three lines of preliminary evidence regarding the benefits of tailored delivery. First, these two implementations demonstrate that it is feasible and practical to develop tailored test delivery systems and to use those systems to administer large-scale tests. This is important because it demonstrates that the ideas presented above can be, and have been, translated into practice, and can be widely adopted in schools today. Second, both studies provide evidence that many students who opt not to employ one or more accommodations for a paper-based test will opt to have their needs met using a tailored computer-based test delivery system. In research performed to date, decisions about the use of access tools were not based on IEP status. Instead, teachers exposed students to the access tools through a tutorial and practice tests. In some cases, teachers then gave all students the option of using tools if they felt the tools would be useful for them. In other cases, teachers limited the option to those students who they believed were likely to benefit from the use of a given tool. In both cases, however, the research conducted to date provides evidence that many students may not be taking advantage of opportunities to reduce the influence of construct-irrelevant factors through accommodations on paper but are willing to do so through tailored test delivery on computer. Third, these implementations provide preliminary evidence that tailored test delivery has a positive effect on student performance and does allow many students to better demonstrate their knowledge and skills.

Going forward, research is needed to better understand the decisions students and teachers make about the accessibility tools they opt to use during testing. As this understanding evolves, it is also important to develop resources that will help teachers and students make more informed decisions about tool use and assist in building a student's access needs profile. Finally, teachers and students will need support in updating and modifying a profile as a student's skills, knowledge, and facility with specific representational forms and access tools evolve. Once these tools and procedures are in place, it seems logical that increased benefits of tailored test experiences will result.

Clearly, more research is needed before definite claims can be made about the benefits of tailored computer-based test delivery. Nonetheless, the potential to improve test score validity by delivering tests using computer-based systems that are sensitive to individual needs shows great promise. As tailored test delivery is adopted more broadly, it will become important for testing programs to carefully define the constructs a test is intended to measure to inform decisions about which tailored delivery options will enhance test score validity and which may decrease validity. In addition, to fully capitalize on tailored test delivery, item writers will need to carefully consider and develop alternate representations that meet the needs of a diverse body of test takers. By fully defining constructs and developing appropriate alternate representational forms of item content, testing programs will be better positioned to capitalize on tailored test delivery to provide valid measures of test performance for all students.

References

Bahar, M., & Hansell, M. H. (2000). The relationship between some psychological factors and their effect on the performance of grid questions and word association tests. Educational Psychology, 20(3), 349–364.

Bausell, C. V. (2008). Tracking U.S. trends. Education Week, March 27, 2008.

Bennett, R. E. (1999). Using new technology to improve assessment. Educational Measurement: Issues and Practice, 18(3), 5–12.

Bennett, R. E., Braswell, J., Oranje, A., Sandene, B., Kaplan, B., & Yan, F. (2008). Does it matter if I take my mathematics test on computer? A second empirical study of mode effects in NAEP. Journal of Technology, Learning, and Assessment, 6(9). Retrieved January 21, 2010, from http://www.jtla.org

Bielinski, J., Thurlow, M., Ysseldyke, J., & Freidebach, M. (2001). Read-aloud accommodations: Effects on multiple-choice reading and math items. Minneapolis, MN: National Center on Educational Outcomes.

Bolt, S. E., & Thurlow, M. (2004). Five of the most frequently allowed testing accommodations in state policy. Remedial and Special Education, 25(3), 141–152.

Bowen, S., & Ferrell, K. (2003). Assessment in low-incidence disabilities: The day-to-day realities. Rural Special Education Quarterly, 22(4), 10–19.

Clapper, A. T., Morse, A. B., Lazarus, S. S., Thompson, S. J., & Thurlow, M. L. (2005a). 2003 state policies on assessment participation and accommodations for students with disabilities (Synthesis Report 56). Minneapolis, MN: University of Minnesota, National Center on Educational Outcomes.

Clapper, A. T., Morse, A. B., Thompson, S. J., & Thurlow, M. L. (2005b). Access assistants for state assessments: A study of state guidelines for scribes, readers, and sign language interpreters (Synthesis Report 58). Minneapolis, MN: University of Minnesota, National Center on Educational Outcomes. Retrieved from http://education.umn.edu/NCEO/OnlinePubs/Synthesis58.html

Gibson, D., Haeberli, F. B., Glover, T. A., & Witter, E. A. (2003). The use of recommended and provided testing accommodations (WCER Working Paper No. 2003–2008). Madison, WI: University of Wisconsin, Wisconsin Center for Education Research.

Johnson, E., Kimball, K., & Brown, S. O. (2001). American sign language as an accommodation during standards-based assessments. Assessment for Effective Intervention, 26(2), 39–47.

Landau, S., Russell, M., Gourgey, K., Erin, J., & Cowan, J. (2003). Use of the talking tactile tablet in mathematics testing. Journal of Visual Impairment and Blindness, 97(2), 85–96.

Messick, S. (1976). Personality consistencies in cognition and creativity. In S. Messick & Associates (Eds.), Individuality in learning (pp. 4–22). San Francisco: Jossey-Bass.

Mislevy, R. J., Behrens, J. T., Bennett, R. E., Demark, S. F., Frezzo, D. C., Levy, R., et al. (2010). On the roles of external knowledge representations in assessment design. Journal of Technology, Learning, and Assessment, 8(2). Retrieved January 21, 2010, from http://www.jtla.org

Richardson, J., & Turner, T. (2000). Field dependence revisited I: Intelligence. Educational Psychology, 20, 255–270.

Riding, R. (1997). On the nature of cognitive style. Educational Psychology, 17(1), 29–49.

Russell, M., Higgins, J., & Hoffmann, T. (2009). Meeting the needs of all students: A universal design approach to computer-based testing. Innovate, 5(4).

Sireci, S. G., Li, S., & Scarpati, S. (2003). The effects of test accommodations on test performance: A review of the literature (Research Report 485). Amherst, MA: Center for Educational Assessment.

Tinajero, C., & Paramo, M. (1998). Field dependence-independence cognitive style and academic achievement: A review of research and theory. European Journal of Psychology of Education, 13, 227–251.

Witkin, H., Moore, C., Goodenough, D., & Cox, P. (1977). Field-dependent and field-independent cognitive styles and their educational implications. Review of Educational Research, 47(1), 1–64.


16 The 6D Framework: A Validity Framework for Defining Proficient Performance and Setting Cut Scores for Accessible Tests

Karla L. Egan, M. Christina Schneider, and Steve Ferrara

Standard setting is commonly considered one of the final steps in the test-development cycle, occurring just prior to the release of test results, and it is usually narrowly defined, referring only to the workshop where cut scores are determined. Instead, standard setting should encompass the entire test-development cycle. It is more than a workshop; rather, standard setting is a multistep process where a state makes policy decisions about the rigor of achievement expectations, explicates descriptions of levels of achievement, and gathers stakeholders to recommend cut scores that separate students into achievement levels.

The standard setting process should begin with states writing achievement-level descriptors (ALDs), which are a means for state education agencies to communicate their expectations for student performance to local education agencies and other stakeholders. In addition, states should use ALDs to guide writing items and setting cut scores. This practice enables the test to be designed so that it supports the test score interpretations (i.e., what students should know and be able to do in relation to the content standards) intended by the state. By writing ALDs early, a state explicates a policy statement and fleshes out the content-based competencies that describe academic achievement as well as its aspirations for the state's schoolchildren. Item writers can use ALDs to guide item development so that the items on the test better align to what the state means by Proficient.

K.L. Egan, CTB/McGraw-Hill, Monterey, CA 93940, USA. e-mail: [email protected]

The purpose of this chapter is to introduce a framework that structures standard setting as a multistep process in which the development and refinement of ALDs are emphasized. This framework should aid practitioners, such as state education agencies, in defining Proficient performance (and, by extension, other achievement levels), and designing and implementing standard setting for the alternate assessments of modified academic achievement standards (AA-MAS) and alternate assessments of alternate academic achievement standards (AA-AAS). Although there is much overlap with standard setting for unmodified grade-level assessments, issues pertinent to standard setting for the AA-MAS and AA-AAS are highlighted throughout the chapter.¹ The framework is designed around key events in the multistage standard setting process. Validity evidence is collected in each phase of the framework so that practitioners are able to build an argument to support the uses of the test scores for the AA-MAS and AA-AAS. Named the 6D Framework, it consists of six phases: Define, Describe, Design, Deploy, Deliver, and Deconstruct.

¹ Accessibility issues on unmodified grade-level assessments are not addressed in this chapter. States that provide students appropriate accommodations on an unmodified grade-level assessment that are also implemented during classroom instruction have made the test accessible to students with disabilities. That is, the items and test measure the same construct for students with disabilities as for their nondisabled peers. For standard setting, this means that panelists should be able to discuss assessment constructs without discussing accessibility and that the same recommended cut scores are appropriate for both sets of students.

ALDs and Standard Setting Terminology

Before delving into this framework, however, it is important to define commonly used standard setting terms that may be new to novice standard setters. First is the term achievement level. Even before the advent of No Child Left Behind (NCLB), achievement levels were commonplace in K–12 testing. Almost every state has achievement-level designations, such as Basic, Proficient, and Advanced. Achievement levels are simply the labels associated with each level of performance, and they provide a normative-type judgment of an examinee's test performance (Haertel, 1999). If test performance is labeled Proficient instead of Basic, it implies that the student possessed more knowledge and skills than did her or his peers who attained the Basic status. Sometimes achievement levels are called performance levels.

The term achievement standard is sometimes used interchangeably with the term cut score. In this chapter, the two are not interchangeable. Achievement standards are levels and descriptions of performance (i.e., ALDs), and cut scores are the specific points on the test scale that separate students into achievement levels. Cut scores are the numeric operationalization of the ALDs and are established during a cut-score recommendation workshop.
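
The relationship between cut scores and achievement levels can be illustrated with a small sketch. The scale score values, level labels, and the convention that a score equal to a cut score falls in the higher level are all assumptions made for illustration; the actual values are set by each state during the cut-score recommendation workshop.

```python
# Illustrative sketch: cut scores as the numeric operationalization of ALDs.
# The scale scores and labels are hypothetical, not from any state program.
import bisect

CUT_SCORES = [400, 450, 500]   # hypothetical Basic / Proficient / Advanced thresholds
LEVELS = ["Below Basic", "Basic", "Proficient", "Advanced"]

def achievement_level(scale_score: int) -> str:
    """Return the achievement level whose score range contains the scale score."""
    return LEVELS[bisect.bisect_right(CUT_SCORES, scale_score)]

print(achievement_level(449))  # -> "Basic"
print(achievement_level(450))  # -> "Proficient"
```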

Finally, the term standard setting is often used to refer to the workshop where educators recommend cut scores. In this chapter, standard setting refers to a multiphase process that is used by states to define ALDs and their accompanying cut scores. The term cut-score recommendation workshop refers to a workshop where educators recommend cut scores.

A Validity Framework for Standard Setting for All Standards-Based Assessments

Validity refers to the appropriateness of the uses or interpretations of test scores, as opposed to the test score itself. In the case of the standard setting process, the cut scores, the ALDs, their relationship, and the intended interpretations are validated. It also may be necessary to validate the cut scores and ALDs of the AA-MAS and AA-AAS in relationship to those of the unmodified grade-level assessments.

To validate the cut scores, ALDs, and their relationship, states must collect various types of evidence throughout the standard setting process as well as input from various stakeholder groups. Practitioners can use the 6D Framework to provide a logical structure for planning and implementing their AA-MAS or AA-AAS standard setting, in addition to collecting validity evidence. General descriptions of each phase of the 6D Framework are provided below; detailed descriptions follow in subsequent sections of this chapter.

Define: States define the population taking the test, the content standards underlying the test, the relationship of the AA-MAS or AA-AAS to the unmodified grade-level assessment, and the intended uses of the ALDs. All such decisions affect how ALDs are written.

Describe: States develop ALDs that illustrate the policy expectations regarding what students in each achievement level should know and be able to do.

Design: States plan the workshops associated with the multiphase standard setting process so that educators, policy makers, and state education staff can provide input into the cut scores. States also write items and construct tests during this phase.

Deploy: States implement the multiphase standard setting workshops, and the decision-making agency in the state (e.g., state board of education) approves the final cut scores.

Deliver: States deliver score reports along with interpretation guides, including ALDs, to local education agencies, teachers, and examinees and their parents.

Deconstruct: States and their contractors collaborate in collecting the documented validity evidence from the prior steps and compiling it into the standard setting technical report. States explicate the evidence confirming (and disconfirming) the relationship between the ALDs and cut scores in this phase.

Oftentimes the phases of the 6D Framework will occur in the order delineated above; however, it is possible for some phases to occur out of order. For example, it is possible that states will complete the standard setting technical report before delivering score reports. It is also likely that states will conduct various validity studies long after completing the program and standard setting technical report. The remainder of this chapter examines the phases of the 6D Framework in more detail.

Define

Within this first phase of the 6D Framework, states must define the scope of the assessment, deciding who and what will be tested. They must also define the number, intended purposes, and rigor of the ALDs. In large part, the advice about standard setting in this chapter and in the literature is applicable to unmodified grade-level assessments as well as the AA-MAS and AA-AAS. The unique considerations associated with the AA-MAS and AA-AAS are the students themselves and the designs of the tests (Egan, Ferrara, Schneider, & Barton, 2009). This section discusses the student populations and the relationship between the AA-MAS or the AA-AAS and the unmodified grade-level assessment. In addition, it discusses the desired rigor of the ALDs, the means for increasing accessibility, the intended uses of the ALDs, and the involvement of stakeholders. This section ends with a discussion of validity evidence.

Identify the Examinee Population

With unmodified grade-level assessments, the target population is obvious: almost all students in the state. With the AA-MAS and AA-AAS, the decision of who will take these tests is less obvious. Students with mild and moderate disabilities who take an AA-MAS are believed to be able to learn most of what their non-disabled, grade-level peers learn, but with more difficulty, over longer periods of time, perhaps in less depth, and perhaps with less demonstrable mastery over content and skills. It is not clear, however, exactly who constitutes this group, and the definition varies from state to state. Examinees taking the AA-AAS have significant cognitive disabilities and are assessed on general content standards that have been extended downward (e.g., Browder, Wakeman, & Jimenez, n.d.).

Identify the Relationship to Unmodified Grade-Level Assessments

To design a meaningful standard setting process, states must also identify the intended conceptual and psychometric relationship between the unmodified grade-level assessment and the AA-MAS or AA-AAS. The AA-AAS is likely to be a distinct assessment, unlinked to the unmodified grade-level assessment. In some cases, states may want a psychometric or conceptual relationship between the AA-MAS and AA-AAS.

The AA-MAS, on the other hand, is required to target the same general content standards as the unmodified grade-level assessment. To fulfill this requirement, most states are developing an AA-MAS derived directly from the unmodified grade-level assessment. Typically, the items in the unmodified grade-level assessment are modified to reduce the cognitive complexity of the skills that the items target through such techniques as scaffolding items, bolding and underlining, and providing graphic organizers to help examinees organize their thinking (Wothke, Cohen, Cohen, & Zhang, 2009).

In addition, the AA-MAS often comprise fewer items and are targeted to measure student skills at a lower ability level than the unmodified grade-level assessment (which is typically unable to provide much information about what these students know and can do).

Even though states are designing the AA-MAS and unmodified grade-level assessment to measure the same standards (and sometimes using some of the same items), the relationship between the two assessments is not obvious. States may intend that the AA-MAS (a) predict performance on the unmodified grade-level assessment, (b) link vertically to the unmodified grade-level assessment, or (c) be distinct from the unmodified grade-level assessment. If the intended relationship is predictive, then, for example, Proficient performance on the AA-MAS might indicate that the state expects these students to achieve scores at a Basic level on the unmodified grade-level assessment. Predictive relationships can be established empirically, through prediction studies, or conceptually, following vertical articulation procedures (e.g., Lewis & Haug, 2005). If the goal is to link the AA-MAS to the unmodified grade-level assessment vertically, then cut scores from the unmodified grade-level assessment can be located on the AA-MAS score scale. Vertical linking must be established through a successful vertical linking study (e.g., Young, 2006). If the state intends the relationship to be distinct, then consideration must be given to choosing achievement-level labels and writing ALDs that will allow this to happen.
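Where a state chooses the predictive option, the simplest empirical check is a tabulation of paired scores. The following is a minimal sketch, not drawn from the chapter, of how an analyst might examine whether students who reach Proficient on the AA-MAS tend to reach at least Basic on the unmodified grade-level assessment; the paired scores, cut values, and function name are hypothetical.

```python
# Hypothetical sketch of an empirical prediction check. The paired scale
# scores and the two cut scores below are illustrative assumptions only.

def prediction_rate(paired_scores, aa_mas_proficient_cut, grade_level_basic_cut):
    """Proportion of students at or above the AA-MAS Proficient cut who also
    reach the Basic cut on the unmodified grade-level assessment."""
    proficient = [(m, g) for m, g in paired_scores if m >= aa_mas_proficient_cut]
    if not proficient:
        return float("nan")
    reached_basic = sum(1 for _, g in proficient if g >= grade_level_basic_cut)
    return reached_basic / len(proficient)

# (AA-MAS score, unmodified grade-level score) pairs for a small made-up sample
pairs = [(412, 305), (398, 280), (430, 321), (405, 298), (388, 262)]
print(prediction_rate(pairs, aa_mas_proficient_cut=400, grade_level_basic_cut=300))  # about 0.67
```

A full prediction study would use operational data and model-based methods, but even a simple tabulation of this kind can inform the vertical articulation discussion.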

The left side of Table 16.1 illustrates predictive and linked relationships between ALDs for the unmodified grade-level assessments and the AA-MAS. Aligning Proficient on the AA-MAS with Below Basic on the unmodified grade-level assessment is intentional in this illustration. Most unmodified grade-level assessments provide some useful psychometric information about student achievement near the bottom of the test scales, so examinees who achieve the Proficient level on the AA-MAS should be ready to demonstrate what they know and can do on the unmodified grade-level assessment.

Table 16.1 Predictive or linked relationship and distinct relationship between ALDs for the unmodified grade-level assessments and AA-MAS

Predictive or linked relationship:
  Unmodified grade-level assessment   AA-MAS
  Advanced
  Proficient
  Basic                               ≈ Advanced
  Below basic                         ≈ Proficient
                                        Basic
                                        Below basic

Distinct relationship:
  Unmodified grade-level assessment   AA-MAS
  Advanced                              Advanced
  Proficient                            Proficient
  Basic                                 Basic
  Below basic                           Below basic

The right side of Table 16.1 illustrates a distinct relationship between the ALDs for the two assessments in which a state will not link performance on the AA-MAS to performance on the unmodified grade-level assessment. In this case, the state offers no expectation of how student performance on the AA-MAS may translate into performance on the unmodified grade-level assessment. States must define these relationships between the assessments before they write ALDs.

Identify Desired Rigor for ALDs

Before writing ALDs, states should decide the rigor that they want to associate with the ALDs for the AA-MAS and AA-AAS through generic policy statements. These statements will provide guidance and direction when writing ALDs. Generic policy statements do not describe specifically what students should know and be able to do within each achievement level; rather, the generic policy statements set the tone for the type of ALDs the state would like developed during the ALD-writing workshop. The tone used in the generic policy statements may have a significant impact on how the ALDs are written. For example, if the state is looking for minimal competency standards, the generic policy statements might talk about students "mastering basic skills." On the other hand, if the state is looking for challenging ALDs, the statements might talk about "mastering challenging content."

Identify Means for Increasing Accessibility

Prior to developing the AA-MAS, states should identify the approaches that will increase accessibility. This will be difficult because research on item modifications is limited, with most modifications based on the rationale (as opposed to empirical research) that the modifications will increase accessibility (Elliott et al., 2010; Kettler et al., in press; Welch & Dunbar, 2009). Early findings are emerging. For example, recent research from a series of item design studies and a pilot test of item modifications showed that graphic organizers appear to reduce the impact of working memory deficits, while bolding key text appears to aid attention for reading items (Wothke et al., 2009).

Identify Intended Uses of ALDs

Test development, cut-score recommendation, and scale score interpretation are three interconnected processes for which ALDs provide guidance (Schneider, Egan, Siskind, Brailsford, & Jones, 2009). The three processes work together to build a unified assessment system. During the Describe phase of standard setting, the state creates ALDs to support these three intertwined uses.

Stakeholder Involvement

A wide variety of stakeholders (e.g., politicians, concerned citizens, local educators, and/or parents) may be brought together to provide a state education agency feedback about the types of assessment relationships and ALD uses that would benefit students and educators. For example, the policy makers who crafted NCLB decided that all children must be assessed, regardless of disability. Once politicians enacted the law, each state education agency was required to determine exactly how to meet these requirements within their state.

In the case of the AA-MAS, the states need to determine who and what to assess. State education agencies may undertake a decision-making process where stakeholders make recommendations to the state education agency through a series of workshops or committee meetings.

Validity Evidence

It is necessary to collect validity evidence within each phase of the 6D Framework; much of this evidence is focused on the fidelity of implementing the standard setting procedure itself (Cizek, 1996). Called procedural evidence of validity, it "focuses on the appropriateness of the procedures used and the quality of the implementation of these procedures" (Kane, 1993, p. 14). The value of this type of evidence should not be underestimated. Although it alone cannot support the validity of the ALDs and cut scores resulting from the standard setting process, its absence will undermine arguments to support their validity.

This means that practitioners will find that they collect the same types of procedural evidence in different phases, including documentation on process planning and panelist recruitment and selection as well as process evaluations. Instead of repeating these types of evidence in each section, they will be explained for the Define phase and should be assumed for the remaining phases where workshops or meetings occur.

Process Planning

Whenever the state holds a committee meeting or workshop where key decisions regarding the standard setting process will be made, the state should create a detailed plan that explicates the steps to be taken during that committee meeting or workshop. During the Define phase, for example, a state may hold a committee meeting to decide the examinee population for the AA-MAS.

The plan for this committee meeting should be practicable, meaning that it should be easy to implement and understandable to the general public (Hambleton & Pitoniak, 2006). A well-articulated plan will show the logic and rationales for the process that is to be followed in the committee meeting or workshop.

Panelists and Committee Members

As part of process planning, states should explain how workshop panelists or committee members will be recruited, including specific information such as race, gender, or area of expertise that may be targeted. It is important that the state can show a legitimate attempt to recruit a panel or committee that is representative of the state's diversity. Following the meeting, the state should summarize demographic information about the panel or committee as well as information on their expertise.

Process Evaluations

These are especially important to all committee meetings and workshops implemented in the 6D Framework. Process evaluations are attitudinal surveys administered to workshop panelists to collect information on their views of the validity of the procedure as well as their levels of agreement with the final workshop product. Process evaluations should be collected throughout the committee meeting or workshop to gauge panelist attitudes and understanding. Positive evaluations will strengthen the validity arguments associated with a workshop. Negative evaluations can undermine an entire workshop; therefore, it is important to collect feedback often in order to address negative attitudes early.2 Panelists may report that they understood the task and thought the process was fair, but then report that they are not able to defend the final workshop product. If such results are found for a minority of panelists, this should not undermine the workshop recommendations.

2 See Cizek, Bunch, and Koons (2004) for an example of a workshop evaluation.
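As a concrete illustration of how such feedback might be tabulated between rounds, here is a minimal sketch; the survey items, the four-point agreement scale, and the follow-up threshold are illustrative assumptions rather than part of any published evaluation instrument.

```python
# Hypothetical sketch: tabulating panelist process-evaluation ratings after a
# round. The items, the 1-4 agreement scale, and the flag threshold are
# illustrative assumptions.

from statistics import mean

def summarize_round(responses, flag_below=3.0):
    """responses maps each survey item to a list of panelist ratings (1-4).
    Returns the mean rating per item and flags items that need follow-up."""
    summary = {}
    for item, ratings in responses.items():
        avg = mean(ratings)
        summary[item] = (round(avg, 2), "follow up" if avg < flag_below else "ok")
    return summary

round_1 = {
    "I understood the task": [4, 4, 3, 4, 2],
    "The training materials were clear": [3, 2, 2, 3, 3],
    "I can defend the round 1 cut score": [4, 3, 4, 4, 4],
}
for item, (avg, status) in summarize_round(round_1).items():
    print(f"{item}: mean = {avg} ({status})")
```

A running summary of this kind makes it easier to address negative attitudes before the next round rather than discovering them in the final evaluation.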

Describe

In the Describe phase of the 6D Framework, ALDs are developed. It seems obvious that states should make decisions about what to assess at the beginning of the test-development cycle. Less obvious is that states should also determine how much students are expected to know in regard to the content standards. This means that states should define the knowledge, skills, and abilities (KSAs) in the ALDs (i.e., decide what it means to be Proficient) prior to the beginning of the test-development cycle so that they serve as a foundation for item development. This section defines the types of ALDs, discusses defining Proficiency on the AA-MAS and AA-AAS, and provides practical advice for writing ALDs.

Types of ALDs

In the past, ALDs described the typical or borderline student (Hambleton, 2001). Such descriptors were usually framed in the context of the cut-score recommendation workshop, ignoring that ALDs have purposes beyond the cut-score recommendation workshop. Achievement-level descriptors can be used for intertwined purposes (test development, cut-score recommendations, and score reporting), and different types of ALDs align to each purpose. There are three distinct stages of ALD development that support the three uses of ALDs:

• First, states should develop target ALDs to specify their expectations for students at the threshold of each achievement level. The target ALDs define the state's policy and content-based expectations (i.e., what it means to be Proficient). They are the lower-bound descriptions of an achievement level and guide the cut-score recommendation workshop. These descriptors target the skills all Proficient students should have in common.

• Next, states should develop range ALDs, which are expansions of the target ALDs. Along with the target ALDs, states should develop range ALDs prior to item writing because they should guide the item-writing process.

The range ALDs reflect the KSAs expected of examinees in the Proficient range. Because these descriptors target a range of students, not all Proficient students should be able to demonstrate all KSAs in the descriptions.

• Last, states should produce reporting ALDs once they adopt final cut scores. These ALDs represent the reconciliation of the target ALDs with the final cut scores. The target ALDs reflect a state's expectations of student performance, while the reporting ALDs reflect actual student performance. Because the final cut scores reflect policy considerations, in addition to the content considerations found in the target ALDs, it is necessary to reconcile the target ALDs with the KSAs reflected by the final cut scores. The reporting ALDs define the appropriate, validated inferences stakeholders may make about examinee KSAs based upon the student's test score. In other words, the reporting ALDs support score interpretation.

The range ALDs are the expectations regarding what students should know across the range of Proficiency, while the target ALDs focus on the skills students should possess to just enter Proficiency. The target ALDs are a specific subset of range ALDs. The target and range ALDs address the KSAs that a state expects Proficient students to know and be able to do. These expectations are based on aspirations for student performance, rather than actual student performance (i.e., reporting ALDs). The development of the target and range ALDs is akin to the work being done in the area of evidence-centered design (Bejar, Braun, & Tannenbaum, 2007; Plake, Huff, & Reshetar, 2009) because of the recommendation that ALDs be used to define the intended inferences about students, and the items be developed specifically to elicit the KSAs explicated in the ALDs.

Once a state finalizes cut scores, it is necessary to review the target ALDs based on the KSAs students demonstrated on the test. It is important to remember that target ALDs reflect expectations, which sometimes do not align with the reality of student performance.

The reporting ALDs should reflect KSAs demonstrated by students on the assessment. The Deploy section in this chapter discusses the refinement of target ALDs into reporting ALDs.

Defining Proficiency

Before discussing a method for writing ALDs, it is necessary to comment further on the relationship among the AA-MAS, AA-AAS, and the grade-level assessment. For the AA-AAS, the test measures the extended content standards, which means that the AA-AAS is established separately from the grade-level assessment. For the AA-MAS, the relationship to the grade-level assessment is more complicated because the state must define a linked, predictive, or distinct relationship between the two assessment types. Table 16.2 shows an example of a linked relationship where the grade-level Proficient definition has been translated into an AA-MAS achievement level called Approaching Expectations. Both sets of achievement-level descriptors are based upon the grade-level standards and both define the intended inferences that can be made about the student KSAs. As may be seen, however, the ALDs may vary in relation to grade-level standards. The way in which Proficiency is defined on the AA-MAS will very much depend on the way the state education agency plans to link the AA-MAS to the grade-level assessment and how clearly the state defines how grade-level standards are to be tested in the item specifications.

ALD-Writing Workshop

A workshop where a diverse group of stakeholders come together to create ALDs is one of the various ways to write descriptors. The proposed ALD-writing workshop is a multi-day event where the ALD committee of stakeholders participates in multiple rounds of discussion. The workshop should begin with an orientation where representatives from the state agency describe the test's target population, the test-development cycle, and how the ALDs function within that cycle.

Table 16.2 Bridging the unmodified grade-level assessments and AA-MAS target ALDs through a linked relationship

Unmodified grade-level assessment (what Proficient students should be able to do) → AA-MAS (what Approaching Expectations students should be able to do):
• find the area of rectangles and irregular figures drawn → find the area of rectangles and irregular figures drawn on a grid
• solve multistep problems → solve multistep problems when steps are scaffolded
• find the correct ordered pair for a point on the coordinate plane → find the correct ordered pair for a point on the coordinate plane when steps are scaffolded
• add money amounts under $100 → add benchmark money amounts under $100 (20, 30, 40, etc.)
• add or subtract fractions with like denominators → add or subtract common fractions with like denominators when graphics of fractions are shown
• add or subtract fractions with unlike denominators → add or subtract common fractions with unlike denominators when graphics of fractions are shown
• find the range of a set of whole numbers where no whole number is greater than 50 → find the range of a set of whole numbers where no whole number is greater than 10
• find the probability of simple events → find the probability of simple events when illustrated by graphics
• extend the patterns of geometric shapes → extend short patterns of geometric shapes
• compare, order, and simplify fractions → compare, order, and simplify fractions when graphics of fractions are shown
• simplify mathematical expressions by using the order of operations rules → simplify mathematical expressions when order of operations is sequential

Following this, the panelists are trained on the method for writing both target and range ALDs. The following section provides a description of the proposed method for writing ALDs.

The proposed ALD-writing workshop has four rounds. In the first round, the AA-MAS and AA-AAS committees should:

• study and discuss the generic policy statements

• study and discuss the ALDs for the unmodified grade-level assessments

• discuss and annotate the content standards

In the second round, the AA-MAS and AA-AAS committees should compile and edit the ALDs. In the third round, a panel exchanges ALDs with a panel from a different grade level and provides feedback. In the final round, panelists consider the revisions and finalize the ALDs.

A meta-committee should meet immediately after the final round. The meta-committee is a cross-grade panel that will examine, edit, and revise the ALDs for coherency and articulation across the grade levels. The results of the meta-committee are provided to the state education agency for further consideration.

Method for Writing Range and Target ALDs

One method for beginning the ALD-writing process is to request that panelists parse the state content standards (or extended content standards) into achievement levels. For example, the content standards are parsed into Basic, Proficient, or Advanced. For these achievement levels, panelists will annotate the content standards to indicate whether the skills are expected of the just Proficient (P–), average Proficient (P), or highly Proficient (P+) student. The content standards provide ALD writers a framework for categorizing KSAs into achievement levels and provide parameters for the content the state education agency has deemed important for students. Throughout the parsing and annotation process, panelists study and discuss the content standards as well as their expectations for student achievement.

Once parsed, the annotated content standards must be compiled into range and target ALDs. Figure 16.1 shows an example of how a content standard may be annotated and transformed into ALDs.

Fig. 16.1 Creating target and range ALDs (three-panel figure). Panel 1, Original Grade 3 Content Standard: "Standard 1: Students will read, write, listen, and speak for information and understanding," with bullets for locating and using library media resources to acquire information, with assistance; reading unfamiliar texts to collect data, facts, and ideas; reading and understanding written directions; locating information in a text that is needed to solve a problem; identifying main ideas and supporting details in informational texts; and recognizing and using organizational features, such as table of contents, indexes, page numbers, and chapter headings/subheadings, to locate information, with assistance. Panel 2, Annotated Standard: the same bullets annotated with achievement-level codes (B–, B, and B+ for just entering, average, and high-performing Basic; P–, P, and P+ for the just, average, and high-performing Proficient student). Panel 3, Example of Target ALD for Proficient: "The Grade 3 target Proficient student will read, write, listen, and speak for information and understanding with the following limitations: use a variety of library media resources to acquire information, with assistance; understand written directions; locate information in a text that is needed to solve a problem; use organizational features, such as table of contents, indexes, page numbers, and chapter headings/subheadings, to locate information, with assistance." Skills annotated P– are compiled for the target Proficient ALD; skills annotated P–, P, and P+ would be compiled for the range Proficient ALD. Example content standard from New York State's (2005) Grade 3 English Language Arts Core Curriculum (http://www.emsc.nysed.gov/ciai/ela/elacore.pdf)

These ALDs should be written so that the language of the ALDs is accessible and clear. Mehrens and Lehmann (1991) provide guidance for developing measurable content objectives that can easily be adapted and extended to develop well-written ALDs.
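Returning to the compilation step illustrated in Fig. 16.1, the sketch below shows, purely as an illustration, how annotated skills could be filtered into target and range ALDs; the list-of-tuples representation and the particular annotation codes attached to each skill are assumptions for the example, not the figure's actual coding.

```python
# Hypothetical sketch: compiling panelist annotations into target and range
# ALDs. The annotation codes attached to the skills are illustrative only.

annotated_standard = [
    ("locate and use library media resources to acquire information, with assistance", "P-"),
    ("read unfamiliar texts to collect data, facts, and ideas", "P+"),
    ("read and understand written directions", "P-"),
    ("locate information in a text that is needed to solve a problem", "P-"),
    ("identify main ideas and supporting details in informational texts", "P"),
    ("recognize and use organizational features to locate information, with assistance", "P-"),
]

def compile_alds(annotations):
    """Target Proficient ALD = skills coded P-; range Proficient ALD = P-, P, and P+."""
    target = [skill for skill, code in annotations if code == "P-"]
    range_ald = [skill for skill, code in annotations if code in ("P-", "P", "P+")]
    return target, range_ald

target_ald, range_ald = compile_alds(annotated_standard)
print("Target Proficient ALD:", target_ald)
print("Range Proficient ALD:", range_ald)
```

In practice the compiled lists are then rewritten by the committee into the prose descriptors, following the wording guidance below.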

ALDs should contain explicit, action verbs. When explicit verbs are used, teachers, parents, and stakeholders have a better sense of what students can actually do, whereas descriptors that use implicit verbs require that stakeholders decode their meaning.

For example, writing "The Level 2 student recognizes the main idea" does not lead to a clear conceptualization of what the student should do to demonstrate his or her skill. Rather, "The Level 2 student is able to tell or show the main idea" provides a better conception of the actual student behavior used to measure the skill.

[See Mehrens and Lehmann (1991) for other examples of clarifying language that uses implicit verbs.]

ALDs should provide contextual or scaffolding characteristics. Contextual or scaffolding characteristics of items can elicit intended examinee knowledge and skills. For example, students may be able to respond to mathematics questions that incorporate adding and subtracting positive and negative integers only when number lines are present. This information can describe differences in examinee knowledge and skills that teachers can use to support instruction.

Validity Evidence

The range ALDs should guide item writing. If this is the case, then the state should show how the newly developed items align to the different achievement levels. Once the test is administered and cut scores are finalized, the state can re-examine how well the items aligned to their intended achievement level. This empirical evidence will show how well (or poorly) item writers were able to align items to the desired ALDs.

Design

During the Design phase of the 6D Framework, a standard setting plan is created that addresses the breadth of the process, meaning that the plan should encompass the cut-score recommendation workshop, the review by policy makers, and a method for refining the target ALDs (to the degree that is necessary). This phase acknowledges the importance of multiple stakeholder groups in the standard setting process, and it recognizes that several different types of workshops may be held before the cut scores are finalized.

The various groups often view the cut scores with competing perspectives. For example, teachers at a cut-score recommendation workshop may view the cut scores from a content-driven perspective, while administrators at a policy review may view the cut scores in terms of the number of students who will need remediation.

For the same set of cut scores, state-level policy makers may view the cuts from a reasonability perspective, considering how the performance data will be received by the public. Each group brings a unique view of the data, and these views need to be reconciled.

In the Design phase, the state will decide how the views of the different groups will inform one another. Typically, the teacher group recommends cut scores first, followed by the policy review, then the state review. If it is known in advance that state-level policy makers have a certain entrenched view (e.g., our state must have NAEP-like achievement standards), then it seems reasonable to share that view with the other groups.

To understand what methodologies can be used to establish teacher-recommended cut scores, it is necessary to consider the test design. In many ways, the test design will drive the methodology that will be used to set cut scores. In this section, possible AA-MAS and AA-AAS test designs are briefly discussed to set the stage for methodologies that can be used for teacher workshops. The section ends with a brief overview of a methodology for a policy review.

AA-MAS Test Designs

The most popular design for the AA-MAS is a test comprising multiple-choice and constructed-response items. Almost all planned and implemented AA-MAS are derived from corresponding unmodified grade-level assessments. In redesigning an unmodified grade-level assessment for use as the state AA-MAS, it is common to reduce the total number of items as well as the overall difficulty of the test. States have only recently begun to address subtler modification issues, such as taking steps to reduce the cognitive complexity of items by bolding or underlining portions of the items.

AA-AAS Test Designs

Ferrara, Swaffield, and Mueller (2009) identify three designs for the AA-AAS currently used in K–12 assessments: assessment portfolios, performance tasks, and rating scales.

Table 16.3 Standard setting methods relevant for the AA-MAS and AA-AAS test designs

Methodology | Test design | Sample references
Bookmark | Multiple-choice/constructed response; multiple choice only | Cizek and Bunch (2007, chapter 10); Mitzel, Lewis, Patz, and Green (2001)
Item-descriptor (ID) matching | Multiple-choice/constructed response; multiple choice only | Cizek and Bunch (2007, chapter 11); Ferrara, Perie, and Johnson (2008)
Modified Angoff | Multiple-choice/constructed response; multiple choice only | Cizek and Bunch (2007, chapter 6)
Body of work | Constructed response only; portfolio; performance task; rating scale | Cizek and Bunch (2007, chapter 9); Kingston, Kahl, Sweeney, and Bay (2001)
Profile sorting | Constructed response only; portfolio; performance task; rating scale | Jaeger (1995)
Reasoned judgment | Constructed response only; portfolio; performance task; rating scale | Roeber (2002)

Assessment portfolios contain collections of student academic work, video and audio recordings of students performing academic tasks, and other evidence of their performance in relation to the extended content standards.

Performance tasks generally focus on several related extended academic standards and contain anywhere from 3 to 15 items. Items often are scaffolded. Administration of assessment tasks usually is supported by manipulatives such as real objects (e.g., soil or rocks for science), response cards for nonverbal students, and other communication systems and assistive devices used by students during classroom activities.

Rating scales typically include considerable numbers of items linked to extended content standards. Teachers and other observers rate students as they complete tasks, collect student work samples, or interview other teachers and adults. Rating scales are typically multilevel (e.g., performance can be classified as nonexistent, emerging, progressing, or accomplished). The types of supporting evidence that must be provided typically are prescribed.

Methodologies for Recommending Cut Scores

The cut-score recommendation methods used for the unmodified grade-level assessments are also appropriate for the AA-MAS and AA-AAS.

Item-based cut-score recommendation methods (e.g., Angoff, Bookmark, item-descriptor matching) are appropriate for tests comprised of multiple-choice and constructed-response items (including essay prompts) or rating scales. Methods requiring that panelists make proficiency judgments based upon student work samples (e.g., Body of Work, Profile Sorting) or scoring rubrics (e.g., Reasoned Judgment) are suitable for portfolio assessments that organize collections of student work as evidence of achievement. It is beyond the scope of this chapter to detail the cut-score recommendation methods, but there is an extensive, easily accessible literature on a variety of methods.3 Table 16.3 matches several cut-score recommendation methods with the test designs for which they are frequently recommended. Table 16.3 also cites sample references where the interested user can find more information on a particular method.

Methodology for Policy Review of Cut Scores

Although policy reviews of cut scores are a frequent occurrence in the standard setting process for K–12 assessments, a standard method is not used.

3 Egan, Ferrara, Schneider, and Barton (2009) overview several cut-score setting methodologies for use with the AA-MAS.

In some cases, policy reviews consist of a state superintendent along with one or two political operatives adjusting the cut scores to better suit policy needs. In other cases, a more formal process is followed that allows educators with an understanding of policy (e.g., district superintendents, school principals) to recommend changes to the teacher-recommended cut scores. In this process, a committee of 10–20 may participate as follows:

• Round 1: Review impact data (percentage of students in each achievement level) from the teacher workshop along with the target ALDs. Additional data may include impact data from other state, national, or international tests. Discuss the appropriateness of the impact data within and across grades and across content areas. Recommend changes to the cut scores.

• Round 2: Review the recommended changes to the cut scores. Re-open discussion on the appropriateness of the cut scores within and across grades and content areas. Recommend changes to the cut scores. Articulate a rationale for the changes.

The recommendations of the policy review may then be provided to the state's decision-making agency. Although policy considerations have the most weight during this phase of the process, it may be helpful to policy makers to see how the content in the ALDs may be modified by changes to the cut scores.

Validity Evidence

When the cut-score recommendation workshop is implemented, the state should plan to collect three types of evidence internal to the process itself: within-method consistency, intrapanelist consistency, and interpanelist consistency (Hambleton & Pitoniak, 2006). These types of evidence examine the consistency with which panelists are able to translate the target ALDs into cut scores (Kane, 2001).

Within-Method Consistency. This type of evidence examines what happens when the cut-score recommendation method is replicated across groups and is indicated by the standard error of the cut score.

If the recommendations are relatively consistent across groups, then the resulting standard error will be small. Conversely, discrepant recommendations may result in unacceptably large standard errors.

Intrapanelist Consistency. This type of evidence examines the consistency of judgments made by each panelist across rounds. It is anticipated that panelists will adjust their cut scores across rounds as they learn new information and gain a better understanding of the target ALDs.

Interpanelist Consistency. This type of evidence examines the consistency of judgments across panelists. Most cut-score recommendation methods use consensus-building activities, where panelists are expected to engage in structured dialogues about their expectations of student performance. It is expected that the panelist cut scores will converge across rounds.
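To make these three checks concrete, the following is a minimal sketch of how panelist judgments might be summarized; the two-panel design, the round structure, and every number are hypothetical, and operational analyses would typically be more elaborate.

```python
# Hypothetical sketch: summarizing consistency evidence from a cut-score
# recommendation workshop. The two-panel design and all numbers are
# illustrative assumptions, not data from the chapter.

from statistics import mean, median, stdev

# Each panel's panelist cut-score judgments, by round
panels = {
    "panel_A": {"round_1": [342, 351, 338, 360, 345], "round_2": [346, 349, 344, 352, 347]},
    "panel_B": {"round_1": [355, 340, 348, 352, 343], "round_2": [350, 346, 349, 351, 347]},
}

# Within-method consistency: spread of the replicate panels' final cut scores,
# summarized here by a simple standard error across panels.
final_cuts = [median(rounds["round_2"]) for rounds in panels.values()]
se_cut = stdev(final_cuts) / len(final_cuts) ** 0.5
print(f"Panel cut scores: {final_cuts}, SE of cut score: {se_cut:.2f}")

# Interpanelist consistency: do judgments converge (smaller spread) across rounds?
for name, rounds in panels.items():
    print(name, {r: round(stdev(v), 2) for r, v in rounds.items()})

# Intrapanelist consistency: how much does each panelist move between rounds?
for name, rounds in panels.items():
    shifts = [r2 - r1 for r1, r2 in zip(rounds["round_1"], rounds["round_2"])]
    print(name, "mean shift across panelists:", round(mean(shifts), 2))
```

Reporting these quantities round by round in the technical report provides the internal evidence called for above.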

Deploy

Although the Define, Describe, and Design phases occur across months or years, the Deploy phase often occurs very rapidly. In the Deploy phase of the 6D Framework, panelists are recruited, meeting space is reserved, and the standard setting design is implemented. It is during this phase that the state implements the teacher and policy workshops, and the state decision-making agency enacts the final cut scores. An often-ignored step in this phase is an empirical validation of the target ALDs. It is crucial to the accurate interpretation of student performance that the target ALDs align well with the final cut scores. To ensure this, the state should revisit the target ALDs once final cut scores are determined.

Refining Target ALDs into Reporting ALDs

Target ALDs describe what students at the cut score should know. Ideally, panelists use the target ALDs to develop their cut-score recommendations so that the cut scores on the test scale reflect the information articulated in the target ALDs.

Over the course of a multiphase standard setting, however, the final cut scores may drift from the expectations articulated in the target ALDs, so that once the final cut scores are established, the target ALDs no longer describe them. One reason for this is that policy considerations often influence a state's final cut scores. When a policy-based review follows a standard setting, policy experts are guided by different considerations, including the cross-grade logic of the impact data (e.g., does the percentage of students at or above Proficient in each grade make sense?), available funding, the political implications of cut scores (e.g., would they be considered "easy" compared to NAEP?), and past performance in the state (Schneider et al., 2009). In short, the policy experts may consider a variety of reasons to adjust cut scores and, most often, they do not consider the target ALDs.

Once target ALDs no longer reflect the cut score, they do not provide a valid interpretation of the meaning of the achievement levels. The target ALDs can be refined into reporting ALDs by adjusting the content to reflect the final cut scores. This process may be accomplished by mapping the test items in order of difficulty, separating them by the cut scores, and summarizing the KSAs found in each achievement level. This may mean that some content is removed and some is added to the reporting ALDs (Bourque, 2009; Schneider, Egan, Kim, & Brandstrom, 2008; Schneider et al., 2009). As new test forms are introduced, the reporting ALDs can be refined further to reflect the new content.
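The item-mapping step can be sketched as follows; the item locations, cut scores, and level labels are hypothetical, and using an item's scale location as the sole mapping criterion is a simplification of operational item-mapping practice.

```python
# Hypothetical sketch: grouping items by achievement level once final cut
# scores are set, as a starting point for refining reporting ALDs.
# Item scale locations, cut scores, and labels are illustrative.

from bisect import bisect_right

final_cuts = [300, 350, 400]              # Basic / Proficient / Advanced cuts
levels = ["Below Basic", "Basic", "Proficient", "Advanced"]

items = {                                  # item id -> scale location
    "item_01": 272, "item_02": 310, "item_03": 355,
    "item_04": 362, "item_05": 341, "item_06": 415,
}

by_level = {level: [] for level in levels}
for item_id, location in sorted(items.items(), key=lambda kv: kv[1]):
    level = levels[bisect_right(final_cuts, location)]
    by_level[level].append(item_id)

for level, ids in by_level.items():
    print(level, ids)   # panelists then summarize the KSAs of the items in each level
```

The content summaries produced from such a grouping, rather than the original expectations, are what the reporting ALDs should describe.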

Validity Evidence

In the Deploy phase, at least three workshops are implemented: the cut-score recommendation workshop, the policy review workshop, and the ALD refinement workshop. Given the role of ALDs for reporting, ALD refinement is an important validation step for a standard setting process.

Traditionally, practitioners have collected evidence to show that the cut scores from the cut-score recommendation workshop reflected the target ALDs. Now, it is necessary to turn this idea on its head and ensure that the reporting ALDs reflect the final cut scores. To do this, states should show how items align to each achievement level once cut scores are finalized. This can be accomplished by gathering stakeholder feedback to ascertain whether the KSAs of the items located around the cut scores are consistent with the descriptors for each achievement level.

Deliver

In this phase of the 6D Framework, the results of the standard setting process are released to the public in the form of score reports to parents, teachers, and administrators. Often, shortened versions of the reporting ALDs will appear on the score report, and longer, more informative reporting ALDs will be on the state's website. When score reports from the AA-MAS and AA-AAS are delivered, it is important to communicate how those results should be interpreted in light of the grade-level assessment. If the term Proficient is used to label student performance on all three assessments, it is important to communicate the differences in what the term means for each assessment.

Using Reporting ALDs

Reporting ALDs should contain guidance so that teachers and parents understand how to interpret and use the ALDs. For example, states may explain whether the reporting ALDs are cumulative, so that Proficient students will most likely possess the KSAs found in the achievement levels below Proficient. Additionally, teachers can use the reporting ALDs to better understand the general differences in student performance within a content area. Teachers should not use the reporting ALDs as shortcut teaching frameworks, nor should they use them as a mini-curriculum, because the reporting ALDs do not encompass the breadth of the content standards.

Validity Evidence

Validity evidence for the Deliver phase consists of the score reports and test results released to schools, and the interpretative guidance with suggestions for using the reporting ALDs. An interesting study would involve surveying educators to inquire how well the ALDs describe student performance, given how their students were classified on the test.

Deconstruct

In this phase of the 6D Framework, the various types of validity evidence collected throughout the standard setting process are compiled and evaluated for the degree to which they support the conclusion that there is a good relationship between the ALDs and the cut scores.

States and their contractors assemble this validity evidence into the technical report. Because technical reports tend to have a confirmatory bias, it may be beneficial to seek an outside perspective on the relationship between the range ALDs, the reporting ALDs, the cut scores, and the intended interpretations.

Technical Report

The technical report is an important piece of evidence for the procedural validity of the standard setting. Figure 16.2 suggests an outline for a technical report for the cut-score recommendation workshop.

I. Executive Summary
II. Chapter 1: Planning
    a. Panelist recruitment
    b. Committee formation
    c. Panelist demographics and expertise
III. Chapter 2: Execution
    a. Training description
       i. Validity evidence: training materials; objective tests of panelist understanding of the cut-score method; evaluations
    b. Round 1 description
       i. Validity evidence: Round 1 votes; objective tests of panelist understanding; Round 1 evaluations
    c. Round 2 description
       i. Validity evidence: Round 2 votes; Round 2 evaluations; intrapanelist consistency; interpanelist consistency; standard error of cut score
    d. Round 3 description
       i. Validity evidence: Round 3 votes; Round 3 evaluations; intrapanelist consistency; interpanelist consistency; standard error of cut score
IV. Chapter 3: Policy Review of Cut Scores
    a. Committee selection
    b. Process
    c. Final recommendations
V. Chapter 4: Approval Process
VI. Chapter 5: Refinement of Target ALDs into Reporting ALDs
    a. Process
    b. Final reporting ALDs
VII. Chapter 6: Summary

Fig. 16.2 Outline for a cut-score recommendation technical report

Technical reports should be created for each important panelist workshop held during the standard setting process. The technical reports should be developed as the authoritative source to understand the overall process undertaken to set cut scores and to review the technical evidence associated with the standard setting. The audience for technical reports includes members of technical advisory committees or peer review committees.

Within the narrative of the technical report, the author should directly address any anomalies that may have occurred during the course of the standard setting. For example, a grade-level panel may request an additional round of voting and discussion, and the motivation for this additional round should be discussed. The technical report should also address any seemingly aberrant results in the data. For example, a grade-level committee may indicate disagreement with the cut-score recommendation process in the evaluations. This disagreement may result from other sources (e.g., a state law that drives cut-score placement) as opposed to disagreement with the process itself.

Outside Perspective

As an additional source of validity evidence, a state education agency could hire a consultant to review the technical report and the relationship between the ALDs and the cut scores. The consultant could specifically look for disconfirming pieces of evidence that were undetected previously. The purpose of this review is not to undermine the results of the standard setting; rather, it is to build further support by finding the disconfirming evidence and addressing it.

Validity Evidence

The technical reports will serve as an important source of evidence for procedural validity for the standard setting process.

Much of the validity evidence discussed throughout this chapter, such as the recruiting process and the panelist demographics, will be summarized in a technical report.

Discussion

This chapter describes the 6D Framework for designing a standard setting process that encompasses all phases from writing ALDs to setting and finalizing cut scores. The purpose of this framework is to help practitioners collect validity evidence throughout a multistage standard setting process that supports the intended interpretation of test scores. Given that ALDs have such an important role in the assessment development and reporting process, they should be developed carefully. Although the measurement field has been calling for increased attention to the ALD-development process, change in current practice has been slow.

Implementing the 6D Framework

The 6D Framework should be implemented in conjunction with the test-development cycle. Table 16.4 provides an overview of each phase in the framework. It is important to understand that the Define phase and range ALDs from the Describe phase remain static once developed. They feed into the test-development cycle but are not part of the cycle itself. Thus, the content standards, general policy statements, target examinee population, and range ALDs should not change throughout the life of the testing program. The rest of the phases and the target ALDs are part of the test-development cycle. Information from each test administration can be used to modify the reporting ALDs so they better describe student performance in the achievement levels.

There are two reasons that range ALDs are not updated based on the final cut scores. First, the range ALDs guide item writing and test development. As such, they cannot be moving targets from year to year or the underlying test construct will also change, thereby jeopardizing the process used to equate test scores from year to year.

Table 16.4 Steps in the 6D Framework for designing and developing the AA-AAS and AA-MAS standard setting process

Define
  Standard setting tasks: Define target examinee population; describe relationship to unmodified grade-level assessments; create generic policy statement
  Validity evidence: Academic KSAs that students are expected to learn in each content area (for the AA-MAS, these are the grade-level content standards; for the AA-AAS, these are extended from the grade-level content standards); definition of the relationship to the unmodified grade-level assessment; definition of the population of students who take the assessment; intended uses of ALDs; process plans, committees, evaluations

Describe
  Standard setting tasks: Plan and hold ALD-writing workshop; create target ALDs; create range ALDs
  Validity evidence: Description of ALD-writing workshop; overview of panelist recruitment and demographics from ALD-writing workshop; definition of target achievement for the students just entering the Proficient achievement level and other levels; definition of the range of KSAs expected of examinees at the Proficient achievement level and other levels; alignment of items to range ALDs

Design
  Standard setting tasks: Design cut-score recommendation workshop
  Validity evidence: Description of cut-score recommendation workshop; appropriate cut-score recommendation method for the test design

Deploy
  Standard setting tasks: Recruit panelists; hold cut-score recommendation workshops; hold policy-review workshop; obtain state approval of cut scores
  Validity evidence: Description of panelist recruitment effort; panelist demographics; comparison of panelist demographics to state student demographics; panelist evaluations of workshops; within-method consistency, intrapanelist consistency, interpanelist consistency; final cut scores; alignment of reporting ALDs to final cut scores

Deliver
  Standard setting tasks: Distribute score reports and interpretation guides
  Validity evidence: Reporting ALDs delivered to public

Deconstruct
  Standard setting tasks: Implement validity studies; write technical reports
  Validity evidence: Description of workshops from other 6D phases; detailed results from workshops (e.g., individual panelist recommendations); outside review of validity evidence

Second, the range ALDs reflect the entire range of performance in the Proficient achievement level; thus, the range ALDs should be more robust to changes in the cut scores than are target and reporting ALDs, which focus on the very narrow range of KSAs that are right at the cut score.

What to Do When Test Development Has Already Occurred

Because standard setting has long been considered the end of the test-development cycle, many states may find themselves in the situation where test development has occurred prior to the development of ALDs. In this event, it is still important to develop the ALDs prior to the standard setting, using the content standards.

Ideally, the ALDs should be developed in a separate workshop from the cut-score recommendation workshop. This allows for community input and for the state agency to revise and approve the ALDs prior to the cut-score recommendation workshop. Because these ALDs were not used to guide the test-development process, it will be important to periodically review the ALDs to ensure that the test has not drifted from the content of the ALDs. It may also be necessary to update the ALDs so that they continue to accurately reflect the KSAs of the students in each achievement level.

Conclusions

The main purpose of this chapter was to introduce the 6D Framework, which demonstrates the centrality of ALD development and refinement to both a multistage standard setting process and the test-development process.

Three types of ALDs were introduced: range, target, and reporting. States use range ALDs to guide item writing and test development, target ALDs to guide the cut-score recommendation workshop, and reporting ALDs to help stakeholders interpret test scores.

The multistage standard setting process also considers that various members of the community have a stake in what the ALDs reflect and where the cut scores are set. Defining standard setting as a multistage process should increase the transparency of the process for all stakeholders involved.

Finally, it is hoped that the 6D Framework will aid practitioners in the design and implementation of the multistage standard setting process and will provide guidance on the types of validity evidence to collect during each stage of the process. The 6D Framework should aid practitioners in producing valid and defensible cut scores and ALDs.


Part IV

Conclusions


17 Implementing Modified Achievement Tests: Questions, Challenges, Pretending, and Potential Negative Consequences

Christopher J. Lemons, Amanda Kloo, and Naomi Zigmond

As we write, thousands of educators across the country are working with Individual Education Program (IEP) teams to determine the testing fate of their students enrolled in special education. The teams must decide which students will be required to participate in the challenging, rigorous, and lengthy regular accountability assessment (either with or without accommodations) and which students will be given the chance to demonstrate grade-level proficiency on a possibly less-challenging, shorter, alternate assessment based on modified academic achievement standards (AA-MAS). In the previous few years, hundreds of people, including staff from state departments of education, educators, test developers, parents, and advisors, have spent numerous hours debating the merits of a modified exam and working to develop an appropriate assessment and to select the correct group of students for whom this test would be most appropriate and beneficial.

The rationale for developing such a test is to improve accessibility for students with disabilities who may need specific accommodations and modifications to the regular assessment to allow them to demonstrate their knowledge of grade-level academic content. As Elliott, Beddow, Kurz, and Kettler (Chapter 1, this volume) discuss, enhancing the accessibility of assessment systems is critical to establishing test validity and ensuring equitable access to educational opportunities provided based upon assessment results. It is in this spirit, at least according to some, that the AA-MAS was initially proposed – it was argued that the accountability system at the time did not permit a small proportion of students with disabilities the opportunity to demonstrate their proficiency in grade-level content. Thus, the US Department of Education (USDE) called for the development of another test that would level the playing field for this group of students. It was intended that this test would improve the accountability system, enhance the instruction provided to these students, and reduce the burden to many school districts of students in special education who failed to meet the high adequate yearly progress (AYP) standards of No Child Left Behind (NCLB).

C.J. Lemons (✉), University of Pittsburgh, Pittsburgh, PA 15260, USA; e-mail: [email protected]

Unfortunately, it appears that creating a different test in the spirit of enhancing equal access may be leading us down a road of unintended consequences. Perhaps, just as in Brown v. Board of Education, separate tests are as inherently unequal as separate schools. The purpose of this chapter is to consider whether the potential benefits of creating a different test – the AA-MAS – outweigh the possible negative consequences. While we are highly supportive of both the value of enhancing access to the general education curriculum and assessment for students in special education and the work of colleagues aimed at achieving this goal, we feel that a further examination of the benefits and consequences of the AA-MAS may be helpful to state leaders and test developers as they move forward in its development. More specifically, we address questions that are currently being raised related to the AA-MAS: Will more students be able to demonstrate "proficiency" on a modified exam? Will fewer schools be penalized on AYP now that an additional 2% of students in special education may be counted as proficient? Will instruction change for students who are now going to be held accountable for something other than full grade-level academic standards? Will all of the time and money spent to create this modified test actually have been worth it?

A Brief History of NCLB, Accountability, and the AA-MAS

Since the beginning of the millennium, the high-stakes accountability requirements of the No Child Left Behind Act (NCLB, 2001) have shaped the face of public education in the United States. The law set ambitious goals for student achievement – all students were expected to be proficient in reading and math (as measured by performance on end-of-year grade-level-standards-aligned state assessments) by the year 2014. And, all meant all. Such rigorous standards for success communicated a "no excuses" model for education in which every school and every teacher was expected to do what needed to be done to ensure that every student, including those with disabilities, mastered grade-level content. NCLB directives gave teeth to the Individuals with Disabilities Education Act (IDEA, 1997) provisions for full participation of students with disabilities in district and state assessments and further strengthened the growing national commitment to have students with disabilities fully included in general education instruction. NCLB, IDEA 97, and IDEA 2004 made it clear that students with disabilities were to be held responsible for the same academic content and performance standards as their grade-level peers.

With only 14 years to achieve the ambitious goal of 100% student proficiency, the academic progress of students with disabilities quickly became the target of public and political scrutiny. On the one hand, this attention promoted improved educational opportunities for school-aged populations of a group of students historically ignored and undervalued by the educational mainstream. On the other hand, it also placed incredible pressure on special educators, students, and their families to achieve what many considered impossible. Many educators argued that expecting all students to master age-based grade-level academic standards was unrealistic. After all, the students who receive special education and related services do so because their disability negatively impacts typical progress and achievement in school (IDEA 2004, § 300.8 (a)).

The AA-AAS

Recognizing the reality that achieving grade-level proficiency was unrealistic and inappropriate for a very small proportion of students (namely, those with significant cognitive disabilities) and yet striving to uphold the tenets of IDEA 1997, NCLB initially permitted states to develop an alternate assessment based on alternate academic achievement standards (AA-AAS). Standards for proficiency were aligned with academic content standards to promote access to the general education curriculum, but were "redefined" to reflect the academic skills and knowledge that were determined to be of the highest priority for the target group of students with severe disabilities (see NCLB 2001, §200.1 (d)). Again, in the spirit of full inclusion, the law assumed that the vast majority of students with disabilities should take the regular assessment alongside their non-disabled peers. So, although no cap was placed on the number of students with significant cognitive disabilities districts could assign to the AA-AAS, only 1% of the district's overall proficient population counting toward adequate yearly progress could come from students scoring proficient on the AA-AAS.

The AA-MAS

Despite the flexibility afforded by the AA-AAS, some leaders, educators, parents, and advocates argued that there were many students with disabilities who were "left behind" by the mandated accountability system. These students did not have significant cognitive disabilities and were therefore ineligible to take the AA-AAS. However, they did have disabilities that negatively impacted grade-level achievement so substantially that mastery of grade-level content, particularly within one academic year, was seen as highly unlikely if not impossible. These persistently low-performing students with disabilities were commonly referred to as "the gap kids" because neither assessment option was an appropriate measure of their knowledge and skills. In April 2007, the US Department of Education responded to these concerns by permitting states to develop another test – an alternate assessment based on modified academic achievement standards. Legislators intended this additional flexibility to apply to a second small group of students with disabilities. This small group comprised those students "who can make significant progress, but may not reach grade-level achievement standards within the same time frame as other students, even after receiving the best-designed instructional interventions from highly trained teachers" (USDE, 2007, Part 200). The test was to cover the same breadth and depth of grade-level content standards as the regular assessment yet was to be less rigorous than the regular assessment (USDE, 2007). Similar to the regulations applied to the 1% test, schools could only count up to 2% of the tested population as proficient toward AYP based on performance on the AA-MAS.1

1 "Under specific limited circumstances, States and LEAs may exceed the 2% cap. The 2% cap may be exceeded only if a state or LEA is below the 1% cap for students with significant cognitive disabilities who take the AA-AAS. For example, if the number of proficient and advanced scores on the AA-AAS is 0.8%, the State or LEA could include 2.2% of the proficient and advanced scores on AA-MAS in calculating AYP" (Modified Academic Standards, Non-regulatory guidance, 2007, p. 36).

Federal regulations do not explicitly specify which students in special education should participate in the AA-MAS. Instead, the federal guidance stipulates only that students who participate in the AA-MAS (a) have a disability that precludes grade-level achievement; (b) have IEP goals aligned to grade-level academic content; (c) experience persistent academic failure despite high-quality instruction; and (d) are unlikely to achieve grade-level proficiency within a year's time even if significant growth occurs (USDE, 2007). Individual states are charged with the daunting task of using these liberal criteria to zero in on the appropriate target population. Furthermore, states are to develop and disseminate explicit guidelines to aid IEP teams in assigning "the right" students to this new test. To complicate matters further, the USDE provided little specificity as to the achievement span states should consider when conceptualizing the AA-MAS target population. "The students who participate in assessments under this option are not limited to those who are close to achieving at grade level or who are relatively far from achieving at grade level" (Title I—Improving the Academic Achievement of the Disadvantaged, 2007, p. 17749). This range of possibilities appears to apply to all non-proficient students with disabilities and to contradict the aforementioned criteria detailed in the Non-Regulatory Guidance requiring persistently low academic performance.

Pennsylvania’s GSEG Project

The apparent vagueness of the federal guidance led many states to pull together a group of advisors to assist in determining whether their state would develop and implement the AA-MAS and, if so, to whom it would be targeted, what it would look like, and what it would measure. In 2007, Pennsylvania's Bureau of Special Education was awarded a General Supervision Enhancement Grant (GSEG) by the US Department of Education's Office of Special Education Programs to engage in research and inquiry about issues related to the development and implementation of the AA-MAS in the state. The primary goals of the grant were to (a) identify the target population for the test and (b) investigate those students' opportunity to learn grade-level content so as to inform and improve the state's framework for standards-aligned instruction and assessment. The GSEG team, which included the chapter authors, was one group charged with providing guidance to state leaders as they moved forward with the development and implementation of the AA-MAS in Pennsylvania. The team conducted various activities to gather information related to defining the target population. These activities and the guidance developed based on the findings from the activities are discussed next.

GSEG Activities

Survey

The first information-gathering activity conducted by the GSEG team was a small-scale survey of special education teachers in schools identified as key research sites for the grant (see Table 17.1). Survey results are summarized in Tables 17.1, 17.2, 17.3, and 17.4. The survey's purpose was to help describe the opportunities students with IEPs in grades 5, 8, and 11 have had to learn the eligible grade-level content of the regular grade-level test (the Pennsylvania System of School Assessment, PSSA) in special education or general education settings, or both.

Table 17.1 Pennsylvania GSEG survey: summary of response data

Survey response data                        Count
Number of school districts represented          6
Number of schools represented                  20
Number of surveys distributed                 110
Number of teacher respondents                 109
  Fifth grade                                  52
  Eighth grade                                 32
  Eleventh grade                               25

Survey questions were divided into four topic areas: (a) student information, for which teachers were asked to provide academic achievement information about a target student or students whom they considered to be "persistently low performing" and a candidate suited for the AA-MAS based on the criteria detailed in federal guidance (see Table 17.2); (b) teacher information, for which teachers were asked to provide data about their professional experience and professional development related to the AA-MAS and standards-aligned instruction for students with IEPs (see Table 17.3); (c) opportunity to learn data, for which teachers were asked to quantify the target students' opportunity to learn the eligible content of the state reading test (see Table 17.4); and (d) IEP goals and instructional access, which spoke to the students' level of academic functioning, the alignment of their IEP goals, and the nature of their individualized instruction overall.

Table 17.2 Pennsylvania GSEG survey: summary of targeted student data

Topic area: Targeted student information                                                   Count    %

Disability category
  Specific learning disability                                                                78   72
  Mental retardation                                                                          11   10
  Emotional/behavioral disorder                                                                8    7
  Autism                                                                                       3    3
  NOS (Not otherwise specified)                                                                9    8

Grade-level academic content standards-aligned IEP
  Yes                                                                                         69   63
  No                                                                                          15   14
  Unsure                                                                                      20   18

Qualitative analysis of teacher reports of student competency with grade-level reading skills
  Number of narrative descriptions detailing student mastery of grade-level content           9    8
  Number of narrative descriptions detailing student progress/growth on grade-level content 107   98
  Number of narrative descriptions detailing deficient student progress with grade-level
    content                                                                                  109  100


Table 17.3 Pennsylvania GSEG survey: summary of teacher data

Topic area: Teacher information                                             Count    %

Total teaching experience
  Teaching 1–5 years                                                           33   30
  Teaching 6–10 years                                                          22   20
  Teaching 11–15 years                                                         11   10
  Teaching 16–20 years                                                         16   15
  Teaching > 20 years                                                          27   25

Experience teaching reading/language arts
  Teaching 1–5 years                                                           48   45
  Teaching 6–10 years                                                          20   19
  Teaching 11–15 years                                                          6    6
  Teaching 16–20 years                                                         15   14
  Teaching > 20 years                                                          17   16

Experience teaching students with disabilities
  Teaching 1–5 years                                                           34   31
  Teaching 6–10 years                                                          23   21
  Teaching 11–15 years                                                         17   16
  Teaching 16–20 years                                                         12   11
  Teaching > 20 years                                                          23   21

Advanced/additional degrees
  Education beyond bachelor's degree                                           93   85
  Provisional teaching certificate                                             32   29
  Permanent teaching certificate                                               72   66
  Dually certified in elementary education                                     51   47
  Dually certified in secondary education-English/LA                           22   20

Professional development/training in Standards-aligned Instruction/IEPs for students with disabilities
  Attended training monthly                                                     0    0
  Attended training several times per year                                     61   56
  Attended training once per year                                              16   15
  Never attended training                                                      23   21
  Unsure if they have attended training                                         9    8
  Reported that training impacted instructional practice                       57   52

Overall preparedness to align instruction/IEPs with grade-level academic content standards
  Extremely prepared                                                           44   40
  Somewhat prepared                                                            49   45
  Minimally prepared                                                            5    5
  Unprepared                                                                   11   10
  Unsure                                                                        0    0

Eleventh-grade teacher reported preparedness to align instruction/IEPs with grade-level academic content standards
  Extremely prepared                                                            4   16
  Somewhat prepared                                                             6   23
  Minimally prepared                                                            5   20
  Unprepared                                                                   10   40
  Unsure                                                                        0    0


Table 17.4 Pennsylvania GSEG survey: snapshot of targeted eleventh-grade students

Topic area: Targeted student information                                    Count    %

Primary instructional setting for reading instruction
  General education classroom                                                  23   94
  Special education setting (resource room, learning support,
    pull-out support, etc.)                                                     2    6

Opportunity to learn grade-level content

                                                            Not Taught    Exposure     Mastery      Unsure
Assessment anchor descriptor                                Count   %     Count   %    Count   %    Count   %

Understands fiction appropriate to grade level:
  R11.A.1.1 Identify and apply the meaning of vocabulary      17   68       5   20       1    4       1    4
  R11.A.1.2 Identify and apply word recognition skills        18   72       5   20       2    8       –    –
  R11.A.1.3 Make inferences, draw conclusions, and make
    generalizations based on text                             20   80       3   12       –    –       2    8
  R11.A.1.4 Identify and explain main ideas and relevant
    details                                                   18   72       5   20       1    4       –    –
  R11.A.1.5 Summarize a fictional text as a whole             18   72       5   20       2    8       –    –
  R11.A.1.6 Identify, describe, and analyze genre of text     18   72       5   20       2    8       –    –

Understands nonfiction appropriate to grade level:
  R11.A.2.1 Identify and apply the meaning of vocabulary
    in nonfiction                                             17   68       5   20       –    –       2    8
  R11.A.2.2 Identify and apply word recognition skills        18   72       5   20       2    8       –    –
  R11.A.2.3 Make inferences, draw conclusions, and make
    generalizations based on text                             20   80       3   12       –    –       2    8
  R11.A.2.4 Identify and explain main ideas and relevant
    details                                                   18   72       5   20       1    4       –    –
  R11.A.2.5 Summarize a nonfictional text as a whole          18   72       5   20       1    4       –    –
  R11.A.2.6 Identify, describe, and analyze genre of text     16   64       5   20       2    8       2    8

Understands components between and within texts
  R11.B.1.1 Interpret, compare, describe, analyze, and
    evaluate components of fiction and literary nonfiction    22   88       3   12       –    –       –    –
  R11.B.1.2 Make connections between texts                    19   76       5   20       1    4       –    –
  R11.B.2.1 Identify, interpret, describe, and analyze
    figurative language and literary structures in fiction
    and nonfiction                                            21   84       2    8       1    4       1    4
  R11.B.2.2 Identify, interpret, describe, and analyze the
    point of view of the narrator in fictional and
    nonfictional text                                         19   76       4   16       1    4       1    4
  R11.B.3.1 Interpret, describe, and analyze the
    characteristics and uses of facts and opinions in
    nonfictional text                                         18   72       5   20       2    8       –    –
  R11.B.3.2 Distinguish between essential and nonessential
    information within or between texts                       18   72       5   20       2    8       –    –
  R11.B.3.3 Identify, compare, explain, interpret, describe,
    and analyze how text organization clarifies meaning of
    nonfictional text                                         20   80       2    8       –    –       3   12

Not Taught = respondents reported that the grade-level skill(s) addressed by the assessment anchor was/were "Not Taught" to the target student; Exposure = respondents reported that the student was "Exposed To" the grade-level skill(s) addressed by the assessment anchor; Mastery = respondents reported that the grade-level skill(s) addressed by the assessment anchor was/were taught "To Mastery"; Unsure = respondents reported they were "Unsure" as to whether or not any level of instruction had occurred for the grade-level skills addressed by the assessment anchor



Respondents overwhelmingly (72%) identified their lowest-performing students with specific learning disabilities as the population of students in need of an alternate assessment that better captures what those students know and can do (Table 17.2). Overall, survey findings raise serious concerns about target students' opportunity to learn grade-level academic content. When asked to briefly describe the reading skills that targeted students had "mastered," only 8% reported that their lowest-performing students with IEPs had mastered any grade-level skills whereas a majority (98%) reported that students were instead "making progress/demonstrating growth" with the support of significantly adapted and modified instructional techniques and materials (Table 17.2). Analysis of teachers' narrative descriptions of students' academic strengths and weaknesses revealed that targeted students were making the most progress on grade-level skills related to Language Arts goals such as spelling, writing mechanics, and grammar.

Conversely, grade-level reading-related skills, specifically comprehension and vocabulary tasks, were highlighted as most troublesome for students. In fact, 64% of the present level of academic performance details on the target students' IEPs summarized reading assessment data, evidencing that fifth-grade students were achieving approximately two grade levels below peers. In comparison, 60 and 72% of the IEPs for targeted eighth- and eleventh-grade students, respectively, suggested they were 3–4 years behind. Additionally, while 63% of teachers overall reported that students' reading IEP goals were aligned with grade-level standards (Table 17.2), analysis of actual IEP goals revealed that over 90% of those goals focused on improving instructional-level reading fluency and 58% of those goals focused on applying grade-level comprehension skills to instructional-level text. In contrast, 14% indicated that target students' IEP goals were not aligned with grade-level standards given how far below grade level the students were performing (Table 17.2). Instead, these IEPs favored use of significantly modified and adapted instructional materials to promote growth and progress at the students' individual instructional level as measured by progress-monitoring and curriculum-based measurement. Interestingly, 18% of survey participants reported that they were "unsure" as to whether or not students' IEPs were standards-based (Table 17.2). The primary reason noted for this confusion was a lack of professional development/training in standards-based IEP development.

On a positive note, the majority of teacher respondents reported feeling "extremely" (40%) or "somewhat" (45%) prepared to develop standards-aligned IEPs and deliver standards-aligned instruction (Table 17.3). However, a closer examination of response trends revealed that elementary-level teachers reported feeling more knowledgeable about and better equipped to teach grade-level content to students than did eighth- or eleventh-grade teachers. Specifically, only 16% of the eleventh-grade respondents rated themselves as feeling "extremely" prepared to align instruction to grade-level academic content. Another 23% rated themselves as feeling only "somewhat" prepared, while the remaining 60% reported feeling "minimally" prepared or "unprepared" to align their curricula and instruction with assessment anchors and content standards (Table 17.3). (Given the large number and broad scope of Pennsylvania's academic content standards, the state developed a subset of "Assessment Anchors," which focused on the grade-level standards assessed at each grade level on the state test to better equip teachers to prepare students for the test.) Moreover, when asked to review assessment anchors and eligible reading content for the eleventh-grade PSSA and indicate the level of instruction (i.e., "Not taught," "Exposure," "Mastery," or "Unsure") they provided to targeted students in special education, no more than 3 teachers out of 25 rated any grade-level content as "Taught to Mastery" (Table 17.4). Five teachers rated the majority of eleventh-grade content as instructed at the level of "Exposure" (Table 17.4). Teachers' narrative notes indicated that students were introduced to the big ideas related to the eleventh-grade content standards through significantly modified or adapted instructional materials and assignments.

IEP data reinforced these assertions, indicating that below-level text, text readers, and books on tape were required as part of the specially designed instruction provided to students to promote curricular access. The remaining eleventh-grade surveys (ranging from 16 to 21 respondents for any given assessment anchor) reported that the bulk of the eligible eleventh-grade content was "Not Taught" to the target students (Table 17.4). Teacher comments suggested instead that those students were working on mastering reading and language arts goals directly related to transitional planning for post-school life. Respondents' narrative notes explained that had assessment anchor language not included "appropriate to grade level text," they could have rated all content as being instructed at the level of "Exposure" or "Mastery." All eleventh-grade respondents expressed some level of concern that the state assessment anchors and eligible test content related to comprehension and reading skills require that those skills be applied to "grade level text." (In Pennsylvania, as in many other states, "grade level text" equates to permissioned passages selected from a variety of eleventh-grade reading material, considered to be "authentic" literature, such as excerpts from novels, classic literature, and high school texts/anthologies.) Respondents reported that unless below grade-level reading passages were used, it would be highly unlikely for the students they saw as the target population to demonstrate proficiency, regardless of what types of accommodations were applied to the test.

Ninety-four percent of the eleventh-grade teachers who responded to the surveys indicated the general education setting was the primary place of instruction for all reading instruction provided to the target students (Table 17.4). Three out of four of these reported co-teaching with the general education Reading/Language Arts teacher; the remaining teachers indicated that they consulted with the general education teacher to plan for differentiated instruction to support the target students' learning during reading time without special education teacher support. This is somewhat disconcerting given the limited research base supporting the positive effects of the co-teaching/consultation model and differentiated instruction on the academic learning of students with disabilities (especially those with mild disabilities) who are fully included in the general education classroom (Fuchs & Fuchs, 1998; O'Sullivan, Ysseldyke, Christenson, & Thurlow, 1990; Zigmond & Matta, 2004).

Overall, survey results demonstrated that respondents clearly felt that the AA-MAS would be most appropriate for a small group of students who had previously performed at the lowest performance level (below basic) on the state assessment. However, the respondents also expressed great concern regarding how to make the test challenging and aligned to grade-level standards, and at the same time "doable" for a group of students who are significantly behind a majority of their grade-level peers. Further, the limited connection between grade-level standards and instruction provided to students the respondents identified as potential AA-MAS takers raises questions regarding the appropriate content for this assessment.

Focus Group

To extend the information provided by the survey, a series of stakeholder focus groups were convened statewide. In all, 110 participants (including parents of students with disabilities, general and special educators, state-level personnel, content area teachers, administrators, school psychologists, curriculum specialists, teacher trainers, and related service personnel) reviewed the federal guidance about the AA-MAS to discuss issues related to identifying the target population, developing modified academic achievement standards, and anticipating the implications and potential impact of the AA-MAS on educational and assessment experiences of students with disabilities. Three major themes arose from these discussions. First, a majority of focus group participants agreed that the students most appropriate for the test are those who are "far below" grade level. The rationale for this choice centered on many of the points already raised in this chapter and in other publications on the 2% option. Most salient, however, was the belief that targeting only the lowest achievers would accomplish two goals: (a) it would guard against over-identification of students for this "special education test" and (b) it would prompt the state to focus its efforts on improving the regular assessment in terms of universal design and effective accommodations so that more students could be proficient on the regular assessment. However, concern that a "grade-level" test would still be too difficult for very low-performing students was very high, especially if reading passages on the AA-MAS were identical to (or equal in reading level to) those on the regular test. Nonetheless, some argued that any efforts to make the test more accessible and the testing experience more amenable for students so used to failure would be a welcome boost of self-confidence and decrease in stress level when taking the test – all realistic benefits given the unrealistic likelihood of proficiency.

Next, focus group participants' concerns revolved around the notion of "pretending" that some students are proficient fifth graders (or sixth graders, or seventh graders, etc.) when they are in fact significantly struggling with academic content 2 or 3 years below grade level. Participants had a difficult time resolving the fact that while the AA-MAS may result in improved testing experiences for severely struggling students with disabilities, it may not result in improved instructional experiences for those students. These sentiments are echoed in the literature base (see Elliott, Kettler, & Roach, 2008; Marion, 2007).

Finally, discussion about developing modified academic achievement standards revealed that participants were greatly confused by changes in the federal requirements related to the rigor of the AA-MAS. All participants enthusiastically supported the idea that the modified achievement standards would reflect reduced breadth or depth of grade-level content. Paring down and prioritizing the academic skills addressed on the assessment paralleled what they viewed as crucial everyday instructional practices to promote student learning (Carnine, 1994; Kame'enui, Carnine, Dixon, Simmons, & Coyne, 2002). In fact, it is what most of the special educators said that they were trained to do. Therefore, the revisions to federal guidance deleting the reference to "reduced breadth or depth" and inserting the phrase "modified academic achievement standards must be challenging for eligible students, but may be less difficult than grade-level academic achievement standards" (§ 200.1 (e)(1)(ii)) were seen as being quite perplexing. Confusion over the terminology "less difficult" abounded. Frustrations with the apparent disconnect between the assessment requirements and real-world instructional practices were clear.

Analysis of PSSA Performance Trends for Students in Special Education

A third activity that the GSEG group conducted to assist in making recommendations about the AA-MAS was a trend analysis. Three consecutive years (2006, 2007, 2008) of end-of-year PSSA performance data were gathered for four cohorts of students in special education (enrolled in grades 3, 4, 5, and 6 during the 2006–2007 school year). Data were gathered on students who had an IEP at any point during the 3-year window and who had at least 2 years of reported PSSA scores.

The analysis was a simple examination of movement between proficiency levels across years. Results, displayed in Table 17.5, are presented only for reading; performance for math followed a similar pattern. First, movement out of the below-basic classification happens rarely; 71–88% of students who perform in the below-basic range repeat this level of performance the following year. The percentage of students who do move from below-basic to proficient or advanced in 1 year is fairly small (with percentages around 4% excluding the high of 10% for the sixth-grade cohort's 2007–2008 performance). In comparison, an average of 29% of students who score in the basic range moved up to proficiency the following year. Unfortunately, an average of 33% of students moved down to the below-basic range in the subsequent year. These data provided some evidence that if any students in special education are likely to move from one of the failing categories of performance into a passing one, it would most likely be students who scored in the basic level – a possible reason to exclude such a student from taking the AA-MAS.


Table 17.5 Analysis of PSSA reading performance trends for children in special education

Grade 3 cohort

2006–2007                  2007 Performance
2006 Performance           Below basic     Basic           Proficient      Advanced
  Below basic              4990 (79%)      1029 (16%)      310 (5%)        16 (0.3%)
  Basic                    700 (28%)       931 (38%)       775 (31%)       61 (3%)
  Proficient               218 (7%)        628 (21%)       1652 (56%)      454 (15%)
  Advanced                 28 (2%)         59 (5%)         470 (39%)       656 (54%)

2007–2008                  2008 Performance
2007 Performance           Below basic     Basic           Proficient      Advanced
  Below basic              4753 (88%)      554 (10%)       104 (2%)        8 (1%)
  Basic                    1127 (49%)      821 (36%)       339 (15%)       21 (1%)
  Proficient               359 (14%)       846 (34%)       1129 (45%)      163 (7%)
  Advanced                 28 (4%)         59 (7%)         383 (48%)       331 (41%)

Grade 4 cohort

2006–2007                  2007 Performance
2006 Performance           Below basic     Basic           Proficient      Advanced
  Below basic              6420 (88%)      679 (9%)        165 (2%)        5 (0.1%)
  Basic                    1843 (50%)      1340 (36%)      496 (13%)       12 (0.3%)
  Proficient               567 (16%)       1254 (35%)      1557 (44%)      194 (5%)
  Advanced                 32 (3%)         73 (6%)         647 (54%)       445 (37%)

2007–2008                  2008 Performance
2007 Performance           Below basic     Basic           Proficient      Advanced
  Below basic              5903 (74%)      1658 (21%)      428 (5%)        16 (0.2%)
  Basic                    633 (22%)       1141 (40%)      999 (35%)       68 (2%)
  Proficient               144 (7%)        426 (20%)       1149 (53%)      441 (20%)
  Advanced                 7 (2%)          9 (3%)          121 (33%)       226 (62%)

Grade 5 cohort

2006–2007                  2007 Performance
2006 Performance           Below basic     Basic           Proficient      Advanced
  Below basic              7768 (77%)      1989 (20%)      347 (3%)        18 (0.2%)
  Basic                    849 (27%)       1390 (44%)      843 (27%)       64 (2%)
  Proficient               219 (8%)        661 (24%)       1372 (49%)      528 (19%)
  Advanced                 13 (2%)         31 (6%)         155 (29%)       339 (63%)

2007–2008                  2008 Performance
2007 Performance           Below basic     Basic           Proficient      Advanced
  Below basic              5883 (74%)      1712 (22%)      346 (4%)        9 (0.1%)
  Basic                    931 (26%)       1552 (43%)      1081 (30%)      59 (2%)
  Proficient               130 (6%)        446 (20%)       1260 (57%)      375 (17%)
  Advanced                 3 (0.5%)        24 (4%)         181 (28%)       445 (68%)

Grade 6 cohort

2006–2007                  2007 Performance
2006 Performance           Below basic     Basic           Proficient      Advanced
  Below basic              7202 (81%)      1423 (16%)      208 (2%)        18 (0.2%)
  Basic                    1547 (36%)      1858 (43%)      880 (20%)       60 (1%)
  Proficient               261 (9%)        876 (29%)       1481 (49%)      411 (14%)
  Advanced                 15 (2%)         34 (4%)         306 (34%)       545 (61%)

2007–2008                  2008 Performance
2007 Performance           Below basic     Basic           Proficient      Advanced
  Below basic              6761 (71%)      1780 (19%)      787 (8%)        201 (2%)
  Basic                    1016 (23%)      1375 (31%)      1739 (39%)      381 (8%)
  Proficient               115 (4%)        324 (11%)       1290 (45%)      1159 (41%)
  Advanced                 12 (1%)         10 (1%)         123 (13%)       770 (84%)


Recommendations from the GSEG

Based on the results of the survey, focus groups, examination of performance trends, and ongoing discussions between the GSEG team members, the group developed a set of recommendations for state leaders as they moved forward with development of Pennsylvania's modified exam. The GSEG team provided these recommendations to the Pennsylvania Bureaus of Special Education (BSE) and Accountability and Assessment (BAA). A summary of the recommendations is provided next.

Students who have previously performed in the lowest performance category on the regular assessment should be eligible to take the AA-MAS. According to federal guidance, the modified assessment was intended for "a group of students with disabilities who can make significant progress, but may not reach grade-level achievement standards within the same time frame as other students, even after receiving the best-designed instructional interventions from highly trained teachers" (Title I—Improving the Academic Achievement of the Disadvantaged, 2005). The GSEG team interpreted this to mean that the AA-MAS is intended for a group of students who, despite the best efforts of their teachers, cannot reasonably be expected to achieve grade-level performance within one academic year. The team recommended that students performing in the "low below basic range" (the lowest possible performance category) on the state assessment are the students most likely to fall within this category. It is unlikely that any of them, even those who make significant academic progress in 1 year, will achieve proficiency. Selecting this lowest-performing group of students with disabilities, not those who are performing slightly below expectations (i.e., "almost made proficient"), is consistent with federal guidance and within state mandates to hold students with disabilities to the highest possible standards of performance.

Zigmond and Kloo (2009) debated the consequences associated with targeting "almost proficient" students versus "nowhere near proficient" students for the AA-MAS. The dangers of over-assignment to the test were central to our concerns about permitting students close to achieving at grade level to participate in the AA-MAS. This group of students logically fits the USDE's designation of students likely to make significant progress, just not enough progress within a year's time. However, if those students participate in the AA-MAS, so will their lower-performing peers, because it would be impossible to justify offering a less difficult, more accessible test to students close to mastering grade level but a more difficult, less accessible test to their severely struggling classmates. As such, the intention that only a small percentage of students (i.e., about 2%) would take part in the modified assessment would be missed. Further, depending on standard setting, it is possible that more students than intended could achieve proficiency on the AA-MAS – and these "extra" students would not count toward AYP.

Allowing all (or the majority of) non-proficient students with disabilities to take the AA-MAS creates a special education track that is fundamentally misaligned with the intent of IDEA and efforts for inclusive practices. The GSEG team suggested that if all or most students with disabilities who perform below proficient are allowed to take the AA-MAS, the new test would become the "special education test" and all students in special education would be held to modified or "easier" achievement standards. This could result in teachers prioritizing some skills over others, spending more or less time on certain skills and concepts than others, and differentiating between the skills students with IEPs should master and those their non-disabled peers should master. This would set back decades of progress, and the ideals of IDEA and NCLB for promoting high expectations for students with disabilities in the spirit of educational accountability, standards-based reform, and full inclusion would be diminished (Samuels, 2007; Zigmond & Kloo, 2009).

In contrast, targeting the population of students who are very far from meeting proficiency standards captures a smaller number of students. Unfortunately, such consistently depressed academic performance suggests that these students achieve significantly below grade level. Much research, and our own survey data from teachers, show that this achievement gap only continues to widen as students progress through school, especially for students with learning disabilities, who constitute the largest proportion of low-performing students with disabilities (Wagner, Newman, Cameto, Levine, & Garza, 2006; Schiller, Sanford, & Blackorby, 2008). Although it is difficult to rationalize that they are the students the USDE expects to eventually "catch up" to their peers, it is this population of students for whom a less rigorous assessment seems the most ethically and educationally appropriate given their significant learning struggles (Zigmond & Kloo, 2009). That said, these students (particularly in the upper grades) are so far behind academically that if the test measures only grade-level content, it may be absurd to deem them "proficient" based on what they can actually do with the grade-level material – a point highlighted by our teacher interviews. Thus, if this is the target population, the strict adherence to grade-level content may need to be reconsidered.

It will be easier to rationalize/justify (i.e., explain to parent groups, advocates, and school personnel) the use of a "different" test for the lowest-performing group of students with IEPs than for students currently scoring in the basic range. GSEG team members suggested that students scoring in the lowest performance range are those whose disability truly precludes achievement of grade-level proficiency and whose persistent academic difficulties make them unlikely to achieve grade-level proficiency within 1 year even with intense, research-based, remedial instruction. These are the students who perform hardly above the chance level on the multiple-choice sections of the reading, math, and science tests. Some may argue that this population of students needs a modified test so they can demonstrate what they know and can do. However, many of these students are likely already demonstrating what they can do with grade-level content – which is not much. Thus, we argue that while the goal of enhancing the accessibility of achievement tests is important, it may also be necessary to assess other, likely below-grade-level, skills if we are to create a test on which this group of students can honestly demonstrate "proficiency."


It seems much more difficult to justify redefining what it means to be proficient and providing a "different" or "easier" test to a group of students who are almost proficient. Students in the basic range are "almost" proficient. The performance of this group of students might be improved by more intensive instruction and by more attention paid to making the standard PSSA more accessible through more and better application of the principles of universal design for learning (UDL). Focusing efforts on improving the regular PSSA would result in a better measure of student ability for students scoring in the high basic range than the AA-MAS.

Targeting students scoring in the low below-basic group positions districts to increase the percentage of students counting toward AYP. Because at least some of the students with IEPs, mostly those scoring in the basic range, will achieve proficiency on the state exam during the next school year, the GSEG team noted that assigning to the modified assessment students who might achieve proficiency on the standard assessment, if they and their teachers really did expend maximum effort, could actually reduce the proficiency count for students with IEPs, not increase it. If you assign to the AA-MAS both students very unlikely to be proficient next year (i.e., below-basic students) and students who have some possibility of being proficient next year (i.e., basic students), the total percent of proficient students based on the AA-MAS would be equal to 2% of the tested population. However, if the students who had a possibility of being proficient were assigned to the regular assessment and they did achieve proficiency, schools could count a number of AA-MAS scores equivalent to 2% of the tested population plus the students who achieved proficiency on the regular test.

In other words, if the modified assessment is targeted toward the lowest-performing group of students with disabilities who take the regular assessment, schools will be able to count as proficient a number of these students equivalent to 2% of the tested population and any higher-performing students who achieve proficiency on the regular assessment. If, instead, the highest-performing group of students with disabilities is allowed to take the AA-MAS, the maximum number of students who will be counted as "proficient" will be capped at 2%. This number reflects the reality that (a) IEP teams are unlikely to assign higher-performing students to the AA-MAS while requiring the regular test of their lower-performing peers, and (b) even if lower-performing students are required to take the regular test, their likelihood of scoring as proficient is quite low. Put differently, the higher the performance level of students who are allowed to take the AA-MAS, the lower the total number of students with disabilities who will count as "proficient."
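To make the counting argument concrete, the sketch below works through a hypothetical school under the best-case assumption used in this chapter (every AA-MAS score up to the cap counts as proficient). All of the student counts are invented for illustration; only the 2% cap comes from the federal guidance discussed above.

```python
# Hypothetical illustration of the counting argument; all counts are invented.

TESTED_POPULATION = 1000   # students tested at an imaginary school
AA_MAS_CAP = 0.02          # share of the tested population whose AA-MAS
                           # proficient scores may count toward AYP

basic_students = 40        # "almost proficient" students with IEPs
basic_pass_regular = 12    # of those, the number who would pass the regular test
below_basic_students = 60  # "nowhere near proficient" students with IEPs

cap_count = int(AA_MAS_CAP * TESTED_POPULATION)   # 20 countable AA-MAS scores

# Policy A: basic and below-basic students all take the AA-MAS.
# Proficient AA-MAS scores beyond the cap do not count toward AYP.
policy_a = min(cap_count, basic_students + below_basic_students)

# Policy B: only below-basic students take the AA-MAS; basic students take
# the regular assessment, and those who pass count with no cap.
policy_b = min(cap_count, below_basic_students) + basic_pass_regular

print(f"Policy A (all non-proficient students on the AA-MAS): {policy_a}")
print(f"Policy B (only below-basic students on the AA-MAS):   {policy_b}")
```

Under these made-up numbers, reserving the AA-MAS for the lowest performers yields 32 countable proficient students rather than 20, which is the arithmetic behind the recommendation.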

An underlying assumption of this argument is that schools would have a sufficient number of below-basic students who could achieve proficiency on the modified assessment to reach the maximum 2% count. Based on our review of school data, we believe that schools will have an ample number of below-basic students. However, we acknowledge the reality that the design of each state's AA-MAS and the related standards setting in each state will influence which students will be able to achieve proficiency on each state's individual test (i.e., the more closely related to the regular assessment, the less likely that the most far-behind student will be able to demonstrate proficiency). Regardless of the specific test, though, it is fairly clear that wherever the cut-point for assigning students into the AA-MAS is set, schools will only be able to count around 2% of students below that cut-point toward AYP.

This fact may seem obvious, but it is important to consider the consequences of allowing a larger number of students to take the AA-MAS. Focus group participants believed that regardless of which group of students is identified as the target population, all students with disabilities scoring at that level and below it would also be identified. Specifically, if students scoring in the basic range are eligible, all students not achieving proficiency (thus, those in the below-basic range) will also be eligible by definition. As one participant who is a special education teacher and parent of a student with a disability commented, "One wouldn't expect students who are close to proficiency to take a 'modified' test while their lower-performing peers are expected to take the regular test." Approximately 69% (about 97,000) of students with IEPs in Pennsylvania were not proficient on the 2008 PSSA. Thus, IEP teams would be likely to allow more than 60% of students with IEPs (that's almost 6% of the students being tested each year) to take the AA-MAS. The USDE is clear in its guidance that the AA-MAS is intended for a "small group of students with disabilities." Targeting students in the basic performance categories would result in an "over-identification" for participation in the AA-MAS. This would likely impact a much larger percentage of students than was intended by the legislation.

Additional Considerations and Unintended Consequences

The recommendations of the GSEG leadership team provided guidance to the state that the target population should be a group of students in special education who have been performing consistently at the lowest level of performance on the regular state assessment. This guidance, however, did not address some of the larger issues with which school districts are currently grappling. We next address some of the most common reasons given for including a modified assessment as another option in the testing system and provide an argument for why some of these do not appear to be justifiable. Additionally, we examine what may be unintended consequences of focusing on the development of an alternate test to improve accessibility as opposed to improving the regular assessment and considering the learning expectations for students in special education.

Commonly Provided Rationale for the AA-MAS

The 2% test will help us meet AYP. One rationale for the development of the AA-MAS was to create a better measure that would allow more students to demonstrate what they know. However, it appears that another strong motivation for developing the AA-MAS option was to provide schools with a way to decrease the negative impact of the test performance of students in special education on adequate yearly progress. Under NCLB's AYP system, schools must show that increasingly higher percentages of all students are passing the state assessment. This includes students in the subcategory groups of special education, English language learners, minorities, and students from low-income families. As the percentage of students required to demonstrate proficiency has increased, a larger number of schools have failed to meet AYP solely based on the performance of their special education students. The 2% test seems to be an appeasement to district and state leaders who have protested the notion that students enrolled in special education should take the same assessment as their typically developing peers (Lazarus & Thurlow, 2009). The fact that states in the planning phases of developing an AA-MAS were permitted to automatically begin counting this additional 2% as a proxy supports the claim that improving AYP counts was a prime motivation for the modified assessment (Thurlow, 2008).

A deeper look at how counting a small group of previously non-proficient special education students as proficient affects whether a given school meets AYP reveals that the benefit will actually accrue to only a few schools, that the benefit may only help for a few years as the AYP target goal increases, and that the benefit will likely not help any school meet the requirement of having all students with IEPs proficient in math and reading by 2014. We note that this final point may be moot contingent upon the reauthorization of NCLB and the restructuring of the 100%-proficient-by-2014 policy. However, regardless of how AYP is reconceptualized, the benefits from the 2% option in terms of "AYP math" for any given school appear scant.

We acknowledge that many states are using additional calculations, including confidence intervals, proxy counts, and growth models, to calculate proficiency. For demonstration purposes, we are presenting a simplified, best-case version of the AYP calculation in which the full 2% can be counted as proficient; we acknowledge that some schools may be able to count a few more students as proficient depending upon the number counted under the AA-AAS (see Kettler & Elliott, 2009), but this does not change our overall point. In the most basic sense, the value of the AA-MAS in terms of meeting AYP is based on two numbers: the percentage of the student population enrolled in special education and the proficiency rate of this group of students. The lower the proportion of special education students in the general population, the greater the boost given by the AA-MAS. The "boost" is a simple calculation: 2% of the total student population, counted as newly proficient and expressed as a percentage of the students enrolled in special education. Thus, a school with a 15% special education population would be able to increase the proportion of its special education students meeting proficiency by 2/15, or 13.3 percentage points. And the greater the existing proficiency of the special education students, the more likely it is that the boost will be enough to meet the proficiency benchmark set by NCLB. The relationship between these two variables is illustrated in Table 17.6.

Table 17.6 Hypothetical examples of AYP benefits of a 2% increase in proficient special education students from the AA-MAS

Row | Total student population | % SpEd* | # SpEd | Actual % SpEd scoring proficient | Possible % SpEd proficient if full 2% from AA-MAS count | Resulting increase in % SpEd proficient | 2% helped school meet AYP
 1 | 700 |  6 |  42 | 30 | 63 | 33 | NO
 2 | 700 |  8 |  56 | 30 | 55 | 25 | NO
 3 | 700 | 10 |  70 | 30 | 50 | 20 | NO
 4 | 700 | 12 |  84 | 30 | 47 | 17 | NO
 5 | 700 | 14 |  98 | 30 | 44 | 14 | NO
 6 | 700 | 16 | 112 | 30 | 43 | 13 | NO
 7 | 700 | 18 | 126 | 30 | 41 | 11 | NO
 8 | 700 | 20 | 140 | 30 | 40 | 10 | NO
 9 | 700 |  6 |  42 | 40 | 73 | 33 | YES
10 | 700 |  8 |  56 | 40 | 65 | 25 | NO
11 | 700 | 10 |  70 | 40 | 60 | 20 | NO
12 | 700 | 12 |  84 | 40 | 57 | 17 | NO
13 | 700 | 14 |  98 | 40 | 54 | 14 | NO
14 | 700 | 16 | 112 | 40 | 53 | 13 | NO
15 | 700 | 18 | 126 | 40 | 51 | 11 | NO
16 | 700 | 20 | 140 | 40 | 50 | 10 | NO
17 | 700 |  6 |  42 | 50 | 83 | 33 | YES
18 | 700 |  8 |  56 | 50 | 75 | 25 | YES
19 | 700 | 10 |  70 | 50 | 70 | 20 | YES
20 | 700 | 12 |  84 | 50 | 67 | 17 | NO
21 | 700 | 14 |  98 | 50 | 64 | 14 | NO
22 | 700 | 16 | 112 | 50 | 63 | 13 | NO
23 | 700 | 18 | 126 | 50 | 61 | 11 | NO
24 | 700 | 20 | 140 | 50 | 60 | 10 | NO

*SpEd = Students receiving special education services
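For readers who wish to verify the arithmetic behind Table 17.6, the following sketch (ours, not part of the original analysis; the function name and the rounding are illustrative only) reproduces the calculation under the chapter's stated assumptions: a school of 700 students and the 2011 mathematics AYP target of 67% proficient.

```python
# Illustrative sketch of the Table 17.6 arithmetic (hypothetical helper, not from the chapter).
def aa_mas_boost(total_students, pct_sped, pct_sped_proficient, ayp_target=67.0):
    """Possible % of SpEd students counted proficient if the full 2% of the
    total population is added via the AA-MAS, and whether that meets AYP."""
    n_sped = total_students * pct_sped / 100           # number of SpEd students
    boost = (0.02 * total_students) / n_sped * 100     # 2% of all students, as % of SpEd
    possible = pct_sped_proficient + boost
    return round(boost), round(possible), possible >= ayp_target

print(aa_mas_boost(700, 6, 30))   # Row 1:  (33, 63, False)
print(aa_mas_boost(700, 6, 40))   # Row 9:  (33, 73, True)
print(aa_mas_boost(700, 12, 50))  # Row 20: (17, 67, False)
```

Row 20 also appears to explain the table's rounding: the displayed 67 is actually 66.7%, just short of the 67% target, which is why it is marked NO.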


In Table 17.6, the AYP benefit for schools with differing proportions of special education students and differing levels of special education proficiency is compared; a stable population of 700 is used for the comparisons. The first point is that the 2% assessment gives the largest boost to schools with the lowest percentage of special education students. Schools with 6% of their population in special education (rows 1, 9, and 17) get an increase of 33.3 percentage points, whereas schools in which 20% of students are in special education (rows 8, 16, and 24) gain only an additional 10 percentage points to count toward AYP. This realization is disconcerting because it appears that the benefit will not go toward many of the most struggling schools – many of the schools with the highest percentages of special education students are also schools serving greater numbers of minority students and students from economically disadvantaged families (see Skiba et al., 2008).

A second point is that even with the additional students counting toward AYP, only those schools that are already fairly close to the AYP target will surpass the goal with the additional count. In this demonstration, the AYP target is set at the 2011 level for mathematics, 67% of students achieving proficiency. In this case, there are only four schools for which the boost made a difference (rows 9, 17, 18, and 19) – all with lower percentages of special education students and all with higher levels of proficiency than that currently exhibited by many of our schools.

Another way to look at the impact is to compare the benefit for five actual campuses. Data from five Pennsylvania schools, renamed for purposes of this chapter, are presented in Table 17.7. The campuses have varying proportions of special education students (ranging from about 12% to about 35%) and varying levels of special education student proficiency (ranging from about 10% to 46%). However, at none of these campuses does the 2% addition bump the school up into meeting the 2011 AYP targets of 72% proficient in Reading and 67% proficient in Math. If this had been the 2010 school year (with slightly lower proficiency targets), the addition would have helped Robinson Elementary meet the target in mathematics only. Thus, the additional 2% of students from the IEP pool does not appear to help many schools make their AYP targets. Higher-SES schools that already have a relatively low number of special education students who are performing near proficiency will likely be the ones to see the benefit. Schools with a large proportion of special education students and schools with low proportions of special education students scoring proficiently will see no AYP benefit from the AA-MAS.

Table 17.7 Potential AYP benefits to five Pennsylvania schools from a 2% increase in proficiency from the AA-MAS

School | Subject | Total # tested | Total # tested SpEd* | % SpEd | Actual % SpEd scoring proficient | Possible % SpEd proficient including 2%
Wagner Elementary | Math | 191 | 47 | 24.61 | 10.00 | 18.13
Wagner Elementary | Reading | 191 | 47 | 24.61 | 15.00 | 23.13
Stargell Elementary | Math | 680 | 81 | 11.91 | 30.00 | 46.79
Stargell Elementary | Reading | 679 | 80 | 11.78 | 40.00 | 56.98
Clemente Elementary | Math | 141 | 49 | 34.75 | 44.00 | 49.76
Clemente Elementary | Reading | 141 | 49 | 34.75 | 21.00 | 26.76
Mazeroski Elementary | Math | 355 | 42 | 11.83 | 39.00 | 55.90
Mazeroski Elementary | Reading | 354 | 41 | 11.58 | 37.00 | 54.27
Robinson Elementary | Math | 467 | 56 | 11.99 | 46.00 | 62.68
Robinson Elementary | Reading | 468 | 56 | 11.97 | 45.00 | 61.71

*SpEd = Students receiving special education services
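The same arithmetic appears to generate the final column of Table 17.7; for example, the mathematics row for Wagner Elementary (a renamed school; values taken from the table above):

```python
# Illustrative check of one Table 17.7 row (Wagner Elementary, Math).
total_tested, n_sped, actual_pct = 191, 47, 10.00
boost = (0.02 * total_tested) / n_sped * 100   # about 8.13 percentage points
possible = actual_pct + boost                  # about 18.13% of SpEd students proficient
print(round(possible, 2))                      # 18.13 – far below the 67% math target
```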


In many ways, the benefit of the 2% test is unfairly distributed – schools with the greatest resources get a much greater benefit than those with fewer resources.

However, if the reauthorization of ESEA maintains the current target of having 100% of students in special education on grade level by 2014, or if this goal is simply moved further out in time, the benefit of the 2% test in terms of AYP will eventually expire for virtually every school. Unfortunately, for schools like Mazeroski Elementary, the most feasible option for meeting the AYP benchmark for the IEP subgroup would be to decrease the number of students with IEPs taking the exam to fewer than 40, below the minimum subgroup size counted for AYP – perhaps an IEP team could exit a student or two from special education, the school could encourage the students to transfer to another campus, or the administrators could encourage families to keep students home during the testing period. While this last set of circumstances may seem extreme, the pressure of meeting high-stakes assessment goals has led some to do even more questionable things (see Gabriel, 2010; Levitt & Dubner, 2005). Based on this information, we think that the AYP argument is not enough to justify including an AA-MAS in a state accountability system. The benefits are small and apply to only a few, primarily higher-SES, school districts. Another rationale is needed to justify the new assessment.

The 2% test will allow students with disabilities to better demonstrate their understanding of grade-level content. This rationale has a high level of face validity – it makes sense that if there is a group of students who are unable to demonstrate proficiency due to the nature of the assessment, then reasonable accommodations that do not invalidate the assessment should be provided. It is in this vein that states have developed accommodation guidelines for their regular state assessments (Thurlow, Elliott, & Ysseldyke, 1998). Some states, such as Pennsylvania, have different guidelines for accommodations that are allowed for all students and those that are allowed only for students in special education (Pennsylvania Department of Education, 2010).

The notion of creating a "better test" that allows more students to access the content and to demonstrate their knowledge is also supported by work in universal design, cognitive load theory, and item modification research. An evolving body of research has explored the application of the principles of architectural universal design (Center for Universal Design, 1997), meant to increase the accessibility of physical environments, to the educational realm. Universal design in instruction and assessment promotes accessibility of curricular or test content so that diverse populations of students can accurately demonstrate their knowledge, uninhibited by design limitations (Johnstone, 2003). Thompson and colleagues (2002) highlight seven elements of universally designed assessments: (1) inclusive assessment population, (2) precisely defined constructs, (3) accessible, non-biased items, (4) amenable to accommodations, (5) simple, clear, and intuitive instructions and procedures, (6) maximum readability and comprehensibility, and (7) maximum legibility – characteristics that are salient for accountability tests, considering IDEA 2004's requirements for accessible testing experiences for students with disabilities. Specifically, the law states that "the State educational agency (or, in the case of a district-wide assessment, the local educational agency) shall, to the extent feasible, use universal design principles in developing and administering any assessments" (§ 612(a)(16)(E)). Ultimately, the more universally designed the test, the greater the opportunity for students to demonstrate mastery of test content.

In addition to universal design, cognitive load theory (CLT) has influenced test development and assessment research. CLT research suggests that highly effective instruction is instruction that maximizes the learner's ability to manage the mental effort needed to work through tasks without exhausting working memory capacity (Sweller, 1994). Recently, researchers have turned their attention to applying CLT to the assessment of student learning (Kettler, Elliott, & Beddow, 2009). This work is particularly relevant to item development for an AA-MAS intended for a population of significantly struggling learners. Because a student's ability to demonstrate proficiency or mastery of a task is moderated by his or her ability to manage the cognitive load of the task, this research posits that highly effective items are those that minimize the cognitive demands of sorting through extraneous information that is not essential to the tested construct and maximize the mental effort that can be focused on the information central to successfully completing the task.

Furthermore, research on item modification suggests that altering items to reduce language load (e.g., simplifying sentence structure, segmenting text into manageable parts), to improve the construction of answer choices (e.g., shortening the item stem, ordering choices logically), to decrease the number of multiple-choice options from four to three, and to enhance the regular test format (e.g., adding white space, removing unnecessary visuals) may have significant positive effects on student test performance (Elliott et al., 2010; Hess, McDivitt, & Fincher, 2008; Rodriguez, 2005).

However, using the "better test" argument to support the development of a separate, modified test raises the question of whether the aim is to accommodate students or whether it is to modify the content the students are to learn. In the original guidelines, states were told that the AA-MAS could be based on a set of academic achievement standards that reflect a reduced breadth or depth of the grade-level content. In this case, students taking the AA-MAS could be expected to learn a little bit less than their grade-level peers or to do a little bit less with the grade-level content (i.e., cover the same content, but not be held accountable for processing the information at the same level of cognitive complexity). However, as Zigmond and Kloo (2009) highlighted, this notion doesn't make sense in a system where a grade-level test is given annually. If you only learn a portion of third-grade content, what are you supposed to learn in fourth grade? The rest of third-grade or a limited amount of fourth-grade content? And, if the answer is the latter, how will missing portions of each grade level affect students years down the line? What portions of the academic standards are nonessential pieces that can be removed with little to no effect as students progress through the school system?

But more recent regulatory guidance has removed the "reduced breadth and depth" language and has reiterated that the AA-MAS is an assessment that covers the same content as the regular test – it can be "a little less difficult" but it must be "a rigorous assessment of grade level knowledge" (Elliott et al., 2008). In addition, the test blueprints for the regular and AA-MAS assessments must be the same. It therefore appears that the test is to assess all of the same content, just in a more accessible way. If this is the intent of the AA-MAS, it is unclear why states are not focusing their efforts instead on making the regular test more accessible – particularly since we know that many students who are not in special education also struggle with accessing the regular assessment and fail to demonstrate proficiency. One argument for the different test may be that the types of accommodations that would be needed to make the test accessible might invalidate it (Palmer, 2009). For example, reading the text of a reading passage aloud to a student would invalidate an assessment of reading comprehension. But the same logic applies to the AA-MAS. If a student has the text of an AA-MAS reading comprehension test read aloud to her or him, the test is no longer a reading comprehension test and it cannot be used to deem a student proficient in grade-level reading comprehension. (For another viewpoint on why a different test may be the most accessible for students likely to qualify for the AA-MAS, see Kettler, 2011.)

We think that enhancing accessibility, by itself, is also not a strong enough rationale for developing an AA-MAS. Instead, we need to be clearer about what we want students taking the AA-MAS to learn and to demonstrate proficiency in. If it is grade-level content, we need to improve the regular assessment; if it is a modified version of grade-level content, we need to determine how to successfully move students through multiple years of reduced expectations in a way that is fair and that leaves students with a solid core of knowledge.

The 2% test will improve instruction for students with IEPs. One primary purpose of the current accountability system is to ensure that students are receiving high-quality instruction and that this instruction is improving their achievement. One potential reason to develop an AA-MAS is to improve the instruction provided to students in special education. If teachers are provided with a clearer vision of the critical skills to be taught to students taking the 2% test, it is likely that instruction will begin to focus on these skills and that higher scores on the assessment that measures them will follow. But this effect on instruction will likely depend on whether the AA-MAS is meant to provide accommodations that allow access to the grade-level test or whether it is aligned with a modified version of the academic standards. If the learning goals for students who take the 2% test are exactly the same as those for students taking the regular exam, it is unclear how this test could improve instruction – the goals for what a student is to learn are the same; only how this learning is measured has changed. On the other hand, if breadth or depth is reduced, teachers may have a greater ability to focus on what would be seen as a set of "essential" skills for the students taking the AA-MAS, a point noted in focus group discussions. This may yield positive results.

Quenemoen (2009) proposed another way in which the 2% test may improve instruction for students in special education. Quenemoen argues that there is likely a group of students who struggle to reach proficiency on the regular assessment, who were placed into the AA-AAS by their IEP team, but for whom the alternate achievement standards are inappropriately unchallenging. If a test that is more academically challenging than the AA-AAS, but less challenging than the regular assessment, is available, teachers may raise their expectations for this group of students and provide them instruction that better targets their instructional needs. We believe that this is the most likely scenario in which the 2% test would actually improve instruction; however, it will really depend on what the test looks like and how academically challenging it appears to be. If the target population for the test is not the type of student proposed by Quenemoen and the test instead appears to be a slightly easier version of the regular assessment, it is unlikely that the instruction provided to this group of students will significantly change.

The 2% test is the ethical thing to do. This argument for implementing the AA-MAS is perhaps the most legitimate. If there is a student who will not pass the regular test, why not give her or him a shorter, easier exam to make for a more pleasant testing experience? Also, if there are students who cannot reasonably be expected to meet the learning expectations of the regular exam, it may be ethical to acknowledge this fact and to redefine the essential components of the academic standards that we really do expect all students to learn. And, if a student is unable to demonstrate her or his grade-level content knowledge because of the assessment, we should improve that assessment to ensure that the test is not the reason the student is performing poorly. All of these are good reasons to reexamine how we currently include students in special education in the accountability system. However, creating a better testing situation because it appears to be the right thing to do does not forestall unintended consequences. We discuss these next.

Unintended Consequences

Special education tracking 2010. If schools are given the leniency to redefine which components of the general education curriculum are most important for students in special education, this may enhance teachers' ability to focus instruction and to see improved student outcomes. However, it may also open the door for a new type of special education tracking. Once a student starts taking the AA-MAS and is, therefore, only expected to learn the "essential skills," it is likely the student would continue on the AA-MAS path. And students who are on the AA-MAS path from fourth grade to graduation will likely have substantially different learning outcomes and experiences from those who are not. Perhaps this is okay, and we should acknowledge that we have a different set of expectations not only for students with the most significant cognitive impairments (i.e., students who take the AA-AAS), but also for another group of students who are unlikely to be able to obtain mastery of the same amount of grade-level academic content as their typically developing peers.

Nonetheless, we feel that this may be one of the largest areas of confusion surrounding the 2% assessment. The students most likely to learn the largest portion of grade-level content are the students who are already performing close to proficient. This group, however, doesn't appear to be a logical target audience for the AA-MAS. Instead, the lowest-performing group of students in special education appears to be the better target group. But feedback from educators in the field suggests that if this is the target group, we need to consider what we can realistically expect in terms of academic performance. In a sense, it may be helpful to consider whether the 2% test is more like the 1% test (the AA-AAS) or more like a more accessible regular assessment. If the logic of the 1% applies, the strong focus on grade-level academic content may need to be reconsidered, and a redefined set of academic goals may be needed instead. Perhaps we should be a little more honest about which skills this group of students with a history of academic failure needs in order to obtain independence, employment, and happiness once they finish school. If, on the other hand, the logic is much more aligned with enhancing accessibility of the regular exam, perhaps we should redirect efforts to improving that assessment instead of creating a separate but supposedly equal alternate.

How much will the 2% test cost, and is this the best use of funds? Another consideration that does not appear to have been discussed is the fiscal cost of implementing the AA-MAS. Palmer (2009) reported that one state estimated that the cost of implementing the AA-MAS would be over $6 million. While we do not purport to be experts on school finance, it does seem fair to ask whether the real-world benefits of the AA-MAS are worth the financial costs – particularly at a time when many state departments of education and schools are facing budget crises. If the test does not meet the apparent goal of moving more school districts toward meeting AYP, perhaps these funds would be better directed toward other avenues. For example, the same funds and efforts could be directed toward improving the accessibility of the regular exam – an effort that would benefit many students, including those in general education. Or the funds could be focused on providing additional professional development and training to teachers aimed at enhancing the instruction provided to students in special education who are not making adequate growth. Further, states could even hire additional staff to provide more intensive intervention for students.

But are we not just pretending? Though it is often forgotten, the current accountability system is aimed at answering one very simple question: at the time of the assessment, what percentage of students in a given grade have learned a sufficient amount of grade-level content to be called proficient? Schools can demonstrate improvement by doing better (i.e., having more students meet proficiency) with each year's new group of students in a given grade. The system is not concerned with individual growth over time or with whether a higher percentage of students in a given cohort pass the exam each year. Thus, the measure is whether schools can do better with this year's group of third graders than they did with last year's, not whether more of the third graders did better when they got to fourth grade. One important factor in considering whether or not to adopt an AA-MAS is to think about the question for which an answer is sought. If we are asking whether more students in this year's group met grade-level expectations (and are, therefore, proficient), the test has to measure grade-level academic content.

If, on the other hand, we develop a substantially easier assessment and set the expectation for proficiency on that assessment low enough that schools can be assured that at least 2% of their population will achieve it, it seems that we are simply pretending. And pretending doesn't answer the original AYP question – in fact, it muddies the waters and makes answering the question even more challenging. Moreover, as states have been given the leeway to select different target populations (i.e., the "almost proficient" versus the "nowhere near proficient;" see Zigmond & Kloo, 2009) and to develop tests that are potentially of varying levels of difficulty, making comparisons of the proficiency of students across states becomes even more difficult.

Concluding Thoughts

We believe that pretending that students in special education have met grade-level expectations, when they have not actually done so, is not the honest solution. Instead, it is what we refer to as the "Unicorn syndrome" – the academic equivalent of gluing a horn to the forehead of a pony and telling her and others that she can fly. It is a disservice to the students, their parents, and their teachers to pretend they have achieved something they have not. The consequences of this wishful thinking will likely be the opposite of what is intended. It seems quite possible that allowing schools to falsely deem students as proficient lets schools off the hook, in a sense, in terms of what society expects them to accomplish (see Zigmond & Kloo, 2009). A more honest solution would be to acknowledge that special education is designed to meet the individual needs of a group of students who are in special education primarily because they have not been able to meet grade-level expectations. In this sense, perhaps, the accountability system should be asking a different set of questions for this group of learners – namely, how much academic growth occurred this year, and how much closer to achieving meaningful academic and social goals is each student?

In this light, maybe we also need to gain clarity on the purpose of special education. Is the aim to assist students with disabilities in meeting all of the goals of their general education peers? Or is the purpose to provide an individualized program aimed at helping students meet a set of goals that may look quite different from those of their general education peers? Or, maybe, the purpose is both, depending on the student? Fuchs, Fuchs, and Stecker (2010) argue that the student-level focus, which is the historic root of special education, needs to be brought back. The authors contend that special education should go "back to the future" and embrace the model of experimental teaching, in which special educators are expert instructors who are able to connect their instruction to each student's individual needs and to ensure that each student is making adequate progress toward her or his educational goals. This approach does not involve any pretending – instead, it honors both the teacher and the student, who are working to meet the challenges posed by disability, and it allows progress toward individualized goals to be rewarded. If special education as a field can refocus and move away from pretending, we will be able to reconnect with the "individualized" and "special" components of our profession and better prepare our students for success in the real world that they are about to enter.

References

Carnine, D. W. (1994). Introduction to the mini-series: Diverse learners and prevailing, emerging, and research-based educational approaches and their tools. School Psychology Review, 23(3), 341–350.

Center for Universal Design. (1997). What is universal design? Center for Universal Design, North Carolina State University. Retrieved June 9, 2010, from http://www.design.ncsu.edu

Elliott, S. N., Kettler, R. J., Beddow, P. A., & Kurz, A. (2011). Creating access to instruction and tests of achievement: Challenges and solutions. In S. N. Elliott, R. J. Kettler, P. A. Beddow, & A. Kurz (Eds.), Handbook of accessible achievement tests for all students (pp. 1–16). New York: Springer.

Elliott, S. N., Kettler, R. J., Beddow, P. A., Kurz, A., Compton, E., McGrath, D., et al. (2010). Effects of using modified items to test students with persistent academic difficulties. Exceptional Children, 76(4), 475–495.

Elliott, S. N., Kettler, R. J., & Roach, A. T. (2008). Alternate assessments of modified achievement standards: More accessible and less difficult tests to advance assessment practices? Journal of Disability Policy Studies, 19(3), 140–152.

Fuchs, D., Fuchs, L. S., & Stecker, P. M. (2010). The "blurring" of special education in a new continuum of general education placements and services. Exceptional Children, 76, 301–323.

Fuchs, L. S., & Fuchs, D. (1998). General educators' instructional adaptation for students with learning disabilities. Learning Disability Quarterly, 21(1), 23–33.

Gabriel, T. (2010, June 10). Under pressure, teachers tamper with tests. The New York Times. Retrieved June 22, 2010, from http://www.nytimes.com

Hess, K., McDivitt, P., & Fincher, M. (2008, June). Who are the 2% students and how do we design items and assessments that provide greater access for them? Results from a pilot study with Georgia students. Paper presented at the CCSSO National Conference on Student Achievement, Orlando, FL. Retrieved June 16, 2010, from http://www.nciea.org/publications/CCSSO_KHPMMF08.pdf

Individuals with Disabilities Education Act Amendments, Pub. L. No. 105-17. (1997). Retrieved January 21, 2010, from http://www.cec.sped.org/law_res/doc/law/index.php

Individuals with Disabilities Education Improvement Act, Pub. L. No. 108-446. (2004). Retrieved January 21, 2010, from http://www.ed.gov/policy/speced/guid/idea/idea2004.htm

Johnstone, C. J. (2003). Improving validity of large-scale tests: Universal design and student performance (Technical Report 37). Minneapolis, MN: University of Minnesota, National Center on Educational Outcomes. Retrieved June 13, 2010, from http://education.umn.edu/NCEO/OnlinePubs/Technical37.htm

Kame'enui, E. J., Carnine, D. W., Dixon, R. C., Simmons, D. C., & Coyne, M. D. (2002). Effective teaching strategies that accommodate diverse learners (2nd ed.). Upper Saddle River, NJ: Merrill Prentice Hall.

Kettler, R. J. (2011). Holding modified assessments accountable: Applying a unified reliability and validity framework to the development and evaluation of AA-MASs. In M. Russell (Ed.), Assessing students in the margins: Challenges, strategies, and techniques (pp. 311–333). Charlotte, NC: Information Age Publishing.

Kettler, R. J., & Elliott, S. N. (2009). Introduction to the special issue on alternate assessment based on modified academic achievement standards: New policy, new practices, and persistent challenges. Peabody Journal of Education, 84, 467–477.

Kettler, R. J., Elliott, S. N., & Beddow, P. A. (2009). Modifying achievement test items: A theory-guided and data-based approach for better measurement of what students with disabilities know. Peabody Journal of Education, 84, 529–551.

Lazarus, S. S., & Thurlow, M. L. (2009). The changing landscape of alternate assessments based on modified academic achievement standards: An analysis of early adopters of AA-MASs. Peabody Journal of Education, 84, 496–510.

Levitt, S. D., & Dubner, S. J. (2005). Freakonomics: A rogue economist explores the hidden side of everything. New York: HarperCollins.

Marion, S. (2007). A technical design and documentation workbook for assessments based on modified achievement standards. Minneapolis, MN: National Center on Educational Outcomes. Retrieved April 2, 2010, from http://www.cehd.umn.edu/nceo/Teleconferences/AAMASteleconferences/AAMASworkbook.pdf

No Child Left Behind Act, Pub. L. No. 107-110. (2001). Retrieved January 21, 2010, from http://www.ed.gov/policy/elsec/leg/eseas02/index/html

O'Sullivan, P. J., Ysseldyke, J. E., Christenson, S. L., & Thurlow, M. L. (1990). Mildly handicapped elementary students' opportunity to learn during reading instruction in mainstream and special education settings. Reading Research Quarterly, 25(2), 131–146.

Palmer, P. W. (2009). State perspectives on implementing, or choosing not to implement, an alternate assessment based on modified academic achievement standards. Peabody Journal of Education, 84, 578–584.

Pennsylvania Department of Education. (2010). PSSA & PSSA-M Accommodations Guidelines for Students with IEPs and Students with 504 Plans (Revised 1/11/2010). Retrieved June 17, 2010, from http://www.portal.state.pa.us/portal/server.pt/gateway/PTARGS_0_123031_744146_0_0_18/PSSA_Accommodations_Guidelines_2010.pdf

Quenemoen, R. (2009, July). Identifying students and considering why and whether to assess them with an alternate assessment based on modified achievement standards. In M. Perie (Ed.), Considerations for the alternate assessment based on modified achievement standards (AA-MAS) (Chapter 2, pp. 17–50). Albany, NY: New York Comprehensive Center. Retrieved June 17, 2010, from http://nycomprehensivecenter.org.docs/AA_MAS.pdf

Rodriguez, M. C. (2005). Three options are optimal for multiple-choice items: A meta-analysis of 80 years of research. Educational Measurement: Issues and Practice, 24, 3–13.

Samuels, C. (2007). Spec. ed. advocates wary of relaxing testing rules. Education Week, 26(43), 24–28.

Schiller, E., Sanford, C., & Blackorby, J. (2008). A national profile of the classroom experiences and academic performance of students with LD: A special topic report from the Special Education Elementary Longitudinal Study. Menlo Park, CA: SRI International. Retrieved June 20, 2010, from http://www.seels.net/info_reports/national_profile_students_leaming_disabilities.htm

Skiba, R. J., Simmons, A. B., Ritter, S., Gibb, A. C., Rausch, M. K., Cuadrado, J., et al. (2008). Achieving equity in special education: History, status, and current challenges. Exceptional Children, 74, 264–288.

Sweller, J. (1994). Cognitive load theory, learning difficulty and instructional design. Learning and Instruction, 4, 295–312.

Title I—Improving the Academic Achievement of the Disadvantaged; Individuals with Disabilities Education (IDEA) Act; Assistance to States for the Education of Children with Disabilities. (2005). Proposed Rule, 70 Fed. Reg. 74624–74638.

Title I—Improving the Academic Achievement of the Disadvantaged; Individuals with Disabilities Education (IDEA) Act. (2007). Final Rule, 72 Fed. Reg. 17748–17751 (to be codified at 34 C.F.R. pts. 200 and 300).

Thompson, S. J., Johnstone, C. J., & Thurlow, M. L. (2002). Universal design applied to large scale assessments (Synthesis Report 44). Minneapolis, MN: University of Minnesota, National Center on Educational Outcomes. Retrieved June 9, 2010, from http://education.umn.edu/NCEO/OnlinePubs/Synthesis44.html

Thurlow, M. L. (2008). Assessment and instructional implications of the alternate assessment based on modified academic achievement standards (AA-MAS). Journal of Disability Policy Studies, 19, 132–139.

Thurlow, M. L., Elliott, J. L., & Ysseldyke, J. E. (1998). Testing students with disabilities: Practical strategies for complying with state and district requirements. Thousand Oaks, CA: Corwin Press.

US Department of Education. (2007). Modified academic achievement standards: Non-regulatory guidance. Washington, DC: Author. Retrieved February 14, 2010, from http://www.ed.gov/policy/speced/guid/nclb/twopercent.doc

Wagner, M., Newman, L., Cameto, R., Levine, P., & Garza, N. (2006). An overview of findings from wave 2 of the National Longitudinal Transition Study-2 (NLTS2). Menlo Park, CA: SRI International. Retrieved June 20, 2010, from www.nlts2.org/reports/2006_08/nlts2_report_2006_08_complete.pdf

Zigmond, N., & Kloo, A. (2009). The "two percent students:" Considerations and consequences of eligibility decisions. Peabody Journal of Education, 84(4), 478–495.

Zigmond, N., & Matta, D. (2004). Value added of the special education teacher in secondary school cotaught classes. In T. E. Scruggs & M. A. Mastropieri (Eds.), Secondary interventions: Advances in learning and behavioral disabilities (Vol. 17, pp. 57–78). Oxford, UK: Elsevier Science/JAI.


18 Accessible Tests of Student Achievement: Access and Innovations for Excellence

Stephen N. Elliott, Ryan J. Kettler, Peter A. Beddow, and Alexander Kurz

Access means more than participation in general education classes and end-of-year achievement tests. Real access involves the opportunity for all students to learn the intended curriculum and demonstrate what they have achieved as a result of their learning. In an era of standards-based accountability, these goals of access are at the heart of high-quality instruction and testing practices.

As articulated by the authors of the 17 preceding chapters, access may start with policy and federal regulations, but to realize the benefits of educational access for students with disabilities, many people must coordinate their efforts. Policy makers, educational leaders, Individualized Education Program (IEP) team members, individual teachers, test developers, and students themselves are all involved in advancing educational practices toward optimal access. Researchers also need to continue to press for tools that support teachers' efforts to facilitate all children's access to quality instruction and valid assessment of the knowledge and skills designated by the intended curriculum. Although promising methods and practices are emerging that can improve access for all students, more work is needed on a much larger scale if it is to have a meaningful impact.


We must move beyond the rhetoric of universal design principles and delve into the details of evidence-based teaching and testing practices if educators and students are to realize the benefits of optimal access. Thus, in this concluding chapter of the Handbook of Accessible Achievement Tests, we take stock of what has been learned and where we are headed with regard to educational testing and access for students with disabilities.

What We Know

In many ways, this book was stimulated by the 2007 amendments to the NCLB regulations that gave states the option of creating an alternate assessment based on modified academic achievement standards (AA-MAS) for eligible students with disabilities. The central theme of meaningful access to achievement tests, however, goes beyond students with disabilities and a particular testing option for accountability purposes. If you have read the majority of the chapters in this book, you know there are a number of issues or barriers to achieving meaningful access to both instruction and tests. You have also learned about some research insights and innovations related to classroom instruction and test design that have the potential to overcome critical barriers to optimal access. We briefly recount key issues and innovations relative to policy and regulations, classroom practices, and test design, with the goal of providing an integrative summary that can focus our thinking about needed future actions.



Policy and Regulations About Access

Maximizing students' opportunity to learn (OTL) valued knowledge and skills for a population of students with a diverse range of achievement and ability levels represents one of the main goals of educational policies and regulations about access. Improving access to valid assessment is the typical strategy for pursuing this goal, because (a) improved OTL should result in achievement gains, (b) established measures of achievement gains are more available than are established measures of OTL, (c) assessment results can be used to identify areas where instruction needs to be improved, and (d) the promise of assessment results from which valid inferences can be drawn, coupled with consequences for failure to meet proficiency standards, increases the motivation to successfully teach the knowledge and skills that compose the intended curriculum. In Chapter 4, Zigmond, Kloo, and Lemons provided a sequential model of this relationship, detailing the steps from the participation of students with disabilities in high-stakes assessments, as mandated by the Individuals with Disabilities Education Act (IDEA, 1997), to improved academic outcomes for this group. The authors warned that the system could be undermined by options that set lower standards for students who are capable of achieving at grade level. In Chapter 2, Weigert also identified the delicate balance that alternate assessment policies have had to maintain, between holding students with disabilities to such high proficiency standards that the resulting instruction cannot be meaningfully accessed, and having such low expectations that valid assessment is no longer a lever for increasing the quality of academic instruction. Of course, while IEP teams must make categorical decisions about which students are eligible for each assessment option (general assessment with or without accommodations, modified assessment with or without accommodations, alternate assessment of alternate academic achievement standards [AA-AAS]), in reality the best option is determined based on each individual's achievement along a continuum and her or his unique profile of strengths and weaknesses; these considerations taken together do not nicely divide students into categories aligned with assessment options. While students who should complete the general assessment and those who should complete the AA-AAS are non-overlapping groups, the lines between who should complete an AA-MAS and either the general assessment or the AA-AAS are much fuzzier. The situation gets even more complicated when considering the possibility of using accommodations deemed appropriate on an individual basis.

In Chapter 3, Phillips outlined a set of guidelines, based on legal requirements and psychometric principles, for states to develop nonstandard test administration policies for students with disabilities and English language learners (ELLs). The guiding psychometric principle of the author's recommendations was preservation of the construct being targeted by the test. Phillips suggested defining accommodations as nonstandard test administrations that do not change the target construct, and defining modifications as nonstandard test administrations that do change the target construct. Consistent with this suggestion, the author recommends that scores from tests that have only been altered by the use of accommodations be aggregated with and interpreted alongside scores from a general assessment, but that scores from a modified administration be reported separately unless the tests have been statistically linked. Under such a system, AA-MASs that are not linked to the general assessment would provide students with disabilities (SWDs) greater opportunity to experience success, but would not necessarily provide greater access. Zigmond and colleagues addressed the selection of the appropriate assessment on the individual level, emphasizing the requirement that IEP teams select the most appropriate option at the time of the evaluation. Given that the most appropriate assessment option is the one that provides the score from which the most valid inferences can be drawn, this suggestion is in agreement with the psychometric commitment to preserve the construct being measured.

In Chapter 17, Lemons, Kloo, and Zigmond drew conclusions similar to those of Phillips about the issue of considering scores derived from AA-MASs as comparable to, and combinable with, scores from a general assessment. The authors dismiss the notions that the AA-MAS policy will lead to better measurement of the knowledge and skills of students with disabilities, or that the test will help many schools and districts meet adequate yearly progress (AYP) goals. Their grounds for questioning improved measurement are that reducing breadth and depth would change the construct, and that changes which strictly affect access could be applied to the general test. Lemons and colleagues did not address the possibility that an easier set of items might provide more precise measurement for a group of students that is certain to achieve at a lower level (i.e., students eligible for an AA-MAS). They acknowledged that the policy might be a more ethical solution and might lead to improved instruction for a subset of students who would otherwise take the AA-AAS, but ultimately contended that developing an AA-MAS is not worth the risk when one considers the cost of production and the possibility of unintended consequences (e.g., instructional tracking). Echoing Phillips' concerns, Lemons and colleagues indicated that comparing AA-MAS scores with scores from the general assessment to determine proficiency is just "pretending," and suggested that alterations to tests that solely address access be applied to the general examination, and that special education move away from the danger of uniform expectations that are too high to meaningfully inform instruction.

Over the course of 40 years of US assessment policy, we are slowly approaching a system that maximizes access to measurement of knowledge and skills for all students. In Chapter 5, Davies and Dempsey presented an interesting comparison with Australian policy on achievement accountability, demonstrating that laws there are less clear about the inclusion of students with disabilities. While policy in Australia indicates that all students should be included in national achievement assessment, a large number of these students are still exempted or may be absent during testing due to disabilities. Until Australia adopts practical solutions analogous to those used to include students with disabilities in the United States, those students will not benefit from the improvement in instruction that is associated with appropriate assessment.

The purpose of improving education through better measurement is only served when we prioritize a goal of access, which in turn leads to better measurement precision for the population for which each test is intended; much work in this area, evaluating the reliability and validity of tests modified to increase access, remains to be done. The success of any large-scale accountability system will ultimately be measured based on its consequences for improving classroom instruction and increasing access to the general curriculum for all students – the topic of the next section.

Classroom Instruction and Access to the General Curriculum

The assessed curriculum of any large-scale accountability system clearly exerts a strong influence on the content of classroom instruction and the extent to which students receive access to the general curriculum (Cizek, 2001; Ysseldyke et al., 2004). In Chapter 6, Kurz discussed a comprehensive framework that delineates the connections between the various curricula of the educational environment at the system, teacher, and student levels. The framework posits the intended curriculum (i.e., the content designated by the academic standards of the general curriculum) as the normatively desirable target of all other (subordinate) curricula. For students with disabilities, each Individualized Education Program delineates the extent to which the general curriculum is part of the student's intended curriculum. That is, the IEP curriculum may designate the entire general grade-level curriculum as the student's intended curriculum or specify some modified or alternate objectives as well as additional social, behavioral, or functional goals. Thus, Kurz presented the intended curriculum for students with disabilities as dually determined by both the general curriculum and the student's IEP curriculum.


The framework developed by Kurz indicates that the intended curriculum directly informs the content of the assessed curriculum used for accountability purposes. This model is consistent with the current accountability system, which mandates that the tested content of large-scale assessments sample exclusively from the content domains of the intended curriculum. Only on the basis of such curricular alignment are stakeholders able to draw valid inferences about the extent to which students have been exposed to, and subsequently achieved, the intended curriculum. Moreover, stakeholders who wish to draw valid inferences about student achievement that has occurred as a function of classroom instruction must also account for students' opportunity to learn the intended curriculum. The chapters by Kurz (Chapter 6), Ketterlin-Geller and Jamgochian (Chapter 7), and Beddow, Kurz, and Frey (Chapter 9) specifically addressed two major access points related to the intended curriculum: (a) the enacted curriculum and (b) the assessed curriculum. With regard to the enacted curriculum, the content of classroom instruction presents the primary means through which students access the intended curriculum. With regard to the assessed curriculum, accessible tests are necessary for students to fully demonstrate achievement on the tested constructs of the intended curriculum. All authors emphasized that without addressing both barriers, it is virtually impossible to make valid test score interpretations. To this end, Ketterlin-Geller and Jamgochian provided two clear examples. In the first example, students who were given the opportunity to learn the intended curriculum were not able to fully demonstrate what they knew and were able to do, because the assessments were inaccessible. Poor test performance could thus be an artifact of inaccessible tests rather than a lack of progress in learning the intended curriculum. In the second example, poor test performance obtained through accessible tests was difficult to interpret, because students were not provided with the opportunity to learn the intended curriculum in the first place.

OTL, of course, is not a dichotomous reality at the enacted curriculum level. Kurz described OTL as a matter of degree related to three instructional dimensions: time of instruction, content of instruction, and quality of instruction. These key dimensions of classroom instruction have a strong and long-standing research base as correlates of student achievement. Kurz discussed the challenges and considerations related to measuring this operational conceptualization of OTL in detail in Chapter 6. As a future direction of research on OTL, he advocated considering the formative benefits for teachers' classroom instruction that could result from integrating OTL measurement into ongoing instructional processes.

Ketterlin-Geller and Jamgochian focused on the content dimension of classroom instruction by examining the instructional adaptations that may be necessary for students with disabilities to fully access the intended curriculum through the teachers' classroom instruction. The authors differentiated between instructional accommodations and instructional modifications. Instructional accommodations are adaptations to the design or delivery of instruction (including materials) that do not change the breadth of content coverage or the depth of knowledge of the general grade-level curriculum; examples include large fonts or Braille for students with visual impairments (i.e., presentation accommodations). Instructional modifications are adaptations to the design and delivery of instruction (including materials) that result in a reduction of the breadth of content coverage and/or the depth of knowledge of the general grade-level curriculum; examples include extended time or frequent breaks (on an otherwise timed task) for students with attention and/or processing deficits (i.e., timing/scheduling modifications). The authors emphasized that an important consequence of instructional modifications is a reduction of students' opportunity to learn grade-level content standards and/or the level of proficiency expected of the general population. Due to the impact of instructional modifications on OTL, Ketterlin-Geller and Jamgochian argued that their use should be limited in favor of instructional accommodations, which maintain access to the general curriculum at grade level.

Many of the instructional accommodations suggested by Ketterlin-Geller and Jamgochian provide students with access skills that increase students' benefit from an opportunity to learn the intended instructional constructs (e.g., highlighting of key words to focus attention on critical information). Kettler, Braden, and Beddow discussed test-wiseness as an access skill in the context of achievement testing. Test-wiseness can affect the extent to which students are capable of utilizing test-taking skills or strategies in a test-taking situation to receive a higher score. Typically, test-wiseness does not represent an intended construct of achievement tests. Yet many test-preparation programs include test-taking skills or strategies as part of their curriculum. Kettler et al., however, indicated that research does not support the extended use of classroom time and instruction for purposes of teaching test-taking skills. Instead, teachers should focus their instructional time on the content of the intended curriculum, which is subsequently sampled by properly aligned achievement tests.

As noted by the authors of the chapters in the Classroom Connections section, considerations for accessible achievement testing should not be restricted to the test event only. Accessible achievement testing and accessible instruction represent two sides of the same coin. That is, all students should receive the opportunity to learn the intended curriculum through classroom instruction that is well managed in terms of time, content, and quality, as well as maximally accessible through judicious use of differentiated instruction and/or instructional adaptations. The resulting test scores derived from accessible achievement tests thus should permit more valid inferences about what students know and are able to do as a function of classroom instruction. Available strategies for purposeful test design to allow test-takers the opportunity to demonstrate their knowledge to the greatest extent possible are discussed next.

Test Design That Supports Access

Commensurate with the importance of access to the general curriculum, as addressed by Kurz and others, is the need to ensure that assessments of this curriculum yield data that reflect actual student achievement. Beddow, Kurz, and Frey (Chapter 9) offered a theoretical model (i.e., accessibility theory) that defines the variety of potential sources of construct-irrelevant variance based on test accessibility. In essence, the authors contended that the accessibility of an assessment is a function of test features as well as individual test-taker characteristics. Specifically, Beddow et al. operationalized accessibility as the sum of interactions between these variables. An optimally accessible test, therefore, yields scores from which subsequent inferences reflect only the interaction between the test and the target construct as demonstrated by the test-taker, controlling for the influences of access skills. The authors present an example of a high school science item in three stages of modification to illustrate several aspects of test items that may be considered in the effort to reduce access barriers and increase the validity of subsequent inferences for more test-takers. Beddow et al. recommended that test developers subject tests to accessibility reviews prior to administering them to actual test-takers for their intended purposes. They argued that reviewers should ground these reviews in empirical data, disaggregated by high and low performers, disability status, and other group categories, whenever possible.

In Chapter 10, Tindal and Anderson discussed the types of evidence needed to ensure test score inferential validity, particularly for assessments designed for students with special needs. The authors reviewed several studies on response processes and concluded that extended time and read-aloud accommodations can reduce potential access barriers and increase validity. They also noted a number of shortcomings in many efforts to provide validity evidence and made several suggestions for test developers to improve research and practice. These included increasing the precision with which constructs are defined, operationalizing test design and implementation procedures, providing deliberate training for test administrators/teachers on the science and practice of assessment, attending to the practical organization and implementation of tests to ensure external validity of research, and examining test consequences programmatically to support the iterative increase of scientific evidence.

Rodriguez added to the test design discussion by surveying the extant knowledge on test item-writing practices. In Chapter 11, Rodriguez reviewed a broad base of research and theory spanning more than a century, providing several considerations for designing tests and test items to maximize measurement utility. He examined the range of common item formats and evaluated the potential benefits and limitations of a number of innovative formats that are made possible by increases in computer technology. Drawing on decades of test development guidelines, Rodriguez offered several recommendations for selecting the proper item format for the intended purpose, and for writing good test items. These include content and formatting concerns, such as keeping vocabulary and grammatical construction as simple as possible, maintaining singularity in target constructs, and formatting items consistently. He also described strategies that should be avoided by item writers, including using negative stems, writing trick items, and including overly specific or trivial content. Rodriguez focused specifically on the options for multiple-choice items, contending that it is in this area that empirical evidence is most conclusive. Based on this evidence, he asserted three choices are optimal for multiple-choice items. Rodriguez concluded the chapter by discussing the relation between item development and accessibility and its relevance to inclusive assessment.
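As a rough illustration of how a few of these guidelines can be operationalized, the sketch below flags some mechanical violations (negative stems, option counts other than three, all/none-of-the-above options, long stems). The function name and thresholds are hypothetical and cover only a small slice of the recommendations summarized above; it is not Rodriguez's tool.

```python
"""Illustrative sketch only: flag a few mechanical item-writing issues of the
kind discussed in Chapter 11. Names and thresholds are hypothetical."""
import re

def review_item(stem: str, options: list[str]) -> list[str]:
    """Return a list of warnings for one multiple-choice item."""
    warnings = []
    if re.search(r"\b(not|except)\b", stem, flags=re.IGNORECASE):
        warnings.append("negative stem (contains 'not'/'except')")
    if len(options) != 3:
        warnings.append(f"{len(options)} options; three are suggested as optimal")
    if any(o.strip().lower() in {"all of the above", "none of the above"} for o in options):
        warnings.append("uses an all/none-of-the-above option")
    if len(stem.split()) > 30:
        warnings.append("long stem; consider simplifying vocabulary and structure")
    return warnings

# Example usage with a hypothetical item.
print(review_item(
    "Which of the following is NOT a renewable energy source?",
    ["coal", "wind", "solar", "all of the above"],
))
```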

Another critical aspect of test accessibility is the degree to which language issues may influence the validity of test score inferences. In Chapter 12, Abedi addressed linguistic and cultural issues and the problem of reducing access barriers for students with a broad range of abilities and needs. He specifically focused on the performance gaps between English language learners and non-ELL students. Abedi noted these gaps are observed not only in reading, but in science and mathematics as well. In content areas in which language and culture are not the target constructs, he argued, the concept of linguistic and cultural accessibility is essential. He proposed employing teams of experts in assessing ELLs to address the linguistic complexity of assessments prior to using them to make decisions on behalf of students. To determine the linguistic complexity of an assessment and its resultant accessibility for ELLs, Abedi contended these experts should attend to issues such as word frequency, familiarity, and length; grammatical structure; and concrete versus abstract narrative.
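A minimal sketch of the kinds of surface indicators such a review might examine (word length, word familiarity, sentence length) is given below. The word list, thresholds, and function name are hypothetical; this is not Abedi's procedure, only an illustration of the indicators named above.

```python
"""Illustrative sketch only: rough linguistic-complexity indicators for an item
stem. The familiar-word list and function name are hypothetical; a real review
would draw on an established word-frequency corpus and expert judgment."""
import re

# Hypothetical mini-list of high-frequency ("familiar") words.
FAMILIAR_WORDS = {
    "the", "a", "an", "is", "are", "was", "of", "to", "in", "on", "and", "it",
    "did", "rain", "much", "about", "what", "which", "how", "many", "plants", "lived",
}

def complexity_indicators(stem: str) -> dict:
    """Return rough indicators of linguistic complexity for a single item stem."""
    sentences = [s for s in re.split(r"[.!?]+", stem) if s.strip()]
    words = re.findall(r"[A-Za-z']+", stem.lower())
    if not words:
        return {"words": 0, "mean_word_length": 0.0,
                "unfamiliar_ratio": 0.0, "mean_sentence_length": 0.0}
    unfamiliar = [w for w in words if w not in FAMILIAR_WORDS]
    return {
        "words": len(words),
        "mean_word_length": sum(len(w) for w in words) / len(words),
        "unfamiliar_ratio": len(unfamiliar) / len(words),
        "mean_sentence_length": len(words) / max(len(sentences), 1),
    }

# Example: compare an original stem with a linguistically simplified version.
original = ("Notwithstanding the precipitation deficit, approximately what "
            "proportion of the vegetation persisted throughout the interval?")
simplified = "It did not rain much. About how many of the plants lived?"
print(complexity_indicators(original))
print(complexity_indicators(simplified))
```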

Kettler, in Chapter 13, reviewed the limited extant data on the effects of using packages of item modifications to increase test accessibility. The author drew several conclusions about this emerging field of research. Kettler indicated that while examining packages of modifications in validation studies necessarily limits the degree to which these studies yield empirical knowledge about the effects of individual item enhancements, modification typically is provided in packages in practice. The author further contended that the measurement properties of each original assessment should be a "gold standard" by which the measurement properties of modified assessments with the intended populations are evaluated. Indeed, Kettler argued that while item difficulty is expected to decrease following modification, reliability is primary among indicators of the effects of test changes on the measurement of the target construct. The author suggested access barriers likely diminish the consistency with which a test measures the target skills across the test-taker population; thus, if reliability is not maintained or increased, little else can support the use of a modified test over the general assessment as a better measure of what students know and can do. Further, Kettler suggested that decreased reading load likely is helpful for the portion of the test-taker population for whom reading may present access barriers. This may be manifest in shorter stems, fewer words in visuals, and fewer options for multiple-choice items. Lastly, Kettler noted that while data indicate reliability can be maintained in reading across grade levels, and in mathematics and science at the elementary level, reliability may decrease when modifications are applied to mathematics and science items at the middle and high school grade bands; the cause of this pattern is yet unknown.
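The kind of reliability comparison described above can be illustrated with a small simulation. The sketch below computes internal-consistency reliability (Cronbach's alpha) for two hypothetical response matrices, one with more construct-irrelevant noise standing in for access barriers; the data and function are illustrative and are not drawn from the studies Kettler reviews.

```python
"""Illustrative sketch: compare Cronbach's alpha for an original and a modified
test form using simulated 0/1 responses (rows = students, columns = items)."""
import numpy as np

def cronbach_alpha(responses: np.ndarray) -> float:
    """alpha = k/(k-1) * (1 - sum(item variances) / variance(total score))."""
    k = responses.shape[1]
    item_vars = responses.var(axis=0, ddof=1)
    total_var = responses.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

rng = np.random.default_rng(0)
ability = rng.normal(size=200)      # hypothetical examinees
difficulty = rng.normal(size=25)    # hypothetical items

def simulate(extra_noise: float) -> np.ndarray:
    """Simulate responses; extra_noise stands in for access barriers."""
    noise = rng.normal(scale=extra_noise, size=(200, 25))
    logits = ability[:, None] - difficulty[None, :] + noise
    return (rng.random((200, 25)) < 1 / (1 + np.exp(-logits))).astype(int)

original = simulate(extra_noise=1.5)   # more construct-irrelevant variance
modified = simulate(extra_noise=0.5)   # barriers reduced
print(f"alpha (original): {cronbach_alpha(original):.3f}")
print(f"alpha (modified): {cronbach_alpha(modified):.3f}")
```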

In Chapter 14, Roach and Beddow discussed how student input can be incorporated in the design of assessments for students with a broad range of abilities and needs. These authors examined students' drawings, focus group interviews, think-aloud cognitive labs, and survey data to understand how student input may influence the development of accessible assessments. The authors contended that soliciting student opinions and perspectives during the design phase is important, but students may not necessarily comprehend the degree to which item or test features impact their performance on the test. Notwithstanding, ample evidence suggests (a) students often experience anxiety when presented with an assessment task, (b) students report assessments often are more complex than they would like, (c) students, particularly those identified with disabilities, often feel they have not had sufficient opportunity to learn the tested content prior to being required to demonstrate their proficiency, and (d) students tend to prefer test items that have been modified to improve their accessibility over test items in their original form.

Russell, in Chapter 15, addressed the opportunities computers offer for enhancing test accessibility across the range of test-takers. He invoked Bennett's (1999) prediction that the inevitable ubiquity of computer-based testing would permit its increased integration with instruction. Russell observed, however, that in the decade since Bennett's conjecture, computers primarily have been used to increase the efficiency of testing as opposed to its integration with instruction, thus missing the mark insofar as the potential of using computer assessments to maximize learning across the educational environment. While computers can be used solely for the purpose of enhancing the efficiency of data collection, they also can be used to reduce barriers for students for whom current assessments may not be optimally accessible. Russell surveyed the current state of computer-based assessment technology, specifically in terms of addressing the individual access needs of test-takers through integrated accommodations and access tools. He described innovative means of presenting audio content for poor readers or visually impaired test-takers, signed avatars for deaf test-takers, adapted response modes for test-takers for whom the traditional method of responding is inappropriate, and alternate representations for test-takers whose access needs require them (such as tactile representations of visuals). Russell concluded with a call for researchers to examine ways of helping teachers and test administrators decide which tools should be used for individual test-takers to ensure all students are given access to demonstrate what they know and can do.
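A minimal sketch of how individually assigned access tools might be represented in a computer-based delivery system follows. The profile fields and tool names are hypothetical, intended only to illustrate the idea of integrated, per-test-taker accommodations rather than any particular platform's configuration.

```python
"""Illustrative sketch only: per-test-taker accommodation profiles for a
computer-based test delivery system. Field and tool names are hypothetical."""
from dataclasses import dataclass

@dataclass
class AccessProfile:
    student_id: str
    read_aloud: bool = False          # audio presentation of text
    signed_avatar: bool = False       # signed presentation for deaf test-takers
    magnification: float = 1.0        # 1.0 = default size
    high_contrast: bool = False
    response_mode: str = "keyboard"   # e.g., "keyboard", "switch", "touch"
    tactile_graphics: bool = False    # alternate representations of visuals

def delivery_settings(profile: AccessProfile) -> dict:
    """Translate a profile into settings a delivery engine could apply."""
    return {
        "audio": "on" if profile.read_aloud else "off",
        "sign_presentation": "avatar" if profile.signed_avatar else "none",
        "zoom": profile.magnification,
        "contrast": "high" if profile.high_contrast else "standard",
        "input": profile.response_mode,
        "graphics": "tactile" if profile.tactile_graphics else "visual",
    }

# Example: a test-taker who needs read-aloud support and moderate magnification.
print(delivery_settings(AccessProfile("S-001", read_aloud=True, magnification=1.5)))
```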

In Chapter 16, Egan, Schneider, and Ferrara delved deeply into standard setting, a critical step in the assessment development process, and ultimately in the interpretation of achievement test results. Given that the goal of accessible assessment is to ensure the measurement of achievement is equally accurate and consistent across the range of test-takers, the process used by test developers to set proficiency levels must be equally applicable to all test-takers as well. The authors described a standard-setting framework in six phases: Define, Describe, Design, Deploy, Deliver, and Deconstruct. Mirroring the clarion call in other chapters for precisely defined target constructs for tests and test items, Egan et al. asserted that the establishment of clear, precise definitions of achievement-level descriptors is integral to test development, cut-score recommendation, and test score interpretation. The authors contended that validity evidence should be collected throughout the standard-setting process, particularly when developing tests that are intended for use with students with special needs.

Where Are We Going

Two years have passed since we conceptualized this book on accessible achievement tests. Some new research has been published on item modifications; several federally funded grants are underway to examine innovative OTL measurement and explore attributes of more accessible tests; some test developers are using accessibility tools to guide the development of new items; and at least seven more states have opted to implement AA-MASs. Yet at the same time, testing practices in the United States and in some foreign countries, such as Australia, continue to be influenced by dated standards and limited conceptualizations of item modifications and universal design principles for tests. Policies, practices, and procedures around instruction and testing seem to change too slowly for the thousands of student test-takers who continue to experience difficulty gaining access to both the intended and assessed curricula. There are, of course, challenges confronting those of us interested in improving instruction and testing for all students. Several of the central challenges are discussed next, followed closely by needed innovations that have the potential to advance accessible instructional and testing practices.

Challenges to Access

Barriers that limit access to the intended and assessed curricula are real. This assertion is not new and has been part of the vision of education policy makers, test developers, and many teachers for more than a decade. Based on the collective examination of the scholars and leaders contributing to this book, we believe there are five central challenges that educators and test developers must address if they are to improve access to valued content and knowledge for all students:

1. Ensuring students have meaningful opportunities to learn the grade-level intended and assessed curricula.

2. Precisely articulating target constructs of tests and test items to facilitate the development of maximally accessible items.

3. Creating highly accessible tests that yield reliable scores and valid inferences about students' achievement in language arts, mathematics, and science.

4. Determining the consequences of participation in any of a number of assessment options for a diverse population of learners.

5. Moving beyond the existing proficiency testing paradigm to one that privileges access and progress to grade-level academic content.

These challenges are not unique to services for students with disabilities, although they may be more salient in their educational lives and those of their teachers and parents. Access to high-quality instruction and assessment is a commitment extended to all students and must mean more than participation, testing accommodations, and tests refined with universal design principles in mind.

Needed Innovations to Improve Access

The words access and excellence often are used in the same sentence by educational leaders as if they are two different goals. In fact, we contend, as do the majority of our co-authors in this book, that access is a prerequisite for achieving excellence in teaching, learning, and testing. Indeed, excellence depends on access! As documented in this book, the knowledge base concerning educational access has grown substantially over the past several years, and in the process new ways of thinking and tools for taking action have emerged for both researchers and practitioners. Collectively, the knowledge and the related tools examined in this book provide us with several innovations that can be used to address the central challenges to access and help educators progress toward the goal of excellence for all learners.

To ensure students have meaningful opportunities to learn intended and assessed curricula at grade level, we believe teachers need more support and feedback concerning their state's content standards. The work by Kurz and colleagues on OTL and the development of practical, teacher-oriented tools such as MyiLOGS have the potential to advance access by providing teachers a framework for organizing instructional content, documenting instructional time, and stimulating evidence-based instructional decisions for entire classes and individual students. Such data-based tools must take little time, yet provide teachers with powerful and ongoing feedback about their instructional actions. The information obtained about OTL from these tools helps to contextualize achievement data, potentially resulting in instructional changes that increase access to excellence for all students. Professional development for teachers that features instructional content and resources for activating key concepts or procedures associated with the content, coupled with tools for managing instructional time and student outcomes, will certainly improve students' opportunity to learn the intended and assessed curriculum.
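A minimal sketch of the kind of data-based OTL logging such tools support appears below, assuming a simple in-memory structure. The class, field names, and sample standards are hypothetical illustrations of logging time and content by standard, not the actual MyiLOGS design.

```python
"""Illustrative sketch only: a bare-bones opportunity-to-learn (OTL) log that
records instructional time per content standard and summarizes coverage."""
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Entry:
    date: str        # e.g., "2011-03-14"
    standard: str    # hypothetical content standard code, e.g., "MATH.5.NUM.2"
    minutes: int     # instructional time devoted to the standard
    quality: str     # brief tag for instructional emphasis or cognitive demand

class OTLLog:
    def __init__(self, intended_standards):
        self.intended = set(intended_standards)
        self.entries = []

    def record(self, entry: Entry):
        self.entries.append(entry)

    def summary(self):
        """Minutes per standard plus intended standards not yet taught."""
        minutes = defaultdict(int)
        for e in self.entries:
            minutes[e.standard] += e.minutes
        untaught = sorted(self.intended - set(minutes))
        return {"minutes_by_standard": dict(minutes), "untaught_standards": untaught}

# Example usage with hypothetical standards.
log = OTLLog(["MATH.5.NUM.1", "MATH.5.NUM.2", "MATH.5.GEO.1"])
log.record(Entry("2011-03-14", "MATH.5.NUM.1", 45, "conceptual"))
log.record(Entry("2011-03-15", "MATH.5.NUM.1", 30, "practice"))
log.record(Entry("2011-03-15", "MATH.5.NUM.2", 20, "conceptual"))
print(log.summary())
```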

Precisely defining the target constructs of tests and test items is a long-standing challenge confronting test developers and researchers interested in test score validity. Precision in measurement of student achievement requires attention to many details; often these details are influenced by item writers and sometimes by educators responsible for defining the content to be assessed. No innovation discussed in this book will fully meet this challenge; however, meaningful improvement is possible. We believe that by providing item writers who have deep knowledge of a content domain with explicit training in the use of item-writing tools like the Test Accessibility and Modification Inventory (TAMI) and its diagnostic complement, the Accessibility Rating Matrix (ARM), much of the extraneous content of items can be reduced. Extraneous item content often leads to construct-irrelevant variance. Thus, supporting item development that values accessibility can be achieved today. Coupled with this type of training, states and other organizations that oversee achievement tests are encouraged to implement accessibility review panels. Such panels should include representatives knowledgeable about students with disabilities as well as content experts. These panels could function much like bias review panels and focus on eliminating item attributes and content that are extraneous and irrelevant to the constructs being measured.

The challenge of creating highly accessible tests that yield reliable scores and valid inferences about students' achievement begins with the development of well-written items without extraneous content. For some students, however, accessible tests that yield reliable and valid scores require more than accessible items. Many students with disabilities or English language learners need accommodations to be able to meaningfully access items and respond to questions. Testing accommodations provide more complete access to questions and answers and reduce irrelevant variance in responses caused by a disability or language difference. The provision of needed and appropriate testing accommodations with integrity for all qualified students is achievable and will make a difference in the accessibility of tests and, for many students, will result in higher test scores. When tests are delivered via computers, the potential for delivering needed accommodations with high integrity and consistency is even greater. Continued innovation in computerized item management and test delivery promises to improve access to needed accommodations for more students.

Tests have instructional consequences. Determining the consequences of participating in an assessment is important to understand and has been challenging to document. Ideally, assessment results are used to refine instruction for each student. Unfortunately, when test results are provided two to three months after the test is completed, they are very unlikely to be used with the group of students who took the test. This disconnection between assessment and instruction can no longer be accepted if we want to improve the consequences of assessment. Computerized testing and more frequent short tests with nearly immediate feedback offer a solution that can lead to improved access to appropriate instruction. Tests also have motivational consequences. That is, repeated testing experiences in which a student is confronted with test content that has not been taught or is not readily accessible can serve to decrease one's motivation to engage in testing, because it seems futile and unfair. More accessible tests, both accommodated and modified, hold promise for improving the emotional or attitudinal side effects of testing and for yielding scores with more validity.

Moving beyond the existing proficiency testing paradigm to one that privileges access and progress to grade-level academic content has been challenging for most states and test developers. As documented in this book, a number of instructional and testing innovations are available to improve access for all students. Some of these innovations are being taken to scale in five or six states, but a more pervasive and sustained effort is needed nationwide to have a meaningful impact. In addition, more state assessment systems would benefit from the use of tests that allow for achievement growth to be recorded. Collectively, we know how to build more accessible tests and assessment systems that yield academic growth information. For a number of reasons, we have not accomplished all that is possible when it comes to accessible tests and accessible testing practices.

Conclusion

It is time to stop admiring the challenges to accessible tests and think about innovative solutions that will improve access and lead to excellence in education for all students. We have written this handbook to help educators, test developers, and researchers understand the importance of access and its role in providing excellent instruction and testing practices. We hope this volume stimulates expedient actions that lead to the invention of more solutions, because today too many students sit in classrooms where access to the intended and assessed curriculum is the exception and not the norm.

Page 345: Handbook of Accessible Achievement Tests for All Students: Bridging the Gaps Between Research, Practice, and Policy

Subject Index

AAA-AAS, see Alternate assessment based on alternate

achievement standards (AA-AAS)AA-MAS, see Alternate assessment based on modified

academic achievement standards (AA-MAS)Academic Competence Evaluation Scales (ACES), 190Academic grading system, 83Acceptability, 37, 239, 246–247, 253Access, 44

alternate language, 266audio, 260–261barriers/challenges, 5, 147–148, 165, 172, 326and cut score setting, 275–276definition, 3to general curriculum, 321–323innovations to improve, 326–328to instruction and tests of achievement, 1–2, 131–132,

137–139, 158, 319to intended curriculum, 100linguistic, 217–218pathway, 179–180policy and regulations, 320–321as policy tools, 202signed, 263–266tactile, 266–267test-taking skills and, 153and test-wiseness, 147–148through OTL, 5–8through testing accommodations, 8–9through well-designed test items, 10–11, 166–171,

231, 323–325v. success, 44–45

Accessibilityacross educational environment, 179–180and construct preservation/score comparability, 38–39definition, 2–3item modifications for, 210levels, 168limited, 201–202linguistic, concept of, 217–220, 223–224means for increasing, 279modification packages on test and item, effects of, 231proof paradox, 177–178test-taking skills and, 147–148

Accessibility Principles for Reading Assessments, 213

Accessibility Rating Matrix (ARM), TAMI, 11, 168–169,171–176, 179, 327

Accessibility theory, 2–3, 10–11, 163–165accessibility proof paradox, 177–178accessibility rating, 174–177accessible test items, 166–171

characteristics, 169TAMI/TAMI ARM, 166, 168

CLTcategories, 166long-term/working memory, concept of, 166principles of, 166–168

and educational environment, 179–180test-takers, 164–165, 190

ARC, 165barriers, 165interaction domains/categories, 164

UD, 163–164Accommodation(s)

access through testing, 8–9categories, 258–260definition, 4, 38for ELL, 60, 219, 223–224instructional, 13, 131–136, 138–142, 144, 180, 257,

322–323read aloud, 140, 186–187, 189, 195–196, 256–261,

263“standard”, 23test, 8, 64, 185–186, 193, 211, 257–258, 260,

269–271test-taking skills and, 153–154

Accommodations Survey (Terra Nova 18), 190Accountability systems, 5, 12, 24, 59, 62, 69–71, 80,

107–108, 113, 153, 243, 295, 297, 311–315,321–322

Accountability testing challenges, U.S. legal issues,59–63

California case, 61–62construct shift, 61ELL accommodations/modifications, 60majority/minority ELL, 60–61NCLB Act, 59–60NCLB cases, 62–63

Coachella Valley, 62–63Reading School District, 62

329S.N. Elliott et al. (eds.), Handbook of Accessible Achievement Tests for All Students,DOI 10.1007/978-1-4419-9356-4, © Springer Science+Business Media, LLC 2011

Page 346: Handbook of Accessible Achievement Tests for All Students: Bridging the Gaps Between Research, Practice, and Policy

330 Subject Index

ACES, see Academic Competence Evaluation Scales(ACES)

Achievementalternate, standards, 27–28IEA, 110instruction and tests of, 1–2levels, 20, 30, 64, 71, 93, 193, 275–276, 278,

280–282, 284, 286–287, 289–290, 325modified, standards/tests, 29–30, 71, 73, 77–79, 106,

197, 232–235, 239–240, 295–296See also Alternate assessment based on modified

academic achievement standards (AA-MAS)student, 5–8, 12, 27, 88–89, 94, 99–100, 103,

107–113, 115–116, 153, 278, 282, 296, 319,322–323, 327

ADA, see Americans with Disabilities Act (ADA)Adapted interactions, 258Adapted response modes, 258, 325Adelaide Declaration, 87–88Adequate Yearly Progress (AYP), 26, 29, 59–60, 137,

183, 295–297, 306–311, 314, 321Advanced Placement exam, 207Advocates for Special Kids v. Oregon Dep’t of Educ., 45AERA, see American Educational Research Association

(AERA)Alignment index (AI), 102, 110, 121Alignment methodologies, 7

constraints, 107SEC method, 102, 111Webb method, 102

All-of-the-above (AOTA) items, 205, 208–209Alternate achievement standards, 27–28, 70–71, 73–74,

78, 80, 94, 313Alternate assessment based on alternate achievement

standards (AA-AAS), 27–28, 71, 73–75, 79,94, 105, 137, 210–211, 231–232, 275–278,280–282, 284–285, 287, 290, 296–297, 309,313, 320–321

Alternate assessment based on modified academicachievement standards (AA-MAS), 3–4,29–30, 71, 73–75, 77–79, 93, 105, 137, 183,185–187, 210–211, 231–236, 238–240, 244,275–282, 284–285, 287, 290, 295–298,302–303, 305–308–314, 319–321, 326

Alternate Assessment Standards for Students with themost Significant Cognitive Disabilities, 73

Alternate-choice format, 203, 205Alternate contrasts accommodation, 259, 268–269Alternate scoring methods, 214Alternative assessment, 23–24, 93–94, 319Ambach case, 36, 52–53American Educational Research Association (AERA),

3–4, 37, 183–185, 226, 232American Psychological Association (APA), 3–4, 33, 37,

103, 183–185, 187, 210, 226, 232, 245–246,253

American Sign Language (ASL), 263, 266Americans with Disabilities Act (ADA), 23, 35–36, 100,

164

Amplification devices, 139–141Analytic rubrics, 168, 174, 220Ancillary requisite constructs (ARC), 165, 172, 175,

178Anderson v. Banks, 36Annual accountability assessment, primary participation

decisions, 76, 78APA, see American Psychological Association (APA)Applied Measurement in Education, 204Approaching Expectations, 281–282Approximation techniques, 39ARC, see Ancillary requisite constructs (ARC)Arizona Instrument to Measure Standards, 238ARM, see Accessibility Rating Matrix (ARM), TAMIARM Record Form, 174ARM rubrics, 168, 174Assessed curriculum, 12, 99–100, 102–106, 110, 113,

121, 179–180, 321–322, 327–328Assessed skill levels, 118–119Assessment

accommodations, 23, 153–154, 158instruments, 10, 25, 164, 166, 177, 227, 244policies, 1960s and 1970s, 19–20modifications, 153

Assistive listening devices, 91Assistive technologies, 2, 25, 27–28, 141, 213, 258Assistive Technology Act, 2, 164Attention deficit hyperactivity disorder, 85Auditory accommodations, 139–142, 257–258, 260–263,

270–271Australia, 83–94

See also specific entriesAustralian Curriculum, Assessment and Reporting

Authority (ACARA), 88, 94Australian Federal Court, 84–85, 87Australian Government Productivity Commission, 84–85Australian Human Rights Commission, 85–87Australian policies, inclusive assessments and, 83–84

compared with U.S. policies, 87, 93–94DDA legislation, 84–87

historical development of, 85–87educational system, 83–84

enrolment options for SWD, 83–84two tiered, 83

NAPLAN, 88–90comprehensive assessment process, 88–89exemptions/absences/withdrawals and related

issues, 90–93participation and eligibility, 90performance scores/statements, 89–90special provisions/considerations and related

issues, 91–94test administration authority, role of, 89types of test questions/formats, 89

national goals for schooling, 87–88Australian public education systems, 85Australian student exemptions, absences/withdrawals, 92“Authentic” assessments, 22Automated scoring, 204

Page 347: Handbook of Accessible Achievement Tests for All Students: Bridging the Gaps Between Research, Practice, and Policy

Subject Index 331

Avatars, signing, 165, 264–266, 325AYP benefits, 309–310

BBd. of Educ. of Northport v. Ambach, 36, 52Beddow model, 164–165Biases, 194–195, 218–219, 225–226, 228, 233–234, 252,

288, 327Binet-Simon Scale, 201Bloom’s taxonomy, 134Braille accommodations, 8, 23, 35, 41, 44, 48, 54, 91,

139–140, 256–257, 259, 266–267, 322Brookhart v. Illinois State Bd. of Educ., 35–36, 51–52Brown v. Board of Education, 19, 295Bureau of Accountability and Assessment (BAA;

Pennsylvania), 305Bureau of Special Education (BSE; Pennsylvania), 305

CCAAVES, see Consortium for Alternate Assessment

Validity and Experimental Studies (CAAVES)Calculators, 7, 28, 33, 39–40, 42–43, 45–46, 53, 55, 65,

112, 136, 139, 142, 144–145, 212, 233California case

accountability testing challenges, 61–62graduation testing challenges in SWD, 54–56

alternate assessments, 55graduation test, 54–55legislative intervention and settlement, 56waiver policy, 55–56

California High School Exit Examination (CAH SEE),54, 58, 256

California Supreme Court, 58Carroll model, 7, 108–109, 112CBM, see Curriculum-based measurement (CBM)CEC, see Council for Exceptional Children (CEC)CEC testimony, 22Chain gameplay, 156Chapman (Kidd) v. State Bd. of Educ., 53–54, 256Chicago Public Schools, 250Child x Instruction interactions, 118Civil Rights Act, 19Classical measurement theory, 218, 225Classical test theory, 225, 238Classroom Assessment Scoring System (CLASS),

118–119Classroom-based assessment practices, 137–138, 142,

257Classroom instruction

accommodation/modification adaptations, 131–145OTL and ICM, 99–125test-taking skills, 147–158

“Clickers”, see Electronic input devicesClosed-product formats, 206Closed scoring keys, 206CLT, see Cognitive load theory (CLT)CMAADI, see Consortium for Modified Alternate

Assessment Development and Implementation(CMAADI)

Coachella Valley v. California, 62–63Coaching programs, 150, 152Coefficient alphas, 203, 232–239Cognitive complexity, 134–135, 234, 277, 284, 312Cognitive continuum, 206Cognitive demands, types of, 7, 10–11, 102–103, 110,

112, 115, 120–122, 166–167, 173–175, 177,251, 312

Cognitive load theory (CLT), 5, 10–11, 202, 211, 213,252, 311

categories, 166item stimulus, 171long-term/working memory, concept of, 166principles of, 166–168, 178–179

Cognitive overload, 143, 155, 167, 174Cognitive process dimensions, 123–124College Board, 201Committee on Goals 2000, 93Commonwealth and State and Territory governments,

85–86Communication devices, 91, 139, 142, 258, 271Comparable scores, 33–34, 37–38, 40, 44–46, 55, 64,

66Complex multiple-choice (Type K) format, 203Computer-adaptive assessments, 25, 212Computer-based tests (CBT), 147

access tools, recommended, 155advantages/disadvantages, traditional methods,

154–155suggestions for developers, 155–156video games and, 156–157

Computer-enabled innovations, 211–212Computerized tests and individual needs, 255–257, 327

accommodation representations/categories, 258–260alternate language based, 266audio based, 260–263sign based, 263–266tactile based, 266–267test validity, 269–270

accommodations, rethinking test, 257–258adapted presentations, 267–269

alternate contrasts, 268magnification, 267–268masking, 268–269

prospects, 270–272Computer spell-check, 45Consensus-building activities, 286Consortium for Alternate Assessment Validity and

Experimental Studies (CAAVES), 166, 233,235–238, 240, 250–251

Consortium for Modified Alternate AssessmentDevelopment and Implementation (CMAADI),166, 233, 236–239

Constructed response (CR) items, 148, 168, 197,202–204, 206–208, 212, 233, 235, 248,284–285

Construct-irrelevant variance, 8, 147, 157–158, 179,211–212, 218–219, 226, 228, 323, 327

Construct preservation, 34, 38–39, 66

Page 348: Handbook of Accessible Achievement Tests for All Students: Bridging the Gaps Between Research, Practice, and Policy

332 Subject Index

Construct-relevant factors, 30, 37, 40, 56–57, 60–61, 64,147, 155, 214

Construct validity, 37, 119–120, 147–148, 185, 194–196,246

Contentcoverage, 134–137, 139, 142–143, 179, 322emphasis, 102, 113exposure, 112–113map, SEC, 121–122overlap, 7, 108, 110–111, 121and quality, 7–8, 113–114, 118, 121, 323related evidence, 34, 183, 202validity, 34, 43, 47, 235

Context-dependent item set format, 203Conventional MC format, 202, 205Co-teaching, 117, 302Council for Exceptional Children (CEC), 22, 56,

247Council of Australian Governments (COAG), 88,

92Criterion-referenced tests, 23, 203Criterion-related validity, 122, 226Cronbach alpha, 177Cue-using strategy, 150–151Curricular validity, 50–51, 53Curriculum

general, 1–2, 5–6, 22, 24, 27, 73, 75, 100–102,104–108, 111, 113, 118, 121–123, 155, 214,321–323

intended, 1, 5–7, 11, 99–108, 110–118, 120–122,124–125, 179–180, 319–323

1980s and 1990s: IEP as, 20–23Curriculum-based measurement (CBM), 24, 28, 77, 190,

195, 301Custom skills/activities, instructional time allocation to,

122–123Cut-score recommendations, 279, 325

methods, 285–286technical report, 288workshop, 276, 280, 284, 286–288, 290

Cut score setting, see Standard setting process

DDDA, see Disability Discrimination Act (DDA)Debra P. v. Turlington, 49–51, 53, 108Deductive reasoning strategy, 150–151Department of Education Organization Act (PL 96-88),

20Depth of knowledge (DOK), 11, 30, 131–132, 134–137,

139, 142–144, 177, 179, 322Design

access, supporting, 323–325extended time studies, 190–192language issues in, 217–228student voices in, including, 243–253test item, 163–180UD, 2–3, 5, 10, 70, 156, 163–164, 178, 197, 202, 206,

211, 213, 243–247, 256–258, 271, 303, 307,311, 319, 326

Developing and Validating Multiple-Choice Test Items,204

6D Frameworkdeconstruct phase, 288define phase

AA-MAS/AA-AAS and unmodified grade-levelassessment, relationship between, 277–278

identifying desired ALD rigor, 277identifying examinee population, 277identifying intended uses of ALD, 279identifying means for increased accessibility, 279stakeholder involvement, 279validity evidence, 279

deliver phase, 287using reporting ALD, 287validity evidence, 288

deploy phase, 286refining/reporting target ALD, 286–287validity evidence, 287

describe phase, 280ALD, types of, 280–281ALD, writing range and target, 281–284proficient definition, 281–282validity evidence, 284

design phaseAA-MAS/AA-AAS test designs, 284–285methodologies, 285–286validity evidence, 286

Differential boost, 3, 8–9, 186–187, 237–238Differential item functioning (DIF), 187–188, 213, 244,

246Digital magnifying glass, 267–268Digital technologies, see Computerized tests and

individual needsDisabilities

Australian policies, 83–94SWD, see Students with disabilities (SWD)SWOD, see Students without disabilities (SWOD)U.S. policies, 19–30

Disability Discrimination Act (DDA), 84–87, 91–92,94

Disallowed accommodations, 91Discovery Education Assessment, 250Discrepancy model, 28, 193

EEAHCA, see Education for All Handicapped Children

Act (EAHCA)Educational environment, 100, 108, 131, 138, 144, 163,

179–180, 321, 325Educational evaluation, 20Educational measurement, 166, 204, 206Educational Measurement (book), 201, 204Educational productivity model, 112Educational Researcher, 115Educational systems, 26, 35, 52, 63Educational testing, 3, 20, 33–66, 87, 147, 246, 253, 255,

319Educational Testing Service (ETS), 207

Page 349: Handbook of Accessible Achievement Tests for All Students: Bridging the Gaps Between Research, Practice, and Policy

Subject Index 333

Education for All Handicapped Children Act (EAHCA),20, 36

Electronic input devices, 142Elementary School Journal, 115Element interactivity, 171, 173–174Elimination testing, 214ELL, see English language learners (ELL)Embedded mnemonic devices, 142Enacted curriculum, 7–8, 100, 102–103, 106–110,

112–116, 120–125, 179–180, 322English language arts (ELA), 53–55, 63, 65, 121, 225,

233–235, 248, 283English language learners (ELL), 33–34, 43, 49, 56–66,

99, 111, 217–220, 222–228, 253, 308, 320,324, 327

Equitable OTL, 108Error-avoidance strategy, 150–151Essay formats, 203–204Evidence-based instructional practices, 113–114, 122,

326Evidence-based interventions, 24, 28–29Experimental studies, 184, 232–233, 235–239Extended MC items, 212External validity, 185, 194–196, 323–324Eyeglasses, 39, 48, 140, 149

FFactor analysis model, 119, 226Fair assessment practices, 24, 26Fair Housing Amendments Act, 164Fairness in testing, 12, 23, 63, 103, 149, 163, 201–202,

207, 211–212, 245Federal District Court (California), 256Federal guidance documents, 74–75Federal legislation, U.S. legal issues in SWD, 35–37

AA-AAS, 137AA-MAS, 137ADA, 36, 164IDEA, 2, 36–37, 134–135NCLB modified tests, 2, 46–47, 134–135Rehabilitation Act, 2, 35–36

Federally funded research projects, 202Federal Register, 70, 77, 297, 305Florida Comprehensive Achievement Test, 190Formal assessments, 201Formative assessments, 28, 88, 226–227Frame of reference, 183Free appropriate public education (FAPE), 19–20, 23,

104

G“Gap kids, the”, see Students with disabilities (SWD)General achievement test, 231–232, 236–237, 239General curriculum, 1–2, 5–6, 22, 24–25, 27, 73, 75,

100–102, 104–108, 111, 113, 118, 121–123,155, 214, 321–323

General Supervision Enhancement Grant (GSEG), 236,297–300, 303, 305–308

GI Forum v. Texas Educ. Agency, 49–51, 57–58

Goals 2000 Educate America Act (PL 103-227), 22“Gold standard”, 114, 232, 324Government/legal policies supporting inclusive

assessment practicesAustralian policies, 83–94IEP team decision-making, 69–80U.S. legal issues, 33–66U.S. policies, 19–30

Grade-level proficiency, 71, 74–75, 144, 185, 232, 281,295–297, 306

Graduation testing challenges, U.S. legal issuesclaiming racial/ethnic discrimination, 49–51

Debra P. and GI Forum cases, 49–50notice and curricular validity, 50–51retests and remediation, 50–51

involving ELL, 56–59California case, 58–59Texas case, 57–58

involving SWD, 53–56Ambach case, 52–53Brookhart case, 51–52California case, 54–56Indiana case, 53–54

GRE innovations, 212–213Grid-in items, 204GSEG, see General Supervision Enhancement Grant

(GSEG)Guessing strategy, 150–1512005/2007 Guidance, U.S. DOE, 73Guidelines for Constructed-Response and Other

Performance Assessments, 207

HHandbook of Test Development, 204Hierarchical linear modeling, 110–111, 116Hills Grammar School v. HREOC, 84Human interpreters, 155, 256, 264–265Human-like digital figures, see Avatars, signing

IIDEA, see Individuals with Disabilities Education Act

(IDEA)IEP, see Individualized education program (IEP)IEP team decision-making

annual accountability assessment, participation in, 76historic role of, 72inclusive assessments, 69–72

1994 ESEA, full participation requirement, 69flexibility of alternate assessments for SWD,

70–71IDEA amendments, 69–70modified/alternate achievement standards, 70–71NCLB accountability provisions, 69participation assumptions, 70

recommendations, 74–80AA-AAS, participate through, 75annual IEP meetings, decisions be made at, 74considerations, 75, 79general assessment, default decision, 74, 76

Page 350: Handbook of Accessible Achievement Tests for All Students: Bridging the Gaps Between Research, Practice, and Policy

334 Subject Index

state guidelines, 73–75statewide assessment participation, 72–74

decision-making, 72–73Improving America’s Schools Act, 69Inclusion of Students with Disabilities for the National

Research Council, 93Inclusive assessment, 2, 10, 19–20, 30, 69–72, 83–84,

164, 243–244, 246–248, 311, 324Indiana case, 53–54Individualized education program (IEP), 36, 93, 133,

301Individualized intended curricula, 105Individualizing Student Instruction Observation and

Coding System (ISI), 118–119Individuals with Disabilities Education Act (IDEA), 2,

21, 24–26, 28–29, 36–37, 133, 153, 164, 243,296

Individuals with Disabilities Education Improvement Act(IDEA), 28, 72, 77, 100, 105, 134–135, 164,255–256

Initial Content Standard III, 247Innovative practices, see Test design and innovative

practicesInspiration R©, 142Institute for Research on Learning Disabilities (IRLD),

184Instruction

access to, 1–12, 131–132, 137–138classroom, 321–323content on, 110–112differentiated, 132–133quality of, 112–113unfolding of, 113–114

Instructional accessibility, 131–132, 137–138, 180, 298,323, 326

Instructional accommodations, 13, 131–136, 138–142,144, 180, 257, 322–323

Instructional activities, 100, 118, 120, 135, 143,258

Instructional adaptations, 131–132, 139accommodations, 139–142

location/condition, changes to, 141and modifications, distinguishing between,

133–134presentation, 139–141response format, 142timing and scheduling, 141–142

alignment with grade-level content standards,134–135

case example, 145consequences, 136–137differentiated instruction, 132–133integration based on students’ needs, 144–145interdependence, 137–138modifications, 142–144

and accommodations, distinguishing between,133–134

location/condition, changes to, 143presentation, 142–143

response format, 144timing and scheduling, 143–144

performance expectations and general educationconsistency, 135–136

Instructional delivery, 113, 142–143Instructional dimensions, 108–121, 123–124, 322Instructional groupings, 7, 112–113, 122Instructional task demands, 10–11Instruction and achievement test access, 1–2

affecting SWD, 1barriers, overcoming, 5as central issue in instruction, 1improvement actions and innovations, 11–12

access to achievement tests, 12access to instruction, 11–12

legislative context and key concepts, 2–4AA-MAS, 3accessibility, 2–3accommodation, 4IDEA, 2modification, 3–4NCLB, 2–3purpose, 3–4Rehabilitation Act, 2UD principles rationale, 2

via OTL, 5–8content/time/quality of instruction, 6–7definition, 6factors affecting, 6measurement of, 6–7SEC method, 7

via testing accommodations, 8–9differential boost, 8effect sizes of, 9implementation challenges, 9interaction hypothesis, 8research review, 8–9typical changes in, 8

via well-designed test items, 10–11CLT, 10–11development stages, 11primary options and goals, 11recommendations, 10TAMI, 11test accessibility, 10

Integrated instructional supports, 145Intended constructs, 34, 37, 39, 41, 43, 64, 148, 158, 262,

269, 323, 140, 142Intended curriculum, 1, 5–7, 11, 99–108, 110–115, 117,

120–122, 124–125, 179–180, 319–323Intended curriculum model (ICM)

access to, 100framework, 100–101for general education, 101–104

failure to distinguish, 104intended/assessed/planned/enacted curriculum

alignment, 102–1032001 NCLB Act, mandate, 101–102

and OTL, 99–100

Page 351: Handbook of Accessible Achievement Tests for All Students: Bridging the Gaps Between Research, Practice, and Policy

Subject Index 335

for special education, 104–106assessment options, 105IEP/general curricula overlap, 105–106student specific, 105

Intent consideration strategy, 150–151Interaction hypothesis, 8–9Internal validity, 184, 193–195International Association for the Evaluation of

Educational Achievement (IEA) studies,110

Internet, 255See also Computerized tests and individual needs

Interpanelist consistency, 286, 288, 290Interpretative/qualitative methods, 244–245Interrater agreement, 115, 248Interrater reliability, 119–120Intraclass correlation coefficient (ICC), 111Intrapanelist consistency, 286, 288, 290Introduction to the Theory of Mental and Social

Measurements, 201Iowa Test of Basic Skills (ITBS), 40, 189–192IQ-performance discrepancy model, 28Irlen Syndrome, 268IRT, see Item response theory (IRT)IRT framework, 237ITBS, see Iowa Test of Basic Skills (ITBS)Item

-based OTL measures, 6–7, 110–111development, 5, 10, 166–171, 176, 178–179, 197,

213, 227, 244, 251, 275, 280, 311, 324,327

difficulty, 177–179, 208, 212, 232, 250, 324discrimination, 177–179, 209-total correlations, 177–178, 203, 225

Item response theory (IRT), 214, 237Item-writing, practice and evidence

access/assessment policy tools, 202CR item

formats, 203–204guidelines/taxonomy of, 206–208

format choice, 208guidelines, empirically studied, 208–209historical view, 201–202MC item

formats, 202–203guidelines/taxonomy of, 204–206

modifications for accessibility, research on, 210–214AA-AAS, 210–211AA-MAS, 211alternative scoring methods, 213–214assistive devices and technologies, 211–213

research, 201–202three-option items, optimal, 209–210tools

ARM, 327TAMI, 327

J2007 Joint Title I IDEA Regulations, 29–30

KKansas Assessments of Multiple Measures (KAMM),

233Kansas General Assessments, 233–234K-3 educational system, 118–119K-12 educational system, 26, 104, 121, 187, 201, 209,

276, 284–286Kidspiration R©, 142Knowing What Students Know, 245Knowledge, measurement of, 232, 321, 327Kuder-Richardson Formula 20, 235

LLAA 2, see Louisiana Educational Assessment Program

Alternate Assessment, Level 2 (LAA 2)Language-based accommodation, see Language issues in

item designingLanguage issues in item designing, 217

affecting performance, 219–220complexity, unnecessary, 219impact, 218–219improvement steps, 226–228linguistic accessibility, concept of, 217–218modification procedures, 220–223

effectiveness, 224–225measurement error, sources of, 225validity, impact on, 226

research/methodological issues, 223–224Laptops, 91, 255Large-print test materials, 35, 48–49, 91, 134, 139–140,

154, 180, 191, 196, 267, 269See also Magnification methods

Large-scale assessment, 6, 10, 13, 99, 102, 137–138, 157,196, 210, 227, 243, 247, 322

Large-scale tests/testing, 2, 22, 93, 183, 185, 195–196,198, 202, 244, 247 –250, 252, 257, 270–271

Laws, 19, 22–24, 26, 35–38, 46, 48, 51, 56–58, 61–63,69–70, 72–73, 84, 87, 105, 279, 289, 296, 311,321

See also specific actsLearning disabilities (LD), 23, 25–26, 29, 33, 39–43,

45–46, 48, 51, 76, 85, 91, 112, 149, 184, 186,189, 193, 298, 301, 306

labels, 187, 189–191, 193, 252“Leveling the playing field”, 44–45, 149, 295“Life-skills” for independent living, 27Likert scale, 110, 168, 220–221, 227Listening comprehension, 38–39, 43–44, 142Local education agencies (LEA), 21, 26, 71, 78, 80, 275,

277, 279, 297, 311Louisiana Educational Assessment Program Alternate

Assessment, Level 2 (LAA 2), 233–235Louvain University, 201

MMagnification methods, 49, 139–140, 154, 256, 259,

267–269, 271Manipulatives, use of, 8, 133, 139, 141–142, 285Masking methods, 155, 258–259, 268–269, 271

Page 352: Handbook of Accessible Achievement Tests for All Students: Bridging the Gaps Between Research, Practice, and Policy

336 Subject Index

Massachusetts Department of Education, 248Matching format, 43, 203, 205, 285MAZE measure, 238–239MCEECDYA, Early Childhood Development and Youth

Affairs (MCEECDYA), Ministerial Council forEducation

MCEETYA 4 Year Plan, 88, 92Measurement errors, 155, 218–219, 224–225Mehrens-Kaminski continuum, 152Melbourne Declaration on Educational Goals for Young

Australians, 87–88, 92Mental load analyses, 252Meta-analysis, 7, 9, 110, 150, 173, 204,

208–209Microsoft PowerPoint R©, 142Ministerial Council for Education, Early Childhood

Development and Youth Affairs(MCEECDYA), 87, 90, 94

Modification(s)CAHSEE, 54–56definition, 3–4, 38ELL, 60, 62goals of, 11instructional, 131–138, 142–144, 322item, 3, 11, 30, 46–47, 172, 175–177, 179–180, 208,

210–211, 231–240, 246–247, 250–251, 279,311–312, 324–326

linguistic, 220–228Modification packages, test and item accessibility

improvement, 231AA-MAS policy, 231–232

characteristics, 232–233coefficient alphas, 232, 234measurement for proficiency, 232new policy criteria, 232

experimental studies, 235–236CAAVES, 236–238CMAADI, 238OAASIS, 238–239

findings, 239–240prospects, 240state-modified achievement tests, 232–233

KAMM, 233LAA 2, 233–235TAKS-M, 235

Modified achievement tests, 106, 232–235, 239,295–296

AA-MAS rationale, 308–313consequences, 313–315federal regulations, history of, 296–297GSEG project, 297–298

focus groups, 302–303PSSA performance trends analysis, 303–305recommendations and guidance, 305–308survey, 298–302

Multi-level analysis, 110–111, 116Multi-level models, 116Multi-level OTL studies, 112, 116Multimedia learning theory, 166

Multiple choice (MC) items, 45, 54, 89, 201–203, 239,261

formats, 202–204guidelines/taxonomy of, 204–206key characteristics, 168–169

Multiple regression, 110Multiple true-false format, 202, 205MyiLOGS, see My Instructional Learning Opportunities

Guidance System (MyiLOGS)My Instructional Learning Opportunities Guidance

System (MyiLOGS), 120–124, 326

NNAEP, see National Assessment of Educational Progress

(NAEP)NAPLAN, see National Assessment Program for Literacy

and Numeracy (NAPLAN)National Accessible Reading Assessment Projects

(NARAP), 213National achievement scales, 89National Assessment of Educational Progress (NAEP),

21, 60, 131, 201, 207, 222, 224, 284, 287National Assessment Program for Literacy and

Numeracy (NAPLAN), 83, 88–94National Association of School Psychologists, 77, 247National Center for Special Education Research, 213National Center on Educational Outcomes (NCEO),

21–22, 74, 185–186, 243, 250National Commission on Excellence in Education, 21National Council on Measurement in Education

(NCME), 3–4, 33, 37, 103, 184–185, 197, 210,226, 232, 245–246, 253

National Instructional Materials Accessibility Standards(NIMAS), 256

National Longitudinal Transition Study, 21National Research Council, 8, 93, 245NCEO, see National Center on Educational Outcomes

(NCEO)NCLB, see No Child Left Behind Act (NCLB)NCME, see National Council on Measurement in

Education (NCME)“New generation” assessment, see “Race to the Top”

(RTT) assessment initiativesNew Hampshire Department of Education, 271New York Supreme Court, 36No Child Left Behind Act (NCLB), 2–3, 26–28, 46–47,

55, 59–60, 62–65, 69–70, 72, 80, 93, 100–106,108–109, 134–135, 153, 185, 187, 202, 210,231, 240, 243–244, 256, 276, 279, 295–296,306, 308–309, 319

Non-biased items, 10, 164, 213, 311Non-comparable scores, 37–38, 43–47, 53–54Non-English language learners (non-ELL), 43, 57,

60–61, 63–64, 217–218, 224–226, 228,324

None-of-the-above (NOTA) items, 205, 208–209Non-essential visuals, 172Non-LD labels, 187, 189–190, 193Non-Regulatory Guidance, U.S. DOE, 244, 297

Page 353: Handbook of Accessible Achievement Tests for All Students: Bridging the Gaps Between Research, Practice, and Policy

Subject Index 337

Non-standard test administrations, 33–36, 38, 40–41,44–46, 49, 53, 64–66, 320

eligibility, 47–48labeling, 39

Norm group, 40Norm-referenced tests, 22–23, 40–41Notice validity, 50–51Numbers/operations, instructional time to, 122–123

OOAASIS, see Operationalizing Alternate Assessment for

Science Inquiry Skills (OAASIS)Office for Civil Rights (OCR), 54Office of Special Education Programs (OSEP), 21, 24,

26, 28, 213, 297Online teacher tools, see My Instructional Learning

Opportunities Guidance System (MyiLOGS)Operationalizing Alternate Assessment for Science

Inquiry Skills (OAASIS), 233, 236, 238–240Opportunity to learn (OTL), 2, 99–100

alignment methodologiesconstraints, 107SEC method, 102Webb method, 102

dimensions, 108–113content of instruction, 110–112quality of instruction, 112–113synthesis of, 113–114time on instruction, 108–109

documentation, 107–108ICM

framework, 100–101for general education, 101–104for special education, 104–106student access to, 100

research, 106–108conceptual and substantive relevance, 107–108measurement of, 114–124prospects of, 124–125

Optical mark recognition software, 89Option items, three/four/five-, 209–210, 212, 214, 233Oral presentations, 8, 54, 120, 134, 139–140, 193, 201,

203, 259, 263Oregon case, 45–46Oregon Statewide Test, 192OSEP, see Office of Special Education Programs (OSEP)Otherwise qualified individuals, 20, 35–36, 44, 55–56OTL, see Opportunity to learn (OTL)

PPaper-based tests, 39, 42–43, 55, 148, 154–155, 225, 248,

257–260, 266, 271“Pathway to accessibility”, 169Peabody Journal of Education, 204, 211Peer-assisted learning strategies, 139–140, 145Peer-reviewed alternate assessments, 27, 60, 63, 74Pennsylvania GSEG survey, 298–302Pennsylvania System of School Assessment (PSSA), 298,

301, 303–305, 307–308

Performance expectations, 6, 131–133, 135–136, 138,141–143

Performance levels, 93, 135, 184, 276, 302, 307Performance standards, 22–23, 63, 70, 101–102, 104,

135, 296See also Performance expectations

Peripheral devices, 266, 271Personal decisions, 69–80Policies, see Government/legal policies supporting

inclusive assessment practicesPractice(s)

classroom-based assessment, 137–138, 142,257

ethics and standards for professional, 246–247evidence-based instructional, 113–114, 122, 326fair assessment, 24, 26government/legal policies supporting inclusive

assessmentAustralian policies, 83–94IEP team decision-making, 69–80U.S. legal issues, 33–66U.S. policies, 19–30

innovative, see Test design and innovative practicesvalidity evidence, 185–188

Praise statements, usage of, 7, 112Predictive validity, 119–120, 122Prerequisite skills, 10, 50Presentation

accommodations, 139–141auditory, 140–141tactile, 141visual, 140

modifications, 142–143President’s Commission on Excellence in Special

Education, 26Principals, role of school, 72, 89–92, 286Principles for Professional Ethics, 247Professional judgment, 27, 37, 56, 90Proficient performance, 79, 275–291Programme for International Student Assessment, 83Prompts, 54, 78, 99, 101, 139, 141–142, 189, 213,

248–249, 251, 260, 268–269, 285, 303Psychometrics, 4, 33–35, 37–39, 46, 61–66, 118, 122,

153–154, 202, 206, 214, 228, 244–245,277–278, 320

See also Rasch model

QQSA, see Queensland Studies Authority (QSA)Qualified individual with disability, 36

See also Otherwise qualified individualsQuality indicators, 7, 112–113Quality of instruction, 6–8, 107–108, 112–113, 115, 125,

322Quantitative methods, 244

See also Statistical analysesQuasi-experimental studies, 9, 184–185, 187, 193,

197–198Queensland Special Provisions, 91

Page 354: Handbook of Accessible Achievement Tests for All Students: Bridging the Gaps Between Research, Practice, and Policy

338 Subject Index

Queensland students afforded special consideration andexemptions, 92

Queensland Studies Authority (QSA), 90–92

R“Race to the Top” (RTT) assessment initiatives, 30Racial/ethnic discrimination, 49–51, 87, 212, 250Range ALD, 280–284, 288–291Rasch model, 238Read aloud accommodations, 41–42, 139–141, 145, 148,

154, 185–189, 191–193, 195–198, 236, 239,256–263, 266, 268, 271, 312, 323

Reading comprehension, 37–39, 43–44, 54, 142, 172,188–193, 219, 251, 312

Reading School District case (Pennsylvania), 62Reasonable accommodations, 19, 25, 36, 38–39, 41, 60,

311“Reasonable adjustment”, 86, 91Reasoning competency, types of, 206Reauthorization, Elementary and Secondary Education

Act (ESEA)1993, 21–221994, 26, 69

Reauthorization, Individuals with Disabilities EducationAct (IDEA)

1990, 211991, 351997, 2, 22, 24–27, 69–70, 72, 105, 296, 3202004, 28, 72, 77, 100, 105, 134–135, 164, 255–256

Rehabilitation Act, 2, 19–20, 35–36, 39, 52Reliability, 11, 33–34, 111, 119–120, 124, 147–148, 165,

177–178, 184, 194, 196, 202–203, 209, 211,214, 223–225, 228, 233–237, 239–240, 244,321, 324

Remedial instruction, 21, 24, 29, 50–53, 55–56, 59, 63,103, 284, 306

Remote control, use of, 264Rene v. Reed, 53Reporting ALD, 281, 286–291Response modes

accommodations, 142adapted, 258, 325modifications, 144

Response theory, 214Response to intervention (RTI), 28–29, 77, 193,

256Retests and remediation, 50–51Rhode Island Department of Education study, 257Rhode Island Performance Assessment, 190Rodriguez meta-analysis, 209RTI, see Response to intervention (RTI)RTT, see “Race to the Top” (RTT) assessment initiativesRule violations, item writing, 209

SSan Francisco Unified School District (SFUSD), 61–62SAT-9, Stanford Achievement Test Ninth Edition (SAT-9)Scaffolded instructional prompts, 142Scale scores, 89, 279

School learning model, 7, 108–109, 112School psychologists, 247, 302Sciences, 37, 169–170, 172–176, 201–202, 224, 259,

323–324Scores

comparable, 33–34, 37–40, 44–46, 55, 64, 66, 224cut, 275–291non-comparable, 37–38, 43–47, 53–54reliability, 202–203, 209, 211–212, 326–327validity, 2–3, 34, 138, 269, 272, 327

Screen readers, 139–141Scribes, 91, 139, 141–142, 155, 180, 256, 258SEC, see Surveys of the Enacted Curriculum (SEC)Settings (location/condition changes)

accommodations, 141modifications, 143

Short answer items, 168, 201, 203–204Sign based accommodation, 263–266Signed English, 263, 265–266Signing avatars, 165, 264–266, 325Sign language, 8, 54, 165, 256, 259, 263SLD, see Slow Learning Disability (SLD)Slow Learning Disability (SLD), 26–29Socioeconomic status (SES), 109–110, 116, 310–311Southeastern Community College v. Davis, 20, 23,

35Spearman-Brown formula, 234–236Special education tracking, 306, 313Special populations, 23, 33–66

See also Students with disabilities (SWD)Spencer Foundation, 245Standardized tests, 23, 25, 53, 69, 264Standards-based accountability system, 23, 69, 80, 100,

253Standards-based assessment/testing, 30, 183–184, 188,

250, 276–277Standards-based reforms, 2, 6, 22, 26, 104, 110, 306Standard setting process, 275–276

ALD development prior to, 290definition, 276design and development steps, overview, 290implementation, 289–290panelists/committee members, 280process planning/evaluations, 279–280technical report validity, 288–289terminology, 276validity, 276–277See also 6D Framework

Standards for Educational and Psychological Testing,3–4, 33–34, 37–39, 43–45, 47, 50, 56, 61, 63,183–185, 197, 210, 226, 232, 245–246, 253

Standard testing conditions, 36, 39–40Stanford Achievement Test Ninth Edition (SAT-9), 21,

61–62State, Territory, and Commonwealth Ministers of

Education, 87State assessment participation guidelines, 74–75State Board of Education (SBE), 54–55, 61, 276State Department of Education, 74, 76

State education agencies (SEA), 26, 275–276, 279, 281–282, 289
Statistical analyses, 194, 244, 251
Statistical conclusion validity, 184–185, 194–196
Students
  achievement, 5–8, 12, 27, 88–89, 94, 99–100, 103, 107–113, 116, 153, 278, 282, 296, 319, 322–323, 327
  characteristics, 188–189, 193
  ELL, 226–228
  exemptions, 69, 90–92
  needs, 8, 77, 88, 132, 144, 156, 262, 266
    See also Instructional adaptations
  OTL, see Opportunity to learn (OTL)
  participation, 90
  PSSA performance trends analysis, 303–305
  response data, 245–246, 248–252
  test-taking skills, see Test-taking skills and impact on accessibility for students
  withdrawals, 90–93
Students with disabilities (SWD), 1–9, 42, 44–46, 69–73, 76–78, 88, 90–91, 93–94, 100–102, 104–108, 112, 117, 149, 153, 157–158, 179–180, 183–185, 187, 190, 192, 194, 196–197, 213, 219, 231–232, 236–239, 243–244, 247, 251–253, 256, 261, 270–271, 275, 295–297, 299, 302–303, 305–308, 311, 315, 319–322, 326–327
  See also Instructional adaptations
Students without disabilities (SWOD), 2, 4, 8, 41, 72, 91, 102, 105–107, 113, 149, 158, 236–239, 251
Student voices in inclusive assessments, 243–244
  epistemological and methodological frameworks, 244–245
  extant research integrating students, 247–252
    using drawings, 248–249
    using interviews, 249–252
  professional practice, ethics/standards for, 246–247
  prospects, 252–253
  response data, uses of, 245–246
Study of Instructional Improvement (SII) logs, 118, 120–121
Surveys of the Enacted Curriculum (SEC), 7, 102, 110–111, 120–122, 125
Sydney Morning Herald, 92
Symbolic communication systems, 27, 41–42, 91, 259, 263
  See also Sign language

T
Tactile accommodations, 139–141, 155, 164, 259, 266–267, 325
TAKS-M, see Texas Assessment of Knowledge and Skills – Modified (TAKS-M)
Talking Tactile Tablet (TTT), 266–267
TAMI, see Test Accessibility and Modification Inventory (TAMI)
TAMI Accessibility Rating Matrix (ARM), 11, 168–169, 171–176, 179, 327
Target ALD, 280–284, 286–291
Target constructs, 2–3, 8, 10–11, 103–104, 147, 157, 163–167, 169–170, 172, 174, 177–180, 212, 226, 251, 320, 323–327
Taxonomies, content, 6–7, 110–111, 117, 134, 203–206, 208

Teacher-child interactions, 115, 118
Teacher information, 298–299
Teacher logs, 115–116, 120
Teacher-recommended cut scores, 284, 286
Teachers’ instruction, 6, 100, 136
Technology Assisted Reading Assessment (TARA) project, 213
Telecommunications Act, 164
Terra Nova, 18, 190
Terra Nova Multiple Assessment Battery, 192
Test Accessibility and Modification Inventory (TAMI), 11, 166, 168–170, 176, 179, 211, 213, 250–251, 327

Test accessibility theory, 165
Test accommodations, 8, 64, 185–186, 193, 211, 257–258, 260, 269–271
Test Anxiety Inventory for Children and Adolescents (TAICA), 252
Test delivery systems, 155, 165, 168, 258–260, 263, 267, 271
Test design and innovative practices
  accessibility theory for test-takers, 163–180
  accommodated/modified large-scale tests, 183–198
  CBT, 255–272
  6D Framework, 275–291
  item-writing, 201–214
  linguistic complexity in, 217–228
  modification packages, 231–240
  student voices, inclusion of, 243–253

Test developers/development, 2, 4, 20, 23, 37, 65, 147, 153–155, 158, 163, 165, 171, 178–179, 188, 202, 206, 213, 223, 239, 244–246, 253, 258, 260, 262, 270, 275, 279–281, 289–291, 295–296, 311, 319, 323–328

Tested construct, 33–34, 43, 61, 178, 269, 312, 322
Testing accommodations, 1–5, 8–10, 12–13, 23, 33–34, 39, 74, 93, 138, 155, 164, 179–180, 237, 246–247, 251, 257, 326–327
  See also Non-standard test administrations
Testing Individuals of Diverse Linguistic Backgrounds, 56–57
Testing protocols, 4, 89, 94
Test item design, 10–11, 163–180, 217–228, 231–240, 267–269
Test Preparation Handbook, QSA, 90–91
Tests of achievement, 1–19, 295–315, 319–328
Test Standards, see Standards for Educational and Psychological Testing
Test-takers/taking, 2–3, 8, 10–11, 103, 148–150, 152–155, 157–158, 163–180, 197, 211, 236, 249, 271, 323–326

Test-taking skills and impact on accessibility for students, 147
  interactions with other improvement methods, 153–157
    accommodations, 153–154
    CBT, 154–157
    modifications, 154–155
  practical implications, 157–158
  test-wiseness
    access and, 147–148
    definition, 148
    frameworks and findings, 150–153
    threshold hypothesis, 148–150
Test-taking strategies, 149, 157
Test-wiseness, 147–153
Texas Assessment of Knowledge and Skills – Modified (TAKS-M), 233, 235
Texas case, graduation testing challenges involving ELL, 49–51, 57–58
Texas Education Agency, 235
Texas Education Code, 51, 57
Texas Reading Assessment of Knowledge and Skills (TAKS), 191
Texas Student Assessment Program, 232, 235
Text-based information, 259–261, 263, 266–267
Third International Math and Science Study (TIMSS), 224
Third-party observations, 114
Threshold hypothesis, 148–150
Time
  and content dimension, 11, 113
  and quality dimension, 8, 11, 113–114
  -using strategy, 150–151
Timers, 139, 142
Timing and scheduling
  accommodations, 139, 141–142
  modifications, 143–144

2002–2003 Title I ESEA regulations, 27–28
TOEFL innovations, 212
True-false format, 147, 202–203, 205
Two-tier education system, Australian, 83
Type-K formats, 205, 208–209

U
Understand/apply, cognitive process dimension, 123–124
“Unicorn syndrome”, 315
United Nations Convention on the Rights of the Child (UNCRC) Article 12, 247
Universal design (UD), 2–3, 5, 10, 70, 156, 163–164, 178, 197, 202, 206, 211, 213, 243–247, 256–258, 271, 303, 307, 311, 319, 326
  See also specific designs
Universal Design for Learning (UDL), 256, 307
“Universal proficiency” goal, 29
University of Bologna, 201
University of Kansas, 140–141
University of Minnesota, 184
Unmodified grade-level assessments, 275–278, 282, 284–285, 290
U.S. Department of Education (U.S. DOE), 2–3, 22–24, 26, 59–60, 64, 93, 153, 185, 231–232, 235–236, 240, 244, 271, 295, 297

U.S. Government Accountability Office, 217
U.S. inclusive assessment policies for SWD, 19–30
  1960s and 1970s, inclusion and equal access, 19–20
    EAHCA, 20
    ESEA amendments of 1974, 20
    FAPE, 19–20
    Rehabilitation Act, 19–20
    Title VI of the Civil Rights Act of 1964, 19
  1980s and 1990s, IEP curriculum, 20–23
    ADA, 23
    CEC testimony, 22
    criterion-referenced tests, 23
    Department of Education Organization Act (PL 96-88), 20–21
    education and assessment gap, 22
    ESEA reauthorized (1993) attempts and reform approaches, 22
    IDEA reauthorized (PL 101-476; 1990) focus and funded studies, 21
    National Commission on Excellence in Education, 21
    norm-referenced tests, 22–23
    OSEP, 21
    participation guidelines, 23
    Rehabilitation Act regulations, 20
    sensory impairments v. cognitive disabilities, debate, 23
  1997 IDEA and alternate assessment options, 24–26
    CBM measures, 24
    classes of SWD, 24–25
    communication, mode of, 25
    computer-adaptive assessments, 26
    elements, new, 25
    endorsement, 24
    IEP as tool, 26
    initiated by Hoppe, 25
    non-native English speakers, inclusion of, 24
    1999 Part B regulations mandate, 26
    SLD students, 26
  2001 NCLB Act, 26–27
    annual assessments, 26
    Paige and Spellings emphasis on, 26–27
    President’s Commission on Excellence in Special Education, 27
    “wait-to-fail” model, 27
  2002–2003 Title I Regulations on alternate assessments, 27–28
    CBM-based assessments, 28
    “life-skills” for independent living, 27
  2004 IDEA and RTI assessment, 28–29
  2007 Joint Title I IDEA Regulations, 29–30
  RTT assessment initiatives, 30
U.S. legal issues in educational testing for SWD, 33
  accessibility and construct preservation/score comparability, 38–39
  accountability testing challenges, 59–63

    California case, 61–62
    construct shift, 61
    ELL accommodations/modifications, 60
    majority/minority ELL, 60–61
    NCLB Act, 59–60
    NCLB cases, 62–63
  construct fragmentation and shift, 43–44
    ELL v. non-ELL, 43
    mathematical assistance to SWD, 43
  federal legislation, 35–37
    ADA, 36
    IDEA, 36–37
    NCLB modified tests, 46–47
    Section 504 of Rehabilitation Act, 35–36
  graduation testing challenges
    claiming racial/ethnic discrimination, 49–51
    involving ELL, 56–59
    involving SWD, 53–56
  labeling non-standard test administrations, 39
  “leveling the playing field”, access v. success, 44–45
  non-standard test administrations, 33–35
    access, 33
    burdens, undue, 47–48
    content validity evidence/judgments, 34
    definition, 33
    eligibility for, 47–48
    tested construct, 33–34
    testing accommodation, 33
  Oregon case settlement, 45–46
  professional standards, 37–38
  public policy exceptions, 44
  recommendations, 63–66
  skill substitution, 40–43
    calculator use, 42–43
    extended time alteration, 40–41
    readers, 41–42
U.S. policies, see U.S. inclusive assessment policies for SWD
U.S. Supreme Court, 20, 23, 35–36, 48, 58

V
Valenzuela v. O’Connell, 57–58, 63
Validity evidence, 183–185
  constructs, 197
  definition, 184
  extended time research
    elements for, 190
    student characteristics, 187–189
    task demands, 187–188
  operationalization, test design, 197–198
  outcomes, validation of, 198
  read aloud research
    elements for, 191–192
    student characteristics, 188, 193
    task demands, 188–193
  research designs and quality
    extended time, 194–195
    read aloud accommodations, 195–196
  response processes, current practices, 185–188
  systems organization, 198
  teacher training, 198
  types of, 184–185
  See also 6D Framework

Variance decomposition, 111, 116
VHS/DVD/CD/MP3 players, 141, 264
Video game technologies, 156–157
Visual accommodations, 139–142, 155, 166–167, 172–173, 211, 213, 251, 324–325
Visual acuity difficulties, 149
Voice-over technology, 233, 236, 239

W
“Wait-to-fail” model, 27
Waivers, 44, 53, 55–57, 62
Walberg model, 112
Washington Administrative Code, 193
Washington statewide achievement test (WASL), 191
Webb method, 102
Webb’s taxonomy, 134
What Every Special Educator Must Know: Ethics, Standards, and Guidelines, 247
Withdrawals, student, 90–93
Within-method consistency, 286, 290
Writing Test Items to Evaluate Higher Order Thinking, 204