15. Accommodations on Large-scale Assessment


Accommodations for English Language Learners Taking Large-Scale Assessments: A Meta-Analysis on Effectiveness and Validity
Author(s): Michael J. Kieffer, Nonie K. Lesaux, Mabel Rivera and David J. Francis
Source: Review of Educational Research, Vol. 79, No. 3 (Sep., 2009), pp. 1168-1201
Published by: American Educational Research Association
Stable URL: http://www.jstor.org/stable/40469092


Review of Educational Research
September 2009, Vol. 79, No. 3, pp. 1168-1201
DOI: 10.3102/0034654309332490
© 2009 AERA. http://rer.aera.net

Accommodations for English Language Learners Taking Large-Scale Assessments: A Meta-Analysis on Effectiveness and Validity

Michael J. Kieffer and Nonie K. Lesaux
Harvard Graduate School of Education

Mabel Rivera and David J. Francis
University of Houston

Including English language learners (ELLs) in large-scale assessments raises questions about the validity of inferences based on their scores. Test accommodations for ELLs are intended to reduce the impact of limited English proficiency on the assessment of the target construct, most often mathematics or science proficiency. This meta-analysis synthesizes research on the effectiveness and validity of such accommodations for ELLs. Findings indicate that none of the seven accommodations studied threaten the validity of inferences. However, only one accommodation, providing English dictionaries or glossaries, has a statistically significant effect on ELLs' performance, and this effect equates to only a small reduction in the achievement score gap between ELLs and native English speakers. Findings suggest that accommodations to reduce the impact of limited language proficiency on academic skill assessment are not particularly effective. Given this, we posit a hypothesis about the necessary role of academic language skills in mathematics and science assessments.

Keywords: achievement gap, assessment, English language learners, high stakes testing, language development.

As the standards movement in education has gained in momentum, policy makers have increasingly focused on test-based accountability systems with the goal of improving academic achievement for all children. The principles of setting high standards, assessing all students relative to those standards, and holding schools accountable for student achievement have long been central to reform movements in public education (e.g., Fuhrman, 2003). However, since the No Child Left Behind Act of 2001 (NCLB), the application of these principles to subgroups of students identified as particularly at risk for academic difficulties has become very important.


One of these subgroups consists of students who lack full proficiency in English, commonly referred to as English language learners (ELLs). ELLs represent one of the fastest-growing groups among the school-aged population in this nation (e.g., Capps et al., 2005). Speaking a wide variety of languages, this group almost doubled in size between 1980 and 2000, and the most recent estimates place the size of the population at more than […] million (e.g., Batalova, Fix, & Murray, 2007). The results from many large-scale assessments suggest that when compared to their native English-speaking peers, ELLs lag behind in all grades and content areas. For example, on recent national assessments of reading and math, only a small minority of ELLs scored at proficient levels (4% to 11%, depending on grade and subject), compared to a third or more of native English speakers (National Center for Education Statistics, 2005).

According to many educators, NCLB has succeeded in increasing awareness of the academic needs and achievement of ELLs through new requirements to evaluate schools, districts, and states based on the English and content outcomes of this group of learners (Center on Education Policy, 2006). However, including ELLs in large-scale assessments is not a straightforward undertaking. ELLs present a unique set of challenges for educators and policy makers because of the central role played by language proficiency in the acquisition and assessment of content area knowledge. Thus, many unanswered questions remain about the inclusion of ELLs in large-scale assessments; foremost among them are questions about how valid inferences about ELLs' abilities can be made based on scores from these assessments. The purpose of this study was to determine the effectiveness and validity of test accommodations for ELLs taking large-scale assessments by using meta-analysis to quantify the impact of the specific accommodations on the performance of ELLs and native English speakers.

Including ELLs in Large-Scale Assessments

Historically, ELLs have often been excluded from large-scale assessments because limited English proficiency was thought to prevent students from understanding questions and/or result in invalid test results under standard test administration procedures (Rivera, Collum, & Shafer Willner, 2006). Exclusion of large numbers of students from participation in standards-based tests not only can result in substantial distortion of the percentage of students achieving proficiency but also, more important, can obscure important and systematic differences in student achievement between different demographic groups. Thus, one of the laudable goals of NCLB and state efforts is to increase participation of all learners, including those in identified subgroups, in large-scale assessments.

However, it is not enough for students to participate in state assessments; students' participation must lead to valid inferences about their achievement. Obtaining valid results is a particularly pressing issue because the stakes of mandated assessments for states, districts, and schools are high. NCLB and state accountability systems not only place considerable pressure on schools and districts to increase participation rates in large-scale assessments but also impose sanctions on schools that cannot move students in all identified subgroups toward proficiency. In addition, performance on large-scale assessments is increasingly high stakes for students: By 2008, 28 states in the United States will require that students pass a state-administered test for high school graduation (Fuhrman, 2003).

There is reason for concern about the validity of test scores if in fact these reflect individual differences in abilities that are distinct from those that are the target of assessment (American Educational Research Association [AERA], American Psychological Association [APA], & National Council on Measurement in Education [NCME], 1999). Because language plays an integral role in most, if not all, academic learning, any test of academic achievement is also, to some degree, a test of language ability. Consequently, ELLs present a special challenge to schools and those involved in large-scale assessment; if tests are not appropriately designed or if ELLs are not tested under appropriate conditions, then language demands of the test that are not central to the target of assessment may unfairly and negatively influence their performance. Research conducted by Abedi and colleagues has demonstrated that there is indeed a substantial link between students' English language proficiency and their performance on tests of math, science, and social studies (e.g., Abedi & Leon, 1999; Bailey, 2005; Butler & Castellon-Wellington, 2005). Furthermore, although there may be substantial differences between ELLs and their peers in content knowledge, research shows that the size of this knowledge gap often depends on the language demands of the assessment. Several correlational studies have found that assessments and individual test items that have more linguistic complexity yield larger performance gaps between ELLs and non-ELLs (e.g., Abedi, Leon, & Mirocha, 2003; Abedi, Lord, Hofstetter, & Baker, 2000; Abedi, Lord, & Plummer, 1997; Martiniello, 2007).

These findings suggest that, contrary to some popular conceptions, assessments in all domains assess language skills as well as content knowledge and skills. However, such a relationship does not lead directly to the conclusion that valid inferences can never be made about the content knowledge of ELLs from large-scale assessments. Rather, the key question is to what extent the language skills measured by these assessments are essential to the construct targeted by the test and, in turn, to what extent they measure language demands that are irrelevant to the academic skills being assessed.

Use of Accommodations for ELLs Taking Large-Scale Assessments

Making specific changes to the test format or the conditions under which students are tested is one method that has been proposed to minimize the influence on content area test performance of variation in ELLs' language skills that is not central to the construct being assessed. Such test accommodations include any alteration to standard test administration procedures designed to provide support for students based on their special needs without changing the construct being assessed (AERA, APA, & NCME, 1999). These procedures include the presentation of the assessment items, the ways in which students respond to the items, any equipment or materials to be used, the period of time allowed to complete the test, and the environment in which students take the test. There are as many as 75 different accommodations currently in use with ELLs, although not all of them are appropriate. Moreover, their selection and implementation vary by state and district (for a review of state policies on accommodations for ELLs, see Rivera et al., 2006).


An appropriate accommodation focuses on those extraneous factors that affect the test scores of students with special needs but that are not the target of assessment. An example of an appropriate accommodation would be to provide a large-print version of a test to a student with a visual impairment. At the same time, accommodations should not provide inappropriate support or change the nature of the task such that resulting scores no longer allow valid inferences about the central construct being measured. An example of an inappropriate accommodation would be to rewrite the passages in a reading comprehension assessment in a way that alters their fundamental difficulty level. Thus, for ELLs, appropriate accommodations provide direct or indirect linguistic support to minimize the negative impact of irrelevant language demands on students' performance so that the students can demonstrate their content knowledge and academic skills to the greatest extent possible.

Evaluating Accommodations for ELLs in Large-Scale Assessments

Theoretically speaking, many accommodations that offer linguistic support, such as providing dictionaries or simplifying the English sentence structure of the test items, may indeed be appropriate for ELLs. However, because content knowledge is inextricably linked to language, the use of certain language supports for ELLs may not be as straightforward as providing a large-print version of an assessment to a student with a visual impairment; even language-based accommodations that are grounded in theory may in practice be ineffective or threaten the validity of scores. Thus, the selection of accommodations for ELLs must be based on empirical evidence for their effectiveness and validity (Abedi, Hofstetter, & Lord, 2004).

Although accommodations for ELLs can be evaluated along several dimensions, evaluating accommodations for effectiveness and validity is of paramount importance. Effectiveness refers to the extent to which students receiving the accommodation demonstrate improved test scores. In contrast, the validity of an accommodation refers, in part, to the notion that the accommodation should improve the performance of students who require it but not affect the performance of students who do not. If an accommodation affects the performance of students who do not require it, then providing the accommodation to some students but not others would threaten the validity of the resulting test scores. If an assessment is valid for use with a specific group, then students who do not require the accommodation will be neither advantaged nor disadvantaged by receiving it.

A growing body of empirical research has evaluated accommodations for ELLs, but the results of these individual studies have yet to be quantitatively synthesized to produce aggregate estimates of their effectiveness and validity. Moreover, investigation of factors that may potentially moderate the effectiveness of these accommodations (e.g., grade level, domain tested, language of instruction) is needed. It is possible that a given accommodation will be more effective for tests in some domains than for tests in other domains or that accommodations will be more effective at some grade levels than at others. Curricular content and corresponding measures of achievement change with respect to both difficulty (National Center on Education and the Economy, 1998) and the nature of the skills tested (e.g., Koenig & Bachman, 2004; RAND Mathematical Study Panel, 2003; RAND Reading Study Group, 2002) over the course of the grade span, thus potentially influencing the effectiveness of specific accommodations. This is particularly important in the context of ELLs' test performance given the differing language demands of academic tasks over time and the language demands specific to different domains tested. For example, the fourth grade math test may emphasize and prioritize children's calculation skills, whereas the eighth grade tests in the same content area of math may emphasize complex word problems with sophisticated language. Finally, evaluating accommodations for this population must further recognize potential sources of differential effectiveness by focusing on the instructional and linguistic context in which the testing is occurring, given the differing models of instruction offered for ELLs (Abedi et al., 2004).

Present Study

The purpose of this study is to evaluate the effectiveness and validity of accommodations for ELLs participating in large-scale assessments. Two narrative reviews (Abedi et al., 2004; Sireci, Li, & Scarpati, 2003) have previously synthesized the findings of studies on test accommodations for ELLs published before 2001. The present study was designed to build on this work in two ways. First, using a meta-analytic approach, the current study quantifies the average effects of the accommodations studied. Second, the current study updates the findings of previous reviews by including the findings of several studies published since 2001 as well as those previously reviewed. Given the potential sources of differential effectiveness of accommodations discussed above, the meta-analysis also includes an examination of several moderators of effects. The analyses were guided by two specific research questions:

1. What evidence exists that specific test accommodations are effective in improving the performance of ELLs taking large-scale assessments? What evidence exists that these effects differ as a function of the grade level of students, domain tested, provision of extra time, or language of instruction?
2. What evidence exists that specific test accommodations designed for ELLs are valid in large-scale assessments?

Method

Study Inclusion Criteria

Based on our research questions, we selected four characteristics that formed the criteria for inclusion of studies that provide empirical evidence for evaluating accommodations for ELLs. We included studies in the meta-analysis that (a) examined individual accommodations or individual accommodations bundled with extra time, (b) were articles published in peer-reviewed journals or technical reports available online, (c) employed an experimental, quasi-experimental, or repeated measures design, and (d) reported sufficient data to allow for the estimation of effect sizes.

Search procedure. Studies for review were obtained through two searches conducted in July 2006 designed to include all studies available up to that time. First, we conducted a comprehensive search of online databases, including Education Resources Information Center, PsycINFO, Modern Language Association, Education Abstracts, and Academic Search Premier (which yielded 114 entries), as well as the online database of the National Center for Research on Evaluation, Standards, and Student Testing (which yielded an additional 27 entries, many of them redundant with the 114 previously found). The abstract of each identified citation was read to determine if it was an empirical study examining the effects of one or more accommodations. Second, we collected citations of studies previously reviewed by Sireci et al. (2003) and/or by Abedi et al. (2004). Based on the list of citations of empirical studies from the two searches, we collected technical reports as well as articles. However, we did not collect presentations at academic conferences because of both practical and quality concerns. In several cases, the results of a single study were reported in multiple documents; in such cases, the documents were linked together and cross-checked for complete information, and the most recent document is cited here.

Excluded studies. The search procedure above yielded 21 studies for possible inclusion in the analyses. However, several of these studies, including some cited in previous reviews, had to be excluded from the meta-analysis for reasons of data reporting or methodology. In three instances (N. E. Anderson, Jenkins, & Miller, 1996; Hafner, 2001; Lotherington-Woloszyn, 1993), the studies did not report the necessary information to quantify the effects of accommodations separately for ELLs and native English speakers. In two cases (Abedi & Hejri, 2004; Shepard, Taylor, & Betebenner, 1998), studies examined the effect of various accommodations chosen for individual students by their teachers and thus were inappropriate for examining the effect of specific accommodations. In one case, a previously cited study (Miller, Okum, Sinai, & Miller, 1999) was a conference presentation.

After excluding the studies above, a total of 15 studies remained. Of these studies, 4 (Abedi & Lord, 2001; Albus, Thurlow, Liu, & Burlinski, 2005; Castellon-Wellington, 2000; Johnson & Monroe, 2004) employed repeated measures designs in which the same group of students was tested with and without accommodations. Because the preponderance of the studies to be included employed between-groups designs and because effect sizes from repeated measures designs are not strictly comparable to those from between-groups studies, results from these studies were not included in the formal meta-analysis but were considered in our findings.

Studies Included in Meta-Analysis

In all, 11 studies were included in the meta-analysis with a total of 23,999 participants (17,445 native English speakers, 6,554 ELLs). Of these studies, 6 were conducted by Abedi and colleagues, whereas 5 others were conducted by other research teams (i.e., M. Anderson, Liu, Swierzbin, Thurlow, & Bielinski, 2000; Brown, 1999; Garcia Duncan et al., 2005; Hofstetter, 2003; Rivera & Stansfield, 2004). With respect to design, 8 were true experiments, in which students were randomly assigned to accommodated or unaccommodated conditions, whereas 3 (Abedi, Courtney, & Leon, 2003a; Abedi, Courtney, & Leon, 2003b; Brown, 1999) were classified as quasi-experiments because of factors specific to each study. In the study by Brown (1999), the mechanism of assignment is unclear in the report and could not be confirmed through communications with the study author or school personnel involved in the study. Observed pretest differences between the two groups were negligible. In the study by Abedi, Courtney, et al. (2003a), students were originally assigned at random to a treatment condition; however, not all students randomly assigned were actually provided the accommodation because of limited space and equipment. Similarly, in the study by Abedi, Courtney, et al. (2003b), only Spanish speakers were randomly assigned to a bilingual dictionary condition, although the control group included students with native languages other than Spanish. The findings reported below were largely robust to the inclusion or exclusion of these three studies.

All but 1 of the 11 studies used multiple samples to investigate different accommodations and/or a single accommodation provided in multiple grades.¹ Thus, together the studies yielded 38 different tests of the effectiveness of specific accommodations for ELLs as well as 30 tests of the validity of accommodations. Of the 38 tests of effectiveness, 34 involved students in fourth grade (n = 11) or eighth grade (n = 23), whereas 4 involved students in fifth grade (n = 2) or sixth grade (n = 2). Of the 38 tests of effectiveness, 17 used a math test as the outcome measure, 20 used a science test, and 1 used a reading test. Of these effects, 29 used the National Assessment of Educational Progress (NAEP) assessment or NAEP items (n = 23) or items drawn from the NAEP and Trends in International Mathematics and Science Study assessments (n = 6). Only 9 effects were based on a state accountability assessment (8 of which came from two studies using the Delaware State Test and 1 of which came from a study using the Minnesota State Test). Of the 11 studies, […] reported that students were classified as ELLs based on school records of a limited English proficient or ELL designation, whereas ELL classification was not reported in the remaining studies. Although this suggests consistency in ELL classification across studies, it is important to note that the criteria for such school-based designations can vary considerably across states and districts (Ragan & Lesaux, 2006). Appendix A provides detailed information on the design of each study and the characteristics of the participants.

In their review of state assessment policies regarding ELLs, Rivera and colleagues (2006) identified 75 accommodations that are currently made available to ELLs. Of these, they found roughly […]7 that are considered potentially appropriate insofar as they are specially designed to address the linguistic needs of ELLs. In contrast to this breadth of accommodations offered to ELLs by states, the 11 studies and 38 tests of the effectiveness of specific accommodations focused on only seven different types of accommodation: simplified English (n = 16), English dictionary or glossary (n = 11), bilingual dictionary or glossary (n = 5), extra time (n = 2), Spanish language test (n = 2), dual language questions (n = 1), and dual language booklet (n = 1). In addition to the two effects that included extra time alone, seven estimated effects came from studies that involved extra time bundled with one of three other accommodations: simplified English (n = 2), English dictionary (n = 3), or bilingual dictionary (n = 2). One study (Abedi, Courtney, Mirocha, Leon, & Goldberg, 2005) allowed extra time to participants in both the control and treatment conditions; this study was not coded as evaluating the effect of extra time. All but two of the reported effect size estimates are based on paper and pencil tests; the remaining two used computerized assessments.

Because technical reports were included in addition to published articles, there is little reason to believe that publication bias would have led to the inflation of effect sizes. Nonetheless, to investigate the possibility that the results of studies with nonsignificant results were more likely to go unreported than those with significant results, we plotted the standard error of Hedges's g statistic against the value of the Hedges's g statistic for each study. Inspection of this plot revealed the funnel shape we would expect in the absence of substantial publication bias, with samples with more precise estimates yielding effect sizes closer to the mean and little evidence of a gap in which unreported nonsignificant effect sizes would occur.

Accommodations That Have Been Evaluated Empirically

As mentioned, in the studies reviewed, seven different types of accommodations were evaluated: simplified English, English dictionaries or glossaries, bilingual dictionaries or glossaries, tests in the native language, dual language test booklets, dual language questions for English passages, and extra time. Each of these is theoretically justifiable for ELLs insofar as they are designed to address the language needs of the ELLs by minimizing variation in scores because of construct-irrelevant language abilities. With the single exception of dual language questions, the accommodations were studied exclusively with tests of math and science.

Simplified English involves changes in the vocabulary and grammar of test items to eliminate irrelevant linguistic complexity while maintaining the same content vocabulary and level of complexity in the content task. These changes include eliminating rare vocabulary unrelated to the content, shortening or simplifying sentence structure, replacing passive voice with active voice, and replacing complex verb forms with present tense verbs (for a description, see Abedi et al., 1997). English dictionaries or glossaries involve providing definitional information in English in some form, including standard dictionaries, dictionaries customized to the assessment, or glossaries for specific words used in the assessment. Here again, the intent is to provide definitional information about words that are necessary to comprehend the task but do not represent key concepts of the content. Similarly, bilingual dictionaries, glossaries, or marginal glosses provide bilingual students with access to definitions or direct translations of selected noncontent words in students' native language. Another variant on this accommodation involves providing marginal glosses, explanatory notes written in the margin of the text in the students' native language.

Three other accommodations involve the use of native language in the test itself. Native language versions of tests involve adapting tests into the native language of students. The most common method of adapting a test to another language is to use back translation; the test is translated from the original language into the native language by a biliterate test maker. This adapted test is then translated back into the original language by an independent individual, and the two original language tests are compared for equivalence. This process not only is resource intensive but also can introduce additional threats to validity because of the difficulty in maintaining equivalence in the construct measured (American Institutes for Research, 1999). Dual language assessments involve test booklets, in which English versions and native language versions of the same item are placed on facing pages. Two types of dual language tests have been investigated: dual language booklets in which all items on a math test are presented in two languages and dual language questions in which a reading passage is presented in English, followed by questions read aloud in two languages.

Finally, one of the most frequently used accommodations for ELLs is to provide extra time to complete the test. The theoretical rationale is that ELLs will be able to demonstrate their content knowledge and skills better if given additional time to work through the language demands of the test. Often, extra time is provided in combination with another type of accommodation, in which case the rationale is to allow students the time required to use the accommodation (e.g., to use a dictionary to look up the meanings of unknown words).

Methods for Meta-Analysis

To evaluate the appropriateness and practical importance of test accommodations for ELLs, three sets of meta-analyses were conducted. First, a preliminary meta-analysis was conducted to compare the academic achievement test scores of ELLs in the absence of accommodations with those of native English speakers. This first analysis was undertaken in an effort to describe the magnitude of differences in test scores between ELLs and non-ELLs in the absence of accommodations. It is this set of differences that the accommodations are intended to help ameliorate, and thus it serves as a metric for judging the magnitude of the effect sizes for accommodations. The second analysis addressed the effectiveness of accommodations by estimating the degree to which each accommodation led to improved performance for ELLs. The third analysis addressed validity of the accommodations by estimating the impact of the accommodations on the performance of non-ELLs, with the assumption that a valid accommodation should have no significant effect on their performance.

To compute average effect sizes, we treated each study sample as the unit of analysis, yielding 38 tests of effectiveness. We made this decision because effects of different accommodations that were derived from the same study were based on different samples of students. Although effect sizes derived from the same study cannot generally be considered independent, in the present case multiple effects from the same study were not generally involved in evaluating the effects of any particular accommodation. That is, studies contributed multiple effects across the set of accommodations but did not typically contribute multiple effect sizes for any single accommodation. Insofar as the net effect of this nonindependence is to reduce the standard error of the mean effect size, it will be seen that any failure of this strategy to fully address the issue of nonindependence would not alter the general conclusions from the analyses of mean effect sizes.

To compute average effect sizes across the entire set of samples and for all samples addressing specific accommodations, we averaged across different outcomes and grades.² In averaging the different effect sizes, we weighted the individual effect sizes according to their precision.
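The text states only that effect sizes were weighted by their precision. A minimal sketch of one common way to do this, inverse-variance weighting, is given below. The function name, the fixed-effect form of the weights, and the numbers are illustrative assumptions, not the authors' routine.

```python
import math


def weighted_mean_effect(effects, variances):
    """Inverse-variance weighted mean effect size and its standard error.

    Assumption: precision is taken as 1 / sampling variance, the usual
    fixed-effect convention. The source does not spell out its weights.
    """
    if not effects or len(effects) != len(variances):
        raise ValueError("effects and variances must be non-empty and equal in length")
    weights = [1.0 / v for v in variances]
    total_weight = sum(weights)
    mean = sum(w * g for w, g in zip(weights, effects)) / total_weight
    standard_error = math.sqrt(1.0 / total_weight)
    return mean, standard_error


# Hypothetical effect sizes and sampling variances, for illustration only.
example_effects = [0.10, 0.30, 0.50]
example_variances = [0.04, 0.01, 0.04]
example_mean, example_se = weighted_mean_effect(example_effects, example_variances)
```

In this sketch the middle sample has the smallest variance, so it carries four times the weight of each of the other two and pulls the average toward its own value.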

As our measure of effect size, we first computed the mean difference in performance between ELLs receiving the accommodated test and ELLs taking the test without accommodations. (For analyses of validity, this difference was computed for non-ELLs taking the accommodated test and those taking the test without accommodations.) This difference in mean performance was then standardized using the pooled within-groups estimate of the standard deviation. This measure of effect size is the common Cohen's d, which is known to be biased in small samples. We therefore corrected this measure of effect size using a transformation of d recommended by Hedges (1981) to produce estimates in Hedges's gu. These estimates were computed directly from the means and standard deviations reported in the studies by using a programmed routine in the Comprehensive Meta-Analysis (Version 2) software (Borenstein, Hedges, Higgins, & Rothstein, 2005).
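The two steps described above, standardizing by the pooled within-groups standard deviation and then applying a small-sample correction, can be sketched as follows. This is a sketch under stated assumptions: it uses the widely cited approximate correction factor 1 - 3/(4·df - 1) associated with Hedges (1981), and it is not a reproduction of the software routine the authors used. All function names and input values are hypothetical.

```python
import math


def cohens_d(mean_accom, mean_control, sd_accom, sd_control, n_accom, n_control):
    """Mean difference standardized by the pooled within-groups SD."""
    pooled_variance = (
        (n_accom - 1) * sd_accom ** 2 + (n_control - 1) * sd_control ** 2
    ) / (n_accom + n_control - 2)
    return (mean_accom - mean_control) / math.sqrt(pooled_variance)


def small_sample_corrected_g(d, n_accom, n_control):
    """Apply the approximate small-sample correction to Cohen's d.

    Assumption: correction factor J = 1 - 3 / (4 * df - 1),
    with df = n_accom + n_control - 2.
    """
    degrees_of_freedom = n_accom + n_control - 2
    correction = 1.0 - 3.0 / (4.0 * degrees_of_freedom - 1.0)
    return correction * d


# Hypothetical summary statistics, for illustration only.
example_d = cohens_d(52.0, 50.0, 10.0, 10.0, 20, 20)
example_g = small_sample_corrected_g(example_d, 20, 20)
```

Because the correction factor is always below 1, the corrected estimate is slightly smaller in absolute value than d, and the two converge as the samples grow.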

In addition to estimating the mean effect size for each accommodation, we investigated whether other aspects of the accommodation treatment moderated the effect of the accommodations; these moderators included the grade level of the students, the domain tested (math, science, or reading), whether the test was based on the NAEP or a state test, and whether the accommodation was bundled with extra time or provided alone.

Using PROC MIXED in SAS (SAS Institute, 1999), a two-level hierarchical linear model (HLM) was fitted, in which Level 1 equations represented the level of the effect size for each observation and Level 2 equations represented the study level, where study characteristics (including type of accommodation as well as moderating factors) that served to explain variation in effect sizes were included (Raudenbush & Bryk, 2002). We first fitted an unconditional model, in which random effects variance at Level 1 was specified to be the variance due to sampling error within sample (which was assumed known and given by the square of the standard error of the Hedges's gu statistic from each effect size estimate) and Level 2 variance was specified to be the variance in Hedges's gu statistics attributable to differences between samples.
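One conventional way to write an unconditional model of this kind is shown below. The symbols are assumptions introduced here for illustration (the text gives no equations): $g_i$ is the corrected effect size for sample $i$, $\theta_i$ its true effect, $SE_i$ its reported standard error, $\gamma_0$ the grand mean effect, and $\tau^2$ the between-sample variance.

```latex
\begin{align*}
\text{Level 1:}\quad & g_i = \theta_i + e_i, & e_i &\sim N\!\left(0,\; v_i\right), \quad v_i = SE_i^{2} \ \text{(treated as known)} \\
\text{Level 2:}\quad & \theta_i = \gamma_0 + u_i, & u_i &\sim N\!\left(0,\; \tau^{2}\right)
\end{align*}
```

Under this notation, a conditional model would add predictors to the Level 2 equation, so that $\tau^2$ becomes the between-sample variance left unexplained by those predictors.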

Next, we fitted a set of conditional models in which dummy variables for the type of accommodation and other potential moderator variables were included at Level 2 to determine if they explained variation in the effect sizes between samples. To determine if a given variable explained statistically significant variation in effect sizes, we examined the change in goodness of fit between models using the change in -2 log likelihood statistic (Δ-2LL) and conducted a significance test by comparing this statistic to a chi-square distribution with 1 degree of freedom.
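The model comparison described above can be sketched as a likelihood-ratio test. The sketch assumes the two models are nested and differ by a single parameter, so the reference distribution has one degree of freedom; in that special case the chi-square tail probability reduces to erfc(sqrt(x / 2)). The function name and the -2LL values are hypothetical.

```python
import math


def delta_neg2ll_test_1df(neg2ll_reduced, neg2ll_full):
    """Change in -2 log likelihood and its p value on 1 degree of freedom.

    Assumption: the full model adds exactly one parameter to the reduced
    model. For 1 df, P(chi-square > x) equals erfc(sqrt(x / 2)).
    """
    delta = neg2ll_reduced - neg2ll_full
    if delta < 0:
        raise ValueError("the full model should not fit worse than the reduced model")
    p_value = math.erfc(math.sqrt(delta / 2.0))
    return delta, p_value


# Hypothetical -2LL values for an unconditional and a conditional model.
example_delta, example_p = delta_neg2ll_test_1df(45.20, 41.36)
```

A change of about 3.84 sits almost exactly at the conventional .05 threshold for one degree of freedom, which is why that value is used in the example.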

In addition to investigating moderator effects because of type of accommodation, analyses were conducted to determine if the effects for specific accommodations differed as a function of a characteristic of the studies themselves (e.g., whether the study employed an experimental or quasi-experimental design, the grade level of the students, content domain measured).

Results

Preliminary Analyses: Differences in Achievement Test Scores Between ELLs and Native English Speakers

Before addressing the question of effectiveness of accommodations, we estimated the average difference in academic achievement test scores between ELLs and native English speakers that can be expected in large-scale assessments. These estimates provide a context for evaluating the practical importance of the effects of accommodations. Table 1 presents several estimates of the math and science achievement gaps between ELLs and native English speakers. The top half of Table 1 presents mean effect sizes (reported as Hedges's gu statistics) for the differences in math and science achievement scores between ELLs and native English speakers in the unaccommodated conditions from the studies reviewed. These estimates suggest that there are large achievement score differences between the two groups across these grades and domains of knowledge, with mean effect sizes ranging from six tenths to three fourths of a standard deviation. They also suggest that the achievement gap differs by test domain to some extent, with larger gaps present in science than in math.

TABLE 1
Estimates of the achievement score differences between English language learners and native English speakers in math and science from studies reviewed (as Hedges's gu) and from the 2005 National Assessment of Educational Progress (as Cohen's d; National Center for Education Statistics, 2005)

                                                                 95% confidence interval
                         Number of studies   Mean effect size   Lower limit   Upper limit
By domain (a)
  Math                           7                0.604            0.279         0.929
  Science                       11                0.748            0.581         0.914
2005 National Assessment of Educational Progress
  4th grade math                                  0.831            0.799         0.864
  8th grade math                                  1.006            0.964         1.047
  4th grade science                               1.051            1.008         1.094
  8th grade science                               1.227            1.177         1.277

a. The achievement score difference in reading was not estimated because only a single study examined this domain.

Although these differences between ELLs and non-ELLs are quite substantial, they are somewhat small in comparison to estimates of the achievement gap from national studies. For example, as another point of reference, the bottom half of Table 1 presents estimates of the achievement difference between native English speakers and ELLs from the 2005 NAEP.³ These estimates are expressed appropriately as Cohen's d because of the large sample on which the estimates are based. These estimates are appreciably larger than those from the studies reviewed, with three of the four differences greater than one standard deviation. As with the studies reviewed, the gap was larger for science than for math and for eighth grade students compared to fourth grade students. The difference in magnitude between the NAEP estimates and those from the studies reviewed may be because of the confounding of concomitant predictors of achievement, such as poverty, in the national samples, which likely are better controlled by the design of the research studies of accommodations. All of the studies reviewed sampled ELL and native English-speaking students from within the same schools and/or districts, whereas the NAEP estimates are based on a nationally representative sample. The NAEP estimates may thus capture more of the variation due to differences between the schools attended by ELLs and those attended by native English speakers as well as those concomitant demographic characteristics that tend to affect achievement of at-risk populations in national samples but whose effects are masked when results are disaggregated on only a single dimension. Nevertheless, both sets of estimates indicate that there are large observed differences in achievement in both math and science between ELLs and native English speakers in large-scale assessments, suggesting that one metric by which we can judge the effectiveness of accommodations is the extent to which they reduce these apparent achievement gaps.
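As a reading aid for Table 1, the standard error implied by each reported interval can be backed out from its width. The sketch below assumes the intervals are symmetric, normal-theory 95% intervals (estimate plus or minus 1.96 standard errors); the text does not state how they were constructed, and the helper name is hypothetical.

```python
def se_from_ci(lower, upper, z=1.96):
    """Standard error implied by a symmetric normal-theory interval.

    Assumption: interval = estimate +/- z * SE, with z = 1.96 for 95%.
    """
    return (upper - lower) / (2.0 * z)


# Values copied from the "By domain" rows of Table 1: (mean, lower, upper).
table1_by_domain = {
    "Math": (0.604, 0.279, 0.929),
    "Science": (0.748, 0.581, 0.914),
}

implied_se = {
    domain: se_from_ci(lower, upper)
    for domain, (_mean, lower, upper) in table1_by_domain.items()
}

# Midpoint of each interval, to check the symmetry assumption against the mean.
midpoints = {
    domain: (lower + upper) / 2.0
    for domain, (_mean, lower, upper) in table1_by_domain.items()
}
```

Under that assumption the math estimate, which rests on 7 studies, has roughly twice the implied standard error of the science estimate, which rests on 11, and both interval midpoints agree with the reported means to rounding.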
