15. Accommodations on Large-scale Assessment
7/24/2019 15. Accommodations on Large-scale Assessment
Sage Publications, Inc. and American Educational Research Association are collaborating with JSTOR to digitize, preserve and extend access to Review of Educational Research.
http://www.jstor.org
Accommodations for English Language Learners Taking Large-Scale Assessments: A Meta-Analysis on Effectiveness and Validity
Author(s): Michael J. Kieffer, Nonie K. Lesaux, Mabel Rivera and David J. Francis
Source: Review of Educational Research, Vol. 79, No. 3 (Sep., 2009), pp. 1168-1201
Published by: American Educational Research Association
Stable URL: http://www.jstor.org/stable/40469092
Accessed: 15-11-2015 00:20 UTC
This content downloaded from 161.112.232.221 on Sun, 15 Nov 2015 00:20:46 UTC
All use subject to JSTOR Terms and Conditions
Review of Educational Research
September 2009, Vol. 79, No. 3, pp. 1168-1201
DOI: 10.3102/0034654309332490
© 2009 AERA. http://rer.aera.net
Accommodations for English Language Learners Taking Large-Scale Assessments: A Meta-Analysis on Effectiveness and Validity
Michael J. Kieffer and Nonie K. Lesaux
Harvard Graduate School of Education
Mabel Rivera and David J. Francis
University of Houston
Including English language learners (ELLs) in large-scale assessments raises questions about the validity of inferences based on their scores. Test accommodations for ELLs are intended to reduce the impact of limited English proficiency on the assessment of the target construct, most often mathematics or science proficiency. This meta-analysis synthesizes research on the effectiveness and validity of such accommodations for ELLs. Findings indicate that none of the seven accommodations studied threaten the validity of inferences. However, only one accommodation, providing English dictionaries or glossaries, has a statistically significant effect on ELLs' performance, and this effect equates to only a small reduction in the achievement score gap between ELLs and native English speakers. Findings suggest that accommodations to reduce the impact of limited language proficiency on academic skill assessment are not particularly effective. Given this, we posit a hypothesis about the necessary role of academic language skills in mathematics and science assessments.
Keywords: achievement gap, assessment, English language learners, high stakes testing, language development.
As the standards movement in education has gained in momentum, policy makers have increasingly focused on test-based accountability systems with the goal of improving academic achievement for all children. The principles of setting high standards, assessing all students relative to those standards, and holding schools accountable for student achievement have long been central to reform movements in public education (e.g., Fuhrman, 2003). However, since the No Child Left Behind Act of 2001 (NCLB), the application of these principles to subgroups of students identified as particularly at risk for academic difficulties has become very important.
One of these subgroups consists of students who lack full proficiency in English, commonly referred to as English language learners (ELLs). ELLs represent one of the fastest-growing groups among the school-aged population in this nation (e.g., Capps et al., 2005). Speaking a wide variety of languages, this group almost doubled in size between 1980 and 2000, and the most recent estimates place the size of the population at more than [...] million (e.g., Batalova, Fix, & Murray, 2007). The results from many large-scale assessments suggest that when compared to their native English-speaking peers, ELLs lag behind in all grades and content areas. For example, on recent national assessments of reading and math, only a small minority of ELLs scored at proficient levels (4% to 11%, depending on grade and subject), compared to a third or more of native English speakers (National Center for Education Statistics, 2005).
According to many educators, NCLB has succeeded in increasing awareness of the academic needs and achievement of ELLs through new requirements to evaluate schools, districts, and states based on the English and content outcomes of this group of learners (Center on Education Policy, 2006). However, including ELLs in large-scale assessments is not a straightforward undertaking. ELLs present a unique set of challenges for educators and policy makers because of the central role played by language proficiency in the acquisition and assessment of content area knowledge. Thus, many unanswered questions remain about the inclusion of ELLs in large-scale assessments; foremost among them are questions about how valid inferences about ELLs' abilities can be made based on scores from these assessments.
The purpose of this study was to determine the effectiveness and validity of test accommodations for ELLs taking large-scale assessments by using meta-analysis to quantify the impact of the specific accommodations on the performance of ELLs and native English speakers.
Including ELLs in Large-Scale Assessments
Historically, ELLs have often been excluded from large-scale assessments because limited English proficiency was thought to prevent students from understanding questions and/or result in invalid test results under standard test administration procedures (Rivera, Collum, & Shafer Willner, 2006). Exclusion of large numbers of students from participation in standards-based tests not only can result in substantial distortion of the percentage of students achieving proficiency but also, more important, can obscure important and systematic differences in student achievement between different demographic groups. Thus, one of the laudable goals of NCLB and state efforts is to increase participation of all learners, including those in identified subgroups, in large-scale assessments.
However, it is not enough for students to participate in state assessments; students' participation must lead to valid inferences about their achievement. Obtaining valid results is a particularly pressing issue because the stakes of mandated assessments for states, districts, and schools are high. NCLB and state accountability systems not only place considerable pressure on schools and districts to increase participation rates in large-scale assessments but also impose sanctions on schools that cannot move students in all identified subgroups toward proficiency. In addition, performance on large-scale assessments is increasingly high stakes for students: By 2008, 28 states in the United States will require that students pass a state-administered test for high school graduation (Fuhrman, 2003).
There is reason for concern about the validity of test scores if in fact these reflect individual differences in abilities that are distinct from those that are the target of assessment (American Educational Research Association [AERA], American Psychological Association [APA], & National Council on Measurement in Education [NCME], 1999). Because language plays an integral role in most, if not all, academic learning, any test of academic achievement is also, to some degree, a test of language ability. Consequently, ELLs present a special challenge to schools and those involved in large-scale assessment; if tests are not appropriately designed or if ELLs are not tested under appropriate conditions, then language demands of the test that are not central to the target of assessment may unfairly and negatively influence their performance. Research conducted by Abedi and colleagues has demonstrated that there is indeed a substantial link between students' English language proficiency and their performance on tests of math, science, and social studies (e.g., Abedi & Leon, 1999; Bailey, 2005; Butler & Castellon-Wellington, 2005).
Furthermore, although there may be substantial differences between ELLs and their peers in content knowledge, research shows that the size of this knowledge gap often depends on the language demands of the assessment. Several correlational studies have found that assessments and individual test items that have more linguistic complexity yield larger performance gaps between ELLs and non-ELLs (e.g., Abedi, Leon, & Mirocha, 2003; Abedi, Lord, Hofstetter, & Baker, 2000; Abedi, Lord, & Plummer, 1997; Martiniello, 2007).
These findings suggest that, contrary to some popular conceptions, assessments in all domains assess language skills as well as content knowledge and skills. However, such a relationship does not lead directly to the conclusion that valid inferences can never be made about the content knowledge of ELLs from large-scale assessments. Rather, the key question is to what extent the language skills measured by these assessments are essential to the construct targeted by the test and, in turn, to what extent they measure language demands that are irrelevant to the academic skills being assessed.
Use of Accommodations for ELLs Taking Large-Scale Assessments
Making specific changes to the test format or the conditions under which students are tested is one method that has been proposed to minimize the influence on content area test performance of variation in ELLs' language skills that is not central to the construct being assessed. Such test accommodations include any alteration to standard test administration procedures designed to provide support for students based on their special needs without changing the construct being assessed (AERA, APA, & NCME, 1999). These procedures include the presentation of the assessment items, the ways in which students respond to the items, any equipment or materials to be used, the period of time allowed to complete the test, and the environment in which students take the test. There are as many as 75 different accommodations currently in use with ELLs, although not all of them are appropriate. Moreover, their selection and implementation vary by state and district (for a review of state policies on accommodations for ELLs, see Rivera et al., 2006).
An appropriate accommodation focuses on those extraneous factors that affect the test scores of students with special needs but that are not the target of assessment. An example of an appropriate accommodation would be to provide a large-print version of a test to a student with a visual impairment. At the same time, accommodations should not provide inappropriate support or change the nature of the task such that resulting scores no longer allow valid inferences about the central construct being measured. An example of an inappropriate accommodation would be to rewrite the passages in a reading comprehension assessment in a way that alters their fundamental difficulty level. Thus, for ELLs, appropriate accommodations provide direct or indirect linguistic support to minimize the negative impact of irrelevant language demands on students' performance so that the students can demonstrate their content knowledge and academic skills to the greatest extent possible.
Evaluating Accommodations for ELLs in Large-Scale Assessments
Theoretically speaking, many accommodations that offer linguistic support, such as providing dictionaries or simplifying the English sentence structure of the test items, may indeed be appropriate for ELLs. However, because content knowledge is inextricably linked to language, the use of certain language supports for ELLs may not be as straightforward as providing a large-print version of an assessment to a student with visual impairment; even language-based accommodations that are grounded in theory may in practice be ineffective or threaten the validity of scores. Thus, the selection of accommodations for ELLs must be based on empirical evidence for their effectiveness and validity (Abedi, Hofstetter, & Lord, 2004).
Although accommodations for ELLs can be evaluated along several dimensions, evaluating accommodations for effectiveness and validity is of paramount importance. Effectiveness refers to the extent to which students receiving the accommodation demonstrate improved test scores. In contrast, the validity of an accommodation refers, in part, to the notion that the accommodation should improve the performance of students who require it but not affect the performance of students who do not. If an accommodation affects the performance of students who do not require it, then providing the accommodation to some students but not others would threaten the validity of the resulting test scores. If an assessment is valid for use with a specific group, then students who do not require the accommodation will be neither advantaged nor disadvantaged by receiving it.
A growing body of empirical research has evaluated accommodations for ELLs, but the results of these individual studies have yet to be quantitatively synthesized to produce aggregate estimates of their effectiveness and validity. Moreover, investigation of factors that may potentially moderate the effectiveness of these accommodations (e.g., grade level, domain tested, language of instruction) is needed. It is possible that a given accommodation will be more effective for tests in some domains than for tests in other domains or that accommodations will be more effective at some grade levels than at others. Curricular content and corresponding measures of achievement change with respect to both difficulty (National Center on Education and the Economy, 1998) and the nature of the skills tested (e.g., Koenig & Bachman, 2004; RAND Mathematical Study Panel, 2003; RAND Reading Study Group, 2002) over the course of the grade span, thus potentially influencing the effectiveness of specific accommodations.
This is particularly important in the context of ELLs' test performance given the differing language demands of academic tasks over time and the language demands specific to different domains tested. For example, the fourth grade math test may emphasize and prioritize children's calculation skills, whereas the eighth grade tests in the same content area of math may emphasize complex word problems with sophisticated language. Finally, evaluating accommodations for this population must further recognize potential sources of differential effectiveness by focusing on the instructional and linguistic context in which the testing is occurring, given the differing models of instruction offered for ELLs (Abedi et al., 2004).
Present Study
The purpose of this study is to evaluate the effectiveness and validity of accommodations for ELLs participating in large-scale assessments. Two narrative reviews (Abedi et al., 2004; Sireci, Li, & Scarpati, 2003) have previously synthesized the findings of studies on test accommodations for ELLs published before 2001. The present study was designed to build on this work in two ways. First, using a meta-analytic approach, the current study quantifies the average effects of the accommodations studied. Second, the current study updates the findings of previous reviews by including the findings of several studies published since 2001 as well as those previously reviewed. Given the potential sources of differential effectiveness of accommodations discussed above, the meta-analysis also includes an examination of several moderators of effects. The analyses were guided by two specific research questions:
1. What evidence exists that specific test accommodations are effective in improving the performance of ELLs taking large-scale assessments? What evidence exists that these effects differ as a function of the grade level of students, domain tested, provision of extra time, or language of instruction?
2. What evidence exists that specific test accommodations designed for ELLs are valid in large-scale assessments?
Method
Study Inclusion Criteria
Based on our research questions, we selected four characteristics that formed the criteria for inclusion of studies that provide empirical evidence for evaluating accommodations for ELLs. We included studies in the meta-analysis that (a) examined individual accommodations or individual accommodations bundled with extra time, (b) were articles published in peer-reviewed journals or technical reports available online, (c) employed an experimental, quasi-experimental, or repeated measures design, and (d) reported sufficient data to allow for the estimation of effect sizes.
Search procedure. Studies for review were obtained through two searches conducted in July 2006 designed to include all studies available up to that time. First, we conducted a comprehensive search of online databases, including Education Resources Information Center, PsycINFO, Modern Language Association, Education Abstracts, and Academic Search Premier (which yielded 114 entries), as well as the online database of the National Center for Research on Evaluation, Standards, and Student Testing (which yielded an additional 27 entries, many of them redundant with the 114 previously found). The abstract of each identified citation was read to determine if it was an empirical study examining the effects of one or more accommodations. Second, we collected citations of studies previously reviewed by Sireci et al. (2003) and/or by Abedi et al. (2004). Based on the list of citations of empirical studies from the two searches, we collected technical reports as well as articles. However, we did not collect presentations at academic conferences because of both practical and quality concerns. In several cases, the results of a single study were reported in multiple documents; in such cases, the documents were linked together and cross-checked for complete information and the most recent document is cited here.
Excluded studies. The search procedure above yielded 21 studies for possible inclusion in the analyses. However, several of these studies, including some cited in previous reviews, had to be excluded from the meta-analysis for reasons of data reporting or methodology. In three instances (N. E. Anderson, Jenkins, & Miller, 1996; Hafner, 2001; Lotherington-Woloszyn, 1993), the studies did not report the necessary information to quantify the effects of accommodations separately for ELLs and native English speakers. In two cases (Abedi & Hejri, 2004; Shepard, Taylor, & Betebenner, 1998), studies examined the effect of various accommodations chosen for individual students by their teachers and thus were inappropriate for examining the effect of specific accommodations. In one case, a previously cited study (Miller, Okum, Sinai, & Miller, 1999) was a conference presentation.
After excluding the studies above, a total of 15 studies remained. Of these studies, 4 (Abedi & Lord, 2001; Albus, Thurlow, Liu, & Bielinski, 2005; Castellon-Wellington, 2000; Johnson & Monroe, 2004) employed repeated measures designs in which the same group of students was tested with and without accommodations. Because the preponderance of the studies to be included employed between-groups designs and because effect sizes from repeated measures designs are not strictly comparable to those from between-groups studies, results from these studies were not included in the formal meta-analysis but were considered in our findings.
Studies Included in Meta-Analysis
In all, 11 studies were included in the meta-analysis with a total of 23,999 participants (17,445 native English speakers, 6,554 ELLs). Of these studies, 6 were conducted by Abedi and colleagues, whereas 5 others were conducted by other research teams (i.e., M. Anderson, Liu, Swierzbin, Thurlow, & Bielinski, 2000; Brown, 1999; Garcia Duncan et al., 2005; Hofstetter, 2003; Rivera & Stansfield, 2004). With respect to design, 8 were true experiments, in which students were randomly assigned to accommodated or unaccommodated conditions, whereas 3 (Abedi, Courtney, & Leon, 2003a; Abedi, Courtney, & Leon, 2003b; Brown, 1999) were classified as quasi-experiments because of factors specific to each study. In the study by Brown (1999), the mechanism of assignment is unclear in the report and could not be confirmed through communications with the study author or school personnel involved in the study. Observed pretest differences between the two groups were negligible. In the study by Abedi, Courtney, et al. (2003a), students were originally assigned at random to a treatment condition; however, not all students randomly assigned were actually provided the accommodation because of limited space and equipment. Similarly, in the study by Abedi, Courtney, et al. (2003b), only Spanish speakers were randomly assigned to a bilingual dictionary condition, although the control group included students with native languages other than Spanish. The findings reported below were largely robust to the inclusion or exclusion of these three studies.
All but 1 of the 11 studies used multiple samples to investigate different accommodations and/or a single accommodation provided in multiple grades.1 Thus, together the studies yielded 38 different tests of the effectiveness of specific accommodations for ELLs as well as 30 tests of the validity of accommodations. Of the 38 tests of effectiveness, 34 involved students in fourth grade (n = 11) or eighth grade (n = 23), whereas 4 involved students in fifth grade (n = 2) or sixth grade (n = 2). Of the 38 tests of effectiveness, 17 used a math test as the outcome measure, 20 used a science test, and 1 used a reading test. Of these effects, 29 used the National Assessment of Educational Progress (NAEP) assessment or NAEP items (n = 23) or items drawn from the NAEP and Trends in International Mathematics and Science Study assessments (n = 6). Only 9 effects were based on a state accountability assessment (8 of which came from two studies using the Delaware State Test and 1 of which came from a study using the Minnesota state test).
Of the 11 studies, [...] reported that students were classified as ELLs based on school records of a limited English proficient or ELL designation, whereas ELL classification was not reported in the remaining studies. Although this suggests consistency in ELL classification across studies, it is important to note that the criteria for such school-based designations can vary considerably across states and districts (Ragan & Lesaux, 2006). Appendix A provides detailed information on the design of each study and the characteristics of the participants.
In their review of state assessment policies regarding ELLs, Rivera and colleagues (2006) identified 75 accommodations that are currently made available to ELLs. Of these, they found roughly 7 that are considered potentially appropriate insofar as they are specially designed to address the linguistic needs of ELLs. In contrast to this breadth of accommodations offered to ELLs by states, the 11 studies and 38 tests of the effectiveness of specific accommodations focused on only seven different types of accommodation: simplified English (n = 16), English dictionary or glossary (n = 11), bilingual dictionary or glossary (n = 5), extra time (n = 2), Spanish language test (n = 2), dual language questions (n = 1), and dual language booklet (n = 1). In addition to the two effects that included extra time alone, seven estimated effects came from studies that involved extra time bundled with one of three other accommodations: simplified English (n = 2), English dictionary (n = 3), or bilingual dictionary (n = 2). One study (Abedi, Courtney, Mirocha, Leon, & Goldberg, 2005) allowed extra time to participants in both the control and treatment conditions; this study was not coded as evaluating the effect of extra time. All but two of the reported effect size estimates are based on paper and pencil tests; the remaining two used computerized assessments.
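The counts reported in this paragraph can be checked with simple arithmetic. The short sketch below only tallies the numbers stated in the text; the variable and function names are illustrative and do not come from the original study.

```python
# Number of effectiveness tests per accommodation type, as reported in the text.
tests_by_accommodation = {
    "simplified English": 16,
    "English dictionary or glossary": 11,
    "bilingual dictionary or glossary": 5,
    "extra time": 2,
    "Spanish language test": 2,
    "dual language questions": 1,
    "dual language booklet": 1,
}

# Effects in which extra time was bundled with another accommodation.
bundled_with_extra_time = {
    "simplified English": 2,
    "English dictionary": 3,
    "bilingual dictionary": 2,
}


def total(counts):
    """Sum the per-category counts in a dictionary."""
    return sum(counts.values())


print(len(tests_by_accommodation))     # 7 accommodation types
print(total(tests_by_accommodation))   # 38 tests of effectiveness
print(total(bundled_with_extra_time))  # 7 bundled effects
```

The per-type counts sum to the 38 tests of effectiveness, and the bundled counts sum to the seven bundled effects described above.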
Because technical reports were included in addition to published articles, there is little reason to believe that publication bias would have led to the inflation of effect sizes. Nonetheless, to investigate the possibility that the results of studies with nonsignificant results were more likely to go unreported than those with significant results, we plotted the standard error of the Hedges's gu statistic against the value of the Hedges's gu statistic for each study. Inspection of this plot revealed the funnel shape we would expect in the absence of substantial publication bias, with samples with more precise estimates yielding effect sizes closer to the mean and little evidence of a gap in which unreported nonsignificant effect sizes would occur.
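The logic of such a funnel plot can be sketched in a few lines. The example below is a hypothetical illustration, not the authors' procedure: it assumes pseudo 95% limits of the mean effect plus or minus 1.96 standard errors, and the function names and sample values are invented for the example.

```python
def funnel_bounds(mean_effect, standard_error, z=1.96):
    """Pseudo 95% limits around the mean effect at a given standard error."""
    half_width = z * standard_error
    return mean_effect - half_width, mean_effect + half_width


def share_inside_funnel(effects, standard_errors, mean_effect):
    """Fraction of samples whose effect size falls inside the funnel."""
    inside = 0
    for effect, se in zip(effects, standard_errors):
        low, high = funnel_bounds(mean_effect, se)
        if low <= effect <= high:
            inside += 1
    return inside / len(effects)


# Hypothetical effect sizes and standard errors, for illustration only.
example_effects = [0.05, -0.10, 0.30, 0.02]
example_ses = [0.05, 0.10, 0.20, 0.03]
print(share_inside_funnel(example_effects, example_ses, mean_effect=0.04))
```

In a plot of this kind, precise samples (small standard errors) cluster near the mean at the narrow end of the funnel, and a one-sided gap among the imprecise samples would hint at unreported results.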
Accommodations That Have Been Evaluated Empirically
As mentioned, in the studies reviewed, seven different types of accommodations were evaluated: simplified English, English dictionaries or glossaries, bilingual dictionaries or glossaries, tests in the native language, dual language test booklets, dual language questions for English passages, and extra time. Each of these is theoretically justifiable for ELLs insofar as they are designed to address the language needs of the ELLs by minimizing variation in scores because of construct-irrelevant language abilities. With the single exception of dual language questions, the accommodations were studied exclusively with tests of math and science.
Simplified English involves changes in the vocabulary and grammar of test items to eliminate irrelevant linguistic complexity while maintaining the same content vocabulary and level of complexity in the content task. These changes include eliminating rare vocabulary unrelated to the content, shortening or simplifying sentence structure, replacing passive voice with active voice, and replacing complex verb forms with present tense verbs (for a description, see Abedi et al., 1997). English dictionaries or glossaries involve providing definitional information in English in some form, including standard dictionaries, dictionaries customized to the assessment, or glossaries for specific words used in the assessment. Here again, the intent is to provide definitional information about words that are necessary to comprehend the task but do not represent key concepts of the content. Similarly, bilingual dictionary, glossary, or marginal glosses provide bilingual students with access to definitions or direct translations of selected noncontent words in students' native language. Another variant on this accommodation involves providing marginal glosses, explanatory notes written in the margin of the text in the students' native language.
Three other accommodations involve the use of native language in the test itself. Native language versions of tests involve adapting tests into the native language of students. The most common method of adapting a test to another language is to use back translation; the test is translated from the original language into the native language by a biliterate test maker. This adapted test is then translated back into the original language by an independent individual, and the two original language tests are compared for equivalence. This process not only is resource intensive but also can introduce additional threats to validity because of the difficulty in maintaining equivalence in the construct measured (American Institutes of Research, 1999). Dual language assessments involve test booklets, in which English versions and native language versions of the same item are placed on facing pages. Two types of dual language tests have been investigated: dual language booklets in which all items on a math test are presented in two languages and dual language questions in which a reading passage is presented in English, followed by questions read aloud in two languages.
Finally, one of the most frequently used accommodations for ELLs is to provide extra time to complete the test. The theoretical rationale is that ELLs will be able to demonstrate their content knowledge and skills better if given additional time to work through the language demands of the test. Often, extra time is provided in combination with another type of accommodation, in which case the rationale is to allow students the time required to use the accommodation (e.g., to use a dictionary to look up the meanings of unknown words).
Methods for Meta-Analysis
To evaluate the appropriateness and practical importance of test accommodations for ELLs, three sets of meta-analyses were conducted. First, a preliminary meta-analysis was conducted to compare the academic achievement test scores of ELLs in the absence of accommodations with those of native English speakers. This first analysis was undertaken in an effort to describe the magnitude of differences in test scores between ELLs and non-ELLs in the absence of accommodations. It is this set of differences that the accommodations are intended to help ameliorate, and thus it serves as a metric for judging the magnitude of the effect sizes for accommodations. The second analysis addressed the effectiveness of accommodations by estimating the degree to which each accommodation led to improved performance for ELLs. The third analysis addressed validity of the accommodations by estimating the impact of the accommodations on the performance of non-ELLs, with the assumption that a valid accommodation should have no significant effect on their performance.
To compute average effect sizes, we treated each study sample as the unit of analysis, yielding 38 tests of effectiveness. We made this decision because effects of different accommodations that were derived from the same study were based on different samples of students. Although effect sizes derived from the same study cannot generally be considered independent, in the present case multiple effects from the same study were not generally involved in evaluating the effects of any particular accommodation. That is, studies contributed multiple effects across the set of accommodations but did not typically contribute multiple effect sizes for any single accommodation. Insofar as the net effect of this nonindependence is to reduce the standard error of the mean effect size, it will be seen that any failure of this strategy to fully address the issue of nonindependence would not alter the general conclusions from the analyses of mean effect sizes.
To compute average effect sizes across the entire set of samples and for all samples addressing specific accommodations, we averaged across different outcomes and grades.2 In averaging the different effect sizes, we weighted the individual effect sizes according to their precision. As our measure of effect size, we first computed the mean difference in performance between ELLs receiving the accommodated test and ELLs taking the test without accommodations. (For analyses of validity, this difference was computed for non-ELLs taking the accommodated test with and those taking the test without accommodations.) This difference in mean performance was then standardized using the pooled within-groups estimate of the standard deviation. This measure of effect size is the common Cohen's d, which is known to be biased in small samples. We therefore corrected this measure of effect size using a transformation of d recommended by Hedges (1981) to produce estimates in Hedges's gu. These estimates were computed directly from the means and standard deviations reported in the studies by using a programmed routine in the Comprehensive Meta-Analysis (Version 2) software (Borenstein, Hedges, Higgins, & Rothstein, 2005).
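As a rough illustration of the effect size computation described above, the following Python sketch standardizes a mean difference by the pooled within-groups standard deviation, applies a widely used approximation to the Hedges small-sample correction, and then averages effect sizes with inverse-variance (precision) weights. The function names, the approximate correction factor J = 1 - 3/(4df - 1), the large-sample variance formula, and all input numbers are assumptions made for illustration; the exact routine implemented in the Comprehensive Meta-Analysis software is not described in this article and may differ.

```python
import math


def cohens_d(mean_acc, mean_ctl, sd_acc, sd_ctl, n_acc, n_ctl):
    """Mean difference standardized by the pooled within-groups SD."""
    pooled_var = ((n_acc - 1) * sd_acc ** 2 + (n_ctl - 1) * sd_ctl ** 2) / (
        n_acc + n_ctl - 2
    )
    return (mean_acc - mean_ctl) / math.sqrt(pooled_var)


def correction_factor(n_acc, n_ctl):
    """Approximate small-sample correction J (an assumed approximation)."""
    df = n_acc + n_ctl - 2
    return 1.0 - 3.0 / (4.0 * df - 1.0)


def hedges_g(d, n_acc, n_ctl):
    """Bias-corrected effect size: g = J * d."""
    return correction_factor(n_acc, n_ctl) * d


def hedges_g_variance(d, n_acc, n_ctl):
    """Large-sample sampling variance of g: J^2 times the variance of d."""
    var_d = (n_acc + n_ctl) / (n_acc * n_ctl) + d ** 2 / (2.0 * (n_acc + n_ctl))
    return correction_factor(n_acc, n_ctl) ** 2 * var_d


def precision_weighted_mean(effects, variances):
    """Average effect sizes with weights 1/variance; return (mean, SE)."""
    weights = [1.0 / v for v in variances]
    mean = sum(w * g for w, g in zip(weights, effects)) / sum(weights)
    return mean, math.sqrt(1.0 / sum(weights))


# Hypothetical sample: accommodated ELLs vs. unaccommodated ELLs.
d = cohens_d(52.0, 50.0, 10.0, 10.0, 20, 20)
g = hedges_g(d, 20, 20)
var_g = hedges_g_variance(d, 20, 20)
print(d, g, var_g)

# Hypothetical effect sizes and sampling variances from two samples.
print(precision_weighted_mean([0.2, 0.4], [0.01, 0.04]))
```

Because the weights are the reciprocals of the sampling variances, larger samples (which yield more precise estimates) pull the average toward their own effect sizes.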
In addition to estimating the mean effect size for each accommodation, we investigated whether other aspects of the accommodation treatment moderated the effect of the accommodations; these moderators included the grade level of the students, the domain tested (math, science, or reading), whether the test was based on the NAEP or a state test, and whether the accommodation was bundled with extra time or provided alone. Using PROC MIXED in SAS (SAS Institute, 1999), a two-level hierarchical linear model (HLM) was fitted, in which Level 1 equations represented the level of the effect size for each observation and Level 2 equations represented the study level, where study characteristics (including type of accommodation as well as moderating factors) that served to explain variation in effect sizes were included (Raudenbush & Bryk, 2002). We first fitted an unconditional model, in which random effects variance at Level 1 was specified to be the variance due to sampling error within sample (which was assumed known and given by the square of the standard error of the Hedges's gu statistic from each effect size estimate) and Level 2 variance was specified to be the variance in Hedges's gu statistics attributable to differences between samples. Next, we fitted a set of conditional models in which dummy variables for the type of accommodation and other potential moderator variables were included at Level 2 to determine if they explained variation in the effect sizes between samples. To determine if a given variable explained statistically significant variation in effect sizes, we examined the change in goodness of fit between models using the change in -2 log likelihood statistic (Δ-2LL) and conducted a significance test by comparing this statistic to a chi-square distribution with 1 degree of freedom.
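The model comparison described above can be sketched in a few lines of Python. The sketch below is not the PROC MIXED analysis itself: it assumes maximum likelihood estimation (the estimation method is not stated in the passage), uses a crude grid search for the between-sample variance, handles only a single dummy moderator, and runs on invented effect sizes. All function names and data are hypothetical. The Level 1 variances are treated as known, as in the text, and the change in -2 log likelihood is referred to a chi-square distribution with 1 degree of freedom.

```python
import math


def neg2_log_likelihood(effects, variances, tau2, groups):
    """-2LL for a given between-sample variance tau2.

    Each sample's total variance is its known sampling variance plus tau2.
    One mean is estimated per moderator group (precision-weighted).
    """
    total_var = [v + tau2 for v in variances]
    deviance = 0.0
    for label in set(groups):
        idx = [i for i, grp in enumerate(groups) if grp == label]
        weights = [1.0 / total_var[i] for i in idx]
        mu = sum(w * effects[i] for w, i in zip(weights, idx)) / sum(weights)
        for i in idx:
            deviance += math.log(2.0 * math.pi * total_var[i])
            deviance += (effects[i] - mu) ** 2 / total_var[i]
    return deviance


def fit_deviance(effects, variances, groups, tau2_grid):
    """Smallest -2LL over a grid of candidate tau2 values (crude ML fit)."""
    return min(
        neg2_log_likelihood(effects, variances, t, groups) for t in tau2_grid
    )


def chi2_pvalue_1df(statistic):
    """Upper-tail probability of a chi-square variate with 1 df."""
    return math.erfc(math.sqrt(max(statistic, 0.0) / 2.0))


# Invented data: six effect sizes, known sampling variances, one dummy.
effects = [0.05, 0.10, 0.02, 0.30, 0.25, 0.35]
variances = [0.01] * 6
dummy = [0, 0, 0, 1, 1, 1]        # hypothetical moderator coding
grid = [i / 1000.0 for i in range(0, 1001)]

dev_unconditional = fit_deviance(effects, variances, [0] * 6, grid)
dev_conditional = fit_deviance(effects, variances, dummy, grid)
delta = dev_unconditional - dev_conditional
print(delta, chi2_pvalue_1df(delta))
```

With a single dummy variable, the conditional model reduces to one precision-weighted mean per group, which is why no matrix algebra is needed here. A full analysis with several moderators would require a general weighted regression at Level 2.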
In addition to investigating moderator effects because of type of accommodation, analyses were conducted to determine if the effects for specific accommodations differed as a function of a characteristic of the studies themselves (e.g., whether the study employed an experimental or quasi-experimental design, the grade level of the students, content domain measured).
Results
Preliminary Analyses: Differences in Achievement Test Scores Between ELLs and Native English Speakers
Before addressing the question of effectiveness of accommodations, we estimated the average difference in academic achievement test scores between ELLs and native English speakers that can be expected in large-scale assessments. These estimates provide a context for evaluating the practical importance of the effects of accommodations. Table 1 presents several estimates of the math and science achievement gaps between ELLs and native English speakers. The top half of Table 1 presents mean effect sizes (reported as Hedges's gu statistics) for the differences in math and science achievement scores between ELLs and native English speakers in the unaccommodated conditions from the studies reviewed. These estimates suggest that there are large achievement score differences between the two groups across these grades and domains of knowledge, with mean effect sizes ranging from six tenths to three fourths of a standard deviation. They also suggest that the achievement gap differs by test domain to some extent, with larger gaps present in science than in math.
TABLE 1
Estimates of the achievement score differences between English language learners and native English speakers in math and science from studies reviewed (as Hedges's gu) and from the 2005 National Assessment of Educational Progress (as Cohen's d; National Center for Education Statistics, 2005)

                                                 95% confidence interval
                       Number of   Mean effect   Lower     Upper
                       studies     size          limit     limit
By domain (a)
  Math                 7           0.604         0.279     0.929
  Science              11          0.748         0.581     0.914
2005 National Assessment of Educational Progress
  4th grade math                   0.831         0.799     0.864
  8th grade math                   1.006         0.964     1.047
  4th grade science                1.051         1.008     1.094
  8th grade science                1.227         1.177     1.277

a. The achievement score difference in reading was not estimated because only a single study examined this domain.

Although these differences between ELLs and non-ELLs are quite substantial, they are somewhat small in comparison to estimates of the achievement gap from
national studies. For example, as another point of reference, the bottom half of Table 1 presents estimates of the achievement difference between native English speakers and ELLs from the 2005 NAEP.3 These estimates are expressed appropriately as Cohen's d because of the large sample on which the estimates are based. These estimates are appreciably larger than those from the studies reviewed, with three of the four differences greater than one standard deviation. As with the studies reviewed, the gap was larger for science than for math and for eighth grade students compared to fourth grade students. The difference in magnitude between the NAEP estimates and those from the studies reviewed may be because of the confounding of concomitant predictors of achievement, such as poverty, in the national samples, which likely are better controlled by the design of the research studies of accommodations. All of the studies reviewed sampled ELL and native English-speaking students from within the same schools and/or districts, whereas the NAEP estimates are based on a nationally representative sample. The NAEP estimates may thus capture more of the variation due to differences between the schools attended by ELLs and those attended by native English speakers as well as those concomitant demographic characteristics that tend to affect achievement of at-risk populations in national samples but whose effects are masked when results are disaggregated on only a single dimension. Nevertheless, both sets of estimates indicate that there are large observed differences in achievement in both math and science between ELLs and native English speakers on large-scale assessments, suggesting that one metric by which we can judge the effectiveness of accommodations is the extent to which they reduce these apparent achievement gaps.
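The comparisons drawn from Table 1 can be checked with simple arithmetic. The short Python sketch below re-enters the Table 1 values, counts how many NAEP gaps exceed one standard deviation, and backs out the standard error implied by each interval. It assumes that the reported intervals are symmetric, normal-theory 95% intervals (mean plus or minus 1.96 standard errors), which the article does not state explicitly; the variable and function names are illustrative only.

```python
Z_95 = 1.959964  # normal critical value for a two-sided 95% interval

# (mean effect size, lower limit, upper limit), copied from Table 1.
studies_reviewed = {
    "math": (0.604, 0.279, 0.929),
    "science": (0.748, 0.581, 0.914),
}
naep_2005 = {
    "4th grade math": (0.831, 0.799, 0.864),
    "8th grade math": (1.006, 0.964, 1.047),
    "4th grade science": (1.051, 1.008, 1.094),
    "8th grade science": (1.227, 1.177, 1.277),
}


def implied_standard_error(lower, upper):
    """SE implied by a symmetric 95% interval (an assumption, see above)."""
    return (upper - lower) / (2.0 * Z_95)


# How many of the four NAEP gaps exceed one standard deviation?
gaps_over_one_sd = sum(1 for mean, _, _ in naep_2005.values() if mean > 1.0)
print("NAEP gaps above 1 SD:", gaps_over_one_sd)

# Precision of each estimate, as implied by the interval widths.
for label, (mean, lower, upper) in {**studies_reviewed, **naep_2005}.items():
    print(label, round(implied_standard_error(lower, upper), 4))
```

The much narrower NAEP intervals reflect the large national samples, which is consistent with the authors' choice to report those gaps as uncorrected Cohen's d.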