Post on 16-Jan-2016
Identifying the gaps in state assessment systems
CCSSO Large-Scale Assessment Conference, Nashville
June 19, 2007
Sue Bechard, Office of Inclusive Educational Assessment
Ken Godin
Research Questions
Of all the students who are not proficient, how can states identify those who are in the assessment gap?
Who are the students in the gaps, what are their attributes, and how do they perform?
Gap identification process
• Conduct exploratory interviews with teachers to identify the assessment gaps
• Review student assessment data
• Review teacher judgment data
• Operationalize gap criteria
• Conduct focused teacher interviews to confirm gap criteria
Parker and Saxon: Teacher views of students and assessments
Bechard and Godin: Finding the real assessment gaps
Data sources
State assessment data – grade 8 mathematics results from two systems
– General large-scale test results
– Demographics (special programs, ethnicity, gender)
– Teachers’ judgments of students’ classroom work
– Student questionnaires completed at time of test
– Accommodations used at time of test
State databases for additional student demographic data
– Disability classification
– Free/reduced lunch
– Attendance
Student-focused teacher interviews
Why use teacher judgment of students’ classroom performance?
Gap 1: the test may not reflect classroom performance
Teachers see students performing proficiently in class, but test results are below proficient.
Gap 2: the test may not be relevant for instructional planning
Teachers rate students’ class work as low as possible and test results are at “chance” level. No information is generated on what students can do.
Teacher judgment instructions
The instructions were clear that this was to be a judgment of the student’s demonstrated achievement on GLE-aligned academic material in the classroom, not a prediction of test performance.
NECAP: The teacher judgment field consisted of 12 possibilities – each of the 4 achievement levels had low, medium, and high divisions.
MEA: The teacher judgment field consisted of 4 possibilities - one possibility per achievement level.
(For comparisons across the two systems, we used a version of the NECAP judgments collapsed to the 4 achievement levels.)
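The collapse from 12 NECAP judgment options to 4 achievement levels can be sketched as below. The 1 to 12 coding (1 = low end of level 1, 12 = high end of level 4) and the function name are assumptions made for illustration; the slides do not say how the judgment field was coded.

```python
def collapse_judgment(code: int) -> int:
    """Map a 12-point teacher judgment code to its achievement level (1..4).

    Assumed coding: codes 1-3 are the low/medium/high divisions of level 1,
    codes 4-6 belong to level 2, and so on up to codes 10-12 for level 4.
    """
    if not 1 <= code <= 12:
        raise ValueError(f"judgment code must be between 1 and 12, got {code}")
    return (code - 1) // 3 + 1


# Low, medium and high divisions of the same level collapse to one value.
print([collapse_judgment(c) for c in range(1, 13)])
```

Integer division keeps the three divisions of each level together, so the collapsed NECAP judgments line up with the single-option-per-level MEA field.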
Research on validity of teacher judgment
While there are some conflicting results, the most accurate judgments were found when:
• teachers were given specific evaluation criteria
• levels of competency were clearly delineated
• criterion-referenced tests in mathematics or reading were the matching measure
• criterion-referenced tests reflected the same content as did classroom assessments
• judgments were of older students who had no exceptional characteristics, and
• teachers were asked to assign ratings to students, not to rank-order them
Validation of teacher judgment data from NECAP and MEA
Data were collected to establish “Round 1” cutpoints (of 3 rounds) during standard setting.
Validation studies were conducted which asked:
• Were there differences between the sample of students with non-missing teacher judgment data and the rest of the population?
• Were there suspicious trends in the judgment data suggesting that teachers did not take the task seriously?
• How did teacher judgments compare with students’ actual test scores?
Results of these investigations were considered supportive of using the teacher judgment data for standard setting.
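One simple way to look at the third validation question (how teacher judgments compare with test scores) is an agreement summary over paired achievement levels. This is only an illustrative sketch, not the analysis the authors ran; the function name and the sample pairs are hypothetical.

```python
from collections import Counter


def agreement_summary(pairs):
    """Summarize how teacher judgments compare with test achievement levels.

    pairs: iterable of (test_level, teacher_level), both on the 1..4 scale.
    Returns the share of students with exact agreement, with a teacher
    judgment above the test level, and with a teacher judgment below it.
    """
    counts = Counter()
    for test_level, teacher_level in pairs:
        if teacher_level == test_level:
            counts["exact"] += 1
        elif teacher_level > test_level:
            counts["teacher higher"] += 1
        else:
            counts["teacher lower"] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no (test_level, teacher_level) pairs supplied")
    return {key: counts[key] / total
            for key in ("exact", "teacher higher", "teacher lower")}


# Hypothetical pairs, for illustration only.
sample = [(1, 1), (2, 3), (3, 3), (4, 3)]
print(agreement_summary(sample))
```

Students in the "teacher higher" share are the ones who could fall into gap 1, which is why the direction of disagreement matters more here than a single agreement rate.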
Teacher judgment vs. test performance (NECAP)
Mathematics Achievement Levels – Student Performance and Teacher Judgments: NECAP
Achievement Level                   Overall Mathematics Performance (N=36,708)   Teacher Judgments* (n=24,168)
4 Proficient with Distinction       12.9%                                        17.9%
3 Proficient                        40.6%                                        39.7%
Levels 3 and 4 combined             53.5%                                        57.6%
2 Partially Proficient              21.6%                                        31.0%
1 Substantially Below Proficient    24.9%                                        11.4%
Levels 1 and 2 combined             46.5%                                        42.4%
Test Floor†                         (4.6%)
* Collapsed from 12 to 4 categories.
† Students within error of the bottom of the scale (i.e., chance score); a subset of Achievement Level 1.
Teacher judgment vs. test performance (MEA)
Mathematics Achievement Levels – Student Performance and Teacher Judgments: MEA
Achievement Level                   Overall Mathematics Performance (N=16,213)   Teacher Judgments (n=10,319)
4 Exceeds the Standards             10.6%                                        9.6%
3 Meets the Standards               34.1%                                        41.0%
Levels 3 and 4 combined             44.7%                                        50.6%
2 Partially Meets the Standards     29.3%                                        35.9%
1 Does Not Meet the Standards       26.0%                                        13.5%
Levels 1 and 2 combined             55.3%                                        49.4%
Test Floor†                         (8.1%)
† Students within error of the bottom of the scale (i.e., chance score); a subset of Achievement Level 1.
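The combined figures in the NECAP and MEA tables are sums of adjacent achievement levels. The short check below recomputes the share rated proficient or above on the test and by teachers; the per-level percentages are copied from the two tables, while the dictionary layout and function name are illustrative.

```python
# Percent of students at each achievement level, copied from the two tables.
TABLES = {
    "NECAP": {
        "test": {1: 24.9, 2: 21.6, 3: 40.6, 4: 12.9},
        "teacher": {1: 11.4, 2: 31.0, 3: 39.7, 4: 17.9},
    },
    "MEA": {
        "test": {1: 26.0, 2: 29.3, 3: 34.1, 4: 10.6},
        "teacher": {1: 13.5, 2: 35.9, 3: 41.0, 4: 9.6},
    },
}


def proficient_or_above(levels):
    """Sum the shares at levels 3 and 4, rounded to one decimal place."""
    return round(levels[3] + levels[4], 1)


for system, columns in TABLES.items():
    test = proficient_or_above(columns["test"])
    teacher = proficient_or_above(columns["teacher"])
    diff = round(teacher - test, 1)
    print(f"{system}: test {test}%, teacher {teacher}%, difference {diff} points")
```

In both systems teachers placed more students at proficient or above than the test did (57.6% against 53.5% on NECAP, 50.6% against 44.7% on MEA), which is the pattern that motivates gap 1.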
Operationalizing the gap definitions using teacher judgment
Operationalizations of the Two Gaps (Grade 8 Mathematics Test)
Gap 1: student performance ≤ 1 S.E.M. below sub-proficient/proficient cutscore but teacher judgment ≥ Proficient.
Non-gap 1: student performance ≤ 1 S.E.M. below sub-proficient/proficient cutscore, and, if score was within 1 S.E.M. of achievement level 2 boundaries, received level 2 teacher judgment, or, if score was within 1 S.E.M. of achievement level 1 boundaries, received achievement level 1 teacher judgment.
Gap 2: student performance within 1 S.E.M. of the floor of the test and teacher judgment matched as closely as possible within assessment system (NECAP: lowest available within level 1. MEA: Level 1).
Non-gap 2: student performance within 1 S.E.M. of the floor of the test and teacher judgment too high (NECAP: next higher available within level 1. MEA: Level 2).
Comparison: student performance ≥ 1 S.E.M. above sub-proficient/proficient cutscore and teacher judgment ≥ Proficient.
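The group definitions above can be read as a small classification rule. The sketch below is one possible reading, not the authors' implementation: it uses teacher judgments collapsed to levels 1 to 4 (as on MEA), interprets "≤ 1 S.E.M. below the cutscore" as a score at least one S.E.M. under the cut, and simplifies the non-gap 1 rule to "the judgment matches the test level, with 1 S.E.M. of slack at the level 1/level 2 boundary". The cut scores in the example come from the NECAP scale score ranges reported in the deck; the S.E.M. value of 3 is a placeholder, since the slides do not report it.

```python
from typing import Set


def classify(score: float, judgment: int, *, proficient_cut: float,
             level2_cut: float, floor: float, sem: float) -> Set[str]:
    """Return every group label a student fits; labels may overlap.

    judgment is the teacher judgment collapsed to achievement levels 1..4,
    where 3 and 4 count as Proficient or higher.
    """
    groups: Set[str] = set()
    below = score <= proficient_cut - sem  # at least 1 SEM below the cut
    above = score >= proficient_cut + sem  # at least 1 SEM above the cut

    if below and judgment >= 3:
        groups.add("gap 1")
    if below:
        # Judgment agrees with the test level, allowing 1 SEM of slack
        # around the boundary between levels 1 and 2 (assumed reading).
        fits_level2 = judgment == 2 and score >= level2_cut - sem
        fits_level1 = judgment == 1 and score < level2_cut + sem
        if fits_level2 or fits_level1:
            groups.add("non-gap 1")
    if score <= floor + sem:  # within 1 SEM of the test floor
        if judgment == 1:
            groups.add("gap 2")
        elif judgment == 2:
            groups.add("non-gap 2")
    if above and judgment >= 3:
        groups.add("comparison")
    return groups


# NECAP cut scores from the reported ranges; sem=3.0 is a hypothetical value.
NECAP = dict(proficient_cut=840, level2_cut=834, floor=800, sem=3.0)
print(classify(835, 3, **NECAP))
print(classify(801, 1, **NECAP))
```

Returning a set rather than a single label reflects the footnotes in the results tables: a student scoring near the floor with a level 1 judgment fits both the non-gap 1 and the gap 2 criteria.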
Student questionnaires (answered after taking the test)
1. How difficult was the mathematics test?
A. harder than my regular mathematics schoolwork
B. about the same as my regular mathematics schoolwork
C. easier than my regular mathematics schoolwork
2. How hard did you try on the mathematics test?
A. I tried harder on this test than I do on my regular mathematics schoolwork.
B. I tried about the same as I do on my regular mathematics schoolwork.
C. I did not try as hard on this test as I do on my regular mathematics schoolwork.
Accommodations (used during the mathematics test)
NECAP: 16 accommodations listed by category:
– Setting
– Scheduling/timing
– Presentation formats
– Response formats
MEA: 21 accommodations listed by category:
– Setting
– Scheduling
– Modality
– Equipment
– Recording
Student-focused teacher interviews
Student profile data
– math test scores (both overall and on subtests)
– specific responses to released math test items
– student’s responses to the questionnaire
– special program status
– accommodations used during testing
Teacher interview questions
– Questions regarding perceptions of the students in each gap on various aspects of gap criteria
– 17 Likert scale questions on the student’s class work and participation in classroom activities
Student-focused teacher interview samples
NECAP sample:
– 20 8th grade math and special ed teachers
– 7 schools across three states (NH, RI, and VT)
– 51 students: gap 1=19, gap 2=18, and comparison group=14
MEA sample:
– 7 8th grade math and special ed teachers
– 3 schools
– 14 students: gap 1=4, non-gap 1=3, gap 2=2, non-gap 2=5, and comparison group=0
Results: Percentages of students in the gaps (NECAP)
Breakdown of Gap Group Designations: NECAP
Group         NECAP (N=24,168)
Gap 1         8.6%
Non-gap 1     8.8%†
Gap 2         0.8% [2.3%]*
Non-gap 2     1.5% [1.2%]*
Comparison    39.0%
† 188 (i.e., 8.7% of) non-gap 1 students scored so low that they also fit the criterion for gap 2.
* Shown in brackets: if teacher judgments were collapsed to four achievement levels as on MEA.
Gap 2 and non-gap 2 percentages differ depending on whether fine- or gross-grained ratings are used.
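Why the gap 2 and non-gap 2 shares move when NECAP judgments are collapsed can be seen in a toy example. The sketch assumes a 1 to 12 judgment coding in which code 1 is the lowest option within level 1 and code 2 the next higher one; the list of judgments for floor-level students is invented purely for illustration.

```python
from collections import Counter


def gap2_group_fine(code12):
    """Fine-grained rule: lowest level 1 option vs. the next higher option."""
    if code12 == 1:
        return "gap 2"
    if code12 == 2:
        return "non-gap 2"
    return None


def gap2_group_collapsed(code12):
    """Collapsed rule (as on MEA): any level 1 judgment vs. any level 2 judgment."""
    level = (code12 - 1) // 3 + 1
    if level == 1:
        return "gap 2"
    if level == 2:
        return "non-gap 2"
    return None


# Hypothetical 12-point judgments for students scoring at the test floor.
floor_students = [1, 1, 2, 2, 3, 4, 5]
fine = Counter(gap2_group_fine(c) for c in floor_students)
collapsed = Counter(gap2_group_collapsed(c) for c in floor_students)
print("fine-grained:", fine["gap 2"], "gap 2,", fine["non-gap 2"], "non-gap 2")
print("collapsed:   ", collapsed["gap 2"], "gap 2,", collapsed["non-gap 2"], "non-gap 2")
```

Under the collapsed rule every level 1 judgment counts as a match, so students who were non-gap 2 (or unclassified) under the fine-grained rule move into gap 2, in the same direction as the bracketed NECAP figures.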
Results: Percentages of students in the gaps (MEA)
Breakdown of Gap Group Designations: MEA
Group         MEA (N=10,319)
Gap 1         7.1%
Non-gap 1     7.1%†
Gap 2         4.3%
Non-gap 2     3.1%
Comparison    31.8%
† 444 (i.e., 60.3% of) non-gap 1 students scored so low that they also fit the criterion for gap 2.
Accommodations use (NECAP)
Mathematics Accommodation Frequencies within Gap and Comparison Groups: NECAP
Within Group            0        1        2-3      4-6      7+
Gap 1 (n=2,070)         89.8%+   3.1%-    5.6%-    1.6%-    none-
Non-gap 1 (n=2,129)     54.3%-   10.4%+   23.7%+   10.1%+   1.6%+
Gap 2 (n=188)           26.5%    15.1%    30.8%    22.2%    5.4%
Non-gap 2 (n=369)       33.9%    16.3%    32.5%    15.5%    1.9%
Comparison (n=9,429)    97.9%+   1.3%-    0.6%-    0.2%-    none-
Overall Population      89.8%    3.1%     5.6%     1.6%     none
+ Statistically higher than expected
- Statistically lower than expected
• Students in gap 1 were significantly less likely to use accommodations than students in non-gap 1.
• Only a small percentage of students in gap 1 used any accommodations at all.
• The majority of students in both gap 2 and non-gap 2 used one or more accommodations.
Accommodations use (MEA)
Mathematics Accommodation Frequencies within Gap and Comparison Groups: MEA
Within Group            0        1        2-3      4-6      7+
Gap 1 (n=734)           86.5%+   1.6%-    6.7%-    4.5%-    0.7%-
Non-gap 1 (n=736)       45.0%-   4.1%+    12.9%+   26.4%+   11.7%+
Gap 2 (n=444)           36.3%    4.3%     14.4%    32.0%    13.1%+
Non-gap 2 (n=318)       45.0%    5.7%     21.1%    22.3%    6.0%-
Comparison (n=3,278)    97.8%+   0.6%-    0.7%-    0.7%-    0.1%-
Overall Population      84.8%    1.4%     5.5%     6.3%     1.9%
+ Statistically higher than expected
- Statistically lower than expected
Similar patterns of accommodations use are seen for gap 1 on the MEA as in NECAP.
Performance of students in gap 1 compared to non-gap 1 on the NECAP
Subpopulation Mean Mathematics Scaled Scores* within Gap Group Designations:
Within Group            IEP only   ELL only   IEP&ELL   General Ed
Gap 1 (n=2,070)         830.9+     829.7+     827.7+    833.2+
Non-gap 1 (n=2,129)     819.7-     819.1-     815.8-    829.3-
Comparison (n=9,429)    847.4      848.8      none      850.2
Overall Population      828.2      827.3      817.5     842.3
*AL scale score ranges
AL 1: 800-833
AL 2: 834-839
AL 3: 840-851
AL 4: 852-880
Below proficient Above proficient
+ Statistically higher than expected
- Statistically lower than expected
Performance of students in gap 1 compared to non-gap 1 on the MEA
Subpopulation Mean Mathematics Scaled Scores* within Gap Group Designations:
Within Group            IEP only   ELL only   IEP&ELL   General Ed
Gap 1 (n=734)           823.6+     826.0+     none      827.7+
Non-gap 1 (n=736)       808.8-     812.2-     808.6     812.5-
Comparison (n=3,278)    855.4      856.0      none      858.6
Overall Population      824.1      828.0      815.1     842.6
*AL scale score ranges
AL 1: 800-828
AL 2: 829-840
AL 3: 841-860
AL 4: 861-880
Below proficient Above proficient
+ Statistically higher than expected
- Statistically lower than expected
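The achievement level (AL) scale score ranges listed for NECAP and MEA can be turned into a lookup, which makes it easier to read the group means against the levels. The boundaries below are taken from the ranges on the slides; the function name is illustrative, and treating a fractional mean that falls between two ranges (for example 833.2) as the lower level is an assumption.

```python
from bisect import bisect_right

# Lowest scaled score of achievement levels 2, 3 and 4 in each system,
# read from the AL scale score ranges reported for NECAP and MEA.
LOWER_BOUNDS = {
    "NECAP": (834, 840, 852),
    "MEA": (829, 841, 861),
}
SCALE_MIN, SCALE_MAX = 800, 880


def achievement_level(system: str, score: float) -> int:
    """Return the achievement level (1..4) for a scaled score."""
    if not SCALE_MIN <= score <= SCALE_MAX:
        raise ValueError(f"score {score} is outside the {SCALE_MIN}-{SCALE_MAX} scale")
    # Count how many level boundaries the score has reached.
    return bisect_right(LOWER_BOUNDS[system], score) + 1


# Group means for students with IEPs only, from the two tables.
print(achievement_level("NECAP", 830.9), achievement_level("NECAP", 847.4))
print(achievement_level("MEA", 823.6), achievement_level("MEA", 855.4))
```

Read this way, the gap 1 means for students with IEPs sit in level 1 in both systems, while the comparison group means sit in level 3.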
Special program status of students in gap 1 (NECAP)
Breakdown of Subpopulations within Gap 1 and Comparison Groups
Within Group            IEP only   ELL only   IEP&ELL   General Ed
Gap 1 (n=2,070)         14.2%-     2.3%       0.1%      83.4%+
Non-gap 1 (n=2,129)     50.8%+     5.0%       0.9%      43.3%-
Comparison (n=9,429)    2.2%       0.5%       none      97.3%
Overall Population      15.1%      1.9%       0.2%      82.8%
+ Statistically higher than expected
- Statistically lower than expected
• The majority of students in gap 1 were in general education.
• Students with IEPs were under-represented in gap 1 and over-represented in non-gap 1.
Special program status of students in gap 1 (MEA)
Breakdown of Subpopulations within Gap 1 and Comparison Groups
Within Group            IEP only   ELL only   IEP&ELL   General Ed
Gap 1 (n=734)           12.3%-     1.1%       none      86.7%+
Non-gap 1 (n=736)       50.3%+     4.6%       1.0%      44.2%-
Comparison (n=3,278)    2.5%       0.7%       none      96.7%
Overall Population      14.7%      1.3%       0.1%      83.9%
+ Statistically higher than expected
- Statistically lower than expected
There were similar gap 1 compositions in MEA.
Disability designations in gap 1
Learning disabilities (NECAP)
– Gap 1: 57.7% of the IEP gap 1 group (n=208)
– Non-gap 1: 49.7% of the IEP non-gap 1 group (n=860)
– Comparison: 49.2% of the IEP comparison group (n=83)
– Total population: 52% of students with IEPs (N=4,465)
Disability designations only seen in non-gap 1:
– NECAP: Students with learning impairments, deafness, multiple disabilities and traumatic brain injury
– MEA: Students with learning impairments and traumatic brain injury
Additional characteristics of students in gap 1 compared to non-gap 1
Gap 1 students:
– Were more likely female and white
– Had the fewest absences
– Had higher SES
– Found the state test about the same level of difficulty as class work
– Exhibited academic and mathematics-appropriate behaviors in class
Performance of students in gap 2 on the test (NECAP and MEA)
By definition, students in both gap 2 and non-gap 2 scored no better than chance on the assessment.
Special program status of students in gap 2 (NECAP)
Breakdown of Sub-Populations within Gap 2 and Comparison Groups
Within Group            IEP only   ELL only   IEP&ELL   General Ed
Gap 2 (n=185)           80.0%      6.5%       2.7%      10.8%-
Non-gap 2 (n=369)       69.4%      9.8%       1.6%      19.2%
Comparison (n=9,429)    2.2%       0.5%       none      97.3%
Overall Population      15.1%      1.9%       0.2%      82.8%
The majority of students in gap 2 and non-gap 2 were students with IEPs.
Special program status of students in gap 2 (MEA)
Breakdown of Sub-Populations within Gap 2 and Comparison Groups
Within Group            IEP only   ELL only   IEP&ELL   General Ed
Gap 2 (n=444)           57.4%      4.5%       1.1%      36.9%
Non-gap 2 (n=318)       47.8%      2.5%       0.9%      48.7%
Comparison (n=3,278)    2.5%       0.7%       none      96.7%
Overall Population      14.7%      1.3%       0.1%      83.9%
MEA results show the majority of the students in gap 2 had IEPs.
The percentages of students in general education in gap 2 and non-gap 2 groups are higher than in NECAP.
Disability designations in gap 2
Learning disabilities: Fewer than half of the students in gap 2 groups had learning disabilities in both systems.
Other disability designations differed between the two systems.
NECAP
– Students who were deaf/blind and those with multiple disabilities were only found in gap 2.
– Students with hearing impairments, deafness and traumatic brain injury were only found in non-gap 2.
MEA
– Students with hearing impairments were only in gap 2.
– Students with visual impairments or blindness were only in non-gap 2.
Additional characteristics of students in gap 2 compared to non-gap 2
Students in gap 2 were very similar to students in non-gap 2 on most variables.
Students from both groups felt that the test was as hard as or harder than their schoolwork.
They tried as hard as or harder on the test as in class.
They used mathematics tools in the classroom (e.g., calculators).
Summary: How many students are in the gaps?
10.9% - 11.4% of the total student population in two systems are in gaps 1 & 2.
NECAP: Gap 1 = 8.6%, Gap 2 = 2.3%
MEA: Gap 1 = 7.1%, Gap 2 = 4.3%
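The 10.9% to 11.4% range is the sum of the gap 1 and gap 2 shares in each system. The check below uses the percentages reported on the slides; note that the NECAP gap 2 figure here is the 2.3% obtained with judgments collapsed to four levels, which keeps the two systems comparable. The variable and function names are illustrative.

```python
# Shares of the judged student population in each gap, as reported.
GAP_SHARES = {
    "NECAP": {"gap 1": 8.6, "gap 2": 2.3},  # gap 2 uses collapsed judgments
    "MEA": {"gap 1": 7.1, "gap 2": 4.3},
}


def total_in_gaps(shares):
    """Add the gap 1 and gap 2 shares, rounded to one decimal place."""
    return round(shares["gap 1"] + shares["gap 2"], 1)


totals = {system: total_in_gaps(shares) for system, shares in GAP_SHARES.items()}
print(totals)
```

Adding the two shares is valid because the definitions make the groups disjoint: gap 1 requires a teacher judgment of proficient or higher, while gap 2 requires the lowest judgment.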
Summary
We found substantial differences in composition between the gap 1 and non-gap 1 groups, which held in both systems.
Gap 1 students may have characteristics and behaviors that mask their difficulties.
Non-gap 1 students are those generally thought to be in the “achievement gap”.
Summary (cont.)
Low performing students in gap 2 and non-gap 2 share many characteristics.
Their extremely low performances in both classroom activities and the test raise issues about the relevancy of the general assessment for them.
Conclusions
For students in gap 1, increase focus on classroom supports and training on how to transfer their knowledge and skills from classroom to assessment environments.
For students in non-gap 1, examine expectations and opportunities to learn. Providing a different test based on modified academic achievement standards is premature.
Students with IEPs in gap 2 and non-gap 2 may benefit from the 2% option for AYP and an alternate assessment based on modified academic achievement standards (AA-MAAS).
There will be challenges designing a test based on MAAS that is strictly aligned with grade level content.
www.measuredprogress.org
sbechard@measuredprogress.org
kgodin@measuredprogress.org