Efforts to teach in a way that tests can detect : Pointless or profitable?
![Page 1: Efforts to teach in a way that tests can detect : Pointless or profitable?](https://reader035.fdocuments.in/reader035/viewer/2022070422/568163d7550346895dd5263e/html5/thumbnails/1.jpg)
EFFORTS TO TEACH IN A WAY THAT TESTS CAN DETECT: POINTLESS OR PROFITABLE?
Megan Welsh
Neag School of Education
NERA Conference
10/21/11
![Page 2: Efforts to teach in a way that tests can detect : Pointless or profitable?](https://reader035.fdocuments.in/reader035/viewer/2022070422/568163d7550346895dd5263e/html5/thumbnails/2.jpg)
The Title: Pointless or Profitable?
In 1989, Mehrens and Kaminski published a paper entitled “Methods for Improving Standardized Test Scores: Fruitful, Fruitless or Fraudulent?”
It addressed the “old, but increasingly relevant issue of teaching to the test” (p. 21)
They conclude that at least some test preparation efforts are both fruitless and fraudulent.
![Page 3: Efforts to teach in a way that tests can detect : Pointless or profitable?](https://reader035.fdocuments.in/reader035/viewer/2022070422/568163d7550346895dd5263e/html5/thumbnails/3.jpg)
This talk
Explores the assumptions underlying many current uses of test scores
Provides some preliminary evidence about the relationship between test-focused instruction and student performance
Discusses implications for next generation assessments
![Page 4: Efforts to teach in a way that tests can detect : Pointless or profitable?](https://reader035.fdocuments.in/reader035/viewer/2022070422/568163d7550346895dd5263e/html5/thumbnails/4.jpg)
Then (1989)
Norm-referenced tests are widely used
Teachers are expected not to know the content of the test; the test samples from a content-area domain, and content-area knowledge is expected to generalize to test performance
Accountability = parent/community perceptions of schools
Test scores are considered to gauge minimum competency in a subject, but are not typically used to inform curriculum or to reflect on specific lessons
![Page 5: Efforts to teach in a way that tests can detect : Pointless or profitable?](https://reader035.fdocuments.in/reader035/viewer/2022070422/568163d7550346895dd5263e/html5/thumbnails/5.jpg)
Now
Standards-based assessments are criterion-referenced
Both the test and teaching are expected to closely align with state standards/Common Core
High-stakes accountability based on test scores assumes test scores are reflective of instructional quality
Test scores are used to reflect on instruction and curriculum for specific topics
![Page 6: Efforts to teach in a way that tests can detect : Pointless or profitable?](https://reader035.fdocuments.in/reader035/viewer/2022070422/568163d7550346895dd5263e/html5/thumbnails/6.jpg)
New uses of large-scale tests: 1. Support accountability
![Page 7: Efforts to teach in a way that tests can detect : Pointless or profitable?](https://reader035.fdocuments.in/reader035/viewer/2022070422/568163d7550346895dd5263e/html5/thumbnails/7.jpg)
Question #: 15
Question Type: Multiple Choice
Topic: Number Sense
Shutesbury (correct): 6%
Massachusetts (correct): 49%
Correct Answer: C
61% selected A, 22% selected B, while only 6% selected the correct answer C.
New uses of large-scale tests: 2. Reflect on instruction
![Page 8: Efforts to teach in a way that tests can detect : Pointless or profitable?](https://reader035.fdocuments.in/reader035/viewer/2022070422/568163d7550346895dd5263e/html5/thumbnails/8.jpg)
Question #: 22
Question Type: Multiple Choice
Topic: Number Sense
Shutesbury (correct): 56%
Massachusetts (correct): 72%
Correct Answer: C
22% selected A, 22% selected B.
New uses of large-scale tests: 2. Reflect on instruction
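The item-level comparison on these two slides amounts to subtracting the statewide percent correct from the local percent correct. A minimal sketch using only the percentages shown in the item reports; the dictionary layout and the function name `gap` are illustrative assumptions, not part of any state reporting system.

```python
# Percent-correct figures copied from the two item reports (Questions 15 and 22).
items = {
    15: {"local": 6, "state": 49},
    22: {"local": 56, "state": 72},
}


def gap(item):
    """Local minus statewide percent correct, in percentage points."""
    return item["local"] - item["state"]


gaps = {number: gap(item) for number, item in items.items()}
for number, points in gaps.items():
    print(f"Question {number}: {points:+d} percentage points relative to the state")
```

A large negative gap on a single item is the kind of signal a school might use to reflect on instruction for that topic, which is the use the rest of the talk questions.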
![Page 9: Efforts to teach in a way that tests can detect : Pointless or profitable?](https://reader035.fdocuments.in/reader035/viewer/2022070422/568163d7550346895dd5263e/html5/thumbnails/9.jpg)
Should standards-based assessments be used in these ways?
Is teaching to the test now appropriate?
Does teaching to the test improve scores?
![Page 10: Efforts to teach in a way that tests can detect : Pointless or profitable?](https://reader035.fdocuments.in/reader035/viewer/2022070422/568163d7550346895dd5263e/html5/thumbnails/10.jpg)
Should standards-based assessments be used in these ways?
Is teaching to the test now appropriate?
Does teaching to the test improve scores?
Are tests sensitive to instructional efforts?
![Page 11: Efforts to teach in a way that tests can detect : Pointless or profitable?](https://reader035.fdocuments.in/reader035/viewer/2022070422/568163d7550346895dd5263e/html5/thumbnails/11.jpg)
Test scores might reflect
Instruction focused on standards
Teaching skill
Attainment of standards due to experiences outside of school
Test-wiseness
Situational anomalies (illness, distractions, mood, etc.)
Aptitude
![Page 12: Efforts to teach in a way that tests can detect : Pointless or profitable?](https://reader035.fdocuments.in/reader035/viewer/2022070422/568163d7550346895dd5263e/html5/thumbnails/12.jpg)
If tests are insensitive to instruction
![Page 13: Efforts to teach in a way that tests can detect : Pointless or profitable?](https://reader035.fdocuments.in/reader035/viewer/2022070422/568163d7550346895dd5263e/html5/thumbnails/13.jpg)
If tests are insensitive to instruction
![Page 14: Efforts to teach in a way that tests can detect : Pointless or profitable?](https://reader035.fdocuments.in/reader035/viewer/2022070422/568163d7550346895dd5263e/html5/thumbnails/14.jpg)
Question #: 15
Question Type: Multiple Choice
Topic: Number Sense
Shutesbury (correct): 6%
Massachusetts (correct): 49%
Correct Answer: C
61% selected A, 22% selected B, while only 6% selected the correct answer C.
If tests are insensitive to instruction
Waste of time
![Page 15: Efforts to teach in a way that tests can detect : Pointless or profitable?](https://reader035.fdocuments.in/reader035/viewer/2022070422/568163d7550346895dd5263e/html5/thumbnails/15.jpg)
If tests are insensitive to instruction
Why teach to the test?
![Page 16: Efforts to teach in a way that tests can detect : Pointless or profitable?](https://reader035.fdocuments.in/reader035/viewer/2022070422/568163d7550346895dd5263e/html5/thumbnails/16.jpg)
Exploring instructional sensitivity
A series of studies conducted in one suburban school district located in the Southwest.
Participants
16 third- and 20 fifth-grade mathematics classes in 13 schools
784 students
Relatively white, high-performing district with moderate SES
Teachers were relatively experienced (M = 13.9, SD = 9.9)
District used standards-based report cards
Districtwide mathematics curriculum uniformly implemented
![Page 17: Efforts to teach in a way that tests can detect : Pointless or profitable?](https://reader035.fdocuments.in/reader035/viewer/2022070422/568163d7550346895dd5263e/html5/thumbnails/17.jpg)
Data Collection
Teachers interviewed for approximately two hours about:
instruction and assessment of two performance objectives
grading practices
likelihood that students will correctly answer state test items relating to the objectives
Student mathematics scores on the state test
End-of-year grades
Demographics
![Page 18: Efforts to teach in a way that tests can detect : Pointless or profitable?](https://reader035.fdocuments.in/reader035/viewer/2022070422/568163d7550346895dd5263e/html5/thumbnails/18.jpg)
Research questions
Is teaching to the test now appropriate?
Does teaching to the test improve scores?
Are tests sensitive to instructional efforts?
![Page 19: Efforts to teach in a way that tests can detect : Pointless or profitable?](https://reader035.fdocuments.in/reader035/viewer/2022070422/568163d7550346895dd5263e/html5/thumbnails/19.jpg)
Is teaching to the test appropriate?
![Page 20: Efforts to teach in a way that tests can detect : Pointless or profitable?](https://reader035.fdocuments.in/reader035/viewer/2022070422/568163d7550346895dd5263e/html5/thumbnails/20.jpg)
My thoughts…
General instruction on tested objectives
Teaching test taking skills
Instruction on tested objectives using examples similar to the test
Decontextualized practice
Practice on the operational (real) test
![Page 21: Efforts to teach in a way that tests can detect : Pointless or profitable?](https://reader035.fdocuments.in/reader035/viewer/2022070422/568163d7550346895dd5263e/html5/thumbnails/21.jpg)
Is teaching to the test effective?
First need to gauge teaching to the test.
1. Asked teachers about their test preparation practices.
2. Teachers participated in a blind review of mathematics tests containing items from their own and other state tests. They identified items their students could answer and commented on sources of difficulty.
![Page 22: Efforts to teach in a way that tests can detect : Pointless or profitable?](https://reader035.fdocuments.in/reader035/viewer/2022070422/568163d7550346895dd5263e/html5/thumbnails/22.jpg)
Participants: This analysis
31 teachers (12 third-grade, 19 fifth-grade)
711 students
Students were relatively low-performing compared with the district
![Page 23: Efforts to teach in a way that tests can detect : Pointless or profitable?](https://reader035.fdocuments.in/reader035/viewer/2022070422/568163d7550346895dd5263e/html5/thumbnails/23.jpg)
Frequency of test preparation practices

| Test Taking Practice | Frequency |
|---|---|
| 1. General instruction on tested objectives | 12 |
| 2. Teaching test taking skills | 6 |
| 3. Instruction on tested objectives using examples like the test format | 6 |
| 4. Decontextualized practice that mirrors the state test | 12 |
| 5. Practice on the operational test | 0 |
![Page 24: Efforts to teach in a way that tests can detect : Pointless or profitable?](https://reader035.fdocuments.in/reader035/viewer/2022070422/568163d7550346895dd5263e/html5/thumbnails/24.jpg)
Item review State test awareness
![Page 25: Efforts to teach in a way that tests can detect : Pointless or profitable?](https://reader035.fdocuments.in/reader035/viewer/2022070422/568163d7550346895dd5263e/html5/thumbnails/25.jpg)
Analysis
Conducted a multilevel analysis; students nested within classrooms
Predicted mathematics achievement on state standards-based assessment, standardized relative to statewide test performance and pooled across grades
Controlled for student-level minority status, ELL status, special education status
Teacher-level main effects:
- teaching to the test categories compared to general instruction on tested objectives
- state test awareness categories compared to test-averse teachers
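One way to write the two-level model described on this slide is sketched below. The notation (student i in classroom j, γ for fixed effects, W for teacher-category indicators) follows common multilevel-model conventions and is an assumption, as is the choice to give only the intercept a classroom-level random effect; the slides do not state the exact specification.

```latex
% Level 1: student i in classroom j; Y is standardized mathematics achievement
Y_{ij} = \beta_{0j} + \beta_{1j}\,\mathrm{ELL}_{ij} + \beta_{2j}\,\mathrm{Minority}_{ij}
       + \beta_{3j}\,\mathrm{SPED}_{ij} + r_{ij}

% Level 2: classroom j; W_{kj} are 0/1 indicators for teacher categories,
% with general instruction and test-averse teachers as reference groups
\beta_{0j} = \gamma_{00} + \sum_{k} \gamma_{0k}\, W_{kj} + u_{0j}
\beta_{1j} = \gamma_{10}, \qquad \beta_{2j} = \gamma_{20}, \qquad \beta_{3j} = \gamma_{30}
```

Under this reading, each γ0k is the difference in mean classroom achievement, in statewide standard deviation units, between teachers in category k and the reference group.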
![Page 26: Efforts to teach in a way that tests can detect : Pointless or profitable?](https://reader035.fdocuments.in/reader035/viewer/2022070422/568163d7550346895dd5263e/html5/thumbnails/26.jpg)
Results
After controlling for student demographics:
teaching to the test did not predict achievement
being test-secure did predict achievement; students of test-secure teachers performed half a standard deviation better on the state test than students of test-averse teachers
there was no difference in performance between students whose teachers were test-averse and those whose teachers were state test focused or out-of-state test focused
![Page 27: Efforts to teach in a way that tests can detect : Pointless or profitable?](https://reader035.fdocuments.in/reader035/viewer/2022070422/568163d7550346895dd5263e/html5/thumbnails/27.jpg)
Final Model Predicting Mathematics Achievement

| Fixed effect | Coefficient | SE | df | t ratio |
|---|---|---|---|---|
| Model for mean classroom math achievement, β0 | | | | |
| Intercept, γ00 | -0.242 | 0.058 | 27 | -0.414 |
| Test Secure, γ01 | 0.525 | 0.230 | 27 | 2.277* |
| Out-of-State Focus, γ02 | 0.248 | 0.161 | 27 | 1.537 |
| In-State Focus, γ03 | 0.274 | 0.150 | 27 | 1.825 |
| Model for ELL, β1 | | | | |
| ELL, γ10 | -0.736 | 0.186 | 699 | -3.961* |
| Model for Minority, β2 | | | | |
| Minority, γ20 | -0.380 | 0.053 | 699 | -7.170* |
| Model for SPED, β3 | | | | |
| SPED, γ30 | -0.985 | 0.152 | 699 | -6.471* |

* indicates statistically significant relationship at p<0.05.
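To see what the fixed effects imply, the reported coefficients can simply be added up for a given teacher category and student profile. A minimal sketch, assuming the model is additive as tabulated, that test-averse teachers are the reference group, and that the ELL entry reads -0.736; the names `COEF` and `predicted_score` are illustrative, not from the study.

```python
# Fixed-effect coefficients as tabulated on the slide (statewide SD units).
COEF = {
    "intercept": -0.242,
    "test_secure": 0.525,
    "out_of_state_focus": 0.248,
    "in_state_focus": 0.274,
    "ell": -0.736,
    "minority": -0.380,
    "sped": -0.985,
}


def predicted_score(**indicators):
    """Intercept plus the coefficient of every indicator passed as True."""
    allowed = set(COEF) - {"intercept"}
    unknown = set(indicators) - allowed
    if unknown:
        raise KeyError(f"unknown indicator(s): {sorted(unknown)}")
    return COEF["intercept"] + sum(COEF[name] for name, on in indicators.items() if on)


baseline = predicted_score()                 # reference group: test-averse teacher
secure = predicted_score(test_secure=True)   # same student, test-secure teacher
print(round(secure - baseline, 3))
```

The difference between `secure` and `baseline` is the Test Secure coefficient, which is the "half a standard deviation" reported in the results.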
![Page 28: Efforts to teach in a way that tests can detect : Pointless or profitable?](https://reader035.fdocuments.in/reader035/viewer/2022070422/568163d7550346895dd5263e/html5/thumbnails/28.jpg)
Possible interpretations
Teaching to the test does not work
The teachers are teaching state standards in a relatively uniform way
The test does not detect instructional efforts
![Page 29: Efforts to teach in a way that tests can detect : Pointless or profitable?](https://reader035.fdocuments.in/reader035/viewer/2022070422/568163d7550346895dd5263e/html5/thumbnails/29.jpg)
Instructional sensitivity
The degree to which a test can detect differences in the instruction students receive.
With teachers who do not teach state standards
With teachers who teach state standards well
![Page 30: Efforts to teach in a way that tests can detect : Pointless or profitable?](https://reader035.fdocuments.in/reader035/viewer/2022070422/568163d7550346895dd5263e/html5/thumbnails/30.jpg)
Big question…
How do we know what instruction has occurred? (opportunity to learn)
Instructional sensitivity: the degree of correspondence between opportunity to learn and test performance
![Page 31: Efforts to teach in a way that tests can detect : Pointless or profitable?](https://reader035.fdocuments.in/reader035/viewer/2022070422/568163d7550346895dd5263e/html5/thumbnails/31.jpg)
Measuring opportunity to learn
Teaching to the test is one (gross) approach
Alignment: How consistent were test items and instructional efforts in terms of content and cognitive demand?
Emphasis: Were most heavily tested concepts fully addressed?
The interaction of alignment and emphasis is perhaps the best estimate and should also correlate with achievement
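An alignment-by-emphasis interaction is usually built as the product of the two predictors after centering or standardizing them. A minimal sketch of that construction; the teacher-level scores below are invented for illustration, and standardizing before multiplying is a common convention assumed here, not a detail stated in the slides.

```python
from statistics import mean, pstdev

# Hypothetical teacher-level scores, invented for illustration only.
alignment = [1, 2, 2, 3, 3, 1]   # e.g. 1 = some, 2 = close, 3 = perfect
emphasis = [2, 5, 3, 6, 4, 1]    # e.g. higher = taught more frequently


def standardize(values):
    """Convert raw scores to z-scores (mean 0, SD 1 within this sample)."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


z_alignment = standardize(alignment)
z_emphasis = standardize(emphasis)

# Interaction term: product of the two standardized predictors.
interaction = [a * e for a, e in zip(z_alignment, z_emphasis)]
```

The product is large and positive when a teacher is above average on both alignment and emphasis, which is why it is a plausible single estimate of opportunity to learn.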
![Page 32: Efforts to teach in a way that tests can detect : Pointless or profitable?](https://reader035.fdocuments.in/reader035/viewer/2022070422/568163d7550346895dd5263e/html5/thumbnails/32.jpg)
Alignment as Opportunity to Learn
[Diagram; surviving labels: teach skill, unlike test, Test]
![Page 33: Efforts to teach in a way that tests can detect : Pointless or profitable?](https://reader035.fdocuments.in/reader035/viewer/2022070422/568163d7550346895dd5263e/html5/thumbnails/33.jpg)
My instructional sensitivity study
Based on interviews with teachers about their teaching and assessment of the two objectives most heavily emphasized on the state test

| | Grade 3 | Grade 5 |
|---|---|---|
| Performance Objective 1 | Make a diagram to represent the number of combinations available when 1 item is selected from each of 3 sets of 2 items (e.g., 2 different shirts, 2 different hats, 2 different belts). 2 items | Interpret graphical representations and data displays including bar graphs, circle graphs, frequency tables, three set Venn diagrams, and line graphs that display continuous data, AND answer questions based on graphical representations and data displays. 4 items |
| Performance Objective 2 | Discriminate necessary information from unnecessary information in a given grade-level appropriate word problem. 3 items | Describe the rule used in a simple grade-level appropriate function (e.g., T-chart, input-output model). 4 items |
![Page 34: Efforts to teach in a way that tests can detect : Pointless or profitable?](https://reader035.fdocuments.in/reader035/viewer/2022070422/568163d7550346895dd5263e/html5/thumbnails/34.jpg)
Measuring alignment
Perfect Alignment
Interprets a table
Interprets visual and written information
Interprets a 3 set tree diagram
Close Alignment
Combinations involve 3 sets of items AND multiple visual displays
OR students create a tree diagram
Some Alignment
Introduces concept of combination: Select 1 item from each set
Represents combination in some way (list or diagram)
Uses relevant vocabulary (combination, diagram, different)
Not Aligned
Does not teach skill
![Page 35: Efforts to teach in a way that tests can detect : Pointless or profitable?](https://reader035.fdocuments.in/reader035/viewer/2022070422/568163d7550346895dd5263e/html5/thumbnails/35.jpg)
For example
The teacher who drew these examples was coded as having “close alignment” to AIMS because she required students to solve problems involving three sets of items using a tree diagram.
She did not, however, present students with tree diagrams that they had to interpret (required for “perfect” alignment).
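The alignment rubric can be read as a short decision procedure. The sketch below is one possible reading, not the study's coding scheme: it assumes "perfect" requires all three interpretation features, that the "close" criteria group as (three sets AND multiple displays) OR students create a tree diagram, and that any other instruction on the skill counts as "some". The function and argument names are hypothetical.

```python
def alignment_level(teaches_skill,
                    interprets_table=False,
                    interprets_visual_and_written=False,
                    interprets_tree_diagram=False,
                    three_sets=False,
                    multiple_displays=False,
                    students_create_tree=False):
    """Map reported instructional features to a rubric category (assumed reading)."""
    if not teaches_skill:
        return "Not Aligned"
    # Assumption: all three interpretation features are needed for "perfect".
    if interprets_table and interprets_visual_and_written and interprets_tree_diagram:
        return "Perfect Alignment"
    # Assumption about how the AND/OR in the rubric groups.
    if (three_sets and multiple_displays) or students_create_tree:
        return "Close Alignment"
    return "Some Alignment"


# The teacher in the example: three sets of items, students build the tree
# diagram themselves, but are never given a tree diagram to interpret.
example_teacher = alignment_level(
    teaches_skill=True, three_sets=True, students_create_tree=True
)
print(example_teacher)
```

Under these assumptions the example teacher lands in the same category the slide reports for her.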
![Page 36: Efforts to teach in a way that tests can detect : Pointless or profitable?](https://reader035.fdocuments.in/reader035/viewer/2022070422/568163d7550346895dd5263e/html5/thumbnails/36.jpg)
Distribution of alignment scores by grade level
[Chart; categories: Some Alignment, Close Alignment, Perfect Alignment]
![Page 37: Efforts to teach in a way that tests can detect : Pointless or profitable?](https://reader035.fdocuments.in/reader035/viewer/2022070422/568163d7550346895dd5263e/html5/thumbnails/37.jpg)
Distribution of emphasis scores by grade level
[Chart; axis: frequency of instruction; categories: Daily, Weekly, Every other week, Monthly, 2 weeks per year, 1 week per year, 1-2 lessons, Not taught]
![Page 38: Efforts to teach in a way that tests can detect : Pointless or profitable?](https://reader035.fdocuments.in/reader035/viewer/2022070422/568163d7550346895dd5263e/html5/thumbnails/38.jpg)
Analysis
Conducted a multilevel analysis; students nested within classrooms
Predicted mathematics achievement on state standards-based assessment, standardized relative to statewide test performance, run separately by grade level
Controlled for student-level minority status, ELL status, special education status, teacher experience and education, school-level free lunch eligibility, and prior achievement on a norm-referenced test
Teacher-level main effects:
- alignment
- emphasis
- alignment x emphasis interaction
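"Standardized relative to statewide test performance" means each student's score is expressed as a distance from the state mean in state standard deviation units. A minimal sketch; the statewide mean and SD below are placeholders invented for illustration, not AIMS values, and the function name is hypothetical.

```python
def standardize_to_state(scale_score, state_mean, state_sd):
    """Express a scale score in statewide standard deviation units."""
    return (scale_score - state_mean) / state_sd


# Hypothetical statewide mean and SD for one grade level (placeholders).
STATE_MEAN, STATE_SD = 500.0, 50.0

z = standardize_to_state(525.0, STATE_MEAN, STATE_SD)
print(z)
```

Because the outcome is on this scale, a coefficient of 0.10 reads directly as a tenth of a statewide standard deviation.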
![Page 39: Efforts to teach in a way that tests can detect : Pointless or profitable?](https://reader035.fdocuments.in/reader035/viewer/2022070422/568163d7550346895dd5263e/html5/thumbnails/39.jpg)
Results
At fifth grade, none of the main effects predicted achievement after controlling for prior achievement and demographics
At third grade, alignment predicted achievement after accounting for prior achievement and free lunch eligibility; students whose teachers were a standard deviation above the mean in alignment scored a tenth of a standard deviation above the sample mean on the state test
![Page 40: Efforts to teach in a way that tests can detect : Pointless or profitable?](https://reader035.fdocuments.in/reader035/viewer/2022070422/568163d7550346895dd5263e/html5/thumbnails/40.jpg)
Final Model Predicting Mathematics Achievement, Third Grade

| Fixed effect | Coefficient | SE | df | t ratio |
|---|---|---|---|---|
| Model for mean classroom math achievement, β0 | | | | |
| Intercept, γ00 | 0.45 | 0.04 | 13 | 10.12* |
| Free Lunch, γ02 | -0.07 | 0.04 | 13 | -1.68 |
| Alignment, γ03 | 0.10 | 0.04 | 13 | 2.40* |
| Model for SAT9, β1 | | | | |
| Intercept, γ10 | 0.68 | 0.04 | 315 | 17.01* |

* indicates statistically significant relationship at p<0.05.
![Page 41: Efforts to teach in a way that tests can detect : Pointless or profitable?](https://reader035.fdocuments.in/reader035/viewer/2022070422/568163d7550346895dd5263e/html5/thumbnails/41.jpg)
Possible interpretations
Test is instructionally sensitive to a limited degree at one grade level, but not the other
Objectives selected affected results; third grade objectives comprised less of the curriculum (to teach them you had to be very aware of their presence on the test), while fifth grade objectives reflect commonly taught skills.
![Page 42: Efforts to teach in a way that tests can detect : Pointless or profitable?](https://reader035.fdocuments.in/reader035/viewer/2022070422/568163d7550346895dd5263e/html5/thumbnails/42.jpg)
Implications
Need to evaluate instructional sensitivity if we want to use large-scale assessments for accountability or to guide instruction
Sensitivity of total test scores
Review item sensitivity during test development
![Page 44: Efforts to teach in a way that tests can detect : Pointless or profitable?](https://reader035.fdocuments.in/reader035/viewer/2022070422/568163d7550346895dd5263e/html5/thumbnails/44.jpg)
Exploring item sensitivity
Two approaches recommended by Popham and Kaase (2009)
1. Judgmental review of test items
2. Differential item functioning based on content teachers report teaching well and teaching poorly
So far, only approach 2 has been studied. Found no relationship between content teachers said they taught badly (or didn’t teach) and item functioning.
![Page 45: Efforts to teach in a way that tests can detect : Pointless or profitable?](https://reader035.fdocuments.in/reader035/viewer/2022070422/568163d7550346895dd5263e/html5/thumbnails/45.jpg)
Another approach
Combines both approaches…
Teachers review a test and identify items they consider problematic
Compare classroom level and statewide item difficulties across the entire test
Determine if teacher-identified items perform differently
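The three steps above can be sketched as a simple descriptive comparison: compute each item's classroom-minus-state difficulty gap, then compare the average gap for teacher-flagged items with the average for the rest. All numbers below are invented for illustration, and this is a plain comparison of proportions correct, not a formal DIF statistic.

```python
from statistics import mean

# Hypothetical proportions correct per item (item id -> proportion correct).
state_p = {1: 0.80, 2: 0.65, 3: 0.49, 4: 0.72, 5: 0.55}
class_p = {1: 0.78, 2: 0.40, 3: 0.45, 4: 0.75, 5: 0.30}

# Hypothetical set of items the teacher identified as problematic.
flagged = {2, 5}

# Step 2: classroom-minus-state gap for every item on the test.
gaps = {item: class_p[item] - state_p[item] for item in state_p}

# Step 3: do the teacher-identified items behave differently from the rest?
flagged_gap = mean(gaps[i] for i in gaps if i in flagged)
other_gap = mean(gaps[i] for i in gaps if i not in flagged)
print(f"flagged items: {flagged_gap:+.2f}, other items: {other_gap:+.2f}")
```

In this made-up classroom the flagged items fall well below the statewide difficulty while the other items track it, which is the pattern the method looks for.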
![Page 46: Efforts to teach in a way that tests can detect : Pointless or profitable?](https://reader035.fdocuments.in/reader035/viewer/2022070422/568163d7550346895dd5263e/html5/thumbnails/46.jpg)
Visual Analysis: An Example of DIF
![Page 47: Efforts to teach in a way that tests can detect : Pointless or profitable?](https://reader035.fdocuments.in/reader035/viewer/2022070422/568163d7550346895dd5263e/html5/thumbnails/47.jpg)
Participants: This analysis
10 third-grade and 12 fifth-grade teachers from the same data collection
Number of student test scores per classroom ranged from 19 to 30
![Page 48: Efforts to teach in a way that tests can detect : Pointless or profitable?](https://reader035.fdocuments.in/reader035/viewer/2022070422/568163d7550346895dd5263e/html5/thumbnails/48.jpg)
Teacher who reported instructional alignment
![Page 49: Efforts to teach in a way that tests can detect : Pointless or profitable?](https://reader035.fdocuments.in/reader035/viewer/2022070422/568163d7550346895dd5263e/html5/thumbnails/49.jpg)
Teacher concerned with a few items
![Page 50: Efforts to teach in a way that tests can detect : Pointless or profitable?](https://reader035.fdocuments.in/reader035/viewer/2022070422/568163d7550346895dd5263e/html5/thumbnails/50.jpg)
Teacher concerned with test emphasis
![Page 51: Efforts to teach in a way that tests can detect : Pointless or profitable?](https://reader035.fdocuments.in/reader035/viewer/2022070422/568163d7550346895dd5263e/html5/thumbnails/51.jpg)
Patterns across classrooms
Teachers did not do a good job of predicting which items may function differently in either grade level
Teachers differed in the specific items they identified as problematic, but were more consistent in terms of over- and under-emphasized topics
Item functioning randomly varies across line plots
![Page 52: Efforts to teach in a way that tests can detect : Pointless or profitable?](https://reader035.fdocuments.in/reader035/viewer/2022070422/568163d7550346895dd5263e/html5/thumbnails/52.jpg)
Possible Interpretations
Teachers don’t do a good job of predicting which items will be difficult for students
Items on this test do not appear to be instructionally sensitive
Negative result: Method failure or test failure?
![Page 53: Efforts to teach in a way that tests can detect : Pointless or profitable?](https://reader035.fdocuments.in/reader035/viewer/2022070422/568163d7550346895dd5263e/html5/thumbnails/53.jpg)
Limitations (All analyses)
Small sample
One district
One content area
Two grade levels
Two objectives used to generalize to entire test for analysis of test score sensitivity
![Page 54: Efforts to teach in a way that tests can detect : Pointless or profitable?](https://reader035.fdocuments.in/reader035/viewer/2022070422/568163d7550346895dd5263e/html5/thumbnails/54.jpg)
Teaching to the test: Pointless or Profitable?
In this example, teachers seem to have difficulty linking items to what happens in classrooms
Test may get at general mathematics aptitude more than attainment of specific standards
Therefore, broadly teaching content may have a greater (or at least equal) impact on achievement
CAVEAT: When a test is composed of anomalous items, teaching to the test may help.
![Page 55: Efforts to teach in a way that tests can detect : Pointless or profitable?](https://reader035.fdocuments.in/reader035/viewer/2022070422/568163d7550346895dd5263e/html5/thumbnails/55.jpg)
Implications for the Next Generation Assessments
New item formats may improve instructional sensitivity; requires investigation
The computer-adaptive nature of the SMARTER Balanced assessment makes teaching to specific items pointless
Test validation should examine instructional sensitivity, especially if scores will be used for school and teacher accountability
![Page 56: Efforts to teach in a way that tests can detect : Pointless or profitable?](https://reader035.fdocuments.in/reader035/viewer/2022070422/568163d7550346895dd5263e/html5/thumbnails/56.jpg)
Questions?
Megan Welsh
Educational Psychology Department
Neag School of Education
[email protected]