CREATE – National Evaluation Institute Annual Conference – October 8-10, 2009 The Brown Hotel,...
CREATE – National Evaluation Institute Annual Conference – October 8-10, 2009
The Brown Hotel, Louisville, Kentucky
Research and Evaluation that inform Leadership for Results
Masking Variations in Achievement Gains
By Eliot R. Long
A*Star Audits, LLC - Brooklyn, NY 11231
eliotlong.astaraudits.com
[Figure: True vs. Observed Gains at min. score for passing. Y-axis: Percent Gain or Decline (-30% to 30%). Series: Observed, True.]
Teacher encouraged guessing: Unstructured influence on student test item responses
An accepted practice
- Recommended by educational assessment writers
- Supported by extensive research since the early 1920s
- A common practice in schools across the U.S.
  - Assessment mantra: “If it’s blank, it’s wrong”
Informal, entrepreneurial teacher activity
- No written policy or instructions on how to do it, or how not to
Yet, no evaluation of impact on program evaluation or accountability
- No study of effects on low performing students
- No study of impact on comparison of test scores over time
- No study of recommendations put into general practice
A Norms Review
The following exhibits are based on four separate research projects, each including the development of group response pattern norms:
- Classroom groups, grades 3-7 in a northeast urban school district: 15,825 classrooms, 391,078 students
- School groups, grade 3 statewide in Midwest: 2,317 schools, 140,203 students
- Nationwide sample, grade 4, a test section of the 2002 NAEP Reading: 36,314 students
- Job applicant groups across the U.S.: 87 employers, 447 employer groups, 32,458 job applicants
Percent Correct & Test Completion

Teacher Administered Tests (“If it’s blank, it’s wrong.”)
Test-Takers                                   Pct. Correct   Pct. Attp. All Questions
Northeast 1999-2001, Urban School District, Reading Tests
  Grade 3                                     68.6%          97.4%
  Grade 4                                     74.7%          96.7%
  Grade 5                                     65.5%          93.6%
  Grade 6                                     67.4%          93.1%
  Grade 7                                     71.0%          96.4%
Midwest 2001, Statewide, Math Test
  Grade 3                                     63.5%          97.4%

Non-teacher Administered Tests (No encouraged guessing)
Test-Takers                                   Pct. Correct   Pct. Attp. All Questions
Independent Proctor Administered, NAEP Reading 2002
  Grade 4                                     67.6%          60.9%
Employer Administered 1996-1999
  Verbal Skills, Job Applicants               82.0%          44.0%
  Quantitative Skills, Job Applicants         75.2%          28.2%
Test Completion: A Teacher/Proctor Effect
Answers left blank are concentrated by classroom:
- 15.6% of all classrooms account for 77.6% of all answers left blank.
- 5.6% of all classrooms account for 48.0% of all answers left blank.
Grade 5 Reading: 45 items, 4 alternative, multiple-choice

‘Low Blanks’ Classes: < 26 Ans. Left Blank. ‘High Blanks’ Classes: 26+ Ans. Left Blank.

             ------- All Classes --------   - ‘Low Blanks’ -    --------- ‘High Blanks’ Classes ---------
Class                  Blanks     Pct. Attp.          Blanks               Blanks     Pct. of   Pct. of
Standing     Classes   Per Class  All Ques.  Classes  Per Class  Classes   Per Class  Classes   All Blanks
4th Q.       617       1.8        97.3%      613      1.6        4         34.3       0.6%      12.0%
3rd Q.       620       4.5        94.9%      599      3.2        21        43.3       3.4%      32.5%
2nd Q.       619       6.1        93.1%      580      3.7        39        42.0       6.3%      43.5%
1st Q.       619       10.4       90.1%      544      4.3        75        54.6       12.1%     63.8%
All          2,475     5.7        94.0%      2,336    3.1        139       48.8       5.6%      48.0%

                 All Classes   ‘Low Blanks’   ‘High Blanks’
Pct. Correct     65.5%         65.9%          59.3%
Pct. Attp. All   94.0%         95.1%          74.0%
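The concentration of blanks in the ‘High Blanks’ classes can be re-derived from the "All" row of the table. The sketch below uses only the rounded figures printed in that row, so it lands near the reported 48.0% rather than exactly on it; the variable names are illustrative.

```python
# Figures copied from the "All" row of the Grade 5 Reading table.
all_classes = 2475
blanks_per_class_all = 5.7
high_blank_classes = 139
blanks_per_class_high = 48.8

# Total blanks are approximated as classes times the rounded per-class average.
total_blanks = all_classes * blanks_per_class_all
high_blanks_total = high_blank_classes * blanks_per_class_high

share_of_classes = high_blank_classes / all_classes
share_of_blanks = high_blanks_total / total_blanks

print(f"High-blanks classes: {share_of_classes:.1%} of all classes")
print(f"Their share of all blanks: {share_of_blanks:.1%}")
```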
Tale of Two Classes: Number Attempted by Number Correct
Two classrooms at the same class average score, with and without encouraged guessing.

The Norm of Classroom Test Administration: Class n = 21, Blanks = 3; RS Avg. = 19.4, SD = 4.3; KR-20 = .53; Pct. Blank = 0.3%
The Exception: Class n = 21, Blanks = 199; RS Avg. = 19.4, SD = 7.9; KR-20 = .89; Pct. Blank = 21.1%

[Figure: Student Scores: Number Correct and Number Attempted, High Blanks Class, KR-20 = .89. X-axis: Grade 5 Reading number correct score for each of the 21 students (9 11 11 13 14 14 15 15 16 16 16 16 19 23 23 24 24 25 30 33 41). Y-axis: Number Correct and Number Attempted (0 to 45). Series: Number Correct, Number Attempted, Forecast Number Attempted.]
Regression estimate (r = .679, n = 72): Number Attempted = 20.2 + (0.67*Number Correct)
[Figure: Student Scores: Number Correct and Number Attempted, Low Blanks Class, KR-20 = .53. X-axis: Grade 5 Reading number correct score for each of the 21 students (12 13 14 15 16 16 17 18 19 19 20 20 21 21 21 21 22 24 25 25 29). Y-axis: Number Correct and Number Attempted (0 to 45). Series: Number Correct, Number Attempted, Forecast Number Attempted.]
Regression estimate: Number Attempted = 20.2 + (0.67*Number Correct)
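The "Forecast Number Attempted" series in both charts comes from the regression estimate quoted with them. A minimal sketch of that forecast follows; the function name and the cap at 45 items are assumptions added here, since the slides give only the regression line.

```python
def forecast_attempted(number_correct, max_items=45):
    """Regression forecast from the slides: 20.2 + 0.67 * number correct.

    Capping at max_items is an added assumption: a 45-item test
    cannot have more than 45 attempts.
    """
    return min(20.2 + 0.67 * number_correct, max_items)

# Lowest, near-average, and highest scores in the high-blanks class.
for score in (9, 19, 41):
    print(score, round(forecast_attempted(score), 1))
```

A student in the low-blanks class who attempts nearly all 45 items at a score of 12 sits far above this forecast, which is the contrast the two charts are drawing.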
NAEP & Job Applicants: Number Attempted by Number Correct

Independent Test Administrators: NAEP 2000 Grade 4 Reading
- Students leave many answers blank
- Pct. correct of attempts = 67.6%
- Pct. attempt all questions = 60.9%

Employer Administered: Test of Basic Verbal Skills
- Job applicants leave many answers blank
- Pct. correct of attempts = 75.1%
- Pct. attempt all questions = 1.8%
[Figure: Average Number Attempted by Number Correct. NAEP Grade 4 Reading 2000, n = 36,314, Average RS 7.3. X-axis: Grade 4 Reading 2002, Item Block #4, Number Correct (0 to 12). Y-axis: Number Correct and Average Number Attempted (0 to 14). Series: Number Correct, Average Number Attempted.]
[Figure: Number Attempted by Number Correct. Job Applicants, Employer Group: n = 56, Average RS = 23.5. X-axis: Job Applicants, Verbal Skills Test, Number Correct (10 to 41). Y-axis: Number Correct and Number Attempted (0 to 50). Series: Number Correct, Number Attempted, Forecast Number Attempted.]
Correlation Analysis: Number Attempted - Number Correct

Teacher Administered            All Students              Students with Blanks => 5
Grade 5 Reading                 r = .153   n = 66,320     r = .527   n = 1,094
Grade 5 Math                    r = .110   n = 69,413     r = .549   n = 238
Grade 6 Reading                 r = .162   n = 62,524     r = .583   n = 658
Grade 7 Reading                 r = .202   n = 58,915     r = .597   n = 1,416

Independent Test Administrator
NAEP Grade 4 Reading            r = .608   n = 36,314

Employer Administered, Job Applicants
Test of Verbal Skills           r = .717   n = 32,458
Test of Quantitative Skills     r = .581   n = 31,629

Hovland and Wonderlic (1939), Adult workers & students
Otis Test of Mental Ability, 4 test forms & 2 time limits (8 variations)
                                r = .608 to .723   n = 125 to 2,274
Location of Answers Left Blank
Recommendations to encourage guessing presume that most answers left blank are imbedded; that is, they represent questions that are addressed and, for some reason, skipped.
Our norms reveal that most blanks are trailing; that is, they represent questions that are not reached during the time limit.

Position of Blanks              Imbedded   Trailing
Grade 5 Reading                 22.3%      77.7%
NAEP Grade 4 Reading            15.8%      84.2%
Job Applicant Verbal Skills     5.2%       94.8%

Teachers must significantly change students’ test work behavior to achieve answers to ‘not reached’ questions. How?
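The imbedded/trailing distinction can be made concrete with a small sketch. It assumes one answer sheet is stored as a list in item order with `None` marking a blank; that representation and the function name are illustrative, not taken from the study.

```python
def classify_blanks(responses):
    """Count imbedded and trailing blanks in one answer sheet.

    `responses` is a list in item order; None marks an answer left blank.
    Blanks after the last answered item are 'trailing' (not reached);
    blanks before it are 'imbedded' (addressed but skipped).
    """
    last_answered = -1
    for index, answer in enumerate(responses):
        if answer is not None:
            last_answered = index
    imbedded = sum(1 for a in responses[:last_answered + 1] if a is None)
    trailing = len(responses) - (last_answered + 1)
    return imbedded, trailing

sheet = ["A", "C", None, "B", "D", None, None, None]
print(classify_blanks(sheet))  # (1, 3)
```

Summing these counts over all test-takers gives the two percentages in the table.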
Test Score Reliability (KR-20) by Classroom
Teacher involvement in their students’ test work behavior to encourage guessing is entrepreneurial, often undermining test score reliability.
50+ Answers Left Blank: 42 classrooms at and below average, likely to have little encouragement to guess
No Answers Left Blank: 330 classrooms at and below average, likely to have extensive encouragement to guess
[Figure: Test Reliability (K-R 20) by Class Average Number Correct Score. Classes with 50 or more Answers Left Blank, n = 42. X-axis: Grade 5 Reading class average number correct score (15.6 to 28.7). Y-axis: K-R 20 test reliability, internal consistency (0.00 to 1.00). Series: Observed for Blanks => 50, Forecast for Blanks => 50.]
Average K-R 20 = .82; RS forecast of K-R 20: r = .013; constant .824, slope -.0003
[Figure: Test Reliability (K-R 20) by Class Average Number Correct Score. Classes with No Answers Left Blank, n = 330. X-axis: Grade 5 Reading class average number correct score (15.6 to 28.7). Y-axis: K-R 20 test reliability, internal consistency (0.00 to 1.00). Series: Observed for Blanks = 0, Forecast for Blanks = 0.]
Average K-R 20 = .75; RS forecast of K-R 20: r = .339; constant .364, slope .0153
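For readers unfamiliar with K-R 20, the standard Kuder-Richardson formula 20 can be sketched as below. The slides do not show how the classroom values were computed; this sketch assumes items scored 0/1 (a blank counted as 0) and uses the population variance of total scores, which is one common convention. The tiny data set is made up for illustration.

```python
def kr20(item_scores):
    """KR-20 for a list of students, each a list of 0/1 item scores.

    KR-20 = k/(k-1) * (1 - sum(p_i * q_i) / variance of total scores),
    where p_i is the share of students answering item i correctly.
    """
    n_students = len(item_scores)
    n_items = len(item_scores[0])
    totals = [sum(student) for student in item_scores]
    mean_total = sum(totals) / n_students
    # Population variance of total scores (an assumed convention).
    var_total = sum((t - mean_total) ** 2 for t in totals) / n_students
    pq_sum = 0.0
    for i in range(n_items):
        p = sum(student[i] for student in item_scores) / n_students
        pq_sum += p * (1 - p)
    return (n_items / (n_items - 1)) * (1 - pq_sum / var_total)

# Hypothetical class: 4 students, 3 items.
scores = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
print(kr20(scores))  # 0.75
```

Random guessing adds item-level variance that is unrelated to the total score, which is one way added guessing can pull a classroom's K-R 20 down.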
The volume of teacher encouraged guessing
Parsing Grade 5 number correct scores
Grade 5 Reading: 45 items, 4 answer alternatives
The traditional correction-for-guessing:
S = R - W/(n-1)
where S = True Score, R = Number Right, W = Number Wrong, n = Number of Answer Choices
RS 18 = minimum scale score for ‘Basic’ (just passing)
For the number correct score at the minimum for Basic (R = 18), with all 45 items attempted so that W = 45 - 18 = 27:
S = 18 - 27/(4-1) = 18 - 9
S = 9
Result: Half of the number correct score is due to random guessing.
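The traditional correction can be checked with a few lines of Python. The function name is illustrative, and the example assumes all 45 items were attempted, so that the number wrong is 45 minus the number right.

```python
def corrected_score(right, wrong, n_choices):
    """Traditional correction for guessing: S = R - W / (n - 1)."""
    return right - wrong / (n_choices - 1)

items = 45
right = 18
wrong = items - right  # all 45 items attempted, so 27 wrong
s = corrected_score(right, wrong, n_choices=4)
print(s)          # 9.0
print(s / right)  # 0.5
```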
Success rate: A norms approach
The traditional correction-for-guessing formula assumes that 100% of skills based answers are correct. A regression of median percent correct on number attempted for test-takers who leave 5+ answers blank finds a variable rate of success:

Regression of Median Pct. Correct on Number Attempted
Test-Takers        Number    Data Pts   R squared   Constant   Slope
Grade 5 Reading    1,449     7*         .699        0.321      0.0091
Grade 6 Reading    1,486     7*         .877        0.416      0.0065
Grade 7 Reading    1,269     7*         .703        0.468      0.0040
Job Applicants     15,650    25**       .905        0.465      0.0094

or Percent Correct = 0.465 + 0.0094*As
where As represents the number of questions answered based on the test-taker’s skills.
* Number attempted ranges: Up to 15, 16-20, 21-25, 26-30, 31-35, 36-40, 41-45
** Number attempted: 21 through 45
Add norms to the traditional formula = Empirical approach
Traditional formula:
S = R - W/(n-1)
or R = S + W/(n-1)    (skills + guessing)
Empirical formula:
R = Pct. Correct*As + (At - As)/n
or R = 0.0094*As^2 + 0.465*As + (At - As)/n    (skills terms + guessing term)
Note:
W = (At - As)*((n-1)/n)
At = Total attempts = 45
As = Skill based attempts
Solution: Substitute 45 for At and 18 for R, find As = 17.7
For a score of 18:
18 = (0.0094*17.7^2) + (0.465*17.7) + ((45-17.7)/4) = 2.945 + 8.23 + 6.825
18 = 11.175 + 6.825    (skills + guessing)
Results:
- 39% (17.7/45) of answers are attempted based on skills
- 61% of answers are guessed due to teacher encouragement
- 38% of the observed score is based on encouraged random guessing
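The empirical formula is quadratic in As, so the solution step can be sketched with the quadratic formula. The function name and defaults are illustrative; the coefficients 0.465 and 0.0094 are the ones quoted in the slides, and the sketch makes no claim about how the author actually solved for As.

```python
import math

def skill_based_attempts(right, attempts=45, n_choices=4,
                         constant=0.465, slope=0.0094):
    """Solve R = slope*As^2 + constant*As + (At - As)/n for As.

    Rearranged: slope*As^2 + (constant - 1/n)*As + (At/n - R) = 0.
    Returns the positive root.
    """
    a = slope
    b = constant - 1.0 / n_choices
    c = attempts / n_choices - right
    disc = b * b - 4 * a * c
    return (-b + math.sqrt(disc)) / (2 * a)

a_s = skill_based_attempts(18)
skills_correct = 0.0094 * a_s ** 2 + 0.465 * a_s
guessed_correct = (45 - a_s) / 4
print(round(a_s, 1))  # 17.7
print(round(guessed_correct / 18, 2))  # 0.38
```

The last line reproduces the 38% share of the observed score attributed to guessing.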
Observed and Estimated True Scores
Grade 5 Reading Test: Distribution of Observed and Estimated True Skills
Application of the ‘empirical’ parsing formula to the full distribution of Grade 5 scores*.

Student Distribution      Mean        SD
Observed                  29.1        7.8
Est. True                 26.4        9.4
Change                    +10.2%      -16.5%

Classroom Distribution    Avg. Mean   Avg. SD
Observed                  29.1        5.9
Est. True                 26.0        7.4
Change                    +11.6%      -19.9%

* Random guessing outcomes are forecast by the binomial distribution and moderated by the variation in the volume of guessing with student skill level. The actual percent guessed correct is lower than expected among lower observed scores and higher than expected among higher observed scores.
[Figure: Distribution of Number Correct Scores: Estimated Skills Based and Observed Number Correct Scores. X-axis: Grade 5 Reading Test number correct score (0 to 45). Y-axis: Frequency, percent of all students (0% to 5%). Series: Est. True Score Distribution, Observed Score Distribution.]
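The footnote's binomial forecast of random guessing outcomes can be illustrated as follows. This sketch covers only the plain binomial part, with a chance of 1/4 per guess on a 4-alternative item; the moderation by student skill level that the footnote mentions is not modeled, and the function name is illustrative.

```python
import math

def guessing_outcome(n_guessed, n_choices=4):
    """Mean and SD of the number correct from purely random guesses."""
    p = 1.0 / n_choices
    mean = n_guessed * p
    sd = math.sqrt(n_guessed * p * (1 - p))
    return mean, sd

# About 27 guessed answers, as for a student at the minimum passing score.
print(guessing_outcome(27))  # (6.75, 2.25)
```

A standard deviation of more than two score points from guessing alone indicates how much non-skills variation can enter an individual score near the passing mark.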
Volume of encouraged guessing by Performance Level
Contribution of Encouraged Guessing to Student Scores
Student Averages by Performance Level, Grade 5 Reading

                                                                      Estimates for Random Guessing
Student                          Pct. of     Average     Average      Pct. of      Pct. of
Performance Level                Students    Number      Number       Number       All
                                             Correct     Attempted    Correct      Answers
Level 4 Advanced                 6.2%        42.1        45.0         0.0%         0.0%
Level 3 Proficient               48.4%       34.1        44.9         5.3%         15.4%
Level 2 Basic                    37.1%       23.6        44.6         19.0%        42.8%
Level 1 Below Basic              8.3%        14.4        44.1         44.1%        69.8%
Levels 1-2 Basic & Below         45.4%       21.9        44.5         23.8%        47.1%
All Students                     100.0%      29.1        44.8         10.2%        26.7%
Encouraged guessing creates a test score modulator
Changes in skill and guessing move in opposite directions, offsetting each other in the total score.

Comparison of First Test and Second Test Scores
                                  Test Answers
Test               Observed   Based on   Based on    Guessing
Administration     Total      Skills     Guessing    Contribution
1st Test Admin.
  Correct          18         11.2       6.8         37.8%
  Attempts         45         17.7       27.3        60.7%
  Pct. Correct     40.0%      63.3%      25.0%
2nd Test Admin.
  Correct          20         13.8       6.2         31.0%
  Attempts         45         20.1       24.9        55.3%
  Pct. Correct     44.4%      68.7%      25.0%
Gain               2          2.6        -0.6
Pct. Gain          11.1%      23.2%

52% of true gain masked by guessing
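The slide does not spell out how the 52% figure is computed. One reading that is consistent with the table is that the masked share equals one minus the ratio of observed percent gain to skills-based percent gain. The sketch below applies that assumed definition to the table's values; the dictionary layout is illustrative.

```python
# Values copied from the first-test / second-test comparison table.
first = {"observed": 18, "skills": 11.2}
second = {"observed": 20, "skills": 13.8}

observed_gain = (second["observed"] - first["observed"]) / first["observed"]
true_gain = (second["skills"] - first["skills"]) / first["skills"]

# Assumed definition: share of the skills-based gain not visible in the total.
masked = 1 - observed_gain / true_gain

print(f"{observed_gain:.1%} {true_gain:.1%} {masked:.0%}")  # 11.1% 23.2% 52%
```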
[Figure: True vs. Observed Gains at min. score for passing. Y-axis: Percent Gain or Decline (-30% to 30%). Series: Observed, True.]
Estimated Gain Masked by Guessing
The ‘empirical’ formula may be applied to first and second tests at each score level.

Hypothetical Gains Parsed for Guessing Effects
              Number Correct        Pct.        Pct.        Pct.
Percentile    First     Second      Observed    Est. True   True Gain
Standing      Score     Score       Gain        Gain        Masked
90%           39.0      42.9        10.0%       10.9%       8.3%
80%           36.0      39.6        10.0%       13.3%       24.9%
70%           34.0      37.4        10.0%       13.2%       24.2%
60%           32.0      35.2        10.0%       13.8%       27.3%
50%           30.0      33.0        10.0%       13.4%       25.6%
40%           27.0      29.7        10.0%       15.2%       34.1%
30%           25.0      27.5        10.0%       15.8%       36.7%
20%           22.0      24.2        10.0%       16.5%       39.2%
10%           18.0      19.8        10.0%       20.6%       51.5%
Findings of a Norms Review
The informal practice of teacher encouraged guessing to complete all test answers has the following effects:
1. High volume of non-skills based test answers
The volume of test answers that result from teacher encouragement is very high: 26% of all answers for students at the school district average and 50% or more among students most at risk of failing.
2. Teacher involvement lowers test score reliability
Teacher involvement is unstructured, varying from classroom to classroom and from student to student, creating widely varying and generally lower test score reliability.
3. Guessed correct answers reduce the range of measurement
Added guessing increases among lower performing students, raising their scores more than those of higher performing students and therefore narrowing the range of measurement by ~20%.
Findings of a Norms Review Continued
4. Guessing creates a test score modulator
Changes in student achievement will cause changes in the volume of guessing, in the opposite, offsetting direction, modulating observed scores. This modulating effect masks variations in gain, by as much as 50% or more among low performing students.
Teacher encouraged guessing narrows the window onto student achievement gains, while reducing both the range and reliability of the measurement that can be observed. As a consequence, non-skills related variation may predominate, misdirecting test score interpretation and education policy.