Comparing Predictive Accuracy and Correct Classification
Yaacov Petscher, Barbara Foorman, Leilani Saez, Anne Bishop, & Christopher Schatschneider
The Florida State University
Florida Center for Reading Research
[Figure: ROC curve. Y-axis: Sensitivity (0 to 1); X-axis: 1-Specificity (0 to 1)]
Abstract
Since the passage of NCLB, educators have focused on identifying students who are likely to be at risk for future reading problems. A significant issue facing researchers today is the development and validation of screening instruments to assess such problems. Traditional approaches to diagnostic accuracy maximize correct classification as the predominant way of establishing the clinical or practical utility of a screening instrument. This practice has been accepted largely on the basis of screening practices typical of the medical and psychological research communities. A shortcoming of this paradigm is that it ignores the base rate of the problem in one’s sample. It is well known that base rate information is typically ignored in the assessment of diagnostic validity, and that predictive accuracy varies as a function of the base rate. Correct classification indices (sensitivity and specificity) are unaffected by base rates, whereas predictive accuracy indices (positive and negative predictive power) depend on them.
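To make the base-rate dependence concrete, the following minimal Python sketch holds sensitivity and specificity fixed and recomputes positive and negative predictive power at several base rates. The sensitivity and specificity of 0.80 and the three base rates are illustrative assumptions, not figures from the study, and `predictive_values` is a hypothetical helper name.

```python
def predictive_values(sensitivity, specificity, base_rate):
    """Return (PPP, NPP) for a screen with fixed sensitivity and specificity.

    base_rate is the proportion of students who are truly at risk.
    All three inputs are proportions strictly between 0 and 1.
    """
    true_pos = sensitivity * base_rate
    false_neg = (1 - sensitivity) * base_rate
    true_neg = specificity * (1 - base_rate)
    false_pos = (1 - specificity) * (1 - base_rate)
    ppp = true_pos / (true_pos + false_pos)
    npp = true_neg / (true_neg + false_neg)
    return ppp, npp


# Illustrative values only (assumed, not taken from the study).
for base_rate in (0.10, 0.30, 0.50):
    ppp, npp = predictive_values(0.80, 0.80, base_rate)
    print(f"base rate {base_rate:.2f}: PPP = {ppp:.2f}, NPP = {npp:.2f}")
```

With these assumed values, lowering the base rate lowers PPP and raises NPP even though sensitivity and specificity never change.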
The purpose of the present study was to examine the trade-off between two goals for a new screener: maximizing the percentage of students correctly classified as at-risk or not at-risk for reading comprehension failure on the SAT-10, and maximizing the screener's predictive accuracy for risk on the SAT-10. Typical assessments seek to maximize correct classification based on Sensitivity and Specificity; however, we were also interested in achieving 90% negative predictive power (i.e., 90% of students identified as not at-risk on the screen end up not at-risk on the criterion).
Our analyses were based on a representative sample of 1,935 kindergarten through second-grade students who were administered a new screening inventory. The screener consisted of four tasks: Letter Naming, Letter Sounds, Phonological Awareness, and Word Reading. Students were also tested on the SESAT (kindergarten) or the SAT-10 (first and second grade). Logistic regression and ROC analyses were used to determine cut-points that maximized either correct classification or predictive accuracy, and the two sets of cut-points were then compared.
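The two cut-point strategies can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it works on raw screen scores rather than logistic-regression output, it treats "maximizing Sensitivity and Specificity" as maximizing their sum (Youden's index), which the poster does not specify, and the scores and risk labels are made up. All function names are hypothetical.

```python
def confusion_at(cut, scores, at_risk):
    """Count 2x2 cells when students scoring below `cut` are flagged at-risk.

    Returns (a, b, c, d): a = flagged and at risk, b = flagged but not at risk,
    c = not flagged but at risk, d = not flagged and not at risk.
    """
    a = b = c = d = 0
    for score, risk in zip(scores, at_risk):
        flagged = score < cut
        if flagged and risk:
            a += 1
        elif flagged:
            b += 1
        elif risk:
            c += 1
        else:
            d += 1
    return a, b, c, d


def candidate_cuts(scores):
    """Every observed score, plus one value above the maximum (flags everyone)."""
    unique = sorted(set(scores))
    return unique + [unique[-1] + 1]


def best_youden_cut(scores, at_risk):
    """Cut-point maximizing SE + SP - 1 (assumes both outcome groups are present)."""
    def youden(cut):
        a, b, c, d = confusion_at(cut, scores, at_risk)
        return a / (a + c) + d / (b + d) - 1
    return max(candidate_cuts(scores), key=youden)


def lowest_cut_meeting_npp(scores, at_risk, target=0.90):
    """Lowest cut-point (fewest students flagged) whose NPP reaches `target`."""
    for cut in candidate_cuts(scores):
        a, b, c, d = confusion_at(cut, scores, at_risk)
        if c + d > 0 and d / (c + d) >= target:
            return cut
    return None


# Made-up data for illustration only: screen scores and true risk status.
scores = [480, 490, 500, 510, 520, 530, 540, 550, 560, 570, 580, 590]
at_risk = [True, True, True, False, True, False,
           False, False, True, False, False, False]

print("Youden cut:", best_youden_cut(scores, at_risk))
print("NPP >= .90 cut:", lowest_cut_meeting_npp(scores, at_risk, 0.90))
```

On this made-up data the two rules choose different cut-points (530 versus 570): reaching the NPP target requires flagging more students, which is the kind of trade-off the study examines.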
Introduction
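Screen Score   SAT-10 (<40th %ile)   SAT-10 (>=40th %ile)
               [Y-axis]              [X-axis]
505            35 (.35)              5 (.05)
520            30 (.65)              10 (.15)
550            20 (.85)              20 (.35)
600            10 (.95)              30 (.65)
700            5 (1.00)              35 (1.00)
TOTALS         N=100                 N=100

Values in parentheses are based on cumulative frequency %.

                   Outcome
                   Fail    Pass
Screen    Fail     A       B
          Pass     C       D

$$\mathrm{OCC} = \frac{A + D}{A + B + C + D}$$

$$\mathrm{NPP} = \frac{D}{C + D}$$

$$\mathrm{PPP} = \frac{A}{A + B}$$

$$\mathrm{SP} = \frac{D}{B + D}$$

$$\mathrm{SE} = \frac{A}{A + C}$$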
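The cumulative proportions in the screen-score table above are the coordinates of the ROC curve. The short Python sketch below recomputes them from the raw counts in that table. It assumes a student is flagged at-risk when scoring at or below a given screen score; the names `SCREEN_TABLE` and `roc_points` are hypothetical.

```python
from itertools import accumulate

# (screen score, count below 40th %ile on SAT-10, count at or above 40th %ile),
# copied from the example table.
SCREEN_TABLE = [
    (505, 35, 5),
    (520, 30, 10),
    (550, 20, 20),
    (600, 10, 30),
    (700, 5, 35),
]


def roc_points(table):
    """Return (score, 1 - specificity, sensitivity) for each screen score.

    Assumes students scoring at or below `score` are flagged at-risk, so
    sensitivity is the cumulative proportion of the below-40th-percentile
    group and 1 - specificity is the cumulative proportion of the other group.
    """
    scores = [row[0] for row in table]
    below = [row[1] for row in table]
    above = [row[2] for row in table]
    n_below, n_above = sum(below), sum(above)
    sensitivity = [count / n_below for count in accumulate(below)]
    one_minus_spec = [count / n_above for count in accumulate(above)]
    return list(zip(scores, one_minus_spec, sensitivity))


for score, x, y in roc_points(SCREEN_TABLE):
    print(f"score {score}: 1-Specificity = {x:.2f}, Sensitivity = {y:.2f}")
```

Each printed pair matches the parenthesized values in the table: the X-axis column gives 1-Specificity and the Y-axis column gives Sensitivity.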
Method
[Figures: sample composition by demographic group (White, Black, Latino, FRL, ELL), by county (Alachua, Leon, Manatee), and by grade (K, 1st, 2nd)]
Results
In each table, rows give the Screen classification and columns give the Outcome classification.

Table set 1

Maximize NPP
Season   Screen        Outcome At-Risk   Outcome Not At-Risk   Total   FP     FN     PPP    NPP
Fall     At-Risk       96                115                   211     0.61   0.06   0.45   0.93
         Not At-Risk   6                 75                    81
         Total         102               190                   292
Winter   At-Risk       117               153                   270     0.75   0.03   0.43   0.94
         Not At-Risk   3                 50                    53
         Total         120               203                   323
Spring   At-Risk       112               156                   268     0.79   0.02   0.42   0.95
         Not At-Risk   2                 41                    43
         Total         114               197                   311

Maximize SE & SP
Season   Screen        Outcome At-Risk   Outcome Not At-Risk   Total   FP     FN     PPP    NPP
Fall     At-Risk       73                54                    127     0.28   0.28   0.57   0.82
         Not At-Risk   29                136                   165
         Total         102               190                   292
Winter   At-Risk       77                47                    124     0.23   0.36   0.62   0.78
         Not At-Risk   43                156                   199
         Total         120               203                   323
Spring   At-Risk       75                62                    137     0.31   0.34   0.55   0.78
         Not At-Risk   39                135                   174
         Total         114               197                   311

Table set 2

Maximize NPP
Season   Screen        Outcome At-Risk   Outcome Not At-Risk   Total   FP     FN     PPP    NPP
Fall     At-Risk       174               234                   408     0.60   0.10   0.43   0.89
         Not At-Risk   19                159                   178
         Total         193               393                   586
Winter   At-Risk       184               172                   356     0.48   0.10   0.52   0.90
         Not At-Risk   26                240                   266
         Total         210               412                   622
Spring   At-Risk       176               137                   313     0.44   0.08   0.56   0.92
         Not At-Risk   24                265                   289
         Total         200               402                   602

Maximize SE & SP
Season   Screen        Outcome At-Risk   Outcome Not At-Risk   Total   FP     FN     PPP    NPP
Fall     At-Risk       105               66                    171     0.17   0.46   0.61   0.79
         Not At-Risk   88                327                   415
         Total         193               393                   586
Winter   At-Risk       107               72                    179     0.17   0.49   0.60   0.77
         Not At-Risk   103               340                   443
         Total         210               412                   622
Spring   At-Risk       133               73                    206     0.18   0.34   0.65   0.83
         Not At-Risk   67                329                   396
         Total         200               402                   602

Table set 3

Maximize NPP
Season   Screen        Outcome At-Risk   Outcome Not At-Risk   Total   FP     FN     PPP    NPP
Fall     At-Risk       139               225                   364     0.58   0.10   0.38   0.92
         Not At-Risk   15                164                   179
         Total         154               389                   543
Winter   At-Risk       151               204                   355     0.48   0.11   0.43   0.92
         Not At-Risk   19                225                   244
         Total         170               429                   599
Spring   At-Risk       142               219                   361     0.54   0.10   0.39   0.92
         Not At-Risk   16                188                   204
         Total         158               407                   565

Maximize SE & SP
Season   Screen        Outcome At-Risk   Outcome Not At-Risk   Total   FP     FN     PPP    NPP
Fall     At-Risk       94                96                    190     0.25   0.39   0.49   0.83
         Not At-Risk   60                293                   353
         Total         154               389                   543
Winter   At-Risk       109               121                   230     0.28   0.36   0.47   0.83
         Not At-Risk   61                308                   369
         Total         170               429                   599
Spring   At-Risk       102               117                   219     0.28   0.35   0.47   0.84
         Not At-Risk   56                299                   355
         Total         158               416                   574
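As a worked check, the indices reported beside each table can be recomputed from the four cell counts. The Python sketch below does this for the first Fall table under each criterion (Maximize NPP: 96, 115, 6, 75; Maximize SE & SP: 73, 54, 29, 136). It assumes FP and FN are the false-positive and false-negative rates (1 - SP and 1 - SE), which is consistent with the reported values but is not stated on the poster. OCC is computed from the poster's formula and is not reported in the tables. `screening_indices` is a hypothetical helper name.

```python
def screening_indices(a, b, c, d):
    """Compute screening indices from 2x2 cell counts.

    a = screen at-risk, outcome at-risk      b = screen at-risk, outcome not at-risk
    c = screen not at-risk, outcome at-risk  d = screen not at-risk, outcome not at-risk
    """
    return {
        "SE": a / (a + c),
        "SP": d / (b + d),
        "FP": b / (b + d),   # assumed definition: 1 - SP
        "FN": c / (a + c),   # assumed definition: 1 - SE
        "PPP": a / (a + b),
        "NPP": d / (c + d),
        "OCC": (a + d) / (a + b + c + d),
    }


# Cell counts from the first Fall table under each criterion.
max_npp = screening_indices(96, 115, 6, 75)
max_se_sp = screening_indices(73, 54, 29, 136)

for label, result in (("Maximize NPP", max_npp), ("Maximize SE & SP", max_se_sp)):
    summary = ", ".join(f"{name} = {value:.2f}" for name, value in result.items())
    print(f"{label}: {summary}")
```

For these counts the NPP-focused cut-point gives the higher NPP (0.93 versus 0.82) and the lower false-negative rate, while the SE & SP cut-point gives the higher overall correct classification (about 0.72 versus 0.59).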
Conclusions
When looking at the predictive accuracy of a screener, it is important to carefully consider the trade-offs between correct classification and accounting for the prevalence of risk. While correct classification has been argued to be the most important element in diagnostic accuracy, this has largely been relevant to studies in the medical and clinical psychology fields. Accounting for base rates, while leading to possible over-identification, reduces the likelihood of “missing” students who are at risk and thus decreases the need for intensive interventions.