Comparing Predictive Accuracy and Correct Classification



Yaacov Petscher, Barbara Foorman, Leilani Saez, Anne Bishop, & Christopher Schatschneider
The Florida State University

Florida Center for Reading Research

[Figure: ROC curve plotting Sensitivity (y-axis) against 1-Specificity (x-axis), both scaled from 0 to 1]

Abstract


With the passage of No Child Left Behind (NCLB), educators have focused on identifying students who are likely to be at risk for future reading problems. A significant issue facing researchers today is the development and validation of screening instruments to assess such problems. Traditional approaches to diagnostic accuracy establish the clinical or practical utility of a screening instrument primarily by maximizing correct classification. This practice has been accepted largely because it is standard in the medical and psychological research communities. A shortcoming of this paradigm is that it ignores the base rate of the problem in one's sample. It is well known that base-rate information is typically ignored in the assessment of diagnostic validity, and that predictive accuracy varies as a function of the base rate: correct classification indices (sensitivity and specificity) do not depend on the base rate, but predictive accuracy indices (positive and negative predictive power) do.
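To see why predictive power depends on the base rate while sensitivity and specificity do not, one can hold SE and SP fixed and vary the proportion of students who are truly at risk. The Python sketch below does this with Bayes' rule; the function name `predictive_power` and the screen with SE = SP = 0.80 are hypothetical values chosen for illustration, not results from this study.

```python
# Hypothetical helper for illustration; not taken from the poster.
def predictive_power(se: float, sp: float, base_rate: float) -> tuple[float, float]:
    """Return (PPP, NPP) for a screen with fixed sensitivity and specificity.

    base_rate is the proportion of students who are truly at risk.
    """
    # Expected proportion of the sample in each cell of the 2x2 table
    a = se * base_rate              # flagged, truly at risk
    b = (1 - sp) * (1 - base_rate)  # flagged, not at risk (false positive)
    c = (1 - se) * base_rate        # not flagged, truly at risk (false negative)
    d = sp * (1 - base_rate)        # not flagged, not at risk
    return a / (a + b), d / (c + d)


if __name__ == "__main__":
    # Hypothetical screen with SE = SP = 0.80 applied at three base rates
    for base_rate in (0.10, 0.35, 0.60):
        ppp, npp = predictive_power(0.80, 0.80, base_rate)
        print(f"base rate {base_rate:.2f}: PPP = {ppp:.2f}, NPP = {npp:.2f}")
```

With sensitivity and specificity both held at 0.80, PPP rises from about 0.31 to 0.86 and NPP falls from about 0.97 to 0.73 as the base rate moves from 10% to 60%.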

The purpose of the present study was to examine the trade-off between maximizing the percentage of students correctly classified by a new screener as at risk or not at risk for reading comprehension failure on the SAT-10, and maximizing the screener's predictive accuracy for risk on the SAT-10. Typical analyses seek to maximize correct classification based on sensitivity and specificity; however, we were interested in achieving 90% negative predictive power (i.e., 90% of students identified as not at risk on the screen are also not at risk on the criterion).

Our analyses were based on a representative sample of 1,935 kindergarten through second-grade students who were administered a new screening inventory. The screener consisted of four tasks: Letter Naming, Letter Sounds, Phonological Awareness, and Word Reading. Students were also tested on the SESAT (kindergarten) or the SAT-10 (first and second grade). Logistic regression and ROC analyses were used to identify cut-points that maximized either correct classification or predictive accuracy, and the two sets of cut-points were compared.

Introduction

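Screen score   SAT-10 <40th %ile (Y-axis)   SAT-10 >=40th %ile (X-axis)
505            35 (.35)                      5 (.05)
520            30 (.65)                     10 (.15)
550            20 (.85)                     20 (.35)
600            10 (.95)                     30 (.65)
700             5 (1.00)                    35 (1.00)
Totals         N = 100                      N = 100

Values in parentheses are based on cumulative frequency %.

                 Outcome
                 Fail    Pass
Screen   Fail     A       B
         Pass     C       D

SE = sensitivity, SP = specificity, PPP = positive predictive power, NPP = negative predictive power, OCC = overall correct classification.

$$\mathrm{SE} = \frac{A}{A + C} \qquad \mathrm{SP} = \frac{D}{B + D}$$

$$\mathrm{PPP} = \frac{A}{A + B} \qquad \mathrm{NPP} = \frac{D}{C + D}$$

$$\mathrm{OCC} = \frac{A + D}{A + B + C + D}$$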
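The indices above can be computed at every candidate cut-point of the illustrative score table. This Python sketch assumes, as the cumulative percentages imply, that students scoring at or below a cut fail the screen (cells A and B). The helper names are hypothetical, and treating "maximize SE & SP" as the largest Youden index (SE + SP - 1) and "maximize NPP" as the lowest cut reaching an NPP target are illustrative assumptions, not necessarily the authors' procedure.

```python
from math import nan

# Illustrative frequencies from the table above: students at each screen
# score who fell below (fail) or at/above (pass) the 40th percentile.
SCORES = [505, 520, 550, 600, 700]
FAIL = [35, 30, 20, 10, 5]
PASS = [5, 10, 20, 30, 35]


# Hypothetical helper names throughout; not taken from the poster.
def ratio(num: int, den: int) -> float:
    """Proportion, or NaN when the denominator is zero."""
    return num / den if den else nan


def indices(a: int, b: int, c: int, d: int) -> dict[str, float]:
    """SE, SP, PPP, NPP and OCC from the 2x2 cells A-D defined above."""
    return {
        "SE": ratio(a, a + c),
        "SP": ratio(d, b + d),
        "PPP": ratio(a, a + b),
        "NPP": ratio(d, c + d),
        "OCC": ratio(a + d, a + b + c + d),
    }


def indices_at_each_cut(scores, fail, pass_):
    """Assumes students scoring at or below a cut fail the screen."""
    n_fail, n_pass = sum(fail), sum(pass_)
    cum_fail = cum_pass = 0
    rows = []
    for score, f, p in zip(scores, fail, pass_):
        cum_fail += f
        cum_pass += p
        a, b = cum_fail, cum_pass        # fail the screen
        c, d = n_fail - a, n_pass - b    # pass the screen
        rows.append((score, indices(a, b, c, d)))
    return rows


def best_youden(rows):
    """Cut with the largest SE + SP - 1; the first such cut wins ties."""
    return max(rows, key=lambda row: row[1]["SE"] + row[1]["SP"] - 1)[0]


def lowest_cut_meeting_npp(rows, target):
    """Lowest cut whose NPP reaches target, or None if no cut does."""
    for score, idx in rows:
        if idx["NPP"] >= target:  # NaN compares False, so undefined NPP is skipped
            return score
    return None


if __name__ == "__main__":
    rows = indices_at_each_cut(SCORES, FAIL, PASS)
    for score, idx in rows:
        # (1 - SP, SE) is the ROC point for this cut
        print(score, round(1 - idx["SP"], 2), round(idx["SE"], 2))
    print("max SE & SP:", best_youden(rows))
    print("NPP >= 0.90:", lowest_cut_meeting_npp(rows, 0.90))
    print("NPP >= 0.85:", lowest_cut_meeting_npp(rows, 0.85))
```

The printed (1 - SP, SE) pairs reproduce the cumulative proportions in the table. In this small example the Youden index ties at 520 and 550 (J = 0.50), and `max` keeps the first, 520. No cut reaches a 0.90 NPP; the highest defined NPP is 0.875 at 600, which is the cut selected when the target is relaxed to 0.85. At 700 every student fails the screen, so NPP is undefined (NaN).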

Method

[Figure: sample demographics; categories White, Black, Latino, FRL, ELL; districts Alachua, Leon, Manatee; grades K, 1st, 2nd]

Results

In each table, rows give the screen decision and columns give the SAT-10 outcome (AR = at risk, NAR = not at risk). FP = false positive rate, FN = false negative rate, PPP = positive predictive power, NPP = negative predictive power.

Maximize NPP
                 Fall                  Winter                Spring
Screen           AR   NAR  Total       AR   NAR  Total       AR   NAR  Total
At-Risk          96   115    211      117   153    270      112   156    268
Not At-Risk       6    75     81        3    50     53        2    41     43
Total           102   190    292      120   203    323      114   197    311
FP / FN              0.61 / 0.06           0.75 / 0.03           0.79 / 0.02
PPP / NPP            0.45 / 0.93           0.43 / 0.94           0.42 / 0.95

Maximize SE & SP
                 Fall                  Winter                Spring
Screen           AR   NAR  Total       AR   NAR  Total       AR   NAR  Total
At-Risk          73    54    127       77    47    124       75    62    137
Not At-Risk      29   136    165       43   156    199       39   135    174
Total           102   190    292      120   203    323      114   197    311
FP / FN              0.28 / 0.28           0.23 / 0.36           0.31 / 0.34
PPP / NPP            0.57 / 0.82           0.62 / 0.78           0.55 / 0.78

Maximize NPP
                 Fall                  Winter                Spring
Screen           AR   NAR  Total       AR   NAR  Total       AR   NAR  Total
At-Risk         174   234    408      184   172    356      176   137    313
Not At-Risk      19   159    178       26   240    266       24   265    289
Total           193   393    586      210   412    622      200   402    602
FP / FN              0.60 / 0.10           0.48 / 0.10           0.44 / 0.08
PPP / NPP            0.43 / 0.89           0.52 / 0.90           0.56 / 0.92

Maximize SE & SP
                 Fall                  Winter                Spring
Screen           AR   NAR  Total       AR   NAR  Total       AR   NAR  Total
At-Risk         105    66    171      107    72    179      133    73    206
Not At-Risk      88   327    415      103   340    443       67   329    396
Total           193   393    586      210   412    622      200   402    602
FP / FN              0.17 / 0.46           0.17 / 0.49           0.18 / 0.34
PPP / NPP            0.61 / 0.79           0.60 / 0.77           0.65 / 0.83

Maximize NPP
                 Fall                  Winter                Spring
Screen           AR   NAR  Total       AR   NAR  Total       AR   NAR  Total
At-Risk         139   225    364      151   204    355      142   219    361
Not At-Risk      15   164    179       19   225    244       16   188    204
Total           154   389    543      170   429    599      158   407    565
FP / FN              0.58 / 0.10           0.48 / 0.11           0.54 / 0.10
PPP / NPP            0.38 / 0.92           0.43 / 0.92           0.39 / 0.92

Maximize SE & SP
                 Fall                  Winter                Spring
Screen           AR   NAR  Total       AR   NAR  Total       AR   NAR  Total
At-Risk          94    96    190      109   121    230      102   117    219
Not At-Risk      60   293    353       61   308    369       56   299    355
Total           154   389    543      170   429    599      158   416    574
FP / FN              0.25 / 0.39           0.28 / 0.36           0.28 / 0.35
PPP / NPP            0.49 / 0.83           0.47 / 0.83           0.47 / 0.84
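The rates reported beside each table can be recomputed from its cell counts. The sketch below assumes FP is the proportion of not-at-risk students flagged by the screen and FN the proportion of at-risk students the screen misses; with these assumed definitions it reproduces the rounded values reported for the first pair of Fall tables (Maximize NPP and Maximize SE & SP). The function name is hypothetical.

```python
# Hypothetical helper for illustration; FP/FN definitions are assumed.
def error_and_predictive_rates(a: int, b: int, c: int, d: int) -> dict[str, float]:
    """FP, FN, PPP and NPP from screen-by-outcome counts.

    a: screen at risk, outcome at risk      b: screen at risk, outcome not at risk
    c: screen not at risk, outcome at risk  d: screen not at risk, outcome not at risk
    """
    return {
        "FP": b / (b + d),   # not-at-risk students flagged by the screen
        "FN": c / (a + c),   # at-risk students the screen misses
        "PPP": a / (a + b),  # flagged students who are at risk on the outcome
        "NPP": d / (c + d),  # unflagged students who are not at risk on the outcome
    }


if __name__ == "__main__":
    # Cell counts copied from the first pair of Fall tables
    tables = {
        "Maximize NPP": (96, 115, 6, 75),
        "Maximize SE & SP": (73, 54, 29, 136),
    }
    for label, counts in tables.items():
        rates = error_and_predictive_rates(*counts)
        print(label, {name: round(value, 2) for name, value in rates.items()})
```

Rounded to two decimals, these give FP = 0.61, FN = 0.06, PPP = 0.45, NPP = 0.93 and FP = 0.28, FN = 0.28, PPP = 0.57, NPP = 0.82, matching the reported values.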

Conclusions

When evaluating the predictive accuracy of a screener, it is important to carefully consider the trade-offs between correct classification and accounting for the prevalence of risk. While correct classification has been argued to be the most important element of diagnostic accuracy, that argument has largely been made in studies in the medical and clinical psychology fields. Accounting for base rates, while possibly leading to over-identification, reduces the likelihood of “missing” students who are at risk and thus decreases the need for intensive interventions.
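The over-identification trade-off can be made concrete by counting how many students each cut-point flags and how many at-risk students it misses. This sketch uses the first pair of Fall tables from the Results; the `ScreenTable` type and the `burden_and_misses` helper are hypothetical names for illustration.

```python
from typing import NamedTuple


class ScreenTable(NamedTuple):
    """Screen-by-outcome counts (hypothetical container)."""
    a: int  # flagged by screen, at risk on outcome
    b: int  # flagged by screen, not at risk on outcome
    c: int  # not flagged, at risk on outcome (missed)
    d: int  # not flagged, not at risk on outcome


# Hypothetical helper for illustration; not taken from the poster.
def burden_and_misses(t: ScreenTable) -> dict[str, float]:
    """Count students flagged, false alarms, and at-risk students missed."""
    n = t.a + t.b + t.c + t.d
    flagged = t.a + t.b
    return {
        "flagged": flagged,
        "flagged_share": flagged / n,
        "false_alarms": t.b,
        "missed": t.c,
    }


if __name__ == "__main__":
    # Counts from the first pair of Fall tables in the Results
    max_npp = burden_and_misses(ScreenTable(96, 115, 6, 75))
    max_se_sp = burden_and_misses(ScreenTable(73, 54, 29, 136))
    print("extra students flagged:", max_npp["flagged"] - max_se_sp["flagged"])
    print("fewer at-risk students missed:", max_se_sp["missed"] - max_npp["missed"])
```

For those tables, the NPP-oriented cut flags 211 of 292 students and misses 6 at-risk students, while the SE & SP cut flags 127 and misses 29: 84 more students are flagged in exchange for 23 fewer missed.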