David G. Brown and Frank Samuelson Center for Devices and Radiological Health, FDA 6 July 2014
Comprehensive Introduction to the Evaluation of Neural Networks and other Computational Intelligence Decision Functions: Receiver Operating Characteristic, Jackknife, Bootstrap and other Statistical Methodologies
David G. Brown and Frank Samuelson
Center for Devices and Radiological Health, FDA
6 July 2014
Course Outline
I. Performance measures for Computational Intelligence (CI) observers
   1. Accuracy
   2. Prevalence dependent measures
   3. Prevalence independent measures
   4. Maximization of performance: Utility analysis/Cost functions
II. Receiver Operating Characteristic (ROC) analysis
   1. Sensitivity and specificity
   2. Construction of the ROC curve
   3. Area under the ROC curve (AUC)
III. Error analysis for CI observers
   1. Sources of error
   2. Parametric methods
   3. Nonparametric methods
   4. Standard deviations and confidence intervals
IV. Bootstrap methods
   1. Theoretical foundation
   2. Practical use
V. References
What’s the problem?
• Emphasis on algorithm innovation to the exclusion of performance assessment
• Use of subjective measures of performance: the “beauty contest”
• Use of “accuracy” as a measure of success
• Lack of error bars: my CIO is .01 better than yours (+/- ?)
• Flawed methodology: training and testing on the same data
• Lack of appreciation for the many different sources of error that can be taken into account
Original image
Lena. Courtesy of the Signal and Image Processing Institute at the University of Southern California.
CI improved image
Baboon. Courtesy of the Signal and Image Processing Institute at the University of Southern California.
Panel of experts (image: funnymonkeysite.com)
I. Performance measures for computational intelligence (CI) observers
• Task based: (binary) discrimination task
  – Two populations involved: “normal” and “abnormal”
• Accuracy: intuitive but incomplete
  – Different consequences for success or failure for each population
• Some measures depend on the prevalence (Pr) and some do not, where

$$\mathrm{Pr} = \frac{\text{Number of abnormals in population}}{\text{Total number of all subjects in population}}$$

  – Prevalence dependent: accuracy, positive predictive value, negative predictive value
  – Prevalence independent: sensitivity, specificity, ROC, AUC
• True optimization of performance requires knowledge of cost functions or utilities for successes and failures in both populations
How to make a CIO with >99% accuracy
• Medical problem: screening mammography (“screening” means testing in an asymptomatic population)
• Prevalence of breast cancer in the screening population: Pr = 0.5%
• My CIO always says “normal”
• Accuracy (Acc) is 99.5% (accuracy of accepted present-day systems is ~75%)
• Accuracy in a diagnostic setting (Pr ~ 20%) is 80%: Acc = 1 - Pr (for my CIO)
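The >99% figure follows from simple counting. A minimal sketch, assuming hypothetical populations of 1000 subjects with labels coded 1 = abnormal and 0 = normal; the helper names `accuracy` and `always_normal` are illustrative only:

```python
def accuracy(labels, decisions):
    """Fraction of subjects whose decision (1 = abnormal, 0 = normal) matches the truth."""
    correct = sum(1 for truth, call in zip(labels, decisions) if truth == call)
    return correct / len(labels)


def always_normal(n_subjects):
    """The 'CIO' from the slide: it calls every subject normal."""
    return [0] * n_subjects


# Hypothetical populations of 1000 subjects each.
screening = [1] * 5 + [0] * 995      # Pr = 0.5%
diagnostic = [1] * 200 + [0] * 800   # Pr = 20%

for name, labels in [("screening", screening), ("diagnostic", diagnostic)]:
    acc = accuracy(labels, always_normal(len(labels)))
    pr = sum(labels) / len(labels)
    print(f"{name}: Pr = {pr:.3f}, Acc = {acc:.3f}, 1 - Pr = {1 - pr:.3f}")
```

Every abnormal case is missed (TPF = 0), yet the accuracy looks excellent whenever abnormals are rare.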
CIO operates on two different populations
Figure: distributions of the CIO output along the t-axis for normal cases, p(t|0), and abnormal cases, p(t|1), with threshold t = T
Must consider effects on normal and abnormal populations separately
• CIO output t
• p(t|0): probability distribution of t for the population of normals
• p(t|1): probability distribution of t for the population of abnormals
• Threshold T: everything to the right of T is called abnormal, and everything to the left of T is called normal
• Area of p(t|0) to the left of T is the true negative fraction (TNF = specificity), and to the right the false positive fraction (FPF = type 1 error): TNF + FPF = 1
• Area of p(t|1) to the left of T is the false negative fraction (FNF = type 2 error), and to the right the true positive fraction (TPF = sensitivity): FNF + TPF = 1
• TNF, FPF, FNF, TPF are all prevalence independent, since each is a fraction of one of our two probability distributions
• Accuracy = Pr x TPF + (1-Pr) x TNF
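The four fractions can be estimated by counting CIO outputs on each side of T. A minimal sketch, assuming lists of outputs for subjects of known class and treating an output exactly equal to T as normal (the slide leaves ties unspecified, so this tie rule, the example scores, and all names are assumptions):

```python
from dataclasses import dataclass


@dataclass
class Fractions:
    tnf: float  # specificity
    fpf: float  # type 1 error
    fnf: float  # type 2 error
    tpf: float  # sensitivity


def fractions_at_threshold(normal_scores, abnormal_scores, T):
    """Call a subject abnormal when its output t > T, normal otherwise."""
    fpf = sum(t > T for t in normal_scores) / len(normal_scores)
    tpf = sum(t > T for t in abnormal_scores) / len(abnormal_scores)
    # Each pair sums to 1 within its own population.
    return Fractions(tnf=1.0 - fpf, fpf=fpf, fnf=1.0 - tpf, tpf=tpf)


def accuracy(pr, f):
    """Accuracy = Pr x TPF + (1 - Pr) x TNF."""
    return pr * f.tpf + (1.0 - pr) * f.tnf


# Hypothetical outputs chosen to give TPF = .95 and FPF = .5 at T = 0.5.
normals = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
abnormals = [0.3] + [0.55 + 0.02 * k for k in range(19)]
f = fractions_at_threshold(normals, abnormals, T=0.5)
print(f"TNF={f.tnf:.2f} FPF={f.fpf:.2f} FNF={f.fnf:.2f} TPF={f.tpf:.2f}")
print(f"Acc at Pr=0.05: {accuracy(0.05, f):.4f}")
```

Because each fraction is computed within one population, changing the mix of normals and abnormals in the test set leaves them unchanged; only the accuracy depends on Pr.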
Figure: normal and abnormal case distributions along the t-axis with threshold T, marking TNF (.5) and FPF (.5) for the normal cases and FNF (.05) and TPF (.95) for the abnormal cases
Prevalence dependent measures
• Accuracy (Acc): Acc = Pr x TPF + (1-Pr) x TNF
• Positive predictive value (PPV), the fraction of positives that are true positives: PPV = TPF x Pr / (TPF x Pr + FPF x (1-Pr))
• Negative predictive value (NPV), the fraction of negatives that are true negatives: NPV = TNF x (1-Pr) / (TNF x (1-Pr) + FNF x Pr)
• Using Pr = .05 and the previous TPF, TNF, FNF, FPF values (TPF = .95, TNF = .5, FNF = .05, FPF = .5):
  – Acc = .05 x .95 + .95 x .5 = .52
  – PPV = .95 x .05 / (.95 x .05 + .5 x .95) = .09
  – NPV = .5 x .95 / (.5 x .95 + .05 x .05) = .995
• Using the mammography screening prevalence, Pr = .005, and the same TPF, TNF, FNF, FPF values:
  – Acc = .005 x .95 + .995 x .5 = .50
  – PPV = .95 x .005 / (.95 x .005 + .5 x .995) = .01
  – NPV = .5 x .995 / (.5 x .995 + .05 x .005) = .9995
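Both worked examples can be reproduced from the three formulas. A minimal sketch (the function name is hypothetical); FPF and FNF are derived from TNF and TPF using TNF + FPF = 1 and FNF + TPF = 1:

```python
def prevalence_dependent(pr, tpf, tnf):
    """Return (Acc, PPV, NPV) for prevalence pr, sensitivity tpf, specificity tnf."""
    fpf, fnf = 1.0 - tnf, 1.0 - tpf
    acc = pr * tpf + (1.0 - pr) * tnf
    ppv = tpf * pr / (tpf * pr + fpf * (1.0 - pr))
    npv = tnf * (1.0 - pr) / (tnf * (1.0 - pr) + fnf * pr)
    return acc, ppv, npv


for pr in (0.05, 0.005):
    acc, ppv, npv = prevalence_dependent(pr, tpf=0.95, tnf=0.5)
    print(f"Pr = {pr}: Acc = {acc:.3f}, PPV = {ppv:.4f}, NPV = {npv:.4f}")
```

Lowering Pr from .05 to .005 barely moves the accuracy but cuts the PPV roughly tenfold, with TPF and TNF unchanged.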
Acc, PPV, NPV as functions of prevalence (screening mammography)
• TPF = .95
• FNF = .05
• TNF = .5
• FPF = .5
Acc = NPV as a function of prevalence (forced “normal” response CIO)
Prevalence independent measures
• Sensitivity = TPF
• Specificity = TNF = 1 - FPF
• Receiver Operating Characteristic (ROC) = TPF as a function of FPF (sensitivity as a function of 1 - specificity)
• Area under the ROC curve (AUC) = sensitivity averaged over all values of specificity
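An empirical ROC curve can be traced by sweeping the threshold over every observed output; its area is then sensitivity averaged over 1 - specificity. A minimal sketch, assuming higher outputs indicate abnormality, that a subject is called abnormal when its output is at or above the threshold, and trapezoidal integration (names and scores are hypothetical):

```python
def empirical_roc(normal_scores, abnormal_scores):
    """(FPF, TPF) points from the strictest threshold down to the most lenient."""
    n0, n1 = len(normal_scores), len(abnormal_scores)
    points = [(0.0, 0.0)]  # threshold above every output: nobody called abnormal
    for T in sorted(set(normal_scores) | set(abnormal_scores), reverse=True):
        fpf = sum(s >= T for s in normal_scores) / n0
        tpf = sum(s >= T for s in abnormal_scores) / n1
        points.append((fpf, tpf))
    return points  # the last point is (1.0, 1.0): everybody called abnormal


def area_under(points):
    """Trapezoidal area under a piecewise-linear curve given as (x, y) points."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical CIO outputs.
normals = [0.1, 0.2, 0.3, 0.35, 0.6]
abnormals = [0.4, 0.55, 0.7, 0.8, 0.3]
roc = empirical_roc(normals, abnormals)
print(roc)
print("AUC =", area_under(roc))
```

A tie between a normal and an abnormal output produces a diagonal step in the curve, which the trapezoid rule credits with half its width.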
Figure: threshold on the Normal / Class 0 and Abnormal / Class 1 subject distributions, and the entire ROC curve, TPF (sensitivity) versus FPF (1 - specificity), with the ROC slope marked
Empirical ROC data for mammography screening in the US
Figure: True Positive Fraction versus False Positive Fraction, with complementary axes for True Negative Fraction and False Negative Fraction, each running from 0.0 to 1.0 (Craig Beam et al.)
Maximization of performance
• Need to know utilities or costs of each type of decision outcome, but these are very hard to estimate accurately. You don’t just maximize accuracy.
• Need prevalence
• For mammography example
  – TPF: prolongation of life minus treatment cost
  – FPF: diagnostic work-up cost, anxiety
  – TNF: peace of mind
  – FNF: delay in treatment => shortened life
• Hypothetical assignment of utilities for some decision threshold T:
  – Utility(T) = U(TPF) x TPF x Pr + U(FPF) x FPF x (1-Pr) + U(TNF) x TNF x (1-Pr) + U(FNF) x FNF x Pr
  – U(TPF) = 100, U(FPF) = -10, U(TNF) = 4, U(FNF) = -20
  – Utility(T) = 100 x .95 x .05 - 10 x .50 x .95 + 4 x .50 x .95 - 20 x .05 x .05 = 1.85
• Now if we only knew how to trade off TPF versus FPF, we could optimize (?) medical performance.
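The utility arithmetic can be checked in a few lines. A minimal sketch using the slide's hypothetical utilities and operating point (Pr = .05, TPF = .95, FPF = .50); the dictionary layout and function name are illustrative assumptions:

```python
def expected_utility(pr, tpf, fpf, u):
    """Utility(T) = U(TPF)TPF Pr + U(FPF)FPF(1-Pr) + U(TNF)TNF(1-Pr) + U(FNF)FNF Pr."""
    tnf, fnf = 1.0 - fpf, 1.0 - tpf
    return (u["TPF"] * tpf * pr
            + u["FPF"] * fpf * (1.0 - pr)
            + u["TNF"] * tnf * (1.0 - pr)
            + u["FNF"] * fnf * pr)


UTILITIES = {"TPF": 100.0, "FPF": -10.0, "TNF": 4.0, "FNF": -20.0}
print(round(expected_utility(pr=0.05, tpf=0.95, fpf=0.50, u=UTILITIES), 2))  # 1.85
```

Evaluating the same function at other (FPF, TPF) points on a CIO's ROC curve shows which threshold scores best under these assumed utilities.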
Utility maximization (mammography example)
Choice of ROC operating point through utility analysis—screening mammography
Utility maximization (mammography example)
Utility maximization calculation
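
$$U = (U_{TPF}\,\mathrm{TPF} + U_{FNF}\,\mathrm{FNF})\,\mathrm{Pr} + (U_{TNF}\,\mathrm{TNF} + U_{FPF}\,\mathrm{FPF})(1-\mathrm{Pr})$$

$$U = \left(U_{TPF}\,\mathrm{TPF} + U_{FNF}(1-\mathrm{TPF})\right)\mathrm{Pr} + \left(U_{TNF}(1-\mathrm{FPF}) + U_{FPF}\,\mathrm{FPF}\right)(1-\mathrm{Pr})$$

$$\frac{dU}{d\,\mathrm{FPF}} = (U_{FPF}-U_{TNF})(1-\mathrm{Pr}) + (U_{TPF}-U_{FNF})\,\mathrm{Pr}\,\frac{d\,\mathrm{TPF}}{d\,\mathrm{FPF}} = 0$$

$$\frac{d\,\mathrm{TPF}}{d\,\mathrm{FPF}} = \frac{(U_{TNF}-U_{FPF})(1-\mathrm{Pr})}{(U_{TPF}-U_{FNF})\,\mathrm{Pr}}$$

Pr = .005: dTPF/dFPF = 23
Pr = .05: dTPF/dFPF = 2.2
($U_{TPF} = 100$, $U_{FNF} = -20$, $U_{TNF} = 4$, $U_{FPF} = -10$)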
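The optimal slope dTPF/dFPF = (U_TNF - U_FPF)(1 - Pr) / ((U_TPF - U_FNF) Pr) can be evaluated directly. A minimal sketch with the utilities assigned earlier (U_FPF = -10); the function name is hypothetical:

```python
def optimal_roc_slope(pr, u):
    """dTPF/dFPF at which expected utility is stationary (a maximum on a concave ROC)."""
    return (u["TNF"] - u["FPF"]) * (1.0 - pr) / ((u["TPF"] - u["FNF"]) * pr)


UTILITIES = {"TPF": 100.0, "FNF": -20.0, "TNF": 4.0, "FPF": -10.0}
for pr in (0.005, 0.05):
    print(f"Pr = {pr}: optimal ROC slope = {optimal_roc_slope(pr, UTILITIES):.1f}")
```

The low screening prevalence pushes the optimal operating point toward the steep, low-FPF end of the ROC curve.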
Figure: threshold on the normal and abnormal case distributions, and the entire ROC curve, TPF (sensitivity) versus FPF (1 - specificity), with the ROC slope marked
Estimators
• TPF, FPF, TNF, FNF, Accuracy, the ROC curve, and AUC are all fractions or probabilities.
• Normally we have a finite sample of subjects on which to test our CIO. From this finite sample we try to estimate the above fractions.
  – These estimates will vary depending upon the sample selected (statistical variation).
  – Estimates can be nonparametric or parametric.
Estimators
• In the population:

$$\mathrm{TPF} = \frac{\text{Number of abnormals that would be selected by CIO in the population}}{\text{Number of abnormals in the population}}$$

• Estimated from the sample:

$$\widehat{\mathrm{TPF}} = \frac{\text{Number of abnormals that were selected by CIO in the sample}}{\text{Number of abnormals in the sample}}$$

• Number in sample << Number in population (at least in theory)
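Statistical variation in the sample estimate shows up when test sets are drawn repeatedly from a larger population. A minimal sketch, assuming a hypothetical population of 100 000 abnormals of which the CIO would select 95% (population TPF = .95) and test sets of 50 abnormals; the sizes and seed are arbitrary choices:

```python
import random
from statistics import mean


def estimate_tpf(calls):
    """Sample TPF: abnormals selected by the CIO (1) divided by abnormals in the sample."""
    return sum(calls) / len(calls)


# Hypothetical population: CIO decision for every abnormal (1 = selected as abnormal).
population = [1] * 95_000 + [0] * 5_000
rng = random.Random(0)  # fixed seed so the run is repeatable

estimates = [estimate_tpf(rng.sample(population, 50)) for _ in range(1000)]
print(f"population TPF = {estimate_tpf(population):.3f}")
print(f"sample estimates range from {min(estimates):.2f} to {max(estimates):.2f}, "
      f"mean {mean(estimates):.3f}")
```

Each test set gives a different TPF, so a single estimate needs an error bar.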
II. Receiver Operating Characteristic (ROC)
• Receiver Operating Characteristic
• Binary Classification
• Test result is compared to a threshold
Distribution of CIO Output for all Subjects
Figure: computational intelligence observer output for all subjects, with a threshold
Figure: distribution of output for Normal / Class 0 subjects, p(t|0), and for Abnormal / Class 1 subjects, p(t|1), along the computational intelligence observer output (t-axis), with a threshold
Figure: distribution of output for Normal / Class 0 subjects, p(t|0), and the Abnormal / Class 1 subjects, with a threshold
Figure: Specificity = True Negative Fraction = TNF and Sensitivity = True Positive Fraction = TPF, shown relative to the threshold on the distributions of output for Normal / Class 0 subjects, p(t|0), and Abnormal / Class 1 subjects
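Figure: threshold on the Normal / Class 0 and Abnormal / Class 1 subject distributions, with specificity and sensitivity marked

| Truth | Decision D0 | Decision D1 |
|---|---|---|
| H0 (Normal / Class 0) | TNF 0.50 (specificity) | |
| H1 (Abnormal / Class 1) | | TPF 0.95 (sensitivity) |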
Figure: 1 - Specificity = False Positive Fraction = FPF and 1 - Sensitivity = False Negative Fraction = FNF, shown relative to the threshold on the Normal / Class 0 and Abnormal / Class 1 subject distributions
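Figure: threshold on the Normal / Class 0 and Abnormal / Class 1 subject distributions, with all four fractions marked

| Truth | Decision D0 | Decision D1 |
|---|---|---|
| H0 (Normal / Class 0) | TNF 0.50 | FPF 0.50 (1 - specificity) |
| H1 (Abnormal / Class 1) | FNF 0.05 (1 - sensitivity) | TPF 0.95 |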
Figure: high-sensitivity operating point on the ROC plot, TPF (sensitivity) versus FPF (1 - specificity), with the corresponding threshold on the Normal / Class 0 and Abnormal / Class 1 subject distributions
Figure: operating point where sensitivity = specificity on the ROC plot, TPF (sensitivity) versus FPF (1 - specificity), with the corresponding threshold on the Normal / Class 0 and Abnormal / Class 1 subject distributions
Figure: high-specificity operating point on the ROC plot, TPF (sensitivity) versus FPF (1 - specificity), with the corresponding threshold on the Normal / Class 0 and Abnormal / Class 1 subject distributions
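Figure: operating points of CIO #1, CIO #2, and CIO #3 on the ROC plot, TPF (sensitivity) versus FPF (1 - specificity), with the CIO #1 threshold on the Normal / Class 0 and Abnormal / Class 1 subject distributions
Which CIO is best?

| CIO | TPF | FPF |
|---|---|---|
| CIO #1 | 0.50 | 0.07 |
| CIO #2 | 0.78 | 0.22 |
| CIO #3 | 0.93 | 0.50 |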
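Figure: operating points of CIO #1, CIO #2, and CIO #3 on the ROC plot, TPF (sensitivity) versus FPF (1 - specificity), with the CIO #1 threshold on the Normal / Class 0 and Abnormal / Class 1 subject distributions
Do not compare rates of one class, e.g. TPF, at different rates of the other class (FPF).

| CIO | TPF | FPF |
|---|---|---|
| CIO #1 | 0.50 | 0.07 |
| CIO #2 | 0.78 | 0.23 |
| CIO #3 | 0.93 | 0.50 |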
Figure: threshold on the Normal / Class 0 and Abnormal / Class 1 subject distributions, and the entire ROC curve, TPF (sensitivity) versus FPF (1 - specificity)
Figure: entire ROC curves, TPF (sensitivity) versus FPF (1 - specificity), for increasing discriminability (CIO performance): the chance line (AUC = 0.5), AUC = 0.85, and AUC = 0.98
AUC (Area under ROC Curve)
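• AUC is a separation probability
• AUC = probability that
  – CIO output for abnormal > CIO output for normal
  – CIO correctly tells which of 2 subjects (one normal, one abnormal) is normal
• Estimating AUC from a finite sample
  – Select abnormal subject score = $x_i$
  – Select normal subject score = $y_k$
  – Is $x_i > y_k$?
  – Average over all $x, y$:

$$\mathrm{AUC} = \frac{1}{n_0 n_1} \sum_{i=1}^{n_1} \sum_{k=1}^{n_0} \mathrm{I}(x_i > y_k)$$

where $\mathrm{I}(\cdot)$ is 1 when its argument is true and 0 otherwise, $n_1$ is the number of abnormal subjects, and $n_0$ is the number of normal subjects.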
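The average over all (x, y) pairs translates directly into code. A minimal sketch of this nonparametric (Mann-Whitney / Wilcoxon) estimate; it counts tied pairs as 1/2, a common convention the slide does not state, and the example scores are hypothetical:

```python
def auc_pairwise(abnormal_scores, normal_scores):
    """AUC = (1 / (n0 n1)) * sum_i sum_k I(x_i > y_k), with ties counted as 1/2."""
    total = 0.0
    for x in abnormal_scores:      # x_i, i = 1..n1
        for y in normal_scores:    # y_k, k = 1..n0
            if x > y:
                total += 1.0
            elif x == y:
                total += 0.5
    return total / (len(abnormal_scores) * len(normal_scores))


abnormals = [0.9, 0.7, 0.4]
normals = [0.8, 0.3, 0.2, 0.1]
print(round(auc_pairwise(abnormals, normals), 3))  # 0.833
```

The double loop costs n0 x n1 comparisons; for large test sets, sorting the scores and counting ranks gives the same value faster.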
ROC as a Q-Q plot
• ROC plots in probability space
• ROC plots in quantile space
Linear Likelihood Ratio Observer for Gaussian Data
• When the input features of the data are distributed as Gaussians with equal variance,
  – the optimal discriminant, the log-likelihood ratio, is a linear function,
  – that linear discriminant is also distributed as a Gaussian, and
  – the signal-to-noise ratio (SNR) is easily calculated from the input data distributions and is a monotonic function of AUC.
• Can serve as a benchmark against which to measure CIO performance
Linear Ideal Observer
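• $p(x\mid 0)$ is the probability distribution of the data $x$ for the population of normals and $p(x\mid 1)$ the probability distribution of $x$ for the population of abnormals, with components $x_i$ ($i = 1, \dots, D$) independently Gaussian distributed with means $0$ and $\mu_i$, respectively, and identical variances $\sigma_i^2$:

$$p(x\mid 0) = \prod_{i=1}^{D} (2\pi\sigma_i^2)^{-1/2} \exp\!\left(-\frac{x_i^2}{2\sigma_i^2}\right)$$

$$p(x\mid 1) = \prod_{i=1}^{D} (2\pi\sigma_i^2)^{-1/2} \exp\!\left(-\frac{(x_i-\mu_i)^2}{2\sigma_i^2}\right)$$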
Maximum Likelihood CIO
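$$L = \frac{p(1|\mathbf{x})}{p(0|\mathbf{x})} = \frac{p(\mathbf{x}|1)\,p(1)}{p(\mathbf{x}|0)\,p(0)} = k\,\frac{p(\mathbf{x}|1)}{p(\mathbf{x}|0)}, \qquad k = \frac{p(1)}{p(0)}$$
$$L = k\,\frac{\exp\left(-\sum_{i=1}^{D}\frac{(x_i-\mu_i)^2}{2\sigma_i^2}\right)}{\exp\left(-\sum_{i=1}^{D}\frac{x_i^2}{2\sigma_i^2}\right)} = k\,\exp\left(-\sum_{i=1}^{D}\frac{\mu_i^2}{2\sigma_i^2}\right)\exp\left(\sum_{i=1}^{D}\frac{\mu_i x_i}{\sigma_i^2}\right)$$
$$t = \ln\left[\exp\left(\sum_{i=1}^{D}\frac{\mu_i x_i}{\sigma_i^2}\right)\right] = \sum_{i=1}^{D}\frac{\mu_i x_i}{\sigma_i^2}$$
Since L is a monotonic function of t, thresholding the linear discriminant t is equivalent to thresholding L.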
![Page 96: David G. Brown and Frank Samuelson Center for Devices and Radiological Health, FDA 6 July 2014](https://reader037.fdocuments.in/reader037/viewer/2022110104/56815957550346895dc69303/html5/thumbnails/96.jpg)
Linear Ideal Observer ROC
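$$p(t|0) = \mathrm{Gauss}(0,\ \sigma_t^2), \qquad \sigma_t^2 = \sum_{i=1}^{D}\frac{\mu_i^2}{\sigma_i^2}$$
$$p(t|1) = \mathrm{Gauss}(\sigma_t^2,\ \sigma_t^2)$$
$$\mathrm{SNR} = \frac{\langle t|1\rangle - \langle t|0\rangle}{\left[\tfrac{1}{2}\sigma^2(t|1) + \tfrac{1}{2}\sigma^2(t|0)\right]^{1/2}} = \sigma_t$$
$$\mathrm{AUC} = \Phi\left(\mathrm{SNR}/\sqrt{2}\right), \qquad \Phi = \text{cumulative Gaussian distribution}$$
where Gauss(m, v) denotes a Gaussian with mean m and variance v.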
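As a numerical check on the ideal-observer results, the sketch below computes SNR = sqrt(Σ μ_i²/σ_i²) and AUC = Φ(SNR/√2) for an assumed three-feature example, then compares them with the empirical (Mann-Whitney) AUC of the linear discriminant t on simulated data. The feature means, standard deviations, sample size and seed are illustrative assumptions, and helper names such as `ideal_snr_auc` are hypothetical.

```python
import math
import random
from bisect import bisect_left, bisect_right
from statistics import NormalDist


def ideal_snr_auc(mu, sigma):
    """SNR and AUC of the linear ideal observer t = sum(mu_i * x_i / sigma_i**2)."""
    snr = math.sqrt(sum((m / s) ** 2 for m, s in zip(mu, sigma)))
    auc = NormalDist().cdf(snr / math.sqrt(2.0))  # AUC = Phi(SNR / sqrt(2))
    return snr, auc


def auc_mann_whitney(neg, pos):
    """Empirical AUC: fraction of (normal, abnormal) pairs ordered correctly; ties count 1/2."""
    neg_sorted = sorted(neg)
    wins = 0.0
    for a in pos:
        lo = bisect_left(neg_sorted, a)   # normals scoring strictly below a
        hi = bisect_right(neg_sorted, a)  # ... plus normals tied with a
        wins += lo + 0.5 * (hi - lo)
    return wins / (len(pos) * len(neg))


def simulated_auc(mu, sigma, n_per_class, seed=0):
    """Draw Gaussian feature vectors for both classes and score them with t."""
    rng = random.Random(seed)

    def t(x):
        return sum(m * xi / s ** 2 for m, xi, s in zip(mu, x, sigma))

    normals = [t([rng.gauss(0.0, s) for s in sigma]) for _ in range(n_per_class)]
    abnormals = [t([rng.gauss(m, s) for m, s in zip(mu, sigma)]) for _ in range(n_per_class)]
    return auc_mann_whitney(normals, abnormals)


mu = [0.5, 1.0, 0.25]     # assumed class-1 means (class-0 means are 0)
sigma = [1.0, 2.0, 0.5]   # assumed per-feature standard deviations
snr, auc = ideal_snr_auc(mu, sigma)
print(f"SNR = {snr:.4f}, theoretical AUC = {auc:.4f}")
print(f"simulated AUC = {simulated_auc(mu, sigma, 1000):.4f}")
```

With these values each feature contributes 0.25 to σ_t², so SNR = sqrt(0.75), and the simulated AUC should agree with the theoretical value to within sampling error.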
![Page 97: David G. Brown and Frank Samuelson Center for Devices and Radiological Health, FDA 6 July 2014](https://reader037.fdocuments.in/reader037/viewer/2022110104/56815957550346895dc69303/html5/thumbnails/97.jpg)
Likelihood Ratio = Slope of ROC
• The likelihood ratio of the decision variable t is the slope of the ROC curve:
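• ROC = TPF(FPF); TPF = 1 - P(t|1); FPF = 1 - P(t|0), where P(t|·) is the cumulative distribution and p(t|·) the density of t:
$$\text{slope} = \frac{d\,\mathrm{TPF}}{d\,\mathrm{FPF}} = \frac{dP(t|1)/dt}{dP(t|0)/dt} = \frac{p(t|1)}{p(t|0)} = L(t)$$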
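The slope relation can be checked numerically. This minimal sketch assumes binormal distributions for the decision variable t (means and standard deviations chosen arbitrarily, with unequal variances to show the result does not rely on equal variance), takes a central finite-difference slope of the ROC curve at several thresholds, and compares it with p(t|1)/p(t|0).

```python
from statistics import NormalDist

# Assumed binormal decision-variable distributions (unequal variances on purpose).
p0 = NormalDist(mu=0.0, sigma=1.0)   # t for normals
p1 = NormalDist(mu=1.5, sigma=1.2)   # t for abnormals


def tpf(t):
    return 1.0 - p1.cdf(t)   # TPF = 1 - P(t|1)


def fpf(t):
    return 1.0 - p0.cdf(t)   # FPF = 1 - P(t|0)


def roc_slope(t, h=1e-5):
    """Central finite-difference slope dTPF/dFPF of the ROC at threshold t."""
    return (tpf(t + h) - tpf(t - h)) / (fpf(t + h) - fpf(t - h))


def likelihood_ratio(t):
    return p1.pdf(t) / p0.pdf(t)   # L(t) = p(t|1) / p(t|0)


for t in (-1.0, 0.0, 0.75, 2.0):
    print(f"t = {t:5.2f}   ROC slope = {roc_slope(t):.6f}   L(t) = {likelihood_ratio(t):.6f}")
```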
![Page 98: David G. Brown and Frank Samuelson Center for Devices and Radiological Health, FDA 6 July 2014](https://reader037.fdocuments.in/reader037/viewer/2022110104/56815957550346895dc69303/html5/thumbnails/98.jpg)
III. Error analysis for CI observers
• Sources of error
• Parametric methods
• Nonparametric methods
• Standard deviations and confidence intervals
• Hazards
![Page 99: David G. Brown and Frank Samuelson Center for Devices and Radiological Health, FDA 6 July 2014](https://reader037.fdocuments.in/reader037/viewer/2022110104/56815957550346895dc69303/html5/thumbnails/99.jpg)
Sources of error
• Test error: limited number of samples in the test set
• Training error: limited number of samples in the training set
– Incorrect parameters
– Incorrect feature selection, etc.
• Human observer error (when applicable)
– Intraobserver
– Interobserver
![Page 100: David G. Brown and Frank Samuelson Center for Devices and Radiological Health, FDA 6 July 2014](https://reader037.fdocuments.in/reader037/viewer/2022110104/56815957550346895dc69303/html5/thumbnails/100.jpg)
Parametric methods
• Use known underlying probability distribution – may be exact for simulated data
• Assume Gaussian distribution
• Other parameterization – e.g., Binomial or ROC linearity in z-transformation coordinates
– $\Phi^{-1}(\mathrm{TPF})$ versus $\Phi^{-1}(\mathrm{FPF})$, where $\Phi$ is the cumulative Gaussian distribution
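A sketch of the z-transformation idea: if normals are N(0, 1) and abnormals N(μ₁, σ₁²), then Φ⁻¹(TPF) = a + b·Φ⁻¹(FPF) with a = μ₁/σ₁ and b = 1/σ₁. The code computes empirical operating points from simulated data, transforms both axes with the inverse cumulative Gaussian, and fits a straight line by ordinary least squares. Least squares is a simple stand-in chosen here for illustration, not the maximum-likelihood binormal fitting normally used for ROC curves; the parameters, thresholds and seed are assumptions.

```python
import random
from statistics import NormalDist


def empirical_roc(neg, pos, thresholds):
    """(FPF, TPF) operating points: fraction of each class scoring above each threshold."""
    return [(sum(x > tau for x in neg) / len(neg), sum(x > tau for x in pos) / len(pos))
            for tau in thresholds]


def z_transform(points):
    """Map (FPF, TPF) to (Phi^-1(FPF), Phi^-1(TPF)); drop points at 0 or 1."""
    z = NormalDist().inv_cdf
    return [(z(f), z(t)) for f, t in points if 0.0 < f < 1.0 and 0.0 < t < 1.0]


def fit_line(xy):
    """Ordinary least-squares intercept a and slope b of y = a + b x."""
    n = len(xy)
    mx = sum(x for x, _ in xy) / n
    my = sum(y for _, y in xy) / n
    sxx = sum((x - mx) ** 2 for x, _ in xy)
    sxy = sum((x - mx) * (y - my) for x, y in xy)
    b = sxy / sxx
    return my - b * mx, b


rng = random.Random(1)
mu1, s1 = 1.5, 1.25                              # assumed abnormal-class parameters
neg = [rng.gauss(0.0, 1.0) for _ in range(5000)]
pos = [rng.gauss(mu1, s1) for _ in range(5000)]
thresholds = [-1.0 + 0.25 * k for k in range(13)]  # -1.0, -0.75, ..., 2.0
a, b = fit_line(z_transform(empirical_roc(neg, pos, thresholds)))
print(f"a = {a:.3f} (expected {mu1 / s1:.3f}), b = {b:.3f} (expected {1 / s1:.3f})")
```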
![Page 101: David G. Brown and Frank Samuelson Center for Devices and Radiological Health, FDA 6 July 2014](https://reader037.fdocuments.in/reader037/viewer/2022110104/56815957550346895dc69303/html5/thumbnails/101.jpg)
Binomial Estimates of Variance
• For single population measures, f= TPF, FPF, FNF, TNF
• Var(f) = f (1-f) / N
• For AUC (back-of-the-envelope calculation): Var(AUC) = AUC (1-AUC) / N
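The binomial formulas are one-liners. The sketch below evaluates them for made-up values (a TPF of 0.85 measured on 20 abnormal cases, and an AUC of 0.9 with N = 50) and forms an approximate 95% interval as f ± 1.96 s.d.; the numbers are illustrative only.

```python
import math


def binomial_sd(f, n):
    """Standard deviation sqrt(f (1 - f) / N) of a proportion-like measure f."""
    return math.sqrt(f * (1.0 - f) / n)


sd_tpf = binomial_sd(0.85, 20)   # TPF from 20 abnormal cases (assumed values)
sd_auc = binomial_sd(0.9, 50)    # back-of-the-envelope AUC s.d. (assumed values)
print(f"TPF = 0.85 +/- {1.96 * sd_tpf:.3f} (approx. 95%)")
print(f"AUC = 0.90 +/- {1.96 * sd_auc:.3f} (approx. 95%)")
```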
![Page 102: David G. Brown and Frank Samuelson Center for Devices and Radiological Health, FDA 6 July 2014](https://reader037.fdocuments.in/reader037/viewer/2022110104/56815957550346895dc69303/html5/thumbnails/102.jpg)
Data rich case
• Repeat experiment M times
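• Estimate distribution parameters, e.g., for a Gaussian distributed performance measure f, G(μ, σ²):
$$\hat{f} = \frac{1}{M}\sum_{i=1}^{M} f_i, \qquad \hat{\sigma}_f^2 = \frac{1}{M-1}\sum_{i=1}^{M}\left(f_i - \hat{f}\right)^2, \qquad \hat{\sigma}_{\hat{f}}^2 = \hat{\sigma}_f^2 / M$$
• Find error bars or confidence limits:
$$\hat{f} \pm \hat{\sigma}_{\hat{f}} \quad \text{(error bars)}, \qquad \hat{f} \pm k\,\hat{\sigma}_{\hat{f}} \quad \text{(confidence interval)}, \qquad k_{95\%} = 1.96$$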
![Page 103: David G. Brown and Frank Samuelson Center for Devices and Radiological Health, FDA 6 July 2014](https://reader037.fdocuments.in/reader037/viewer/2022110104/56815957550346895dc69303/html5/thumbnails/103.jpg)
Example: AUC
• Mean AUC
• “Distribution” variance
• Variance of mean
• Error bars, confidence interval
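In the data-rich case each experiment uses fresh, independent data. The sketch below simulates M = 1000 such experiments (20 normal and 20 abnormal cases each, unit-variance Gaussians whose means differ by 2.0; all assumed values) and computes the four quantities listed: the mean AUC, the distribution variance with 1/(M-1) normalization, the variance of the mean, and a 95% interval with k = 1.96.

```python
import math
import random
from bisect import bisect_left, bisect_right
from statistics import mean, variance


def auc_mann_whitney(neg, pos):
    """Empirical AUC: fraction of (normal, abnormal) pairs ordered correctly; ties count 1/2."""
    neg_sorted = sorted(neg)
    wins = 0.0
    for a in pos:
        lo, hi = bisect_left(neg_sorted, a), bisect_right(neg_sorted, a)
        wins += lo + 0.5 * (hi - lo)
    return wins / (len(pos) * len(neg))


def one_experiment(rng, n0=20, n1=20, d=2.0):
    """One independent test set: n0 normals ~ N(0, 1), n1 abnormals ~ N(d, 1)."""
    neg = [rng.gauss(0.0, 1.0) for _ in range(n0)]
    pos = [rng.gauss(d, 1.0) for _ in range(n1)]
    return auc_mann_whitney(neg, pos)


rng = random.Random(2014)
M = 1000                                   # number of repeated experiments
aucs = [one_experiment(rng) for _ in range(M)]

auc_hat = mean(aucs)                       # mean AUC
var_dist = variance(aucs)                  # "distribution" variance, 1/(M-1) normalization
var_mean = var_dist / M                    # variance of the mean
k95 = 1.96
half_width = k95 * math.sqrt(var_mean)
print(f"mean AUC = {auc_hat:.4f}, s.d. of one experiment = {math.sqrt(var_dist):.4f}")
print(f"error bars: {auc_hat:.4f} +/- {math.sqrt(var_mean):.4f}")
print(f"95% CI: [{auc_hat - half_width:.4f}, {auc_hat + half_width:.4f}]")
```

The population AUC for this setup is Φ(2/√2) ≈ 0.921, so the mean should land close to that value.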
![Page 104: David G. Brown and Frank Samuelson Center for Devices and Radiological Health, FDA 6 July 2014](https://reader037.fdocuments.in/reader037/viewer/2022110104/56815957550346895dc69303/html5/thumbnails/104.jpg)
Probability distribution for calculation of AUC from 40 values
![Page 105: David G. Brown and Frank Samuelson Center for Devices and Radiological Health, FDA 6 July 2014](https://reader037.fdocuments.in/reader037/viewer/2022110104/56815957550346895dc69303/html5/thumbnails/105.jpg)
Probability distribution for calculation of SNR from 40 values
![Page 106: David G. Brown and Frank Samuelson Center for Devices and Radiological Health, FDA 6 July 2014](https://reader037.fdocuments.in/reader037/viewer/2022110104/56815957550346895dc69303/html5/thumbnails/106.jpg)
But what’s a poor boy to do?
• Reuse the data you have: Resubstitution, Resampling
• Two common approaches:
– Jackknife
– Bootstrap
![Page 107: David G. Brown and Frank Samuelson Center for Devices and Radiological Health, FDA 6 July 2014](https://reader037.fdocuments.in/reader037/viewer/2022110104/56815957550346895dc69303/html5/thumbnails/107.jpg)
Resampling
![Page 108: David G. Brown and Frank Samuelson Center for Devices and Radiological Health, FDA 6 July 2014](https://reader037.fdocuments.in/reader037/viewer/2022110104/56815957550346895dc69303/html5/thumbnails/108.jpg)
Resampling
![Page 109: David G. Brown and Frank Samuelson Center for Devices and Radiological Health, FDA 6 July 2014](https://reader037.fdocuments.in/reader037/viewer/2022110104/56815957550346895dc69303/html5/thumbnails/109.jpg)
Resampling
![Page 110: David G. Brown and Frank Samuelson Center for Devices and Radiological Health, FDA 6 July 2014](https://reader037.fdocuments.in/reader037/viewer/2022110104/56815957550346895dc69303/html5/thumbnails/110.jpg)
Resampling
![Page 111: David G. Brown and Frank Samuelson Center for Devices and Radiological Health, FDA 6 July 2014](https://reader037.fdocuments.in/reader037/viewer/2022110104/56815957550346895dc69303/html5/thumbnails/111.jpg)
Resampling
![Page 112: David G. Brown and Frank Samuelson Center for Devices and Radiological Health, FDA 6 July 2014](https://reader037.fdocuments.in/reader037/viewer/2022110104/56815957550346895dc69303/html5/thumbnails/112.jpg)
Resampling
![Page 113: David G. Brown and Frank Samuelson Center for Devices and Radiological Health, FDA 6 July 2014](https://reader037.fdocuments.in/reader037/viewer/2022110104/56815957550346895dc69303/html5/thumbnails/113.jpg)
Resampling
![Page 114: David G. Brown and Frank Samuelson Center for Devices and Radiological Health, FDA 6 July 2014](https://reader037.fdocuments.in/reader037/viewer/2022110104/56815957550346895dc69303/html5/thumbnails/114.jpg)
Resampling
![Page 115: David G. Brown and Frank Samuelson Center for Devices and Radiological Health, FDA 6 July 2014](https://reader037.fdocuments.in/reader037/viewer/2022110104/56815957550346895dc69303/html5/thumbnails/115.jpg)
Jackknife
• Have N observations
• Leave out m of these; there are then M subsets of the N observations with which to calculate μ, σ², where
$$M = \binom{N}{m}$$
• N=10, m=5: M=252; N=10, m=1: M=10
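Enumerating the jackknife subsets directly confirms the counts M = C(N, m). The sketch below uses ten made-up observations and the sample mean as the statistic recomputed on each subset; `leave_m_out` is a hypothetical helper name.

```python
import math
from itertools import combinations
from statistics import mean, pvariance


def leave_m_out(data, m):
    """All subsets of len(data) - m observations (each subset leaves out m of them)."""
    return [list(s) for s in combinations(data, len(data) - m)]


data = [2.1, 3.4, 1.9, 4.2, 3.3, 2.8, 3.9, 2.5, 3.0, 3.6]   # N = 10 made-up observations

for m in (5, 1):
    subsets = leave_m_out(data, m)
    stats = [mean(s) for s in subsets]   # statistic recomputed on each subset
    # pvariance is the raw spread across subsets, not yet scaled into a jackknife variance.
    print(f"m={m}: M={len(subsets)} (C(N, m)={math.comb(len(data), m)}), "
          f"mean={mean(stats):.4f}, spread={pvariance(stats):.5f}")
```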
![Page 116: David G. Brown and Frank Samuelson Center for Devices and Radiological Health, FDA 6 July 2014](https://reader037.fdocuments.in/reader037/viewer/2022110104/56815957550346895dc69303/html5/thumbnails/116.jpg)
Round-robin jackknife bias derivation and variance
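• Given N cases, leave each case out in turn (round robin), giving N datasets of N-1 cases. Assume the expected AUC depends on the number of cases as
$$\mathrm{AUC}_N = \mathrm{AUC}_\infty + \frac{k}{N}, \qquad \mathrm{AUC}_{N-1} = \mathrm{AUC}_\infty + \frac{k}{N-1}$$
$$\mathrm{AUC}_{N-1} - \mathrm{AUC}_N = \frac{k}{N(N-1)}$$
Eliminating k gives the jackknife (bias-corrected) estimate
$$\widehat{\mathrm{AUC}}_J = N\,\mathrm{AUC}_N - (N-1)\,\overline{\mathrm{AUC}}_{N-1}, \qquad \overline{\mathrm{AUC}}_{N-1} = \frac{1}{N}\sum_{i=1}^{N}\mathrm{AUC}_{N-1}^{(i)}$$
where AUC_{N-1}^{(i)} is computed with case i left out, and the jackknife standard deviation
$$\hat{\sigma}_J = \left[\frac{N-1}{N}\sum_{i=1}^{N}\left(\mathrm{AUC}_{N-1}^{(i)} - \overline{\mathrm{AUC}}_{N-1}\right)^2\right]^{1/2}$$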
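A minimal round-robin jackknife for the empirical (Mann-Whitney) AUC, assuming each of the N = n0 + n1 test cases is left out in turn. One limitation of this particular example is worth knowing: when only the test set is resampled, the mean of the leave-one-out Mann-Whitney AUCs equals the full-sample AUC exactly, so the bias estimate is zero up to rounding (consistent with the near-zero mean jackknife bias estimate quoted in the s.d. comparison). A nonzero correction arises when AUC_N depends on the number of training cases, which this sketch does not model. The simulated data and seed are assumptions.

```python
import random
from bisect import bisect_left, bisect_right


def auc_mann_whitney(neg, pos):
    """Empirical AUC: fraction of (normal, abnormal) pairs ordered correctly; ties count 1/2."""
    neg_sorted = sorted(neg)
    wins = 0.0
    for a in pos:
        lo, hi = bisect_left(neg_sorted, a), bisect_right(neg_sorted, a)
        wins += lo + 0.5 * (hi - lo)
    return wins / (len(pos) * len(neg))


def round_robin_jackknife_auc(neg, pos):
    """Leave each of the N = n0 + n1 cases out in turn; return AUC_N, AUC_J and sigma_J."""
    n = len(neg) + len(pos)
    auc_n = auc_mann_whitney(neg, pos)
    loo = [auc_mann_whitney(neg[:i] + neg[i + 1:], pos) for i in range(len(neg))]
    loo += [auc_mann_whitney(neg, pos[:j] + pos[j + 1:]) for j in range(len(pos))]
    auc_bar = sum(loo) / n                   # mean of the N leave-one-out AUCs
    auc_j = n * auc_n - (n - 1) * auc_bar    # bias-corrected estimate
    sd_j = ((n - 1) / n * sum((a - auc_bar) ** 2 for a in loo)) ** 0.5
    return auc_n, auc_j, sd_j


rng = random.Random(3)
neg = [rng.gauss(0.0, 1.0) for _ in range(20)]
pos = [rng.gauss(1.5, 1.0) for _ in range(20)]
auc_n, auc_j, sd_j = round_robin_jackknife_auc(neg, pos)
print(f"AUC_N = {auc_n:.4f}  AUC_J = {auc_j:.4f}  bias est. = {auc_n - auc_j:.1e}  sigma_J = {sd_j:.4f}")
```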
![Page 117: David G. Brown and Frank Samuelson Center for Devices and Radiological Health, FDA 6 July 2014](https://reader037.fdocuments.in/reader037/viewer/2022110104/56815957550346895dc69303/html5/thumbnails/117.jpg)
Fukunaga-Hayes bias derivation
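• Divide both the normal and abnormal classes in half, yielding 4 possible pairings
$$\mathrm{AUC}_N = \mathrm{AUC}_\infty + \frac{k}{N}, \qquad \mathrm{AUC}_{N/2} = \mathrm{AUC}_\infty + \frac{2k}{N}$$
$$\widehat{\mathrm{AUC}}_{FH} = 2\,\mathrm{AUC}_N - \mathrm{AUC}_{N/2}$$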
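Both corrections are linear extrapolations under the assumption AUC_N = AUC_∞ + k/N. The sketch below applies the jackknife form N·AUC_N - (N-1)·AUC_{N-1} and the Fukunaga-Hayes form 2·AUC_N - AUC_{N/2} to a hypothetical learning curve that follows this model exactly (AUC_∞ = 0.90, k = 2.0, an optimistic resubstitution-type bias), so both should recover 0.90. In practice AUC_{N-1} and AUC_{N/2} come from retraining the classifier on reduced data; that step is not shown.

```python
def jackknife_extrapolate(auc_n, auc_n_minus_1, n):
    """AUC_J = N*AUC_N - (N-1)*AUC_{N-1}: removes a k/N bias term exactly."""
    return n * auc_n - (n - 1) * auc_n_minus_1


def fukunaga_hayes_extrapolate(auc_n, auc_half):
    """AUC_FH = 2*AUC_N - AUC_{N/2}: removes the same k/N term using half-size sets."""
    return 2.0 * auc_n - auc_half


# Hypothetical learning curve that follows the k/N model exactly.
AUC_INF, K = 0.90, 2.0


def auc_model(n):
    return AUC_INF + K / n


n = 40
print(f"AUC_40 = {auc_model(n):.4f}, AUC_39 = {auc_model(n - 1):.4f}, AUC_20 = {auc_model(n // 2):.4f}")
print(f"jackknife:       {jackknife_extrapolate(auc_model(n), auc_model(n - 1), n):.4f}")
print(f"Fukunaga-Hayes:  {fukunaga_hayes_extrapolate(auc_model(n), auc_model(n // 2)):.4f}")
```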
![Page 118: David G. Brown and Frank Samuelson Center for Devices and Radiological Health, FDA 6 July 2014](https://reader037.fdocuments.in/reader037/viewer/2022110104/56815957550346895dc69303/html5/thumbnails/118.jpg)
Jackknife bias correction example: training error
• AUC estimates as a function of number of cases N. Solid line is the multilayer perceptron result. Open circle jackknife, closed circle Fukunaga-Hayes. The horizontal dotted line is the asymptotic ideal result
![Page 119: David G. Brown and Frank Samuelson Center for Devices and Radiological Health, FDA 6 July 2014](https://reader037.fdocuments.in/reader037/viewer/2022110104/56815957550346895dc69303/html5/thumbnails/119.jpg)
IV. Bootstrap methods
• Theoretical foundation
• Practical use
![Page 120: David G. Brown and Frank Samuelson Center for Devices and Radiological Health, FDA 6 July 2014](https://reader037.fdocuments.in/reader037/viewer/2022110104/56815957550346895dc69303/html5/thumbnails/120.jpg)
Bootstrap variance
• What you have is what you've got: the data is your best estimate of the probability distribution
– Sampling with replacement, M times
– Adequate number of samples M>N
Simple bootstrap:
$$\hat{\sigma}_B^2 = \frac{1}{M-1}\sum_{i=1}^{M}\left(\mathrm{AUC}_i - \overline{\mathrm{AUC}}_B\right)^2$$
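A minimal bootstrap estimate of the AUC standard deviation. One assumption to flag: the sketch resamples normals and abnormals separately (stratified), which keeps both class sizes fixed in every bootstrap sample; the slide only specifies sampling with replacement. The data, M = 1000 and the seeds are illustrative.

```python
import random
from bisect import bisect_left, bisect_right
from statistics import mean, stdev


def auc_mann_whitney(neg, pos):
    """Empirical AUC: fraction of (normal, abnormal) pairs ordered correctly; ties count 1/2."""
    neg_sorted = sorted(neg)
    wins = 0.0
    for a in pos:
        lo, hi = bisect_left(neg_sorted, a), bisect_right(neg_sorted, a)
        wins += lo + 0.5 * (hi - lo)
    return wins / (len(pos) * len(neg))


def bootstrap_auc(neg, pos, m=1000, seed=0):
    """Resample each class with replacement m times; return mean AUC and sigma_B."""
    rng = random.Random(seed)
    aucs = []
    for _ in range(m):
        neg_b = rng.choices(neg, k=len(neg))   # bootstrap sample of normals
        pos_b = rng.choices(pos, k=len(pos))   # bootstrap sample of abnormals
        aucs.append(auc_mann_whitney(neg_b, pos_b))
    return mean(aucs), stdev(aucs)             # stdev uses the 1/(M-1) normalization


rng = random.Random(5)
neg = [rng.gauss(0.0, 1.0) for _ in range(20)]
pos = [rng.gauss(1.5, 1.0) for _ in range(20)]
auc_mean_b, sd_b = bootstrap_auc(neg, pos)
print(f"AUC = {auc_mann_whitney(neg, pos):.4f}, bootstrap mean = {auc_mean_b:.4f}, sigma_B = {sd_b:.4f}")
```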
![Page 121: David G. Brown and Frank Samuelson Center for Devices and Radiological Health, FDA 6 July 2014](https://reader037.fdocuments.in/reader037/viewer/2022110104/56815957550346895dc69303/html5/thumbnails/121.jpg)
Bootstrap and jackknife error estimates
• Standard deviation of AUC: Solid line simulation results, open circles jackknife estimate, closed circles bootstrap estimate. Note how much larger the jackknife error bars are than those provided by the bootstrap method.
![Page 122: David G. Brown and Frank Samuelson Center for Devices and Radiological Health, FDA 6 July 2014](https://reader037.fdocuments.in/reader037/viewer/2022110104/56815957550346895dc69303/html5/thumbnails/122.jpg)
Comparison of s.d. estimates: 2 Gaussian distributions, 20 normal, 20 abnormal, population AUC = .936
• Actual s.d. .0380
• Binomial approx. .0380
• Bootstrap .0388
• Jackknife .0396
• Mean bootstrap AUC est. .936
• Mean jackknife bias est. 2×10⁻¹⁷
![Page 123: David G. Brown and Frank Samuelson Center for Devices and Radiological Health, FDA 6 July 2014](https://reader037.fdocuments.in/reader037/viewer/2022110104/56815957550346895dc69303/html5/thumbnails/123.jpg)
.632 bootstrap for classifier performance evaluation
• Have N cases; draw M samples of size N with replacement for training (each sample of size N contains on average .632 x N unique cases)
• Test on the unused (~.368 x N) cases for each sample
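$$P(\text{case } i \text{ not drawn in one draw}) = 1 - \frac{1}{N}$$
$$P(\text{case } i \text{ not in the sample of size } N) = \left(1 - \frac{1}{N}\right)^N \to e^{-1} \approx .368$$
$$P(\text{case } i \text{ in the sample}) = 1 - \left(1 - \frac{1}{N}\right)^N \to 1 - e^{-1} \approx .632$$
so N_training cases ≈ .632 N and N_testing cases ≈ .368 N.

| N | 5 | 10 | 20 | 100 | Infinity |
|---|---|---|---|---|---|
| (1-1/N)^N | .328 | .349 | .358 | .366 | .368 |
| 1-(1-1/N)^N | .672 | .651 | .642 | .634 | .632 |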
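The table follows directly from (1 - 1/N)^N. The sketch below reproduces it and adds a Monte Carlo check of the average fraction of distinct cases in a bootstrap sample; the number of trials and the seed are arbitrary choices.

```python
import math
import random


def p_not_drawn(n):
    """Probability that a given case is absent from one bootstrap sample of size n."""
    return (1.0 - 1.0 / n) ** n


def mean_unique_fraction(n, m=2000, seed=0):
    """Monte Carlo check: average fraction of distinct cases in a bootstrap sample."""
    rng = random.Random(seed)
    return sum(len(set(rng.choices(range(n), k=n))) / n for _ in range(m)) / m


for n in (5, 10, 20, 100):
    q = p_not_drawn(n)
    print(f"N={n:>3}: unused {q:.3f}, used {1 - q:.3f}, simulated used {mean_unique_fraction(n):.3f}")
print(f"N=inf: unused {math.exp(-1):.3f}, used {1 - math.exp(-1):.3f}")
```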
![Page 124: David G. Brown and Frank Samuelson Center for Devices and Radiological Health, FDA 6 July 2014](https://reader037.fdocuments.in/reader037/viewer/2022110104/56815957550346895dc69303/html5/thumbnails/124.jpg)
.632 bootstrap for classifier performance evaluation 2
• Have N cases; draw M samples of size N with replacement for training (each sample of size N contains on average .632 x N unique cases)
• Test on the unused (~.368 x N) cases for each sample
• Get the bootstrap average result AUC_B
• Get the resubstitution result (testing on the training set) AUC_R
• AUC_.632 = .632 x AUC_B + .368 x AUC_R
• As the variance, take the AUC_B variance
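Putting the steps together, here is a sketch of the .632 estimate for AUC. Everything about the classifier is an assumption made only to have something to train: two Gaussian features, 20 cases per class, and a linear score whose weights are the difference of the estimated class means (`train` and `evaluate_auc` are hypothetical helpers). Bootstrap samples whose training or out-of-bag set lacks one class are skipped, and the reported spread is the standard deviation of the per-sample out-of-bag AUCs.

```python
import random
from bisect import bisect_left, bisect_right
from statistics import mean, stdev


def auc_mann_whitney(neg, pos):
    """Empirical AUC: fraction of (normal, abnormal) pairs ordered correctly; ties count 1/2."""
    neg_sorted = sorted(neg)
    wins = 0.0
    for a in pos:
        lo, hi = bisect_left(neg_sorted, a), bisect_right(neg_sorted, a)
        wins += lo + 0.5 * (hi - lo)
    return wins / (len(pos) * len(neg))


def train(cases):
    """Hypothetical classifier: weight vector = difference of class means (abnormal - normal)."""
    dim = len(cases[0][0])
    m0 = [mean(x[k] for x, y in cases if y == 0) for k in range(dim)]
    m1 = [mean(x[k] for x, y in cases if y == 1) for k in range(dim)]
    return [b - a for a, b in zip(m0, m1)]


def evaluate_auc(w, cases):
    """Score cases with the linear rule w . x and return the empirical AUC."""
    scores = [(sum(wk * xk for wk, xk in zip(w, x)), y) for x, y in cases]
    return auc_mann_whitney([s for s, y in scores if y == 0], [s for s, y in scores if y == 1])


def auc_632(cases, m=200, seed=0):
    rng = random.Random(seed)
    n = len(cases)
    auc_r = evaluate_auc(train(cases), cases)        # resubstitution: test on the training set
    oob = []
    while len(oob) < m:
        idx = [rng.randrange(n) for _ in range(n)]   # N draws with replacement
        drawn = set(idx)
        train_set = [cases[i] for i in idx]
        test_set = [cases[i] for i in range(n) if i not in drawn]  # about .368 N unused cases
        if {y for _, y in train_set} != {0, 1} or {y for _, y in test_set} != {0, 1}:
            continue                                 # need both classes to train and to score
        oob.append(evaluate_auc(train(train_set), test_set))
    auc_b = mean(oob)
    return 0.632 * auc_b + 0.368 * auc_r, auc_b, auc_r, stdev(oob)


rng = random.Random(7)
cases = [([rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)], 0) for _ in range(20)]
cases += [([rng.gauss(1.0, 1.0), rng.gauss(0.5, 1.0)], 1) for _ in range(20)]
auc_est, auc_b, auc_r, sd_b = auc_632(cases)
print(f"AUC_R = {auc_r:.3f}  AUC_B = {auc_b:.3f}  AUC_.632 = {auc_est:.3f}  s.d. = {sd_b:.3f}")
```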
![Page 125: David G. Brown and Frank Samuelson Center for Devices and Radiological Health, FDA 6 July 2014](https://reader037.fdocuments.in/reader037/viewer/2022110104/56815957550346895dc69303/html5/thumbnails/125.jpg)
Traps for the unwary: Overparameterization
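• Cover’s theorem:
• For N < 2(d+1) and large d, a hyperplane exists that will perfectly separate almost all possible dichotomies of N points (in general position) in d-dimensional space. The number of linearly separable dichotomies is
$$C(N,d) = 2\sum_{k=0}^{d}\binom{N-1}{k}$$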
![Page 126: David G. Brown and Frank Samuelson Center for Devices and Radiological Health, FDA 6 July 2014](https://reader037.fdocuments.in/reader037/viewer/2022110104/56815957550346895dc69303/html5/thumbnails/126.jpg)
f_d(N) = C(N,d)/2^N for d = 1, 5, 25, 125, and the limit of large d. The abscissa x = N/(2(d+1)) is scaled so that the points where f_d(N) = 0.5 lie superposed at x = 1 for all d.
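Cover's count is easy to evaluate exactly with integer arithmetic. The sketch below computes f_d(N) = C(N,d)/2^N and shows that f_d(N) = 1 for N ≤ d+1, that f_d(2(d+1)) = 0.5 exactly, and that for d = 125 the fraction is close to 1 below N = 2(d+1) and close to 0 above it.

```python
import math


def cover_count(n, d):
    """C(N, d): dichotomies of N points in general position in d-space
    that an (affine) hyperplane can separate."""
    return 2 * sum(math.comb(n - 1, k) for k in range(d + 1))


def f_d(n, d):
    """Fraction of all 2**N dichotomies that are linearly separable (exact integer ratio)."""
    return cover_count(n, d) / 2 ** n


for d in (1, 5, 25, 125):
    print(f"d={d:>3}: f_d(d+1) = {f_d(d + 1, d):.3f}   f_d(2(d+1)) = {f_d(2 * (d + 1), d):.3f}   "
          f"f_d(3(d+1)) = {f_d(3 * (d + 1), d):.4f}")
```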
![Page 127: David G. Brown and Frank Samuelson Center for Devices and Radiological Health, FDA 6 July 2014](https://reader037.fdocuments.in/reader037/viewer/2022110104/56815957550346895dc69303/html5/thumbnails/127.jpg)
Poor data hygiene
• Reporting results on training data (testing on the training data)
• Carrying out any part of the training process on data later used for testing
– e.g., using all of the data to select a manageable feature set from among a large number of features, and then dividing the data into training and test sets.
![Page 128: David G. Brown and Frank Samuelson Center for Devices and Radiological Health, FDA 6 July 2014](https://reader037.fdocuments.in/reader037/viewer/2022110104/56815957550346895dc69303/html5/thumbnails/128.jpg)
Overestimate of AUC from poor data hygiene
[Figure. Left: histograms of AUC values; x-axis: Area Under the ROC curve; y-axis: Number of experiments (out of 900). Right: mean ROC curves; x-axis: False Positive Fraction; y-axis: True Positive Fraction. Legend: Method 1, AUC=0.52; Method 2, AUC=0.62; Method 3, AUC=0.82; Method 4, AUC=0.91.]
Distributions of AUC values in 900 simulation experiments (on the left) and the mean ROC curves (on the right) for four validation methods: Method 1 – Feature selection and classifier training on one dataset and classifier testing on another independent dataset; Method 2 – Given perfect feature selection, classifier training on one dataset and classifier testing on another independent dataset; Method 3 – Feature selection using the entire dataset and then the dataset is partitioned into two, one for training and one for testing the classifier; Method 4 – Feature selection, classifier training, and testing using the same dataset.
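The gap between Method 3 and Method 1 is easy to reproduce in miniature. This sketch uses features that are pure noise (so the true AUC is 0.5), selects the 10 features with the largest class-mean difference either on all of the data (leaky, as in Method 3) or on the training half only (honest, as in Method 1), trains a mean-difference linear classifier on the training half, and measures AUC on the test half. The sizes (20 cases per class, 500 features, 20 repetitions) and the classifier are assumptions, not the simulation behind the figure.

```python
import random
from bisect import bisect_left, bisect_right


def auc_mann_whitney(neg, pos):
    """Empirical AUC: fraction of (normal, abnormal) pairs ordered correctly; ties count 1/2."""
    neg_sorted = sorted(neg)
    wins = 0.0
    for a in pos:
        lo, hi = bisect_left(neg_sorted, a), bisect_right(neg_sorted, a)
        wins += lo + 0.5 * (hi - lo)
    return wins / (len(pos) * len(neg))


def mean_diff(rows, labels, j):
    """Abnormal-minus-normal mean of feature j."""
    a = [r[j] for r, y in zip(rows, labels) if y == 1]
    b = [r[j] for r, y in zip(rows, labels) if y == 0]
    return sum(a) / len(a) - sum(b) / len(b)


def select_features(rows, labels, n_keep):
    """Keep the n_keep features with the largest |mean difference| on the given rows."""
    ranked = sorted(range(len(rows[0])), key=lambda j: abs(mean_diff(rows, labels, j)), reverse=True)
    return ranked[:n_keep]


def train_and_score(tr_x, tr_y, te_x, te_y, feats):
    """Mean-difference linear classifier trained on (tr_x, tr_y); AUC measured on (te_x, te_y)."""
    w = {j: mean_diff(tr_x, tr_y, j) for j in feats}
    s = [sum(w[j] * r[j] for j in feats) for r in te_x]
    return auc_mann_whitney([v for v, y in zip(s, te_y) if y == 0],
                            [v for v, y in zip(s, te_y) if y == 1])


def one_run(rng, n_per_class=20, p=500, n_keep=10):
    # Pure-noise features: no feature carries class information, so the true AUC is 0.5.
    x = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(2 * n_per_class)]
    y = [0] * n_per_class + [1] * n_per_class
    h = n_per_class // 2                      # stratified split: half of each class trains
    tr = set(range(h)) | set(range(n_per_class, n_per_class + h))
    tr_x, tr_y = [x[i] for i in sorted(tr)], [y[i] for i in sorted(tr)]
    te_x = [x[i] for i in range(2 * n_per_class) if i not in tr]
    te_y = [y[i] for i in range(2 * n_per_class) if i not in tr]
    leaky = train_and_score(tr_x, tr_y, te_x, te_y, select_features(x, y, n_keep))
    honest = train_and_score(tr_x, tr_y, te_x, te_y, select_features(tr_x, tr_y, n_keep))
    return leaky, honest


rng = random.Random(0)
runs = [one_run(rng) for _ in range(20)]
leaky_auc = sum(r[0] for r in runs) / len(runs)
honest_auc = sum(r[1] for r in runs) / len(runs)
print(f"selection on all data (leaky): mean AUC = {leaky_auc:.3f}")
print(f"selection on training half only (honest): mean AUC = {honest_auc:.3f}")
```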
![Page 129: David G. Brown and Frank Samuelson Center for Devices and Radiological Health, FDA 6 July 2014](https://reader037.fdocuments.in/reader037/viewer/2022110104/56815957550346895dc69303/html5/thumbnails/129.jpg)
Correct feature selection is hard to do
[Figure. Left: x-axis: Feature Index (log scale, 10^0 to 10^3); y-axis: Number of experiments (out of 900). Right: x-axis: Number of useful features (out of 30); y-axis: Number of experiments (out of 900).]
Insight into feature selection performance in Method 1. The left panel plots the number of experiments (out of 900) in which each feature is selected. By design of the simulation population, the first 30 features are useful for classification and the remaining ones are useless. The right panel plots the distribution of the number of useful features (out of 30) selected in the 900 experiments.
![Page 130: David G. Brown and Frank Samuelson Center for Devices and Radiological Health, FDA 6 July 2014](https://reader037.fdocuments.in/reader037/viewer/2022110104/56815957550346895dc69303/html5/thumbnails/130.jpg)
Conclusions
• Accuracy and other prevalence dependent measures are inadequate
• ROC/AUC provide good measures of performance
• Uncertainty must be quantified
• Bootstrap and jackknife techniques are useful methods
![Page 131: David G. Brown and Frank Samuelson Center for Devices and Radiological Health, FDA 6 July 2014](https://reader037.fdocuments.in/reader037/viewer/2022110104/56815957550346895dc69303/html5/thumbnails/131.jpg)
V. References
• [1] K. Fukunaga, Statistical Pattern Recognition, 2nd Edition. Boston: Harcourt Brace Jovanovich, 1990.
• [2] K. Fukunaga and R. R. Hayes, “Effects of sample size in classifier design,” IEEE Trans. Pattern Anal. Machine Intell., vol. PAMI-11, pp. 873–885, 1989.
• [3] D. M. Green and J. A. Swets, Signal Detection Theory and Psychophysics. New York: John Wiley & Sons, 1966.
• [4] J. P. Egan, Signal Detection Theory and ROC Analysis. New York: Academic Press, 1975.
• [5] C. E. Metz, “Basic principles of ROC analysis,” Seminars in Nuclear Medicine, vol. VIII, no. 4.
• [6] H. H. Barrett and K. J. Myers, Foundations of Image Science. Hoboken: John Wiley & Sons, 2004, ch. 13, Statistical Decision Theory.
• [7] B. Efron and R. J. Tibshirani, An Introduction to the Bootstrap. Boca Raton: Chapman & Hall/CRC, 1993.
• [8] B. Efron, The Jackknife, the Bootstrap and Other Resampling Plans. Philadelphia: Society for Industrial and Applied Mathematics, 1982.
• [9] A. C. Davison and D. V. Hinkley, Bootstrap Methods and their Application. Cambridge: Cambridge University Press, 1997.
• [10] B. Efron, “Estimating the error rate of a prediction rule: Some improvements on cross-validation,” Journal of the American Statistical Association, vol. 78, pp. 316–331, 1983.
• [11] B. Efron and R. J. Tibshirani, “Improvements on cross-validation: The .632+ bootstrap method,” Journal of the American Statistical Association, vol. 92, no. 438, pp. 548–560, 1997.
• [12] T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning, 2nd Edition. New York: Springer, 2009.
• [13] C. M. Bishop, Pattern Recognition and Machine Learning. New York: Springer, 2006.
• [14] ——, Neural Networks for Pattern Recognition. Oxford: Oxford University Press, 1995.
• [15] R. F. Wagner, D. G. Brown, J.-P. Guedon, K. J. Myers, and K. A. Wear, “Multivariate Gaussian pattern classification: effects of finite sample size and the addition of correlated or noisy features on summary measures of goodness,” in Information Processing in Medical Imaging, Proceedings of IPMI ’93, 1993, pp. 507–524.
• [16] ——, “On combining a few diagnostic tests or features,” in Proceedings of the SPIE, Image Processing, vol. 2167, 1994.
• [17] D. G. Brown, A. C. Schneider, M. P. Anderson, and R. F. Wagner, “Effects of finite sample size and correlated noisy input features on neural network pattern classification,” in Proceedings of the SPIE, Image Processing, vol. 2167, 1994.
• [18] C. A. Beam, “Analysis of clustered data in receiver operating characteristic studies,” Statistical Methods in Medical Research, vol. 7, pp. 324–336, 1998.
• [19] W. A. Yousef, et al., “Assessing Classifiers from Two Independent Data Sets Using ROC Analysis: A Nonparametric Approach,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 28, no. 11, pp. 1809–1817, 2006.
• [19] F. W. Samuelson and D. G. Brown, “Application of Cover’s theorem to the evaluation of the performance of CI observers,” in Proceedings of the IJCNN 2011, 2011.
• [20] W. Chen and D. G. Brown, “Optimistic bias in the assessment of high dimensional classifiers with a limited dataset,” in Proceedings of the IJCNN 2011, 2011.
![Page 132: David G. Brown and Frank Samuelson Center for Devices and Radiological Health, FDA 6 July 2014](https://reader037.fdocuments.in/reader037/viewer/2022110104/56815957550346895dc69303/html5/thumbnails/132.jpg)
Appendix I
![Page 133: David G. Brown and Frank Samuelson Center for Devices and Radiological Health, FDA 6 July 2014](https://reader037.fdocuments.in/reader037/viewer/2022110104/56815957550346895dc69303/html5/thumbnails/133.jpg)
Searching suitcases
![Page 134: David G. Brown and Frank Samuelson Center for Devices and Radiological Health, FDA 6 July 2014](https://reader037.fdocuments.in/reader037/viewer/2022110104/56815957550346895dc69303/html5/thumbnails/134.jpg)
![Page 135: David G. Brown and Frank Samuelson Center for Devices and Radiological Health, FDA 6 July 2014](https://reader037.fdocuments.in/reader037/viewer/2022110104/56815957550346895dc69303/html5/thumbnails/135.jpg)
![Page 136: David G. Brown and Frank Samuelson Center for Devices and Radiological Health, FDA 6 July 2014](https://reader037.fdocuments.in/reader037/viewer/2022110104/56815957550346895dc69303/html5/thumbnails/136.jpg)
![Page 137: David G. Brown and Frank Samuelson Center for Devices and Radiological Health, FDA 6 July 2014](https://reader037.fdocuments.in/reader037/viewer/2022110104/56815957550346895dc69303/html5/thumbnails/137.jpg)
![Page 138: David G. Brown and Frank Samuelson Center for Devices and Radiological Health, FDA 6 July 2014](https://reader037.fdocuments.in/reader037/viewer/2022110104/56815957550346895dc69303/html5/thumbnails/138.jpg)
![Page 139: David G. Brown and Frank Samuelson Center for Devices and Radiological Health, FDA 6 July 2014](https://reader037.fdocuments.in/reader037/viewer/2022110104/56815957550346895dc69303/html5/thumbnails/139.jpg)
![Page 140: David G. Brown and Frank Samuelson Center for Devices and Radiological Health, FDA 6 July 2014](https://reader037.fdocuments.in/reader037/viewer/2022110104/56815957550346895dc69303/html5/thumbnails/140.jpg)
![Page 141: David G. Brown and Frank Samuelson Center for Devices and Radiological Health, FDA 6 July 2014](https://reader037.fdocuments.in/reader037/viewer/2022110104/56815957550346895dc69303/html5/thumbnails/141.jpg)
![Page 142: David G. Brown and Frank Samuelson Center for Devices and Radiological Health, FDA 6 July 2014](https://reader037.fdocuments.in/reader037/viewer/2022110104/56815957550346895dc69303/html5/thumbnails/142.jpg)
![Page 143: David G. Brown and Frank Samuelson Center for Devices and Radiological Health, FDA 6 July 2014](https://reader037.fdocuments.in/reader037/viewer/2022110104/56815957550346895dc69303/html5/thumbnails/143.jpg)
![Page 144: David G. Brown and Frank Samuelson Center for Devices and Radiological Health, FDA 6 July 2014](https://reader037.fdocuments.in/reader037/viewer/2022110104/56815957550346895dc69303/html5/thumbnails/144.jpg)
Previous class results
![Page 145: David G. Brown and Frank Samuelson Center for Devices and Radiological Health, FDA 6 July 2014](https://reader037.fdocuments.in/reader037/viewer/2022110104/56815957550346895dc69303/html5/thumbnails/145.jpg)
![Page 146: David G. Brown and Frank Samuelson Center for Devices and Radiological Health, FDA 6 July 2014](https://reader037.fdocuments.in/reader037/viewer/2022110104/56815957550346895dc69303/html5/thumbnails/146.jpg)
![Page 147: David G. Brown and Frank Samuelson Center for Devices and Radiological Health, FDA 6 July 2014](https://reader037.fdocuments.in/reader037/viewer/2022110104/56815957550346895dc69303/html5/thumbnails/147.jpg)
![Page 148: David G. Brown and Frank Samuelson Center for Devices and Radiological Health, FDA 6 July 2014](https://reader037.fdocuments.in/reader037/viewer/2022110104/56815957550346895dc69303/html5/thumbnails/148.jpg)
![Page 149: David G. Brown and Frank Samuelson Center for Devices and Radiological Health, FDA 6 July 2014](https://reader037.fdocuments.in/reader037/viewer/2022110104/56815957550346895dc69303/html5/thumbnails/149.jpg)