ROC
Statistics for the Lazy Machine Learner in All of Us

Bradley Malin
Lecture for COS Lab

School of Computer Science
Carnegie Mellon University
9/22/2005
Why Should I Care?
• Imagine you have 2 different probabilistic classification models
  – e.g. logistic regression vs. neural network
• How do you know which one is better?
• How do you communicate your belief?
• Can you provide quantitative evidence beyond a gut feeling and subjective interpretation?
Recall Basics: Contingencies
| GOLD STANDARD (TRUTH) | MODEL: It's NOT a Heart Attack | MODEL: Heart Attack!!! |
| --- | --- | --- |
| Was NOT a Heart Attack | A | B |
| Was a Heart Attack | C | D |
Some Terms
| GOLD STANDARD (TRUTH) | MODEL: NO EVENT | MODEL: EVENT |
| --- | --- | --- |
| NO EVENT | TRUE NEGATIVE | B |
| EVENT | C | TRUE POSITIVE |
Some More Terms
| GOLD STANDARD (TRUTH) | MODEL: NO EVENT | MODEL: EVENT |
| --- | --- | --- |
| NO EVENT | A | FALSE POSITIVE (Type 1 Error) |
| EVENT | FALSE NEGATIVE (Type 2 Error) | D |
Accuracy
• What does this mean?
• What is the difference between “accuracy” and an “accurate prediction”?
• Contingency Table Interpretation
Accuracy = [(True Positives) + (True Negatives)] / [(True Positives) + (True Negatives) + (False Positives) + (False Negatives)]
• Is this a good measure? (Why or Why Not?)
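The contingency-table interpretation above can be sketched directly; the imbalanced example in the second call hints at why accuracy alone can mislead (the counts are hypothetical, chosen for illustration):

```python
def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that were correct."""
    return (tp + tn) / (tp + tn + fp + fn)

# A model that never predicts "heart attack" still scores 95%
# on a sample with 95 true negatives and 5 missed events.
always_no = accuracy(0, 95, 0, 5)
```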
Note on Discrete Classes
• TRADITION … show a contingency table when reporting a model's predictions.

• BUT … probabilistic models do not directly produce discrete counts for the matrix cells!!!

• IN OTHER WORDS … regression does not report the number of individuals predicted positive (e.g. had a heart attack) … well, not directly

• INSTEAD … it reports the probability that the output takes a certain value (e.g. 1 or 0)
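To fill a contingency table from a probabilistic model you must first pick a threshold; a minimal sketch (the 0.5 cutoff here is an arbitrary illustrative choice, not a recommendation):

```python
def discretize(probabilities, threshold=0.5):
    """Turn probabilistic outputs into discrete 0/1 class predictions."""
    return [1 if p >= threshold else 0 for p in probabilities]
```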
Visual Perspective
[Figure: histogram of model scores (frequency vs. score, roughly -5 to 3) for two overlapping groups, No Disease and Disease; the overlap region is where the thresholding question arises.]
ROC Curves
• Originated from signal detection theory
  – Binary signal corrupted by Gaussian noise
  – What is the optimal threshold (i.e. operating point)?

• Dependence on 3 factors
  – Signal strength
  – Noise variance
  – Personal tolerance for the hit / false-alarm rate
ROC Curves
• Receiver operating characteristic

• Summarizes & presents the performance of any binary classification model

• Measures the model's ability to distinguish true positives from false positives
Use Multiple Contingency Tables
• Sample contingency tables across a range of thresholds on the predicted probability.

• TRUE POSITIVE RATE (also called SENSITIVITY):

  (True Positives) / [(True Positives) + (False Negatives)]

• FALSE POSITIVE RATE (also called 1 - SPECIFICITY):

  (False Positives) / [(False Positives) + (True Negatives)]

• Plot sensitivity vs. (1 - specificity) for each sampled threshold and you are done.
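The two rates can be read straight off the contingency-table cells; a minimal sketch:

```python
def true_positive_rate(tp, fn):
    """Sensitivity: the fraction of actual events the model catches."""
    return tp / (tp + fn)

def false_positive_rate(fp, tn):
    """1 - specificity: the fraction of non-events the model falsely flags."""
    return fp / (fp + tn)
```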
Data-Centric Example

| TRUTH | LOGISTIC | NEURAL |
| --- | --- | --- |
| 1 | 0.7198 | 0.9038 |
| 0 | 0.2460 | 0.8455 |
| 0 | 0.1219 | 0.4655 |
| 0 | 0.1560 | 0.3204 |
| 0 | 0.7527 | 0.2491 |
| 1 | 0.3064 | 0.7129 |
| 0 | 0.7194 | 0.4983 |
| 0 | 0.5531 | 0.6513 |
| 1 | 0.2173 | 0.3806 |
| 0 | 0.0839 | 0.1619 |
| 1 | 0.8429 | 0.7028 |
ROC Rates

| THRESHOLD | LOGISTIC TP-Rate | LOGISTIC FP-Rate | NEURAL TP-Rate | NEURAL FP-Rate |
| --- | --- | --- | --- | --- |
| 1 | 1 | 1 | 1 | 1 |
| 0.9 | 1 | 0.8571 | 1 | 1 |
| 0.8 | 1 | 0.5714 | 1 | 0.8571 |
| 0.7 | 0.75 | 0.4286 | 1 | 0.7143 |
| 0.6 | 0.5 | 0.4286 | 0.75 | 0.5714 |
| 0.5 | 0.5 | 0.4286 | 0.75 | 0.2857 |
| 0.4 | 0.5 | 0.2857 | 0.75 | 0.2857 |
| 0.3 | 0.5 | 0.2857 | 0.75 | 0.1429 |
| 0.2 | 0.25 | 0 | 0.25 | 0.1429 |
| 0.1 | 0 | 0 | 0.25 | 0 |
| 0 | 0 | 0 | 0 | 0 |
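The slide tabulates the rates at coarse thresholds 0, 0.1, …, 1 using its own convention. A minimal Python sketch that instead sweeps a threshold over the observed scores (predicting positive whenever score ≥ threshold) gives the exact step-function ROC and its trapezoidal area; the areas it yields on this data (about 0.714 for logistic, 0.786 for neural) differ slightly from the slide's coarse-grid values but give the same ranking:

```python
# Truth labels and scores from the data-centric example above.
TRUTH = [1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1]
LOGISTIC = [0.7198, 0.2460, 0.1219, 0.1560, 0.7527, 0.3064,
            0.7194, 0.5531, 0.2173, 0.0839, 0.8429]
NEURAL = [0.9038, 0.8455, 0.4655, 0.3204, 0.2491, 0.7129,
          0.4983, 0.6513, 0.3806, 0.1619, 0.7028]

def roc_points(truth, scores):
    """(FPR, TPR) pairs from sweeping a threshold over the observed scores,
    predicting positive whenever score >= threshold."""
    pos = sum(truth)
    neg = len(truth) - pos
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(truth, scores) if y == 1 and s >= t)
        fp = sum(1 for y, s in zip(truth, scores) if y == 0 and s >= t)
        points.append((fp / neg, tp / pos))
    return points

def auc(points):
    """Trapezoidal rule over the (FPR, TPR) points (what Matlab's trapz does)."""
    return sum((x2 - x1) * (y1 + y2) / 2.0
               for (x1, y1), (x2, y2) in zip(points, points[1:]))
```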
ROC Plot
[Figure: two ROC plots, sensitivity vs. 1-specificity on [0, 1] axes, one for the LOGISTIC model and one for the NEURAL model.]
Sidebar: Use More Samples
[Figure: two ROC plots (sensitivity vs. 1-specificity); legends: RM and RM_AGE41 in one panel, Deviance Model and RM+Age41 in the other.]
(These are plots from a much larger dataset – See Malin 2005)
![Page 16: ROC Statistics for the Lazy Machine Learner in All of Us Bradley Malin Lecture for COS Lab School of Computer Science Carnegie Mellon University 9/22/2005.](https://reader035.fdocuments.in/reader035/viewer/2022062714/56649d0e5503460f949e3638/html5/thumbnails/16.jpg)
ROC Quantification
• Area Under the ROC Curve
  – Use quadrature to calculate the area
  – e.g. the trapz (trapezoidal rule) function in Matlab will work

• Example – the "Neural Network" model appears to be better.

| MODEL | AREA UNDER ROC CURVE |
| --- | --- |
| LOGISTIC | 0.7321 |
| NEURAL | 0.7679 |
Theory: Model Optimality
• Classifiers on the convex hull are always "optimal"
  – e.g. the neural net & decision tree

• Classifiers below the convex hull are always "suboptimal"
  – e.g. naïve Bayes

[Figure: ROC space showing the Naïve Bayes curve below the convex hull of the Decision Tree and Neural Net curves.]
Building Better Classifiers
• Classifiers on the convex hull can be combined to form a strictly dominant hybrid classifier
  – an ordered sequence of classifiers
  – can be converted into a "ranker"

[Figure: the convex hull of the Decision Tree and Neural Net ROC curves.]
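One standard way to realize the combination is to randomize between the two hull classifiers: deciding each case with classifier A with probability p lands, in expectation, on the line segment between their ROC points. A minimal sketch (the two operating points below are hypothetical, not taken from the figure):

```python
def hybrid_point(point_a, point_b, p):
    """Expected (FPR, TPR) of a hybrid that follows classifier A with
    probability p and classifier B otherwise: a point on the line
    segment between the two operating points."""
    (fa, ta), (fb, tb) = point_a, point_b
    return (p * fa + (1 - p) * fb, p * ta + (1 - p) * tb)

# Hypothetical operating points for the two hull classifiers.
TREE = (0.1, 0.6)
NET = (0.4, 0.9)
```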
Some Statistical Insight

• Curve area:
  – Take a random healthy patient's score X
  – Take a random heart-attack patient's score Y
  – The area estimates P[Y > X]

• The slope of the curve equals the likelihood ratio:

  P(score | Signal) / P(score | Noise)

• The ROC graph captures all of the information in the contingency table
  – The false negative & true negative rates are the complements of the true positive & false positive rates, respectively
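The area-as-probability claim can be checked by brute-force pairwise counting on the earlier data-centric example; a minimal sketch (on this data the rank-based estimate for the logistic scores is 20/28, slightly different from the slide's coarse-grid trapezoid area):

```python
# Truth labels and logistic scores from the data-centric example.
TRUTH = [1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1]
LOGISTIC = [0.7198, 0.2460, 0.1219, 0.1560, 0.7527, 0.3064,
            0.7194, 0.5531, 0.2173, 0.0839, 0.8429]

def prob_event_outranks(truth, scores):
    """Estimate P[Y > X] for a random event score Y and non-event score X;
    ties count half (the Mann-Whitney statistic)."""
    pos = [s for y, s in zip(truth, scores) if y == 1]
    neg = [s for y, s in zip(truth, scores) if y == 0]
    wins = sum((ys > xs) + 0.5 * (ys == xs) for ys in pos for xs in neg)
    return wins / (len(pos) * len(neg))
```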
Can Always Quantify the Best Operating Point

• When misclassification costs are equal, the best operating point is …

• the 45° tangent to the curve, closest to the (0, 1) coordinate

• Verify this mathematically (economic interpretation)

• Why?
[Figure: ROC plot (sensitivity vs. 1-specificity); legend: RM, RM_AGE41.]
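The "closest to (0, 1)" criterion from the bullets can be sketched directly; a minimal example (the curve points below are illustrative, not taken from the figure):

```python
import math

def best_operating_point(points):
    """Among (FPR, TPR) points on a curve, pick the one closest to the
    ideal corner (0, 1): the equal-cost criterion from the slide."""
    return min(points, key=lambda p: math.hypot(p[0] - 0.0, p[1] - 1.0))

# A hypothetical ROC curve.
CURVE = [(0.0, 0.0), (0.1, 0.5), (0.2, 0.8), (0.5, 0.9), (1.0, 1.0)]
```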
Quick Question
• Are ROC curves always appropriate?
• Subjective operating points?
• You must weigh the tradeoffs between false positives and false negatives
  – the ROC curve plot is independent of the class distribution and error costs
• This leads into utility theory (not touching this today)
Much Much More on ROC
• Oh, if only I had more time.
• You should also look up and learn about:
  – Iso-accuracy lines

  – Skew distributions and why the 45° line isn't always "best"
– Convexity vs. non-convexity vs. concavity
– Mann-Whitney-Wilcoxon sum of ranks
– Gini coefficient
– Calibrated thresholds
– Averaging ROC curves
• Precision-Recall (THIS IS VERY IMPORTANT)
• Cost Curves
Some References

Good Bibliography: http://splweb.bwh.harvard.edu:8000/pages/ppl/zou/roc.html
Drummond C and Holte R. What ROC curves can and can’t do (and cost curves can). In Proceedings of the Workshop on ROC Analysis in AI; in conjunction with the European Conference on AI. Valencia, Spain. 2004.
Malin B. Probabilistic prediction of myocardial infarction: logistic regression versus simple neural networks. Data Privacy Lab Working Paper WP-25, School of Computer Science, Carnegie Mellon University. Sept 2005.
McNeil BJ, Hanley JA. Statistical approaches to the analysis of receiver operating characteristic (ROC) curves. Medical Decision Making. 1984; 4: 137-50.
Provost F and Fawcett T. The case against accuracy estimation for comparing induction algorithms. In Proceedings of the 15th International Conference on Machine Learning. Madison, Wisconsin. 1998: 445-453.
Swets J. Measuring the accuracy of diagnostic systems. Science. 1988; 240(4857): 1285-1293. (based on his 1967 book Information Retrieval Systems)