Chapter 5 – Evaluating Predictive Performance
Data Mining for Business Analytics – Shmueli, Patel & Bruce
Why Evaluate?
- Multiple methods are available to classify or predict
- For each method, multiple choices are available for settings
- To choose the best model, we need to assess each model's performance
Types of Outcome
- Predicted numerical value: outcome variable is numerical and continuous
- Predicted class membership: outcome variable is categorical
- Propensity: probability of class membership when the outcome variable is categorical
Accuracy Measures (Continuous outcome)
Evaluating Predictive Performance
- Not the same as goodness-of-fit (R², S_Y/X)
- Predictive performance is measured using the validation dataset
- Benchmark: using the average, ȳ, as the prediction
Prediction Accuracy Measures
e_i = y_i – ŷ_i, where y_i = actual y value and ŷ_i = predicted y value
- MAE (Mean Absolute Error) = (1/n) Σ |e_i|; also known as MAD (Mean Absolute Deviation)
- Average Error = (1/n) Σ e_i
- MAPE (Mean Absolute Percent Error) = 100% × (1/n) Σ |e_i / y_i|
- RMSE (Root-Mean-Squared Error) = √[(1/n) Σ e_i²]
- Total SSE (Total Sum of Squared Errors) = Σ e_i²
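The measures above can be sketched in a few lines of Python (a minimal illustration with NumPy; the function name is my own, and tools such as XLMiner compute these directly):

```python
import numpy as np

def accuracy_measures(y, y_hat):
    """Prediction accuracy measures for a numerical outcome (sketch)."""
    y = np.asarray(y, dtype=float)
    e = y - np.asarray(y_hat, dtype=float)          # e_i = y_i - yhat_i
    return {
        "MAE":      np.mean(np.abs(e)),             # mean absolute error (MAD)
        "AvgError": np.mean(e),                     # average error
        "MAPE":     100 * np.mean(np.abs(e / y)),   # mean absolute percent error
        "RMSE":     np.sqrt(np.mean(e ** 2)),       # root-mean-squared error
        "TotalSSE": np.sum(e ** 2),                 # total sum of squared errors
    }
```

Applied to the validation residuals, these are the summary values reported in regression output.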
XLMiner – Example: Multiple Regression using the Boston Housing dataset
XLMiner output: residuals for the training dataset and for the validation dataset
SAS Enterprise Miner – Example: Multiple Regression using the Boston Housing dataset
SAS Enterprise Miner output: residuals for the validation dataset; boxplot of residuals
Lift Chart
- Order records by predicted y-value from highest to lowest
- X-axis: cases from 1 to n
- Y-axis: cumulative predicted value of Y
- Two lines are plotted: one using the average ȳ as the predicted value (the benchmark), and one using ŷ found using the prediction model
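The construction above can be sketched as follows (a minimal illustration; the names are mine, and the plotting step itself is omitted):

```python
def lift_chart_lines(y_hat, y_actual):
    """Return the two lift-chart lines: cumulative model predictions (sorted
    highest to lowest) and the cumulative average-based benchmark."""
    y_bar = sum(y_actual) / len(y_actual)       # benchmark prediction: the average
    model_line, base_line = [], []
    cum_model = cum_base = 0.0
    for y_pred in sorted(y_hat, reverse=True):  # highest to lowest
        cum_model += y_pred
        cum_base += y_bar
        model_line.append(cum_model)
        base_line.append(cum_base)
    return model_line, base_line
```

The further the model line rises above the benchmark line, the better the model separates high values from low ones.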
XLMiner: Lift Chart
Decile-wise Lift Chart
- Order records by predicted y-value from highest to lowest
- X-axis: % of cases from 10% to 100%, i.e., 1st decile to 10th decile
- Y-axis: for each decile, the ratio of the sum of ŷ to the sum of ȳ is plotted
XLMiner: Decile-wise Lift Chart
Accuracy Measures (Classification)
Misclassification error
Error = classifying a record as belonging to one class when it belongs to another class.
Error rate = percent of misclassified records out of the total records in the validation data
Benchmark: Naïve Rule
- Naïve rule: classify all records as belonging to the most prevalent class
- Used only as a benchmark to evaluate more complex classifiers
Separation of Records
- "High separation of records" means that using the predictor variables attains low error
- "Low separation of records" means that using the predictor variables does not improve much on the naïve rule
High Separation of Records
Low Separation of Records
Classification Confusion Matrix
201 1’s correctly classified as “1” (True positives)
85 1’s incorrectly classified as “0” (False negatives)
25 0’s incorrectly classified as “1” (False positives)
2689 0’s correctly classified as “0” (True negatives)
                  Predicted "1"   Predicted "0"
Actual "1"             201              85
Actual "0"              25            2689
Error Rate
- Overall error rate = (25 + 85)/3000 = 3.67%
- Accuracy = 1 – err = (201 + 2689)/3000 = 96.33%
- With multiple classes, the error rate is (sum of misclassified records)/(total records)

                  Predicted "1"   Predicted "0"
Actual "1"             201              85
Actual "0"              25            2689
Classification Matrix: Meaning of Each Cell
- n1,1 = number of C1 cases classified correctly as C1
- n1,2 = number of C1 cases classified incorrectly as C2
- n2,1 = number of C2 cases classified incorrectly as C1
- n2,2 = number of C2 cases classified correctly as C2

Misclassification rate = err = (n1,2 + n2,1) / n
Accuracy = 1 – err = (n1,1 + n2,2) / n
where n = n1,1 + n1,2 + n2,1 + n2,2 is the total number of cases.
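A minimal sketch of these cell counts and error measures (the function names are illustrative):

```python
def confusion_counts(actual, predicted, c1=1, c2=0):
    """Cell counts n1,1 .. n2,2 of the classification matrix."""
    pairs = list(zip(actual, predicted))
    n11 = sum(1 for a, p in pairs if a == c1 and p == c1)  # C1 classified as C1
    n12 = sum(1 for a, p in pairs if a == c1 and p == c2)  # C1 classified as C2
    n21 = sum(1 for a, p in pairs if a == c2 and p == c1)  # C2 classified as C1
    n22 = sum(1 for a, p in pairs if a == c2 and p == c2)  # C2 classified as C2
    return n11, n12, n21, n22

def error_rate(n11, n12, n21, n22):
    return (n12 + n21) / (n11 + n12 + n21 + n22)  # accuracy = 1 - error rate
```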
Propensity
- Propensities are estimated probabilities that a case belongs to each of the classes
- They are used in two ways:
  - to generate predicted class membership
  - to rank-order cases by probability of belonging to a particular class of interest
Propensity and Cutoff for Classification
Most data mining algorithms classify via a two-step process. For each case:
1. Compute the probability of belonging to the class of interest
2. Compare it to the cutoff value, and classify accordingly

- The default cutoff value is 0.50: if the probability is >= 0.50, classify as "1"; if < 0.50, classify as "0"
- Different cutoff values can be used; typically, the error rate is lowest for cutoff = 0.50
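The two-step process above amounts to a one-line comparison (a sketch; the name `classify` is mine):

```python
def classify(propensities, cutoff=0.50):
    """Classify as "1" when the propensity meets the cutoff, else "0"."""
    return [1 if p >= cutoff else 0 for p in propensities]
```

Raising the cutoff from 0.50 to 0.80 classifies fewer records as "1", trading false positives for false negatives.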
Cutoff Table

Actual Class   Prob. of "1"      Actual Class   Prob. of "1"
     1            0.996               1            0.506
     1            0.988               0            0.471
     1            0.984               0            0.337
     1            0.980               1            0.218
     1            0.948               0            0.199
     1            0.889               0            0.149
     1            0.848               0            0.048
     0            0.762               0            0.038
     1            0.707               0            0.025
     1            0.681               0            0.022
     1            0.656               0            0.016
     0            0.622               0            0.004

If the cutoff is 0.50: thirteen records are classified as "1"
If the cutoff is 0.80: seven records are classified as "1"
Confusion Matrix for Different Cutoffs
Cutoff = 0.25:

                    Predicted owner   Predicted non-owner
Actual owner              11                   1
Actual non-owner           4                   8

Cutoff = 0.75:

                    Predicted owner   Predicted non-owner
Actual owner               7                   5
Actual non-owner           1                  11
When One Class is More Important
In many cases it is more important to correctly identify members of one class:
- Tax fraud
- Credit default
- Response to a promotional offer
- Predicting delayed flights
In such cases, overall accuracy is not a good measure for evaluating the classifier; use sensitivity and specificity instead.
Sensitivity
Suppose that C1 is the important class. Sensitivity is the ability (probability) to detect membership in class C1 correctly:

Sensitivity = n1,1 / (n1,1 + n1,2)

Also called the hit rate or true positive rate.
Specificity
Suppose that C1 is the important class. Specificity is the ability (probability) to correctly rule out members of class C2:

Specificity = n2,2 / (n2,1 + n2,2)

1 – Specificity = false positive rate.
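Both measures follow directly from the cell counts n1,1 .. n2,2 defined earlier (a sketch, with C1 as the important class):

```python
def sensitivity(n11, n12):
    """Share of actual C1 cases detected: the hit rate / true positive rate."""
    return n11 / (n11 + n12)

def specificity(n21, n22):
    """Share of actual C2 cases ruled out; 1 - specificity = false positive rate."""
    return n22 / (n21 + n22)
```

For the earlier confusion matrix (201, 85, 25, 2689), sensitivity is about 0.70 even though overall accuracy is 96.33%, which is exactly why accuracy alone can mislead.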
ROC Curve
The Receiver Operating Characteristic curve plots sensitivity against 1 – specificity as the cutoff is varied.
Asymmetric Misclassification Costs
Misclassification Costs May Differ
The cost of making a misclassification error may be higher for one class than the other(s)
Looked at another way, the benefit of making a correct classification may be higher for one class than the other(s)
Example: Response to a Promotional Offer
Suppose we send an offer to 1000 people, with a 1% average response rate ("1" = response, "0" = nonresponse).
The "naïve rule" (classify everyone as "0") has an error rate of 1% (seems good):

             Predict Class 0   Predict Class 1
Actual 0          990                 0
Actual 1           10                 0
The Confusion Matrix
Suppose that using data mining we can correctly classify eight 1's as 1's. It comes at the cost of misclassifying twenty 0's as 1's and two 1's as 0's:

             Predict Class 0   Predict Class 1
Actual 0          970                20
Actual 1            2                 8

Error rate = (2 + 20)/1000 = 2.2% (higher than the naïve rate)
Introducing Costs & Benefits
Suppose: profit from a "1" is $10; cost of sending an offer is $1.
Then:
- Under the naïve rule, all are classified as "0", so no offers are sent: no cost, no profit
- Under the DM predictions, 28 offers are sent:
  - 8 respond, with a profit of $10 each
  - 20 fail to respond, at a cost of $1 each
  - 972 receive nothing (no cost, no profit)
- Net profit = $80 - $20 = $60
Profit Matrix

             Predict Class 0   Predict Class 1
Actual 0          $0               -$20
Actual 1          $0                $80
Minimize Opportunity Costs
- As we see, it is best to convert everything to costs, as opposed to a mix of costs and benefits
- E.g., instead of "benefit from sale," refer to "opportunity cost of lost sale"
- This leads to the same decisions, but referring only to costs allows greater applicability
Cost Matrix with Opportunity Costs
Recall the original confusion matrix (profit from a "1" = $10, cost of sending an offer = $1):

             Predict Class 0   Predict Class 1
Actual 0          970                20
Actual 1            2                 8

Opportunity costs:

             Predict Class 0    Predict Class 1
Actual 0     970 × $0 = $0      20 × $1 = $20
Actual 1     2 × $10 = $20      8 × $1 = $8

Total opportunity cost = 0 + 20 + 20 + 8 = $48
Average Misclassification Cost
- q1 = cost of misclassifying an actual C1 case as belonging to C2
- q2 = cost of misclassifying an actual C2 case as belonging to C1

Average misclassification cost = (q1 n1,2 + q2 n2,1) / n

Look for a classifier that minimizes this average cost.
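As a sketch, using the cell counts n1,2 and n2,1 from the confusion matrix (the function name is mine):

```python
def avg_misclassification_cost(n11, n12, n21, n22, q1, q2):
    """Average misclassification cost = (q1*n1,2 + q2*n2,1) / n."""
    n = n11 + n12 + n21 + n22
    return (q1 * n12 + q2 * n21) / n
```

For the promotional-offer matrix, with C1 = responders (n1,1 = 8, n1,2 = 2, n2,1 = 20, n2,2 = 970), q1 = $10, q2 = $1, this gives (10·2 + 1·20)/1000 = $0.04 per record.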
Generalize to Cost Ratio
- Sometimes actual costs and benefits are hard to estimate
- Need to express everything in terms of costs (i.e., cost of misclassification per record)
- A good practical substitute for individual costs is the ratio of misclassification costs (e.g., "misclassifying fraudulent firms is 5 times worse than misclassifying solvent firms")
Multiple Classes
- Theoretically, there are m(m – 1) misclassification costs, since a case from any one of the m classes could be misclassified into any one of the m – 1 other classes
- Practically, too many to work with
- In a decision-making context, though, such complexity rarely arises: one class is usually of primary interest
- For m classes, the confusion matrix has m rows and m columns
Judging Ranking Performance
Lift Chart for Binary Data
- Input: a scored validation dataset containing the actual class and the propensity (probability) of belonging to the class of interest, C1
- Sort records in descending order of propensity to belong to the class of interest
- Compute the cumulative number of C1 members for each row
- The lift chart is the plot with row number (no. of records) on the x-axis and cumulative number of C1 members on the y-axis
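The steps above can be sketched as follows (plotting omitted; the returned pairs are the lift-curve points, and the name is mine):

```python
def lift_points(propensity, actual):
    """Sort by propensity (descending) and accumulate class-of-interest counts."""
    ranked = sorted(zip(propensity, actual), key=lambda t: t[0], reverse=True)
    points, cum = [], 0
    for row, (_, a) in enumerate(ranked, start=1):
        cum += a                      # actual is 1 for the class of interest
        points.append((row, cum))
    return points
```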
Lift Chart for Binary Data – Example

Case No.   Propensity   Actual class   Cumulative actual classes
   1       0.9959767         1                  1
   2       0.9875331         1                  2
   3       0.9844564         1                  3
   4       0.9804396         1                  4
   5       0.9481164         1                  5
   6       0.8892972         1                  6
   7       0.8476319         1                  7
   8       0.7628063         0                  7
   9       0.7069919         1                  8
  10       0.6807541         1                  9
  11       0.6563437         1                 10
  12       0.6224195         0                 10
  13       0.5055069         1                 11
  14       0.4713405         0                 11
  15       0.3371174         0                 11
  16       0.2179678         1                 12
  17       0.1992404         0                 12
  18       0.1494827         0                 12
  19       0.0479626         0                 12
  20       0.0383414         0                 12
  21       0.0248510         0                 12
  22       0.0218060         0                 12
  23       0.0161299         0                 12
  24       0.0035600         0                 12

(Propensity = predicted probability of belonging to class "1".)
Lift Chart with Costs and Benefits
- Sort records in descending order of probability of success (success = belonging to the class of interest)
- For each record, compute the cost/benefit associated with the actual outcome
- Compute a column of cumulative cost/benefit
- Plot cumulative cost/benefit on the y-axis against row number (no. of records) on the x-axis
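A sketch of the same construction with per-class cost/benefit values (the defaults assume the earlier example's $10 benefit per "1" and $1 cost per "0"; names are mine):

```python
def cost_benefit_lift(propensity, actual, benefit=10, cost=-1):
    """Cumulative cost/benefit, with records sorted by propensity (descending)."""
    ranked = sorted(zip(propensity, actual), key=lambda t: t[0], reverse=True)
    points, cum = [], 0
    for row, (_, a) in enumerate(ranked, start=1):
        cum += benefit if a == 1 else cost
        points.append((row, cum))
    return points
```

The row where the cumulative value peaks suggests how far down the ranked list it pays to go.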
Lift Chart with Cost/Benefit – Example

Case No.   Propensity   Actual class   Cost/Benefit   Cumulative cost/benefit
   1       0.9959767         1              10                 10
   2       0.9875331         1              10                 20
   3       0.9844564         1              10                 30
   4       0.9804396         1              10                 40
   5       0.9481164         1              10                 50
   6       0.8892972         1              10                 60
   7       0.8476319         1              10                 70
   8       0.7628063         0              -1                 69
   9       0.7069919         1              10                 79
  10       0.6807541         1              10                 89
  11       0.6563437         1              10                 99
  12       0.6224195         0              -1                 98
  13       0.5055069         1              10                108
  14       0.4713405         0              -1                107
  15       0.3371174         0              -1                106
  16       0.2179678         1              10                116
  17       0.1992404         0              -1                115
  18       0.1494827         0              -1                114
  19       0.0479626         0              -1                113
  20       0.0383414         0              -1                112
  21       0.0248510         0              -1                111
  22       0.0218060         0              -1                110
  23       0.0161299         0              -1                109
  24       0.0035600         0              -1                108

(Propensity = predicted probability of belonging to class "1".)
Lift Curve May Go Negative
- If the total net benefit from all cases is negative, the reference line will have a negative slope
- Nonetheless, the goal is still to use the cutoff to select the point where net benefit is at a maximum
Negative slope to reference curve
Oversampling and Asymmetric Costs
Rare Cases
Examples: a responder to a mailing, someone who commits fraud, a debt defaulter
- Often we oversample rare cases to give the model more information to work with
- Typically use 50% "1" and 50% "0" for training
- Asymmetric costs/benefits typically go hand in hand with the presence of a rare but important class
Example
The following graphs show optimal classification under three scenarios:
1. assuming equal costs of misclassification
2. assuming that misclassifying "o" costs five times as much as misclassifying "x"
3. an oversampling scheme allowing DM methods to incorporate asymmetric costs
Classification: equal costs
Classification: Unequal Costs
Suppose that failing to catch "o" is 5 times as costly as failing to catch "x".
Oversampling for Asymmetric Costs
Oversample "o" to appropriately weight the misclassification costs (with or without replacement)
Equal Number of Responders
Sample an equal number of responders and non-responders
An Oversampling Procedure
1. Separate the responders (rare) from the non-responders
2. Randomly assign half the responders to the training sample, plus an equal number of non-responders
3. The remaining responders go to the validation sample
4. Add non-responders to the validation data, to maintain the original ratio of responders to non-responders
5. Randomly take a test set (if needed) from the validation data
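The first four steps above might be sketched as follows (illustrative names; records are represented simply as list elements, and the optional test-set step is omitted):

```python
import random

def oversample_split(responders, nonresponders, seed=1):
    """Steps 1-4 of the oversampling procedure above (sketch)."""
    rng = random.Random(seed)
    resp, nonresp = responders[:], nonresponders[:]
    rng.shuffle(resp)
    rng.shuffle(nonresp)
    half = len(resp) // 2
    train = resp[:half] + nonresp[:half]          # step 2: 50%/50% training mix
    valid_resp = resp[half:]                      # step 3
    # step 4: enough non-responders to restore the original class ratio
    ratio = len(nonresponders) / len(responders)
    n_nonresp = round(len(valid_resp) * ratio)
    valid = valid_resp + nonresp[half:half + n_nonresp]
    return train, valid
```

With 10 responders and 100 non-responders, this yields a balanced training set of 10 records and a validation set of 55 records at the original 1:10 ratio.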
Assessing Model Performance
1. Score the model with a validation dataset selected without oversampling
2. Score the model with an oversampled validation dataset and reweight the results to remove the effects of oversampling
Method 1 is straightforward and easier to implement.
Adjusting the Confusion Matrix for Oversampling

               Whole data   Sample
Responders         2%        50%
Nonresponders     98%        50%

Example:
- One responder in the whole data corresponds to 50/2 = 25 responders in the sample
- One nonresponder in the whole data corresponds to 50/98 = 0.5102 nonresponders in the sample
Suppose that the confusion matrix with the oversampled validation dataset is as follows:

Classification matrix, oversampled data (validation)

             Predicted 0   Predicted 1   Total
Actual 0         390           110        500
Actual 1          80           420        500
Total            470           530       1000

Misclassification rate = (80 + 110)/1000 = 0.19, or 19%
Percentage of records predicted as "1" = 530/1000 = 0.53, or 53%
Classification matrix, reweighted
(weight for a responder (Actual 1) = 25; weight for a nonresponder (Actual 0) = 0.5102)

             Predicted 0            Predicted 1            Total
Actual 0     390/0.5102 = 764.4     110/0.5102 = 215.6      980
Actual 1     80/25 = 3.2            420/25 = 16.8            20
Total        767.6                  232.4                  1000

Misclassification rate = (3.2 + 215.6)/1000 = 0.219, or 21.9%
Percentage of records predicted as "1" = 232.4/1000 = 0.2324, or 23.24%
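The reweighting can be sketched as dividing each actual-class row by its class's oversampling factor (values from the 2%-responder example above; names are mine):

```python
def reweight_confusion(matrix, factors):
    """matrix[actual_class] = [pred-0 count, pred-1 count]; divide each row by
    that class's oversampling factor to recover whole-data proportions."""
    return {cls: [count / factors[cls] for count in row]
            for cls, row in matrix.items()}

oversampled = {0: [390, 110], 1: [80, 420]}
reweighted = reweight_confusion(oversampled, {0: 50 / 98, 1: 25})
```

The reweighted misclassification rate, (3.2 + 215.6)/1000, comes to 21.9% rather than the oversampled 19%.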
Adjusting the Lift Curve for Oversampling
- Sort records in descending order of probability of success (success = belonging to the class of interest)
- For each record, compute the cost/benefit associated with the actual outcome
- Divide that value by the oversampling rate of the record's actual class
- Compute a cumulative column of the weighted cost/benefit
- Plot the cumulative weighted cost/benefit on the y-axis against row number (no. of records) on the x-axis
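Sketched in code (per-class cost/benefit values and oversampling weights are passed in as dictionaries keyed by actual class; the name is mine):

```python
def adjusted_lift(propensity, actual, cost_benefit, weight):
    """Cumulative weighted cost/benefit: each record's cost/benefit is divided
    by the oversampling weight of its actual class before accumulating."""
    ranked = sorted(zip(propensity, actual), key=lambda t: t[0], reverse=True)
    points, cum = [], 0.0
    for row, (_, a) in enumerate(ranked, start=1):
        cum += cost_benefit[a] / weight[a]
        points.append((row, cum))
    return points
```

With cost/benefit of 50 for a "1" and -3 for a "0", and weights 20 and 0.6, each "1" contributes 50/20 = 2.50 and each "0" contributes -3/0.6 = -5.00, matching the worked example.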
Adjusting the Lift Curve for Oversampling – Example

Suppose that the cost/benefit values and oversampling weights are as follows:

           Cost/Benefit   Oversampling weight
Actual 0       -3                0.6
Actual 1       50               20

Case No.   Propensity   Actual class   Weighted cost/benefit   Cumulative weighted cost/benefit
   1       0.9959767         1                 2.50                    2.50
   2       0.9875331         1                 2.50                    5.00
   3       0.9844564         1                 2.50                    7.50
   4       0.9804396         1                 2.50                   10.00
   5       0.9481164         1                 2.50                   12.50
   6       0.8892972         1                 2.50                   15.00
   7       0.8476319         1                 2.50                   17.50
   8       0.7628063         0                -5.00                   12.50
   9       0.7069919         1                 2.50                   15.00
  10       0.6807541         1                 2.50                   17.50
  11       0.6563437         1                 2.50                   20.00
  12       0.6224195         0                -5.00                   15.00
  13       0.5055069         1                 2.50                   17.50
  14       0.4713405         0                -5.00                   12.50
  15       0.3371174         0                -5.00                    7.50
  16       0.2179678         1                 2.50                   10.00
  17       0.1992404         0                -5.00                    5.00
  18       0.1494827         0                -5.00                    0.00
  19       0.0479626         0                -5.00                   -5.00
  20       0.0383414         0                -5.00                  -10.00
  21       0.0248510         0                -5.00                  -15.00
  22       0.0218060         0                -5.00                  -20.00
  23       0.0161299         0                -5.00                  -25.00
  24       0.0035600         0                -5.00                  -30.00

(Propensity = predicted probability of belonging to class "1".)
Adjusting the Lift Curve for Oversampling – Example (chart)