Assessment of Model Development Techniques and Evaluation Methods for Binary Classification in the Credit Industry
DSI Conference
Jennifer Lewis Priestley
Satish Nargundkar
November 24, 2003
Paper Research Questions
This paper addresses the following two research questions:
1. Does model development technique improve classification accuracy?
2. How will model selection vary based upon the evaluation method used?
Discussion Outline
Discussion of Modeling Techniques
Discussion of Model Evaluation Methods
- Global Classification Rate
- Loss Function
- K-S Test
- ROC Curves
Empirical Example
Model Development Techniques
Modeling plays an increasingly important role in CRM strategies:
Target Marketing: Response Models, Risk Models
Customer Behavioral Models: Usage Models, Attrition Models, Activation Models
Collections: Recovery Models
Other Models: Segmentation Models, Bankruptcy Models, Fraud Models

[Diagram: CRM value cycle: Product Planning, Customer Acquisition, Customer Management, Collections/Recovery, with Creating Value at the center]
Model Development Techniques
Given that even minimal improvements in model classification accuracy can translate into significant savings or incremental revenue, an entire literature exists on the comparison of model development techniques (e.g., Atiya, 2001; Reichert et al., 1983; West, 2000; Vellido et al., 1993; Zhang et al., 1999).

Statistical Techniques: Linear Discriminant Analysis, Logistic Analysis, Multiple Regression Analysis
Non-Statistical Techniques: Neural Networks, Cluster Analysis, Decision Trees
Model Evaluation Methods
But developing the model is really only half the problem. How do you then determine which model is “best”?
Model Evaluation Methods
In the context of binary classification (one of the most common objectives in CRM modeling), one of four outcomes is possible:
1. True positive
2. False positive
3. True negative
4. False negative

              True Good   True Bad
Pred. Good       TP          FP
Pred. Bad        FN          TN
Model Evaluation Methods
If all of these outcomes, specifically the errors, have the same associated costs, then a simple global classification rate is a highly appropriate evaluation method:

                 True Good   True Bad   Total
Predicted Good      650        200       850
Predicted Bad        50        100       150
Total               700        300      1000

Classification Rate = 75% ((650+100)/1000)
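The rate above is just correct predictions over all predictions. A minimal sketch, using the cell values from the example table (the helper name is ours):

```python
# Global classification rate from a 2x2 confusion matrix:
# fraction of all observations classified correctly.

def global_classification_rate(tp, fp, fn, tn):
    return (tp + tn) / (tp + fp + fn + tn)

# Cells from the example table: 650 true goods predicted good,
# 100 true bads predicted bad, 200 false positives, 50 false negatives.
rate = global_classification_rate(tp=650, fp=200, fn=50, tn=100)
print(f"{rate:.0%}")  # 75%
```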
The global classification method is the most commonly used (Bernardi and Zhang, 1999), but fails when the costs of the misclassification errors are different (Type I vs. Type II errors):

Model 1 results:
Global Classification Rate = 75%
False Positive Rate = 5%
False Negative Rate = 20%

Model 2 results:
Global Classification Rate = 80%
False Positive Rate = 15%
False Negative Rate = 5%

What if the cost of a false positive was great, and the cost of a false negative was negligible? What if it was the other way around?
Model Evaluation Methods
If the misclassification error costs are understood with some certainty, a loss function could be used to evaluate the best model:

Loss = π0·f0·c0 + π1·f1·c1

where πi is the prior probability that an element comes from class i, fi is the probability that an element of class i will be misclassified, and ci is the cost associated with that misclassification error.
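The loss function can be sketched directly from the formula. The priors, error rates, and cost values below are illustrative assumptions, not figures from the paper:

```python
# Expected loss: Loss = pi0*f0*c0 + pi1*f1*c1
# pi_i: prior probability of class i, f_i: misclassification rate for
# class i, c_i: cost of that error. All numbers below are assumed.

def expected_loss(priors, error_rates, costs):
    return sum(p * f * c for p, f, c in zip(priors, error_rates, costs))

priors = (0.7, 0.3)          # assumed P(good), P(bad)
costs = (1.0, 5.0)           # assumed: misclassifying a "bad" costs 5x more
model_a = expected_loss(priors, (0.20, 0.05), costs)  # 0.215
model_b = expected_loss(priors, (0.05, 0.15), costs)  # 0.260
```

With these costs, model A is the cheaper choice despite its worse false negative rate; flip the cost ratio and the ranking flips too.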
Model Evaluation Methods
An evaluation method that uses the same conceptual foundation as the global classification rate is the Kolmogorov-Smirnov Test:

[Figure: cumulative percentage of observations plotted against score cut-off (0.00 to 1.00) for the two classes; the greatest separation occurs at a cut-off score of .65]
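The K-S statistic is the maximum vertical gap between the cumulative score distributions of the two classes, and the cut-off where it occurs is a natural score threshold. A small sketch on made-up scores (the function name and data are ours):

```python
# K-S statistic: maximum separation between the cumulative score
# distributions of "goods" and "bads", scanned over candidate cut-offs.

def ks_statistic(good_scores, bad_scores):
    cutoffs = sorted(set(good_scores) | set(bad_scores))
    best_sep, best_cut = 0.0, None
    for c in cutoffs:
        cum_good = sum(s <= c for s in good_scores) / len(good_scores)
        cum_bad = sum(s <= c for s in bad_scores) / len(bad_scores)
        if abs(cum_bad - cum_good) > best_sep:
            best_sep, best_cut = abs(cum_bad - cum_good), c
    return best_sep, best_cut

goods = [0.55, 0.70, 0.80, 0.90]   # illustrative model scores
bads = [0.20, 0.35, 0.50, 0.65]
ks, cut = ks_statistic(goods, bads)  # ks = 0.75 at cut-off 0.50
```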
Model Evaluation Methods
What if you don’t have ANY information regarding misclassification error costs…or…the costs are in the eye of the beholder?
Model Evaluation Methods
The area under the ROC (Receiver Operating Characteristics) Curve accounts for all possible outcomes (Swets et al., 2000; Thomas et al., 2002; Hanley and McNeil, 1982, 1983):

[Figure: ROC curve plotting Sensitivity (True Positives) against 1-Specificity (False Positives), both from 0 to 1; θ=.5 for the diagonal (random) line, .5<θ<1 for a typical model, θ=1 for a perfect model]
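The area θ has a convenient rank interpretation (Hanley and McNeil, 1982): it equals the probability that a randomly chosen "good" receives a higher score than a randomly chosen "bad". A sketch on the same illustrative scores used above (all data assumed):

```python
# AUC (theta) via its rank interpretation: the fraction of (good, bad)
# score pairs where the good outscores the bad, counting ties as half.

def auc_theta(good_scores, bad_scores):
    wins = sum((g > b) + 0.5 * (g == b)
               for g in good_scores for b in bad_scores)
    return wins / (len(good_scores) * len(bad_scores))

goods = [0.55, 0.70, 0.80, 0.90]   # illustrative model scores
bads = [0.20, 0.35, 0.50, 0.65]
theta = auc_theta(goods, bads)     # 15 of 16 pairs ordered correctly
```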
Empirical Example
So, given this background, the guiding questions of our research were:
1. Does model development technique impact prediction accuracy?
2. How will model selection vary with the evaluation method used?
Empirical Example
We elected to evaluate these questions using a large data set from a pool of car loan applicants. The data set included:
• 14,042 US applicants for car loans between June 1, 1998 and June 30, 1999.
• Of these applicants, 9,442 were considered to have been “good” and 4,600 were considered to be “bad” as of December 31, 1999.
• 65 variables, split into two groups:
  • Transaction variables (miles on the vehicle, selling price, age of vehicle, etc.)
  • Applicant variables (bankruptcies, balances on other loans, number of revolving trades, etc.)
Empirical Example
The LDA and Logistic models were developed using SAS 8.2, while the Neural Network models were developed using Backpack® 4.0.
Because there are no accepted guidelines for the number of hidden nodes in Neural Network development (Zhang et al., 1999; Chen and Huang, 2003), we tested a range of hidden nodes from 5 to 50.
Empirical Example
Feed Forward Back Propagation Neural Networks:

[Diagram: input layer (four inputs) feeding a hidden layer, feeding an output layer (one output)]

Combination Function (Σ): combines all inputs into a single value, usually as a weighted summation.
Transfer Function (S): calculates the output value from the combination function.
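The combination and transfer functions described above can be sketched as a single neuron and chained into a tiny forward pass. The weights, biases, and sigmoid transfer here are illustrative, not the paper's fitted models:

```python
import math

# One neuron as described on the slide: combination function = weighted
# summation, transfer function = sigmoid (an assumed, common choice).

def neuron(inputs, weights, bias):
    combination = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-combination))  # sigmoid transfer

# Tiny feed-forward pass: 3 inputs -> 2 hidden nodes -> 1 output.
x = [0.2, 0.5, 0.1]
hidden = [neuron(x, [0.4, -0.6, 0.9], 0.1),
          neuron(x, [-0.3, 0.8, 0.2], -0.2)]
output = neuron(hidden, [1.2, -0.7], 0.05)  # a score in (0, 1)
```

Back propagation would then adjust the weights to reduce the error between this output and the known good/bad label.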
Empirical Example - Results

Technique             Class Rate   Class Rate   Class Rate   Theta    K-S Test
                      “Goods”      “Bads”       “Global”
LDA                   73.91%       43.40%       59.74%       68.98%   19%
Logistic              70.54%       59.64%       69.45%       68.00%   24%
NN-5 Hidden Nodes     63.50%       56.50%       58.88%       63.59%   38%
NN-10 Hidden Nodes    75.40%       44.50%       55.07%       64.46%   11%
NN-15 Hidden Nodes    60.10%       62.10%       61.40%       65.89%   24%
NN-20 Hidden Nodes    62.70%       59.00%       60.29%       65.27%   24%
NN-25 Hidden Nodes    76.60%       41.90%       53.78%       63.55%   16%
NN-30 Hidden Nodes    52.70%       68.50%       63.13%       65.74%   22%
NN-35 Hidden Nodes    60.30%       59.00%       59.46%       63.30%   22%
NN-40 Hidden Nodes    62.40%       58.30%       59.71%       64.47%   17%
NN-45 Hidden Nodes    54.10%       65.20%       61.40%       64.50%   31%
NN-50 Hidden Nodes    53.20%       68.50%       63.27%       65.15%   37%
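Picking the winner column by column makes the point concrete: each evaluation method crowns a different model. A sketch over a subset of the rows above (the dict keys and structure are ours):

```python
# Column-by-column winners from the results table: a different model
# is "best" under each evaluation method.

results = {
    "LDA":      {"global": 59.74, "theta": 68.98, "ks": 19},
    "Logistic": {"global": 69.45, "theta": 68.00, "ks": 24},
    "NN-5":     {"global": 58.88, "theta": 63.59, "ks": 38},
    "NN-30":    {"global": 63.13, "theta": 65.74, "ks": 22},
}

for metric in ("global", "theta", "ks"):
    best = max(results, key=lambda m: results[m][metric])
    print(f"best by {metric}: {best}")
# Logistic wins on global classification rate, LDA on theta, NN-5 on K-S.
```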
Conclusions
What were we able to demonstrate?
1. The “best” model depends upon the evaluation method selected;
2. The appropriate evaluation method depends upon situational and data context;
3. No multivariate technique is “best” under all circumstances.