Assessment of Model Development Techniques and Evaluation Methods for Binary Classification in the Credit Industry
DSI Conference
Jennifer Lewis Priestley
Satish Nargundkar
November 24, 2003
Paper Research Questions
This paper addresses the following two research questions:
1. Does model development technique improve classification accuracy?
2. How will model selection vary based upon the evaluation method used?
Discussion Outline
Discussion of Modeling Techniques
Discussion of Model Evaluation Methods
- Global Classification Rate
- Loss Function
- K-S Test
- ROC Curves
Empirical Example
Model Development Techniques
Modeling plays an increasingly important role in CRM strategies:
Target Marketing: Response Models, Risk Models
Customer Behavioral Models: Usage Models, Attrition Models, Activation Models
Collections: Recovery Models
Other Models: Segmentation Models, Bankruptcy Models, Fraud Models

[Diagram: CRM value cycle: Product Planning, Customer Acquisition, Customer Management, Collections/Recovery, with Creating Value at the center]
Model Development Techniques
Given that even minimal improvements in model classification accuracy can translate into significant savings or incremental revenue, an entire literature exists on the comparison of model development techniques (e.g., Atiya, 2001; Reichert et al., 1983; West, 2000; Vellido et al., 1993; Zhang et al., 1999).

Statistical Techniques: Linear Discriminant Analysis, Logistic Analysis, Multiple Regression Analysis
Non-Statistical Techniques: Neural Networks, Cluster Analysis, Decision Trees
Model Evaluation Methods
But developing the model is really only half the problem. How do you then determine which model is “best”?
Model Evaluation Methods
In the context of binary classification (one of the most common objectives in CRM modeling), one of four outcomes is possible:
1. True positive
2. False positive
3. True negative
4. False negative

              True Good   True Bad
Pred. Good       TP          FP
Pred. Bad        FN          TN
Model Evaluation Methods
If all of these outcomes, specifically the errors, have the same associated costs, then a simple global classification rate is a highly appropriate evaluation method:

                 True Good   True Bad   Total
Predicted Good      650        200       850
Predicted Bad        50        100       150
Total               700        300      1000

Classification Rate = 75% ((650+100)/1000)
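The rate above is just correct predictions over all predictions. A minimal sketch, using the cell values from the example table (the helper name is ours):

```python
# Global classification rate from a 2x2 confusion matrix:
# fraction of all observations classified correctly.

def global_classification_rate(tp, fp, fn, tn):
    return (tp + tn) / (tp + fp + fn + tn)

# Cells from the example table: 650 true goods predicted good,
# 100 true bads predicted bad, 200 false positives, 50 false negatives.
rate = global_classification_rate(tp=650, fp=200, fn=50, tn=100)
print(f"{rate:.0%}")  # 75%
```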
The global classification method is the most commonly used (Bernardi and Zhang, 1999), but fails when the costs of the misclassification errors are different (Type I vs. Type II errors):

Model 1 results:
Global Classification Rate = 75%
False Positive Rate = 5%
False Negative Rate = 20%

Model 2 results:
Global Classification Rate = 80%
False Positive Rate = 15%
False Negative Rate = 5%

What if the cost of a false positive was great, and the cost of a false negative was negligible? What if it was the other way around?
Model Evaluation Methods
If the misclassification error costs are understood with some certainty, a loss function could be used to evaluate the best model:

Loss = π0·f0·c0 + π1·f1·c1

where πi is the prior probability that an element comes from class i, fi is the probability that an element of class i will be misclassified, and ci is the cost associated with that misclassification error.
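The loss function can be sketched directly from the formula. The priors, error rates, and cost values below are illustrative assumptions, not figures from the paper:

```python
# Expected loss: Loss = pi0*f0*c0 + pi1*f1*c1
# pi_i: prior probability of class i, f_i: misclassification rate for
# class i, c_i: cost of that error. All numbers below are assumed.

def expected_loss(priors, error_rates, costs):
    return sum(p * f * c for p, f, c in zip(priors, error_rates, costs))

priors = (0.7, 0.3)          # assumed P(good), P(bad)
costs = (1.0, 5.0)           # assumed: misclassifying a "bad" costs 5x more
model_a = expected_loss(priors, (0.20, 0.05), costs)  # 0.215
model_b = expected_loss(priors, (0.05, 0.15), costs)  # 0.260
```

With these costs, model A is the cheaper choice despite its worse false negative rate; flip the cost ratio and the ranking flips too.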
Model Evaluation Methods
An evaluation method that uses the same conceptual foundation as the global classification rate is the Kolmogorov-Smirnov Test:

[Figure: cumulative percentage of observations plotted against score cut-off (0.00 to 1.00) for the two classes; the greatest separation occurs at a cut-off score of .65]
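The K-S statistic is the maximum vertical gap between the cumulative score distributions of the two classes, and the cut-off where it occurs is a natural score threshold. A small sketch on made-up scores (the function name and data are ours):

```python
# K-S statistic: maximum separation between the cumulative score
# distributions of "goods" and "bads", scanned over candidate cut-offs.

def ks_statistic(good_scores, bad_scores):
    cutoffs = sorted(set(good_scores) | set(bad_scores))
    best_sep, best_cut = 0.0, None
    for c in cutoffs:
        cum_good = sum(s <= c for s in good_scores) / len(good_scores)
        cum_bad = sum(s <= c for s in bad_scores) / len(bad_scores)
        if abs(cum_bad - cum_good) > best_sep:
            best_sep, best_cut = abs(cum_bad - cum_good), c
    return best_sep, best_cut

goods = [0.55, 0.70, 0.80, 0.90]   # illustrative model scores
bads = [0.20, 0.35, 0.50, 0.65]
ks, cut = ks_statistic(goods, bads)  # ks = 0.75 at cut-off 0.50
```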
Model Evaluation Methods
What if you don’t have ANY information regarding misclassification error costs…or…the costs are in the eye of the beholder?
Model Evaluation Methods
The area under the ROC (Receiver Operating Characteristics) Curve accounts for all possible outcomes (Swets et al., 2000; Thomas et al., 2002; Hanley and McNeil, 1982, 1983):

[Figure: ROC curve plotting Sensitivity (True Positives) against 1-Specificity (False Positives), both from 0 to 1; θ=.5 for the diagonal (random) line, .5<θ<1 for a typical model, θ=1 for a perfect model]
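The area θ has a convenient rank interpretation (Hanley and McNeil, 1982): it equals the probability that a randomly chosen "good" receives a higher score than a randomly chosen "bad". A sketch on the same illustrative scores used above (all data assumed):

```python
# AUC (theta) via its rank interpretation: the fraction of (good, bad)
# score pairs where the good outscores the bad, counting ties as half.

def auc_theta(good_scores, bad_scores):
    wins = sum((g > b) + 0.5 * (g == b)
               for g in good_scores for b in bad_scores)
    return wins / (len(good_scores) * len(bad_scores))

goods = [0.55, 0.70, 0.80, 0.90]   # illustrative model scores
bads = [0.20, 0.35, 0.50, 0.65]
theta = auc_theta(goods, bads)     # 15 of 16 pairs ordered correctly
```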
Empirical Example
So, given this background, the guiding questions of our research were:
1. Does model development technique impact prediction accuracy?
2. How will model selection vary with the evaluation method used?
Empirical Example
We elected to evaluate these questions using a large data set from a pool of car loan applicants. The data set included:
• 14,042 US applicants for car loans between June 1, 1998 and June 30, 1999.
• Of these applicants, 9,442 were considered to have been “good” and 4,600 were considered to be “bad” as of December 31, 1999.
• 65 variables, split into two groups:
  • Transaction variables (miles on the vehicle, selling price, age of vehicle, etc.)
  • Applicant variables (bankruptcies, balances on other loans, number of revolving trades, etc.)
Empirical Example
The LDA and Logistic models were developed using SAS 8.2, while the Neural Network models were developed using Backpack® 4.0.
Because there are no accepted guidelines for the number of hidden nodes in Neural Network development (Zhang et al., 1999; Chen and Huang, 2003), we tested a range of hidden nodes from 5 to 50.
Empirical Example
Feed Forward Back Propagation Neural Networks:

[Diagram: input layer (four inputs) feeding a hidden layer, feeding an output layer (one output)]

Combination Function (Σ): combines all inputs into a single value, usually as a weighted summation.
Transfer Function (S): calculates the output value from the combination function.
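The combination and transfer functions described above can be sketched as a single neuron and chained into a tiny forward pass. The weights, biases, and sigmoid transfer here are illustrative, not the paper's fitted models:

```python
import math

# One neuron as described on the slide: combination function = weighted
# summation, transfer function = sigmoid (an assumed, common choice).

def neuron(inputs, weights, bias):
    combination = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-combination))  # sigmoid transfer

# Tiny feed-forward pass: 3 inputs -> 2 hidden nodes -> 1 output.
x = [0.2, 0.5, 0.1]
hidden = [neuron(x, [0.4, -0.6, 0.9], 0.1),
          neuron(x, [-0.3, 0.8, 0.2], -0.2)]
output = neuron(hidden, [1.2, -0.7], 0.05)  # a score in (0, 1)
```

Back propagation would then adjust the weights to reduce the error between this output and the known good/bad label.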
Empirical Example - Results

Technique             Class Rate   Class Rate   Class Rate   Theta    K-S Test
                      “Goods”      “Bads”       “Global”
LDA                   73.91%       43.40%       59.74%       68.98%   19%
Logistic              70.54%       59.64%       69.45%       68.00%   24%
NN-5 Hidden Nodes     63.50%       56.50%       58.88%       63.59%   38%
NN-10 Hidden Nodes    75.40%       44.50%       55.07%       64.46%   11%
NN-15 Hidden Nodes    60.10%       62.10%       61.40%       65.89%   24%
NN-20 Hidden Nodes    62.70%       59.00%       60.29%       65.27%   24%
NN-25 Hidden Nodes    76.60%       41.90%       53.78%       63.55%   16%
NN-30 Hidden Nodes    52.70%       68.50%       63.13%       65.74%   22%
NN-35 Hidden Nodes    60.30%       59.00%       59.46%       63.30%   22%
NN-40 Hidden Nodes    62.40%       58.30%       59.71%       64.47%   17%
NN-45 Hidden Nodes    54.10%       65.20%       61.40%       64.50%   31%
NN-50 Hidden Nodes    53.20%       68.50%       63.27%       65.15%   37%
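Picking the winner column by column makes the point concrete: each evaluation method crowns a different model. A sketch over a subset of the rows above (the dict keys and structure are ours):

```python
# Column-by-column winners from the results table: a different model
# is "best" under each evaluation method.

results = {
    "LDA":      {"global": 59.74, "theta": 68.98, "ks": 19},
    "Logistic": {"global": 69.45, "theta": 68.00, "ks": 24},
    "NN-5":     {"global": 58.88, "theta": 63.59, "ks": 38},
    "NN-30":    {"global": 63.13, "theta": 65.74, "ks": 22},
}

for metric in ("global", "theta", "ks"):
    best = max(results, key=lambda m: results[m][metric])
    print(f"best by {metric}: {best}")
# Logistic wins on global classification rate, LDA on theta, NN-5 on K-S.
```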
Conclusions
What were we able to demonstrate?
1. The “best” model depends upon the evaluation method selected;
2. The appropriate evaluation method depends upon situational and data context;
3. No multivariate technique is “best” under all circumstances.