Example-Dependent Cost-Sensitive Logistic Regression for...
Example-Dependent Cost-Sensitive Logistic
Regression for Credit Scoring
December 5, 2014
Alejandro Correa Bahnsen
with
Djamila Aouada, SnT
Björn Ottersten, SnT
Credit Scoring - Example
• Just founded a bank
• Just quit college
Credit Scoring - Example
• Biggest Ponzi scheme
• Now a billionaire
Credit Scoring
• Mitigate the impact of credit risk and make more objective and accurate decisions
• Estimate the risk that a customer defaults on a contracted financial obligation if a loan is granted, based on past experience
• Different ML methods are used in practice and in the literature: logistic regression, neural networks, discriminant analysis, genetic programming, decision trees, among others
Credit Scoring
• Evaluation of credit score models
• Brier score
• AUC
• KS
• F1-Score
• Misclassification
• Nevertheless, none of these measures takes into account the business and economic realities of credit scoring. The costs that the financial institution incurs to acquire customers, and the expected profit from a particular client, are not incorporated in the evaluation of the different models
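To make the point about the listed measures concrete, here is a minimal pure-Python sketch of each one. The function names and the toy data are illustrative assumptions, not taken from the slides. Note that no function takes a monetary amount as input, which is exactly the limitation described above.

```python
# Illustrative implementations; y = 1 marks a defaulter, p is a predicted
# default probability, c is a predicted label (all names are hypothetical).

def brier_score(y_true, p):
    """Mean squared difference between predicted probability and outcome."""
    return sum((pi - yi) ** 2 for yi, pi in zip(y_true, p)) / len(y_true)

def misclassification(y_true, y_pred):
    """Fraction of examples whose predicted label is wrong."""
    return sum(yi != ci for yi, ci in zip(y_true, y_pred)) / len(y_true)

def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive class."""
    tp = sum(1 for yi, ci in zip(y_true, y_pred) if yi == 1 and ci == 1)
    fp = sum(1 for yi, ci in zip(y_true, y_pred) if yi == 0 and ci == 1)
    fn = sum(1 for yi, ci in zip(y_true, y_pred) if yi == 1 and ci == 0)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

def auc(y_true, p):
    """Probability that a random positive is scored above a random negative."""
    pos = [pi for yi, pi in zip(y_true, p) if yi == 1]
    neg = [pi for yi, pi in zip(y_true, p) if yi == 0]
    wins = sum((a > b) + 0.5 * (a == b) for a in pos for b in neg)
    return wins / (len(pos) * len(neg))

def ks_statistic(y_true, p):
    """Largest gap between the score CDFs of the two classes."""
    pos = [pi for yi, pi in zip(y_true, p) if yi == 1]
    neg = [pi for yi, pi in zip(y_true, p) if yi == 0]
    best = 0.0
    for t in sorted(set(p)):
        cdf_pos = sum(s <= t for s in pos) / len(pos)
        cdf_neg = sum(s <= t for s in neg) / len(neg)
        best = max(best, abs(cdf_pos - cdf_neg))
    return best

# Toy data, thresholded at 0.5 to obtain labels.
y = [0, 0, 1, 1]
p = [0.1, 0.4, 0.35, 0.8]
c = [int(pi >= 0.5) for pi in p]
```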
Credit Scoring
• Financial evaluation of credit score models
• Correct classification costs are assumed to be 0
• C_FN = losses if customer i defaults
• Cl_i is the credit line of customer i
• L_gd is the loss given default: the percentage of the total credit line that is lost when the customer defaults
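Read together, these bullets suggest that the false-negative cost of customer i is the credit line multiplied by the loss given default. The formula itself did not survive in the transcript, so treat the sketch below as an assumed reading; the numbers are hypothetical.

```python
def false_negative_cost(credit_line, lgd):
    """Assumed reading: C_FN_i = Cl_i * L_gd (loss if an approved customer defaults)."""
    return credit_line * lgd

# Hypothetical values for illustration only.
credit_lines = [1000.0, 5000.0, 20000.0]  # Cl_i for three customers
LGD = 0.75                                # loss given default, as a fraction

c_fn = [false_negative_cost(cl, LGD) for cl in credit_lines]
```

Because Cl_i differs per customer, the cost of the same type of error differs per example, which is what makes the problem example-dependent.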
Credit Scoring
• Financial evaluation of credit score models
•
•
• Loss in profit by rejecting what would have been a good customer
• Assumption that the financial institution will not keep the money of the declined customer idle, but will instead lend it to an alternative customer who, as an average customer, has a default probability equal to the prior default probability
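One way to turn this description into a number is sketched below. It assumes the false-positive cost is the lost profit on the rejected customer plus the expected outcome of lending to an average alternative customer: with prior default probability pi_1 the bank loses the average credit line times L_gd, otherwise it earns the average profit. The exact formula is not recoverable from the transcript, so the decomposition, the parameter names, and the numbers are all assumptions.

```python
def false_positive_cost(lost_profit, avg_profit, avg_credit_line, lgd, prior_default):
    """Assumed reading: C_FP_i = r_i + C_a, where C_a is the expected cost of
    lending to an average alternative customer instead."""
    pi_1 = prior_default        # alternative customer defaults
    pi_0 = 1.0 - prior_default  # alternative customer repays
    alternative_cost = -avg_profit * pi_0 + avg_credit_line * lgd * pi_1
    return lost_profit + alternative_cost

# Hypothetical values for illustration only.
c_fp_example = false_positive_cost(
    lost_profit=300.0,       # r_i: profit the rejected good customer would have produced
    avg_profit=200.0,        # average profit per customer
    avg_credit_line=4000.0,  # average credit line
    lgd=0.75,
    prior_default=0.1,
)
```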
Credit Scoring
• Financial evaluation of credit score models
Experiments
• Two publicly available datasets
• Kaggle Credit dataset
• PAKDD Credit dataset
• Both contain information on customers' income and debt, from which the credit limit can be inferred (see appendix)
Experiments
• Using Decision Trees (DT), Logistic Regression (LR) and Random Forest (RF) to estimate the probabilities
• Databases partitioned into training (t), validation and testing sets
• These contain 50%, 25% and 25% of the total examples, respectively
• Under-sampled (u) dataset
• SMOTE - Synthetic Minority Over-sampling Technique (s)
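The partitioning and under-sampling steps can be sketched as follows. This is a minimal illustration, assuming each example is a `(features, label)` pair and that under-sampling balances the classes one to one; the slides state neither the target ratio nor the random seed. SMOTE is not sketched here.

```python
import random

def split_50_25_25(examples, seed=0):
    """Shuffle, then split into training (50%), validation (25%) and testing (25%)."""
    rng = random.Random(seed)
    shuffled = examples[:]
    rng.shuffle(shuffled)
    n = len(shuffled)
    n_train, n_val = n // 2, n // 4
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])

def under_sample(examples, seed=0):
    """Randomly drop majority-class examples until both classes have equal size
    (the 1:1 target is an assumption)."""
    rng = random.Random(seed)
    pos = [e for e in examples if e[1] == 1]
    neg = [e for e in examples if e[1] == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    return minority + rng.sample(majority, len(minority))

# Toy data: 100 examples, 10% defaulters.
data = [((i,), 1 if i < 10 else 0) for i in range(100)]
train, val, test = split_50_25_25(data)
train_u = under_sample(train)  # the "u" variant of the training set
```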
Experiments
• Savings of the DT, LR and RF algorithms
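The slides do not define savings. A common definition, assumed here, is the relative cost reduction against the cheaper of two trivial policies: approve everyone or reject everyone. In the sketch below `y = 1` marks a defaulter and a prediction of `1` means the loan is rejected; correct decisions cost 0, as stated earlier.

```python
def total_cost(y_true, y_pred, c_fp, c_fn):
    """Sum of example-dependent misclassification costs; correct decisions cost 0."""
    cost = 0.0
    for y, c, fp, fn in zip(y_true, y_pred, c_fp, c_fn):
        if y == 1 and c == 0:    # approved a customer who defaults
            cost += fn
        elif y == 0 and c == 1:  # rejected a customer who would have repaid
            cost += fp
    return cost

def savings(y_true, y_pred, c_fp, c_fn):
    """Assumed definition: (baseline cost - model cost) / baseline cost."""
    n = len(y_true)
    approve_all = total_cost(y_true, [0] * n, c_fp, c_fn)
    reject_all = total_cost(y_true, [1] * n, c_fp, c_fn)
    baseline = min(approve_all, reject_all)
    return (baseline - total_cost(y_true, y_pred, c_fp, c_fn)) / baseline

# Hypothetical toy data.
y_true = [1, 0, 0, 1]
y_pred = [1, 0, 1, 1]
c_fp = [100.0, 200.0, 300.0, 400.0]
c_fn = [1000.0, 2000.0, 3000.0, 4000.0]
model_savings = savings(y_true, y_pred, c_fp, c_fn)
```

A value of 1 means every cost was avoided; a negative value means the model is more expensive than the best trivial policy. The sketch does not guard against a baseline cost of zero.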
Cost-Sensitive Classification
• Changing class distribution
• Cost Proportionate Rejection Sampling
• Cost Proportionate Over Sampling
• Direct Cost
• Bayes Minimum Risk
• Modifying a learning algorithm
• Cost-Sensitive Logistic Regression
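Two of the listed approaches are short enough to sketch. Cost-proportionate rejection sampling keeps each training example with probability proportional to its misclassification cost, and Bayes minimum risk picks the decision with the lower expected cost given a predicted default probability. Both functions below are illustrative, assume correct decisions cost 0, and use hypothetical names and numbers.

```python
import random

def cost_proportionate_rejection_sample(examples, costs, seed=0):
    """Keep each example with probability cost / max(costs)."""
    rng = random.Random(seed)
    z = max(costs)
    return [e for e, w in zip(examples, costs) if rng.random() < w / z]

def bayes_minimum_risk(p_default, c_fp, c_fn):
    """Return 1 (reject the loan) when rejecting has the lower expected cost."""
    risk_approve = c_fn * p_default          # approve, then the customer defaults
    risk_reject = c_fp * (1.0 - p_default)   # reject a customer who would have repaid
    return 1 if risk_reject <= risk_approve else 0

# Same predicted probability, different costs, different decisions.
small_loan = bayes_minimum_risk(0.2, c_fp=420.0, c_fn=750.0)
large_loan = bayes_minimum_risk(0.2, c_fp=100.0, c_fn=15000.0)
```

The two calls illustrate why a single probability threshold is not enough once costs vary per customer.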
Experiments
Cost-Sensitive Sampling
Bayes Minimum Risk
Cost-Sensitive - Logistic Regression
• Logistic Regression Model
• Cost Function
• Cost Analysis
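The equations for this slide were lost in extraction. As a reminder of the standard formulation, which is assumed here rather than taken from the slide, the model is a sigmoid of a linear score and the usual cost function is the average log loss. The last helper illustrates the cost analysis: the log loss penalises a false positive and a false negative of the same severity identically.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(theta, x):
    """h_theta(x); theta[0] is the intercept, theta[1:] pair with the features."""
    return sigmoid(theta[0] + sum(t * xi for t, xi in zip(theta[1:], x)))

def example_loss(h, y):
    """Log loss of one prediction h for true label y."""
    return -y * math.log(h) - (1 - y) * math.log(1.0 - h)

def log_loss(theta, X, y):
    """Average log loss over the dataset, with probabilities clipped away from 0 and 1."""
    eps = 1e-12
    total = 0.0
    for xi, yi in zip(X, y):
        h = min(max(predict_proba(theta, xi), eps), 1.0 - eps)
        total += example_loss(h, yi)
    return total / len(y)

# A defaulter scored 0.1 and a good customer scored 0.9 receive the same penalty,
# regardless of how much money each mistake would cost.
fn_penalty = example_loss(0.1, 1)
fp_penalty = example_loss(0.9, 0)
```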
Cost-Sensitive - Logistic Regression
• Actual Costs
• Cost-Sensitive Function
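The cost-sensitive function itself is missing from the transcript. A plausible form, assumed here, replaces the log loss with the expected monetary cost of each prediction: a defaulter scored h contributes (1 - h) times its false-negative cost, and a good customer scored h contributes h times its false-positive cost. Correct decisions cost 0, as stated earlier; names and numbers are hypothetical.

```python
def cost_sensitive_loss(probs, y_true, c_fp, c_fn):
    """Assumed objective: average expected cost, with zero cost for correct decisions.

    probs[i] is the predicted default probability h_i, y_true[i] = 1 marks a defaulter.
    """
    total = 0.0
    for h, y, fp, fn in zip(probs, y_true, c_fp, c_fn):
        total += y * (1.0 - h) * fn + (1 - y) * h * fp
    return total / len(y_true)

# Hypothetical toy data: one defaulter scored 0.9, one good customer scored 0.2.
loss = cost_sensitive_loss(
    probs=[0.9, 0.2],
    y_true=[1, 0],
    c_fp=[100.0, 200.0],
    c_fn=[1000.0, 2000.0],
)
```

Unlike the log loss, the same probability error now weighs more for a customer with a larger credit line.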
Experiments
• Savings of the Cost-Sensitive Logistic Regression
Experiments
• Comparison of the different algorithms
Conclusion
• Selecting models based on traditional statistics does not give the best results in terms of cost
• Models should be evaluated taking into account real financial costs of the application
• Algorithms should be developed to incorporate those financial costs
https://github.com/albahnsen/CostSensitiveClassification
Thank You!!
Alejandro Correa Bahnsen
Contact information
Alejandro Correa Bahnsen
University of Luxembourg
albahnsen.com
http://www.linkedin.com/in/albahnsen
https://github.com/albahnsen/CostSensitiveClassification
Appendix