Developing Intelligent Systems for Credit Scoring Using Machine Learning
Techniques
Bart Baesens
Public Defence
September 24th, 2003
PhD Committee
J. Vanthienen
(promotor, K.U.Leuven)
J. Vandenbulcke
(K.U.Leuven)
M. Verhelst
(K.U.Leuven)
M. Vandebroek
(K.U.Leuven)
J. Crook
(Univ. Edinburgh)
L. Thomas
(Univ. Southampton)
Overview
► Knowledge Discovery in Data
► The Credit Scoring Classification Problem
► Developing Accurate Credit Scoring Systems
► Developing Comprehensible Credit Scoring Systems
► Survival Analysis for Credit Scoring
► Conclusions
Overview
KDD
Credit Scoring
Accuracy
Comprehensibility
Survival Analysis
Conclusions
Knowledge Discovery in Data
► The data avalanche problem: finance, marketing, medicine, engineering
► Knowledge Discovery in Data (KDD) aims at learning patterns from data using advanced algorithms
► KDD steps
  Data preprocessing
  Data mining
  Post-processing
► Machine learning provides a multitude of induction algorithms aimed at learning patterns from data
The Credit Scoring Classification Problem
► Credit scoring is a technique that helps organizations to decide whether or not to grant credit to customers who apply for a loan.
► The aim is to develop classification models based upon repayment behavior of past applicants.
► These models summarize all available information of an applicant in a score P(applicant is good payer | age, marital status, savings amount, …).
► If this score is above a predetermined threshold credit is granted, otherwise credit is denied.
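
The score-and-threshold decision described above can be sketched as follows. This is a minimal illustration assuming a logistic model; the coefficients, the feature names (`age`, `married`, `savings_amount`) and the 0.5 cut-off are hypothetical and are not taken from the thesis.

```python
import math

# Hypothetical coefficients for illustration only; not estimated from real data.
INTERCEPT = -1.0
WEIGHTS = {"age": 0.03, "married": 0.4, "savings_amount": 0.0005}
THRESHOLD = 0.5  # assumed cut-off; in practice it is set by the lender


def good_payer_score(applicant):
    """Return P(applicant is good payer | characteristics) under a logistic model."""
    z = INTERCEPT + sum(WEIGHTS[name] * applicant[name] for name in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))


def decide(applicant, threshold=THRESHOLD):
    """Grant credit when the score is above the threshold, otherwise deny."""
    return "grant" if good_payer_score(applicant) > threshold else "deny"


if __name__ == "__main__":
    applicant = {"age": 40, "married": 1, "savings_amount": 2000}
    print(round(good_payer_score(applicant), 3), decide(applicant))
```

Any classifier that outputs a probability of good repayment could stand in for the logistic model here; only the comparison against the threshold is taken from the slide.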
Developing Accurate Credit Scoring Systems
► Credit scoring systems should be able to accurately distinguish good applicants from bad applicants.
► The problem is usually tackled using classification techniques.
► E.g., logistic regression, discriminant analysis, decision trees, Bayesian networks, neural networks, support vector machines, k-nearest neighbor, …
► Benchmarking study
[Figure: example decision tree with tests on Income > $50,000, Job > 3 Years and High Debt (Yes/No branches) and leaves Good Risk and Bad Risk]
Developing Accurate Credit Scoring Systems (contd.)
► Experimental setup
  8 real-life credit scoring data sets
  Various cut-off setting schemes
  Classification accuracy + area under the Receiver Operating Characteristic curve
  McNemar test + DeLong, DeLong and Clarke-Pearson test
► Conclusions
  Flat maximum effect
  Non-linear classifiers perform consistently well; however, simple linear classifiers also give good performance
  Only a handful of techniques were clearly inferior
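
Two of the evaluation measures named above can be sketched in a few lines. This is a minimal illustration, not the benchmarking code of the study: the AUC is computed through its rank interpretation (the probability that a randomly chosen good applicant scores higher than a randomly chosen bad one), and the McNemar statistic uses the common continuity-corrected form, which is an assumption about the variant. The DeLong, DeLong and Clarke-Pearson test is not shown.

```python
def auc(scores, labels):
    """Area under the ROC curve as P(score of a good > score of a bad).

    labels: 1 = good payer, 0 = bad payer. Ties count as one half.
    """
    goods = [s for s, y in zip(scores, labels) if y == 1]
    bads = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for g in goods:
        for b in bads:
            if g > b:
                wins += 1.0
            elif g == b:
                wins += 0.5
    return wins / (len(goods) * len(bads))


def mcnemar_statistic(y_true, pred_a, pred_b):
    """Continuity-corrected McNemar chi-squared statistic (1 degree of freedom).

    Only cases where exactly one of the two classifiers is correct contribute.
    """
    only_a = sum(1 for t, a, b in zip(y_true, pred_a, pred_b) if a == t and b != t)
    only_b = sum(1 for t, a, b in zip(y_true, pred_a, pred_b) if a != t and b == t)
    if only_a + only_b == 0:
        return 0.0
    return (abs(only_a - only_b) - 1) ** 2 / (only_a + only_b)


if __name__ == "__main__":
    print(auc([0.9, 0.4, 0.6, 0.2], [1, 1, 0, 0]))
    print(mcnemar_statistic([1, 1, 1, 1, 0, 0], [1, 1, 1, 1, 0, 1], [0, 0, 0, 1, 0, 1]))
```

The statistic is compared against a chi-squared distribution with one degree of freedom; a value above roughly 3.84 indicates a difference in accuracy at the 5% level.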
Developing Comprehensible Credit Scoring Systems
► Ideally, a credit scoring system should be easy to understand and implement.
► “What is needed, clearly, is a redirection of credit scoring research efforts toward development of explanatory models of credit performance and the isolation of variables bearing an explanatory relationship to credit performance” (Capon, 1982)
► Legally and ethically justified (e.g. Equal Credit Opportunity Act in the US)
► Trade-off between accuracy and comprehensibility (Occam’s Razor)
“Pluralitas non est ponenda sine necessitate” (plurality should not be posited without necessity), William of Occam (ca. 1285-1349)
Developing Comprehensible Credit Scoring Systems (contd.)
► Neural network rule extraction
► Rule representation formalisms
  Propositional rule
    If purpose=cash and Savings Account ≤ 50€ Then Applicant=bad
  Oblique rule
    If 0.84 Income + 0.32 Savings Account ≤ 1000€ Then Applicant=bad
  M-of-N rules
    If {at least/exactly/at most} M of the N conditions (C1, C2, ..., CN) are satisfied Then Applicant=bad
  Descriptive fuzzy rules
    If percentage of financial burden is large Then Applicant=bad
  Approximate fuzzy rules
    If term is trapezoidal(19.2 31.9 70.2 81.4) Then Applicant=bad
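
Two of the rule formalisms listed above can be made concrete with a short sketch. The function names `m_of_n` and `trapezoid` are hypothetical, and reading the four numbers in `trapezoidal(19.2 31.9 70.2 81.4)` as the corner points of a trapezoidal membership function is an assumption about the notation.

```python
def m_of_n(conditions, m, mode="at least"):
    """Evaluate an M-of-N rule over a list of boolean condition outcomes."""
    satisfied = sum(1 for c in conditions if c)
    if mode == "at least":
        return satisfied >= m
    if mode == "exactly":
        return satisfied == m
    if mode == "at most":
        return satisfied <= m
    raise ValueError("mode must be 'at least', 'exactly' or 'at most'")


def trapezoid(x, a, b, c, d):
    """Membership degree of x in a trapezoidal fuzzy set with corners a < b <= c < d.

    The degree rises linearly from 0 at a to 1 at b, stays 1 until c,
    and falls linearly back to 0 at d.
    """
    if x <= a or x >= d:
        return 0.0
    if b <= x <= c:
        return 1.0
    if x < b:
        return (x - a) / (b - a)
    return (d - x) / (d - c)


if __name__ == "__main__":
    # Rule fires when at least 2 of the 3 conditions hold.
    print(m_of_n([True, False, True], 2))
    # Degree to which a term of 25 belongs to the fuzzy set from the slide.
    print(trapezoid(25, 19.2, 31.9, 70.2, 81.4))
```

In a fuzzy rule the membership degree is the strength with which the rule fires, so a term inside the flat part of the trapezoid supports "Applicant=bad" fully, while a term on a slope supports it only partially.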
Developing Comprehensible Credit Scoring Systems (contd.)
[Figure: pruned neural network. Inputs: Term > 12 Months, Purpose = cash provisioning, Purpose = second hand car, Savings Account > 12.40 Euro, Income > 719 Euro, Property = No, Years Client > 3 years, Economical Sector = Sector C. Outputs: Applicant = good, Applicant = bad. Connection weights shown: 0.380, 0.611, -0.202, -0.162, 0.278, -0.102, -0.287, -0.081, -0.289, 0.137, 0.457, -0.453]
Developing Comprehensible Credit Scoring Systems (contd.)
if Term > 12 months and Purpose = cash provisioning and Savings account <= 12.40 Euro and Years client <= 3 then Applicant = bad
if Term > 12 months and Purpose = cash provisioning and Owns property = No and Savings account <= 12.40 Euro then Applicant = bad
if Purpose = cash provisioning and Income > 719 Euro and Owns property = No and Savings account <= 12.40 Euro and Years client <= 3 then Applicant = bad
if Purpose = second hand car and Income > 719 Euro and Owns property = No and Savings account <= 12.40 Euro and Years client <= 3 then Applicant = bad
if Savings account <= 12.40 Euro and Economical sector = Sector C then Applicant = bad
Default class: Applicant = good
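
The extracted rule set above translates directly into a small classifier. The sketch below is only an illustration of how such rules are applied; the dictionary keys (`term_months`, `purpose`, `savings`, `years_client`, `owns_property`, `income`, `sector`) are hypothetical field names, while the thresholds are the ones shown in the rules.

```python
def classify(applicant):
    """Apply the five extracted rules; fall back to the default class 'good'."""
    long_term = applicant["term_months"] > 12
    cash = applicant["purpose"] == "cash provisioning"
    car = applicant["purpose"] == "second hand car"
    low_savings = applicant["savings"] <= 12.40
    new_client = applicant["years_client"] <= 3
    no_property = not applicant["owns_property"]
    high_income = applicant["income"] > 719
    sector_c = applicant["sector"] == "C"

    rules = [
        long_term and cash and low_savings and new_client,
        long_term and cash and no_property and low_savings,
        cash and high_income and no_property and low_savings and new_client,
        car and high_income and no_property and low_savings and new_client,
        low_savings and sector_c,
    ]
    # Every rule predicts 'bad', so the applicant is bad if any rule fires.
    return "bad" if any(rules) else "good"


if __name__ == "__main__":
    applicant = {
        "term_months": 24, "purpose": "cash provisioning", "savings": 5.0,
        "years_client": 1, "owns_property": True, "income": 500, "sector": "A",
    }
    print(classify(applicant))
```

Because all five rules share the same conclusion, their order does not matter; only the default class distinguishes good applicants.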
Developing Comprehensible Credit Scoring Systems (contd.)
Survival Analysis for Credit Scoring
► Predict when customers default
► Implications for profit scoring and debt provisioning
► Censored data
► Statistical models for survival analysis
  E.g. Kaplan-Meier, parametric models, proportional hazards
► Drawbacks
  Linear relationships
  No interaction effects
  Proportional hazards assumption
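
As an illustration of how censored data enter a survival model, the Kaplan-Meier estimator mentioned above can be sketched as follows. This is a minimal version written for this transcript, not code from the thesis; the loan durations in the usage example are invented, and `observed` marks whether a default was actually seen (1) or the loan was censored (0).

```python
def kaplan_meier(durations, observed):
    """Return [(t, S(t))] for each time t at which at least one default occurred.

    S(t) is the estimated probability that a customer has not defaulted by time t.
    Censored customers count as at risk up to their censoring time but never
    count as defaults.
    """
    event_times = sorted({t for t, e in zip(durations, observed) if e})
    survival = 1.0
    curve = []
    for t in event_times:
        at_risk = sum(1 for d in durations if d >= t)
        defaults = sum(1 for d, e in zip(durations, observed) if d == t and e)
        survival *= 1.0 - defaults / at_risk
        curve.append((t, survival))
    return curve


if __name__ == "__main__":
    # Invented example: months on book; 0 = still repaying when observation ended.
    months = [3, 5, 5, 8, 10]
    defaulted = [1, 1, 0, 1, 0]
    for t, s in kaplan_meier(months, defaulted):
        print(t, round(s, 3))
```

The estimate only changes at observed default times, which yields a monotonically decreasing step curve.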
Survival Analysis for Credit Scoring (contd.)
► Neural networks for survival analysis
► Requirements
  Monotonically decreasing survival curve
  Scalable
  Censoring
► Empirically tested for predicting default and early repayment
► Comparisons with proportional hazards models
Conclusions
► Developing accurate credit scoring systems
  Flat maximum effect
  Superiority of non-linear classifiers
  Satisfactory performance of linear classifiers
► Developing comprehensible credit scoring systems
  Neural network rule extraction
  Decision tables
  Fuzzy rule extraction
► Neural network survival analysis
Future Research
► Indirect Credit Scoring
► Knowledge Fusion
► Behavioral Credit Scoring
► Extensions to other Contexts and Problem Domains