Developing Intelligent Systems for Credit Scoring Using Machine Learning Techniques

Bart Baesens

Public Defence

September 24th, 2003

PhD Committee

J. Vanthienen (promotor, K.U.Leuven)

J. Vandenbulcke (K.U.Leuven)

M. Verhelst (K.U.Leuven)

M. Vandebroek (K.U.Leuven)

J. Crook (Univ. Edinburgh)

L. Thomas (Univ. Southampton)

Overview

► Knowledge Discovery in Data

► The Credit Scoring Classification Problem

► Developing Accurate Credit Scoring Systems

► Developing Comprehensible Credit Scoring Systems

► Survival Analysis for Credit Scoring

► Conclusions

Knowledge Discovery in Data

► The data avalanche problem: finance, marketing, medicine, engineering

► Knowledge Discovery in Data (KDD) aims at learning patterns from data using advanced algorithms

► KDD steps: data preprocessing, data mining, post-processing

► Machine learning provides a multitude of induction algorithms aimed at learning patterns from data

The Credit Scoring Classification Problem

► Credit scoring is a technique that helps organizations to decide whether or not to grant credit to customers who apply for a loan.

► The aim is to develop classification models based upon repayment behavior of past applicants.

► These models summarize all available information of an applicant in a score P(applicant is good payer | age, marital status, savings amount, …).

► If this score is above a predetermined threshold credit is granted, otherwise credit is denied.
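As an illustration of the score-and-threshold decision described above, here is a minimal sketch using a logistic model. The coefficients, the feature names (`age`, `married`, `savings_amount`) and the threshold of 0.5 are hypothetical values chosen for the example; they do not come from the thesis.

```python
import math

# Hypothetical coefficients, chosen only for illustration (not from the thesis).
INTERCEPT = -1.0
WEIGHTS = {"age": 0.03, "married": 0.4, "savings_amount": 0.0005}


def good_payer_score(applicant):
    """Estimate P(applicant is good payer | features) with a logistic model."""
    z = INTERCEPT + sum(WEIGHTS[name] * applicant[name] for name in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))


def decide(applicant, threshold=0.5):
    """Grant credit when the score is above the threshold, otherwise deny."""
    return "grant" if good_payer_score(applicant) > threshold else "deny"


if __name__ == "__main__":
    applicant = {"age": 40, "married": 1, "savings_amount": 2000}
    print(round(good_payer_score(applicant), 3), decide(applicant))
```

In practice the threshold would be set from the costs of accepting a bad payer versus rejecting a good one rather than fixed at 0.5.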

Developing Accurate Credit Scoring Systems

► Credit scoring systems should be able to accurately distinguish good applicants from bad applicants.

► The problem is usually tackled using classification techniques.

► E.g., logistic regression, discriminant analysis, decision trees, Bayesian networks, neural networks, support vector machines, k-nearest neighbor, …

► Benchmarking study

[Figure: example decision tree. Root node: Income > $50,000; internal nodes: Job > 3 Years, High Debt; branches labelled Yes/No; leaves: Good Risk, Bad Risk.]

Developing Accurate Credit Scoring Systems (contd.)

► Experimental setup: 8 real-life credit scoring data sets; various cut-off setting schemes; classification accuracy + area under the Receiver Operating Characteristic curve; McNemar test + DeLong, DeLong and Clarke-Pearson test

► Conclusions: flat maximum effect; non-linear classifiers perform consistently well, however simple, linear classifiers also give good performance; only a handful of techniques were clearly inferior
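Two of the evaluation tools named in the experimental setup can be sketched in a few lines. The code below is an illustrative, plain-Python version, not the implementation used in the benchmarking study: `auc` computes the area under the ROC curve as the fraction of (good, bad) pairs ranked correctly, and `mcnemar` compares two classifiers on the same test set using the chi-square form of the McNemar test with continuity correction. The function names and the 1 = good / 0 = bad label coding are assumptions made for the example; the DeLong, DeLong and Clarke-Pearson test is not covered here.

```python
import math


def auc(scores, labels):
    """Area under the ROC curve via pairwise ranking (ties count as 0.5).

    labels: 1 for good payers, 0 for bad payers (assumed coding).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def mcnemar(y_true, pred_a, pred_b):
    """McNemar test with continuity correction; returns (statistic, p_value)."""
    # b: A correct and B wrong; c: A wrong and B correct.
    b = sum(1 for y, a, o in zip(y_true, pred_a, pred_b) if a == y and o != y)
    c = sum(1 for y, a, o in zip(y_true, pred_a, pred_b) if a != y and o == y)
    if b + c == 0:
        return 0.0, 1.0  # the two classifiers never disagree
    stat = (abs(b - c) - 1) ** 2 / (b + c)
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value
```

A small p-value from `mcnemar` suggests that the two classifiers differ in accuracy on the test set; `auc` is independent of any particular cut-off, which is why it complements plain classification accuracy.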

Developing Comprehensible Credit Scoring Systems

► Ideally, a credit scoring system should be easy to understand and implement.

► “What is needed, clearly, is a redirection of credit scoring research efforts toward development of explanatory models of credit performance and the isolation of variables bearing an explanatory relationship to credit performance” (Capon, 1982)

► Legally and ethically justified (e.g. Equal Credit Opportunity Act in the US)

► Trade-off between accuracy and comprehensibility (Occam’s Razor)

“Pluralitas non est ponenda sine necessitate” (William of Occam, ca. 1285-1349)

Developing Comprehensible Credit Scoring Systems (contd.)

► Neural network rule extraction

► Rule representation formalisms

Propositional rule: If Purpose=cash and Savings Account ≤ 50€ Then Applicant=bad

Oblique rule: If 0.84 Income + 0.32 Savings Account ≤ 1000€ Then Applicant=bad

M-of-N rule: If {at least/exactly/at most} M of the N conditions (C1, C2, ..., CN) are satisfied Then Applicant=bad

Descriptive fuzzy rule: If percentage of financial burden is large Then Applicant=bad

Approximate fuzzy rule: If term is trapezoidal(19.2 31.9 70.2 81.4) Then Applicant=bad
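To make the rule formalisms listed above concrete, the following sketch evaluates a propositional rule, an oblique rule, an M-of-N rule and a trapezoidal fuzzy membership function. The dictionary keys (`purpose`, `savings`, `income`) are hypothetical names, and the reading of `trapezoidal(19.2 31.9 70.2 81.4)` as the four corner points of the trapezoid in ascending order is an assumption, since the slide does not define the parameters.

```python
def propositional_rule(applicant):
    """If Purpose=cash and Savings Account <= 50 EUR then Applicant=bad."""
    return applicant["purpose"] == "cash" and applicant["savings"] <= 50


def oblique_rule(applicant):
    """If 0.84 Income + 0.32 Savings Account <= 1000 EUR then Applicant=bad."""
    return 0.84 * applicant["income"] + 0.32 * applicant["savings"] <= 1000


def m_of_n_rule(conditions, m, mode="at least"):
    """Fire when at least / exactly / at most m of the n conditions hold."""
    n_true = sum(bool(c) for c in conditions)
    if mode == "at least":
        return n_true >= m
    if mode == "exactly":
        return n_true == m
    if mode == "at most":
        return n_true <= m
    raise ValueError("unknown mode: " + mode)


def trapezoidal(x, a, b, c, d):
    """Membership degree of x in a trapezoid with corners a <= b <= c <= d.

    Assumed shape: 0 outside [a, d], 1 on [b, c], linear in between.
    """
    if x <= a or x >= d:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    if x <= c:
        return 1.0
    return (d - x) / (d - c)


if __name__ == "__main__":
    applicant = {"purpose": "cash", "savings": 30, "income": 1000}
    print(propositional_rule(applicant), oblique_rule(applicant))
    print(trapezoidal(50, 19.2, 31.9, 70.2, 81.4))
```

A propositional or M-of-N rule fires or does not fire, whereas a fuzzy rule fires to a degree between 0 and 1 given by the membership function.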

[Figure: neural network diagram. Input nodes: Term > 12 Months; Purpose=cash provisioning; Purpose=second hand car; Savings Account > 12.40 Euro; Income > 719 Euro; Property=No; Years Client > 3 years; Economical Sector=Sector C. Output nodes: Applicant=good, Applicant=bad. Connection weights shown (assignment to connections not recoverable): 0.380, 0.611, -0.202, -0.162, 0.278, -0.102, -0.287, -0.081, -0.289, 0.137, 0.457, -0.453.]

if Term > 12 months and Purpose = cash provisioning and Savings account <= 12.40 Euro and Years client <= 3 then Applicant = bad

if Term > 12 months and Purpose = cash provisioning and Owns property = No and Savings account <= 12.40 Euro then Applicant = bad

if Purpose = cash provisioning and Income > 719 Euro and Owns property = No and Savings account <= 12.40 Euro and Years client <= 3 then Applicant = bad

if Purpose = second hand car and Income > 719 Euro and Owns property = No and Savings account <= 12.40 Euro and Years client <= 3 then Applicant = bad

if Savings account <= 12.40 Euro and Economical sector = Sector C then Applicant = bad

Default class: Applicant = good
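The rule set above translates directly into a small classifier. The sketch below is one possible encoding, assuming hypothetical field names (`term_months`, `purpose`, `savings`, `years_client`, `owns_property`, `income`, `sector`); the thresholds and conditions are taken from the rules as listed.

```python
def classify(applicant):
    """Apply the five extracted rules; fall back to the default class 'good'."""
    long_term = applicant["term_months"] > 12
    cash = applicant["purpose"] == "cash provisioning"
    car = applicant["purpose"] == "second hand car"
    low_savings = applicant["savings"] <= 12.40
    new_client = applicant["years_client"] <= 3
    no_property = not applicant["owns_property"]
    high_income = applicant["income"] > 719
    sector_c = applicant["sector"] == "Sector C"

    rules = [
        long_term and cash and low_savings and new_client,
        long_term and cash and no_property and low_savings,
        cash and high_income and no_property and low_savings and new_client,
        car and high_income and no_property and low_savings and new_client,
        low_savings and sector_c,
    ]
    # Every rule concludes Applicant = bad, so rule order does not matter.
    return "bad" if any(rules) else "good"
```

Every rule requires a savings account balance of at most 12.40 Euro, so under this rule set an applicant with higher savings is always classified as good.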

Survival Analysis for Credit Scoring

► Predict when customers default

► Implications for profit scoring and debt provisioning

► Censored data

► Statistical models for survival analysis, e.g. Kaplan-Meier, parametric models, proportional hazards

► Drawbacks: linear relationships, no interaction effects, proportional hazards assumption
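To show how censored data enters a survival estimate, here is a minimal plain-Python sketch of the Kaplan-Meier estimator mentioned above. The function name, the argument names and the toy loan data are illustrative assumptions: `durations` holds the months each loan was observed, and `observed` is 1 when a default was seen and 0 when the loan was censored (for example, still running at the end of the observation period).

```python
def kaplan_meier(durations, observed):
    """Return [(t, S(t))] for each distinct time t at which a default occurs.

    S(t) is the estimated probability of surviving (not defaulting) beyond t.
    Censored loans count as 'at risk' up to their censoring time only.
    """
    event_times = sorted({t for t, e in zip(durations, observed) if e})
    survival = 1.0
    curve = []
    for t in event_times:
        at_risk = sum(1 for d in durations if d >= t)
        events = sum(1 for d, e in zip(durations, observed) if d == t and e)
        survival *= 1.0 - events / at_risk
        curve.append((t, survival))
    return curve


if __name__ == "__main__":
    # Toy data: five loans, two of them censored.
    months = [6, 12, 12, 18, 24]
    default_seen = [1, 0, 1, 1, 0]
    for t, s in kaplan_meier(months, default_seen):
        print(t, round(s, 3))
```

The estimate is non-parametric and uses no applicant characteristics, which is why covariate-based models such as proportional hazards are used when the goal is to predict when an individual customer defaults.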

Survival Analysis for Credit Scoring (contd.)

► Neural networks for survival analysis

► Requirements: monotonically decreasing survival curve, scalable, censoring

► Empirically tested for predicting default and early repayment

► Comparisons with proportional hazards models

Conclusions

► Developing accurate credit scoring systems: flat maximum effect, superiority of non-linear classifiers, satisfactory performance of linear classifiers

► Developing comprehensible credit scoring systems: neural network rule extraction, decision tables, fuzzy rule extraction

► Neural network survival analysis

Future Research

► Indirect Credit Scoring

► Knowledge Fusion

► Behavioral Credit Scoring

► Extensions to other Contexts and Problem Domains
