Page 1

Assessing the use of indeterminates for scorecard model development

James Tebboth & Manoel Gadi, Santander Analytics

Credit Scoring and Credit Control XI, University of Edinburgh, 27 August 2009

Risk Division

Santander Analytics

Page 2

Just use goods and bads – don’t use indeterminates!


Page 3

Assessing the use of indeterminates

• 1. Organisational context
• 2. Use of indeterminates in scorecards
• 3. Empirical assessment
• 4. Findings and application

Page 4

Assessing the use of indeterminates

• 1. Organisational context
• 2. Use of indeterminates in scorecards
• 3. Empirical assessment
• 4. Findings and application

Page 5

Banco Santander

Total profit: €8.9bn

Customer loans: €620bn

75% Retail banking

21% Wholesale banking

4% Asset management / insurance

[Map figure: Americas, United Kingdom, Continental Europe; unlabelled regional values 323,911; 202,244; 4,908; 1,247; 92,684; 2,945]

Page 6

Santander Analytics

• In 2007, the retail bank covered eleven countries, but only three had the capability to develop models

• Santander Analytics was created in 2007 as a group function for the management and development of credit risk scorecards

• The three existing units were combined to cover the entire retail bank of Santander

To contribute to achieving the best lending decisions for Santander through the development and monitoring of decision models, and ensuring the most effective use of analytical resources available worldwide

Page 7

Scorecard development and management

• One of our objectives was to develop and implement a corporate methodology and process to develop and manage decision models:

- to ensure a consistent approach

- to establish best practice

- to be dynamic: a focus for learning, capturing and building on the experience throughout the group

Page 8

Scorecard development and management

Gap analysis

• Three corporate centres discussed and compared approaches

• Found subtle but significant differences in the three approaches:

- reject inference

- use of indeterminates

- model formulation

- involvement of internal stakeholders

- statistics used to assess model performance

- reports used for monitoring models

- methods used to manage and revise models

- …

Page 9

Assessing the use of indeterminates

• 1. Organisational context
• 2. Use of indeterminates in scorecards
• 3. Empirical assessment
• 4. Findings and application

Page 10

What are indeterminates?

REALITY (increasing arrears status at performance point)

Non-default | Default (3+ cycles)

CLASSIFICATION FOR MODEL BUILD (three alternatives)

Good | Bad (3+ cycles)

OR

Good | Indet (2 cycles) | Bad (3+ cycles)

OR

Good | Indet (1, 2 cycles) | Bad (3+ cycles)

• Model built on goods and bads only – indeterminates, if present, excluded from model build

• Considering indeterminates here as “intermediates”
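As an illustration only, the three classifications above can be written as a labelling rule on arrears cycles at the performance point. This is a minimal Python sketch, not Santander's model-building code: the function names, the `DEFINITIONS` table and the toy accounts are hypothetical; only the thresholds (bad = 3+ cycles; indeterminate = 2 cycles, or 1 and 2 cycles) come from the slide.

```python
# Hypothetical sketch of the three classifications on the slide.
BAD_CYCLES = 3  # 3+ cycles in arrears at the performance point = Bad

# Arrears cycles treated as indeterminate under each alternative
DEFINITIONS = {
    "G/B": set(),            # no indeterminates: good = not bad
    "G/I/B (2)": {2},        # 2 cycles = indeterminate
    "G/I/B (1,2)": {1, 2},   # 1 or 2 cycles = indeterminate
}


def classify(cycles, indet_cycles):
    """Return 'B', 'I' or 'G' for an account's arrears cycles at the performance point."""
    if cycles >= BAD_CYCLES:
        return "B"
    if cycles in indet_cycles:
        return "I"
    return "G"


def model_build_sample(accounts, indet_cycles):
    """Label (id, cycles) pairs and drop indeterminates, keeping goods and bads only."""
    labelled = [(acc_id, classify(c, indet_cycles)) for acc_id, c in accounts]
    return [(acc_id, label) for acc_id, label in labelled if label != "I"]


if __name__ == "__main__":
    accounts = [("a", 0), ("b", 1), ("c", 2), ("d", 3), ("e", 5)]  # toy data
    for name, indet in DEFINITIONS.items():
        print(name, model_build_sample(accounts, indet))
```

Under G/B every account stays in the build sample; each G/I/B variant silently removes the borderline accounts, which is exactly the population the empirical tests later examine.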

Page 11

Industry context

Historical background

• Account for mistakes and inconsistencies in manual calculations of arrears

• Allow for judgemental treatment in collections

• With limited computing power, focus model build on small samples of clear-cut goods and bads

Diverse industry views…

• Anderson, 2007: Logic for indeterminate range is that (i) seemingly bad behaviour may be the result of technical arrears, or company strategies; and (ii) good and bad are more clear-cut, which should hopefully aid identification of truly problematic accounts

• Scallan, 2008: Avoid indeterminates if at all possible: (i) extra discrimination is spurious; (ii) makes models less sensitive to borderline cases; (iii) statistical estimation more complex; (iv) complicates strategy setting

• Hand, 2003: Proper way to handle indeterminates, intermediate between other classes, is as a three-class problem

R Anderson, 2007: The credit scoring toolkit: Theory and practice for retail credit risk management and decision automation

G Scallan, 2008: Building Better Scorecards

D J Hand, 2003: Good practice in retail credit scorecard assessment

Page 12

What are we trying to predict?

• Identify accounts that default … at some point

• Want recent development samples, so use proxy definition, such as three payments in arrears after 12 months

• Choose proxy definition to maximise correlation with default

• So purpose of scorecard is to identify “bads”, and want to distinguish bads from not-bads
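One way to make "choose proxy definition to maximise correlation with default" concrete is sketched below. It assumes a historical sample where both a candidate proxy (worst arrears within 12 months) and the eventual default flag are known, and it uses the phi coefficient (Pearson correlation between two 0/1 variables) as the correlation measure. The toy data, the candidate thresholds and the choice of phi are all assumptions for illustration.

```python
import math


def phi(x, y):
    """Phi coefficient: Pearson correlation between two 0/1 sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    sx = math.sqrt(sum((a - mx) ** 2 for a in x) / n)
    sy = math.sqrt(sum((b - my) ** 2 for b in y) / n)
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0


def best_proxy(max_arrears_12m, eventual_default, thresholds=(1, 2, 3)):
    """Try 'bad = k+ payments in arrears within 12 months' for each k and keep
    the threshold whose proxy correlates best with eventual default."""
    corr = {}
    for k in thresholds:
        proxy = [1 if c >= k else 0 for c in max_arrears_12m]
        corr[k] = phi(proxy, eventual_default)
    return max(corr, key=corr.get), corr


if __name__ == "__main__":
    # Toy history: worst arrears in the first 12 months, and whether the account later defaulted
    max_arrears_12m = [0, 0, 1, 2, 2, 2, 3, 4, 0, 3]
    eventual_default = [0, 0, 0, 0, 0, 1, 1, 1, 0, 1]
    k, corr = best_proxy(max_arrears_12m, eventual_default)
    print(k, {t: round(r, 3) for t, r in corr.items()})
```

In this toy sample the 3+ threshold wins; on real data the ranking would have to be re-established for each portfolio.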

Page 13

Assessing the use of indeterminates

• 1. Organisational context
• 2. Use of indeterminates in scorecards
• 3. Empirical assessment
• 4. Findings and application

Page 14

Evaluation framework

• For each test case, two models built: one using a G/B definition; another using a G/I/B definition

• Models built and validated in consistent way

• Results assessed using Gini and K-S statistics

• In order to compare results, we’ll measure and assess performance using a consistent G/B definition
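For reference, the two statistics can be computed as below, assuming a score where higher means riskier and bad flags taken from the common G/B definition (so any indeterminates count as goods at evaluation). Gini is computed as 2*AUC - 1, with AUC from pairwise bad/good comparisons, and K-S as the maximum gap between the empirical score distributions of goods and bads. This is a plain-Python sketch for small samples, not the tooling used in the study.

```python
def _split(scores, bad):
    """Separate scores into bads (flag 1) and goods (flag 0)."""
    bads = [s for s, b in zip(scores, bad) if b == 1]
    goods = [s for s, b in zip(scores, bad) if b == 0]
    return bads, goods


def gini(scores, bad):
    """Gini = 2*AUC - 1; AUC from all bad/good pairs, ties counted as half."""
    bads, goods = _split(scores, bad)
    wins = 0.0
    for sb in bads:
        for sg in goods:
            if sb > sg:
                wins += 1.0
            elif sb == sg:
                wins += 0.5
    auc = wins / (len(bads) * len(goods))
    return 2.0 * auc - 1.0


def ks(scores, bad):
    """K-S = max |F_good(t) - F_bad(t)| over all observed score cut-offs t."""
    bads, goods = _split(scores, bad)
    best = 0.0
    for t in sorted(set(scores)):
        f_bad = sum(s <= t for s in bads) / len(bads)
        f_good = sum(s <= t for s in goods) / len(goods)
        best = max(best, abs(f_good - f_bad))
    return best


if __name__ == "__main__":
    scores = [0.10, 0.40, 0.35, 0.80]  # toy model scores, higher = riskier
    bad = [0, 0, 1, 1]                 # G/B flags (indeterminates would be 0 here)
    print(gini(scores, bad), ks(scores, bad))
```

The pairwise AUC is quadratic in sample size; for development samples of the size listed on the next slide, a rank-based AUC would be the practical choice.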

Page 15

Evaluation data

Three data sets used

• Use existing data sets, from previous model developments

Country                           Brazil       Spain            UK
Product                           Auto-loans   Customer score   Unsecured personal loan
Volume of observations            15,700       32,500           169,100
Number of independent variables   124          216              257

• Four comparisons tested, representing four G/B or G/I/B definitions used in practice over the three data sets

• Impossible to cover all variants, but aim for diverse samples to increase robustness of results

Page 16

Performance evaluation

[Figure: sampling timeline, Jan-04 to May-08. Monthly observation cohorts 1 (Jan-04) to 35 (Nov-06) are each followed from the observation (scoring) point to the performance point. Segments: 12m develop. (cohorts 1 to 12, Jan-04 to Dec-04); distance = 19 months; 5m OOT (cohorts 31 to 35, Jul-06 to Nov-06).]

Development / holdout sample to build model

Out-of-time sample: mimic when model would have been in use
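The cohort layout in the timeline can be sketched as follows, reading the figure as 35 monthly observation cohorts from Jan-04, of which the first 12 form the development / holdout sample and the last 5 the out-of-time sample. The function names and the "gap" label for unsampled cohorts are hypothetical, and measuring the 19-month distance from the last development cohort to the first OOT cohort is an interpretation of the figure, not something the slide states.

```python
def add_months(year, month, k):
    """Shift a (year, month) pair by k months."""
    idx = year * 12 + (month - 1) + k
    return idx // 12, idx % 12 + 1


def months_between(a, b):
    """Number of months from (year, month) a to (year, month) b."""
    return (b[0] * 12 + b[1]) - (a[0] * 12 + a[1])


def assign_cohorts(start=(2004, 1), n_cohorts=35, dev_months=12, oot_months=5):
    """Map each monthly observation cohort to 'dev', 'gap' or 'oot'.

    The first dev_months cohorts are the development / holdout sample, the
    last oot_months cohorts mimic the period when the model would be in use,
    and the cohorts in between are not sampled (hypothetical 'gap' label).
    """
    roles = {}
    for i in range(n_cohorts):
        cohort = add_months(start[0], start[1], i)
        if i < dev_months:
            roles[cohort] = "dev"
        elif i >= n_cohorts - oot_months:
            roles[cohort] = "oot"
        else:
            roles[cohort] = "gap"
    return roles


if __name__ == "__main__":
    roles = assign_cohorts()
    dev = [c for c, r in roles.items() if r == "dev"]
    oot = [c for c, r in roles.items() if r == "oot"]
    print(dev[0], dev[-1], oot[0], oot[-1], months_between(dev[-1], oot[0]))
```

Keeping the gap explicit makes it easy to check that no out-of-time cohort leaks into development.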

Page 17

Assessing the use of indeterminates

• 1. Organisational context
• 2. Use of indeterminates in scorecards
• 3. Empirical assessment
• 4. Findings and application

Page 18

Results

        OOT sample   Dev sample
Test 1  G/B          G/B
Test 2  G/B          G/B
Test 3  G/B          G/B
Test 4  G/B          G/B

Table 1

OOT sample: -0.4%, -5.0%, -1.4%, -2.7%

Table 2

• Table 1: better-performing model, out of G/B and G/I/B models

• Table 2: relative drop in Gini, G/I/B model relative to G/B model

- all results assessed using G/B definition
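The Table 2 measure can be sketched as below: both models are scored on the same sample and evaluated against G/B flags, where only bads count as bad and indeterminates are treated as goods (good = not bad). The helper names, the 'G'/'I'/'B' performance codes and the toy scores are hypothetical, and taking the relative change as (Gini of G/I/B model - Gini of G/B model) / Gini of G/B model is an assumption about how the percentages were formed.

```python
def gini(scores, bad):
    """Gini = 2*AUC - 1, AUC from all bad/good pairs (ties count as half)."""
    bads = [s for s, b in zip(scores, bad) if b == 1]
    goods = [s for s, b in zip(scores, bad) if b == 0]
    wins = sum(1.0 if sb > sg else 0.5 if sb == sg else 0.0
               for sb in bads for sg in goods)
    return 2.0 * wins / (len(bads) * len(goods)) - 1.0


def gb_flags(performance):
    """Evaluation flags under the G/B definition: 'B' is bad; 'G' and 'I' are good."""
    return [1 if p == "B" else 0 for p in performance]


def relative_gini_change(scores_gib, scores_gb, performance):
    """(Gini of G/I/B model - Gini of G/B model) / Gini of G/B model, both on G/B flags."""
    bad = gb_flags(performance)
    g_gib = gini(scores_gib, bad)
    g_gb = gini(scores_gb, bad)
    return (g_gib - g_gb) / g_gb


if __name__ == "__main__":
    performance = ["G", "I", "B", "G", "B"]      # toy out-of-time outcomes
    scores_gb = [0.10, 0.30, 0.90, 0.20, 0.80]   # toy scores, G/B model
    scores_gib = [0.10, 0.85, 0.90, 0.20, 0.80]  # toy scores, G/I/B model
    print(f"{relative_gini_change(scores_gib, scores_gb, performance):.1%}")
```

Evaluating both models on the same G/B flags is what makes the comparison fair: a G/I/B model is never tested on the borderline accounts it was allowed to ignore in training, so scoring them here exposes any weakness there.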

Page 19

Conclusions

Results clearly show that indeterminates do not add power to models

Recommend building models on G/B definition, ie, good = not bad

• Results apply to indeterminates as intermediates

• In certain situations it may still be relevant to exclude observations where performance is genuinely indeterminate

- eg where genuine customer performance is unavailable or compromised

- better thought of as exclusions
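The distinction between intermediates and exclusions can be illustrated with a small labelling step: observations whose genuine performance is unavailable or compromised are dropped, and every remaining account is labelled under G/B, with good = not bad. The record fields (`performance_available`, `cycles`) are hypothetical, and the 3-cycle threshold is assumed to match the definition used earlier in the deck.

```python
# Hypothetical sketch: exclusions are removed first, then good = not bad.
BAD_THRESHOLD = 3  # 3+ cycles in arrears = bad (assumed)


def gb_development_sample(accounts):
    """Return (sample, excluded) for a G/B model build.

    accounts: list of dicts with hypothetical keys 'id', 'cycles'
    and 'performance_available'. sample holds (id, bad_flag) pairs.
    """
    sample = []
    excluded = []
    for acc in accounts:
        if not acc["performance_available"]:
            # Exclusion: genuine customer performance unavailable or compromised
            excluded.append(acc["id"])
            continue
        # No indeterminate band: anything short of bad is good
        bad = 1 if acc["cycles"] >= BAD_THRESHOLD else 0
        sample.append((acc["id"], bad))
    return sample, excluded


if __name__ == "__main__":
    accounts = [
        {"id": "a", "cycles": 0, "performance_available": True},
        {"id": "b", "cycles": 2, "performance_available": True},   # an "intermediate" under G/I/B
        {"id": "c", "cycles": 4, "performance_available": True},
        {"id": "d", "cycles": 1, "performance_available": False},  # e.g. performance compromised
    ]
    sample, excluded = gb_development_sample(accounts)
    print(sample)
    print(excluded)
```

Account "b" stays in the sample as a good, whereas "d" is removed because its outcome cannot be trusted, not because it sits between good and bad.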

Page 20

Convincing the stakeholders

A significant amount of discussion with stakeholders followed!

• Evaluation framework had been agreed prior to work

• Results and conclusions circulated

• Many questions from stakeholders from all three centres followed:

- how the models were built

- why the performance statistics varied as they did

- whether the models had taken account of this or that particular detail in the data

• While no test can ever be perfect or comprehensive, the test framework used and the robustness of the results obtained meant that the conclusions were adopted

• Conclusions also follow principle of not introducing complexity unless warranted

- and also avoid complexity in other areas, eg, reject inference

Page 21

Implementation

• End result was an agreed approach to build models on a G/B definition

• Incorporated into corporate methodology

• Methodology implemented through model building code, to help give consistency and efficiency

Page 22

Further work

• Address all issues raised by gap analysis

• … but not all through empirical testing!

• Incorporate scorecard areas and techniques from recently acquired companies

- Banco Real

- Alliance & Leicester

- Sovereign

• Primary objective remains helping the bank to make the best decisions through the best models

• Recognise that standard techniques are not suitable for all situations

• So plenty of opportunities to continue learning

Page 23


Thank you for your attention

Any questions?