
Gradient Boosting Survival Tree

with Applications in Credit Scoring

Miaojun Bai, Yan Zheng, Yun Shen

360 Finance Inc. (Nasdaq: QFIN)

Credit Scoring and Credit Control XVI, Edinburgh, 29.08.2019

Yun Shen | Gradient Boosting Survival Tree 1/21

Outline

1 Motivation

2 Gradient boosting survival tree

3 Applications in credit scoring

4 Conclusion


Chinese consumer finance market

Market size ($ billion): 97.1 (01.2010) → 1,207.7 (10.2018)

Rapid growth

Heterogeneous data

PBC report: only 1/3 have credit ratings

personal info.

device info.

third-party rating agencies

Changing market conditions

regulation

macroeconomic factors


Motivation

Pros of tree ensemble methods (e.g., XGB, LightGBM)

robust for heterogeneous data

fast modeling for credit scoring

utilize numerous “weak” attributes

Pros of survival analysis

predict the probability of default over time

take long-term behavior into consideration

Idea: survival analysis + tree ensemble methods?



Survival analysis

Survival function: S(t) = P(T > t)

Discrete time periods: 0 = τ_0 < τ_1 < … < τ_J

Hazard function:

h(τ_j) := P(τ_{j−1} < T ≤ τ_j | T > τ_{j−1}), j = 1, 2, …

Hence,

S(τ_j) = ∏_{l=1}^{j} (1 − h(τ_l))

Likelihood

P(τ_{j−1} < T ≤ τ_j) = h(τ_j) S(τ_{j−1}) = h(τ_j) ∏_{l=1}^{j−1} (1 − h(τ_l))
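These two identities are easy to sanity-check numerically; the following is a minimal sketch (not the authors' code) that recovers S(τ_j) and the per-period default probability from a list of discrete hazards:

```python
def survival(hazards):
    """S(tau_1), ..., S(tau_J) from per-period hazards h(tau_1), ..., h(tau_J)."""
    s, out = 1.0, []
    for h in hazards:
        s *= 1.0 - h                   # S(tau_j) = S(tau_{j-1}) * (1 - h(tau_j))
        out.append(s)
    return out

def default_prob(hazards, j):
    """P(tau_{j-1} < T <= tau_j) = h(tau_j) * S(tau_{j-1}), with 1-based j."""
    s_prev = 1.0 if j == 1 else survival(hazards)[j - 2]
    return hazards[j - 1] * s_prev

h = [0.10, 0.05, 0.02]                 # illustrative monthly hazards
print(survival(h))                     # ≈ [0.9, 0.855, 0.8379]
print(default_prob(h, 2))              # ≈ 0.045 = 0.05 * 0.9
```

Note that the period probabilities plus the final survival probability sum to 1, as the likelihood factorization requires.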


Likelihood

Log hazard function

f(t) := log( h(t) / (1 − h(t)) )

Likelihood

P(T = t) = ∏_{j=1}^{J(t)∧J} 1 / (1 + e^{−y_j(t) f(τ_j)}),

where

J(t) := j if t ∈ (τ_{j−1}, τ_j], and J + 1 if t > τ_J

y_j(t) := 1 if t ≤ τ_j, and −1 if t > τ_j
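Since f is the log-odds of the hazard, h(τ_j) = 1 / (1 + e^{−f(τ_j)}), and P(T = t) is a product of J(t)∧J logistic factors. A minimal sketch of this evaluation (illustrative names, not the paper's code):

```python
import math

def period_likelihood(f_vals, taus, t):
    """P(T = t) for the period containing t, where f_vals[j] = f(tau_{j+1})."""
    J = len(taus)
    # J(t): index of the period (tau_{j-1}, tau_j] containing t, or J+1 past tau_J
    Jt = next((j + 1 for j, tau in enumerate(taus) if t <= tau), J + 1)
    p = 1.0
    for j in range(min(Jt, J)):
        y = 1.0 if t <= taus[j] else -1.0              # y_j(t)
        p *= 1.0 / (1.0 + math.exp(-y * f_vals[j]))    # (1 - h) before J(t), h at J(t)
    return p

# f = 0 everywhere means h = 0.5 in every period:
print(period_likelihood([0.0, 0.0], [1, 2], 1.5))      # (1 - 0.5) * 0.5 = 0.25
```

Each factor with y_j = −1 contributes 1 − h(τ_j) (survived period j) and the final factor with y_j = 1 contributes h(τ_j) (defaulted in period J(t)), matching the hazard factorization on the previous slide.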


Learning objective

For each individual x, f is approximated by a survival tree ensemble

f(t; x) ≅ f̂(t; x) := ∑_{k=1}^{K} f_k(t; x)

[Figure: example survival trees with splits on age, sex, education (male/female, low/high branches) and on education, salary, sex]


Learning objective

To minimize the negative log-likelihood

L = ∑_{i=1}^{N} ∑_{j=1}^{J(t_i)∧J} log(1 + exp{−y_j(t_i) f̂(τ_j; x_i)}) + (λ/2) ‖w‖²

  = ∑_{j=1}^{J} ∑_{i∈N_j} log(1 + exp(−y_j(t_i) f̂(τ_j; x_i))) + (λ/2) ‖w‖²

where N_j := {i ∈ {1, 2, …, N} | J(t_i) ≥ j} is the set of samples surviving longer than τ_{j−1}.

Regularization term

punish model complexity

avoid over-fitting

overcome numerical problems
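The loss above can be evaluated directly. A sketch under the slide's notation (data layout and names are illustrative, not the paper's implementation):

```python
import math

def gbst_loss(fhat, times, taus, weights, lam):
    """Negative log-likelihood plus (lam/2)*||w||^2.

    fhat[i][j] = ensemble score f(tau_{j+1}; x_i); times[i] = t_i;
    weights = flattened leaf weights w.
    """
    J = len(taus)
    total = 0.0
    for i, t in enumerate(times):
        Jt = next((j + 1 for j, tau in enumerate(taus) if t <= tau), J + 1)
        for j in range(min(Jt, J)):                    # j = 1, ..., J(t_i) ^ J
            y = 1.0 if t <= taus[j] else -1.0
            total += math.log(1.0 + math.exp(-y * fhat[i][j]))
    return total + 0.5 * lam * sum(w * w for w in weights)
```

The second form of L on the slide just regroups the same terms by period j, which is what makes the per-leaf optimization below tractable.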


Gradient tree boosting

Boosting algorithm:

At the m-th iteration, given f̂^(m−1), solve

min_f L^(m) = ∑_{j,i} log(1 + exp{−y_j(t_i) (f̂^(m−1)(τ_j; x_i) + f(τ_j; x_i))}) + (λ/2) ‖w‖²  ⇒  f_m

update f̂^(m)(t; x) = f̂^(m−1)(t; x) + f_m(t; x)

Approximate by Taylor expansion up to the 2nd order:

L^(m)(f) ≅ ∑_{j,i} ( r_{i,j}^{(m−1)} f(τ_j; x_i) + (1/2) σ_{i,j}^{(m−1)} f²(τ_j; x_i) ) + (λ/2) ‖w‖²
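The coefficients r and σ are the first and second derivatives of the logistic loss term with respect to f̂, the same gradient/hessian convention XGBoost uses. A sketch:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def grad_hess(y, fhat):
    """r, sigma: d/df and d^2/df^2 of log(1 + exp(-y*f)) at f = fhat (y = +/-1)."""
    r = -y * sigmoid(-y * fhat)                        # gradient r_{i,j}
    sigma = sigmoid(y * fhat) * sigmoid(-y * fhat)     # hessian sigma_{i,j}
    return r, sigma

print(grad_hess(1.0, 0.0))   # (-0.5, 0.25)
```

Since y² = 1, the hessian σ = h(1 − h) with h the current hazard estimate, so it is always positive and the quadratic approximation is strictly convex.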


Gradient tree boosting

Survival tree with L nodes: f(τ_j; x_i) = ∑_{l=1}^{L} w_l(τ_j) 1(i ∈ I_l)

The objective function is strictly convex, with optimal solution

w_l^{(m)}(τ_j) = − (∑_{i∈N_j∩I_l} r_{i,j}^{(m−1)}) / (∑_{i∈N_j∩I_l} σ_{i,j}^{(m−1)} + λ)

Split rule: I = I_L ∪ I_R

L̃_split = (1/2) ∑_j [ (∑_{i∈N_j∩I_L} r_{i,j}^{(m−1)})² / (∑_{i∈N_j∩I_L} σ_{i,j}^{(m−1)} + λ)
                     + (∑_{i∈N_j∩I_R} r_{i,j}^{(m−1)})² / (∑_{i∈N_j∩I_R} σ_{i,j}^{(m−1)} + λ)
                     − (∑_{i∈N_j∩I} r_{i,j}^{(m−1)})² / (∑_{i∈N_j∩I} σ_{i,j}^{(m−1)} + λ) ]
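Both formulas mirror XGBoost's closed-form leaf value and split gain, with an extra sum over the time periods j. A minimal sketch (the per-period gradient/hessian sums are assumed precomputed):

```python
def leaf_weight(r_sum, sigma_sum, lam):
    """Optimal w_l(tau_j) = -sum(r) / (sum(sigma) + lam) for one leaf and period."""
    return -r_sum / (sigma_sum + lam)

def split_gain(rL, sL, rR, sR, lam):
    """Gain of splitting I into I_L, I_R; arguments are per-period sums over j."""
    def score(r, s):
        return r * r / (s + lam)
    return 0.5 * sum(
        score(rl, sl) + score(rr, sr) - score(rl + rr, sl + sr)
        for rl, sl, rr, sr in zip(rL, sL, rR, sR)
    )

print(leaf_weight(-2.0, 1.0, 1.0))                   # 1.0
print(split_gain([-2.0], [1.0], [2.0], [1.0], 1.0))  # 0.5 * (2 + 2 - 0) = 2.0
```

The key difference from plain XGBoost is that one tree structure is shared across all periods, but each leaf carries a separate weight w_l(τ_j) per period.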


Summary

Log hazard function is approximated by a survival tree ensemble

maximum likelihood as the objective function

boosting algorithm

at each step, a gradient method is applied to optimize the objective approximated up to 2nd order


Datasets

Installment loans with 12-month terms

Definition of default: the borrower is overdue for at least 10 days on any scheduled repayment due date

Early repayments: regarded as “repaying on time” for the remaining periods

training and testing datasets

dataset       time          sample size
training set  January 2018  200,000
testing set   March 2018    120,000

Default rate

default rate(t) = (# default accounts up to month t) / (# total accounts)
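This curve is straightforward to compute from account-level data; a sketch (the data layout is assumed for illustration, not taken from the paper):

```python
def default_rate_curve(default_months, n_accounts, horizon=12):
    """default_months: month of first default per defaulted account (1-based)."""
    counts = [0] * (horizon + 1)
    for m in default_months:
        if 1 <= m <= horizon:
            counts[m] += 1
    rates, cum = [], 0
    for t in range(1, horizon + 1):
        cum += counts[t]                  # defaults observed up to month t
        rates.append(cum / n_accounts)
    return rates

print(default_rate_curve([1, 1, 3], 10, horizon=3))  # [0.2, 0.2, 0.3]
```

Because the numerator is cumulative, the curve is non-decreasing in t by construction.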


Default rates on datasets

[Figure: default rate by month (1–12) for training and testing data, shown in multiples of the base rate b, from 0 to 1.2b]


Dataset and preprocessing

Over 400 original attributes are collected

exclude attributes with a missing rate higher than 80%

one-hot encoding for categorical attributes

50 features are selected by xgboost
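The two preprocessing steps can be expressed as a pure-Python sketch (in practice this would be dataframe tooling; the names here are illustrative):

```python
def drop_sparse(columns, max_missing=0.80):
    """Keep columns whose missing rate (None values) is at most max_missing."""
    return {
        name: vals for name, vals in columns.items()
        if sum(v is None for v in vals) / len(vals) <= max_missing
    }

def one_hot(values):
    """One-hot encode a categorical column into {category: 0/1 list}."""
    cats = sorted({v for v in values if v is not None})
    return {c: [1 if v == c else 0 for v in values] for c in cats}

print(one_hot(["m", "f", "m"]))  # {'f': [0, 1, 0], 'm': [1, 0, 1]}
```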

source                     features
PBC report                 income score; credit score; overdue information of credit cards
personal information       age; sex; education level
device information         location
third-party rating agency  no. of loans on other lending platforms; travel intensity
other information          whether possessing a car; application channel


Convergence

1,000 runs with λ = 0.001 and a maximum tree depth of 6

[Figure: loss vs. iterations (0–30); the loss decreases from about 16 and converges]


Performance

[Figure: default rate per month (months 1–12) across 20 survival groups]


Comparison with existing models: C-Index

[Figure: C-index by month (1–12) for GBST, COX, RSF, and XGB, in the range 0.77–0.81]
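For reference, the C-index compared here is the standard concordance index: the fraction of comparable pairs (one account observed to default before the other's event time) that the risk scores order correctly, with ties counting one half. A self-contained O(n²) sketch, not tied to any particular library:

```python
def c_index(event_times, events, risk_scores):
    """events[i] = 1 if default observed; higher risk score = earlier default."""
    num = den = 0.0
    n = len(event_times)
    for i in range(n):
        if not events[i]:
            continue                               # i must have an observed default
        for j in range(n):
            if event_times[i] < event_times[j]:    # comparable pair
                den += 1
                if risk_scores[i] > risk_scores[j]:
                    num += 1
                elif risk_scores[i] == risk_scores[j]:
                    num += 0.5                     # ties count half
    return num / den

print(c_index([1, 2, 3], [1, 1, 0], [3.0, 2.0, 1.0]))  # 1.0 (perfect ordering)
```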


Comparison with existing models: AUC

[Figure: AUC by month (1–12) for GBST, COX, RSF, and XGB, in the range 0.77–0.81]


Comparison with existing models

[Figure: default rate across 20 survival groups for GBST, COX, RSF, and XGB, in multiples of the base rate b, from 0 to 4b]


Conclusion

Propose the gradient boosting survival tree (GBST) model

Confirm the convergence of GBST on a real dataset

GBST outperforms existing survival analysis and machine learning

models

Thank you!
