
Gradient Boosting Survival Tree

with Applications in Credit Scoring

Miaojun Bai, Yan Zheng, Yun Shen

360 Finance Inc. (Nasdaq: QFIN)

Credit Scoring and Credit Control XVI, Edinburgh, 29.08.2019

Yun Shen | Gradient Boosting Survival Tree 1/21

Outline

1 Motivation

2 Gradient boosting survival tree

3 Applications in credit scoring

4 Conclusion


Chinese consumer finance market

Market size ($ billion): 97.1 (01.2010) → 1,207.7 (10.2018)

Rapid growth

Heterogeneous data

PBC report: only 1/3 have credit ratings

personal info.

device info.

third-party rating agencies

Changing market conditions

regulation

macroeconomic factors


Motivation

Pros of tree ensemble methods (e.g., XGB, LightGBM)

robust for heterogeneous data

fast modeling for credit scoring

utilize numerous “weak” attributes

Pros of survival analysis

predict the probability of default over time

take long-term behavior into consideration

Idea: survival analysis + tree ensemble methods?



Survival analysis

Survival function: S(t) = P(T > t)

Discrete time periods: 0 = τ_0 < τ_1 < … < τ_J

Hazard function:

h(τ_j) := P(τ_{j−1} < T ≤ τ_j | T > τ_{j−1}), j = 1, 2, …

Hence,

S(τ_j) = ∏_{l=1}^{j} (1 − h(τ_l))

Likelihood

P(τ_{j−1} < T ≤ τ_j) = h(τ_j) S(τ_{j−1}) = h(τ_j) ∏_{l=1}^{j−1} (1 − h(τ_l))
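These two identities are easy to sanity-check numerically; the following is a minimal sketch (not the authors' code) that recovers S(τ_j) and the per-period default probability from a list of discrete hazards:

```python
def survival(hazards):
    """S(tau_1), ..., S(tau_J) from per-period hazards h(tau_1), ..., h(tau_J)."""
    s, out = 1.0, []
    for h in hazards:
        s *= 1.0 - h                   # S(tau_j) = S(tau_{j-1}) * (1 - h(tau_j))
        out.append(s)
    return out

def default_prob(hazards, j):
    """P(tau_{j-1} < T <= tau_j) = h(tau_j) * S(tau_{j-1}), with 1-based j."""
    s_prev = 1.0 if j == 1 else survival(hazards)[j - 2]
    return hazards[j - 1] * s_prev

h = [0.10, 0.05, 0.02]                 # illustrative monthly hazards
print(survival(h))                     # ≈ [0.9, 0.855, 0.8379]
print(default_prob(h, 2))              # ≈ 0.045 = 0.05 * 0.9
```

Note that the period probabilities plus the final survival probability sum to 1, as the likelihood factorization requires.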


Likelihood

Log hazard function

f(t) := log( h(t) / (1 − h(t)) )

Likelihood

P(T = t) = ∏_{j=1}^{J(t)∧J} 1 / (1 + e^{−y_j(t) f(τ_j)}),

where

J(t) := j if t ∈ (τ_{j−1}, τ_j], and J + 1 if t > τ_J

y_j(t) := 1 if t ≤ τ_j, and −1 if t > τ_j
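Since f is the log-odds of the hazard, h(τ_j) = 1 / (1 + e^{−f(τ_j)}), and P(T = t) is a product of J(t)∧J logistic factors. A minimal sketch of this evaluation (illustrative names, not the paper's code):

```python
import math

def period_likelihood(f_vals, taus, t):
    """P(T = t) for the period containing t, where f_vals[j] = f(tau_{j+1})."""
    J = len(taus)
    # J(t): index of the period (tau_{j-1}, tau_j] containing t, or J+1 past tau_J
    Jt = next((j + 1 for j, tau in enumerate(taus) if t <= tau), J + 1)
    p = 1.0
    for j in range(min(Jt, J)):
        y = 1.0 if t <= taus[j] else -1.0              # y_j(t)
        p *= 1.0 / (1.0 + math.exp(-y * f_vals[j]))    # (1 - h) before J(t), h at J(t)
    return p

# f = 0 everywhere means h = 0.5 in every period:
print(period_likelihood([0.0, 0.0], [1, 2], 1.5))      # (1 - 0.5) * 0.5 = 0.25
```

Each factor with y_j = −1 contributes 1 − h(τ_j) (survived period j) and the final factor with y_j = 1 contributes h(τ_j) (defaulted in period J(t)), matching the hazard factorization on the previous slide.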


Learning objective

For each individual x, f is approximated by a survival tree ensemble

f(t; x) ≅ f̂(t; x) := ∑_{k=1}^{K} f_k(t; x)

[Figure: example survival trees with splits on age, sex, education (male/female, low/high branches) and on education, salary, sex]


Learning objective

To minimize the negative log-likelihood

L = ∑_{i=1}^{N} ∑_{j=1}^{J(t_i)∧J} log(1 + exp{−y_j(t_i) f̂(τ_j; x_i)}) + (λ/2) ‖w‖²

  = ∑_{j=1}^{J} ∑_{i∈N_j} log(1 + exp(−y_j(t_i) f̂(τ_j; x_i))) + (λ/2) ‖w‖²

where N_j := {i ∈ {1, 2, …, N} | J(t_i) ≥ j} is the set of samples surviving longer than τ_{j−1}.

Regularization term

punish model complexity

avoid over-fitting

overcome numerical problems
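The loss above can be evaluated directly. A sketch under the slide's notation (data layout and names are illustrative, not the paper's implementation):

```python
import math

def gbst_loss(fhat, times, taus, weights, lam):
    """Negative log-likelihood plus (lam/2)*||w||^2.

    fhat[i][j] = ensemble score f(tau_{j+1}; x_i); times[i] = t_i;
    weights = flattened leaf weights w.
    """
    J = len(taus)
    total = 0.0
    for i, t in enumerate(times):
        Jt = next((j + 1 for j, tau in enumerate(taus) if t <= tau), J + 1)
        for j in range(min(Jt, J)):                    # j = 1, ..., J(t_i) ^ J
            y = 1.0 if t <= taus[j] else -1.0
            total += math.log(1.0 + math.exp(-y * fhat[i][j]))
    return total + 0.5 * lam * sum(w * w for w in weights)
```

The second form of L on the slide just regroups the same terms by period j, which is what makes the per-leaf optimization below tractable.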


Gradient tree boosting

Boosting algorithm:

At the m-th iteration, given f̂^(m−1), solve

min_f L^(m) = ∑_{j,i} log(1 + exp{−y_j(t_i) (f̂^(m−1)(τ_j; x_i) + f(τ_j; x_i))}) + (λ/2) ‖w‖²  ⇒  f_m

update f̂^(m)(t; x) = f̂^(m−1)(t; x) + f_m(t; x)

Approximate by Taylor expansion up to the 2nd order:

L^(m)(f) ≅ ∑_{j,i} ( r_{i,j}^{(m−1)} f(τ_j; x_i) + (1/2) σ_{i,j}^{(m−1)} f²(τ_j; x_i) ) + (λ/2) ‖w‖²
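The coefficients r and σ are the first and second derivatives of the logistic loss term with respect to f̂, the same gradient/hessian convention XGBoost uses. A sketch:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def grad_hess(y, fhat):
    """r, sigma: d/df and d^2/df^2 of log(1 + exp(-y*f)) at f = fhat (y = +/-1)."""
    r = -y * sigmoid(-y * fhat)                        # gradient r_{i,j}
    sigma = sigmoid(y * fhat) * sigmoid(-y * fhat)     # hessian sigma_{i,j}
    return r, sigma

print(grad_hess(1.0, 0.0))   # (-0.5, 0.25)
```

Since y² = 1, the hessian σ = h(1 − h) with h the current hazard estimate, so it is always positive and the quadratic approximation is strictly convex.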


Gradient tree boosting

Survival tree with L nodes: f(τ_j; x_i) = ∑_{l=1}^{L} w_l(τ_j) 1(i ∈ I_l)

The objective function is strictly convex, with optimal solution

w_l^{(m)}(τ_j) = − (∑_{i∈N_j∩I_l} r_{i,j}^{(m−1)}) / (∑_{i∈N_j∩I_l} σ_{i,j}^{(m−1)} + λ)

Split rule: I = I_L ∪ I_R

L̃_split = (1/2) ∑_j [ (∑_{i∈N_j∩I_L} r_{i,j}^{(m−1)})² / (∑_{i∈N_j∩I_L} σ_{i,j}^{(m−1)} + λ)
                     + (∑_{i∈N_j∩I_R} r_{i,j}^{(m−1)})² / (∑_{i∈N_j∩I_R} σ_{i,j}^{(m−1)} + λ)
                     − (∑_{i∈N_j∩I} r_{i,j}^{(m−1)})² / (∑_{i∈N_j∩I} σ_{i,j}^{(m−1)} + λ) ]
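Both formulas mirror XGBoost's closed-form leaf value and split gain, with an extra sum over the time periods j. A minimal sketch (the per-period gradient/hessian sums are assumed precomputed):

```python
def leaf_weight(r_sum, sigma_sum, lam):
    """Optimal w_l(tau_j) = -sum(r) / (sum(sigma) + lam) for one leaf and period."""
    return -r_sum / (sigma_sum + lam)

def split_gain(rL, sL, rR, sR, lam):
    """Gain of splitting I into I_L, I_R; arguments are per-period sums over j."""
    def score(r, s):
        return r * r / (s + lam)
    return 0.5 * sum(
        score(rl, sl) + score(rr, sr) - score(rl + rr, sl + sr)
        for rl, sl, rr, sr in zip(rL, sL, rR, sR)
    )

print(leaf_weight(-2.0, 1.0, 1.0))                   # 1.0
print(split_gain([-2.0], [1.0], [2.0], [1.0], 1.0))  # 0.5 * (2 + 2 - 0) = 2.0
```

The key difference from plain XGBoost is that one tree structure is shared across all periods, but each leaf carries a separate weight w_l(τ_j) per period.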


Summary

Log hazard function is approximated by a survival tree ensemble

maximum likelihood as the objective function

boosting algorithm

at each step, a gradient method is applied to optimize the objective approximated up to 2nd order


Datasets

Installment loans with 12-month terms

Definition of default: the borrower is overdue for at least 10 days on any scheduled repayment due date

Early repayments: regarded as “repaying on time” for the remaining periods

training and testing datasets

dataset       time          sample size
training set  January 2018  200,000
testing set   March 2018    120,000

Default rate

default rate(t) = (# default accounts up to month t) / (# total accounts)
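This curve is straightforward to compute from account-level data; a sketch (the data layout is assumed for illustration, not taken from the paper):

```python
def default_rate_curve(default_months, n_accounts, horizon=12):
    """default_months: month of first default per defaulted account (1-based)."""
    counts = [0] * (horizon + 1)
    for m in default_months:
        if 1 <= m <= horizon:
            counts[m] += 1
    rates, cum = [], 0
    for t in range(1, horizon + 1):
        cum += counts[t]                  # defaults observed up to month t
        rates.append(cum / n_accounts)
    return rates

print(default_rate_curve([1, 1, 3], 10, horizon=3))  # [0.2, 0.2, 0.3]
```

Because the numerator is cumulative, the curve is non-decreasing in t by construction.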


Default rates on datasets

[Figure: default rate by month (1–12) for training and testing data, shown in multiples of the base rate b, from 0 to 1.2b]


Dataset and preprocessing

Over 400 original attributes are collected

exclude attributes with a missing rate higher than 80%

one-hot encoding for categorical attributes

50 features are selected by xgboost
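The two preprocessing steps can be expressed as a pure-Python sketch (in practice this would be dataframe tooling; the names here are illustrative):

```python
def drop_sparse(columns, max_missing=0.80):
    """Keep columns whose missing rate (None values) is at most max_missing."""
    return {
        name: vals for name, vals in columns.items()
        if sum(v is None for v in vals) / len(vals) <= max_missing
    }

def one_hot(values):
    """One-hot encode a categorical column into {category: 0/1 list}."""
    cats = sorted({v for v in values if v is not None})
    return {c: [1 if v == c else 0 for v in values] for c in cats}

print(one_hot(["m", "f", "m"]))  # {'f': [0, 1, 0], 'm': [1, 0, 1]}
```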

source                     features
PBC report                 income score; credit score; overdue information of credit cards
personal information       age; sex; education level
device information         location
third-party rating agency  no. of loans on other lending platforms; travel intensity
other information          whether possessing a car; application channel


Convergence

1,000 runs with λ = 0.001 and a maximum tree depth of 6

[Figure: loss vs. iterations (0–30); the loss decreases from about 16 and converges]


Performance

[Figure: default rate per month (months 1–12) across 20 survival groups]


Comparison with existing models: C-Index

[Figure: C-index by month (1–12) for GBST, COX, RSF, and XGB, in the range 0.77–0.81]
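For reference, the C-index compared here is the standard concordance index: the fraction of comparable pairs (one account observed to default before the other's event time) that the risk scores order correctly, with ties counting one half. A self-contained O(n²) sketch, not tied to any particular library:

```python
def c_index(event_times, events, risk_scores):
    """events[i] = 1 if default observed; higher risk score = earlier default."""
    num = den = 0.0
    n = len(event_times)
    for i in range(n):
        if not events[i]:
            continue                               # i must have an observed default
        for j in range(n):
            if event_times[i] < event_times[j]:    # comparable pair
                den += 1
                if risk_scores[i] > risk_scores[j]:
                    num += 1
                elif risk_scores[i] == risk_scores[j]:
                    num += 0.5                     # ties count half
    return num / den

print(c_index([1, 2, 3], [1, 1, 0], [3.0, 2.0, 1.0]))  # 1.0 (perfect ordering)
```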


Comparison with existing models: AUC

[Figure: AUC by month (1–12) for GBST, COX, RSF, and XGB, in the range 0.77–0.81]


Comparison with existing models

[Figure: default rate across 20 survival groups for GBST, COX, RSF, and XGB, in multiples of the base rate b, from 0 to 4b]


Conclusion

Propose the gradient boosting survival tree (GBST) model

Confirm the convergence of GBST on a real dataset

GBST outperforms existing survival analysis and machine learning

models

Thank you!
