Behavior-Based Predictive Models


Disclaimer

Any opinions, advice, statements, or other information or content expressed or made in the following presentation are those of the presenter and do not necessarily state or reflect the positions or opinions of JPMorgan Chase, its affiliates or subsidiaries.


Special Thanks to

- SAS and Dr. Jerry Oglesby

- The ChoicePoint Precision Marketing (CPPM) analytic team, which supported me in finishing this work but no longer exists.

Rules:

1. During the talk, stop me at any time if you have a question.

2. After the talk, you are welcome to discuss with me offline.


Introduction

Most popular Data Mining models:

Logistic regression and its variants (NNets, SVM, … …)

2-state Assumption => Bernoulli Outcome

Predict the presence of certain behaviors (response, … …)

Major Limitation:

- Ignore Frequency and Severity given presence of behavior

Ex. 1st-time Auto Claim => Bad Luck => Normal

2 or More Claims => Bad Habit => Risky

- Consequence of 2-state: rank order head count but not $$$


Current Status

Most efforts focusing on relationship exploration:

- NNets, SVM, GAM, CART, … …

Overlook Definition of Left-hand Side:

- Binary outcome is derived from but over-simplifies behaviors

- Why not model behaviors directly using Count Models?

Any loss without 2-state assumption (Logistic Regression)?

Law of Small Numbers:

Binomial (N, p), i.e. N Bernoulli trials ≈ Poisson (Np) given N -> ∞ and p -> 0

=> Prob (Y = 1|Y~Bern.) ≈ Prob (Y ≥ 1|Y~Pois.) - Show later !
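A quick numerical check of the Law of Small Numbers, as a minimal sketch. The values of N, p, and the rare-event probability below are hypothetical and chosen only for illustration. The code compares Binomial(N, p) with Poisson(Np) and then checks that a rare Bernoulli probability is close to Prob(Y ≥ 1) under a Poisson with the same rate.

```python
import math

def binomial_pmf(k, n, p):
    """Probability of k events in n independent trials, each with probability p."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    """Probability of k events under a Poisson distribution with mean lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)

# Hypothetical values: many trials, rare event.
n_trials, p_event = 1000, 0.002
lam = n_trials * p_event

# Largest gap between the two distributions over counts 0..10.
max_gap = max(abs(binomial_pmf(k, n_trials, p_event) - poisson_pmf(k, lam))
              for k in range(11))

# Rare Bernoulli outcome vs. "at least one event" under a Poisson with the same rate.
p_bernoulli = 0.02                          # hypothetical Prob(Y = 1)
p_poisson_any = 1 - math.exp(-p_bernoulli)  # Prob(Y >= 1) with lambda = 0.02

print(f"max |Binomial - Poisson| = {max_gap:.6f}")
print(f"Bernoulli {p_bernoulli:.4f} vs Poisson Prob(Y >= 1) {p_poisson_any:.4f}")
```

The smaller p is, the closer the two probabilities get, which is why little is lost by dropping the 2-state assumption for rare behaviors.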


Genuine Count Model

Starting Point: basic Poisson Model

$$
f(Y_i \mid X_i) = \frac{\exp(-\lambda_i)\,\lambda_i^{Y_i}}{Y_i!}, \quad \text{where } \lambda_i = \exp(X_i \beta)
$$

Major Drawback:

Strong Assumption of Equi-Dispersion => Mean = Variance

Real-world Data => Over-Dispersion

- Excess Zeroes: Majority with 0 delinquency in Credit Card

- Long Right Tail: Severely sick patients in Insurance

Observed Heterogeneity


Major Alternative

Negative Binomial Model (continuous mixture):

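$$
\lambda_i = \exp(X_i \beta + \varepsilon_i) = \exp(X_i \beta)\,\exp(\varepsilon_i), \quad \text{where } \exp(\varepsilon_i) \sim \mathrm{Gamma}(\alpha^{-1}, \alpha^{-1})
$$

E(Y|X) = λ and Var(Y|X) = λ + αλ^2 > λ => Problem Solved!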

Potential Limitation: 1-Process Assumption

- Lack of flexibility for heterogeneous population

- Lack of intuitive interpretation on excess zeroes

- Lack of insight for customer segmentation

Observed Heterogeneity

Unobserved Heterogeneity
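To make the mean and variance claims concrete, here is a minimal sketch of the Negative Binomial probability function in the (λ, α) parameterization used on this slide. The values of `lam` and `alpha` are hypothetical, and the function name `nb_pmf` is illustrative only. The sketch checks numerically that the mean is λ and the variance is λ + αλ^2, and that the NB puts more mass on zero than a Poisson with the same mean.

```python
import math

def nb_pmf(y, lam, alpha):
    """Negative Binomial probability of count y with mean lam and dispersion alpha."""
    r = 1.0 / alpha  # shape of the Gamma mixing distribution
    log_p = (math.lgamma(y + r) - math.lgamma(y + 1) - math.lgamma(r)
             + r * math.log(r / (r + lam))
             + y * math.log(lam / (r + lam)))
    return math.exp(log_p)

lam, alpha = 0.5, 3.0  # hypothetical mean and dispersion

# The tail decays geometrically, so 400 terms are ample for these values.
probs = [nb_pmf(y, lam, alpha) for y in range(400)]
nb_mean = sum(y * p for y, p in enumerate(probs))
nb_var = sum((y - nb_mean) ** 2 * p for y, p in enumerate(probs))

print(f"mean = {nb_mean:.4f} (lambda = {lam})")
print(f"variance = {nb_var:.4f} (lambda + alpha * lambda^2 = {lam + alpha * lam**2})")
print(f"Prob(Y = 0): NB {probs[0]:.4f} vs Poisson {math.exp(-lam):.4f}")
```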


Composite Models

Main Assumption => Multiple(2+) Components

- Data governed by multiple processes

Ex. Insurance claimant might behave differently after 1st claim.

Models covered:

- Hurdle Model (Mullahy 1986)

- Zero-Inflated Poisson Model (Lambert 1992)

- Latent Class Poisson Model (Wedel 1993)

Additional Benefit:

- Segmentation by behavior or / and characteristics


An Application

Credit Card Data used in Econometric Analysis (Greene 1992)

Outcome: # of 60-day Delinquencies in payment

Predictors:

- AGE: Age in years as of November 1989
- INCOME: Self-reported income, in $10,000s
- AVGEXP: Average monthly credit card expense
- EXP_INC: Average monthly credit card expense / Average monthly income
- MAJOR: Binary indicator of whether applicant has a major credit card
- OWNRENT: Binary indicator of whether applicant owns their home
- DEPNDT: Number of dependents
- INC_PER: Monthly income divided by 1 + DEPNDT
- SELFEMPL: Binary indicator of whether the applicant is self-employed
- ACTIVE: Number of active credit card accounts
- CUR_ADD: Number of months living at current address


Data Summary

Variable Mean Std.Dev. Min Max
MAJORDRG 0.4564 1.3453 0.00 14.00
AGE 33.2131 10.1428 0.17 83.50
INCOME 3.3654 1.6939 0.21 13.50
EXP_INC 0.0687 0.0947 0.00 0.91
AVGEXP 185.0570 272.2190 0.00 3100.00
MAJOR 0.8173 0.3866 0.00 1.00
OWNRENT 0.4405 0.4966 0.00 1.00
DEPNDT 0.9939 1.2478 0.00 6.00
INC_PER 2.1556 1.3635 0.07 11.00
SELFEMPL 0.0690 0.2535 0.00 1.00
ACTIVE 6.9970 6.3058 0.00 46.00
CUR_ADD 55.2676 66.2717 0.00 540.00

For outcome, Variance = 4 times Mean
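The "4 times" figure follows directly from the MAJORDRG row of the summary table. A small sketch of the arithmetic is below; the comparison with a plain Poisson uses an intercept-only Poisson with the sample mean, which is a simplifying assumption made here for illustration, and the 80% zero share is the rounded figure reported for the outcome in this deck.

```python
import math

# MAJORDRG row of the data summary.
outcome_mean = 0.4564
outcome_std = 1.3453

outcome_variance = outcome_std ** 2
dispersion_ratio = outcome_variance / outcome_mean  # equals 1 under equi-dispersion

# Share of zeroes an intercept-only Poisson with the same mean would imply.
poisson_zero_share = math.exp(-outcome_mean)
observed_zero_share = 0.80  # rounded share of cardholders with 0 delinquencies

print(f"variance / mean = {dispersion_ratio:.2f}")
print(f"Poisson-implied zeroes = {poisson_zero_share:.1%}, observed = {observed_zero_share:.0%}")
```

Both symptoms of over-dispersion show up: the variance is roughly four times the mean, and the observed share of zeroes is well above what a Poisson with that mean would produce.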


EDA on Outcome

[Figure: distribution of the outcome by number of delinquencies (0, 1, 2, ..., 9, 10+); vertical axis from 0% to 80%]

1. 80% of cardholders have 0 delinquencies.

2. Large dispersion with long tail


Traditional Modeling Practice

Logistic Regression based on 2-State assumption:

Define Y = 0 if MajorDrg = 0 and Y = 1 otherwise

Fit a logistic regression with 0/1 Bernoulli outcome

proc logistic data = credit;

model Y = < PREDICTORS > ;

run;

Can’t differentiate between 1 delinquency and 3 delinquencies

Able to capture head counts but not dollars


Standard Count Data Model

Basic Poisson Model => Not Sufficient for data with 80% Zeroes

Negative Binomial Model:

proc genmod data = credit;

model Y = < PREDICTORS > / dist = NB link = log ;

run;

Goodness-of-Fit: Both portfolio level and account level

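$$
f(Y_i \mid X_i) = \frac{\Gamma(Y_i + \alpha^{-1})}{\Gamma(Y_i + 1)\,\Gamma(\alpha^{-1})}
\left(\frac{\alpha^{-1}}{\alpha^{-1} + \lambda_i}\right)^{\alpha^{-1}}
\left(\frac{\lambda_i}{\alpha^{-1} + \lambda_i}\right)^{Y_i}
$$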


NB Output

Parameter Estimate Standard Error t Value Pr > |t|

B2_Intercept -1.7324 0.3771 -4.59 <.0001

B2_Age 0.004542 0.008946 0.51 0.6118

B2_Income -0.06657 0.08815 -0.76 0.4503

B2_Exp_inc -7.7236 2.6413 -2.92 0.0035

B2_Avgexp -0.0001 0.000832 -0.12 0.9031

B2_Ownrent -0.7795 0.1727 -4.51 <.0001

B2_Selfempl -0.07347 0.2802 -0.26 0.7932

B2_Depndt 0.1932 0.1215 1.59 0.112

B2_Inc_per 0.1292 0.1191 1.08 0.2783

B2_Cur_add 0.002557 0.001195 2.14 0.0326

B2_Major 0.02561 0.1893 0.14 0.8924

B2_Active 0.1152 0.0141 8.17 <.0001

alpha 3.5161 0.4046 8.69 <.0001


NB Portfolio Prediction

[Figure: distribution by number of delinquencies (0 to 10+); vertical axis from 0% to 80%; series: MajorDrg, NB Prediction]


How to Score

Count Model Scoring Scheme

1. Model Development

2. Prob(Y=0), Prob(Y=1), Prob(Y=2), Prob(Y=3), Prob(Y=4) ……

3. Define Good / Bad. Ex: Bad = 1 when Y >= 2

4. Prob(Good) = Prob(Y=0 or 1), Prob(Bad) = 1 - Prob(Y=0 or 1)

Logit Model Scoring Scheme

1. Define Good / Bad. Ex: Bad = 1 when Y >= 2

2. Model Development

3. Prob(Good) and Prob(Bad)
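The practical difference between the two schemes is that a count model is developed once and can then be scored against any Good / Bad definition. The sketch below illustrates this under stated assumptions: a Poisson stands in for whatever count model was fitted, `account_mean` is a hypothetical fitted mean for one account, and `prob_bad` is an illustrative helper, not part of the original material.

```python
import math

def poisson_pmf(y, lam):
    """Probability of count y under a Poisson with mean lam."""
    return math.exp(-lam) * lam**y / math.factorial(y)

def prob_bad(count_pmf, bad_threshold):
    """Prob(Bad) = 1 - Prob(Y < bad_threshold) for a fitted count distribution."""
    return 1.0 - sum(count_pmf(y) for y in range(bad_threshold))

# Hypothetical fitted mean for one account; a Poisson stands in for the fitted count model.
account_mean = 0.8
account_pmf = lambda y: poisson_pmf(y, account_mean)

# One model, three Bad definitions: 1+, 2+ and 3+ delinquencies.
scores = {threshold: prob_bad(account_pmf, threshold) for threshold in (1, 2, 3)}
for threshold, score in scores.items():
    print(f"Prob({threshold}+ delinquencies) = {score:.4f}")
```

With a logit model, each of the three definitions would need its own target variable and its own model development.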


NB Account Prediction

[Figure: Cumulative % of Bads (vertical axis, 0% to 100%) vs. Cumulative % of Population (horizontal axis, 0% to 100%); series: NB Score for 1+ Delinquencies, NB Score for 2+ Delinquencies, NB Score for 3+ Delinquencies, Logistic Regression Score]


Hurdle Model

Two-Component Assumption:

- Zero counts determined by Binomial distribution

- Positive counts governed by Zero-Truncated Poisson distribution

2-Group Segmentation:

- Group without delinquency

- Group with delinquency

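$$
f(Y_i \mid X_i) =
\begin{cases}
\theta_i & \text{for } Y_i = 0 \\[4pt]
\dfrac{(1 - \theta_i)\,\exp(-\lambda_i)\,\lambda_i^{Y_i}}{\bigl(1 - \exp(-\lambda_i)\bigr)\,Y_i!} & \text{for } Y_i > 0
\end{cases}
$$

where θ_i is the probability of a zero count from the Binomial (logit) component and λ_i = exp(X_i β) is the Poisson mean.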


Hurdle Model in SAS

proc nlmixed data = data;
  params b0 = 0 b1 = 0 ... a0 = 0 a1 = 0 ...;
  /* Poisson mean for the positive counts */
  xb = b0 + b1 * INCOME ... ...;
  mu = exp(xb);
  /* logit index for the zero component */
  xa = a0 + a1 * INCOME ... ...;
  /* Probability for Zero */
  if y = 0 then p = exp(xa) / (1 + exp(xa));
  /* Probability for Zero-Truncated Poisson */
  else p = (1 - exp(xa) / (1 + exp(xa))) / (1 - exp(-mu)) * (exp(-mu) * mu ** y / fact(y));
  ll = log(p);
  model y ~ general(ll);
run;
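The likelihood in the SAS program above can be mirrored in a few lines for checking purposes. This is a minimal sketch, not the presenter's code: `xa` and `xb` are the same linear indices as in the program, but the numeric values used here are hypothetical. It confirms that the two pieces form a proper distribution and that the zero probability comes from the logit part alone.

```python
import math

def logistic(x):
    """Inverse logit."""
    return math.exp(x) / (1 + math.exp(x))

def hurdle_pmf(y, xa, xb):
    """Hurdle probability of count y given logit index xa and Poisson index xb."""
    p_zero = logistic(xa)  # probability for zero
    mu = math.exp(xb)      # Poisson mean before truncation
    if y == 0:
        return p_zero
    # Zero-truncated Poisson: rescale by the probability of a positive count.
    truncated = math.exp(-mu) * mu**y / math.factorial(y) / (1 - math.exp(-mu))
    return (1 - p_zero) * truncated

xa, xb = 1.4, 0.5  # hypothetical index values for one account
total = sum(hurdle_pmf(y, xa, xb) for y in range(60))

print(f"Prob(Y = 0) = {hurdle_pmf(0, xa, xb):.4f}")
print(f"sum of probabilities = {total:.6f}")
```

Because the zero probability does not involve `mu`, the logit part and the truncated Poisson part can also be estimated separately.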


Hurdle Output

Logit Component (Drivers for Presence of Delinquency) | Truncated Poisson Component (Drivers for Severity of Delinquency)

Parameter Estimate Pr > |t| | Parameter Estimate Pr > |t|
B1_Intercept 1.92 <.0001 | B2_Intercept 0.51 0.04
B1_Age 0.00 0.64 | B2_Age -0.01 0.34
B1_Income 0.01 0.95 | B2_Income -0.18 0.01
B1_Exp_inc 6.71 0.00 | B2_Exp_inc -13.81 0.00
B1_Avgexp 0.00 0.47 | B2_Avgexp 0.00 0.97
B1_Ownrent 0.71 <.0001 | B2_Ownrent -0.41 0.00
B1_Selfempl -0.06 0.81 | B2_Selfempl -0.05 0.80
B1_Depndt -0.07 0.56 | B2_Depndt 0.28 0.00
B1_Inc_per -0.04 0.69 | B2_Inc_per 0.25 0.00
B1_Cur_add 0.00 0.00 | B2_Cur_add 0.00 0.70
B1_Major 0.18 0.35 | B2_Major 0.22 0.09
B1_Active -0.10 <.0001 | B2_Active 0.04 <.0001


Hurdle Portfolio Prediction

[Figure: distribution by number of delinquencies (0 to 10+); vertical axis from 0% to 80%; series: MajorDrg, Hurdle Prediction, Truncated Poisson Prediction; callouts: Un-normalized Truncated Poisson Distribution, Composite Distribution]


Hurdle Segmentation

[Figure: distribution by number of delinquencies (0 to 10+); vertical axis from 0% to 80%; series: No Delinquency segment (80%, blue), Delinquency segment (20%, red)]

1. Segmentation Model:

Logistic Model separates BLUE from RED

2. Severity Model:

Truncated Poisson predicts severity of RED


Hurdle Account Prediction

[Figure: Cumulative % of Bads, both axes from 0% to 100%; series: HDL Score for 1+ Delinquencies, HDL Score for 2+ Delinquencies, HDL Score for 3+ Delinquencies, Logistic Reg Score]


Zero-Inflated Poisson Model

Two-Component Assumption:

- Part of zeroes determined by Binomial distribution

- Rest of zeroes together with positive counts determined by standard Poisson distribution

2-Group Segmentation:

- Group without delinquency risk

- Group with delinquency risk

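$$
f(Y_i \mid X_i) =
\begin{cases}
\theta_i + (1 - \theta_i)\,\exp(-\lambda_i) & \text{for } Y_i = 0 \\[4pt]
\dfrac{(1 - \theta_i)\,\exp(-\lambda_i)\,\lambda_i^{Y_i}}{Y_i!} & \text{for } Y_i > 0
\end{cases}
$$

where θ_i is the probability of belonging to the group without delinquency risk (logit component) and λ_i = exp(X_i β) is the Poisson mean.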


ZIP Model in SAS


proc nlmixed data = data;
  params b0 = 0 b1 = 0 ... a0 = 0 a1 = 0 ...;
  /* Poisson mean */
  xb = b0 + b1 * INCOME ... ...;
  mu = exp(xb);
  /* logit index for the extra zeroes */
  xa = a0 + a1 * INCOME ... ...;
  /* Probability for zero */
  if y = 0 then p = exp(xa) / (1 + exp(xa)) + (1 - exp(xa) / (1 + exp(xa))) * exp(-mu);
  /* Probability for Poisson after excluding zero */
  else p = (1 - exp(xa) / (1 + exp(xa))) * (exp(-mu) * mu ** y / fact(y));
  ll = log(p);
  model y ~ general(ll);
run;
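For comparison with the hurdle likelihood, the ZIP likelihood can be checked the same way. The sketch below is illustrative only: `xa` and `xb` play the same roles as in the SAS program, and their numeric values are hypothetical. The key difference is that a zero can come from either component, so Prob(Y = 0) exceeds the logit probability.

```python
import math

def logistic(x):
    """Inverse logit."""
    return math.exp(x) / (1 + math.exp(x))

def zip_pmf(y, xa, xb):
    """Zero-inflated Poisson probability of count y."""
    p_no_risk = logistic(xa)  # share of zeroes from the Binomial component
    mu = math.exp(xb)         # mean of the standard Poisson component
    poisson = math.exp(-mu) * mu**y / math.factorial(y)
    if y == 0:
        # Zeroes from both components.
        return p_no_risk + (1 - p_no_risk) * poisson
    return (1 - p_no_risk) * poisson

xa, xb = 1.0, 0.5  # hypothetical index values for one account
total = sum(zip_pmf(y, xa, xb) for y in range(60))

print(f"logit share = {logistic(xa):.4f}, Prob(Y = 0) = {zip_pmf(0, xa, xb):.4f}")
print(f"sum of probabilities = {total:.6f}")
```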


ZIP Output

Logit Component (Drivers for Existence of Risk) | Poisson Component (Drivers for Severity of Risk)

Parameter Estimate Pr > |t| | Parameter Estimate Pr > |t|
B1_Intercept 1.61 0.00 | B2_Intercept 0.45 0.08
B1_Age -0.01 0.31 | B2_Age -0.01 0.26
B1_Income -0.12 0.31 | B2_Income -0.18 0.00
B1_Exp_inc 4.65 0.23 | B2_Exp_inc -7.26 0.01
B1_Avgexp 0.00 0.18 | B2_Avgexp 0.00 0.52
B1_Ownrent 0.54 0.01 | B2_Ownrent -0.46 0.00
B1_Selfempl 0.01 0.98 | B2_Selfempl 0.01 0.94
B1_Depndt 0.14 0.33 | B2_Depndt 0.29 0.00
B1_Inc_per 0.17 0.24 | B2_Inc_per 0.26 0.00
B1_Cur_add 0.00 0.00 | B2_Cur_add 0.00 0.88
B1_Major 0.36 0.13 | B2_Major 0.23 0.08
B1_Active -0.09 <.0001 | B2_Active 0.04 <.0001


ZIP Portfolio Prediction

[Figure: distribution by number of delinquencies (0 to 10+); vertical axis from 0% to 80%; series: MajorDrg, ZIP Prediction, Poisson Prediction; callout: Un-normalized Poisson Distribution]



ZIP Segmentation

[Figure: distribution by number of delinquencies (0 to 10+); vertical axis from 0% to 80%; series: No Delinquency Segment (72%), Potential Delinquency Segment (28%)]

Same outcome but different risk implications

1. Blue (72%): Established, free from financial risk

2. Red (8%): Vulnerable, might deteriorate in bad times
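The percentages on this slide fit together once the zero-delinquency group is split in two. The sketch below is one reading of the slide, stated as an assumption: the 8% is taken to be cardholders who currently show zero delinquencies but belong to the potential-delinquency segment. All figures are the rounded portfolio-level shares quoted in this deck.

```python
# Rounded portfolio-level shares from the slides.
zero_share = 0.80          # cardholders with 0 delinquencies
risk_free_share = 0.72     # No Delinquency Segment (blue)
potential_share = 0.28     # Potential Delinquency Segment

# Assumed reading: zeroes that sit inside the potential-delinquency segment (red).
vulnerable_zero_share = zero_share - risk_free_share

# Remainder of the potential-delinquency segment: accounts with 1+ delinquencies.
delinquent_share = potential_share - vulnerable_zero_share

# Among accounts with zero delinquencies, the share that is risk-free.
risk_free_given_zero = risk_free_share / zero_share

print(f"vulnerable zeroes = {vulnerable_zero_share:.0%}")
print(f"already delinquent = {delinquent_share:.0%}")
print(f"risk-free share of zero-delinquency accounts = {risk_free_given_zero:.0%}")
```

Under this reading, about one in ten accounts with a clean record belongs to the vulnerable group, which a 0/1 logistic model cannot distinguish.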


ZIP Account Prediction

[Figure: Cumulative % of Bads, both axes from 0% to 100%; series: ZIP Score for 1+ Delinquencies, ZIP Score for 2+ Delinquencies, ZIP Score for 3+ Delinquencies, Logistic Reg Score]


Latent Class Poisson Model

General S-Component Assumption for S>= 2:

- Avoid sharp dichotomization

- Each case drawn from an unobserved Poisson component with different parameter

- S is determined by AIC / BIC

Segmentation assumed S = 2:

- Group with low risk

- Group with high risk

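$$
f(Y_i \mid X_i) = \sum_{s=1}^{S} p_s \,\frac{\exp(-\lambda_{i|s})\,\lambda_{i|s}^{Y_i}}{Y_i!}, \quad \text{where } \lambda_{i|s} = \exp(X_i \beta_s) \text{ and } \sum_{s=1}^{S} p_s = 1
$$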


LCP Model in SAS


proc nlmixed data = data;
  params a0 = 0 ... b0 = 1 ...
         prior1 = 0 to 1 by 0.1;
  /* Probability of LC component 1 */
  xa = a0 + a1 * INCOME ... ...; ma = exp(xa);
  pa = exp(-ma) * ma ** y / fact(y);
  /* Probability of LC component 2 */
  xb = b0 + b1 * INCOME ... ...; mb = exp(xb);
  pb = exp(-mb) * mb ** y / fact(y);
  /* mixture of the two components */
  p = prior1 * pa + (1 - prior1) * pb;
  ll = log(p);
  model y ~ general(ll);
run;
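The mixture likelihood above also yields a segmentation: once the model is fitted, Bayes' rule gives each account's probability of belonging to component 1. The sketch below is a hedged illustration, not the presenter's code. The index values `xa` and `xb` are hypothetical, `posterior_class1` is an illustrative helper name, and the prior of 0.87 is the Prior_Prob reported in the parameter comparison of this deck.

```python
import math

def poisson_pmf(y, mean):
    """Probability of count y under a Poisson with the given mean."""
    return math.exp(-mean) * mean**y / math.factorial(y)

def lcp_pmf(y, prior1, xa, xb):
    """Two-class latent class Poisson probability of count y."""
    pa = poisson_pmf(y, math.exp(xa))  # component 1
    pb = poisson_pmf(y, math.exp(xb))  # component 2
    return prior1 * pa + (1 - prior1) * pb

def posterior_class1(y, prior1, xa, xb):
    """Probability that an account with count y belongs to component 1 (Bayes' rule)."""
    pa = poisson_pmf(y, math.exp(xa))
    return prior1 * pa / lcp_pmf(y, prior1, xa, xb)

prior1 = 0.87       # Prior_Prob from the parameter comparison
xa, xb = -1.5, 0.5  # hypothetical index values: low-mean and high-mean component

total = sum(lcp_pmf(y, prior1, xa, xb) for y in range(60))
post_zero = posterior_class1(0, prior1, xa, xb)
post_five = posterior_class1(5, prior1, xa, xb)

print(f"sum of probabilities = {total:.6f}")
print(f"Prob(class 1 | Y = 0) = {post_zero:.3f}, Prob(class 1 | Y = 5) = {post_five:.4f}")
```

Unlike the hurdle split, no count is assigned to a segment with certainty, which is what "avoid sharp dichotomization" refers to.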


LCP Output

Latent Poisson Component 1 (Drivers for Low Risk) | Latent Poisson Component 2 (Drivers for High Risk)

Parameter Estimate Pr > |t| | Parameter Estimate Pr > |t|
B1_Intercept -1.82 <.0001 | B2_Intercept 0.31 0.42
B1_Age 0.00 0.92 | B2_Age 0.00 0.74
B1_Income -0.10 0.38 | B2_Income -0.17 0.06
B1_Exp_inc -31.40 0.00 | B2_Exp_inc -4.73 0.06
B1_Avgexp 0.00 0.00 | B2_Avgexp 0.00 0.29
B1_Ownrent -0.97 0.00 | B2_Ownrent -0.47 0.00
B1_Selfempl 0.34 0.21 | B2_Selfempl 0.40 0.37
B1_Depndt 0.10 0.55 | B2_Depndt 0.27 0.06
B1_Inc_per 0.05 0.73 | B2_Inc_per 0.23 0.05
B1_Cur_add 0.00 <.0001 | B2_Cur_add 0.00 0.03
B1_Major -0.27 0.30 | B2_Major 0.19 0.24
B1_Active 0.09 <.0001 | B2_Active 0.07 <.0001


LCP Portfolio Prediction

[Figure: distribution by number of delinquencies (0 to 10+); vertical axis from 0% to 80%; series: MajorDrg, LC Prediction, Low-Mean Poisson Prediction, High-Mean Poisson Prediction; callouts: Poisson Distribution of High Mean, Poisson Distribution of Low Mean]


LCP Account Prediction

[Figure: Cumulative % of Bads, both axes from 0% to 100%; series: LC Score for 1+ Delinquencies, LC Score for 2+ Delinquencies, LC Score for 3+ Delinquencies, Logistic Reg Score]

~ 5% benefit at high-risk zone


Parameter Comparison

Columns: Neg Binomial | Hurdle (Logit, Trunc. Poisson) | ZIP (Logit, Poisson) | 2-Class LCP (Class 1, Class 2)

Parameters NB Hurdle-Logit Hurdle-Trunc. ZIP-Logit ZIP-Poisson LCP-Class1 LCP-Class2
Intercept -1.73 1.92 0.51 1.61 0.45 -1.82 0.31
Age 0.00 0.00 -0.01 -0.01 -0.01 0.00 0.00
Income -0.07 0.01 -0.18 -0.12 -0.18 -0.10 -0.17
Exp_inc -7.72 6.71 -13.81 4.65 -7.26 -31.40 -4.73
Avgexp 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Ownrent -0.78 0.71 -0.41 0.54 -0.46 -0.97 -0.47
Selfempl -0.07 -0.06 -0.05 0.01 0.01 0.34 0.40
Depndt 0.19 -0.07 0.28 0.14 0.29 0.10 0.27
Inc_per 0.13 -0.04 0.25 0.17 0.26 0.05 0.23
Cur_add 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Major 0.03 0.18 0.22 0.36 0.23 -0.27 0.19
Active 0.12 -0.10 0.04 -0.09 0.04 0.09 0.07
Dispersion 3.52 (NB only)
Prior_Prob 0.87 (2-Class LCP only)

In Hurdle / ZIP, the 1st set of BETAs explains why an account is delinquent and the 2nd set explains how many delinquencies there will be.


Prediction Comparison

Overall, NB model fits the best

Hurdle / ZIP works better in excess zeroes

In Cherry-picking, all are comparable to Logistic regression

Implied Models: Hurdle NB / Zero-Inflated NB / Latent Class NB ?

Outcome Observed NB Hurdle ZIP LCP
0 1060 1058 1060 1059 1045
1 137 144 98 105 155
2 50 52 65 72 49
3 24 24 38 42 26
4 17 13 21 22 16
5 11 8 10 10 10
6 5 5 5 5 6
7 6 3 3 2 4
8 0 2 1 1 2
9 2 2 1 1 1
10+ 7 8 17 1 4
total 1319 1319 1319 1319 1319
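One simple way to put a number on "fits the best" is the total absolute deviation between predicted and observed counts across the outcome cells. The choice of this metric is an assumption made here for illustration; the slides do not say which measure the presenter used. The counts are copied from the table above.

```python
# Observed and predicted account counts for outcomes 0, 1, ..., 9, 10+.
observed = [1060, 137, 50, 24, 17, 11, 5, 6, 0, 2, 7]
predicted = {
    "NB":     [1058, 144, 52, 24, 13, 8, 5, 3, 2, 2, 8],
    "Hurdle": [1060, 98, 65, 38, 21, 10, 5, 3, 1, 1, 17],
    "ZIP":    [1059, 105, 72, 42, 22, 10, 5, 2, 1, 1, 1],
    "LCP":    [1045, 155, 49, 26, 16, 10, 6, 4, 2, 1, 4],
}

# Total absolute deviation across all outcome cells, per model.
total_abs_dev = {
    model: sum(abs(p - o) for p, o in zip(counts, observed))
    for model, counts in predicted.items()
}

# Absolute deviation in the zero cell only.
zero_cell_dev = {model: abs(counts[0] - observed[0]) for model, counts in predicted.items()}

best_overall = min(total_abs_dev, key=total_abs_dev.get)
print(total_abs_dev)
print(zero_cell_dev)
print("closest overall:", best_overall)
```

On this measure NB is closest overall, while Hurdle and ZIP are closest in the zero cell, in line with the first two bullets of this slide.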


Model Comparison

Statistical Consideration:

Better Statistics, More Parsimonious => NB

Business Consideration:

Better Interpretation, More Insight => Hurdle / ZIP / LCP

Statistics NB Hurdle ZIP LCP
Log Likelihood -982.40 -1007.00 -1018.30 -986.00
# of Parameters 13 24 24 25
AIC 1979.80 2040.00 2062.60 1999.00
BIC 2005.36 2088.89 2111.49 2050.01
Vuong Test - -3.75 -2.11 -0.40
