Practical Predictive Analytics - Meetupfiles.meetup.com/17637122/Practical Predictive...

Practical Predictive Analytics. Karim Maarouf, Senior Data Scientist - Teradata Egypt. Cairo’s Data Science Community Meetup, January 9th 2015


Page 1: Practical Predictive Analytics - Meetupfiles.meetup.com/17637122/Practical Predictive Analytics.pdf · Predictive Analytics in Marketing Detect customers at risk of default ... •Churn

Practical Predictive Analytics Karim Maarouf, Senior Data Scientist - Teradata Egypt

Cairo’s Data Science Community Meetup

January 9th 2015


• Introduction and Motivation

• Business Understanding of Churn

• Data Preparation

• Modeling

• Evaluation

• Retention

• Other Topics in Churn

Agenda

© 2014 Teradata 2


How can we help your business become more data driven?

Corporate Vision: Enabling data-driven business

Mission: Providing the world’s best analytic data solutions to drive competitive advantage for our customers


• Organizations must move from information and hindsight to optimization and foresight.

Gartner: Use Data More Proactively!


Predictive Analytics in Marketing

• Risk: Detect customers at risk of default. Detect instances of fraud.

• Retention: Detect customers that are at risk of churn and successfully retain them.

• Customer Lifetime Value: Forecast projected net profit from a customer.

• Direct Marketing: Determine which customers are likely to buy which products. Recommend products accordingly.


• Follow the standard CRISP-DM

Methodology


• Egyptian Telecommunications Market:

– Saturated market: mobile phone penetration is at 111%

– Majority (>95%) pre-paid

– Customers are price sensitive and easily switch between different providers

– Multi-SIM penetration is high (roughly 50%)

• Typical Pre-paid Customer Lifecycle

– Active customers can make and receive calls and other activities as long as they have enough credit

– Customers who are inactive (no activities or recharges) for an extended period of time (usually 90 days) are suspended.

– Customers who fail to recharge their line during suspension (usually 1 month) are disconnected. This event is considered churn.

• Churn by definition is a customer cancelling or deciding not to renew a service. For pre-paid subscribers the target event is actually inactivity.

Business Understanding Churn for a Telecommunications Provider


• Churn rate = number of customers churned / total number of customers (calculated over a certain period of time)

• Typical monthly churn rates are anywhere between 3 and 8%

• Annual churn rate = 1 – annual retention rate

= 1 – [(1 – monthly churn rate) ^ 12]

(As a rough simplification, annual churn rate ≈ monthly churn rate * 12; this overstates the compounded figure.)

• Assume monthly churn rate = 5%

Customers remaining at the end of one year = (1 – 0.05)^12 = 54%

Annual churn rate = 1 – [(1 – 0.05)^12] = 46%

This means by the end of the year the provider has lost almost half of the customers it started out with!

Business Understanding Churn for a Telecommunications Provider
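The compounding arithmetic on this slide can be checked with a few lines of Python. This is a minimal sketch that assumes the monthly churn rate stays constant over the year; the function name annual_churn_rate is illustrative and not from the talk.

```python
def annual_churn_rate(monthly_churn_rate: float, months: int = 12) -> float:
    """Compound a constant monthly churn rate over `months` months."""
    retention = (1.0 - monthly_churn_rate) ** months
    return 1.0 - retention


monthly = 0.05
remaining = (1.0 - monthly) ** 12

print(f"Customers remaining after one year: {remaining:.0%}")           # 54%
print(f"Annual churn rate (compounded): {annual_churn_rate(monthly):.0%}")  # 46%
print(f"Linear approximation (monthly * 12): {monthly * 12:.0%}")       # 60%
```

The linear shortcut is convenient for quick estimates, but it drifts further from the compounded figure as the monthly rate grows.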


• Create classification models to predict which customers are most likely to churn and gain insights about customer churn from these models.

Modeling Approach

[Figure: historical training data (churners and non-churners) is used to train the churn model; the model then scores current unclassified data, splitting the customer population into churners and non-churners.]


• Model scoring results in a score between 0 and 1 for each customer indicating the risk this customer will churn.

• Order customers by descending churn score and divide them into equal sized bins.

• Measure model accuracy overall and for the top X bins

• Target customers in the top X bins depending on model accuracy and budget.

Modeling Approach Scoring Customers

Bin    Percentile (by churn score)
10     90% – 100%
9      80% – 90%
...    ...
1      0% – 10%

Top bin (bin 10): the 10% of customers with the highest churn scores
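The binning step can be sketched as follows. This is an illustration only: the function assign_bins and the customer ids are hypothetical, and ties in score are broken arbitrarily by sort order. Bin 10 holds the highest scores, matching the table above.

```python
def assign_bins(scores, n_bins=10):
    """Map each customer id to a bin; bin n_bins holds the highest scores."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    n = len(ranked)
    bins = {}
    for rank, customer in enumerate(ranked):
        # rank 0 is the riskiest customer and lands in the top bin
        bins[customer] = n_bins - (rank * n_bins) // n
    return bins


# Hypothetical churn scores between 0 and 1 for 20 customers
scores = {f"c{i}": i / 20 for i in range(20)}
bins = assign_bins(scores)

top_bin = sorted(c for c, b in bins.items() if b == 10)
print(top_bin)  # ['c18', 'c19']
```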


• Churn modeling is a type of rare event modeling

– Special considerations when sampling and when evaluating model accuracy

• Churn prediction is time sensitive.

Customer A:

Predicting inactivity for customer A is trivial. However, the prediction is too late.

Customer B:

Predicting inactivity for customer B is more actionable. However, it may be too late.

Customer C:

This is the ideal case. Proactively detect inactivity to be able to retain the customer before it is too late.

Special Considerations

[Figure: one timeline per customer (A, B, C), marking the last activity date relative to today.]


• Observation Period: This is the period used to study the historical behavior of the customers.

• Latency Period: This is usually a 1- or 2-week gap between the historical period and when the churn event starts to take place. It is used to simulate the time needed to build an analytical dataset (ADS) on the observation period, score all the customers and launch retention campaigns.

• Target Period: This is when the churn event takes place. In our case this is the start of the inactivity period.

Modeling Approach Time Periods

Timeline: Observation Period (12W) → Latency Period (1W) → Target Period (8W)
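One way to turn the three periods into concrete date ranges is sketched below. The function modeling_windows, the example start date and the half-open (start, end) convention are assumptions made for illustration; the default lengths are the 12W / 1W / 8W values from the timeline.

```python
from datetime import date, timedelta


def modeling_windows(target_start, observation_weeks=12, latency_weeks=1,
                     target_weeks=8):
    """Return half-open (start, end) date ranges for the three periods."""
    latency_start = target_start - timedelta(weeks=latency_weeks)
    observation_start = latency_start - timedelta(weeks=observation_weeks)
    target_end = target_start + timedelta(weeks=target_weeks)
    return {
        "observation": (observation_start, latency_start),
        "latency": (latency_start, target_start),
        "target": (target_start, target_end),
    }


# Hypothetical date on which the target period begins
windows = modeling_windows(date(2015, 3, 30))
for name, (start, end) in windows.items():
    print(f"{name:12s} {start} to {end}")
```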


• Customers in a pre-paid multi-SIM market tend to become inactive and either churn completely or, in most cases, keep the line as a secondary line.

• The target is therefore defined as a consecutive period of inactive days starting from the first week of the target period, as opposed to a single event of churn.

• Conduct a pre-analysis to find the tipping point. After how many consecutive days of inactivity will it be highly unlikely that a customer will return?

• The choice of target definition depends on how aggressive we want to be in targeting churners.

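Define the Target

Timeline: Observation Period (12W) → Latency Period (1W) → Target Period (1W + 8W). Find a consecutive inactivity period that starts in the first week of the target period and leads to dormancy in the following 8 weeks.

[Figure: Population % (0 to 100) against Number of Consecutive Days of Inactivity (0 to 60), split into customers who continue to 60 days of inactivity and customers who reactivate before 60 days.]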

Example: 50% of the customers who are inactive for 10 or more days will continue inactivity to 60 days.
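The tipping-point pre-analysis can be sketched like this. The input is assumed to be one inactivity streak length (in days) per customer; the function continue_rate and the sample streaks are made up for illustration and are not figures from the talk.

```python
def continue_rate(streak_lengths, threshold, horizon=60):
    """Share of customers inactive for >= threshold days who reach horizon days."""
    reached = [s for s in streak_lengths if s >= threshold]
    if not reached:
        return None  # nobody was inactive that long
    return sum(1 for s in reached if s >= horizon) / len(reached)


# Hypothetical inactivity streaks (days) for ten customers
streaks = [3, 5, 8, 10, 12, 25, 60, 60, 75, 90]

for days in (5, 10, 20, 30):
    rate = continue_rate(streaks, days)
    print(f">= {days:2d} inactive days: {rate:.0%} continue to 60 days")
```

Scanning the thresholds shows where the continuation rate climbs steeply; a threshold near that point is a candidate target definition, with lower thresholds being the more aggressive choice.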


• Simplest way is to evaluate based on overall accuracy.

– Percentage of correct predictions = (TN+TP) / Total Count

– Problem in rare event modeling

Assume churn rate = 5%

Assume we predict all cases as non-churn

Overall accuracy = (95 + 0) / 100 = 95%!!

Evaluation Overall Accuracy

Confusion matrix:

                    Predicted Non-Churn    Predicted Churn
Actual Non-Churn    True Negatives         False Positives
Actual Churn        False Negatives        True Positives

Example (all cases predicted as non-churn):

                    Predicted Non-C    Predicted Churn
Actual Non-C        95                 0
Actual Churn        5                  0
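The accuracy trap is easy to reproduce. The sketch below plugs the slide's all-non-churn confusion matrix into the accuracy formula; the function name accuracy is illustrative.

```python
def accuracy(tn, fp, fn, tp):
    """Percentage of correct predictions = (TN + TP) / total count."""
    return (tn + tp) / (tn + fp + fn + tp)


# Predict every customer as non-churn when the churn rate is 5%
print(f"{accuracy(tn=95, fp=0, fn=5, tp=0):.0%}")  # 95%
```

The model scores 95% while detecting no churners at all, which is why overall accuracy is a poor measure for rare events.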


• Need to answer two questions

– Precision: When a customer is predicted as a churner, what is the likelihood that this prediction is correct? In other words, what percentage of customers predicted as churners are in fact churners?

Precision = TP / Total predicted as churn = TP / (TP+FP)

– Recall: What percentage of actual churners are detected as churners?

Recall = TP / Total actual churners = TP / (TP+FN)

Evaluation Precision and Recall

                    Predicted Non-Churn    Predicted Churn
Actual Non-Churn    True Negatives         False Positives
Actual Churn        False Negatives        True Positives


• Extreme Case (High Precision - Low Recall)

                Predicted Non-C    Predicted Churn
Actual Non-C    95                 0
Actual Churn    4                  1

Precision = 1/1 = 100%
Recall = 1/5 = 20%

• Extreme Case (High Recall - Low Precision)

                Predicted Non-C    Predicted Churn
Actual Non-C    50                 45
Actual Churn    0                  5

Precision = 5/50 = 10%
Recall = 5/5 = 100%

• Need to successfully balance precision and recall

                Predicted Non-C    Predicted Churn
Actual Non-C    93                 2
Actual Churn    1                  4

Precision = 4/6 = 67%
Recall = 4/5 = 80%

Evaluation Precision and Recall
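The three confusion matrices above can be run through the precision and recall formulas directly. This is a small sketch; the function names and the choice to return None for an empty denominator are assumptions, not part of the talk.

```python
def precision(tp, fp):
    """TP / (TP + FP); None when nothing was predicted as churn."""
    return tp / (tp + fp) if (tp + fp) else None


def recall(tp, fn):
    """TP / (TP + FN); None when there are no actual churners."""
    return tp / (tp + fn) if (tp + fn) else None


# (TN, FP, FN, TP) for the three cases on the slide
cases = {
    "high precision, low recall": (95, 0, 4, 1),
    "high recall, low precision": (50, 45, 0, 5),
    "balanced": (93, 2, 1, 4),
}

for name, (tn, fp, fn, tp) in cases.items():
    print(f"{name}: precision={precision(tp, fp):.0%}, recall={recall(tp, fn):.0%}")
```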


• How to determine if precision is good enough?

• Assume churn rate is 5%

• A random model (selecting churners at random) will have a precision of 5%

• Lift specifies how much better the predictive model is doing than the random model

Lift = Precision / Churn Rate

Evaluation Lift

Case A: assume churn rate = 5% and precision = 50%.
Lift = 50/5 = 10
Max Lift = 100/5 = 20

Case B: assume churn rate = 10% and precision = 60%.
Lift = 60/10 = 6
Max Lift = 100/10 = 10
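Cases A and B can be reproduced in a few lines. The sketch also reports lift as a share of the maximum achievable lift, which is one possible way (an addition here, not from the slide) to compare models built on populations with different churn rates.

```python
def lift(precision, churn_rate):
    """How many times better than random selection the model is."""
    return precision / churn_rate


def max_lift(churn_rate):
    """Lift of a perfect model, whose precision is 100%."""
    return 1.0 / churn_rate


for name, churn_rate, prec in (("Case A", 0.05, 0.50), ("Case B", 0.10, 0.60)):
    achieved = lift(prec, churn_rate)
    best = max_lift(churn_rate)
    print(f"{name}: lift={achieved:.1f}, max lift={best:.1f}, "
          f"share of max={achieved / best:.0%}")
```

Case A has the higher lift (10 versus 6), yet Case B is closer to its ceiling (60% of the maximum versus 50%).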


• An Analytical Dataset (ADS) captures all relevant attributes which are necessary to predict future churn.

• Variables in an ADS cover different subject areas related to the business problem.

• The same variable is normally repeated for different time periods throughout the observation period.

– Example:

- Num_Calls_W1 Num_Calls_W2 Num_Calls_M1 Num_Calls_M2

- Num_Services_Used_M1 Num_Services_Used_M2

- Num_Complaints_M1 Num_Complaints_M2

Data Understanding and Preparation ADS and Core Variables

Subject areas: Usage, Revenue, Recharges, Roaming and International, Services, Calling Circle, Network Experience, Call Center Complaints, Status, Call Gaps, Tariff, Loyalty Points, Competition


• Evolution Variables

– Compare a variable over time

– Important to compare current behavior to customer’s normal behavior

– Usually used on active days, revenue, call gaps, number of recharges, total usage, etc. variables.

– Example: compare number of active days during last month of the observation period to the first month of the observation period.

– Num_Active_Days_EV = (Num_Active_Days_M3 − Num_Active_Days_M1) / Num_Active_Days_M1 = (Num_Active_Days_M3 / Num_Active_Days_M1) − 1

M1 = 20, M3 = 30: EV = 0.5 (number of active days increased by 50%)

M1 = 30, M3 = 15: EV = -0.5 (number of active days decreased by 50%)

Data Understanding and Preparation Derived Variables
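The evolution variable is a relative change, so it needs a guard for customers with no activity in the first month. The sketch below returns None in that case, which is an assumption; the talk does not say how a zero baseline should be handled.

```python
def evolution(first_period, last_period):
    """Relative change from the first period to the last period."""
    if first_period == 0:
        return None  # no baseline to compare against (handling is an assumption)
    return (last_period - first_period) / first_period


print(evolution(20, 30))  # 0.5  -> active days increased by 50%
print(evolution(30, 15))  # -0.5 -> active days decreased by 50%
print(evolution(0, 10))   # None
```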


• Ratio Variables

– Ratio between on-net and off-net usage.

– Ratio between calls to top 10 calling circle and total calls.

– Ratio between usage during business hours, night, etc. to total usage.

– Ratio between number of small, medium, etc. recharges to total number of recharges.

• Combine different types of derived variables

– Example: Evolution of num_small_recharges_Ratio variable

Data Understanding and Preparation Derived Variables


• The following are some important transformations that should be applied to the data before modeling:

• Missing Value Replacement: missing values should be replaced by zero or by the mean or mode of the variable (depending on the meaning of the variable itself).

• Outlier Replacement: outliers are extreme values in a sample that skew the distribution of a variable. The simplest way to detect them is the z-score (the number of standard deviations from the mean); values beyond a chosen threshold are then replaced.

• Normalization: transform skewed variables toward a normal distribution using a log transform, a square root transform or even binning.

Data Understanding and Preparation Variable Transformation

Log transform
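The three transformations can be sketched with the standard library alone. Several choices below are assumptions rather than the talk's method: missing values are filled with zero, outliers are capped at a z-score threshold instead of being replaced some other way, and log1p is used so that zero usage stays defined.

```python
import math
from statistics import mean, pstdev


def fill_missing(values, fill):
    """Replace None with a fill value chosen from the variable's meaning."""
    return [fill if v is None else v for v in values]


def cap_outliers(values, z_max=3.0):
    """Clip values lying more than z_max standard deviations from the mean."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return list(values)
    low, high = mu - z_max * sigma, mu + z_max * sigma
    return [min(max(v, low), high) for v in values]


def log_transform(values):
    """log(1 + x) keeps zeros defined and compresses a long right tail."""
    return [math.log1p(v) for v in values]


# Hypothetical weekly call counts with one missing value and one extreme value
raw = [2, 3, None, 4, 3, 2, 5, 3, 4, 250]
filled = fill_missing(raw, fill=0)          # here zero means "no calls"
capped = cap_outliers(filled, z_max=2.0)    # low threshold because the sample is tiny
logged = log_transform(capped)
print([round(v, 2) for v in logged])
```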


• Customer data used for modeling is divided into training, validation and test sets. A common division is 60% - 20% - 20%.

– Training data: Data used for learning i.e. training the model.

– Validation data: Data used to verify the accuracy of the learned model. The purpose of a validation data set is to avoid over-fitting by measuring the accuracy of the model on data it has not seen before. If a model is accurate on the training data but significantly less accurate on the validation data, then this model has over-fit the training data and needs to be revised.

– Test data: Data used to test the final selected model. Measures of the model’s accuracy are calculated using this data set.

• In rare event modeling it is a good idea to increase the ratio of target cases in the training data

Modeling Samples

Training Data: Non-Churners 95%, Churners 5%

Up-sample churners or down-sample non-churners

Balanced Training Data: Non-Churners 50%, Churners 50%
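Down-sampling the majority class is the simpler of the two options and can be sketched as follows. The row layout (dicts with a "churn" flag), the function name and the fixed seed are assumptions for illustration. Only the training data should be rebalanced; validation and test sets keep the real churn rate.

```python
import random


def downsample_majority(rows, label_key="churn", target_ratio=0.5, seed=0):
    """Keep all churners and a random subset of non-churners."""
    churners = [r for r in rows if r[label_key] == 1]
    non_churners = [r for r in rows if r[label_key] == 0]
    # Non-churners needed so that churners make up target_ratio of the sample
    keep = round(len(churners) * (1 - target_ratio) / target_ratio)
    keep = min(keep, len(non_churners))
    rng = random.Random(seed)
    sample = churners + rng.sample(non_churners, keep)
    rng.shuffle(sample)
    return sample


# Hypothetical training data: 100 customers, 5% churners
rows = [{"id": i, "churn": 1 if i < 5 else 0} for i in range(100)]
balanced = downsample_majority(rows)
print(len(balanced), sum(r["churn"] for r in balanced))  # 10 5
```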


• Generally don’t build one model for all customers

• Need to divide customers into several populations that exhibit similar churn behavior

• The simplest way is to look at churn rate among different groups of customers

– Example: Assume overall churn rate is 5%

Modeling Populations

Value Segment    Churn Rate
High             3%
Medium           5%
Low              10%

Separating the low value segment gives a natural lift = 10/5 = 2

Tenure           Churn Rate
> 2 yrs          3%
1 yr – 2 yrs     6%
< 1 yr           15%

Separating the < 1 yr tenure group gives a natural lift = 15/5 = 3
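Natural lift per segment is the segment churn rate divided by the overall churn rate. In the sketch below the customer counts are hypothetical; they were chosen only so that the segment rates (3%, 5%, 10%) and the overall rate (5%) match the value segment example.

```python
# segment -> (customers, churners); hypothetical counts
segments = {
    "High": (5000, 150),
    "Medium": (3000, 150),
    "Low": (2000, 200),
}

total_customers = sum(n for n, _ in segments.values())
total_churners = sum(c for _, c in segments.values())
overall_rate = total_churners / total_customers

natural_lift = {}
for name, (customers, churners) in segments.items():
    rate = churners / customers
    natural_lift[name] = rate / overall_rate
    print(f"{name:6s} churn rate={rate:.0%} natural lift={natural_lift[name]:.1f}")
```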


• Collinearity between two variables means that they have a perfect linear relationship. You can perfectly predict one from the other.

• Having collinear variables doesn’t necessarily affect prediction accuracy. However, it makes it difficult to distinguish between the individual effects of these variables.

• In regression, correlation is often used interchangeably with collinearity

• When two variables exhibit dependence they are said to be correlated

– When the variables increase together the correlation is positive

– When one variable increases as the other decreases, and vice versa, the correlation is negative

• Use the Pearson correlation coefficient to measure the correlation between numeric variables

Modeling Removing Collinearity


• One solution to eliminating collinearity is to drop correlated variables.

• This works since many correlated variables in our problem describe more or less the same thing.

• Dropping correlated variables will also ensure that the list of top contributing predictors is not dominated by variables from only one or two subject areas

• Example: the following variables usually exhibit high correlation. Simply keep one of them:

– Number of voice calls
– Duration of voice calls
– Revenue of voice calls
– Number of distinct people called

• Build a correlation matrix to help in dropping variables. Example:

Modeling Removing Collinearity

         Var 1    Var 2    Var 3
Var 1    1        0.3      0.6
Var 2    0.3      1        0.2
Var 3    0.6      0.2      1
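One possible way to use the correlation matrix is a greedy pass that keeps a variable only if it is not highly correlated with any variable already kept. The 0.8 threshold, the function names and the sample columns are assumptions for illustration; the sketch also assumes no column is constant, since the Pearson coefficient is undefined in that case.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def drop_correlated(columns, threshold=0.8):
    """Keep a variable only if |r| < threshold against every kept variable."""
    kept = []
    for name in columns:
        if all(abs(pearson(columns[name], columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical ADS columns for five customers
columns = {
    "num_calls": [10, 20, 30, 40, 50],
    "call_duration": [25, 48, 77, 98, 130],   # moves with num_calls
    "num_complaints": [1, 0, 2, 0, 1],
}
print(drop_correlated(columns))  # ['num_calls', 'num_complaints']
```

The order of the columns decides which member of a correlated group survives, so it helps to list the preferred variable first.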


• Dimensionality reduction (i.e. reducing the number of predictor variables) helps in:

– Simplifying models and making them easier to interpret

– Removing collinearity

– Reducing over-fitting

• Use any of the following techniques to reduce the number of predictor variables:

• Principal Components Analysis (PCA): transform the predictors into a set of linearly uncorrelated variables called principal components. The top principal component has the largest possible variance.

• Feature Selection: eliminate predictors that are either irrelevant or redundant.

• Use a simple decision tree: Train a simple decision tree using all predictor variables and neglect all but the top X variables. Focus only on these top X variables moving forward

Modeling Dimensionality Reduction


• Logistic regression is a statistical method for analyzing a dataset in which there are one or more predictor variables and a target variable that has one of two outcomes.

• Find the best fitting model to represent the relationship between the predictors and the target.

• It is a generalized linear model that uses a logit (log odds) link function.

logit(p) = ln(p / (1 − p)) = b0 + b1*x1 + … + bn*xn

– p is the probability of the occurrence of the target event (churn)

– x1 to xn represent the predictor variables and b0 to bn represent the coefficients

– Assume the probability of churn is 0.25; then the odds of churn are 0.25/0.75 = 1 to 3 and the logit = ln(1/3) ≈ -1.1

Modeling Logistic Regression
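The probability, odds and logit in the example are related as shown in this short sketch. The function names are illustrative; inverse_logit is the standard logistic function that maps a log-odds value back to a probability.

```python
import math


def logit(p):
    """Log odds of a probability p, with 0 < p < 1."""
    return math.log(p / (1.0 - p))


def inverse_logit(z):
    """Map a log-odds value back to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


p = 0.25
odds = p / (1.0 - p)
print(f"odds = {odds:.3f}")          # 0.333, i.e. 1 to 3
print(f"logit = {logit(p):.2f}")     # -1.10
print(f"back to p = {inverse_logit(logit(p)):.2f}")  # 0.25
```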


• A decision tree in predictive modeling maps observations about predictor variables to values for the target variable by following a sequence of rules.

• Decision trees classify instances by traversing the tree starting from the root to a leaf node with the decision.

• When building a decision tree, each split is made on the variable that provides the most information gain.

• Example: Decision Tree for ‘play tennis’

Modeling Decision Trees
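Information gain is the drop in entropy achieved by a split. The slide's tree figure did not survive extraction, so the counts below are an assumption: they are the widely used textbook version of the 'play tennis' data (9 yes, 5 no, split on outlook).

```python
import math


def entropy(counts):
    """Shannon entropy in bits of a list of class counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c)


def information_gain(parent_counts, child_counts):
    """Entropy of the parent minus the weighted entropy of the children."""
    total = sum(parent_counts)
    weighted = sum(sum(child) / total * entropy(child) for child in child_counts)
    return entropy(parent_counts) - weighted


# [yes, no] counts: textbook play-tennis data, assumed here for illustration
parent = [9, 5]
by_outlook = [[2, 3], [4, 0], [3, 2]]   # sunny, overcast, rain

print(f"entropy = {entropy(parent):.3f}")                        # 0.940
print(f"gain(outlook) = {information_gain(parent, by_outlook):.3f}")  # 0.247
```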


• Build a model using the training data set.

• Use the validation data set to determine the best performing model.

• After selecting the final model test it on the test data set.

• Evaluating the accuracy of a model:

– Order customers by descending churn score and divide them into equal sized bins.

– Examine cumulative precision, recall and lift in the first 2 or 3 bins

Evaluation

Example (figures are just for illustration; assume churn rate is 10%):

Bin    Percentile     Precision    Recall    Lift
10     90% – 100%     80%          50%       8
9      80% – 90%      60%          65%       6
8      70% – 80%      50%          75%       5
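Cumulative precision, recall and lift for the top bins can be computed from scored, labeled customers as sketched below. The function name and the toy data are made up for illustration (20 customers, 4 churners), and the sketch assumes there are at least as many customers as bins. The recall column is what a gains chart plots, and the lift column is what a lift chart plots.

```python
def cumulative_metrics(scored, n_bins=10, top_bins=3):
    """scored: list of (churn_score, actual_label) pairs, label 1 = churner."""
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    n = len(ranked)
    total_churners = sum(label for _, label in ranked)
    base_rate = total_churners / n
    rows = []
    for k in range(1, top_bins + 1):
        cutoff = (k * n) // n_bins            # customers in the top k bins
        hits = sum(label for _, label in ranked[:cutoff])
        precision = hits / cutoff
        recall = hits / total_churners
        rows.append((k, precision, recall, precision / base_rate))
    return rows


# Toy data: labels listed from highest to lowest churn score
labels = [1, 1, 1, 0, 0, 1] + [0] * 14
scored = [(1.0 - i / 20, label) for i, label in enumerate(labels)]

for k, precision, recall, lift in cumulative_metrics(scored):
    print(f"top {k} bin(s): precision={precision:.0%} "
          f"recall={recall:.0%} lift={lift:.2f}")
```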


• Cumulative gains is used as a measure of recall for portions of the population (usually each bin)

• What percentage of the real churners are detected by the model?

• Explanation:

– If you target the top 2 bins (top 20 %), you capture 50% of the real churners.

Evaluation Gains Chart

[Figure: gains chart plotting Target % (0 to 100) against Population % (0 to 100), with curves for the random model, the predictive model and the perfect model.]


• A lift chart specifies how much better the predictive model is doing than the random model for portions of the population.

• How much better is the model performing than a random model?

• Explanation:

– In the top bin (top 10%) the model’s precision is 3 times the random model’s precision.

Evaluation Lift Chart

[Figure: lift chart plotting Lift (1 to 5) against Population % (0 to 100), with curves for the random model, the predictive model and the perfect model.]


• Use the list of top contributing variables to reveal insights about the churn problem

• Logistic Regression Example:

logit(p) = b0 + b1*calling_circle_size + b2*ratio_small_recharges_ev + b3*num_services_active

• A one unit increase in calling_circle_size is associated with a b1 increase or decrease (depending on sign) in the log odds of churn when the other predictors are constant

• Decision Tree Example:

• Derive business rules by traversing the tree

Modeling Interpretation

[Figure: decision tree that splits on Calling Circle Size < 2, Ratio small recharges ev <= 0 and Num services active > 3, with Y/N branches leading to Churn and Non-Churn leaves.]
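A logistic regression coefficient is often easier to explain to the business as an odds multiplier: a change of b in the log odds multiplies the odds by exp(b). The coefficient value in the sketch is hypothetical and is not taken from any model in the talk.

```python
import math


def odds_multiplier(coefficient):
    """Factor applied to the odds of churn per one-unit increase in a predictor."""
    return math.exp(coefficient)


# Hypothetical coefficient b1 for calling_circle_size
b1 = -0.4
factor = odds_multiplier(b1)
print(f"Each additional contact multiplies the odds of churn by {factor:.2f}")  # 0.67
```

A factor below 1 means the predictor is associated with lower odds of churn, and a factor above 1 with higher odds, when the other predictors are held constant.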


• Churn detection is only half the problem!

• Need to successfully retain customers.

• Make sure detection is not too late and action is timely (within the latency period).

• Use insights from models to determine possible churn reasons:

– Better tariff plans or offers from competitors

– Network quality issues

– Unresolved complaints

• Tailor Retention offers based on customer segments and interests

Retention

[Figure: Observation → Latency → Target timeline with the 1st day to take action marked. Score customers and send out retention campaigns before latency is over.]


• Important to have a churn program

– Proactive methods: different predictive models for different customer segments and subject areas.

– Reactive methods: churn triggers (customers inactive for an abnormally extended period of time).

• Build separate models based on different subject areas. This gives more weight to variables that are overshadowed by stronger ones in the primary model.

– Example: build a model using only network variables and check the overlap in churners detected with the primary model.

Other Topics in Churn

[Figure: Venn diagram of all churners, churners detected using the primary model and churners detected using the secondary model.]


Thank You