ES-Working Paper no. 12 · 2018. 4. 30. · 2.1. Customer Churn Prediction Customer churning, which...

FACULTEIT ECONOMISCHE EN SOCIALE WETENSCHAPPEN & SOLVAY BUSINESS SCHOOL ES-Working Paper no. 12 THE CASE FOR PRESCRIPTIVE ANALYTICS: A NOVEL MAXIMUM PROFIT MEASURE FOR EVALUATING AND COMPARING CUSTOMER CHURN PREDICTION AND UPLIFT MODELS Floris Devriendt and Wouter Verbeke April 30th, 2018 Vrije Universiteit Brussel – Pleinlaan 2, 1050 Brussel – www.vub.be [email protected] © Vrije Universiteit Brussel




This text may be downloaded for personal research purposes only. Any additional reproduction for other purposes, whether in hard copy or electronically, requires the consent of the author(s), editor(s). If cited or quoted, reference should be made to the full name of the author(s), editor(s), title, the working paper or other series, the year and the publisher. Printed in Belgium Vrije Universiteit Brussel Faculty of Economics, Social Sciences and Solvay Business School B-1050 Brussel Belgium www.vub.be


The case for prescriptive analytics: a novel maximum profit measure for evaluating and comparing customer churn prediction and uplift models

Floris Devriendt (a,*), Wouter Verbeke (a)

(a) Data Analytics Laboratory, Faculty of Economic and Social Sciences and Solvay Business School, Vrije Universiteit Brussel, Pleinlaan 2, B-1050 Brussels, Belgium

Abstract

Prescriptive analytics and uplift modeling are receiving more attention from the business analytics research community and from industry as an alternative and improved paradigm of predictive analytics that supports data-driven decision making. Although it has been shown in theory that prescriptive analytics improves decision-making more than predictive analytics, no empirical evidence has been presented in the literature on an elaborated application of both approaches that allows for a fair comparison of predictive and uplift modeling. Such a comparison is in fact prohibited by a lack of available evaluation measures that can be applied to predictive and uplift models. Therefore, in this paper, we introduce a novel evaluation metric called the maximum profit uplift measure that allows one to assess the performance of an uplift model in terms of the maximum potential profit that can be achieved by adopting an uplift model. The measure is developed for evaluating customer churn uplift models and for extending the existing maximum profit measure for evaluating customer churn prediction models. Both measures are subsequently applied to a case study to assess and compare the performance of customer churn prediction and uplift models. We find that uplift modeling outperforms predictive modeling and allows one to enhance the profitability of retention campaigns. The empirical results indicate that prescriptive analytics are superior to predictive analytics in the development of customer retention campaigns.

Keywords: Analytics, Business applications, Prescriptive analytics, Uplift modeling, Customer churn prediction, Customer retention

* Corresponding author. Email addresses: [email protected] (Floris Devriendt), [email protected] (Wouter Verbeke)

Preprint submitted to European Journal of Information Sciences April 9, 2018


1. Introduction

The term business analytics is used as a catch-all term covering a wide variety of what essentially are data-processing techniques. In its broadest sense, business analytics strongly overlaps with data science, statistics, and related fields such as artificial intelligence (AI) and machine learning [1]. Analytics is used as a toolbox containing a variety of instruments and methodologies allowing one to analyze data in support of evidence-based decision-making with the aim of enhancing efficiency, efficacy, and, thus ultimately, profitability. Types of analytical tools, in increasing order, are descriptive, predictive, and prescriptive analytics. While descriptive analytics offer insight into current situations, predictive analytics allow one to explain complex relations between variables and to predict future trends. As such, predictive analytics offer more uses than descriptive analytics. Currently, prescriptive analytics are receiving more attention from practitioners and scientists in that they add further value by allowing one to simulate the future as a function of control variables to prescribe optimal settings for control variables. At the core of prescriptive analytics is uplift modeling, which is introduced below. In the experiments reported in this article, the use and performance of predictive and prescriptive analytics are thoroughly compared. Business analytics is being applied to an increasingly diverse range of well-specified tasks across a broad variety of industries. Popular examples include tasks related to credit scoring [2, 3], fraud detection [4], and customer churn prediction [5, 6], the latter being the application of interest in this article.

Customer churn prediction models are designed to predict which customers are about to churn and to accurately segment a customer base. This allows a company to target customers that are most likely to churn during a retention marketing campaign, thus improving the efficient use of limited resources for such a campaign, i.e., the return on marketing investment (ROMI), while reducing costs associated with churning [7]. Generally speaking, customer retention is profitable to a company because (1) attracting new clients costs five to six times more than retaining existing customers [8–11]; (2) long-term customers generate more profits, tend to be less sensitive to competitive marketing activities, tend to be less costly to serve, and may generate new referrals through positive word-of-mouth processes, whereas dissatisfied customers might spread negative word-of-mouth messages [12–17]; and (3) losing customers incurs opportunity costs due to a reduction in sales [18]. Therefore, a small improvement in customer retention can lead to a significant increase in profits [19].

However, it has been reported that marketing actions undertaken to retain customers may actually provoke the opposite behavior and may cause or motivate a customer to churn. As noted in Radcliffe and Simpson [20], churn risk is highly correlated with customer dissatisfaction, and the goal in turn becomes to prevent a dissatisfied customer from actually leaving. Any attempt made to contact a dissatisfied customer with the goal of retaining him or her can actually hasten the process and provoke the customer to leave earlier than expected [20]. Therefore, it is necessary to evaluate the effectiveness of a retention campaign at the individual customer level. Predictive models fail to differentiate between customers who respond favorably (i.e., who do not churn) because of a campaign and customers who respond favorably of their own accord regardless of a campaign (i.e., who would not have churned whether or not they were targeted).

To address this shortcoming of predictive models, uplift modeling has recently been proposed as an alternative means of identifying customers who are likely to be persuaded by a promotional marketing campaign, rather than predicting whether customers are likely to respond to a promotional marketing campaign (which may or may not be the result of the campaign). Uplift modeling can be applied to identify customers who are likely to be retained through a retention campaign as an alternative to predicting whether customers are likely to churn [21]. More precisely, uplift modeling aims at establishing the net difference in customer behavior resulting from a specific treatment afforded to customers, e.g., a reduction in the likelihood to churn with retention campaign targeting.

In this paper we aim to contrast customer churn prediction (CCP) and customer churn uplift (CCU) modeling for customer retention by comparing their performance when applied to an experimental case study of the financial industry. To compare the performance of these approaches, a common evaluation procedure is applied. However, because these models produce different forms of output, different performance measures are used to evaluate prediction and uplift models. In evaluating classification models and, more specifically, CCP models, the receiver operating characteristic (ROC) curve or the lift curve is typically used. Performance can be expressed as the area under the ROC curve, as the top decile lift or as the (expected) maximum profit. In evaluating uplift models, the Qini curve and uplift per decile plots are typically used. Performance is typically reported in terms of the Qini index or top decile uplift. As the goal of customer churn modeling is to maximize ROMI, in Verbeke et al. [22], the authors introduce the maximum profit (MP) measure for evaluating CCP models. The MP measure calculates the profit generated when the optimal fraction of top-ranked customers according to the CCP model is included in a retention campaign. The MP measure allows one to determine the optimal model and fraction of customers to include, yielding a significant increase in profitability relative to that achieved when using statistical measures [22–24].

In this article, we extend the MP measure to evaluate the performance of CCU models, and we introduce the maximum profit for uplift (MPU) measure. Both the MP and MPU measures are then used to compare the performance of CCP and CCU logistic regression and random forest models through an experimental case study. Our main contributions are threefold:

1. We introduce an application of uplift modeling for customer retention.

2. We extend the maximum profit measure for evaluating uplift models.

3. We apply and compare CCP and CCU models through an experimental case study of the financial industry.

This paper is structured as follows. In Section 2, we first introduce customer churn prediction modeling before discussing uplift modeling as an alternative approach to predictive modeling. Then in Section 3, the MP measure for CCP models is defined and extended for application to customer churn uplift models. In Section 4, we describe the experimental design of the case study and then discuss the results of our experiments. Finally, in Section 5, conclusions are given.

2. Literature

In Section 2.1, customer churn prediction is introduced along with current standard approaches as described in the literature and adopted in industry. Then in Section 2.2, we describe uplift modeling and discuss the most prominent uplift modeling techniques and performance measures developed for evaluating uplift models.

2.1. Customer Churn Prediction

Customer churning, which is also referred to as customer attrition or customer defection, is defined as the loss or outflow of customers from the customer base [25]. In saturated markets, there are limited opportunities to attract new customers, and hence, retaining existing customers is considered essential to maintaining profitability. In the telecommunications industry, it is estimated that attracting a new customer costs five to six times more than retaining an existing customer [8, 22, 26]. Established customers are more profitable due to the lower costs required to serve them, and a sense of brand loyalty they have developed over time renders them less likely to churn. Loyal customers tend to be satisfied customers who also serve as word-of-mouth advertisers, referring new customers to a given company. In the context of a financial institution as described in the case study given in Section 4, a definition of churning is naturally present in the data, i.e., contract termination.

Churning is typically addressed by developing a prediction model, i.e., a classification model such as a logistic regression or a decision tree model. Such a model estimates for each customer the probability of churning during a subsequent period of time. Then, it is straightforward to offer the customers with the highest churn probability an incentive, e.g., a discount or another promotional offer, to encourage them to extend their contracts or to keep their accounts active. In other words, customers who are susceptible to churn can be targeted through a retention campaign. Accurate predictions are perhaps the most apparent goal of developing a customer churn prediction model, but determining reasons for (or at least indicators of) churning is also invaluable to a company. Comprehensible models can offer novel insight into correlations between customer behavior and the propensity to churn [7], allowing management teams to address factors leading to churning and to target the customers before they decide to churn.

Numerous classification techniques have been adopted for churn prediction, including traditional statistical methods, such as logistic regression [27, 28], and non-parametric statistical models, such as k-nearest neighbor models [29], decision trees [30, 31], ensemble methods [5, 32], support vector machines [33–35] and neural networks [22, 36, 37]. Additionally, social network analysis has been successfully adopted to predict customer churning [6, 26, 38] in addition to survival analysis, which can be used to estimate the timing of customer churning. These analyses focus on the profitability of a customer’s lifetime rather than on a single moment in time [39, 40]. For an extensive literature review on customer churn prediction modeling, one may refer to Verbeke et al. [7]. The results of an extensive benchmarking experiment are reported in Verbeke et al. [22], confirming the no-free-lunch theorem in application to customer churn prediction, with no modeling technique consistently winning across the various datasets. Recent work on customer churn prediction is covered in [6, 41–44].

2.2. Uplift Modeling

In Section 2.2.1, a brief introduction of uplift modeling is provided. In Section 2.2.2, an overview of the most prominent uplift modeling techniques is presented. Finally, in Section 2.2.3, evaluation measures for assessing the performance of uplift models are discussed.

2.2.1. Definition

Generally speaking, uplift modeling aims to establish the net effect of applying a treatment to an outcome. When adopted for customer relationship management and, more specifically, for response modeling, uplift models are developed to differentiate between customers who respond favorably as a result of being targeted with a campaign, i.e., being treated, and customers who respond favorably on their own accord regardless of being targeted with a campaign or not. Note that the outcome, i.e., response, may mean that a customer begins or continues to purchase a product or service in the case of acquisition and retention modeling, respectively, or that a customer purchases more or additional products or services in the case of up-sell or cross-sell modeling, respectively.

Conceptually, a customer base can be divided into four categories along two dimensions, as shown in Figure 1 [1, 45]:

1. Sure Things. Customers who would always respond. Targeting Sure Things does not generate additional returns but does generate additional costs, i.e., the fixed costs of contacting a customer and possibly a cost related to a financial incentive offered to targeted customers.

2. Lost Causes. Customers who would never respond (regardless of which campaign is used). Lost Causes will not generate additional revenues, yet they do generate additional costs, although these are lower than the costs of Sure Things: Lost Causes do not take advantage of the financial incentives offered, which are an additional cost that we do take into account for Sure Things.

3. Do-Not-Disturbs. Customers who would not respond precisely because they are exposed to a campaign. They will respond when not targeted but will not respond when they are. For example, customers targeted by retention efforts can react adversely by withdrawing from the delivered product or service. Including Do-Not-Disturbs in a campaign thus generates no additional revenues but comes with considerable additional costs.

4. Persuadables. Customers who respond only because they have been exposed to a campaign. They respond only when contacted and cause a campaign to generate additional revenues and, as such, a net profit once the costs generated by including the other types of customers have been subtracted.

Figure 1: The four theoretical classes.
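The four classes can be read as a lookup on two hypothetical outcomes per customer: whether the customer would respond when targeted and whether the customer would respond when left alone. The following is a minimal Python sketch of that lookup; the function name `customer_category` and its arguments are chosen for illustration only and do not appear in the paper.

```python
def customer_category(responds_if_treated: bool, responds_if_untreated: bool) -> str:
    """Map the two potential outcomes of one customer to a conceptual class.

    Both arguments are hypothetical: in practice only one of the two
    outcomes can ever be observed for a given customer.
    """
    if responds_if_treated and responds_if_untreated:
        return "Sure Thing"
    if not responds_if_treated and not responds_if_untreated:
        return "Lost Cause"
    if responds_if_untreated:
        # Responds only when NOT targeted.
        return "Do-Not-Disturb"
    # Responds only when targeted.
    return "Persuadable"
```

Because the two outcomes are never observed together for the same customer, such a function cannot be applied to real data directly; it only makes the two dimensions of Figure 1 explicit.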

The aim of uplift modeling is to allow for the targeting of Persuadables while avoiding Do-Not-Disturbs. From the perspective of a retention campaign, Do-Not-Disturbs are sometimes referred to as sleeping dogs since, as long as these customers are not disturbed, they will continue to provide benefits. Note that this classification is campaign dependent. It is possible for a customer to be a Lost Cause when a campaign offers a 5% discount for a next purchase, whereas that same customer is a Persuadable when a campaign offers a 20% discount. In other words, the classification is dependent on the treatment given when all customers are treated similarly. In general, uplift modeling involves determining optimal settings for control variables, such as a dummy treatment variable denoting whether a customer is targeted with a campaign, to optimize a result or effect. Although in most, not to say all, studies on uplift modeling for marketing applications, control variables are typically dummy variables that indicate whether a customer is targeted or not, these control variables may also be continuous or multivalue categorical variables, e.g., the discount or contact channel. Clearly, uplift modeling may have applications to various settings and to many different purposes. In this article, we focus on the goal of customer retention.

Uplift modeling for customer retention has been documented in relatively few cases. Radcliffe and Simpson [20] applied uplift modeling to two retention campaigns in telecommunications. One campaign was highly effective and profitable, whereas the other was counter-productive and incurred losses. However, both campaigns improved conditions in terms of reducing churn as a result of uplift modeling. In Guelman et al. [21], the authors applied uplift modeling to an insurance setting. Although the treatment had an almost neutral impact on retention for the entire sample, they found that the impact of the treatment might have been different for specific subgroups of the customer base. They reported that uplift modeling allowed them to predict the expected change in probability for a customer to switch to another company when targeted by a campaign. To the best of our knowledge, no cases presented in the literature report on the application of uplift modeling to the context of a financial institution and to churning in reference to financial services.

We assume that a sample of customers is randomly divided into two groups defined as the treatment group and control group. A customer is either in the treatment group, i.e., is influenced by the campaign, or in the control group, i.e., is not influenced by the campaign. As a formal definition, let $X$ be a vector of inputs or predictor variables, $X = \{X_1, \ldots, X_n\}$, and let $Y$ be the binary outcome variable, $Y \in \{0, 1\}$, indicating whether a customer responds favorably or not. Let the treatment variable $T$ denote whether a customer belongs to the treatment group, $T = 1$, or to the control group, $T = 0$. $P$ denotes the probability as estimated by the model. Uplift is then defined for customer $i$ with characteristics $x_i$ as the probability of responding favorably (i.e., $y_i = 1$) when treated (i.e., for $t_i = 1$) minus the probability of responding favorably when not treated (i.e., for $t_i = 0$):

$$U(x_i) := P(y_i = 1 \mid x_i; t_i = 1) - P(y_i = 1 \mid x_i; t_i = 0) \tag{1}$$

In essence, uplift is the difference in outcome, e.g., customer behavior, resulting from a treatment. Uplift modeling aims at estimating uplift as a function of treatment and customer characteristics.
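As a worked illustration of Equation (1), consider two hypothetical customers. The probabilities below are made up for the example and are not taken from the case study.

```latex
\begin{align*}
U(x_1) &= P(y_1 = 1 \mid x_1; t_1 = 1) - P(y_1 = 1 \mid x_1; t_1 = 0) = 0.30 - 0.20 = +0.10 \\
U(x_2) &= P(y_2 = 1 \mid x_2; t_2 = 1) - P(y_2 = 1 \mid x_2; t_2 = 0) = 0.60 - 0.80 = -0.20
\end{align*}
```

Under these assumed numbers, treating the first customer raises the probability of a favorable response, which is the behavior expected of a Persuadable, whereas treating the second customer lowers it, which is the behavior expected of a Do-Not-Disturb. Sure Things and Lost Causes would both show an uplift close to zero.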

2.2.2. Techniques

Uplift modeling techniques can be grouped into data preprocessing and data processing approaches. The first group adopts traditional predictive analytics in an adapted setup for learning an uplift model, whereas the second group applies adapted predictive analytics in developing uplift models. Table 1 shows the most prominent and frequently adopted approaches to uplift modeling.

| Group | Approach | References |
|---|---|---|
| Preprocessing | Transformation | [46, 47] |
| Preprocessing | Variable Selection Procedure | [48, 49] |
| Data processing | Two-Model Approach | [50, 51] |
| Data processing | Direct Estimation | [52–54] |

Table 1: Most frequently cited uplift modeling approaches.

Data preprocessing approaches. Data preprocessing approaches include transformation approaches, which redefine a target variable, and approaches that allow one to estimate uplift by defining and selecting additional predictor variables.

The first group of data preprocessing approaches defines a transformed target variable that is estimated. A customer cannot be assigned to any of the four groups shown in Figure 1, as this information is unavailable and cannot be retrieved. However, we do know whether a customer formed part of the treatment or control group and whether a customer responded or not. Hence, customers can be assigned to any of the following four groups: treatment responders, treatment non-responders, control responders and control non-responders. Techniques such as Lai’s approach [46, 47] and pessimistic uplift modeling [55] make use of these four groups to define a transformed target variable and as such transform the uplift modeling problem into a binary classification problem. Any standard classification technique can be applied to this problem to yield an uplift model.
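The text does not spell out how the four observable groups are turned into a binary target. One commonly described variant, assumed here purely for illustration, labels treatment responders and control non-responders as the positive class and the remaining two groups as the negative class. A minimal Python sketch under that assumption (the function name `transformed_target` is hypothetical):

```python
def transformed_target(treated: bool, responded: bool) -> int:
    """Binary target built from the four observable groups.

    Assumed labeling (not stated in the paper):
      treatment responder       -> 1
      control non-responder     -> 1
      treatment non-responder   -> 0
      control responder         -> 0
    """
    return 1 if treated == responded else 0


# Hypothetical records: (treated, responded) per customer.
records = [(True, True), (True, False), (False, True), (False, False)]
labels = [transformed_target(t, r) for t, r in records]
```

Any standard classifier can then be trained on the original predictors with `labels` as the target; how its scores are converted into an uplift estimate depends on the specific technique.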

The second group of data preprocessing approaches extends the set of predictor variables of the model to allow for the estimation of uplift. In Kane et al. [47] and Lo [48], an uplift modeling approach is proposed that pools the treatment and control groups into a single sample for response model estimation. A dummy variable is introduced to denote the group of origin for each customer. A model is then developed from the original predictor variables, the added dummy variable and interaction variables between the predictor and dummy variables. Subsequently, any predictive modeling approach can be adopted with this setup, yielding an uplift model.
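The setup with a treatment dummy and interaction variables can be sketched in a few lines of Python. The helper names `augment` and `uplift_from_single_model` are hypothetical, and `toy_predict` is a hand-written scoring function standing in for any fitted response model; none of the coefficients come from the paper.

```python
from typing import Callable, List, Sequence


def augment(x: Sequence[float], t: int) -> List[float]:
    """Original predictors, the treatment dummy, and predictor-by-dummy interactions."""
    return list(x) + [float(t)] + [xi * t for xi in x]


def uplift_from_single_model(predict: Callable[[List[float]], float],
                             x: Sequence[float]) -> float:
    """Score the same customer as treated and as untreated and take the difference."""
    return predict(augment(x, 1)) - predict(augment(x, 0))


def toy_predict(row: List[float]) -> float:
    """Hypothetical scoring function for two predictors (not a fitted model).

    row layout: [x1, x2, t, x1*t, x2*t]
    """
    return 0.1 + 0.05 * row[2] + 0.01 * row[3]


example_uplift = uplift_from_single_model(toy_predict, [2.0, 3.0])
```

With the dummy set to 0, all interaction terms vanish, so the difference between the two scores isolates the part of the prediction that depends on the treatment.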

Data processing approaches. Among the data processing approaches, further differentiations can be made between indirect and direct estimation approaches.

Indirect estimation approaches include the two-model or naive approach, which is a simple and intuitive approach to uplift modeling. Two separate predictive models are built: one for the treatment group, $M_T$, and one for the control group, $M_C$, with both estimating the probability of a given response. The aggregated uplift model, $M_U$, then subtracts the response probabilities resulting from both models to find the uplift:

$$M_U = M_T - M_C \tag{2}$$

This approach has the benefit of being straightforward to implement, and similar to both data preprocessing approaches, it allows one to adopt standard predictive modeling approaches. However, the approach only appears to work well in the simplest of cases [50, 51]. Its main disadvantage is that the two models are built independently of one another; as such, they are not necessarily aligned in terms of the predictor variables included, and the errors of the independent estimates can reinforce one another, generating significant errors in uplift estimates [53].
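The two-model approach of Equation (2) can be illustrated with a deliberately trivial model class. In the sketch below, each "model" is just the observed response rate per customer segment, which stands in for a logistic regression or any other classifier; the segment names, the toy data, and the function names are assumptions made for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def fit_response_rates(rows: Iterable[Tuple[str, int]]) -> Dict[str, float]:
    """Stand-in 'model': the observed response rate per customer segment."""
    counts = defaultdict(lambda: [0, 0])  # segment -> [responders, total]
    for segment, responded in rows:
        counts[segment][0] += responded
        counts[segment][1] += 1
    return {seg: r / n for seg, (r, n) in counts.items()}


def two_model_uplift(m_t: Dict[str, float], m_c: Dict[str, float], segment: str) -> float:
    """Equation (2): treatment-model estimate minus control-model estimate."""
    return m_t[segment] - m_c[segment]


# Hypothetical toy data: (segment, responded) pairs, not from the case study.
treatment = [("young", 1), ("young", 1), ("young", 0), ("old", 0), ("old", 1)]
control = [("young", 0), ("young", 1), ("old", 1), ("old", 1)]

m_t = fit_response_rates(treatment)  # fitted on the treatment group only
m_c = fit_response_rates(control)    # fitted on the control group only

uplift_young = two_model_uplift(m_t, m_c, "young")
uplift_old = two_model_uplift(m_t, m_c, "old")
```

Because `m_t` and `m_c` are estimated separately, the estimation error of each enters the difference, which is the weakness of the approach noted above.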

Alternatively, uplift can be directly modeled. Given the group-based nature of the uplift modeling problem, the most frequently adopted direct estimation approaches are tree-based methods that successively split the population into smaller segments. Uplift tree approaches are adapted from well-known algorithms such as classification and regression trees (CART) [56] or chi-square automatic interaction detection (CHAID) methods [57], applying modified splitting criteria and pruning approaches. Examples of tree-based uplift modeling approaches include the significance-based uplift trees proposed in Radcliffe and Surry [53], decision trees making use of information theory-inspired splitting criteria presented in Rzepakowski and Jaroszewicz [54], and uplift random forests and causal conditional trees introduced in Guelman et al. [58].

2.2.3. Evaluation

Despite its clear potential to improve upon predictive modeling outcomes, uplift modeling suffers from a lack of intuitive evaluation measures for assessing the performance of a model either in an absolute sense or relative to other models. In the literature on uplift modeling, either charts are used [48, 51] or an adapted version of the Gini coefficient is used, i.e., the Qini coefficient [47, 52]. In predictive modeling, evaluation metrics typically assess the error of point-wise estimates made by a model on each observation for a hold-out test set by comparing predicted and actual outcomes and by summarizing observed errors. However, in uplift modeling, the actual outcome estimated, i.e., uplift, is unobserved. As a customer cannot occupy both the treatment and control group, i.e., cannot be treated and not-treated simultaneously, uplift (or, as indicated above, the group shown in Figure 1 to which a customer belongs) cannot be observed for an individual customer. Therefore, evaluation measures adopted in predictive modeling cannot be used. Instead, uplift can be observed and uplift estimates can be evaluated by comparing differences in the behaviors of equivalent subgroups of the treatment and control groups [53].

The performance of an uplift model can be visualized by plotting the cumulative difference in response rates between treatment and control groups as a function of the selected fraction x of customers ranked by the uplift model from high to low values of estimated uplift. This curve is referred to as the cumulative uplift, as cumulative incremental gains, or as the Qini curve [52]. The cumulative difference in the response rate is measured as the absolute or relative number of additional favorable responders, i.e., expressed respectively as a count of additional favorable responders or as a fraction of the total population. Note that performance is evaluated by comparing groups of observations rather than individual observations. An example is provided in Figure 2.

Figure 2: Incremental gains or Qini curve.

The Qini metric is a measure related to the Qini curve. It measures the area between the Qini curve of the uplift model and the Qini curve of the baseline random model (see Figure 2). The measure is an adapted version of the Gini metric, which in turn is related to the Gini curve (or the cumulative gains curve) [52].
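The Qini curve and the Qini metric can be sketched as follows. Normalization conventions differ between authors; this sketch assumes the absolute form, in which the number of control responders among the top-ranked customers is rescaled to the size of the treatment group before subtraction, and it approximates the area with the trapezoid rule. Function names and the toy data are hypothetical.

```python
from typing import List, Sequence


def qini_curve(scores: Sequence[float], treated: Sequence[int],
               responded: Sequence[int]) -> List[float]:
    """Cumulative incremental responders after each customer, ranked high to low.

    Assumed formula at cutoff k: R_T(k) - R_C(k) * N_T(k) / N_C(k), where R and N
    are the responders and customers of each group among the top k.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    r_t = r_c = n_t = n_c = 0
    curve = [0.0]
    for i in order:
        if treated[i]:
            n_t += 1
            r_t += responded[i]
        else:
            n_c += 1
            r_c += responded[i]
        scaled_control = r_c * n_t / n_c if n_c else 0.0
        curve.append(r_t - scaled_control)
    return curve


def qini_coefficient(curve: Sequence[float]) -> float:
    """Area between the model curve and the straight random-targeting line.

    The x-axis is the targeted fraction (0 to 1); the random baseline is the
    straight line from 0 to the final value of the curve.
    """
    n = len(curve) - 1
    area_model = sum((curve[k] + curve[k + 1]) / 2 for k in range(n)) / n
    area_random = curve[-1] / 2
    return area_model - area_random


# Hypothetical toy data, not from the case study.
scores = [0.9, 0.8, 0.7, 0.6]
treated = [1, 0, 1, 0]
responded = [1, 0, 0, 1]

curve = qini_curve(scores, treated, responded)
qini = qini_coefficient(curve)
```

In this toy example the curve rises above the random baseline because the model ranks the treatment responder first and the control responder last.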

Although uplift models are developed and adopted to enhance the efficiency and returns of retention campaigns, few articles assess the costs and benefits of applying uplift modeling. In Hansotia and Rukstales [59], the authors compute the incremental return on investment at the gross margin level. These gross profits are then considered as a contribution to the overhead and to net profits [59]. In Radcliffe [52], the incremental profit is calculated by multiplying the incremental response rate by the total profit. In the next section, we analyze involved costs and benefits and develop a profit-driven approach to evaluating customer churn uplift models.

3. Maximum Profit Measure

The first part of this section discusses the Maximum Profit measure, as introduced in Verbeke et al. [22]. In the second part, we extend the Maximum Profit measure for evaluating customer churn uplift models to compare customer churn prediction and uplift models in Section 4.

3.1. Customer churn prediction models

To maximize the e�ciency and returns of a retention campaign, typically, a limited fraction of

customers is targeted and given an incentive to remain loyal. Therefore, customer churn prediction

models are often evaluated using, for instance, the top-decile lift measure, which only accounts

for the performance of the model regarding the top 10% of customers with the highest predicted

probabilities of churning. Recently, Verbeke et al. [22] demonstrated that from a profit-centric

point of view, using the top decile lift can be expected to result in sub-optimal model selection.

Instead, the maximum profit (MP) measure is proposed, which calculates the profit generated when

considering the optimal fraction of top-ranked customers using a model for a retention campaign.

In essence, this measure evaluates a customer churn prediction model at the cuto↵ leading to the

maximum profit rather than at an arbitrary cuto↵ such as 10%. Performance is expressed as the

profit in monetary units that can be achieved by adopting the model for selecting customers to be

targeted in a retention campaign. This, as shown by the authors, can yield a significant increase

in profits relative to adopting statistical measures and to selecting a fixed fraction of customers to

be targeted in an arbitrary or expert-based manner [22].
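The top-decile lift mentioned above can be illustrated with a minimal Python sketch. The function name, the toy scores and the toy labels are hypothetical, and the sketch assumes that at least one customer in the data churns.

```python
def top_fraction_lift(scores, churned, fraction=0.10):
    """Lift among the top `fraction` of customers ranked by predicted churn score.

    scores  : predicted churn probabilities (higher means more likely to churn)
    churned : observed outcomes, 1 for churn and 0 otherwise
    Assumes at least one churner, so that the base rate is non-zero.
    """
    n = len(scores)
    k = max(1, int(round(n * fraction)))          # number of customers selected
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    top_rate = sum(churned[i] for i in order[:k]) / k
    base_rate = sum(churned) / n
    return top_rate / base_rate


# Toy data (illustrative only): ten customers, three of whom churn.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
toy_churned = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
print(top_fraction_lift(toy_scores, toy_churned, fraction=0.2))
```

With `fraction=0.10` the function returns the top-decile lift; the measure only looks at one cutoff, which is the limitation the MP measure is meant to address.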

To calculate profits generated from a retention campaign, we analyze the dynamic process of

customer flows in a company (Figure 3). The process involves customers entering by subscribing

to the services of an operator and then leaving by churning. To prevent customers from churning,

retention campaigns can be established with the goal of retaining customers.

A customer churn prediction model allows one to rank customers based on their probability of

churning from high to low. This subsequently allows one to select the customers with the

highest probability of churning and to target them in a campaign. The profits of a retention campaign can then be

formulated as [27]:

$$\Pi = N\alpha\left[\beta\gamma\,(b - c_{contact} - c_{incentive}) + \beta(1-\gamma)(-c_{contact}) + (1-\beta)(-c_{contact} - c_{incentive})\right] - A \qquad (3)$$

with $\Pi$ denoting the profit generated by the campaign, $N$ denoting the number of customers included in the customer base, $\alpha$ denoting the fraction of the customer base targeted by the retention campaign and offered an incentive to remain loyal, $\beta$ denoting the fraction of true would-be churners of customers targeted by the retention campaign, $\gamma$ denoting the fraction of targeted would-be churners deciding to remain due to incentives (i.e., the success rate of incentives), $b$ denoting the benefits of the retained customers, $c_{contact}$ denoting the cost of contacting a customer to offer him or her the incentive, $c_{incentive}$ denoting the cost of the incentive to the firm when a customer accepts and stays and $A$ denoting the fixed administrative costs of running the churn management program.

Figure 3: Visual representation of the formula of Neslin et al. [27]. Colors indicate matching parts of the formula and schematics.

The profit formula can be divided into five parts. We highlight each part below and in the

visual representation of the formula given in Figure 3:

(a) $N\alpha$ denotes that the costs and profits of a retention campaign are solely related to customers targeted by the campaign (with the exception of $A$).

(b) $\beta\gamma(b - c_{contact} - c_{incentive})$ denotes the profits generated by the campaign, i.e., the reduction in lost revenues minus the cost of the campaign, $b - c_{contact} - c_{incentive}$, by retaining a fraction $\gamma$ of would-be churners of the fraction of correctly identified would-be churners $\beta$ included in the campaign.

(c) $\beta(1-\gamma)(-c_{contact})$ reflects part of the costs of the campaign, i.e., the cost of including correctly identified would-be churners who were not retained.

(d) $(1-\beta)(-c_{contact} - c_{incentive})$ reflects part of the costs of the campaign, i.e., the cost resulting from targeting non-churners through the campaign; these customers are expected to take advantage of the incentive offered to them through the retention campaign.

(e) $A$ reflects the fixed administrative cost that reduces the overall profitability of a retention campaign.
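The parts (a) to (e) of Equation 3 can be written out directly as a short Python sketch. The function name and the numeric inputs are illustrative assumptions, not values taken from the study.

```python
def campaign_profit(n, alpha, beta, gamma, b, c_contact, c_incentive, a=0.0):
    """Profit of a retention campaign following Equation 3.

    n           : number of customers in the customer base (N)
    alpha       : fraction of the customer base that is targeted
    beta        : fraction of would-be churners among targeted customers
    gamma       : fraction of targeted would-be churners who are retained
    b           : benefit of a retained customer
    c_contact   : cost of contacting a customer
    c_incentive : cost of the incentive when it is accepted
    a           : fixed administrative cost (A)
    """
    retained = beta * gamma * (b - c_contact - c_incentive)   # part (b)
    not_retained = beta * (1 - gamma) * (-c_contact)          # part (c)
    non_churners = (1 - beta) * (-c_contact - c_incentive)    # part (d)
    # Part (a) scales by the targeted customers; part (e) subtracts A.
    return n * alpha * (retained + not_retained + non_churners) - a


# Hypothetical inputs: 1000 customers, 10% targeted, half of them would-be
# churners, 30% of whom are retained.
example_profit = campaign_profit(1000, 0.1, 0.5, 0.3, b=200, c_contact=1, c_incentive=10)
print(example_profit)
```

For these inputs the bracket evaluates to 28.35 - 0.35 - 5.5 = 22.5 per targeted customer, so the 100 targeted customers yield a profit of about 2250.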

As noted in Neslin et al. [27], $\beta$ reflects the capacity of the predictive model to identify would-be churners and can be expressed as:

$$\beta = \lambda\beta_0 \qquad (4)$$

with $\beta_0$ denoting the fraction of all operator customers who will churn and $\lambda$ denoting the lift, i.e., how much more likely the customers targeted by the retention campaign are to churn than the operator's customers overall. Rearranging the terms of Equation 3 leads to:

$$\Pi = N\alpha\left\{\left[\gamma b + c_{incentive}(1-\gamma)\right]\beta_0\lambda - c_{incentive} - c_{contact}\right\} - A \qquad (5)$$
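The rearrangement from Equation 3 to Equation 5 can be checked by expanding the bracket of Equation 3 and collecting the cost terms. The steps below use only the symbols defined above and then substitute $\beta = \lambda\beta_0$ from Equation 4:

```latex
\begin{aligned}
&\beta\gamma\,(b - c_{contact} - c_{incentive}) - \beta(1-\gamma)\,c_{contact} - (1-\beta)(c_{contact} + c_{incentive}) \\
&\quad= \beta\gamma\,b - c_{contact}\left[\beta\gamma + \beta(1-\gamma) + (1-\beta)\right] - c_{incentive}\left[\beta\gamma + (1-\beta)\right] \\
&\quad= \beta\gamma\,b - c_{contact} - c_{incentive}\left[1 - \beta(1-\gamma)\right] \\
&\quad= \beta\left[\gamma b + c_{incentive}(1-\gamma)\right] - c_{incentive} - c_{contact} \\
&\quad= \left[\gamma b + c_{incentive}(1-\gamma)\right]\beta_0\lambda - c_{incentive} - c_{contact}
\end{aligned}
```

The contact cost is thus incurred for every targeted customer, whereas the incentive cost is avoided only for targeted would-be churners who are not retained.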

Neslin et al. [27] use the direct link between lift and profitability as a means to motivate the use

of lift as a performance measure for evaluating customer churn prediction models. Verbeke et al.

[22], however, show that using the lift at an arbitrary cutoff as a performance measure may lead

to suboptimal model selection and, from a business perspective, a significant loss of profitability.

Therefore, the authors propose a profit-centric performance measure called the maximum profit

(MP) defined as:

$$\mathrm{MP} = \max_{\alpha}(\Pi) \qquad (6)$$

To calculate the maximum profit measure, a pragmatic approach is typically adopted [23, 60, 61], and two assumptions are made: (1) the retention rate $\gamma$ is independent of the included fraction of customers $\alpha$, and (2) the benefit of a retained customer, $b$, is independent of the included fraction of customers $\alpha$. These assumptions allow one to use a constant value for both $\gamma$ and $b$ in Equation 5, and given the lift curve of the classification model that represents the relation between the lift and $\alpha$, the maximum of Equation 5 over $\alpha$ can be calculated in a straightforward manner [22].
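A minimal Python sketch of this calculation follows, under the two assumptions stated above (constant retention rate and constant benefit). It evaluates Equation 5 per customer, i.e., divided by N and with A set to zero, at every cutoff of the ranking and keeps the best one. The function names and toy data are hypothetical, and ties in the scores are not treated specially.

```python
def profit_per_customer(alpha, lift, beta0, gamma, b, c_contact, c_incentive):
    """Equation 5 divided by N, with the fixed cost A omitted (assumption)."""
    return alpha * ((gamma * b + c_incentive * (1 - gamma)) * beta0 * lift
                    - c_incentive - c_contact)


def maximum_profit(scores, churned, gamma, b, c_contact, c_incentive):
    """Return (maximum profit per customer, optimal targeted fraction alpha)."""
    n = len(scores)
    beta0 = sum(churned) / n                      # base churn rate
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    best_profit, best_alpha = 0.0, 0.0            # targeting nobody yields zero
    churners_so_far = 0
    for k, i in enumerate(order, start=1):
        churners_so_far += churned[i]
        alpha = k / n
        lift = (churners_so_far / k) / beta0      # lift at this cutoff
        profit = profit_per_customer(alpha, lift, beta0, gamma,
                                     b, c_contact, c_incentive)
        if profit > best_profit:
            best_profit, best_alpha = profit, alpha
    return best_profit, best_alpha


# Toy data (illustrative only).
mp_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
mp_churned = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
mp_value, mp_alpha = maximum_profit(mp_scores, mp_churned, gamma=0.3,
                                    b=200, c_contact=1, c_incentive=10)
print(mp_value, mp_alpha)
```

On this toy ranking the profit peaks when the top four customers are targeted, which illustrates why a fixed cutoff such as 10% need not coincide with the profit-maximizing cutoff.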

3.2. Customer churn uplift models

None of the existing evaluation metrics for assessing the performance of an uplift model take

into account the costs and benefits of adopting the uplift model or express the performance of an

uplift model in terms of profitability. To evaluate customer churn uplift models and to compare

CCP and CCU models, we apply the profit formula of Equation 3 to the uplift modeling case.

First, consider how uplift models are different from their predictive counterparts. Customer churn

prediction models only make use of treatment group data to build a model, whereas uplift models

consider both treatment and control group data in developing a model. Additionally, in evaluating

an uplift model, the profit measure should consider both the treatment and control group.

Consider the left-hand side of Figure 4. The campaign-targeted population consists of three groups: the fraction of true would-be churners ($\beta$), (1) some of whom accept the offer ($\gamma$ or the blue part), whereas (2) others do not ($1-\gamma$ or the red part). The third group includes (3) the fraction of those who will not churn ($1-\beta$ or the yellow part) who are erroneously included in the campaign. $\gamma$ is the campaign retention rate, which is fixed and to be estimated but in principle unknown.

Figure 4: On the left-hand side is a different visualization of Neslin's formula focusing on the campaign-targeted

population. On the right-hand side is a translation of the uplift modeling scenario.

For a translation toward uplift modeling, consider the right-hand side of Figure 4. For the treatment group, the same division of groups applied for CCP was used. The control group not targeted by the campaign includes two groups: the fraction of would-be churners and the fraction who will not churn. Although the addition of a control group generally adds an extra layer of complexity, in terms of the profit formula, it also contributes more useful knowledge. The difference $\beta_C - \beta_T$ is the value of the uplift or the reduction in the churn rate. Whereas $\gamma$ must be estimated through CCP modeling, in CCU modeling, it is observed, rendering the formula an instrument that is easier to use and generating more reliable estimates of profits. In CCU modeling, $\gamma$ represents the uplift, i.e., the reduction or difference in the churn rate between the two groups (i.e., $\gamma = \beta_C - \beta_T$). Additionally, we can fine-tune the $\beta$ parameter of CCP as $\beta_C$ and $\beta_T$. In turn, Equation 3 can be rewritten as follows:

$$\Sigma = N\alpha\left[(\beta_C - \beta_T)(b - c_{contact} - c_{incentive}) + \beta_T(-c_{contact}) + (1-\beta_C)(-c_{contact} - c_{incentive})\right] - A \qquad (7)$$

Reformulating the above formula to place more emphasis on costs and benefits leads to:

$$\Sigma = N\alpha\left[(\beta_C - \beta_T)\,b - c_{contact} - (1-\beta_T)\,c_{incentive}\right] - A \qquad (8)$$
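The reformulation can be verified by expanding the bracket of Equation 7 and grouping the two cost parameters, using only the symbols already introduced:

```latex
\begin{aligned}
&(\beta_C - \beta_T)(b - c_{contact} - c_{incentive}) - \beta_T\,c_{contact} - (1-\beta_C)(c_{contact} + c_{incentive}) \\
&\quad= (\beta_C - \beta_T)\,b - c_{contact}\left[(\beta_C - \beta_T) + \beta_T + (1-\beta_C)\right] - c_{incentive}\left[(\beta_C - \beta_T) + (1-\beta_C)\right] \\
&\quad= (\beta_C - \beta_T)\,b - c_{contact} - (1-\beta_T)\,c_{incentive}
\end{aligned}
```

In this form, every targeted customer incurs the contact cost, and the incentive cost is incurred for the fraction of targeted customers who do not churn.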

$\beta_C$ and $\beta_T$ are the churn rates of the control and treatment groups, respectively, and $(1-\beta_T)$ is the non-churn rate of the treatment group. As in CCP modeling, the goal is to maximize what we call the maximum profit uplift (MPU):

$$\mathrm{MPU} = \max_{\alpha}(\Sigma) \qquad (9)$$

The MPU measure expresses the performance of a CCU model in terms of the profits generated

per customer of the customer base when targeting the optimal fraction of customers ranked

according to the estimated uplift of the CCU model.
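One way to compute the MPU from scored test data is sketched below in Python. This is a sketch under stated assumptions rather than the authors' implementation: the treatment and control groups are ranked separately by model score, the same fraction is taken from the top of each ranking, Equation 8 is evaluated per customer (divided by N, with A omitted), and only a grid of candidate fractions is searched. All names and data are hypothetical.

```python
def churn_rate_top(scores, churned, fraction):
    """Observed churn rate among the top `fraction` of customers by score."""
    n = len(scores)
    k = max(1, int(round(n * fraction)))
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    return sum(churned[i] for i in order[:k]) / k


def uplift_profit_per_customer(alpha, beta_c, beta_t, b, c_contact, c_incentive):
    """Equation 8 divided by N, with the fixed cost A omitted (assumption)."""
    return alpha * ((beta_c - beta_t) * b - c_contact - (1 - beta_t) * c_incentive)


def maximum_profit_uplift(treat_scores, treat_churned, ctrl_scores, ctrl_churned,
                          b, c_contact, c_incentive, steps=10):
    """Return (maximum profit per customer, optimal fraction) over a grid."""
    best_profit, best_alpha = 0.0, 0.0            # targeting nobody yields zero
    for s in range(1, steps + 1):
        alpha = s / steps
        beta_t = churn_rate_top(treat_scores, treat_churned, alpha)
        beta_c = churn_rate_top(ctrl_scores, ctrl_churned, alpha)
        profit = uplift_profit_per_customer(alpha, beta_c, beta_t,
                                            b, c_contact, c_incentive)
        if profit > best_profit:
            best_profit, best_alpha = profit, alpha
    return best_profit, best_alpha


# Toy data (illustrative only): four treated and four control customers.
t_scores, t_churned = [0.9, 0.7, 0.4, 0.1], [0, 0, 1, 0]
c_scores, c_churned = [0.8, 0.6, 0.3, 0.2], [1, 1, 0, 0]
mpu_value, mpu_alpha = maximum_profit_uplift(t_scores, t_churned,
                                             c_scores, c_churned,
                                             b=100, c_contact=1, c_incentive=10,
                                             steps=4)
print(mpu_value, mpu_alpha)
```

In contrast to the MP sketch, no retention rate has to be assumed here: the reduction in churn is read off from the difference between the control and treatment churn rates at each cutoff.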

4. Experiments

The objective of the experiments presented in this section is to compare and contrast customer

churn prediction modeling and customer churn uplift modeling outcomes. In the first part of

this section, information on the experimental setup is provided, i.e., the dataset and experimental

methodology. In Section 4.2, the results of the experiments are presented, and these results are

discussed and analyzed in Section 4.2.3.

4.1. Experimental Design

4.1.1. Dataset

The dataset used to conduct the experiments was obtained from a financial institution. It

consists of records containing customer information, including a variable on churning and a variable

determining whether a customer was targeted by a retention campaign. Table 2 provides detailed

information on the dataset. The retention campaign was targeted at a treatment group, for which,

in the subsequent period, a churn rate of 13.25 % was observed. For the control group not targeted

by the retention campaign, a significantly higher churn rate of 25.52 % was observed. The overall

uplift achieved is thus equal to 12.27 percentage points, showing that the campaign had a significant impact on

customer behavior. The dataset includes 162 variables, including socio-demographic information

and usage and activity data. Both the treatment and control groups are randomly split into training

and test sets, respectively including 2/3 and 1/3 of the records.
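A random 2/3 and 1/3 split applied to each group separately could look as follows in Python. The helper name, the seed and the toy record identifiers are assumptions made for illustration; the study itself was carried out in R.

```python
import random


def split_group(records, train_fraction=2 / 3, seed=0):
    """Randomly split one group (treatment or control) into train and test."""
    rng = random.Random(seed)                     # fixed seed for repeatability
    shuffled = list(records)
    rng.shuffle(shuffled)
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]


# Toy record identifiers (illustrative only); each group is split on its own
# so that both the training and the test set contain treated and control
# customers.
treatment_ids = list(range(0, 9))
control_ids = list(range(100, 112))
treat_train, treat_test = split_group(treatment_ids, seed=1)
ctrl_train, ctrl_test = split_group(control_ids, seed=2)
print(len(treat_train), len(treat_test), len(ctrl_train), len(ctrl_test))
```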

| The data | |
|---|---|
| Type of organization | Financial institution |
| Total observations | 200 903 |
| Total variables | 162 |
| Control group observations | 118 809 |
| Control group churn rate | 25.52 % |
| Treatment group observations | 82 094 |
| Treatment group churn rate | 13.25 % |
| Overall uplift | 12.27 % |

Table 2: Information on the dataset obtained from a private financial institution in Belgium.

4.1.2. Methodology

Unlike conventional predictive modeling, uplift modeling manages two groups, a treatment

group and a control group. In testing such techniques and measures, we consider two scenarios.

The first scenario tests the classic profit measure, MP (Equation 3), which considers the population

to be part of one group, preferably a group that has not had any prior contact with campaigns.

Therefore, to test the MP, we only use the test set of the control group. The second scenario

assumes the existence of both a treatment group and a control group, and thus, the MPU metric is

applied to the results of test sets of both the treatment and control groups. This is also illustrated

in Figure 5.

Figure 5: Scenario 1 focuses solely on the control group, whereas Scenario 2 considers both the treatment and control

group.

Two modeling techniques are used to develop and compare CCP and CCU models, i.e., logistic

regression and random forests. Both techniques can be used in a straightforward manner to develop

predictive models and have been adapted for developing uplift models [21, 48]. The use of these two

techniques in our experiments is motivated by their popularity. Logistic regression is the standard

predictive modeling approach used in industrial settings across various applications and is a typical

benchmark approach used in experimental studies and scientific research. Additionally, logistic

regression facilitates the interpretation of the resulting model and typically performs well [62, 63].

Random forests are state of the art in the field of business analytics, have broad applications to

industry settings and to scientific research, and typically achieve strong outcomes [62, 63]. Note

that a full-scale benchmarking study of a broad range of predictive and uplift modeling techniques

for various datasets falls beyond the scope of this study.

To execute our experiments, the open-source software R was used [64]. For CCP, the adopted implementations stem from the R package caret¹. For CCU, adapted implementations were applied to take into account and contrast customer behaviors of the treatment and control groups, although the underlying learning approach used is similar to the counterparts for predictive modeling. For logistic regression, Lo's approach [48] was applied to draw comparisons with standard logistic regression, whereas the uplift random forests proposed in Guelman et al. [21] were applied via the 'uplift' R package².
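To make the logistic regression variant more concrete, the following Python sketch shows only the scoring step of an interaction-based approach as it is commonly described for Lo [48]: a single logistic model uses the customer features, a treatment indicator and their interactions, and the estimated uplift is the predicted churn probability without treatment minus the predicted churn probability with treatment. The coefficient values, function names and the single toy feature are hypothetical; in practice the coefficients would come from fitting the model on training data, and the experiments reported here were run in R rather than Python.

```python
import math


def lo_features(x, treated):
    """Design vector [1, x..., t, t * x...] with treatment indicator t."""
    t = 1.0 if treated else 0.0
    return [1.0] + list(x) + [t] + [t * value for value in x]


def churn_probability(coefficients, x, treated):
    """Logistic model output for one customer under a given treatment status."""
    z = sum(c * f for c, f in zip(coefficients, lo_features(x, treated)))
    return 1.0 / (1.0 + math.exp(-z))


def estimated_uplift(coefficients, x):
    """Estimated reduction in churn probability caused by the treatment."""
    return (churn_probability(coefficients, x, treated=False)
            - churn_probability(coefficients, x, treated=True))


# Hypothetical coefficients for one feature:
# [intercept, feature weight, treatment weight, interaction weight].
toy_coefficients = [0.0, 1.0, -1.0, 0.0]
toy_uplift = estimated_uplift(toy_coefficients, [0.0])
print(toy_uplift)
```

Customers are then ranked by this estimated uplift instead of by the predicted churn probability.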

4.2. Results and discussion

4.2.1. Scenario 1 - Evaluation with Maximum Profit

In this section, we present the results of our experiments on the first scenario as detailed above,

in which the maximum profit measure (Equation 6) is used to evaluate the performance of logistic

regression and random forest CCP and CCU models. Figure 6 shows the profit curves generated

from the experiments on scenario 1. As no information was provided by the financial institution

regarding actual values of the cost and benefit parameters of the MP measure, three different sets

of parameters were used to calculate the MP. The three sets of values used are based on values

reported in the literature [22, 27, 60, 65] and represent situations presenting low, medium and high

profitability resulting from retaining a customer. A full sensitivity analysis on the impact of the

adopted cost and benefit parameters falls beyond the scope of this article and is recognized as

a topic for further research. However, the results of experiments conducted on the three sets of

parameters are fully consistent, and thus, conclusions drawn from the experiments appear to hold

irrespective of the assumed parameter values.

The profit curves presented in Figure 6 show the profits generated per customer of the customer base for a fraction x of customers targeted by the retention campaign. Customers are ranked per the estimated probability of churning for the CCP models (black profit curves) and ranked per the estimated uplift of the CCU models (blue profit curves). Note that the profits generated per customer of the customer base, rather than the total profit, are plotted because the profit generated per customer is independent of the size of the customer base yet is still proportional to the total profit. The maximum of each profit curve thus indicates the optimal fraction of customers to be targeted by the retention campaign, giving rise to the maximum profit.

¹ http://caret.r-forge.r-project.org
² https://cran.r-project.org/web/packages/uplift/index.html

4.2.2. Scenario 2 - Evaluation with Maximum Profit Uplift

Figure 7 shows the profit curves generated from the experiments following the second scenario

detailed in the previous section, with the results of the CCP and CCU models evaluated using

the novel maximum profit for uplift (MPU) modeling measure. The MPU measure includes both

treatment and control group observations in the test set of the evaluation.
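As a rough orientation for the scale of such profit values, Equation 8 can be evaluated per customer at a targeted fraction of 1, using the overall churn rates of Table 2 and the three parameter sets used in the experiments. Dividing by N and ignoring the fixed cost A are assumptions of this Python sketch, and the function name is hypothetical.

```python
# Overall churn rates reported in Table 2.
BETA_C = 0.2552   # control group
BETA_T = 0.1325   # treatment group

# (b, c_incentive, c_contact) for high, medium and low profitability.
PARAMETER_SETS = [(200, 10, 1), (100, 10, 1), (100, 50, 1)]


def profit_all_targeted(b, c_incentive, c_contact):
    """Equation 8 per customer with alpha = 1 and A = 0 (assumptions)."""
    return (BETA_C - BETA_T) * b - c_contact - (1 - BETA_T) * c_incentive


for b, c_incentive, c_contact in PARAMETER_SETS:
    value = profit_all_targeted(b, c_incentive, c_contact)
    print(b, c_incentive, c_contact, round(value, 2))
```

Under these assumptions, targeting the entire customer base yields roughly 14.87, 2.60 and -32.11 per customer for the three parameter sets. With an incentive cost of 50, offering the incentive to everyone is clearly unprofitable, which illustrates why the targeted fraction has to be optimized.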

4.2.3. Discussion

The evaluation based on the maximum profit measure for scenario 1 clearly shows that CCP

modeling yields higher profits than CCU modeling for logistic regression and random forests. The

profit curves shown in Figure 6 of the CCP models exceed the profit curves of the CCU models. We

conclude that CCP models are superior in predicting which customers will churn. This makes sense,

as CCP models are trained with the objective of predicting churn patterns, whereas uplift models

are designed to predict uplift events, i.e., the impact of a retention campaign on the propensity to

attrite. Many of the churners predicted by the CCP model may be customers who have made up

their minds and who have decided to churn, and they therefore cannot be retained when targeted by

a retention campaign. A successful uplift model will therefore rank these customers at the bottom

of the ranking, i.e., will estimate their uplift as close to zero as the impact of the retention campaign

will be nil. In other words, many churners identified by a CCP model can be expected to be

Lost Causes as defined in Section 2.2.1. When using MP as the evaluation measure, it is natural

to see that the measure values CCP more than CCU because the MP assumes a constant retention

rate and as such does not acknowledge the true retention rate that can be observed when a control

group is present. The MP measure is additionally linearly related to the lift or to the number of

churners of the fraction of selected customers. As CCP models can be expected to detect more

churners than CCU models, this further contributes to the superiority of the MP of CCP models

over CCU models.

For the results of the experiments based on scenario 2, in using the MPU measure to evaluate the performance of the CCP and CCU models, we find that the CCU models outperform the CCP models. This can be attributed to the fact that the uplift models effectively succeed in predicting uplift, which is accounted for in the MPU measure as discussed in Section 3.2. When ranking both the treatment and control groups of the test set following the predicted probabilities of churning in evaluating the CCP models and the estimated uplift in evaluating the CCU models, the observed reduction in churn rates for the selected fraction x of customers can be used to measure the profits generated from a retention campaign when selecting customers based on the CCP and CCU models. As CCP models rank customers who are likely to churn but who cannot necessarily be retained high on the list (which is exactly what the CCU model predicts), CCP models appear to be less profitable than CCU models. The objective of CCU models is to ascribe high scores to customers who are likely to both churn and be retained, and as such, they achieve higher degrees of uplift and profitability.

Figure 6: Profit curves for logistic regression (left) and random forest (right) CCP (black curves) and CCU (blue curves) models based on the first scenario using the MP measure with three sets of cost and benefit parameters. Panels (a) and (b): $b = 200$, $c_{incentive} = 10$, $c_{contact} = 1$; panels (c) and (d): $b = 100$, $c_{incentive} = 10$, $c_{contact} = 1$; panels (e) and (f): $b = 100$, $c_{incentive} = 50$, $c_{contact} = 1$. Horizontal axis: percentage captured; vertical axis: profit per customer; legend: Classic CP, Uplift CP.

Figure 7: Profit curves for logistic regression (left) and random forest (right) CCP (black curves) and CCU (blue curves) models for the second scenario using the MPU measure with three sets of cost and benefit parameters. Panels (a) and (b): $b = 200$, $c_{incentive} = 10$, $c_{contact} = 1$; panels (c) and (d): $b = 100$, $c_{incentive} = 10$, $c_{contact} = 1$; panels (e) and (f): $b = 100$, $c_{incentive} = 50$, $c_{contact} = 1$. Horizontal axis: percentage captured; vertical axis: profit per customer; legend: Classic CP, Uplift CP.

Figure 8: Churn rate as a function of the selected fraction of customers for CCP and CCU logistic regression (a) and random forest (b) models.

Note that it is only possible to calculate the MPU measure when both a control group and

a treatment group are present, which, in traditional customer churn prediction setups, is not the

case. The MP measure still has use in such settings, although uplift modeling is clearly a superior

paradigm with respect to developing a data-driven customer retention program.

In addition, although of less importance here, our profit curves show that random forests generally

perform better than logistic regressions. Random forest models can generate higher profits

per customer and higher profits from a smaller fraction of customers targeted by a retention campaign.

This result is no surprise and is fully in line with the results of benchmarking experiments

conducted across various business domains as reported in the literature [2, 22].

To further analyze and gain insight into the results of the experiments, we plot the churn rate as a function of the fraction x of customers selected for the CCP and CCU logistic regression and random forest models in the left and right panels of Figure 8, respectively. These figures show that the cumulative churn rate for the CCP models always exceeds the churn rate of the CCU models. This indicates that the CCP model captures more churners than the CCU model for the same fraction x of selected customers. We also plot the uplift as a function of the fraction x of customers selected for the CCP and CCU logistic regression and random forest models in the left and right panels of Figure 9, respectively. Here, it can be seen that the CCU model achieves a stronger degree of uplift than the CCP model, i.e., a stronger reduction of the churn rate of the treatment group relative to the control group.

Figure 9: Cumulative uplift as a function of the fraction of customers selected for the CCP and CCU logistic regression (a) and random forest (b) models. Horizontal axis: percentage captured; vertical axis: cumulative uplift; legend: Classic CP, Uplift CP.

Figures 8 and 9 confirm the above analysis and support the conclusion that CCP models tend

to detect numerous Lost Causes, i.e., customers who decide to churn and who cannot be retained

by a campaign, whereas CCU models aim to and succeed at avoiding targeting Lost Causes and

instead allow one to treat Persuadables to realize a stronger decrease in the churn rate and yield

an increased return. This conclusion holds for both the logistic regression and random forests

techniques. For uplift churn prediction modeling, this also seems to be the case. Further research

may extend these experiments to the use of alternative predictive and uplift modeling techniques.

A next step in the analysis of our results involves an assessment of similarities in the rankings of customers when scored using the various models developed. For this purpose, Spearman's rank order correlation and Kendall's tau are calculated for the first and second scenarios and are reported in Tables 3 and 4. We find that overall, the rankings resulting from the various models differ substantially. For the first scenario, the strongest similarity is found between the logistic regression models of the CCP and CCU setups and between the random forests of the CCP and CCU setups, both presenting the maximum observed value of Spearman's rank order correlation of 0.52. The weakest similarities are found between the CCP logistic regression model and the CCU random forest model, with a Spearman's rank order correlation of 0.31 for the first scenario and of only 0.17 for the second scenario. Between the CCP random forest and CCU logistic regression models, we find a Spearman's rank order correlation of 0.23 for the first scenario and of 0.24 for the second scenario. These model setups are the most dissimilar, as they differ both in terms of predictive versus uplift modeling and in terms of logistic regression versus random forests. For the second scenario, which considers both the treatment and control test sets, we find that the rankings of the CCP and CCU logistic regression models become more similar (0.59), whereas the Spearman's rank order correlation between the rankings of the CCP and CCU random forest models decreases to 0.35, which is equal to the correlation between the CCU logistic regression and CCU random forest models. Overall, these results confirm that CCU and CCP models identify different customers to target through campaigns.

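Scenario 1 - Spearman

| | SC1.CCP.GLM | SC1.CCP.RF | SC1.CCU.DTA | SC1.CCU.RF |
|---|---|---|---|---|
| SC1.CCP.GLM | 1 | 0.46 | 0.52 | 0.31 |
| SC1.CCP.RF | 0.46 | 1 | 0.23 | 0.52 |
| SC1.CCU.DTA | 0.52 | 0.23 | 1 | 0.39 |
| SC1.CCU.RF | 0.31 | 0.52 | 0.39 | 1 |

Scenario 1 - Kendall's tau

| | SC1.CCP.GLM | SC1.CCP.RF | SC1.CCU.DTA | SC1.CCU.RF |
|---|---|---|---|---|
| SC1.CCP.GLM | 1 | 0.32 | 0.39 | 0.21 |
| SC1.CCP.RF | 0.32 | 1 | 0.15 | 0.37 |
| SC1.CCU.DTA | 0.39 | 0.15 | 1 | 0.27 |
| SC1.CCU.RF | 0.21 | 0.37 | 0.27 | 1 |

Table 3: Spearman's rank order correlation and Kendall's tau, scenario 1.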

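Scenario 2 - Spearman

| | SC2.CCP.GLM | SC2.CCP.RF | SC2.CCU.DTA | SC2.CCU.RF |
|---|---|---|---|---|
| SC2.CCP.GLM | 1 | 0.47 | 0.59 | 0.17 |
| SC2.CCP.RF | 0.47 | 1 | 0.24 | 0.35 |
| SC2.CCU.DTA | 0.59 | 0.24 | 1 | 0.35 |
| SC2.CCU.RF | 0.17 | 0.35 | 0.35 | 1 |

Scenario 2 - Kendall's tau

| | SC2.CCP.GLM | SC2.CCP.RF | SC2.CCU.DTA | SC2.CCU.RF |
|---|---|---|---|---|
| SC2.CCP.GLM | 1 | 0.32 | 0.44 | 0.12 |
| SC2.CCP.RF | 0.32 | 1 | 0.16 | 0.25 |
| SC2.CCU.DTA | 0.44 | 0.16 | 1 | 0.24 |
| SC2.CCU.RF | 0.12 | 0.25 | 0.24 | 1 |

Table 4: Spearman's rank order correlation and Kendall's tau, scenario 2.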
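The two rank correlation measures reported in Tables 3 and 4 can be computed as in the following Python sketch, which assumes that the scores contain no ties (the simple formulas below are only valid in that case). The function names and toy scores are hypothetical.

```python
def ranks(values):
    """Rank positions (1 = smallest value); assumes no tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result


def spearman(a, b):
    """Spearman's rank order correlation for untied scores."""
    n = len(a)
    squared_diffs = sum((x - y) ** 2 for x, y in zip(ranks(a), ranks(b)))
    return 1 - 6 * squared_diffs / (n * (n * n - 1))


def kendall_tau(a, b):
    """Kendall's tau for untied scores: (concordant - discordant) / pairs."""
    n = len(a)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            sign = (a[i] - a[j]) * (b[i] - b[j])
            if sign > 0:
                concordant += 1
            elif sign < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


# Toy scores of two models for the same four customers (illustrative only).
model_a = [0.9, 0.7, 0.5, 0.3]
model_b = [0.8, 0.9, 0.2, 0.4]
print(spearman(model_a, model_b), kendall_tau(model_a, model_b))
```

The pairwise loop in `kendall_tau` is quadratic in the number of customers, so for datasets of the size used here a library implementation would be the more practical choice.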

The observations of the previous analysis on the similarities in the rankings of customers are confirmed when plotting the overlap in selected customers. Figure 10 shows the overlap in customers when comparing different cutoffs of the ranking between different techniques and methodologies. For the first scenario, the CCP and CCU rankings present an overlap of 0.55 for the logistic regression models and of 0.46 for the random forest models at a cutoff of 5% (Figure 10a). In comparing logistic regression and random forest models within each setting, we find a lower overlap of 0.30 and 0.38 at 5% for the CCP and CCU settings, respectively (Figure 10b), revealing clear differences between the targeted customers. For the second scenario, the logistic regression models show a 0.52 overlap between the CCP and CCU setups at a cutoff of 5%. The largest difference is found between random forest models of the CCP and CCU setups with an overlap of 0.21 at 5% (Figure 10c). This latter observation combined with the MPU results (Figure 7b) again clearly shows that the CCU setup ranks customers more profitably than the CCP setup. Finally, Figure 10d shows overlaps of 0.31 and 0.29 at 5% for techniques of the CCP and CCU setups, respectively. This only confirms the presence of a significant difference in rankings when comparing logistic regression and random forest models.

Figure 10: The overlap in customers observed when comparing different cutoffs of the ranking of setups (10a and 10c) and techniques (10b and 10d) for scenarios 1 and 2. Panels: (a) overlap comparison across methodologies of scenario 1 (CCP vs CCU); (b) overlap comparison across techniques of scenario 1; (c) overlap comparison across methodologies of scenario 2 (CCP vs CCU); (d) overlap comparison across techniques of scenario 2. Horizontal axis: cutoff; vertical axis: overlap.
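The overlap at a given cutoff can be sketched in Python as the share of customers that appear in the top fraction of both rankings. The function names and toy scores are hypothetical, and the exact definition used for Figure 10 is assumed rather than taken from the text.

```python
def top_fraction(scores, fraction):
    """Indices of the customers in the top `fraction` of a ranking by score."""
    n = len(scores)
    k = max(1, int(round(n * fraction)))
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    return set(order[:k])


def overlap(scores_a, scores_b, fraction):
    """Share of selected customers common to both rankings at this cutoff."""
    top_a = top_fraction(scores_a, fraction)
    top_b = top_fraction(scores_b, fraction)
    return len(top_a & top_b) / len(top_a)


# Toy scores of two models for the same six customers (illustrative only).
ranking_a = [0.9, 0.8, 0.7, 0.1, 0.2, 0.3]
ranking_b = [0.9, 0.1, 0.8, 0.7, 0.2, 0.3]
print(overlap(ranking_a, ranking_b, fraction=0.5))
```

At a cutoff of 1.0 the overlap is always 1, so the informative part of such a curve lies at the small cutoffs that a retention campaign would actually use.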

In previous studies on uplift modeling, the performance of uplift models has been reported to be

unstable, i.e., to heavily vary across test folds when adopting an n-fold cross validation setup [63].

Therefore, the experiments reported above were repeated five times to assess the impact of randomly

splitting the dataset into training and test sets. The results generated across the five repetitions

were found to be highly stable, supporting the validity of the presented findings.
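Such a stability check can be organized as in the Python sketch below, which repeats a random 2/3 and 1/3 split with different seeds and summarizes the spread of a performance value. The `evaluate` callable stands for whatever trains a model and returns a measure such as the MPU; it is a hypothetical placeholder, as are the function name and the toy data.

```python
import random
import statistics


def repeated_holdout(records, evaluate, repetitions=5, train_fraction=2 / 3):
    """Repeat a random train/test split and summarize the resulting scores.

    evaluate : callable taking (train, test) and returning one number;
               supplied by the user (hypothetical placeholder here).
    Returns (mean, standard deviation) of the scores over the repetitions.
    """
    results = []
    for seed in range(repetitions):
        shuffled = list(records)
        random.Random(seed).shuffle(shuffled)     # a new random split per seed
        cut = int(round(len(shuffled) * train_fraction))
        results.append(evaluate(shuffled[:cut], shuffled[cut:]))
    return statistics.mean(results), statistics.stdev(results)


# Toy usage with a stand-in evaluation that just reports the test set size.
holdout_mean, holdout_spread = repeated_holdout(
    list(range(9)), evaluate=lambda train, test: len(test))
print(holdout_mean, holdout_spread)
```

A small standard deviation relative to the mean across repetitions would correspond to the stable results described above.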

5. Conclusions and future research

In this article, we introduce a novel, profit-driven evaluation measure for assessing the performance of customer churn uplift models. The measure extends the maximum profit measure for customer churn prediction models and allows one to compare customer churn prediction and customer churn uplift models. It assesses the performance of a customer churn uplift model in terms of the profit per customer generated across the customer base when the optimal fraction of customers with the highest uplift scores is targeted in a retention campaign. This optimal fraction is the one that maximizes the profit generated by the retention campaign; it therefore depends indirectly on the costs of the campaign and on the benefits of retaining customers who were about to churn. The results of a real-life case study in the financial industry are presented. An experimental study was developed and conducted to assess the added value of prescriptive over predictive analytics. The results indicate that customer churn uplift models outperform customer churn prediction models. Uplift models appear to be able to identify so-called persuadables and therefore to yield higher returns than customer churn prediction models, which top-rank and thus select lost causes, i.e., customers who are about to churn but who will not be retained when targeted through a retention campaign. These results strongly suggest that uplift modeling is an improved tool for practical customer churn modeling applications.

Future studies will focus on generalizing the newly introduced MPU measure, as there is a need

for powerful and application-oriented evaluation measures for assessing the performance of uplift

models. This study also opens doors to the development of profit-driven uplift modeling approaches

that aim at maximizing profitability.
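The idea of targeting the profit-maximizing fraction of customers ranked by uplift score can be sketched as follows. This is a simplified stand-in and not the MPU formula defined earlier in the paper: it assumes a test set with randomly assigned treatment and control groups, a fixed benefit per prevented churner (`benefit_per_saved`), a fixed cost per contacted customer (`cost_per_contact`), and a coarse grid of cutoffs. All names and the toy data are hypothetical.

```python
import numpy as np


def max_profit_uplift(uplift, treated, churned,
                      benefit_per_saved, cost_per_contact, n_cutoffs=10):
    """Return (max profit per customer, optimal targeted fraction).

    Assumed profit at cutoff k (illustrative only):
      saved  = (churn rate of controls - churn rate of treated) in top k, times k
      profit = (saved * benefit_per_saved - k * cost_per_contact) / n
    """
    uplift = np.asarray(uplift, dtype=float)
    treated = np.asarray(treated, dtype=bool)
    churned = np.asarray(churned, dtype=float)
    n = uplift.size
    order = np.argsort(-uplift, kind="stable")  # highest uplift score first
    best_profit, best_fraction = 0.0, 0.0       # targeting nobody yields zero
    for i in range(1, n_cutoffs + 1):
        fraction = i / n_cutoffs
        k = int(round(fraction * n))
        top = order[:k]
        is_treated = treated[top]
        if is_treated.sum() == 0 or (~is_treated).sum() == 0:
            continue  # both groups are needed to estimate the uplift
        churn_control = churned[top][~is_treated].mean()
        churn_treated = churned[top][is_treated].mean()
        saved = (churn_control - churn_treated) * k
        profit = (saved * benefit_per_saved - k * cost_per_contact) / n
        if profit > best_profit:
            best_profit, best_fraction = profit, fraction
    return best_profit, best_fraction


# Toy data: the four top-ranked customers respond to treatment, the rest do not.
scores = [0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1]
in_treatment = [1, 0, 1, 0, 1, 0, 1, 0]
did_churn = [0, 1, 0, 1, 0, 0, 0, 0]
profit, fraction = max_profit_uplift(scores, in_treatment, did_churn,
                                     benefit_per_saved=100.0,
                                     cost_per_contact=10.0, n_cutoffs=4)
```

In this toy setting the profit peaks when half of the customers are targeted, because contacting customers beyond that point adds cost without preventing additional churn. When the contact cost exceeds any attainable benefit, the sketch returns a zero profit at a targeted fraction of zero.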

References

[1] W. Verbeke, B. Baesens, C. Bravo, Profit Driven Business Analytics: A Practitioner’s Guide to Transforming

Big Data into Added Value, John Wiley & Sons, 2017.

[2] S. Lessmann, B. Baesens, H.-V. Seow, L. C. Thomas, Benchmarking state-of-the-art classification algorithms

for credit scoring: An update of research, Eur. J. Oper. Res. 247 (2015) 124–136.

[3] S. Maldonado, J. Perez, C. Bravo, Cost-based feature selection for support vector machines: An application in

credit scoring, Eur. J. Oper. Res. 261 (2017) 656–665.

[4] B. Baesens, V. Van Vlasselaer, W. Verbeke, Fraud Analytics Using Descriptive, Predictive, and Social Network

Techniques: A Guide to Data Science for Fraud Detection, John Wiley & Sons, 2015.

[5] K. Coussement, K. W. De Bock, Customer churn prediction in the online gambling industry: The beneficial

effect of ensemble learning, J Bus Res 66 (2013) 1629–1636.

[6] M. Oskarsdottir, C. Bravo, W. Verbeke, C. Sarraute, B. Baesens, J. Vanthienen, Social network analytics for

churn prediction in telco: Model building, evaluation and network architecture, Expert Syst. Appl. 85 (2017)

204–220.


[7] W. Verbeke, D. Martens, C. Mues, B. Baesens, Building comprehensible customer churn prediction models with

advanced rule induction techniques, Expert Syst. Appl. 38 (2011) 2354–2364.

[8] A. D. Athanassopoulos, Customer satisfaction cues to support market segmentation and explain switching

behavior, J Bus Res 47 (2000) 191–207.

[9] C. B. Bhattacharya, When customers are members: Customer retention in paid membership contexts, J Acad

Market Sci 26 (1998) 31.

[10] M. R. Colgate, P. J. Danaher, Implementing a customer relationship strategy: The asymmetric impact of poor

versus excellent execution, J Acad Market Sci 28 (2000) 375–387.

[11] E. Rasmusson, Complaints can build relationships, Sales & Marketing Management 151 (1999) 89.

[12] M. Colgate, K. Stewart, R. Kinsella, Customer defection: a study of the student market in Ireland, International

Journal of Bank Marketing 14 (1996) 23–29.

[13] J. Ganesh, M. J. Arnold, K. E. Reynolds, Understanding the customer base of service providers: An examination

of the differences between switchers and stayers, J Mark 64 (2000) 65–87.

[14] R. W. Mizerski, An attribution explanation of the disproportionate influence of unfavorable information, J

Consum Res 9 (1982) 301–310.

[15] F. F. Reichheld, Learning from customer defections (1996).

[16] D. L. Stum, A. Thiry, Building customer loyalty, Train Dev J 45 (1991) 34–36.

[17] V. A. Zeithaml, L. L. Berry, A. Parasuraman, The behavioral consequences of service quality, J Mark 60 (1996)

31–46.

[18] R. T. Rust, A. J. Zahorik, Customer satisfaction, customer retention, and market share, J Retailing 69 (1993)

193–215.

[19] D. V. den Poel, B. Lariviere, Customer attrition analysis for financial services using proportional hazard models,

Eur. J. Oper. Res. 157 (2004) 196–217. Smooth and Nonsmooth Optimization.

[20] N. J. Radcliffe, R. Simpson, Identifying who can be saved and who will be driven away by retention activity,

Journal of Telecommunications Management 1 (2008).

[21] L. Guelman, M. Guillen, A. M. Perez-Marin, Random forests for uplift modeling: An insurance customer

retention case, in: K. J. Engemann, A. M. Gil-Lafuente, J. Merigo (Eds.), Modeling and Simulation in

Engineering, Economics and Management, volume 115 of Lecture Notes in Business Information Processing,

Springer Berlin Heidelberg, 2012, pp. 123–133. URL: http://dx.doi.org/10.1007/978-3-642-30433-0_13.

doi:10.1007/978-3-642-30433-0_13.

[22] W. Verbeke, K. Dejaeger, D. Martens, J. Hur, B. Baesens, New insights into churn prediction in the telecommunication

sector: A profit driven data mining approach, Eur. J. Oper. Res. 218 (2012) 211–229.

[23] T. Verbraken, C. Bravo, R. Weber, B. Baesens, Development and application of consumer credit scoring models

using profit-based classification measures, Eur. J. Oper. Res. 238 (2014) 505–513.

[24] F. Garrido, W. Verbeke, C. Bravo, A robust profit measure for binary classification model evaluation, Expert

Syst. Appl. 92 (2018) 154–160.


[25] B. Baesens, Analytics in a big data world: The essential guide to data science and its applications, John Wiley

& Sons, 2014.

[26] W. Verbeke, D. Martens, B. Baesens, Social network analysis for customer churn prediction, Appl. Soft Comput.

14 (2014) 431–446.

[27] S. A. Neslin, S. Gupta, W. Kamakura, J. Lu, C. H. Mason, Defection detection: Measuring and understanding

the predictive accuracy of customer churn models, Journal of Marketing Research 43 (2006) 204–211.

[28] J. Burez, D. V. den Poel, Handling class imbalance in customer churn prediction, Expert Syst. Appl. 36 (2009)

4626–4636.

[29] P. Datta, B. Masand, D. R. Mani, B. Li, Automated cellular modeling and prediction on a large scale, Artificial

Intelligence Review 14 (2000) 485–502.

[30] C.-P. Wei, I.-T. Chiu, Turning telecommunications call details to churn prediction: a data mining approach,

Expert Syst. Appl. 23 (2002) 103–112.

[31] E. Lima, C. Mues, B. Baesens, Domain knowledge integration in data mining using decision tables: case studies

in churn prediction, J Oper Res Soc 60 (2009) 1096–1106.

[32] A. Lemmens, C. Croux, Bagging and boosting classification trees to predict churn, Journal of Marketing

Research 43 (2006) 276–286.

[33] S. Lessmann, S. Voß, A reference model for customer-centric data mining with support vector machines, Eur.

J. Oper. Res. 199 (2009) 520–530.

[34] Z.-Y. Chen, Z.-P. Fan, M. Sun, A hierarchical multiple kernel support vector machine for customer churn

prediction using longitudinal behavioral data, Eur. J. Oper. Res. 223 (2012) 461–472.

[35] J. Moeyersoms, D. Martens, Including high-cardinality attributes in predictive models: A case study in churn

prediction in the energy sector, Decis Support Syst 72 (2015) 72–81.

[36] W.-H. Au, K. C. C. Chan, X. Yao, A novel evolutionary data mining algorithm with applications to churn

prediction, IEEE Trans. Evol. Comput. 7 (2003) 532–545.

[37] S.-Y. Hung, D. C. Yen, H.-Y. Wang, Applying data mining to telecom churn management, Expert Syst. Appl.

31 (2006) 515–524.

[38] K. Dasgupta, R. Singh, B. Viswanathan, D. Chakraborty, S. Mukherjea, A. Nanavati, A. Joshi, Social ties and

their relevance to churn in mobile telecom networks, in: Proceedings of the 11th international conference on

Extending Database Technology: Advances in database technology, EDBT ’08, 2008, pp. 697–711.

[39] B. Baesens, T. Van Gestel, M. Stepanova, D. Van den Poel, J. Vanthienen, Neural network survival analysis for

personal loan data, J Oper Res Soc 56 (2005) 1089–1098.

[40] A. Backiel, B. Baesens, G. Claeskens, Predicting time-to-churn of prepaid mobile telephone customers using

social network analysis, J Oper Res Soc 67 (2016) 0.

[41] A. Keramati, R. Jafari-Marandi, M. Aliannejadi, I. Ahmadian, M. Mozaffari, U. Abbasi, Improved churn

prediction in telecommunication industry using data mining techniques, Appl. Soft Comput. 24 (2014) 994–1012.


[42] A. Amin, S. Anwar, A. Adnan, M. Nawaz, K. Alawfi, A. Hussain, K. Huang, Customer churn prediction in the

telecommunication sector using a rough set approach, Neurocomputing 237 (2017) 242–254.

[43] K. Coussement, S. Lessmann, G. Verstraeten, A comparative analysis of data preparation algorithms for customer

churn prediction: A case study in the telecommunication industry, Decis Support Syst 95 (2017) 27–36.

[44] B. Zhu, B. Baesens, A. Backiel, S. K. L. M. vanden Broucke, Benchmarking sampling techniques for imbalance

learning in churn prediction, J Oper Res Soc 69 (2018) 49–65.

[45] N. Radcliffe, Generating incremental sales: Maximizing the incremental impact of cross-selling, up-selling and

deep-selling through uplift modelling, Stochastic Solutions Limited (2007).

[46] L. Lai, Influential Marketing: A New Direct Marketing Strategy Addressing the Existence

of Voluntary Buyers, Canadian theses on microfiche, Simon Fraser University (Canada), 2006. URL:

https://books.google.be/books?id=5EvSuAAACAAJ.

[47] K. Kane, V. S. Y. Lo, J. Zheng, True-lift modeling: Comparison of methods, J Market Analytics 2 (2014)

218–238.

[48] V. S. Y. Lo, The true lift model: A novel data mining approach to response modeling in database marketing,

SIGKDD Explor. Newsl. 4 (2002) 78–86.

[49] K. Larsen, Generalized naive Bayes classifiers, SIGKDD Explor. Newsl. 7 (2005) 76–81.

[50] D. M. Chickering, D. Heckerman, A decision theoretic approach to targeted advertising, in: Proceedings of the

Sixteenth Conference on Uncertainty in Artificial Intelligence, UAI’00, Morgan Kaufmann Publishers Inc., San

Francisco, CA, USA, 2000, pp. 82–88. URL: http://dl.acm.org/citation.cfm?id=2073946.2073957.

[51] B. Hansotia, B. Rukstales, Incremental value modeling, Journal of Interactive Marketing 16 (2001) 35–46.

[52] N. J. Radcliffe, Using control groups to target on predicted lift: Building and assessing uplift models, Direct

Market J Direct Market Assoc Anal Council 1 (2007) 14–21.

[53] N. J. Radcliffe, P. D. Surry, Real-world uplift modelling with significance-based uplift trees, White Paper

TR-2011-1, Stochastic Solutions (2011).

[54] P. Rzepakowski, S. Jaroszewicz, Decision trees for uplift modeling with single and multiple treatments, Knowl

Inf Syst 32 (2012) 303–327.

[55] A. Shaar, T. Abdessalem, O. Segard, Pessimistic uplift modeling, ACM SIGKDD (2016).

[56] L. Breiman, J. Friedman, C. J. Stone, R. A. Olshen, Classification and regression trees, CRC press, 1984.

[57] G. V. Kass, An exploratory technique for investigating large quantities of categorical data, Applied statistics

(1980) 119–127.

[58] L. Guelman, M. Guillen, A. M. Perez-Marin, Optimal personalized treatment rules for marketing interventions:

A review of methods, a new proposal, and an insurance case study, Working Papers 2014-06, Universitat de

Barcelona, UB Riskcenter, 2014. URL: http://ideas.repec.org/p/bak/wpaper/201406.html.

[59] B. Hansotia, B. Rukstales, Direct marketing for multichannel retailers: Issues, challenges and solutions, Journal

of Database Marketing 9 (2002) 259–266.


[60] T. Verbraken, W. Verbeke, B. Baesens, A novel profit maximizing metric for measuring classification performance

of customer churn prediction models, IEEE Trans Knowl Data Eng 25 (2013) 961–973.

[61] T. Verbraken, W. Verbeke, B. Baesens, Profit optimizing customer churn prediction with Bayesian network

classifiers, Intell. Data Anal. 18 (2014) 3–24.

[62] K. Dejaeger, W. Verbeke, D. Martens, B. Baesens, Data mining techniques for software effort estimation: A

comparative study, IEEE Trans. Softw. Eng. 38 (2012) 375–397.

[63] F. Devriendt, W. Verbeke, A literature survey and experimental evaluation of the state-of-the-art in uplift

modeling: a stepping stone towards the development of prescriptive analytics, Big Data (2018). Submitted in

December 2017.

[64] R Core Team, R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing,

Vienna, Austria, 2013. URL: http://www.R-project.org/.

[65] J. Burez, D. Van den Poel, CRM at a pay-TV company: Using analytical models to reduce customer attrition

by targeted marketing for subscription services, Expert Syst. Appl. 32 (2007) 277–288.
