FACULTEIT ECONOMISCHE EN SOCIALE WETENSCHAPPEN & SOLVAY BUSINESS SCHOOL
ES-Working Paper no. 12 THE CASE FOR PRESCRIPTIVE ANALYTICS: A NOVEL MAXIMUM PROFIT MEASURE FOR EVALUATING AND COMPARING CUSTOMER CHURN PREDICTION AND UPLIFT MODELS
Floris Devriendt and Wouter Verbeke
April 30th, 2018 Vrije Universiteit Brussel – Pleinlaan 2, 1050 Brussel – www.vub.be – [email protected] © Vrije Universiteit Brussel
This text may be downloaded for personal research purposes only. Any additional reproduction for other purposes, whether in hard copy or electronically, requires the consent of the author(s), editor(s). If cited or quoted, reference should be made to the full name of the author(s), editor(s), title, the working paper or other series, the year and the publisher. Printed in Belgium Vrije Universiteit Brussel Faculty of Economics, Social Sciences and Solvay Business School B-1050 Brussel Belgium www.vub.be
The case for prescriptive analytics: a novel maximum profit measure for evaluating and comparing customer churn prediction and uplift models
Floris Devriendta,*, Wouter Verbekea
aData Analytics Laboratory, Faculty of Economic and Social Sciences and Solvay Business School, Vrije Universiteit Brussel, Pleinlaan 2, B-1050 Brussels, Belgium
Abstract
Prescriptive analytics and uplift modeling are receiving more attention from the business analyt-
ics research community and from industry as an alternative and improved paradigm of predictive
analytics that supports data-driven decision making. Although it has been shown in theory that
prescriptive analytics improves decision-making more than predictive analytics, no empirical evi-
dence has been presented in the literature on an elaborated application of both approaches that
allows for a fair comparison of predictive and uplift modeling. Such a comparison is in fact prohib-
ited by a lack of available evaluation measures that can be applied to predictive and uplift models.
Therefore, in this paper, we introduce a novel evaluation metric, called the maximum profit uplift
measure, that assesses the performance of an uplift model in terms of the maximum potential
profit that can be achieved by adopting it. The measure is developed for evaluating customer
churn uplift models by extending the existing maximum profit measure for evaluating customer
churn prediction models. Both measures are subsequently applied to a case
study to assess and compare the performance of customer churn prediction and uplift models. We
find that uplift modeling outperforms predictive modeling and allows one to enhance the profitabil-
ity of retention campaigns. The empirical results indicate that prescriptive analytics are superior
to predictive analytics in the development of customer retention campaigns.
Keywords: Analytics, Business applications, Prescriptive analytics, Uplift modeling, Customer
churn prediction, Customer retention
*Corresponding author. Email addresses: [email protected] (Floris Devriendt), [email protected] (Wouter Verbeke)
Preprint submitted to European Journal of Information Sciences April 9, 2018
1. Introduction
The term business analytics is used as a catch-all term covering a wide variety of what essentially
are data-processing techniques. In its broadest sense, business analytics strongly overlaps with data
science, statistics, and related fields such as artificial intelligence (AI) and machine learning [1].
Analytics is used as a toolbox containing a variety of instruments and methodologies allowing one
to analyze data in support of evidence-based decision-making, with the aim of enhancing efficiency,
efficacy, and, thus, ultimately, profitability. In increasing order of sophistication, the types of
analytical tools are descriptive, predictive, and prescriptive analytics. While descriptive analytics
offer insight into current situations, predictive analytics allow one to explain complex relations
between variables and to predict future trends. As such, predictive analytics offer more uses than
descriptive analytics.
Currently, prescriptive analytics are receiving more attention from practitioners and scientists,
as they add further value by allowing one to simulate the future as a function of control variables
and thus to prescribe optimal settings for those variables. At the core of prescriptive analytics
is uplift modeling, which is introduced below. In the experiments reported in this article, the use
and performance of predictive and prescriptive analytics are thoroughly compared. Business analytics
is being applied to an increasingly diverse range of well-specified tasks across a broad variety of
industries. Popular examples include tasks related to credit scoring [2, 3], fraud detection [4], and
customer churn prediction [5, 6], the latter being the application of interest in this article.
Customer churn prediction models are designed to predict which customers are about to churn
and to accurately segment a customer base. This allows a company to target customers that are
most likely to churn during a retention marketing campaign, thus improving the efficient use of
limited resources for such a campaign, i.e., the return on marketing investment (ROMI), while
reducing costs associated with churning [7]. Generally speaking, customer retention is profitable
to a company because (1) attracting new clients costs five to six times more than retaining exist-
ing customers [8–11]; (2) long-term customers generate more profits, tend to be less sensitive to
competitive marketing activities, tend to be less costly to serve, and may generate new referrals
through positive word-of-mouth processes, whereas dissatisfied customers might spread negative
word-of-mouth messages [12–17]; and (3) losing customers incurs opportunity costs due to a reduc-
tion in sales [18]. Therefore, a small improvement in customer retention can lead to a significant
increase in profits [19].
However, it has been reported that marketing actions undertaken to retain customers may
actually provoke the opposite behavior and may cause or motivate a customer to churn. As noted
in Radcliffe and Simpson [20], churn risk is highly correlated with customer dissatisfaction, and
the goal in turn becomes to prevent a dissatisfied customer from actually leaving. Any attempt
made to contact a dissatisfied customer with the goal of retaining him or her can actually hasten
the process and provoke the customer to leave earlier than expected [20]. Therefore, it is necessary
to evaluate the effectiveness of a retention campaign at the individual customer level. Predictive
models fail to differentiate between customers who respond favorably (i.e., who do not churn) as a
result of a campaign and customers who respond favorably of their own accord regardless of a campaign
(i.e., who would not have churned in any case, even if they had not been targeted by a campaign).
To address this shortcoming of predictive models, uplift modeling has recently been proposed
as an alternative means of identifying customers who are likely to be persuaded by a promotional
marketing campaign, rather than predicting whether customers are likely to respond to a promo-
tional marketing campaign (which may or may not be the result of the campaign). Uplift modeling
can be applied to identify customers who are likely to be retained through a retention campaign
as an alternative to predicting whether customers are likely to churn [21]. More precisely, uplift
modeling aims at establishing the net difference in customer behavior resulting from a specific
treatment afforded to customers, e.g., a reduction in the likelihood to churn when targeted by a
retention campaign.
In this paper we aim to contrast customer churn prediction (CCP) and customer churn uplift
(CCU) modeling for customer retention by comparing their performance when applied to an ex-
perimental case study of the financial industry. To compare the performance of these approaches,
a common evaluation procedure is applied. However, given the different forms of output that these
models produce, different performance measures are used to evaluate prediction and uplift models.
In evaluating classification models and, more specifically, CCP models, the receiver operating
characteristic (ROC) curve or the lift curve is typically used. Performance can be expressed as the
area under the ROC curve, as the top-decile lift, or as the (expected) maximum profit. In evaluating
uplift models, the Qini curve and uplift per decile plots are typically used. Performance is typically
reported in terms of the Qini index or top decile uplift. As the goal of customer churn modeling is
to maximize ROMI, in Verbeke et al. [22], the authors introduce the maximum profit (MP) measure
for evaluating CCP models. The MP measure calculates the profit generated when including the
optimal fraction of top-ranked customers, according to the CCP model, in a retention campaign. The
MP measure allows one to determine the optimal model and fraction of customers to include, yield-
ing a significant increase in profitability relative to that achieved when using statistical measures
[22–24].
In this article, we extend the MP measure to evaluate the performance of CCU models, and we
introduce the maximum profit for uplift (MPU) measure. Both the MP and MPU measures are then
used to compare the performance of CCP and CCU logistic regression and random forest models
through an experimental case study. Our main contributions are threefold:
1. We introduce an application of uplift modeling for customer retention.
2. We extend the maximum profit measure for evaluating uplift models.
3. We apply and compare CCP and CCU models through an experimental case study of the financial
industry.
This paper is structured as follows. In Section 2, we first introduce customer churn prediction
modeling before discussing uplift modeling as an alternative approach to predictive modeling. Then
in Section 3, the MP measure for CCP models is defined and extended for application to customer
churn uplift models. In Section 4, we describe the experimental design of the case study and then
discuss the results of our experiments. Finally, in Section 5, conclusions are given.
2. Literature
In Section 2.1, customer churn prediction is introduced along with current standard approaches
as described in the literature and adopted in industry. Then in Section 2.2, we describe uplift
modeling and discuss the most prominent uplift modeling techniques and performance measures
developed for evaluating uplift models.
2.1. Customer Churn Prediction
Customer churning, which is also referred to as customer attrition or customer defection, is
defined as the loss or outflow of customers from the customer base [25]. In saturated markets,
there are limited opportunities to attract new customers, and hence, retaining existing customers is
considered essential to maintaining profitability. In the telecommunications industry, it is estimated
that attracting a new customer costs five to six times more than retaining an existing customer
[8, 22, 26]. Established customers are more profitable due to the lower costs required to serve them,
and a sense of brand loyalty they have developed over time renders them less likely to churn. Loyal
customers tend to be satisfied customers who also serve as word-of-mouth advertisers, referring
new customers to a given company. In the context of a financial institution as described in the
case study given in Section 4, a definition of churning is naturally present in the data, i.e., contract
termination.
Churning is typically addressed by developing a prediction model, i.e., a classification model
such as a logistic regression or a decision tree model. Such a model estimates for each customer
the probability of churning during a subsequent period of time. It is then straightforward
to offer the customers presenting the highest churn probability an incentive, e.g., a discount or
another promotional offer, to encourage them to extend their contracts or to keep their accounts
active. In other words, customers who are susceptible to churn can be targeted through a retention
campaign. Accurate predictions are perhaps the most apparent goal of developing a customer churn
prediction model, but determining reasons for (or at least indicators of) churning is also invaluable
to a company. Comprehensible models can offer novel insight into correlations between customer
behavior and the propensity to churn [7], allowing management teams to address factors leading to
churning and to target the customers before they decide to churn.
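The targeting logic described above can be sketched as follows. All elements here are hypothetical: the customer features, the model coefficients, and the selection fraction are made up for illustration, and in practice the coefficients would be fitted, e.g., by logistic regression on historical churn data.

```python
# A minimal sketch: score customers with a logistic model, rank by estimated
# churn probability, and select the top fraction for a retention campaign.
import math

def churn_probability(features, weights, bias):
    """Logistic response: P(churn) = 1 / (1 + exp(-(w.x + b)))."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))

def select_top_fraction(customers, weights, bias, fraction):
    """Rank customers by predicted churn probability; return the top fraction."""
    scored = [(cid, churn_probability(x, weights, bias)) for cid, x in customers]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    cutoff = max(1, int(round(fraction * len(scored))))
    return scored[:cutoff]

# Hypothetical customer base: (id, [tenure_years, complaints, monthly_usage]).
customers = [
    ("c1", [0.5, 3, 0.2]),
    ("c2", [6.0, 0, 0.9]),
    ("c3", [1.0, 2, 0.4]),
    ("c4", [8.0, 0, 1.0]),
]
weights, bias = [-0.6, 0.8, -1.2], 0.1   # illustrative coefficients only
targeted = select_top_fraction(customers, weights, bias, fraction=0.5)
print([cid for cid, _ in targeted])
```

Here short tenure and many complaints push the churn score up, so the campaign targets the two newest, most dissatisfied customers.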
Numerous classification techniques have been adopted for churn prediction, including traditional
statistical methods, such as logistic regression [27, 28], and non-parametric statistical models, such
as k-nearest neighbor models [29], decision trees [30, 31], ensemble methods [5, 32], support vector
machines [33–35] and neural networks [22, 36, 37]. Additionally, social network analysis has been
successfully adopted to predict customer churning [6, 26, 38] in addition to survival analysis, which
can be used to estimate the timing of customer churning. These analyses focus on the profitability
of a customer’s lifetime rather than on a single moment in time [39, 40]. For an extensive literature
review on customer churn prediction modeling, one may refer to Verbeke et al. [7]. The results of
an extensive benchmarking experiment are reported in Verbeke et al. [22], confirming the no-free-
lunch theorem in application to customer churn prediction, with no modeling technique consistently
winning across the various datasets. Recent work on customer churn prediction is covered in [6, 41–
44].
2.2. Uplift Modeling
In Section 2.2.1, a brief introduction of uplift modeling is provided. In Section 2.2.2, an overview
of the most prominent uplift modeling techniques is presented. Finally, in Section 2.2.3, evaluation
measures for assessing the performance of uplift models are discussed.
2.2.1. Definition
Generally speaking, uplift modeling aims to establish the net effect of a treatment on an
outcome. When adopted for customer relationship management and, more specifically, for response
modeling, uplift models are developed to differentiate between customers who respond favorably
as a result of being targeted with a campaign, i.e., being treated, and customers who respond
favorably of their own accord regardless of whether they are targeted with a campaign. Note that the
outcome, i.e., response, may mean that a customer begins or continues to purchase a product or
service in the case of acquisition and retention modeling, respectively, or that a customer purchases
more or additional products or services in the case of up-sell or cross-sell modeling, respectively.
Conceptually, a customer base can be divided into four categories along two dimensions, as
shown in Figure 1 [1, 45]:
1. Sure Things. Customers who would always respond. Targeting Sure Things does not generate
additional returns but does generate additional costs, i.e., the fixed cost of contacting a customer
and possibly a cost related to a financial incentive offered to targeted customers.
2. Lost Causes. Customers who would never respond, regardless of which campaign is used. Lost
Causes will not generate additional revenues, yet they do generate additional costs, although
these are lower than the costs of Sure Things: Lost Causes do not take advantage of the financial
incentives offered, which are an additional cost that we do take into account for Sure Things.
3. Do-Not-Disturbs. Customers who do not respond precisely because they are exposed to a
campaign. They will respond when not targeted but will not respond when they are. For example,
populations targeted by retention efforts can react adversely, for instance by withdrawing from
the delivered product or service. Including Do-Not-Disturbs in a campaign thus generates no
additional revenues but comes with considerable additional costs.
4. Persuadables. Customers who respond only because they have been exposed to a campaign.
They respond only when contacted and cause a campaign to generate additional revenues and,
as such, a net profit after subtracting the costs generated by including the other types of
customers.
Figure 1: The four theoretical classes.
The aim of uplift modeling is to allow for the targeting of Persuadables while avoiding Do-Not-
Disturbs. From the perspective of a retention campaign, the Do-Not-Disturb category is sometimes
referred to as sleeping dogs since, as long as these customers are not disturbed, they will continue
to provide benefits. Note that this classification is campaign dependent. It is possible for a
customer to be a Lost Cause when a campaign offers a 5% discount on a next purchase, whereas that
same customer is a Persuadable when a campaign offers a 20% discount. In other words, the
classification depends on the treatment, assuming that all customers are treated similarly. In
general, uplift modeling involves determining optimal settings for control variables, such as a
dummy treatment variable denoting whether a customer is targeted with a campaign, to optimize a
result or effect.
Although in most, not to say all, studies on uplift modeling for marketing applications, control
variables are typically dummy variables that indicate whether a customer is targeted or not, these
control variables may also be continuous or multivalued categorical variables, e.g., the discount
level or the contact channel. Clearly, uplift modeling may be applied in various settings and for
many different purposes. In this article, we focus on the goal of customer retention.
Uplift modeling for customer retention has been documented in relatively few cases. Radcliffe
and Simpson [20] applied uplift modeling to two retention campaigns in telecommunications. One
campaign was highly effective and profitable, whereas the other was counter-productive and incurred
losses. However, uplift modeling improved both campaigns in terms of reducing churn. In Guelman
et al. [21], the authors applied uplift modeling to an insurance setting.
Although the treatment had an almost neutral impact on retention over the entire sample, they found
that the impact of the treatment might have been different for specific subgroups of the customer
base. They reported that uplift modeling allowed them to predict the expected change in the
probability of a customer switching to another company when targeted by a campaign. To the best of our
knowledge, no cases presented in the literature report on the application of uplift modeling to the
context of a financial institution and to churning in reference to financial services.
We assume that a sample of customers is randomly divided into two groups defined as the
treatment group and control group. A customer is either in the treatment group, i.e., is influenced
by the campaign, or in the control group, i.e., is not influenced by the campaign. As a formal
definition, let X be a vector of inputs or predictor variables, X = {X1, ..., Xn}, and let Y be the
binary outcome variable, Y ∈ {0, 1}, indicating whether a customer responds favorably or not. Let
the treatment variable T denote whether a customer belongs to the treatment group, T = 1, or to the
control group, T = 0.
P denotes the probability as estimated by the model. Uplift is then defined for customer i with
characteristics xi as the probability of responding favorably (i.e., yi = 1) when treated (i.e., for
ti = 1) minus the probability of responding favorably when not treated (i.e., for ti = 0):
U(xi) := P(yi = 1 | xi; ti = 1) − P(yi = 1 | xi; ti = 0)    (1)
In essence, uplift is the difference in outcome, e.g., in customer behavior, resulting from a treatment.
Uplift modeling aims at estimating uplift as a function of treatment and customer characteristics.
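Under a randomized assignment of customers to the treatment and control groups, Equation (1) can be estimated for a group of comparable customers as a simple difference in observed response rates. A minimal sketch, with made-up (y, t) records:

```python
# Estimate uplift for one segment as the treated response rate minus the
# untreated (control) response rate; the (y, t) records below are illustrative.

def response_rate(records, treated):
    """Favorable-response rate P(y=1 | t=treated) within a list of (y, t) records."""
    group = [y for y, t in records if t == treated]
    return sum(group) / len(group)

def estimated_uplift(records):
    """U = P(y=1 | t=1) - P(y=1 | t=0)."""
    return response_rate(records, 1) - response_rate(records, 0)

# (y, t) pairs for one customer segment; y=1 denotes a favorable response.
segment = [(1, 1), (1, 1), (0, 1), (1, 1),   # treated: 3 of 4 respond
           (1, 0), (0, 0), (0, 0), (0, 0)]   # control: 1 of 4 responds
print(estimated_uplift(segment))  # 0.75 - 0.25 = 0.5
```

Randomization is what licenses this comparison: it makes the treated and untreated members of the segment exchangeable, so the rate difference estimates the causal effect.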
2.2.2. Techniques
Uplift modeling techniques can be grouped into data preprocessing and data processing ap-
proaches. The first group adopts traditional predictive analytics in an adapted setup for learning
an uplift model, whereas the second group applies adapted predictive analytics in developing uplift
models. Table 1 shows the most prominent and frequently adopted approaches to uplift modeling.
Data preprocessing approaches. Data preprocessing approaches include transformation approaches,
which redefine a target variable, and approaches that allow one to estimate uplift by defining and
Preprocessing
Transformation [46, 47]
Variable Selection Procedure [48, 49]
Data processing
Two-Model Approach [50, 51]
Direct Estimation [52–54]
Table 1: Most frequently cited uplift modeling approaches.
selecting additional predictor variables.
The first group of data preprocessing approaches defines a transformed target variable that is
estimated. A customer cannot be assigned to any of the four groups shown in Figure 1, as this
information is unavailable and cannot be retrieved. However, we do know whether a customer
formed part of the treatment or control group and whether a customer responded or not. Hence,
customers can be assigned to any of the following four groups: treatment responders, treatment
non-responders, control responders and control non-responders. Techniques such as Lai’s approach
[46, 47] and pessimistic uplift modeling [55] make use of these four groups to define a transformed
target variable and as such transform the uplift modeling problem into a binary classification
problem. Any standard classification technique can be applied to this problem to yield an uplift
model.
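As an illustration of the transformation idea, one common variant of such a transformed target labels treatment responders and control non-responders as the positive class; whether this matches any specific published definition aside, it conveys the mechanism. The observations below are hypothetical.

```python
# Sketch of a transformed-target variable: z = 1 for treatment responders
# (y=1, t=1) and control non-responders (y=0, t=0), z = 0 otherwise, so a
# standard binary classifier can be trained on z. Data are illustrative.

def transformed_target(y, t):
    """z = 1 exactly when the response label matches the treatment flag."""
    return 1 if y == t else 0

# (y, t) observations -> transformed labels z for a standard classifier.
observations = [(1, 1), (0, 1), (1, 0), (0, 0)]
z = [transformed_target(y, t) for y, t in observations]
print(z)  # [1, 0, 0, 1]
```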
The second group of data preprocessing approaches extends the set of predictor variables of the
model to allow for the estimation of uplift. In Kane et al. [47] and Lo [48], an uplift modeling
approach that pools the treatment and control groups into a single sample for response model
estimation is proposed. A dummy variable is introduced to denote the group of origin for each customer.
A model is then developed from the original predictor variables, the added dummy variable and
interaction variables between the predictor and dummy variables. Subsequently, any predictive
modeling approach can be adopted with this setup, yielding an uplift model.
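A minimal sketch of this variable-extension setup, with two predictors and hypothetical (unfitted) coefficients for the dummy and interaction terms: the uplift score is the difference between the model's predictions with the treatment dummy switched to 1 and to 0.

```python
# Pool treatment and control, add a treatment dummy t and interactions x*t,
# fit one response model, and score uplift as P(y=1|x,t=1) - P(y=1|x,t=0).
# The coefficients below are hypothetical, not fitted.
import math

def design_row(x, t):
    """Original predictors, treatment dummy, and predictor-dummy interactions."""
    return x + [t] + [xi * t for xi in x]

def predict(row, weights, bias):
    z = bias + sum(w * v for w, v in zip(weights, row))
    return 1.0 / (1.0 + math.exp(-z))

def uplift_score(x, weights, bias):
    """U(x) = P(y=1 | x, t=1) - P(y=1 | x, t=0)."""
    return (predict(design_row(x, 1), weights, bias)
            - predict(design_row(x, 0), weights, bias))

x = [0.4, 1.0]                        # two predictors for one customer
weights = [0.2, -0.5, 0.3, 0.8, 0.1]  # for [x1, x2, t, x1*t, x2*t]; illustrative
print(round(uplift_score(x, weights, bias=-0.2), 4))
```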
Data processing approaches. Among the data processing approaches, a further distinction can
be made between indirect and direct estimation approaches.
Indirect estimation approaches include the two-model or naive approach, which is a simple and
intuitive approach to uplift modeling. Two separate predictive models are built: one for
the treatment group, MT, and one for the control group, MC, with both estimating the probability
of a favorable response. The aggregated uplift model, MU, then subtracts the response probabilities
resulting from both models to find the uplift:
MU = MT − MC.    (2)
This approach has the benefit of being straightforward to implement, and similar to both data
preprocessing approaches, it allows one to adopt standard predictive modeling techniques. However,
the approach appears to apply only to the simplest of cases [50, 51]. The main disadvantage is that
the two models are built independently of one another; as such, they are not necessarily aligned
in terms of the predictor variables included, and the errors of the independent estimates can
reinforce one another, generating significant errors in the uplift estimates [53].
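The two-model approach of Equation (2) can be sketched as follows; for brevity the two "models" are simple per-segment response-rate lookups rather than fitted classifiers, and the data are toy values.

```python
# Two-model (naive) uplift: fit one response model on the treatment group and
# one on the control group, then score uplift as M_T(x) - M_C(x).

def fit_rate_model(records):
    """Return a dict mapping segment -> observed response rate."""
    totals, hits = {}, {}
    for segment, y in records:
        totals[segment] = totals.get(segment, 0) + 1
        hits[segment] = hits.get(segment, 0) + y
    return {s: hits[s] / totals[s] for s in totals}

treatment = [("young", 1), ("young", 1), ("old", 0), ("old", 1)]
control   = [("young", 0), ("young", 1), ("old", 1), ("old", 1)]

m_t = fit_rate_model(treatment)              # M_T
m_c = fit_rate_model(control)                # M_C
uplift = {s: m_t[s] - m_c[s] for s in m_t}   # M_U = M_T - M_C
print(uplift)
```

Even this toy example shows the signed nature of uplift: one segment benefits from treatment while the other reacts adversely, which a single response model would not reveal.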
Alternatively, uplift can be modeled directly. Given the group-based nature of the uplift
modeling problem, the most frequently adopted direct estimation approaches are tree-based methods
that recursively split the population into smaller segments. Uplift tree approaches are adapted
from well-known algorithms such as classification and regression trees (CART) [56] or chi-square
automatic interaction detection (CHAID) [57], applying modified splitting criteria and pruning
approaches. Examples of tree-based uplift modeling approaches include the significance-based
uplift trees proposed in Radcliffe and Surry [53], the decision trees making use of information
theory-inspired splitting criteria presented in Rzepakowski and Jaroszewicz [54], and the uplift
random forests and causal conditional trees introduced in Guelman et al. [58].
2.2.3. Evaluation
Despite its clear potential to improve upon predictive modeling outcomes, uplift modeling suffers
from a lack of intuitive evaluation measures for assessing the performance of a model, either in an
absolute sense or relative to other models. In the literature on uplift modeling, either charts are
used [48, 51] or an adapted version of the Gini coefficient, i.e., the Qini coefficient, is used [47, 52].
In predictive modeling, evaluation metrics typically assess the error of the point-wise estimates
made by a model on each observation of a hold-out test set by comparing predicted and actual
outcomes and by summarizing the observed errors. However, in uplift modeling, the actual outcome
estimated, i.e., uplift, is unobserved. As a customer cannot occupy both the treatment and control
group, i.e., cannot be treated and not treated simultaneously, uplift (or, as indicated above, the
group shown in Figure 1 to which a customer belongs) cannot be observed for an individual customer.
Therefore, the evaluation measures adopted in predictive modeling cannot be used. Instead, uplift
can be observed and uplift estimates can be evaluated by comparing differences in the behaviors of
equivalent subgroups of the treatment and control groups [53].
The performance of an uplift model can be visualized by plotting the cumulative difference in
response rates between the treatment and control groups as a function of the selected fraction x of
customers ranked by the uplift model from high to low estimated uplift. This curve is referred to
as the cumulative uplift curve, the cumulative incremental gains curve, or the Qini curve [52].
The cumulative difference in the response rate is measured as the absolute or relative number of
additional favorable responders, i.e., expressed respectively as the absolute number of additional
favorable responders or as a fraction of the total population. Note that performance is evaluated
by comparing groups of observations rather than individual observations. An example is provided
in Figure 2.
Figure 2: Incremental gains or Qini curve.
The Qini metric is a measure related to the Qini curve. It measures the area between the Qini
curve of the uplift model and the Qini curve of the baseline random model (see Figure 2). The
measure is an adapted version of the Gini metric, which in turn is related to the Gini curve (or the
cumulative gains curve) [52].
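The construction of the cumulative incremental gains curve can be sketched as follows, assuming customers are already ranked from high to low estimated uplift and scaling the control responders to the size of the treatment group at each cutoff; the data are illustrative.

```python
# Cumulative incremental gains (Qini) curve: incremental responders gained after
# targeting the top k customers, with the control count rescaled to the size of
# the treatment group seen so far. Input is (y, t) pairs in model-score order.

def qini_curve(ranked):
    """Return [(k, incremental responders)] for k = 0 .. len(ranked)."""
    points, resp_t, resp_c, n_t, n_c = [(0, 0.0)], 0, 0, 0, 0
    for k, (y, t) in enumerate(ranked, start=1):
        if t == 1:
            n_t += 1
            resp_t += y
        else:
            n_c += 1
            resp_c += y
        # Scale control responders to the treatment-group size seen so far.
        incremental = resp_t - resp_c * (n_t / n_c) if n_c else float(resp_t)
        points.append((k, incremental))
    return points

ranked = [(1, 1), (0, 0), (1, 1), (0, 0), (0, 1), (1, 0)]  # (response, treatment)
curve = qini_curve(ranked)
print(curve[-1])  # overall incremental responders when 100% are targeted
```

The Qini coefficient would then be the area between this curve and the straight line from (0, 0) to the curve's endpoint, i.e., the gain over random targeting.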
Although uplift models are developed and adopted to enhance the efficiency and returns of
retention campaigns, few articles assess the costs and benefits of applying uplift modeling. In
Hansotia and Rukstales [59], the authors compute the incremental return on investment at the
gross margin level. These gross profits are then considered as a contribution to the overhead and to
net profits [59]. In Radcliffe [52], the incremental profit is calculated by multiplying the incremental
response rate by the total profit. In the next section, we analyze the costs and benefits involved and
develop a profit-driven approach to evaluating customer churn uplift models.
3. Maximum Profit Measure
The first part of this section discusses the Maximum Profit measure, as introduced in Verbeke
et al. [22]. In the second part, we extend the Maximum Profit measure for evaluating customer
churn uplift models to compare customer churn prediction and uplift models in Section 4.
3.1. Customer churn prediction models
To maximize the efficiency and returns of a retention campaign, typically only a limited fraction of
customers is targeted and given an incentive to remain loyal. Therefore, customer churn prediction
models are often evaluated using, for instance, the top-decile lift measure, which only accounts
for the performance of the model regarding the top 10% of customers with the highest predicted
probabilities of churning. Recently, Verbeke et al. [22] demonstrated that from a profit-centric
point of view, using the top decile lift can be expected to result in sub-optimal model selection.
Instead, the maximum profit (MP) measure is proposed, which calculates the profit generated when
including the optimal fraction of top-ranked customers, according to the model, in a retention
campaign. In essence, this measure evaluates a customer churn prediction model at the cutoff leading
to the maximum profit rather than at an arbitrary cutoff such as 10%. Performance is expressed as the
profit in monetary units that can be achieved by adopting the model for selecting customers to be
targeted in a retention campaign. This, as shown by the authors, can yield a significant increase
in profits relative to adopting statistical measures and to selecting a fixed fraction of customers to
be targeted in an arbitrary or expert-based manner [22].
To calculate profits generated from a retention campaign, we analyze the dynamic process of
customer flows in a company (Figure 3). The process involves customers entering by subscribing
to the services of an operator and then leaving by churning. To prevent customers from churning,
retention campaigns can be established with the goal of retaining customers.
A customer churn prediction model allows one to rank customers based on their probability of
churning from high to low. This subsequently allows one to select and target the customers with the
highest probability of churning in a campaign. The profits of a retention campaign can then be
formulated as [27]:
Π = Nα[βγ(b − ccontact − cincentive) + β(1 − γ)(−ccontact)
+ (1 − β)(−ccontact − cincentive)]
− A    (3)
with Π denoting the profit generated by the campaign, N denoting the number of customers included
in the customer base, α denoting the fraction of the customer base targeted by the retention
campaign and offered an incentive to remain loyal, β denoting the fraction of true would-be churners
among the customers targeted by the retention campaign, γ denoting the fraction of targeted would-be
churners deciding to remain due to the incentive (i.e., the success rate of the incentive), b denoting
the benefit of a retained customer, ccontact denoting the cost of contacting a customer to offer him
Figure 3: Visual representation of Neslin et al. [27]’s formula. Colors indicate matching parts of the formula and
schematics.
or her the incentive, cincentive denoting the cost of the incentive to the firm when a customer accepts
and stays, and A denoting the fixed administrative cost of running the churn management program.
The profit formula can be divided into five parts. We highlight each part below and in the
visual representation of the formula given in Figure 3:
(a) Nα indicates that the costs and profits of a retention campaign relate solely to the customers
targeted by the campaign (with the exception of A).
(b) βγ(b − ccontact − cincentive) denotes the profits generated by the campaign, i.e., the reduction
in lost revenues minus the cost of the campaign, b − ccontact − cincentive, obtained by retaining a
fraction γ of the fraction β of correctly identified would-be churners included in
the campaign.
(c) β(1 − γ)(−ccontact) reflects part of the costs of the campaign, i.e., the cost of including correctly
identified would-be churners who were not retained.
(d) (1 − β)(−ccontact − cincentive) reflects part of the costs of the campaign, i.e., the cost resulting
from targeting non-churners through the campaign; these customers are expected to take
advantage of the incentive offered to them through the retention campaign.
(e) A reflects the fixed administrative cost that reduces the overall profitability of a retention
campaign.
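The profit formula of Equation 3 translates directly into code; the parameter values below are hypothetical and serve only to illustrate the mechanics of parts (a) through (e).

```python
# Direct transcription of Eq. (3); all parameter values are hypothetical.

def campaign_profit(N, alpha, beta, gamma, b, c_contact, c_incentive, A):
    """Pi = N*alpha*[beta*gamma*(b - cc - ci) + beta*(1-gamma)*(-cc)
                     + (1-beta)*(-cc - ci)] - A"""
    retained  = beta * gamma * (b - c_contact - c_incentive)   # part (b)
    not_saved = beta * (1 - gamma) * (-c_contact)              # part (c)
    non_churn = (1 - beta) * (-c_contact - c_incentive)        # part (d)
    return N * alpha * (retained + not_saved + non_churn) - A  # parts (a), (e)

profit = campaign_profit(N=10_000, alpha=0.1, beta=0.3, gamma=0.5,
                         b=200.0, c_contact=1.0, c_incentive=10.0, A=500.0)
print(round(profit, 2))
```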
As noted in Neslin et al. [27], β reflects the capacity of the predictive model to identify would-be
churners and can be expressed as:
� = ��0 (4)
with �0 denoting the fraction of all operator customers who will churn and � denoting the lift, i.e.,
how much more the fraction of customers targeted by the retention campaign is likely to churn
than all the operator’s customers. Rearranging the terms of Equation 3 leads to:
Π = Nα{[γb + c_incentive(1 - γ)]β0λ - c_incentive - c_contact} - A    (5)
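Since Equation 5 is an algebraic rearrangement of Equation 3 under β = λβ0, the equivalence is easy to verify numerically. A minimal Python sketch (the experiments in this paper use R; the function names and parameter values here are ours and purely illustrative):

```python
import random

def profit_eq3(N, alpha, beta, gamma, b, c_contact, c_incentive, A):
    """Equation 3: Neslin et al.'s profit formula."""
    return N * alpha * (beta * gamma * (b - c_contact - c_incentive)
                        + beta * (1 - gamma) * (-c_contact)
                        + (1 - beta) * (-c_contact - c_incentive)) - A

def profit_eq5(N, alpha, lam, beta0, gamma, b, c_contact, c_incentive, A):
    """Equation 5: the same profit, rewritten with beta = lam * beta0."""
    return N * alpha * ((gamma * b + c_incentive * (1 - gamma)) * beta0 * lam
                        - c_incentive - c_contact) - A

# the two forms agree for arbitrary parameter values
rng = random.Random(0)
for _ in range(1_000):
    alpha, gamma = rng.random(), rng.random()
    beta0, lam = rng.uniform(0.0, 0.3), rng.uniform(1.0, 3.0)
    p3 = profit_eq3(10_000, alpha, lam * beta0, gamma, 200, 1, 10, 500)
    p5 = profit_eq5(10_000, alpha, lam, beta0, gamma, 200, 1, 10, 500)
    assert abs(p3 - p5) < 1e-6  # identical up to floating-point rounding
```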
Neslin et al. [27] use the direct link between lift and profitability to motivate the use
of lift as a performance measure for evaluating customer churn prediction models. Verbeke et al.
[22], however, show that using the lift at an arbitrary cutoff as a performance measure may lead
to suboptimal model selection and, from a business perspective, a significant loss of profitability.
Therefore, the authors propose a profit-centric performance measure called the maximum profit
(MP), defined as:
MP = max_α(Π)    (6)
To calculate the maximum profit measure, a pragmatic approach is typically adopted [23, 60, 61],
and two assumptions are made: (1) the retention rate γ is independent of the included fraction of
customers α, and (2) the benefit of a retained customer, b, is independent of the included fraction of
customers α. These assumptions allow one to use constant values for γ and b in Equation 5,
and given the lift curve of the classification model, which represents the relation between the lift and
α, the maximum of Equation 5 over α can be calculated in a straightforward manner [22].
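Under these two assumptions, computing the MP from a scored test set amounts to a single scan over the ranked churn labels. A minimal Python sketch (illustrative only; the helper name, toy labels and parameter values are ours):

```python
def maximum_profit(churn_sorted, gamma, b, c_contact, c_incentive, A=0.0):
    """Maximum profit (Equation 6): scan all fractions alpha of the ranked
    customer base and return (MP per customer, optimal alpha).

    churn_sorted -- 0/1 churn labels of the test set, ordered by
                    decreasing predicted churn probability."""
    n = len(churn_sorted)
    best, best_alpha, churners = float("-inf"), 0.0, 0
    for k in range(1, n + 1):
        churners += churn_sorted[k - 1]
        alpha, beta = k / n, churners / k  # targeted fraction and its churn rate
        # Equation 3, expressed per customer of the customer base
        profit = alpha * (beta * gamma * (b - c_contact - c_incentive)
                          + beta * (1 - gamma) * (-c_contact)
                          + (1 - beta) * (-c_contact - c_incentive)) - A / n
        if profit > best:
            best, best_alpha = profit, alpha
    return best, best_alpha

# toy ranking of 10 customers (1 = churner), illustrative parameter values
mp, alpha = maximum_profit([1, 1, 1, 0, 1, 0, 0, 0, 0, 0],
                           gamma=0.3, b=200, c_contact=1, c_incentive=10)
# mp is roughly 21.3 per customer, attained at alpha = 0.5
```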
3.2. Customer churn uplift models
None of the existing evaluation metrics for assessing the performance of an uplift model take
into account the costs and benefits of adopting the uplift model or express the performance of an
uplift model in terms of profitability. To evaluate customer churn uplift models and to compare
CCP and CCU models, we apply the profit formula of Equation 3 to the uplift modeling case.
First, consider how uplift models differ from their predictive counterparts. Customer churn
prediction models only make use of treatment group data to build a model, whereas uplift models
consider both treatment and control group data in developing a model. Additionally, in evaluating
an uplift model, the profit measure should consider both the treatment and control group.
Consider the left-hand side of Figure 4. The campaign-targeted population consists of three
groups: within the fraction of true would-be churners (β), (1) some accept the offer (γ, the blue
part), whereas (2) others do not (1 - γ, the red part); the third group is (3) the fraction who
will not churn (1 - β, the yellow part) and who are erroneously included in the campaign. γ is
the campaign retention rate, which is in principle unknown and must be estimated.
Figure 4: On the left-hand side is a different visualization of Neslin's formula focusing on the campaign-targeted
population. On the right-hand side is a translation to the uplift modeling scenario.
For a translation toward uplift modeling, consider the right-hand side of Figure 4. For the
treatment group, the same division into groups as for CCP applies. The control group, not
targeted by the campaign, consists of two groups: the fraction of would-be churners and the fraction
who will not churn. Although the addition of a control group generally adds an extra layer of
complexity, in terms of the profit formula it also contributes useful knowledge. The difference
β_C - β_T is the value of the uplift, i.e., the reduction in the churn rate. Whereas the retention
effect must be estimated in CCP modeling (through the assumed retention rate γ), in CCU modeling
it is observed directly, rendering the formula easier to use and its profit estimates more reliable. In
CCU modeling, the product βγ of Equation 3 thus corresponds to the uplift, i.e., the reduction or
difference in the churn rate between the two groups (βγ = β_C - β_T). Additionally, the β parameter
of CCP is refined into β_C and β_T. In turn, Equation 3 can be rewritten as follows:
Σ = Nα[(β_C - β_T)(b - c_contact - c_incentive) + β_T(-c_contact) + (1 - β_C)(-c_contact - c_incentive)] - A    (7)
Reformulating the above formula to place more emphasis on costs and benefits leads to:

Σ = Nα[(β_C - β_T) · b - c_contact - (1 - β_T) · c_incentive] - A    (8)

β_C and β_T are the churn rates of the control and treatment groups, respectively, and (1 - β_T)
is the non-churn rate of the treatment group. As in CCP modeling, the goal is to maximize the
resulting profit, yielding what we denominate the maximum profit uplift (MPU):
MPU = max_α(Σ)    (9)
The MPU measure expresses the performance of a CCU model in terms of the profits gener-
ated per customer of the customer base when targeting the optimal fraction of customers ranked
according to the estimated uplift of the CCU model.
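The MPU can be computed analogously by scanning over α, plugging the observed churn rates of the ranked treatment and control test sets into Equation 8. A minimal Python sketch (illustrative only; the helper name, toy labels and parameter values are ours):

```python
def maximum_profit_uplift(treat_sorted, ctrl_sorted, b, c_contact, c_incentive,
                          A=0.0, steps=100):
    """Maximum profit uplift (Equation 9), evaluated via Equation 8.

    treat_sorted / ctrl_sorted -- 0/1 churn labels of the treatment and
    control test sets, each ordered by decreasing estimated uplift."""
    n = len(treat_sorted) + len(ctrl_sorted)
    best, best_alpha = float("-inf"), 0.0
    for s in range(1, steps + 1):
        alpha = s / steps
        kt = max(1, round(alpha * len(treat_sorted)))
        kc = max(1, round(alpha * len(ctrl_sorted)))
        beta_t = sum(treat_sorted[:kt]) / kt  # churn rate, top-ranked treated
        beta_c = sum(ctrl_sorted[:kc]) / kc   # churn rate, top-ranked control
        # Equation 8, expressed per customer of the customer base
        profit = alpha * ((beta_c - beta_t) * b - c_contact
                          - (1 - beta_t) * c_incentive) - A / n
        if profit > best:
            best, best_alpha = profit, alpha
    return best, best_alpha

# toy example: the top-ranked control customers churn, the treated do not
mpu, alpha = maximum_profit_uplift([0, 0], [1, 1],
                                   b=100, c_contact=1, c_incentive=10)
```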
4. Experiments
The objective of the experiments presented in this section is to compare and contrast customer
churn prediction modeling and customer churn uplift modeling outcomes. In the first part of
this section, information on the experimental setup is provided, i.e., the dataset and experimental
methodology. In Section 4.2, the results of the experiments are presented, and these results are
discussed and analyzed in Section 4.2.3.
4.1. Experimental Design
4.1.1. Dataset
The dataset used to conduct the experiments was obtained from a financial institution. It
consists of records containing customer information, including a variable on churning and a variable
determining whether a customer was targeted by a retention campaign. Table 2 provides detailed
information on the dataset. The retention campaign was targeted at a treatment group, for which,
in the subsequent period, a churn rate of 13.25 % was observed. For the control group not targeted
by the retention campaign, a significantly higher churn rate of 25.52 % was observed. The overall
uplift achieved is thus equal to 12.27 %, showing that the campaign had a significant impact on
customer behavior. The dataset includes 162 variables, including socio-demographic information
and usage and activity data. Both the treatment and control groups are randomly split into training
and test sets, respectively including 2/3 and 1/3 of the records.
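The per-group split can be sketched as follows (illustrative Python; the seed and helper name are ours, and the group sizes are taken from Table 2):

```python
import random

def split_group(records, train_frac=2 / 3, seed=42):
    """Shuffle one group (treatment or control) and split it into training
    and test parts; splitting each group separately preserves the
    treatment/control composition of both resulting sets."""
    rng = random.Random(seed)  # fixed seed for reproducibility (our choice)
    shuffled = list(records)
    rng.shuffle(shuffled)
    cut = round(train_frac * len(shuffled))
    return shuffled[:cut], shuffled[cut:]

# group sizes from Table 2
treat_train, treat_test = split_group(range(82_094))
ctrl_train, ctrl_test = split_group(range(118_809))
```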
The data
  Type of organization          Financial institution
  Total observations            200 903
  Total variables               162
  Control group observations    118 809
  Control group churn rate      25.52 %
  Treatment group observations  82 094
  Treatment group churn rate    13.25 %
  Overall uplift                12.27 %

Table 2: Information on the dataset obtained from a private financial institution in Belgium.
4.1.2. Methodology
Unlike conventional predictive modeling, uplift modeling manages two groups, a treatment
group and a control group. In testing such techniques and measures, we consider two scenarios.
The first scenario tests the classic profit measure, MP (Equation 6), which treats the population
as a single group, preferably one that has had no prior contact with campaigns. Therefore, to
test the MP, we only use the test set of the control group. The second scenario
assumes the existence of both a treatment group and a control group, and thus, the MPU metric is
applied to the results of test sets of both the treatment and control groups. This is also illustrated
in Figure 5.
Figure 5: Scenario 1 focuses solely on the control group, whereas Scenario 2 considers both the treatment and control
group.
Two modeling techniques are used to develop and compare CCP and CCU models, i.e., logistic
regression and random forests. Both techniques can be used in a straightforward manner to develop
predictive models and have been adapted for developing uplift models [21, 48]. The use of these two
techniques in our experiments is motivated by their popularity. Logistic regression is the standard
predictive modeling approach used in industrial settings across various applications and is a typical
benchmark approach used in experimental studies and scientific research. Additionally, logistic
regression facilitates the interpretation of the resulting model and typically performs well [62, 63].
Random forests are state of the art in the field of business analytics, have broad applications to
industry settings and to scientific research, and typically achieve strong outcomes [62, 63]. Note
that a full scale benchmarking study of a broad range of predictive and uplift modeling techniques
for various datasets falls beyond the scope of this study.
To execute our experiments, the open-source software environment R was used [64]. For CCP, the
adopted implementations stem from the R package caret^1. For CCU, adapted implementations
were applied that take into account and contrast the customer behaviors of the treatment and
control groups, although the underlying learning approach is similar to its predictive-modeling
counterpart. For logistic regression, Lo's approach [48] was applied to draw comparisons with
standard logistic regression, whereas the uplift random forests proposed by Guelman et al. [21] were
applied via the 'uplift' R package^2.
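Lo's approach fits a single logistic regression on the features, the treatment indicator and their interactions, and scores uplift as the difference between the predicted churn probabilities without and with treatment. A minimal Python sketch of the scoring step (the experiments themselves are run in R; the fitted coefficients shown here are hypothetical):

```python
import math

def lo_uplift_score(x, w0, w_x, w_t, w_xt):
    """Score one customer with Lo's single-model approach: a logistic
    regression fitted on the features x, the treatment indicator t and
    the interactions x*t.  The uplift score is the predicted churn
    probability without treatment minus the probability with treatment."""
    def churn_prob(t):
        z = w0 + w_t * t + sum(wx * xi + wxt * xi * t
                               for xi, wx, wxt in zip(x, w_x, w_xt))
        return 1.0 / (1.0 + math.exp(-z))  # logistic response
    return churn_prob(0) - churn_prob(1)   # estimated churn-rate reduction

# hypothetical fitted coefficients for a model with two features
score = lo_uplift_score([1.2, -0.5], w0=-0.5,
                        w_x=[0.8, -0.3], w_t=-1.1, w_xt=[-0.4, 0.2])
```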
4.2. Results and discussion
4.2.1. Scenario 1 - Evaluation with Maximum Profit
In this section, we present the results of our experiments on the first scenario as detailed above,
in which the maximum profit measure (Equation 6) is used to evaluate the performance of logistic
regression and random forest CCP and CCU models. Figure 6 shows the profit curves generated
from the experiments on scenario 1. As no information was provided by the financial institution
regarding actual values of the cost and benefit parameters of the MP measure, three different sets
of parameters were used to calculate the MP. The three sets of values used are based on values
reported in the literature [22, 27, 60, 65] and represent situations presenting low, medium and high
profitability resulting from retaining a customer. A full sensitivity analysis on the impact of the
adopted cost and benefit parameters falls beyond the scope of this article and is recognized as
a topic for further research. However, the results of experiments conducted on the three sets of
parameters are fully consistent, and thus, conclusions drawn from the experiments appear to hold
irrespective of the assumed parameter values.
The profit curves presented in Figure 6 show the profits generated per customer of the customer
base for a fraction x of customers targeted by the retention campaign. These values are ranked
^1 http://caret.r-forge.r-project.org
^2 https://cran.r-project.org/web/packages/uplift/index.html
per the estimated probability of churning for the CCP models (black profit curves) and ranked per
the estimated uplift of the CCU models (blue profit curves). Note that the profits generated per
customer of the customer base, rather than the total profit, are plotted because the profit generated
per customer is independent of the size of the customer base yet is still proportional to the total
profit. Therefore, the profit curves denote the optimal fraction of customers to be targeted by the
retention campaign, giving rise to the maximum profit.
4.2.2. Scenario 2 - Evaluation with Maximum Profit Uplift
Figure 7 shows the profit curves generated from the experiments following the second scenario
detailed in the previous section, with the results of the CCP and CCU models evaluated using
the novel maximum profit uplift (MPU) measure. The MPU measure includes both the
treatment and control group observations of the test set in the evaluation.
4.2.3. Discussion
The evaluation based on the maximum profit measure for scenario 1 clearly shows that CCP
modeling yields higher profits than CCU modeling for logistic regression and random forests. The
profit curves shown in Figure 6 of the CCP models exceed the profit curves of the CCU models. We
conclude that CCP models are superior in predicting which customers will churn. This makes sense,
as CCP models are trained with the objective of predicting churn patterns, whereas uplift models
are designed to predict uplift events, i.e., the impact of a retention campaign on the propensity to
attrite. Many of the churners predicted by the CCP model may be customers who have made up
their minds and who have decided to churn, and they therefore cannot be retained when targeted by
a retention campaign. A successful uplift model will therefore rank these customers at the bottom
of the ranking, i.e., will estimate their uplift as close to zero, since the impact of the retention campaign
on them will be nil. In other words, many churners identified by a CCP model can be expected to be reflected
as Sure Things as defined in Section 2.2.1. When using MP as the evaluation measure, it is natural
to see that the measure values CCP more than CCU because the MP assumes a constant retention
rate and as such does not acknowledge the true retention rate that can be observed when a control
group is present. The MP measure is additionally linearly related to the lift or to the number of
churners of the fraction of selected customers. As CCP models can be expected to detect more
churners than CCU models, this further contributes to the superiority of the MP of CCP models
over CCU models.
For the results of the experiments based on scenario 2, in using the MPU measure to evaluate
the performance of the CCP and CCU models, we find that the CCU models outperform the CCP
models. This can be attributed to the fact that the uplift models e↵ectively succeed in predicting
[Figure 6: profit per customer versus percentage captured, classic profit measure; legend: Classic CP, Uplift CP.
(a) Logistic regression and (b) random forests with b = 200, c_incentive = 10, c_contact = 1;
(c) logistic regression and (d) random forests with b = 100, c_incentive = 10, c_contact = 1;
(e) logistic regression and (f) random forests with b = 100, c_incentive = 50, c_contact = 1.]

Figure 6: Profit curves for logistic regression (left) and random forest (right) CCP (black curves) and CCU (blue
curves) models based on the first scenario using the MP measure with three sets of cost and benefit parameters.
[Figure 7: profit per customer versus percentage captured; legend: Classic CP, Uplift CP.
(a) Logistic regression and (b) random forests with b = 200, c_incentive = 10, c_contact = 1;
(c) logistic regression and (d) random forests with b = 100, c_incentive = 10, c_contact = 1;
(e) logistic regression and (f) random forests with b = 100, c_incentive = 50, c_contact = 1.]

Figure 7: Profit curves for logistic regression (left) and random forest (right) CCP (black curves) and CCU (blue
curves) models for the second scenario using the MPU measure with three sets of cost and benefit parameters.
(a) Logistic Regression (b) Random Forests
Figure 8: Churn rate as a function of the selected fraction of customers for CCP and CCU logistic regression (a) and
random forest (b) models.
uplift, which is accounted for in the MPU measure as discussed in Section 3.2. When ranking both
the treatment and control groups of the test set following the predicted probabilities of churning in
evaluating the CCP models and the estimated uplift in evaluating the CCU models, the observed
reduction in churn rates for the selected fraction x of customers can be used to measure the profits
generated from a retention campaign when selecting customers based on the CCP and CCU models.
As CCP models rank customers who are likely to churn but who cannot necessarily be retained
high on the list (retainability being exactly what the CCU model predicts), CCP models appear to be less
profitable than CCU models. The objective of CCU models is to ascribe high scores to customers
who are likely to both churn and be retained, and as such, they achieve higher degrees of uplift
and profitability.
Note that it is only possible to calculate the MPU measure when both a control group and
a treatment group are present, which, in traditional customer churn prediction setups, is not the
case. The MP measure still has use in such settings, although uplift modeling is clearly a superior
paradigm with respect to developing a data-driven customer retention program.
In addition, although of less importance here, our profit curves show that random forests gen-
erally perform better than logistic regression. Random forest models generate higher profits
per customer and achieve high profits at a smaller fraction of customers targeted by a retention cam-
paign. This result is no surprise and is fully in line with the results of benchmarking experiments
conducted across various business domains as reported in the literature [2, 22].
To further analyze and gain insight into the results of the experiments, we plot the churn rate
as a function of the fraction of customers selected x for the CCP and CCU logistic regression and
[Figure 9: cumulative uplift versus percentage captured for (a) logistic regression and (b) random forests; legend:
Classic CP, Uplift CP.]

Figure 9: Cumulative uplift as a function of the fraction of customers selected for the CCP and CCU logistic
regression (a) and random forest (b) models.
random forest models in the left and right panels of Figure 8, respectively. These figures show that
the cumulative churn rate for the CCP models always exceeds the churn rate of the CCU model.
This indicates that the CCP model captures more churners than the CCU model for the same
fraction x of selected customers. We also plot the uplift as a function of the fraction of customers
selected x for the CCP and CCU logistic regression and random forest models in the left and right panels
of Figure 9, respectively. Here, it can be seen that the CCU model achieves a stronger degree of
uplift than the CCP model, i.e., a stronger reduction of the churn rate in the treatment group
relative to the control group.
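The curves of Figures 8 and 9 can be computed directly from the ranked test-set labels. A minimal Python sketch (helper names and toy labels are ours):

```python
def churn_rate_curve(labels_sorted):
    """Cumulative churn rate among the top k customers of a ranking
    (the quantity plotted in Figure 8)."""
    rates, churners = [], 0
    for k, y in enumerate(labels_sorted, start=1):
        churners += y
        rates.append(churners / k)
    return rates

def uplift_curve(treat_sorted, ctrl_sorted, steps=10):
    """Observed uplift (control minus treatment churn rate) among the top
    fraction of each group (the quantity plotted in Figure 9)."""
    curve = []
    for s in range(1, steps + 1):
        kt = max(1, round(s / steps * len(treat_sorted)))
        kc = max(1, round(s / steps * len(ctrl_sorted)))
        curve.append(sum(ctrl_sorted[:kc]) / kc - sum(treat_sorted[:kt]) / kt)
    return curve
```

For example, `churn_rate_curve([1, 0, 1, 0])` yields the per-cutoff churn rates of a four-customer ranking with churners in positions one and three.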
Figures 8 and 9 confirm the above analysis and support the conclusion that CCP models tend
to detect numerous Sure Things, i.e., customers who decide to churn and who cannot be retained
by a campaign, whereas CCU models aim to and succeed at avoiding targeting Sure Things and
instead allow one to treat Persuadables to realize a stronger decrease in the churn rate and yield
an increased return. This conclusion holds for both the logistic regression and random forest
techniques; it also seems to be the case for uplift churn prediction modeling. Further research
may extend these experiments to the use of alternative predictive and uplift modeling techniques.
A next step in the analysis of our results involves an assessment of similarities in the rankings
of customers when scored using the various models developed. For this purpose, Spearman’s rank
order correlation and Kendall’s tau are calculated for the first and second scenarios and are reported
in Tables 3 and 4. We find that, overall, the rankings resulting from the various models differ
substantially. For the first scenario, the strongest similarity is found between the logistic regression models of
the CCP and CCU setups and between the random forests of the CCP and CCU setups, both presenting
the maximum observed value of Spearman’s rank order correlation of 0.52. The weakest similarities
are found between the CCP logistic regression model and the CCU random forest model, with a
Spearman’s rank order correlation value of 0.31 found for the first scenario and a value of only
0.17 found for the second scenario. Between the CCP random forest and CCU logistic regression
models, we find a Spearman’s rank order correlation of 0.23 for the first scenario and of 0.24 for the
second scenario. These model setups are the most dissimilar, as they differ both in terms of predictive
versus uplift modeling and in terms of logistic regression versus random forests. For the second scenario, which
considers both the treatment and control test sets, we find that the rankings of the CCP and CCU logistic regression models
become more similar, whereas the Spearman's rank order correlation between the rankings of
the CCP and CCU random forest models decreases to 0.35, which is equal to the correlation between
the CCU logistic regression and CCU random forest models. Overall, these results confirm that CCU
and CCP models identify different customers to target through campaigns.
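For reference, both rank-correlation measures are straightforward to compute from two score vectors. A minimal Python sketch assuming untied scores (helper names are ours):

```python
def spearman(a, b):
    """Spearman's rank order correlation (assumes no tied values)."""
    def ranks(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0] * len(v)
        for rank, i in enumerate(order, start=1):
            r[i] = rank
        return r
    ra, rb, n = ranks(a), ranks(b), len(a)
    d2 = sum((x - y) ** 2 for x, y in zip(ra, rb))
    return 1 - 6 * d2 / (n * (n * n - 1))

def kendall_tau(a, b):
    """Kendall's tau-a over all pairs (assumes no tied values)."""
    n = len(a)
    s = sum(1 if (a[i] - a[j]) * (b[i] - b[j]) > 0 else -1
            for i in range(n) for j in range(i + 1, n))
    return 2 * s / (n * (n - 1))
```

Perfectly concordant rankings score 1 and perfectly reversed rankings score -1 under both measures.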
Scenario 1 - Spearman
             SC1.CCP.GLM  SC1.CCP.RF  SC1.CCU.DTA  SC1.CCU.RF
SC1.CCP.GLM  1            0.46        0.52         0.31
SC1.CCP.RF   0.46         1           0.23         0.52
SC1.CCU.DTA  0.52         0.23        1            0.39
SC1.CCU.RF   0.31         0.52        0.39         1

Scenario 1 - Kendall's tau
             SC1.CCP.GLM  SC1.CCP.RF  SC1.CCU.DTA  SC1.CCU.RF
SC1.CCP.GLM  1            0.32        0.39         0.21
SC1.CCP.RF   0.32         1           0.15         0.37
SC1.CCU.DTA  0.39         0.15        1            0.27
SC1.CCU.RF   0.21         0.37        0.27         1

Table 3: Spearman's rank order correlation and Kendall's tau, scenario 1.
Scenario 2 - Spearman
             SC2.CCP.GLM  SC2.CCP.RF  SC2.CCU.DTA  SC2.CCU.RF
SC2.CCP.GLM  1            0.47        0.59         0.17
SC2.CCP.RF   0.47         1           0.24         0.35
SC2.CCU.DTA  0.59         0.24        1            0.35
SC2.CCU.RF   0.17         0.35        0.35         1

Scenario 2 - Kendall's tau
             SC2.CCP.GLM  SC2.CCP.RF  SC2.CCU.DTA  SC2.CCU.RF
SC2.CCP.GLM  1            0.32        0.44         0.12
SC2.CCP.RF   0.32         1           0.16         0.25
SC2.CCU.DTA  0.44         0.16        1            0.24
SC2.CCU.RF   0.12         0.25        0.24         1

Table 4: Spearman's rank order correlation and Kendall's tau, scenario 2.
The observations of the previous analysis on the similarities in the rankings of customers are
confirmed when plotting the overlap in selected customers. Figure 10 shows the percentage of
overlap in customers when comparing different cutoffs of the ranking between different techniques
and methodologies. For the first scenario, the logistic regression and random forest models
present overlaps of 0.55 and 0.46, respectively, between the CCP and CCU setups at a cutoff of 5% (Figure 10a).
In comparing the logistic regression and random forest models within each setup, we find lower overlaps
of 0.30 and 0.38 at 5%, respectively (Figure 10b), revealing clear differences between the targeted
customers. For the second scenario, the logistic regression models show a 0.52 overlap between
the CCP and CCU setups at a cutoff of 5%. The largest difference is found between the random
forest models of the CCP and CCU setups, with an overlap of 0.21 at 5% (Figure 10c). This latter
observation, combined with the MPU results (Figure 7b), again clearly shows that the CCU setup
ranks customers more profitably than the CCP setup. Finally, Figure 10d shows overlaps of 0.31
and 0.29 at 5% for the techniques of the CCP and CCU setups, respectively. This further confirms the
presence of a significant difference in rankings when comparing logistic regression and random forest
models.
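The overlap at a given cutoff is simply the shared fraction of the two models' top-ranked customers. A minimal Python sketch (helper name and toy scores are ours):

```python
def overlap_at_cutoff(scores_a, scores_b, cutoff):
    """Fraction of customers shared by the top-`cutoff` selections of two
    models; scores_a and scores_b map customer ids to model scores."""
    k = max(1, round(cutoff * len(scores_a)))
    def top(scores):
        # customer ids of the k highest-scoring customers
        return set(sorted(scores, key=scores.get, reverse=True)[:k])
    return len(top(scores_a) & top(scores_b)) / k

# toy scores for four customers under two hypothetical models
model_a = {"c1": 0.9, "c2": 0.8, "c3": 0.1, "c4": 0.2}
model_b = {"c1": 0.7, "c2": 0.1, "c3": 0.9, "c4": 0.2}
```

With these toy scores, the two models share half of their top-two selections.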
[Figure 10: overlap versus cutoff. (a) Scenario 1, overlap between setups (glm and dta; rf and urf); (b) scenario 1,
overlap between techniques (ccp: glm and rf; ccu: rf and urf); (c) scenario 2, overlap between setups (glm and dta;
rf and urf); (d) scenario 2, overlap between techniques (ccp: glm and rf; ccu: rf and urf).]

Figure 10: The overlap in customers observed when comparing different cutoffs of the rankings of setups (10a and
10c) and techniques (10b and 10d) for scenarios 1 and 2.
In previous studies on uplift modeling, the performance of uplift models has been reported to be
unstable, i.e., to vary heavily across test folds when adopting an n-fold cross-validation setup [63].
Therefore, the experiments reported above were repeated five times to assess the impact of randomly
splitting the dataset into training and test sets. The results generated across the five repetitions
were found to be highly stable, supporting the validity of the presented findings.
5. Conclusions and future research
In this article, we introduce a novel, profit-driven evaluation measure for assessing the per-
formance of customer churn uplift models. The measure extends the maximum profit measure
for customer churn prediction models and allows one to compare customer churn prediction and
customer churn uplift models. The measure assesses the performance of a customer churn uplift
model in terms of the profits per customer of a customer base generated when targeting the optimal
fraction of customers with the highest uplift scores for a retention campaign. The optimal fraction
of customers to be targeted is determined by maximizing the profits generated from a retention
campaign and is indirectly determined by the costs and benefits related to the retention campaign
and to retained customers who are about to churn. The results of a real-life case study in the
financial industry are presented. An experimental study was developed and conducted to assess
the added value of prescriptive over predictive analytics. The results indicate that customer churn
uplift models outperform customer churn prediction models. Uplift models appear to be able to
identify so-called Persuadables and therefore yield higher returns than customer churn prediction
models, which top-rank and thus select lost causes, i.e., customers who are about to churn but who
will not be retained when targeted through a retention campaign. These results strongly imply
that uplift modeling serves as an improved tool for practical customer churn modeling applications.
Future studies will focus on generalizing the newly introduced MPU measure, as there is a need
for powerful and application-oriented evaluation measures for assessing the performance of uplift
models. This study also opens doors to the development of profit-driven uplift modeling approaches
that aim at maximizing profitability.
References
[1] W. Verbeke, B. Baesens, C. Bravo, Profit Driven Business Analytics: A Practitioner’s Guide to Transforming
Big Data into Added Value, John Wiley & Sons, 2017.
[2] S. Lessmann, B. Baesens, H.-V. Seow, L. C. Thomas, Benchmarking state-of-the-art classification algorithms
for credit scoring: An update of research, Eur. J. Oper. Res. 247 (2015) 124–136.
[3] S. Maldonado, J. Perez, C. Bravo, Cost-based feature selection for support vector machines: An application in
credit scoring, Eur. J. Oper. Res. 261 (2017) 656–665.
[4] B. Baesens, V. Van Vlasselaer, W. Verbeke, Fraud Analytics Using Descriptive, Predictive, and Social Network
Techniques: A Guide to Data Science for Fraud Detection, John Wiley & Sons, 2015.
[5] K. Coussement, K. W. De Bock, Customer churn prediction in the online gambling industry: The beneficial
effect of ensemble learning, J Bus Res 66 (2013) 1629–1636.
[6] M. Oskarsdottir, C. Bravo, W. Verbeke, C. Sarraute, B. Baesens, J. Vanthienen, Social network analytics for
churn prediction in telco: Model building, evaluation and network architecture, Expert Syst. Appl. 85 (2017)
204–220.
[7] W. Verbeke, D. Martens, C. Mues, B. Baesens, Building comprehensible customer churn prediction models with
advanced rule induction techniques, Expert Syst. Appl. 38 (2011) 2354–2364.
[8] A. D. Athanassopoulos, Customer satisfaction cues to support market segmentation and explain switching
behavior, J Bus Res 47 (2000) 191 – 207.
[9] C. B. Bhattacharya, When customers are members: Customer retention in paid membership contexts, J Acad
Market Sci 26 (1998) 31.
[10] M. R. Colgate, P. J. Danaher, Implementing a customer relationship strategy: The asymmetric impact of poor
versus excellent execution, J Acad Market Sci 28 (2000) 375–387.
[11] E. Rasmusson, Complaints can build relationships., Sales & Marketing Management 151 (1999) 89–89.
[12] M. Colgate, K. Stewart, R. Kinsella, Customer defection: a study of the student market in Ireland, International
Journal of Bank Marketing 14 (1996) 23–29.
[13] J. Ganesh, M. J. Arnold, K. E. Reynolds, Understanding the customer base of service providers: An examination
of the differences between switchers and stayers, J Mark 64 (2000) 65–87.
[14] R. W. Mizerski, An attribution explanation of the disproportionate influence of unfavorable information, J
Consum Res 9 (1982) 301–310.
[15] F. F. Reichheld, Learning from customer defections (1996).
[16] D. L. Stum, A. Thiry, Building customer loyalty, Train Dev J 45 (1991) 34–36.
[17] V. A. Zeithaml, L. L. Berry, A. Parasuraman, The behavioral consequences of service quality, J Mark 60 (1996)
31–46.
[18] R. T. Rust, A. J. Zahorik, Customer satisfaction, customer retention, and market share, J Retailing 69 (1993)
193 – 215.
[19] D. Van den Poel, B. Lariviere, Customer attrition analysis for financial services using proportional hazard models,
Eur. J. Oper. Res. 157 (2004) 196–217.
[20] N. J. Radcliffe, R. Simpson, Identifying who can be saved and who will be driven away by retention activity,
Journal of Telecommunications Management 1 (2008).
[21] L. Guelman, M. Guillen, A. M. Perez-Marin, Random forests for uplift modeling: An insurance customer
retention case, in: K. J. Engemann, A. M. Gil-Lafuente, J. Merigo (Eds.), Modeling and Simulation in
Engineering, Economics and Management, volume 115 of Lecture Notes in Business Information Processing,
Springer Berlin Heidelberg, 2012, pp. 123–133. URL: http://dx.doi.org/10.1007/978-3-642-30433-0_13.
doi:10.1007/978-3-642-30433-0_13.
[22] W. Verbeke, K. Dejaeger, D. Martens, J. Hur, B. Baesens, New insights into churn prediction in the telecom-
munication sector: A profit driven data mining approach, Eur. J. Oper. Res. 218 (2012) 211 – 229.
[23] T. Verbraken, C. Bravo, R. Weber, B. Baesens, Development and application of consumer credit scoring models
using profit-based classification measures, Eur. J. Oper. Res. 238 (2014) 505 – 513.
[24] F. Garrido, W. Verbeke, C. Bravo, A robust profit measure for binary classification model evaluation, Expert
Syst. Appl. 92 (2018) 154–160.
[25] B. Baesens, Analytics in a big data world: The essential guide to data science and its applications, John Wiley
& Sons, 2014.
[26] W. Verbeke, D. Martens, B. Baesens, Social network analysis for customer churn prediction, Appl. Soft Comput.
14 (2014) 431–446.
[27] S. A. Neslin, S. Gupta, W. Kamakura, J. Lu, C. H. Mason, Defection detection: Measuring and understanding
the predictive accuracy of customer churn models, Journal of Marketing Research 43 (2006) 204–211.
[28] J. Burez, D. Van den Poel, Handling class imbalance in customer churn prediction, Expert Syst. Appl. 36 (2009)
4626–4636.
[29] P. Datta, B. Masand, D. R. Mani, B. Li, Automated cellular modeling and prediction on a large scale, Artificial
Intelligence Review 14 (2000) 485–502.
[30] C.-P. Wei, I.-T. Chiu, Turning telecommunications call details to churn prediction: a data mining approach,
Expert Syst. Appl. 23 (2002) 103 – 112.
[31] E. Lima, C. Mues, B. Baesens, Domain knowledge integration in data mining using decision tables: case studies
in churn prediction, J Oper Res Soc 60 (2009) 1096–1106.
[32] A. Lemmens, C. Croux, Bagging and boosting classification trees to predict churn, Journal of Marketing
Research 43 (2006) 276–286.
[33] S. Lessmann, S. Voß, A reference model for customer-centric data mining with support vector machines, Eur.
J. Oper. Res. 199 (2009) 520–530.
[34] Z.-Y. Chen, Z.-P. Fan, M. Sun, A hierarchical multiple kernel support vector machine for customer churn
prediction using longitudinal behavioral data, Eur. J. Oper. Res. 223 (2012) 461–472.
[35] J. Moeyersoms, D. Martens, Including high-cardinality attributes in predictive models: A case study in churn
prediction in the energy sector, Decis Support Syst 72 (2015) 72–81.
[36] W.-H. Au, K. C. C. Chan, X. Yao, A novel evolutionary data mining algorithm with applications to churn
prediction, IEEE Trans. Evol. Comput. 7 (2003) 532–545.
[37] S.-Y. Hung, D. C. Yen, H.-Y. Wang, Applying data mining to telecom churn management, Expert Syst. Appl.
31 (2006) 515 – 524.
[38] K. Dasgupta, R. Singh, B. Viswanathan, D. Chakraborty, S. Mukherjea, A. Nanavati, A. Joshi, Social ties and
their relevance to churn in mobile telecom networks, in: Proceedings of the 11th international conference on
Extending Database Technology: Advances in database technology, EDBT ’08, 2008, pp. 697–711.
[39] B. Baesens, T. Van Gestel, M. Stepanova, D. Van den Poel, J. Vanthienen, Neural network survival analysis for
personal loan data, J Oper Res Soc 56 (2005) 1089–1098.
[40] A. Backiel, B. Baesens, G. Claeskens, Predicting time-to-churn of prepaid mobile telephone customers using
social network analysis, J Oper Res Soc 67 (2016).
[41] A. Keramati, R. Jafari-Marandi, M. Aliannejadi, I. Ahmadian, M. Mozaffari, U. Abbasi, Improved churn
prediction in telecommunication industry using data mining techniques, Appl. Soft Comput. 24 (2014) 994–1012.
[42] A. Amin, S. Anwar, A. Adnan, M. Nawaz, K. Alawfi, A. Hussain, K. Huang, Customer churn prediction in the
telecommunication sector using a rough set approach, Neurocomputing 237 (2017) 242 – 254.
[43] K. Coussement, S. Lessmann, G. Verstraeten, A comparative analysis of data preparation algorithms for customer
churn prediction: A case study in the telecommunication industry, Decis Support Syst 95 (2017) 27 – 36.
[44] B. Zhu, B. Baesens, A. Backiel, S. K. L. M. vanden Broucke, Benchmarking sampling techniques for imbalance
learning in churn prediction, J Oper Res Soc 69 (2018) 49–65.
[45] N. J. Radcliffe, Generating incremental sales: Maximizing the incremental impact of cross-selling, up-selling and
deep-selling through uplift modelling, Stochastic Solutions Limited (2007).
[46] L. Lai, Influential Marketing: A New Direct Marketing Strategy Addressing the Existence of Voluntary Buyers,
Canadian theses on microfiche, Simon Fraser University (Canada), 2006. URL: https://books.google.be/books?id=5EvSuAAACAAJ.
[47] K. Kane, V. S. Y. Lo, J. Zheng, True-lift modeling: Comparison of methods, J Market Analytics 2 (2014)
218–238.
[48] V. S. Y. Lo, The true lift model: A novel data mining approach to response modeling in database marketing,
SIGKDD Explor. Newsl. 4 (2002) 78–86.
[49] K. Larsen, Generalized naive bayes classifiers, SIGKDD Explor. Newsl. 7 (2005) 76–81.
[50] D. M. Chickering, D. Heckerman, A decision theoretic approach to targeted advertising, in: Proceedings of the
Sixteenth Conference on Uncertainty in Artificial Intelligence, UAI’00, Morgan Kaufmann Publishers Inc., San
Francisco, CA, USA, 2000, pp. 82–88. URL: http://dl.acm.org/citation.cfm?id=2073946.2073957.
[51] B. Hansotia, B. Rukstales, Incremental value modeling, Journal of Interactive Marketing 16 (2001) 35–46.
[52] N. J. Radcliffe, Using control groups to target on predicted lift: Building and assessing uplift models, Direct
Market J Direct Market Assoc Anal Council 1 (2007) 14–21.
[53] N. J. Radcliffe, P. D. Surry, Real-world uplift modelling with significance-based uplift trees, White Paper
TR-2011-1, Stochastic Solutions (2011).
[54] P. Rzepakowski, S. Jaroszewicz, Decision trees for uplift modeling with single and multiple treatments, Knowl
Inf Syst 32 (2012) 303–327.
[55] A. Shaar, T. Abdessalem, O. Segard, Pessimistic uplift modeling, ACM SIGKDD (2016).
[56] L. Breiman, J. Friedman, C. J. Stone, R. A. Olshen, Classification and regression trees, CRC press, 1984.
[57] G. V. Kass, An exploratory technique for investigating large quantities of categorical data, Applied statistics
(1980) 119–127.
[58] L. Guelman, M. Guillen, A. M. Perez-Marín, Optimal personalized treatment rules for marketing interventions:
A review of methods, a new proposal, and an insurance case study, Working Papers 2014-06, Universitat de
Barcelona, UB Riskcenter, 2014. URL: http://ideas.repec.org/p/bak/wpaper/201406.html.
[59] B. Hansotia, B. Rukstales, Direct marketing for multichannel retailers: Issues, challenges and solutions, Journal
of Database Marketing 9 (2002) 259–266.
[60] T. Verbraken, W. Verbeke, B. Baesens, A novel profit maximizing metric for measuring classification performance
of customer churn prediction models, IEEE Trans Knowl Data Eng 25 (2013) 961–973.
[61] T. Verbraken, W. Verbeke, B. Baesens, Profit optimizing customer churn prediction with bayesian network
classifiers, Intell. Data Anal. 18 (2014) 3–24.
[62] K. Dejaeger, W. Verbeke, D. Martens, B. Baesens, Data mining techniques for software effort estimation: A
comparative study, IEEE Trans. Softw. Eng. 38 (2012) 375–397.
[63] F. Devriendt, W. Verbeke, A literature survey and experimental evaluation of the state-of-the-art in uplift
modeling: a stepping stone towards the development of prescriptive analytics, Big Data (2018). Submitted in
December 2017.
[64] R Core Team, R: A Language and Environment for Statistical Computing, R Foundation for Statistical Com-
puting, Vienna, Austria, 2013. URL: http://www.R-project.org/.
[65] J. Burez, D. Van den Poel, CRM at a pay-TV company: Using analytical models to reduce customer attrition
by targeted marketing for subscription services, Expert Syst. Appl. 32 (2007) 277–288.