Post on 18-Jul-2015
Beyond Classification and Ranking: Constrained Optimization of the ROI
Lian Yan & Patrick Baldasare
KDD’06
1
Return on Investment (ROI)
• ROI is the ratio of money gained or lost on an investment relative to the amount of money invested.
• $50/$1,000 = 5% ROI
• $20/$100 = 20% ROI
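The two ratios above can be checked with a few lines of Python. This is only a sketch of the definition on the slide; the function name `roi` is hypothetical.

```python
def roi(gain: float, invested: float) -> float:
    """Return ROI as a fraction: money gained (or lost) divided by money invested."""
    if invested <= 0:
        raise ValueError("invested amount must be positive")
    return gain / invested


print(f"{roi(50, 1000):.0%}")  # 5%
print(f"{roi(20, 100):.0%}")   # 20%
```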
4
Example
• Used a classifier to predict defection of mutual fund accounts for a major US mutual fund company.
• Positive samples are defined as those accounts with a net redemption amount of 35% or more of the account balance within a two-month window.
– net redemption amount = redemption minus purchase
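The labeling rule above could be sketched as follows. The function name `is_defection`, its arguments, and the handling of non-positive balances are assumptions for illustration; the slides only give the 35% threshold and the definition of net redemption.

```python
def is_defection(redemptions: float, purchases: float, balance: float,
                 threshold: float = 0.35) -> bool:
    """Label an account positive when net redemption within the window
    is at least `threshold` (35% on the slide) of the account balance."""
    if balance <= 0:
        # Assumption: accounts without a positive balance are not labeled positive.
        return False
    net_redemption = redemptions - purchases
    return net_redemption >= threshold * balance


# Hypothetical accounts, amounts over a two-month window.
print(is_defection(redemptions=4000, purchases=0, balance=10000))     # True
print(is_defection(redemptions=3000, purchases=1000, balance=10000))  # False
```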
5
Real-world evaluation results
• two levels of defection risk
• three segments based on account values
6
Fixed budget
• ROI of the project is determined by the amount of redemptions prevented.
• Simply classifying does not enable the mutual fund company to reach out to those accounts with the highest redemption amount.
7
Example 2
• Predict collectability of delinquent accounts receivable for credit card issuers
– credit, demographic, account data
– binary class
• whether payment is received within a certain period
8
Difference in Maximization
• True positive rate among accounts in the collection process → classification accuracy
• Collectable amount for the collection process → ROI
9
Budget constraint
• Budget constraint determines
– how many mutual fund accounts the customer service team can reach out to every month
– how many accounts receivable can be placed into a specific collection process
• pull rate r is the percentage of accounts to pull out for a specific intervention/collection process.
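As a rough illustration of the pull rate, the sketch below selects the top fraction r of accounts by a model score. The function name `pull_top`, the rounding rule, and the sample scores are assumptions, not taken from the slides.

```python
def pull_top(scores, r):
    """Return indices of the top fraction r of accounts, highest score first."""
    if not 0 < r <= 1:
        raise ValueError("pull rate r must be in (0, 1]")
    # Assumption: round to the nearest whole account, but pull at least one.
    k = max(1, round(r * len(scores)))
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]


# Ten hypothetical account scores and a 20% pull rate.
scores = [0.1, 0.9, 0.4, 0.7, 0.2, 0.8, 0.3, 0.6, 0.5, 0.05]
print(pull_top(scores, 0.2))  # [1, 5]
```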
10
Find Objective Function
• x is the target monetary measure
– e.g. collection amount, which directly determines the ROI
• Find a function y(e)
– e is the vector of independent variables
• such that accounts in the top r% by y correspond to those in the top r% by the target
11
Objective Function
• Maximizing the ROI can be formally defined as
• i = 0, 1, ..., n − 1, and n is the total number of accounts
12
Cost-sensitive learning Cont.
• P = set of positive samples
• N = set of negative samples
• qi, qj are both posterior probabilities of belonging to the positive class
• C00 = 0, C11 = 0
• C01 = x (the target monetary measure)
• C10 is not a constant and unknown
14
Regression & Ranking Models
• Regression Model
$y(e_i) = x_i, \quad i = 0, \ldots, n-1$
• Ranking Model
$y(e_i) > y(e_j), \quad (i, j) \in \{(i, j) \mid x_i > x_j,\ i, j = 0, \ldots, n-1\}$
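To make the ranking constraint concrete, the sketch below builds the pair set {(i, j) | xi > xj} and lists the pairs that a given scoring y fails to order correctly. The function names and the toy values are hypothetical.

```python
def ranking_pairs(x):
    """All ordered pairs (i, j) with x[i] > x[j]."""
    n = len(x)
    return [(i, j) for i in range(n) for j in range(n) if x[i] > x[j]]


def ranking_violations(y, x):
    """Pairs from ranking_pairs(x) for which y[i] > y[j] does not hold."""
    return [(i, j) for (i, j) in ranking_pairs(x) if not y[i] > y[j]]


x = [100.0, 0.0, 50.0]  # hypothetical target amounts
print(ranking_pairs(x))                        # [(0, 1), (0, 2), (2, 1)]
print(ranking_violations([0.9, 0.1, 0.5], x))  # []
print(ranking_violations([0.2, 0.1, 0.5], x))  # [(0, 2)]
```

The regression model asks for the stronger condition y(ei) = xi, while the ranking model only needs every pair in this set to be ordered correctly.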
15
Maximization: $\max \sum_{i \,:\, y(e_i) \in \text{Top } r\%} x_i$
Constrained Optimization
• 0 ≤ y ≤ 1
• decision threshold β (0 < β < 1)
• I(yi, β) is nondifferentiable
16
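The slides state only that I(yi, β) is nondifferentiable and do not show the surrogate the authors used. As an illustration, the sketch below assumes I(y, β) = 1 when y > β and 0 otherwise, and contrasts it with a logistic (sigmoid) surrogate. The logistic form is a common smooth stand-in chosen here as an assumption, not the paper's formulation; `indicator`, `smooth_indicator`, and `sharpness` are hypothetical names.

```python
import math


def indicator(y: float, beta: float) -> float:
    """Hard threshold (assumed form): 1 if the output exceeds beta, else 0."""
    return 1.0 if y > beta else 0.0


def smooth_indicator(y: float, beta: float, sharpness: float = 50.0) -> float:
    """Logistic surrogate: differentiable in y and beta.
    Larger `sharpness` tracks the hard threshold more closely.
    With 0 <= y, beta <= 1 the exponent stays within [-sharpness, sharpness]."""
    return 1.0 / (1.0 + math.exp(-sharpness * (y - beta)))


for y in (0.1, 0.5, 0.9):
    print(y, indicator(y, 0.5), round(smooth_indicator(y, 0.5), 4))
```

With a differentiable surrogate, both the summed amount over selected accounts and the achieved pull rate become smooth functions of the model parameters, which is what gradient-based training needs.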
Convert to Unconstrained Optimization
• minimizing the Lagrangian
• Improve results
– Mapping xi to a value between −1 and 1
20
Algorithm
• Parametric model
– differentiable objective function
• Multilayer perceptron (MLP) network with softmax outputs between 0 and 1
– single hidden layer
• This paper found that fixing β at 0.5 achieves almost the same results
21
Classification
• Classification
– Trained by mean squared error
• 35% of the account balance
• top r% of x
– Imbalanced data sets
• class prior is typically low
23
Weighted Classification & Regression
• Weighted Classification
– Weighted by x or a function of x
• Use sigmoid function to avoid extreme values of x
• Regression
– Map x to a value between 0 and 1 using the sigmoid function
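One way to squash a monetary amount into (0, 1) with a sigmoid is sketched below. The slides do not give the centering or scaling, so the `center` and `scale` parameters, their values, and the function name `squash` are assumptions for illustration.

```python
import math


def squash(x: float, center: float, scale: float) -> float:
    """Map x to (0, 1) with a logistic sigmoid, using a numerically stable form."""
    if scale <= 0:
        raise ValueError("scale must be positive")
    z = (x - center) / scale
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


# Hypothetical amounts in dollars; center and scale are illustrative choices.
for amount in (0.0, 36.0, 500.0, 1_000_000.0):
    print(amount, round(squash(amount, center=36.0, scale=100.0), 4))
```

An extreme amount saturates near 1 instead of dominating the sample weights or the regression target.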
24
Ranking
• C. Burges, T. Shaked, et al. Learning to rank using gradient descent. In Proc. of the 22nd Intl. Conf. on Machine Learning, 2005.
• Minimize
• is probability of xi > xj
• Cost function
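The formulas on this slide were lost in extraction. For orientation, the sketch below implements the pairwise cross-entropy cost from the cited Burges et al. paper, where the modeled probability is P_ij = e^o / (1 + e^o) with o = y_i − y_j and the cost is −P̄_ij·o + log(1 + e^o). Whether the slide showed exactly this form is an assumption; the function name and arguments are hypothetical.

```python
import math


def ranknet_cost(y_i: float, y_j: float, target_p: float = 1.0) -> float:
    """Pairwise cost -target_p * o + log(1 + e^o), with o = y_i - y_j.
    `target_p` is the target probability that item i ranks above item j."""
    o = y_i - y_j
    # log(1 + e^o), computed without overflow for large |o|
    softplus = max(o, 0.0) + math.log1p(math.exp(-abs(o)))
    return -target_p * o + softplus


print(round(ranknet_cost(0.0, 0.0), 4))  # 0.6931
print(ranknet_cost(2.0, 0.0) < ranknet_cost(0.0, 2.0))  # True
```

The cost shrinks as the model scores the higher-value account further above the lower-value one.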
25
Predicting Collectibility of Accounts Receivable
• Accounts receivable
– unpaid customer invoices
– money owed to a company by its customers
• Banks & Federal
– extends credit, offers payment installment plans, or makes assessments
• The collection industry serves an important role in the U.S. economy
– saves American families on average $331 a year
26
Goal
• Goal is to develop a generic predictive model which can be used to guide the agents’ collection efforts
27
Problem
• Identify a high-value segment which consists of 11% of the whole portfolio
– The 11% is chosen since the payer rate (percentage of paid accounts in the first six months) is 11%
• Data set = 684,600 accounts
• Account history & general demographic info
28
Detail
• Randomly split into 1:1 training and test sets
• Missing values
– continuous variables → mean + binary column
– categorical variables → conditional mean + conditional standard deviation
• r = 11% and fix β at 0.5
• γ = 0.01 and p = 2
• μ is updated at each iteration by $\mu_{t+1} = 0.75\,\mu_t$
– t is the iteration index
29
Pull Rate
• This figure shows convergence of pull rates achieved by the threshold β during the optimization. Line 1 is for the training set, and Line 2 shows the pull rate change over the test set.
30
Avg. Collection Amount
• This figure shows the improving average collection amount among the top 11% accounts during the optimization. Line 1 is for the training set, and Line 2 is over the test set.
31
Result
• The classification model is an ensemble of 25 MLP networks with a modified class prior between 0.02 and 0.5
• Weighted classification is weighted by
• The average collection amount over the whole portfolio is only $36
32