Beyond Classification and Ranking: Constrained Optimization of the ROI

Lian Yan & Patrick Baldasare

KDD’06

Outline

• Introduction
  – Example
• Related Work
• Algorithm
• Experiment
• Conclusion

Introduction

• Financial Services Industry
• Data Mining
• Classification
• Prediction

Return on Investment (ROI)

• ROI is the ratio of money gained or lost on an investment relative to the amount of money invested.

• $50/$1,000 = 5% ROI
• $20/$100 = 20% ROI
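Both examples compute the same ratio: gain divided by investment. Here is a minimal Python sketch. The function name `roi` is our own and is only for illustration.

```python
def roi(gain: float, investment: float) -> float:
    """Return on investment as a fraction: money gained (or lost) / money invested."""
    if investment == 0:
        raise ValueError("investment must be non-zero")
    return gain / investment


# Reproduce the two examples from the slide.
print(f"{roi(50, 1_000):.0%}")  # 5%
print(f"{roi(20, 100):.0%}")    # 20%
```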

Example

• Used a classifier to predict defection of mutual fund accounts for a major US mutual fund company.

• Positive samples are defined as those accounts with a net redemption amount of 35% or more of the account balance within a two-month window.
  – net redemption amount = redemption minus purchase
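The labeling rule can be written as a small function. This is a sketch under assumptions. `is_defection` is a hypothetical helper. Redemptions and purchases are assumed to be already summed over the two-month window. The balance is assumed to be taken at the start of that window.

```python
def is_defection(redemptions: float, purchases: float, balance: float,
                 threshold: float = 0.35) -> bool:
    """Return True if the account counts as a positive (defection) sample.

    Assumes redemptions and purchases are totals over the two-month window
    and balance is the account balance at the start of that window.
    """
    net_redemption = redemptions - purchases  # redemption minus purchase
    return balance > 0 and net_redemption >= threshold * balance


print(is_defection(40_000, 2_000, 100_000))  # True: net 38,000 >= 35,000
print(is_defection(30_000, 0, 100_000))      # False: net 30,000 < 35,000
```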

Real-world evaluation results

• Two levels of defection risk
• Three segments based on account values

Fixed budget

• ROI of the project is determined by the amount of redemptions prevented.

• Simply classifying does not enable the mutual fund company to reach out to those accounts with the highest redemption amount.

Example 2

• Predict collectibility of delinquent accounts receivable for credit card issuers
  – credit, demographic, and account data
  – binary class: whether payment is received within a certain period

Difference in What Is Maximized

• True positive rate among accounts placed in the collection process → classification accuracy

• Collectible amount from the collection process → ROI

Budget constraint

• The budget constraint determines
  – how many mutual fund accounts the customer service team can reach out to every month
  – how many accounts receivable can be placed into a specific collection process

• The pull rate r is the percentage of accounts pulled out for a specific intervention or collection process.
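The sketch below makes the difference between the two maximization targets concrete. It pulls the top r fraction of accounts by model score. For those accounts it reports both the number of true positives and the total amount. The function names and the toy data are hypothetical. Rounding r·n to the nearest integer is our own choice.

```python
def top_r_indices(scores, r):
    """Indices of the top r fraction of accounts by model score (pull rate r)."""
    k = round(r * len(scores))  # how many accounts the budget allows
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]


def evaluate_pull(scores, labels, amounts, r):
    """Return (true positives, total amount) among the pulled accounts."""
    pulled = top_r_indices(scores, r)
    true_positives = sum(labels[i] for i in pulled)
    collected = sum(amounts[i] for i in pulled)
    return true_positives, collected


labels = [1, 1, 0, 1, 0]          # paid within the period?
amounts = [10, 15, 0, 500, 0]     # amount collected from each account
scores_a = [0.9, 0.8, 0.1, 0.3, 0.2]
scores_b = [0.2, 0.1, 0.8, 0.9, 0.3]
print(evaluate_pull(scores_a, labels, amounts, r=0.4))  # (2, 25)
print(evaluate_pull(scores_b, labels, amounts, r=0.4))  # (1, 500)
```

Under the same budget, scorer A wins on true positives and scorer B wins on the amount collected.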

Finding the Objective Function

• Let x be the target monetary measure
  – e.g., the collection amount, which directly determines the ROI

• Find a function y(e)
  – e denotes the independent variables

• Goal: the accounts in the top r% by y should correspond to those in the top r% by the target x

Objective Function

• Maximizing the ROI can be formally defined as

  $\max_{y} \sum_{i:\, y(e_i) \in \text{top } r\%} x_i$

• i = 0, 1, ..., n − 1, and n is the total number of accounts

Cost-sensitive learning

• Minimize the cost

• Cost Matrix

Cost-sensitive learning Cont.

• P = set of positive samples
• N = set of negative samples
• q_i, q_j are the posterior probabilities of belonging to the positive class
• C00 = 0, C11 = 0
• C01 = x (the target monetary measure)
• C10 is not constant and is unknown
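The cost formula itself did not survive on the slide. We assume the usual expected-cost form with C00 = C11 = 0. The cost to minimize would then be $\sum_{i \in P} x_i (1 - q_i) + \sum_{j \in N} C_{10}\, q_j$. The sketch below evaluates that assumed form. C10 is unknown, so the per-sample C10 values passed in are purely hypothetical.

```python
def expected_cost(positives, negatives, c10):
    """Expected misclassification cost with C00 = C11 = 0 (assumed form).

    positives: list of (x_i, q_i); C01 = x_i is the monetary measure and
               q_i is the posterior probability of the positive class.
    negatives: list of q_j for the negative samples.
    c10:       list of per-sample false-positive costs (hypothetical values,
               since the slides note C10 is unknown and not constant).
    """
    if len(c10) != len(negatives):
        raise ValueError("need one C10 value per negative sample")
    missed = sum(x * (1.0 - q) for x, q in positives)           # positives scored as negative
    false_alarms = sum(c * q for c, q in zip(c10, negatives))  # negatives scored as positive
    return missed + false_alarms


print(expected_cost([(100, 0.9), (50, 0.5)], [0.2], [10]))  # approximately 37
```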

Regression & Ranking Models

• Regression model: $y(e_i) = x_i$, $i = 0, \ldots, n-1$

• Ranking model: $y(e_i) > y(e_j)$ for all $(i, j) \in \{(i, j) \mid x_i > x_j,\ i, j = 0, \ldots, n-1\}$

• Maximization: $\max_{y} \sum_{i:\, y(e_i) \in \text{top } r\%} x_i$
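The gap between the ranking model and the maximization objective can be checked numerically. The sketch below uses hypothetical helper names and toy data. It measures how many pairs a scoring y orders correctly. Separately, it computes the sum of x over the top r% by y. A scoring can satisfy more pairwise constraints yet capture far less of the target.

```python
from itertools import combinations


def pairwise_accuracy(y, x):
    """Fraction of pairs with x_i != x_j that y orders the same way as x,
    i.e. how often the ranking constraint y(e_i) > y(e_j) for x_i > x_j holds."""
    correct = total = 0
    for i, j in combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue  # tied targets impose no constraint
        total += 1
        if (y[i] - y[j]) * (x[i] - x[j]) > 0:
            correct += 1
    return correct / total if total else 1.0


def top_r_sum(y, x, r):
    """Maximization objective: sum of x_i over the accounts in the top r% by y."""
    k = round(r * len(y))
    top = sorted(range(len(y)), key=lambda i: y[i], reverse=True)[:k]
    return sum(x[i] for i in top)


x = [500, 40, 30, 20, 10]          # target amounts
y1 = [0.9, 0.1, 0.2, 0.3, 0.4]     # gets the single largest account right
y2 = [0.1, 0.9, 0.8, 0.7, 0.6]     # orders the small accounts correctly
print(pairwise_accuracy(y1, x), top_r_sum(y1, x, 0.2))  # 0.4 500
print(pairwise_accuracy(y2, x), top_r_sum(y2, x, 0.2))  # 0.6 40
```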

Constrained Optimization

• 0 ≤ y ≤ 1
• Decision threshold β (0 < β < 1)

• I(y_i, β) is non-differentiable
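Under our reading, the slide sets up this constrained problem: maximize Σ_i x_i I(y_i, β) subject to a pull rate of (1/n) Σ_i I(y_i, β) = r. The sketch evaluates both quantities for fixed scores. We assume I(y, β) is 1 when y ≥ β (rather than y > β). I is a step function, so its gradient with respect to y is zero almost everywhere. This is why gradient methods cannot optimize it directly.

```python
def indicator(y, beta):
    """I(y, beta): 1 if the score reaches the decision threshold, else 0."""
    return 1 if y >= beta else 0


def objective_and_pull_rate(ys, xs, beta):
    """Hard objective sum_i x_i * I(y_i, beta) and the achieved pull rate
    (1/n) * sum_i I(y_i, beta), which the constraint requires to equal r."""
    flags = [indicator(y, beta) for y in ys]
    objective = sum(x * f for x, f in zip(xs, flags))
    pull_rate = sum(flags) / len(flags)
    return objective, pull_rate


print(objective_and_pull_rate([0.7, 0.4, 0.6, 0.1], [100, 80, 30, 5], beta=0.5))  # (130, 0.5)
```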

Differentiable Approximation

• p > 1
• 0 ≤ γ < 1
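The slide's formula for the smooth approximation f(y_i, β) was lost. The sketch uses a stand-in instead: f(y, β) = y^p / (y^p + β^p). This is explicitly not the paper's formula, and it gives γ no role. It approaches the step I(y, β) as p grows. Its derivative is included for gradient-based training.

```python
def smooth_step(y, beta, p):
    """Differentiable stand-in for I(y, beta): y**p / (y**p + beta**p).

    Chosen for illustration only; intended for 0 <= y <= 1, 0 < beta < 1, p > 1.
    """
    yp, bp = y ** p, beta ** p
    return yp / (yp + bp)


def smooth_step_grad(y, beta, p):
    """Derivative with respect to y: p * y**(p-1) * beta**p / (y**p + beta**p)**2."""
    yp, bp = y ** p, beta ** p
    return p * y ** (p - 1) * bp / (yp + bp) ** 2


for p in (2, 8, 32):
    # For a score slightly above the threshold, f moves toward 1 as p grows.
    print(p, round(smooth_step(0.6, 0.5, p), 3))
```

With p = 2, a score of 0.6 against β = 0.5 gives f ≈ 0.59. That is far from 1, which is the issue the next slide raises.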

Another Problem

• f(y_i, β) is often not close to 1

• so $\frac{1}{n}\sum_{i=0}^{n-1} f(y_i, \beta) \neq \frac{1}{n}\sum_{i=0}^{n-1} I(y_i, \beta) = r$

Approximating the Relative Ratio

• p > 1

Convert to Unconstrained Optimization

• Minimize the Lagrangian

• To improve results, map x_i to a value between −1 and 1
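The slide does not say how x_i is mapped into [−1, 1]. One simple option, assumed here, is linear min-max scaling. A sigmoid- or tanh-based squashing would be an equally plausible reading.

```python
def scale_to_unit_interval(xs):
    """Linearly map each x_i into [-1, 1] (min-max scaling, an assumed choice)."""
    lo, hi = min(xs), max(xs)
    if hi == lo:
        return [0.0 for _ in xs]  # constant column: map everything to 0
    return [2.0 * (x - lo) / (hi - lo) - 1.0 for x in xs]


print(scale_to_unit_interval([0, 50, 100]))  # [-1.0, 0.0, 1.0]
```

Min-max scaling is sensitive to a single extreme amount. A squashing function would compress such outliers instead.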

Algorithm

• Parametric model
  – differentiable objective function

• Multilayer perceptron (MLP) network with softmax outputs between 0 and 1
  – single hidden layer

• The paper found that fixing β at 0.5 achieves almost the same results as optimizing β
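A single-hidden-layer network whose output lies between 0 and 1 can be sketched in plain Python. The tanh hidden units, the weight initialization, and the class name `TinyMLP` are all assumptions. With two output units, a softmax over the logits (z, 0) reduces to the logistic sigmoid of z, which is what the output line computes. Training, meaning the gradients of the smoothed objective, is omitted.

```python
import math
import random


class TinyMLP:
    """Single-hidden-layer perceptron with an output y in (0, 1)."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, e):
        """Score y(e) for one account's input vector e."""
        hidden = [math.tanh(sum(w * v for w, v in zip(row, e)) + b)
                  for row, b in zip(self.w1, self.b1)]
        z = sum(w * h for w, h in zip(self.w2, hidden)) + self.b2
        # Two-unit softmax over logits (z, 0) equals the logistic sigmoid of z.
        return 1.0 / (1.0 + math.exp(-z))


net = TinyMLP(n_in=3, n_hidden=4, seed=1)
print(net.forward([0.2, -1.0, 0.5]))  # a score between 0 and 1
```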

Comparing Methods

• Classification
• Weighted classification
• Ranking
• Regression

Classification

• Classification
  – Trained by minimizing mean squared error
  – Positive class defined either by the 35%-of-account-balance rule or as the top r% of x
  – Imbalanced data sets: the positive class prior is typically low
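One classification setup labels the top r% of accounts by x as the positive class. That labeling can be sketched as follows. The helper is hypothetical, and ties at the cut are broken by position. The resulting class prior equals r, which is low when r = 11%.

```python
def top_r_labels(xs, r):
    """1 for accounts whose target x is in the top r fraction, else 0.
    sorted() is stable, so ties at the cut go to the earlier account."""
    k = round(r * len(xs))
    top = set(sorted(range(len(xs)), key=lambda i: xs[i], reverse=True)[:k])
    return [1 if i in top else 0 for i in range(len(xs))]


xs = [5, 300, 0, 40, 120, 0, 0, 10, 0, 0]
labels = top_r_labels(xs, r=0.2)
print(labels)                     # [0, 1, 0, 0, 1, 0, 0, 0, 0, 0]
print(sum(labels) / len(labels))  # class prior: 0.2
```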

Weighted Classification & Regression

• Weighted classification
  – Weighted by x or a function of x
  – A sigmoid function is used to avoid extreme values of x

• Regression
  – Map x to a value between 0 and 1 using the sigmoid function
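Squashing x with a sigmoid limits the influence of a few very large amounts. This applies both when x is a sample weight and when it is a regression target. The sketch standardizes x before applying the sigmoid. That standardization step and the helper names are our own assumptions.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def squash(xs):
    """Map each amount into (0, 1) via a sigmoid of its standardized value."""
    n = len(xs)
    mean = sum(xs) / n
    std = (sum((x - mean) ** 2 for x in xs) / n) ** 0.5 or 1.0  # guard std = 0
    return [sigmoid((x - mean) / std) for x in xs]


amounts = [0, 20, 36, 50, 5000]
weights = squash(amounts)  # usable as sample weights or as regression targets
print([round(w, 3) for w in weights])
```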

Ranking

• C. Burges, T. Shaked, et al. Learning to rank using gradient descent. In Proc. of the 22nd Intl. Conf. on Machine Learning, 2005.

• Minimize

• $\bar{P}_{ij}$ is the target probability that $x_i > x_j$
• Cost function
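The pairwise cost from Burges et al. (2005) is a cross-entropy between two probabilities. One is the target probability $\bar{P}_{ij}$ that x_i > x_j. The other is the modeled probability $P_{ij} = 1 / (1 + e^{-o_{ij}})$, where $o_{ij} = o_i - o_j$ is the difference of model outputs. The cost simplifies to $C = -\bar{P}_{ij}\, o_{ij} + \log(1 + e^{o_{ij}})$. The sketch computes this in an overflow-safe form. The function name is hypothetical.

```python
import math


def ranknet_cost(o_i, o_j, p_target):
    """Pairwise cross-entropy cost of Burges et al. (2005).

    o_i, o_j: model outputs for accounts i and j.
    p_target: target probability that i ranks above j, e.g. 1.0 if x_i > x_j,
              0.5 if x_i == x_j and 0.0 if x_i < x_j.
    """
    o_ij = o_i - o_j
    # C = -p_target * o_ij + log(1 + exp(o_ij)); the softplus term is written
    # as max(o, 0) + log1p(exp(-|o|)) so large |o_ij| cannot overflow.
    softplus = max(o_ij, 0.0) + math.log1p(math.exp(-abs(o_ij)))
    return -p_target * o_ij + softplus


print(ranknet_cost(2.0, 0.0, 1.0))  # small: correct order with a margin
print(ranknet_cost(0.0, 2.0, 1.0))  # large: the pair is ordered the wrong way
```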

Predicting Collectibility of Accounts Receivable

• Accounts receivable
  – unpaid customer invoices
  – money owed to a company by its customers

• Banks & the federal government
  – extend credit, offer payment installment plans, or make assessments

• The collection industry plays an important role in the U.S. economy
  – saves American families $331 a year on average

• The goal is to develop a generic predictive model that can guide the agents’ collection efforts

Problem

• Identify a high-value segment consisting of 11% of all accounts
  – 11% is chosen because the payer rate (percentage of accounts paid within the first six months) is 11%

• Data set: 684,600 accounts
• Account history & general demographic info

Detail

• Randomly split 1:1 into training and test sets
• Missing values
  – continuous variables → mean + binary missing-value indicator column
  – categorical variables → conditional mean + conditional standard deviation

• r = 11%, with β fixed at 0.5
• γ = 0.01 and p = 2
• At each iteration, μ is updated by $\mu_{t+1} = 0.75\,\mu_t$
  – t is the iteration index
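The split and missing-value handling described on this slide can be sketched as below. We read "conditional mean + conditional standard deviation" as follows: each categorical value is replaced by the mean and standard deviation of the target x among accounts with that value. That interpretation, the helper names, and the fixed seed are assumptions.

```python
import random
import statistics


def impute_continuous(values):
    """Fill missing values (None) with the mean and add a 0/1 missing-flag column."""
    observed = [v for v in values if v is not None]
    mean = statistics.fmean(observed) if observed else 0.0
    filled = [mean if v is None else v for v in values]
    flags = [1 if v is None else 0 for v in values]
    return filled, flags


def encode_categorical(categories, target):
    """Replace each category with (mean, std) of the target within that category.
    A missing category (None) is simply treated as one more level."""
    groups = {}
    for c, t in zip(categories, target):
        groups.setdefault(c, []).append(t)
    stats = {c: (statistics.fmean(ts), statistics.pstdev(ts)) for c, ts in groups.items()}
    return [stats[c] for c in categories]


def split_half(rows, seed=0):
    """Random 1:1 split into training and test sets."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    mid = len(rows) // 2
    return rows[:mid], rows[mid:]


print(impute_continuous([1.0, None, 3.0]))  # ([1.0, 2.0, 3.0], [0, 1, 0])
print(encode_categorical(["a", "a", None], [10.0, 20.0, 5.0]))  # [(15.0, 5.0), (15.0, 5.0), (5.0, 0.0)]
train, test = split_half(range(10))
```

In practice the categorical statistics would be computed on the training half only and then applied to the test half, so the test target does not leak into the features.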

Pull Rate

• This figure shows convergence of pull rates achieved by the threshold β during the optimization. Line 1 is for the training set, and Line 2 shows the pull rate change over the test set.

Avg. Collection Amount

• This figure shows the improving average collection amount among the top 11% accounts during the optimization. Line 1 is for the training set, and Line 2 is over the test set.

Result

• The classification model is an ensemble of 25 MLP networks with modified class priors between 0.02 and 0.5

• Weighted classification models are weighted by

• The average collection amount over the whole portfolio is only $36

Conclusion

• This paper proposed a new learning algorithm which focuses on maximizing the monetary measure under a fixed budget constraint.
