Adventures in Segmentation: Using Applied Data Mining to Add Business Value
Drew Minkin
Agenda
The Value Add of Data Mining
Segmentation 101
Segmentation Tools in Analysis Services
Methodology for Segmentation Analysis
Building Confidence in your Model
The Value Add of Data Mining
Statistics for the Computer Age
◦ Evolution, not revolution, with traditional statistics
◦ Statistics enriched with the brute-force capabilities of modern computing
◦ Associated with industrial-sized data sets
Value Add - What is Data Mining?
Value Add - Data Mining in the BI Spectrum
[Chart: the SQL Server 2008 BI spectrum (static reports, ad hoc reports, OLAP, data mining, business knowledge) plotted from easy to difficult against relative business value.]
VoterVault
◦ From the mid-1990s
◦ Massive get-out-the-vote drive for those expected to vote Republican
Demzilla
◦ Names typically have 200 to 400 information items
Value Add – Data Mining and Democracy
“The quiet statisticians have changed our world; not by discovering new facts or technical developments, but by changing the ways that we reason, experiment and form our opinions.”
-- Ian Hacking
Value Add – The Promise of Data Mining
Strategic
Tactical
Operational
Value Add – Spheres of Influence
Improved efficiency
◦ Inventory management
◦ Risk management
Value Add – Operational Benefits
The Bottom Line
Increased agility
Brand building
◦ Differentiate the message
◦ "Relationship" building
Value Add – Strategic Benefits
Reduction of costs
◦ Transactional leakage
◦ Outlier analysis
Value Add – Tactical Benefits
Identify a group of customers who are expected to attrite
Conduct marketing campaigns to change behavior in the desired direction
◦ Change their behavior, reducing the attrition rate
Value Add - Customer Attrition Analysis
Slow attriters: Customers who slowly pay down their outstanding balance until they become inactive.
Fast attriters: Customers who quickly pay down their balance and either lapse it or close it via phone call or write-in.
Value Add - Target Result
Credit models
Retention models
Elasticity models
Cross-sell models
Lifetime Value models
Agent/agency monitoring
Target marketing
Fraud detection
Value Add - Sample Applications
Segmentation 101
Unsupervised learning
◦ Associations and patterns across many entities; no target information
◦ Market basket analysis ("diapers and beer")
Supervised learning
◦ Predict the value of a target variable from well-defined predictive variables
◦ Credit / non-credit scoring engines
Segmentation – Machine Learning
Data warehouse: credit card data warehouse containing about 200 product-specific fields
Third-party data: a set of account-related demographic and credit bureau information
Segmentation files: set of account-related segmentation values based on our client's segmentation scheme, which combines risk, profitability and external potential
Payment database: database that stores all checks processed; the database can categorize the source of checks
Segmentation – Sample Data Sources
Methodology for Segmentation Analysis
Methodology – Distribution of Effort
Methodology – Segmentation Lifecycle
Acquire
Cleanse
Analyze
Model
Test
Deploy
Monitor
Refine
Research/evaluate possible data sources
◦ Availability
◦ Hit rate
◦ Implementability
◦ Cost-effectiveness
Extract/purchase data
Check data for quality (QA)
At this stage, data is still in a "raw" form
Often start with voluminous transactional data
Much of the data mining process is "messy"
Methodology – Acquiring Raw Data
Reflects data changes over time
Recognizes and removes statistically insignificant fields
Defines and introduces the "target" field
Allows for second-stage preprocessing and statistical analysis
Methodology – Goals of Refinement
Scoring engine
◦ Formula that classifies or separates policies (or risks, accounts, agents…) into profitable vs. unprofitable, retaining vs. non-retaining…
◦ (Non-)linear equation f( ) of several predictive variables
◦ Produces a continuous range of scores: score = f(X1, X2, …, XN)
Methodology - Scoring Engines
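The scoring formula above can be sketched as a plain function. A minimal sketch only: the weights, bias, and the two example predictors are hypothetical stand-ins, not a fitted model.

```python
# Minimal linear scoring engine: score = f(X1, X2, ..., XN).
# Weights here are hypothetical; in practice they come from model fitting.
def score(weights, bias, features):
    """Return a continuous score from a linear combination of predictors."""
    return bias + sum(w * x for w, x in zip(weights, features))

# Hypothetical model with two predictors (e.g. balance ratio, months on book).
weights = [0.8, -0.3]
bias = 0.1
print(round(score(weights, bias, [0.5, 2.0]), 2))  # -0.1
```

A non-linear engine would simply replace the linear combination with a different f( ) of the same predictors.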
Methodology – Deployed Model
[Diagram: training data feeds a DM engine to build the mining model; at prediction time, data to predict (DB data, client data, application log, or "just one row" such as a new entry or new transaction) is scored by the DM engine against the mining model to produce predicted data.]
Randomly divide data into three pieces
◦ Training data
◦ Test data
◦ Validation data
Use training data to fit models
Score the test data to create a lift curve
Perform the train/test steps iteratively until you have a model you're happy with
During this iterative phase, validation data is set aside in a "lock box"
Score the validation data and produce a lift curve
◦ An unbiased estimate of future performance
Methodology - Testing
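A minimal sketch of the three-way split described above. The 60/20/20 proportions, the fixed seed, and the function name are assumptions for illustration.

```python
import random

def three_way_split(rows, seed=0, frac=(0.6, 0.2, 0.2)):
    """Randomly divide data into training, test, and validation sets.
    The validation set stays in a "lock box" until the final model is chosen."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n = len(shuffled)
    n_train = int(frac[0] * n)
    n_test = int(frac[1] * n)
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    validation = shuffled[n_train + n_test:]
    return train, test, validation

data = list(range(100))
train, test, validation = three_way_split(data)
print(len(train), len(test), len(validation))  # 60 20 20
```

Only the train and test pieces take part in the iterative fitting loop; the validation piece is scored once, at the end.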
Examine correlations among the variables
Weed out redundant, weak, or poorly distributed variables
Model design
Build candidate models
◦ Regression/GLM
◦ Decision Trees/MARS
◦ Neural Networks
Select the final model
Methodology - Multivariate Analysis
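The correlation-weeding step might look like the following sketch. The 0.9 threshold, the variable names, and the toy data are hypothetical.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def weed_redundant(variables, threshold=0.9):
    """Keep a variable only if it is not highly correlated with one already kept."""
    keep = []
    for name, values in variables:
        if all(abs(pearson(values, kept_vals)) <= threshold for _, kept_vals in keep):
            keep.append((name, values))
    return [name for name, _ in keep]

vars_ = [
    ("balance",    [1.0, 2.0, 3.0, 4.0]),
    ("balance_x2", [2.1, 4.0, 6.2, 8.0]),   # nearly a copy of balance
    ("age",        [35.0, 20.0, 57.0, 41.0]),
]
print(weed_redundant(vars_))  # ['balance', 'age']
```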
Segmentation Tools in Analysis Services
Data Mining - Algorithm Matrix
[Matrix mapping analytical tasks (classification, estimation, segmentation, association, forecasting, text analysis, advanced data exploration) to the available algorithms: Decision Trees, Linear Regression, Logistic Regression, Naïve Bayes, Neural Nets, Clustering, Sequence Clustering, Time Series, and Association Rules.]
Data Mining - SQL-Server Algorithms
Decision Trees
Time Series
Neural Net
Clustering
Sequence Clustering
Association
Naïve Bayes
Linear and Logistic Regression
Offline and online modes
◦ Everything you do stays on the server
◦ Offline requires server admin privileges to deploy
Define Data Sources and Data Source Views
Define Mining Structure and Models
Train (process) the Structures
Verify accuracy
Explore and visualise
Perform predictions
Deploy for other users
Regularly update and re-validate the Model
Data Mining - Blueprint for Toolset
SQL Server 2008
◦ X iterations of retraining and retesting the model
◦ Results from each test statistically collated
◦ Model deemed accurate (and perhaps reliable) when variance is low and results meet expectations
Data Mining - Cross-Validation
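The retrain/retest-and-collate idea can be illustrated with a hand-rolled k-fold loop. This is a conceptual sketch, not the SQL Server 2008 implementation; the fold count and the toy scorer are assumptions.

```python
def k_folds(rows, k):
    """Split rows into k folds; each iteration holds one fold out for testing."""
    return [rows[i::k] for i in range(k)]

def cross_validate(rows, k, train_and_score):
    """Retrain/retest k times and collate the scores; low variance across
    folds suggests the accuracy estimate is reliable."""
    folds = k_folds(rows, k)
    scores = []
    for i in range(k):
        test = folds[i]
        train = [r for j, f in enumerate(folds) if j != i for r in f]
        scores.append(train_and_score(train, test))
    mean = sum(scores) / k
    variance = sum((s - mean) ** 2 for s in scores) / k
    return mean, variance

# Hypothetical scorer: fraction of test rows above the training mean.
def scorer(train, test):
    m = sum(train) / len(train)
    return sum(1 for t in test if t > m) / len(test)

mean, var = cross_validate(list(range(100)), 5, scorer)
print(round(mean, 2), round(var, 4))  # 0.49 0.0004
```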
Data Mining - Microsoft Decision Trees
Use for:
◦ Classification: churn and risk analysis
◦ Regression: predict profit or income
◦ Association analysis based on multiple predictable variables
Builds one tree for each predictable attribute
Fast
Data Mining - Microsoft Naïve Bayes
Use for:
◦ Classification
◦ Association with multiple predictable attributes
Assumes all inputs are independent
Simple classification technique based on conditional probability
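A toy version of the conditional-probability idea behind Naïve Bayes (not Microsoft's implementation): the churn data, feature names, and add-one smoothing are made up for illustration.

```python
from collections import Counter, defaultdict

def train_naive_bayes(rows, labels):
    """Count P(class) and P(feature value | class) from discrete data.
    Naive assumption: all inputs are independent given the class."""
    class_counts = Counter(labels)
    cond = defaultdict(Counter)  # (feature_index, class) -> value counts
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            cond[(i, label)][value] += 1
    return class_counts, cond

def predict(model, row):
    """Pick the class maximizing P(class) * product of P(value | class)."""
    class_counts, cond = model
    total = sum(class_counts.values())
    best, best_p = None, -1.0
    for label, count in class_counts.items():
        p = count / total
        for i, value in enumerate(row):
            # Add-one smoothing, assuming two possible values per feature.
            p *= (cond[(i, label)][value] + 1) / (count + 2)
        if p > best_p:
            best, best_p = label, p
    return best

# Hypothetical churn data: (tenure_band, has_promo) -> outcome.
rows = [("short", "no"), ("short", "no"), ("long", "yes"), ("long", "yes")]
labels = ["churn", "churn", "stay", "stay"]
model = train_naive_bayes(rows, labels)
print(predict(model, ("short", "no")))  # churn
```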
Applied to
◦ Segmentation: customer grouping, mailing campaigns
◦ Also: classification and regression
◦ Anomaly detection
Discrete and continuous attributes
Note:
◦ "Predict Only" attributes are not used for clustering
Data Mining - Clustering
Applied to
◦ Classification
◦ Regression
Great for finding complicated relationships among attributes
◦ Results can be difficult to interpret
Uses the gradient descent method
Data Mining - Neural Network
[Diagram: a neural network with an input layer (age, education, sex, income), hidden layers, and an output layer predicting loyalty.]
Data Mining - Sequence Clustering
Analysis of:
◦ Customer behaviour
◦ Transaction patterns
◦ Click stream
◦ Customer segmentation
◦ Sequence prediction
Mix of clustering and sequence technologies
◦ Groups individuals based on their profiles, including sequence data
To discover the most likely beginnings, paths, and ends of a customer's journey through our domain, consider using:
◦ Association Rules
◦ Sequence Clustering
Data Mining - What is a Sequence?
Data Mining - Sequence Data
Cust ID | Age | Marital Status | Car Purchases (Seq ID: Brand)
1       | 35  | M              | 1: Porch-A, 2: Bamborgini, 3: Kexus
2       | 20  | S              | 1: Wagen, 2: Voovo, 3: Voovo
3       | 57  | M              | 1: Voovo, 2: T-Yota
Your “if” statement will test the value returned from a prediction – typically, predicted probability or outcome
Steps:
1. Build a case (a set of attributes) representing the transaction you are processing at the moment
   E.g. the shopping basket of a customer plus their shipping info
2. Execute a "SELECT ... PREDICTION JOIN" on the pre-loaded mining model
3. Read the returned attributes, especially the case probability for some outcome
   E.g. probability > 50% that "TransactionOutcome=ShippingDeliveryFailure"
4. Your application has just made an intelligent decision!
5. Remember to refresh and retest the model regularly – daily?
Data Mining – Minor Introduction to DMX
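The application-side logic of steps 1 to 4 might look like the sketch below. The predict_proba function is a hypothetical stand-in for the DMX "SELECT ... PREDICTION JOIN" call, and its rule and the case attributes are invented for illustration.

```python
def predict_proba(case):
    """Stand-in for querying the deployed mining model; pretends to return the
    probability that TransactionOutcome = ShippingDeliveryFailure."""
    # Hypothetical rule for illustration only: mismatched countries look risky.
    return 0.8 if case.get("ship_country") != case.get("bill_country") else 0.1

def decide(case, threshold=0.5):
    """Steps 3-4: test the returned probability and make the decision."""
    return "hold_for_review" if predict_proba(case) > threshold else "ship"

case = {"basket_total": 120.0, "ship_country": "NZ", "bill_country": "US"}
print(decide(case))  # hold_for_review
```

In a real deployment the probability would come back from the mining model, and the threshold would be tuned against the lift curve rather than hard-coded.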
Data Mining – Detailed Workflow
Data Mining – Detailed Mining Model
Building Confidence in your Segmentation
Which target variable to use?
◦ Frequency & severity
◦ Loss ratio, other profitability measures
◦ Binary targets: defection, cross-sell, etc.
How to prepare the target variable?
◦ Period: one year or multi-year?
◦ Losses evaluated as of what date?
◦ Cap large losses? Catastrophe losses?
◦ How / whether to re-rate, adjust premium?
◦ What counts as a "retaining" policy?
Building Confidence - Model Design
Approaches
◦ Change the algorithm
◦ Change model parameters
◦ Change inputs/outputs to avoid bad correlations
◦ Clean the data set
Perhaps there are no good patterns in the data
◦ Verify statistics (Data Explorer)
Building Confidence - Improving Models
Capping
◦ Outliers are reduced in influence to produce better estimates
Binning
◦ Small and insignificant levels of character variables are regrouped
Box-Cox transformations
◦ Commonly included, especially the square root and logarithm
Johnson transformations
◦ Performed on numeric variables to make them more "normal"
Weight of evidence
◦ Created for character variables and binned numeric variables
Building Confidence – Alternate Methods
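Capping and a log transform (one of the common Box-Cox-style choices above) might be sketched as follows. The 95th-percentile cap and the toy loss data are assumptions for illustration.

```python
from math import log

def cap(values, percentile=0.95):
    """Capping: pull outliers in to the given percentile to reduce their influence."""
    ordered = sorted(values)
    limit = ordered[int(percentile * (len(ordered) - 1))]
    return [min(v, limit) for v in values]

def log_transform(values):
    """Log transform for right-skewed variables (log1p to handle zeros)."""
    return [log(1 + v) for v in values]

losses = [100, 120, 90, 110, 95, 10000]  # one extreme outlier
print(cap(losses))  # [100, 120, 90, 110, 95, 120]
```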
Building Confidence - Confusion Matrix
1241 correct predictions (516 + 725)
35 incorrect predictions (25 + 10)
The model scored 1276 cases (1241 + 35)
The error rate is 35/1276 = 0.0274
The accuracy rate is 1241/1276 = 0.9726
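The arithmetic on this slide, as a small helper over the four confusion-matrix counts:

```python
def rates(tp, tn, fp, fn):
    """Error and accuracy rates from confusion-matrix counts."""
    total = tp + tn + fp + fn
    return (fp + fn) / total, (tp + tn) / total

# Counts from this slide: 516 + 725 correct, 25 + 10 incorrect.
error, accuracy = rates(516, 725, 25, 10)
print(round(error, 4), round(accuracy, 4))  # 0.0274 0.9726
```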
"All models are wrong, but some are useful."
-- George Box
Building Confidence – Warning Signs
Extrapolation
◦ Applying models from unrelated disciplines
Equality
◦ The real world contains a surprising amount of uncertainty, fuzziness, and precariousness
Copula
◦ Binding probabilities together can mask errors
Distribution functions
◦ Small miscalculations can make coincidences look like certainties
Gamma
◦ Human behavior is difficult to quantify as a linear parameter
Building Confidence – Li's Revenge
Building Confidence - Lift Curves
Sort data by score
Break the dataset into 10 equal pieces
Best "decile": lowest score, lowest loss ratio
Worst "decile": highest score, highest loss ratio
Difference: "lift"
Lift = segmentation power
Lift translates into ROI of the modeling project
~Top 5% of 750,000 customers
◦ 2 groups of 10,000 customers from random sampling
◦ 37,500 top customers from the prediction list sorted by score
Group 1
◦ Engaged or offered incentives by the marketing department
Group 2
◦ No action
Results
◦ Group 1 has an attrition rate of 0.8%
◦ Group 2 has an attrition rate of 10.6%
◦ The average attrition rate is 2.2%
◦ Lift is 4.8 (10.6% / 2.2%)
Building Confidence – Vetted Results
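The lift arithmetic from this slide, as a one-line helper over the two rates:

```python
def lift(segment_rate, average_rate):
    """Lift = response rate in the targeted segment / overall average rate."""
    return segment_rate / average_rate

# Figures from the slide: untargeted group attrition 10.6%, average 2.2%.
print(round(lift(0.106, 0.022), 1))  # 4.8
```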
Discussion
Xiaohua Hu, Drexel University
Jerome Friedman, Trevor Hastie, Robert Tibshirani, The Elements of Statistical Learning
James Guszcza, Deloitte
Jeff Kaplan, Apollo Data Technologies
Rafal Lukawiecki, Project Botticelli Ltd
David L. Olson, University of Nebraska Lincoln
Donald Farmer, ZhaoHui Tang and Jamie MacLennan, Microsoft
ASA Corporation
Richard Boire, Boire Filler Group
John Spooner, SAS Corporation
Shin-Yuan Hung, Hsiu-Yu Wang, National Chung-Cheng University
Felix Salmon and Chris Anderson, Wired Magazine
Credits