Adventures in Segmentation: Using Applied Data Mining to Add Business Value
Drew Minkin
Agenda
The Value Add of Data Mining
Segmentation 101
Segmentation Tools in Analysis Services
Methodology for Segmentation Analysis
Building Confidence in your Model
The Value Add of Data Mining
Statistics for the Computer Age
◦ Evolution, not revolution, with traditional statistics
◦ Statistics enriched with the brute-force capabilities of modern computing
◦ Associated with industrial-sized data sets
Value Add - What is Data Mining?
Value Add - Data Mining in the BI Spectrum
[Chart: the SQL Server 2008 BI spectrum (static reports, ad hoc reports, OLAP, data mining, business knowledge) plotted from easy to difficult against relative business value.]
VoterVault
◦ From the mid-1990s
◦ Massive get-out-the-vote drive for those expected to vote Republican
Demzilla
◦ Names typically have 200 to 400 information items
Value Add – Data Mining and Democracy
“The quiet statisticians have changed our world; not by discovering new facts or technical developments, but by changing the ways that we reason, experiment and form our opinions.”
-- Ian Hacking
Value Add – The Promise of Data Mining
Strategic
Tactical
Operational
Value Add – Spheres of Influence
Improved efficiency
◦ Inventory management
◦ Risk management
Value Add – Operational Benefits
The Bottom Line
Increased agility
Brand building
◦ Differentiate the message
◦ "Relationship" building
Value Add – Strategic Benefits
Reduction of costs
◦ Transactional leakage
◦ Outlier analysis
Value Add – Tactical Benefits
Identify a group of customers who are expected to attrite
Conduct marketing campaigns to change behavior in the desired direction
◦ Change their behavior, reducing the attrition rate
Value Add - Customer Attrition Analysis
Slow attriters: Customers who slowly pay down their outstanding balance until they become inactive.
Fast attriters: Customers who quickly pay down their balance and either lapse it or close it via phone call or write-in.
Value Add - Target Result
Credit models
Retention models
Elasticity models
Cross-sell models
Lifetime Value models
Agent/agency monitoring
Target marketing
Fraud detection
Value Add - Sample Applications
Segmentation 101
Unsupervised learning
◦ Associations and patterns across many entities; no target information
◦ Market basket analysis ("diapers and beer")
Supervised learning
◦ Predict the value of a target variable from well-defined predictive variables
◦ Credit / non-credit scoring engines
Segmentation – Machine Learning
Data warehouse: credit card data warehouse containing about 200 product-specific fields
Third-party data: a set of account-related demographic and credit bureau information
Segmentation files: set of account-related segmentation values based on our client's segmentation scheme, which combines risk, profitability and external potential
Payment database: database that stores all checks processed; the database can categorize the source of checks
Segmentation – Sample Data Sources
Methodology for Segmentation Analysis
Methodology – Distribution of Effort
Methodology – Segmentation Lifecycle
Acquire
Cleanse
Analyze
Model
Test
Deploy
Monitor
Refine
Research/evaluate possible data sources
◦ Availability
◦ Hit rate
◦ Implementability
◦ Cost-effectiveness
Extract/purchase data
Check data for quality (QA)
At this stage, data is still in a "raw" form
Often start with voluminous transactional data
Much of the data mining process is "messy"
Methodology – Acquiring Raw Data
Reflects data changes over time
Recognizes and removes statistically insignificant fields
Defines and introduces the "target" field
Allows for second-stage preprocessing and statistical analysis
Methodology – Goals of Refinement
Scoring engine
◦ Formula that classifies or separates policies (or risks, accounts, agents…) into profitable vs. unprofitable, retaining vs. non-retaining…
◦ (Non-)linear equation f( ) of several predictive variables
◦ Produces a continuous range of scores: score = f(X1, X2, …, XN)
Methodology - Scoring Engines
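The scoring formula above can be sketched as a plain function. A minimal sketch only: the weights, bias, and the two example predictors are hypothetical stand-ins, not a fitted model.

```python
# Minimal linear scoring engine: score = f(X1, X2, ..., XN).
# Weights here are hypothetical; in practice they come from model fitting.
def score(weights, bias, features):
    """Return a continuous score from a linear combination of predictors."""
    return bias + sum(w * x for w, x in zip(weights, features))

# Hypothetical model with two predictors (e.g. balance ratio, months on book).
weights = [0.8, -0.3]
bias = 0.1
print(round(score(weights, bias, [0.5, 2.0]), 2))  # -0.1
```

A non-linear engine would simply replace the linear combination with a different f( ) of the same predictors.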
Methodology – Deployed Model
[Diagram: training data feeds a DM engine to build the mining model; at prediction time, data to predict (DB data, client data, application log, or "just one row" such as a new entry or new transaction) is scored by the DM engine against the mining model to produce predicted data.]
Randomly divide data into three pieces
◦ Training data
◦ Test data
◦ Validation data
Use training data to fit models
Score the test data to create a lift curve
Perform the train/test steps iteratively until you have a model you're happy with
During this iterative phase, validation data is set aside in a "lock box"
Score the validation data and produce a lift curve
◦ An unbiased estimate of future performance
Methodology - Testing
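A minimal sketch of the three-way split described above. The 60/20/20 proportions, the fixed seed, and the function name are assumptions for illustration.

```python
import random

def three_way_split(rows, seed=0, frac=(0.6, 0.2, 0.2)):
    """Randomly divide data into training, test, and validation sets.
    The validation set stays in a "lock box" until the final model is chosen."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n = len(shuffled)
    n_train = int(frac[0] * n)
    n_test = int(frac[1] * n)
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    validation = shuffled[n_train + n_test:]
    return train, test, validation

data = list(range(100))
train, test, validation = three_way_split(data)
print(len(train), len(test), len(validation))  # 60 20 20
```

Only the train and test pieces take part in the iterative fitting loop; the validation piece is scored once, at the end.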
Examine correlations among the variables
Weed out redundant, weak, or poorly distributed variables
Model design
Build candidate models
◦ Regression/GLM
◦ Decision Trees/MARS
◦ Neural Networks
Select the final model
Methodology - Multivariate Analysis
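The correlation-weeding step might look like the following sketch. The 0.9 threshold, the variable names, and the toy data are hypothetical.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def weed_redundant(variables, threshold=0.9):
    """Keep a variable only if it is not highly correlated with one already kept."""
    keep = []
    for name, values in variables:
        if all(abs(pearson(values, kept_vals)) <= threshold for _, kept_vals in keep):
            keep.append((name, values))
    return [name for name, _ in keep]

vars_ = [
    ("balance",    [1.0, 2.0, 3.0, 4.0]),
    ("balance_x2", [2.1, 4.0, 6.2, 8.0]),   # nearly a copy of balance
    ("age",        [35.0, 20.0, 57.0, 41.0]),
]
print(weed_redundant(vars_))  # ['balance', 'age']
```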
Segmentation Tools in Analysis Services
Data Mining - Algorithm Matrix
[Matrix mapping analytical tasks (classification, estimation, segmentation, association, forecasting, text analysis, advanced data exploration) to the available algorithms: Decision Trees, Linear Regression, Logistic Regression, Naïve Bayes, Neural Nets, Clustering, Sequence Clustering, Time Series, and Association Rules.]
Data Mining - SQL-Server Algorithms
Decision Trees
Time Series
Neural Net
Clustering
Sequence Clustering
Association
Naïve Bayes
Linear and Logistic Regression
Offline and online modes
◦ Everything you do stays on the server
◦ Offline requires server admin privileges to deploy
Define Data Sources and Data Source Views
Define Mining Structure and Models
Train (process) the Structures
Verify accuracy
Explore and visualise
Perform predictions
Deploy for other users
Regularly update and re-validate the Model
Data Mining - Blueprint for Toolset
SQL Server 2008
◦ X iterations of retraining and retesting the model
◦ Results from each test statistically collated
◦ Model deemed accurate (and perhaps reliable) when variance is low and results meet expectations
Data Mining - Cross-Validation
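The retrain/retest-and-collate idea can be illustrated with a hand-rolled k-fold loop. This is a conceptual sketch, not the SQL Server 2008 implementation; the fold count and the toy scorer are assumptions.

```python
def k_folds(rows, k):
    """Split rows into k folds; each iteration holds one fold out for testing."""
    return [rows[i::k] for i in range(k)]

def cross_validate(rows, k, train_and_score):
    """Retrain/retest k times and collate the scores; low variance across
    folds suggests the accuracy estimate is reliable."""
    folds = k_folds(rows, k)
    scores = []
    for i in range(k):
        test = folds[i]
        train = [r for j, f in enumerate(folds) if j != i for r in f]
        scores.append(train_and_score(train, test))
    mean = sum(scores) / k
    variance = sum((s - mean) ** 2 for s in scores) / k
    return mean, variance

# Hypothetical scorer: fraction of test rows above the training mean.
def scorer(train, test):
    m = sum(train) / len(train)
    return sum(1 for t in test if t > m) / len(test)

mean, var = cross_validate(list(range(100)), 5, scorer)
print(round(mean, 2), round(var, 4))  # 0.49 0.0004
```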
Data Mining - Microsoft Decision Trees
Use for:
◦ Classification: churn and risk analysis
◦ Regression: predict profit or income
◦ Association analysis based on multiple predictable variables
Builds one tree for each predictable attribute
Fast
Data Mining - Microsoft Naïve Bayes
Use for:
◦ Classification
◦ Association with multiple predictable attributes
Assumes all inputs are independent
Simple classification technique based on conditional probability
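A toy version of the conditional-probability idea behind Naïve Bayes (not Microsoft's implementation): the churn data, feature names, and add-one smoothing are made up for illustration.

```python
from collections import Counter, defaultdict

def train_naive_bayes(rows, labels):
    """Count P(class) and P(feature value | class) from discrete data.
    Naive assumption: all inputs are independent given the class."""
    class_counts = Counter(labels)
    cond = defaultdict(Counter)  # (feature_index, class) -> value counts
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            cond[(i, label)][value] += 1
    return class_counts, cond

def predict(model, row):
    """Pick the class maximizing P(class) * product of P(value | class)."""
    class_counts, cond = model
    total = sum(class_counts.values())
    best, best_p = None, -1.0
    for label, count in class_counts.items():
        p = count / total
        for i, value in enumerate(row):
            # Add-one smoothing, assuming two possible values per feature.
            p *= (cond[(i, label)][value] + 1) / (count + 2)
        if p > best_p:
            best, best_p = label, p
    return best

# Hypothetical churn data: (tenure_band, has_promo) -> outcome.
rows = [("short", "no"), ("short", "no"), ("long", "yes"), ("long", "yes")]
labels = ["churn", "churn", "stay", "stay"]
model = train_naive_bayes(rows, labels)
print(predict(model, ("short", "no")))  # churn
```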
Applied to
◦ Segmentation: customer grouping, mailing campaigns
◦ Also: classification and regression
◦ Anomaly detection
Discrete and continuous attributes
Note:
◦ "Predict Only" attributes are not used for clustering
Data Mining - Clustering
Applied to
◦ Classification
◦ Regression
Great for finding complicated relationships among attributes
◦ Results can be difficult to interpret
Uses the gradient descent method
Data Mining - Neural Network
[Diagram: a neural network with an input layer (age, education, sex, income), hidden layers, and an output layer predicting loyalty.]
Data Mining - Sequence Clustering
Analysis of:
◦ Customer behaviour
◦ Transaction patterns
◦ Click stream
◦ Customer segmentation
◦ Sequence prediction
Mix of clustering and sequence technologies
◦ Groups individuals based on their profiles, including sequence data
To discover the most likely beginnings, paths, and ends of a customer's journey through our domain, consider using:
◦ Association Rules
◦ Sequence Clustering
Data Mining - What is a Sequence?
Data Mining - Sequence Data
Cust ID | Age | Marital Status | Car Purchases (Seq ID: Brand)
1       | 35  | M              | 1: Porch-A, 2: Bamborgini, 3: Kexus
2       | 20  | S              | 1: Wagen, 2: Voovo, 3: Voovo
3       | 57  | M              | 1: Voovo, 2: T-Yota
Your “if” statement will test the value returned from a prediction – typically, predicted probability or outcome
Steps:
1. Build a case (a set of attributes) representing the transaction you are processing at the moment
   E.g. the shopping basket of a customer plus their shipping info
2. Execute a "SELECT ... PREDICTION JOIN" on the pre-loaded mining model
3. Read the returned attributes, especially the case probability for some outcome
   E.g. probability > 50% that "TransactionOutcome=ShippingDeliveryFailure"
4. Your application has just made an intelligent decision!
5. Remember to refresh and retest the model regularly – daily?
Data Mining – Minor Introduction to DMX
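The application-side logic of steps 1 to 4 might look like the sketch below. The predict_proba function is a hypothetical stand-in for the DMX "SELECT ... PREDICTION JOIN" call, and its rule and the case attributes are invented for illustration.

```python
def predict_proba(case):
    """Stand-in for querying the deployed mining model; pretends to return the
    probability that TransactionOutcome = ShippingDeliveryFailure."""
    # Hypothetical rule for illustration only: mismatched countries look risky.
    return 0.8 if case.get("ship_country") != case.get("bill_country") else 0.1

def decide(case, threshold=0.5):
    """Steps 3-4: test the returned probability and make the decision."""
    return "hold_for_review" if predict_proba(case) > threshold else "ship"

case = {"basket_total": 120.0, "ship_country": "NZ", "bill_country": "US"}
print(decide(case))  # hold_for_review
```

In a real deployment the probability would come back from the mining model, and the threshold would be tuned against the lift curve rather than hard-coded.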
Data Mining – Detailed Workflow
Data Mining – Detailed Mining Model
Building Confidence in your Segmentation
Which target variable to use?
◦ Frequency & severity
◦ Loss ratio, other profitability measures
◦ Binary targets: defection, cross-sell, etc.
How to prepare the target variable?
◦ Period: one year or multi-year?
◦ Losses evaluated as of what date?
◦ Cap large losses? Catastrophe losses?
◦ How / whether to re-rate, adjust premium?
◦ What counts as a "retaining" policy?
Building Confidence - Model Design
Approaches
◦ Change the algorithm
◦ Change model parameters
◦ Change inputs/outputs to avoid bad correlations
◦ Clean the data set
Perhaps there are no good patterns in the data
◦ Verify statistics (Data Explorer)
Building Confidence - Improving Models
Capping
◦ Outliers are reduced in influence to produce better estimates
Binning
◦ Small and insignificant levels of character variables are regrouped
Box-Cox transformations
◦ Commonly included, especially the square root and logarithm
Johnson transformations
◦ Performed on numeric variables to make them more "normal"
Weight of evidence
◦ Created for character variables and binned numeric variables
Building Confidence – Alternate Methods
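Capping and a log transform (one of the common Box-Cox-style choices above) might be sketched as follows. The 95th-percentile cap and the toy loss data are assumptions for illustration.

```python
from math import log

def cap(values, percentile=0.95):
    """Capping: pull outliers in to the given percentile to reduce their influence."""
    ordered = sorted(values)
    limit = ordered[int(percentile * (len(ordered) - 1))]
    return [min(v, limit) for v in values]

def log_transform(values):
    """Log transform for right-skewed variables (log1p to handle zeros)."""
    return [log(1 + v) for v in values]

losses = [100, 120, 90, 110, 95, 10000]  # one extreme outlier
print(cap(losses))  # [100, 120, 90, 110, 95, 120]
```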
Building Confidence - Confusion Matrix
1241 correct predictions (516 + 725)
35 incorrect predictions (25 + 10)
The model scored 1276 cases (1241 + 35)
The error rate is 35/1276 = 0.0274
The accuracy rate is 1241/1276 = 0.9726
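The arithmetic on this slide, as a small helper over the four confusion-matrix counts:

```python
def rates(tp, tn, fp, fn):
    """Error and accuracy rates from confusion-matrix counts."""
    total = tp + tn + fp + fn
    return (fp + fn) / total, (tp + tn) / total

# Counts from this slide: 516 + 725 correct, 25 + 10 incorrect.
error, accuracy = rates(516, 725, 25, 10)
print(round(error, 4), round(accuracy, 4))  # 0.0274 0.9726
```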
"All models are wrong, but some are useful."
-- George Box
Building Confidence – Warning Signs
Extrapolation
◦ Applying models from unrelated disciplines
Equality
◦ The real world contains a surprising amount of uncertainty, fuzziness, and precariousness
Copula
◦ Binding probabilities together can mask errors
Distribution functions
◦ Small miscalculations can make coincidences look like certainties
Gamma
◦ Human behavior is difficult to quantify as a linear parameter
Building Confidence – Li's Revenge
Building Confidence - Lift Curves
Sort data by score
Break the dataset into 10 equal pieces
Best "decile": lowest score, lowest loss ratio
Worst "decile": highest score, highest loss ratio
Difference: "lift"
Lift = segmentation power
Lift translates into ROI of the modeling project
~Top 5% of 750,000 customers
◦ 2 groups of 10,000 customers from random sampling
◦ 37,500 top customers from the prediction list sorted by score
Group 1
◦ Engaged or offered incentives by the marketing department
Group 2
◦ No action
Results
◦ Group 1 has an attrition rate of 0.8%
◦ Group 2 has an attrition rate of 10.6%
◦ The average attrition rate is 2.2%
◦ Lift is 4.8 (10.6% / 2.2%)
Building Confidence – Vetted Results
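The lift arithmetic from this slide, as a one-line helper over the two rates:

```python
def lift(segment_rate, average_rate):
    """Lift = response rate in the targeted segment / overall average rate."""
    return segment_rate / average_rate

# Figures from the slide: untargeted group attrition 10.6%, average 2.2%.
print(round(lift(0.106, 0.022), 1))  # 4.8
```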
Discussion
Xiaohua Hu, Drexel University
Jerome Friedman, Trevor Hastie, Robert Tibshirani, The Elements of Statistical Learning
James Guszcza, Deloitte
Jeff Kaplan, Apollo Data Technologies
Rafal Lukawiecki, Project Botticelli Ltd
David L. Olson, University of Nebraska Lincoln
Donald Farmer, ZhaoHui Tang and Jamie MacLennan, Microsoft
ASA Corporation
Richard Boire, Boire Filler Group
John Spooner, SAS Corporation
Shin-Yuan Hung, Hsiu-Yu Wang, National Chung-Cheng University
Felix Salmon and Chris Anderson, Wired Magazine
Credits