Capabilities Apollo and SQL Server Data Mining Presented by Jeff Kaplan, Principal Client Services...
Capabilities Apollo and SQL Server Data Mining
Presented by
Jeff Kaplan, Principal Client Services
Paul Bradley, Ph.D., Principal Data Mining Technology
312.787.7376
2
Agenda
Apollo Overview
Data Mining 101
Project REAL Case Study
SQL Server 2005 Data Mining Demo
Real-life Examples
3
Apollo Overview
PART ONE
4
Company Background
First company delivering true predictive analytic solutions
10 plus years in data mining and data warehousing
Premier Partner for SQL Server 2005 Data Mining
Cater to a wide range of businesses, including Microsoft, Sprint, Wal-Mart, Barnes & Noble, Seattle Times, Knight Ridder
Variety of Industries
• Retail and Consumer Goods
• Media
• Financial Services
• Manufacturing
• Public Services
overview
5
Industry Recognition
6
Testimonials
7
Testimonials
8
Testimonials
9
Analytic Landscape
10
Capabilities
• Customer Acquisition
• Campaign Targeting
• Cross-sell/Up-sell
• Customer Segmentation
• Retention Modeling
• Behavioral Targeting
• Personalization
• Claim Analysis
• Call Center Analytics
• Data Warehousing
• Dashboard Reporting
Marketing Sales & Distribution
• Correlation Analysis
• Key Driver Analysis
• Verbatim Summarization
Market Research Operations
• Inventory Forecasting
• Sales Forecasting
• Pricing Optimization
• Next Best Offer
• Market Basket Analysis
• Recency & Frequency Modeling
11
[Diagram: Customer data flow through SQL Server 2005]
Data Sources: RedCard, Booking, Call Center, Stores
Models and Reporting: Predictive Models, Customer Clustering Models, Customer Targeting Models, Dashboard & Ad-hoc Reporting, Measure Promotion Success
Delivery Channels: Web, Direct Mail, Phone
Automate Predictions for Targeting, Forecasting, Detection, etc.
• Join Customer Data Sources
• Run Predictive Algorithms
• Score Model Results
• Deliver Targeted Predictions
12
MS Data Mining
PART TWO
13
Background
Fastest Growing BI Segment (IDC)
• Data Mining Tools: $1.85B in 2006
• Predictive Analytic Projects Yield a High Median ROI of 145%
Uses
• Marketing: Customer Acquisition and Targeting, Cross-Sell/Up-Sell
• Retail: Inventory Forecasting, Price Optimization
• Market Research: Driver Analysis, Verbatim Summarization
• Operations: Call Center Analytics
• Finance: Fraud Detection, Risk Models
Mainstream Emergence
• E-commerce (e.g. Amazon.com)
• Search (e.g. Vivisimo.com)
• Behavioral Advertising
SQL Server is in a Unique Position to Service Market Needs
ms data mining
14
Evolution of SQL Server Data Mining
SQL 2000
• Enter the Game
• Create industry standard
• Target developer audience
• V1.0 product with 2 algorithms
SQL 2005
• Win Leadership
• Continue standards and developer effort
• Comprehensive feature set
• Penetrate the Enterprise
• Thought leadership
15
Value of Data Mining
[Chart: Relative Business Value plotted against an Easy-to-Difficult axis. Labels: SQL Server 2005, OLAP, Reports (Adhoc), Reports (Static), Data Mining, Business Knowledge]
16
SQL Server 2005 BI Platform
• Analysis Services: OLAP & Data Mining
• Integration Services: ETL
• SQL Server: Relational Engine
• Reporting Services
• Management Tools
• Development Tools
17
SQL Server 2005 BI Platform
Embed Data Mining: Development Tool Integration
• Make Decisions Without Coding
• Customized Logic Based on Client Data
• Logic Updated by Model Reprocessing – Applications Do Not Need to be Re-Written, Re-Compiled, and Re-Deployed
Data Mining Key Points
• Price Point to Achieve Market Penetration
• Database Metaphors for Building, Managing, Utilizing Extracted Patterns and Trends
• APIs for Embedding Data Mining Functionality into Applications
18
SQL Server 2005 Algorithms
• Decision Trees
• Time Series
• Neural Net
• Clustering
• Sequence Clustering
• Association
• Naïve Bayes
• Linear and Logistic Regression
19
Project REAL
PART THREE
20
Client Profile – Inventory Forecasting
• Create a Reference Implementation of a BI System Using Real Retail Data
• Partners: Barnes & Noble, Microsoft, Scalability Experts, EMC, Unisys, Panorama, Apollo
• Forecast Out-of-Stock for 5 Book Titles Across Entire Chain (800 Stores)
• Predictive Models to Flag Items That Are Going to be Out-of-Stock
• Model on 48 Weeks of Data, Predictions for Month of December
• Models Predicted Out-of-Stock Occurrences with > 90% Accuracy
• Conservative Sales Opportunity for just 5 Titles: $6,800 per year
• Extrapolate Across Millions of Titles - Million Dollar Sales Opportunity
project real
21
Predictive Modeling Process
STEP 1: Each item belongs to a category. For the category, create a set of store clusters predictive of sales in the category.
STEP 2: Identify the cluster to which the store belongs, for the category of that item.
STEP 3: Utilize sales data to predict item sales 2 weeks out.
[Diagram: ITEM + STORE, CATEGORY]
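The three steps can be read as a chain of lookups followed by a forecast. Below is a minimal Python sketch of that flow, not the Project REAL implementation: the tables `ITEM_CATEGORY`, `STORE_CLUSTER` and `CLUSTER_MODEL`, their contents, and the `mean_forecast` stand-in are all hypothetical. The deck builds the real equivalents with SQL Server 2005 mining models.

```python
# All tables and values below are hypothetical placeholders.
ITEM_CATEGORY = {"item-001": "childrens"}        # item -> category
STORE_CLUSTER = {("childrens", "store-17"): 2}   # (category, store) -> cluster id


def mean_forecast(weekly_sales):
    """Stand-in forecaster: mean of the supplied weekly sales."""
    return sum(weekly_sales) / len(weekly_sales)


# One forecaster per (category, cluster); a stand-in for trained mining models.
CLUSTER_MODEL = {("childrens", 2): mean_forecast}


def predict_two_weeks_out(item, store, weekly_sales):
    category = ITEM_CATEGORY[item]               # Step 1: category of the item
    cluster = STORE_CLUSTER[(category, store)]   # Step 2: store's cluster for that category
    model = CLUSTER_MODEL[(category, cluster)]   # Step 3: forecast with that cluster's model
    return model(weekly_sales)


forecast = predict_two_weeks_out("item-001", "store-17", [2, 4, 6])
```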
22
Store Clustering Demo
24
Out-of-Stock Data Preparation Summary
Apollo Explored 3 Data Preparation Strategies
1. Use Sales, On-Hand, On-Order History Data for All Stores in the Same Cluster
   Build One Mining Structure per Cluster, For All Stores in that Cluster for Each Title
   Build One Mining Model per Store, per Cluster for Each Title
   Negative: Few Out-of-Stock (OOS) Examples per Store, Computation to Deploy One Mining Model per Store/Title Combination
2. Use Sales, On-Hand, On-Order History for All Stores, Across All Clusters
   Build One Mining Structure per Book, Use Cluster Membership of Store as Input Attribute
   Positive: Optimizes OOS Examples per Title by Considering All Stores
   Negative: Does Not Capture Derivative Sales Information
3. Removed Negative of Strategy 2
   Included Historical Week-on-Week Sales Derivative Information for Each Title
   Increase the Information Content of the Source Data for Modeling
25
Creating Variables for Success
Using:
• Sales and Inventory History from January 2004 to End of November 2004
• Recommend Two (2) Years of Historical Data to Increase Accuracy When Training the Model
Key:
• Store + Fiscal Year + WeekID
Predicted Variables
• 1 Week Ahead OOS Boolean
• 1 Week Ahead Sales Bin (None, 1 to 2, 3 to 4, 4+)
• 2 Week Ahead OOS Boolean
• 2 Week Ahead Sales Bin (None, 1 to 2, 3 to 4, 4+)
Input Attributes
• Store Cluster Membership (Derived from Store Cluster Model)
• Current Week Sales, On-Hand, On-Order
• Preceding 1-5 Week Sales, On-Hand, On-Order
• Sales Derivative Attributes
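The variables listed on this slide can be derived from a per-store weekly history. The Python sketch below shows one possible way to do that for a single title. It rests on several assumptions the slide does not state: the field names are hypothetical, a week counts as out of stock when on-hand is zero, the "4+" bin is taken to start at 5 units, and the sales derivative is taken to be the change between consecutive weeks. Store cluster membership is omitted because it comes from a separate model.

```python
from collections import namedtuple

# One row per store and fiscal week for a single title (field names are hypothetical).
Week = namedtuple("Week", "store fiscal_year week_id sales on_hand on_order")


def sales_bin(units):
    """Map unit sales to a bin label (assumes the '4+' bin starts at 5 units)."""
    if units == 0:
        return "None"
    if units <= 2:
        return "1 to 2"
    if units <= 4:
        return "3 to 4"
    return "4+"


def build_rows(history, lags=5):
    """history: list of Week for one store, sorted by week. Returns feature dicts."""
    rows = []
    # Each row needs `lags` preceding weeks for inputs and 2 following weeks for targets.
    for i in range(lags, len(history) - 2):
        cur = history[i]
        row = {
            "key": (cur.store, cur.fiscal_year, cur.week_id),
            "sales": cur.sales,
            "on_hand": cur.on_hand,
            "on_order": cur.on_order,
        }
        for k in range(1, lags + 1):
            prev = history[i - k]
            row[f"sales_lag{k}"] = prev.sales
            row[f"on_hand_lag{k}"] = prev.on_hand
            row[f"on_order_lag{k}"] = prev.on_order
            # Week-on-week sales derivative: change from week i-k to week i-k+1.
            row[f"sales_delta{k}"] = history[i - k + 1].sales - prev.sales
        for ahead in (1, 2):
            future = history[i + ahead]
            # Assumption: out of stock means zero units on hand.
            row[f"oos_{ahead}wk"] = future.on_hand == 0
            row[f"sales_bin_{ahead}wk"] = sales_bin(future.sales)
        rows.append(row)
    return rows
```

Each returned dict carries the Store + Fiscal Year + WeekID key, the current-week inputs, five weeks of lagged inputs, and the four predicted variables.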
26
Model Training and Testing Scenarios
Purpose: Intelligence on Model Training Frequency
• Scenario 1: Train Models Every 2 Weeks
  Training Dataset: All Data Prior to Last 2 Fiscal Weeks in December 2004
  Test Dataset: Last 2 Fiscal Weeks in December 2004
• Scenario 2: Train Models Monthly
  Training Dataset: All Data Prior to End of Fiscal November 2004
  Test Dataset: Fiscal Month of December 2004
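Both scenarios are time-based holdouts: everything before a cutoff week trains the model and everything from the cutoff onward tests it. A minimal Python sketch follows. It assumes a 52-week fiscal year with fiscal December starting at week 49, which the slides do not state, and the `week` key is a hypothetical field name.

```python
def split_by_week(rows, first_test_week):
    """Split rows on a sortable 'week' key, e.g. (fiscal_year, week_id)."""
    train = [r for r in rows if r["week"] < first_test_week]
    test = [r for r in rows if r["week"] >= first_test_week]
    return train, test


# Illustrative data: one row per fiscal week of 2004.
rows = [{"week": (2004, w)} for w in range(1, 53)]

# Scenario 1: hold out the last 2 fiscal weeks (assumes weeks 51 and 52).
train1, test1 = split_by_week(rows, (2004, 51))

# Scenario 2: hold out fiscal December (assumes it starts at week 49).
train2, test2 = split_by_week(rows, (2004, 49))
```

Under these assumptions Scenario 2 trains on 48 weeks, which lines up with the "Model on 48 Weeks of Data" figure quoted earlier in the deck.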
27
Balancing Training Data
When Considering All Stores, Still Have Unbalanced Datasets
• [# Store/Week Combinations Where OOS is False] >> [# Store/Week Combinations Where OOS is True]
• Common in Many Data Mining Applications
Training Datasets Were Balanced
• Sample Store/Week Combinations Where OOS is False to Obtain Equal Proportion of True/False Values
“Costs” of Predictive Errors Are Equal
• Requested by Client
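The balancing step described here is commonly called undersampling of the majority class. The Python sketch below illustrates the idea. The `oos` field name and the fixed random seed are assumptions made for the example, and it presumes, as the slide does, that OOS = False rows are the majority.

```python
import random


def balance_by_undersampling(rows, label="oos", seed=0):
    """Keep every row where `label` is True and sample an equal number of False rows."""
    positives = [r for r in rows if r[label]]
    negatives = [r for r in rows if not r[label]]
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    sampled = rng.sample(negatives, min(len(positives), len(negatives)))
    balanced = positives + sampled
    rng.shuffle(balanced)
    return balanced


# Illustrative data: 10 out-of-stock store/weeks against 990 in-stock ones.
rows = [{"oos": True}] * 10 + [{"oos": False}] * 990
balanced = balance_by_undersampling(rows)
```

With the illustrative data, the balanced set holds 20 rows, half of them out of stock.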
28
Prediction Methods
Algorithm Selection
• Microsoft Decision Trees for Predicting OOS Boolean Flags
  Consistently High Overall Accuracy
  Straightforward Interpretation
Data Preparation
• Scenario 2
• Rebuild Models Monthly
  Predictive Models are Contextual and Optimized for Behavior in the Coming Month
29
Prediction Methods
Modeling Methodology Benefits
• Scalability (Titles and Stores)
• Saves 4x to 5x on Computational Cost when Rebuilding Models (versus Neural Networks)
  5 Minutes for All 5 Titles => 1 Minute per Title for All Stores
30
Out-of-Stock Prediction Demo
32
Inventory Prediction Results
1-week and 2-week prediction accuracies

TITLE                            OOS Week 1   OOS Week 2   Sales Bins Week 1   Sales Bins Week 2
JUNIE B JONES IS A GRADUATION    97.53%       92.87%       98.46%              99.98%
CAPTAIN UNDERPANTS & THE INVA    99.06%       87.67%       99.06%              99.96%
MTH RESEARCH GDE #01 DINO        100.00%      83.82%       100.00%             100.00%
MTH RESEARCH GDE #08 TWISTERS    98.29%       83.60%       99.48%              100.00%
SECRETS OF DROON #04 CITY IN     97.71%       84.31%       99.13%              100.00%
AVERAGE ACCURACY                 98.52%       86.45%       99.23%              99.99%
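The AVERAGE ACCURACY row is the unweighted mean of the five per-title figures in each column. The short Python check below reproduces it from the slide's numbers; the column labels are paraphrased for readability.

```python
# Per-title accuracies (%) in slide order:
# OOS week 1, OOS week 2, sales bins week 1, sales bins week 2.
accuracy = {
    "JUNIE B JONES IS A GRADUATION": (97.53, 92.87, 98.46, 99.98),
    "CAPTAIN UNDERPANTS & THE INVA": (99.06, 87.67, 99.06, 99.96),
    "MTH RESEARCH GDE #01 DINO": (100.00, 83.82, 100.00, 100.00),
    "MTH RESEARCH GDE #08 TWISTERS": (98.29, 83.60, 99.48, 100.00),
    "SECRETS OF DROON #04 CITY IN": (97.71, 84.31, 99.13, 100.00),
}
columns = ("OOS week 1", "OOS week 2", "Sales bins week 1", "Sales bins week 2")

# Unweighted mean per column, rounded to two decimals as on the slide.
averages = {
    name: round(sum(row[i] for row in accuracy.values()) / len(accuracy), 2)
    for i, name in enumerate(columns)
}
```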
33
Sales Opportunity
Data Mining Created a Revenue-Generating Opportunity
Based on 5 titles for Jan 2004 - Dec 2004
Yearly Increase in Sales Opportunity using Apollo OOS Predictions =
  (# of weeks OOS across all stores) x (Apollo Boolean Predicted Accuracy) x (actual % of actual sales across all stores) x (retail price)

TITLE                            # of OOS   1-2 Sales   Price    2 Wk Pred   1 Copy Sales   2 Copy Sales
JUNIE B JONES IS A GRADUATION    1,165      1.16%       $14.95   92.87%      $187.44        $374.89
CAPTAIN UNDERPANTS & THE INVA    10,040     1.01%       $17.95   87.67%      $1,590.96      $3,181.93
MTH RESEARCH GDE #01 DINO        15,227     0.16%       $14.95   83.82%      $305.13        $610.26
MTH RESEARCH GDE #08 TWISTERS    4,444      0.44%       $27.95   83.60%      $460.57        $921.14
SECRETS OF DROON #04 CITY IN     7,115      0.65%       $21.95   84.31%      $861.37        $1,722.74
TOTAL                                                                        $3,405.48      $6,810.95

Sales bins produced $3.4K, $6.8K potential lift in sales
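The sales-opportunity formula can be applied directly to the table values. The Python sketch below does so, assuming the "1-2 Sales" column is the "actual % of actual sales" term and the "2 Wk Pred" column is the Boolean predicted accuracy. The variable and function names are illustrative. Because the slide shows rounded percentages, the recomputed figures land within about 1% of the printed ones rather than matching them to the cent.

```python
# (title, weeks OOS across all stores, 1-2 sales share, retail price, 2-week OOS accuracy)
titles = [
    ("JUNIE B JONES IS A GRADUATION", 1165, 0.0116, 14.95, 0.9287),
    ("CAPTAIN UNDERPANTS & THE INVA", 10040, 0.0101, 17.95, 0.8767),
    ("MTH RESEARCH GDE #01 DINO", 15227, 0.0016, 14.95, 0.8382),
    ("MTH RESEARCH GDE #08 TWISTERS", 4444, 0.0044, 27.95, 0.8360),
    ("SECRETS OF DROON #04 CITY IN", 7115, 0.0065, 21.95, 0.8431),
]


def opportunity(oos_weeks, sales_share, price, accuracy, copies=1):
    """Yearly sales opportunity for one title, per the formula on the slide."""
    return oos_weeks * accuracy * sales_share * price * copies


one_copy = sum(opportunity(o, s, p, a) for _, o, s, p, a in titles)
two_copy = sum(opportunity(o, s, p, a, copies=2) for _, o, s, p, a in titles)
```

The totals come to roughly $3.4K for one copy per predicted out-of-stock week and $6.8K for two copies, in line with the slide.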
34
Client Profiles
PART FOUR
35
Client Profile – Customer Acquisition
• Decrease Subscriber Churn
• Increase New Subscriptions
• Segment Geo-Demographic and Attitudinal Behaviors for Subscribers and Non-Subscribers
• Build Predictive Models to Identify Likely New Subscribers
• Using Analysis to Deliver Targeted Marketing Campaigns for Acquisition
• Increased Stop Saves by 2%
client profiles
36
Client Profile – Cross sell / Up sell (Global Catalog Retailer)
• Increase Average Purchase Size
• Deploy Product Recommendations on their Website
• Modeling Historical Sales to Determine Product Affinities
• Incorporate Business Logic into Modeling Process (e.g. Same Category Recommendation)
• Increase Average Shopping Cart Size
• Increase Sales Lift
• Data Mining Driven Product Recommendation Performed Better than Manual Recommendations
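Product affinities of the kind described here are often estimated from how frequently two products appear in the same order. The Python sketch below illustrates that idea with a same-category business rule applied as a filter. It is a simple co-occurrence count, not the association model used for this client, and all product names and the `recommend` helper are hypothetical.

```python
from collections import Counter
from itertools import combinations


def recommend(orders, category, product, top_n=3):
    """orders: list of sets of product ids; category: product id -> category name."""
    pair_counts = Counter()
    for basket in orders:
        for a, b in combinations(sorted(basket), 2):
            pair_counts[(a, b)] += 1
            pair_counts[(b, a)] += 1
    candidates = [
        (count, other)
        for (p, other), count in pair_counts.items()
        # Business rule (assumed reading of the slide): stay within the same category.
        if p == product and category[other] == category[product]
    ]
    # Most frequently co-purchased first; ties broken alphabetically.
    candidates.sort(key=lambda c: (-c[0], c[1]))
    return [other for _, other in candidates[:top_n]]


orders = [
    {"tent", "stove"},
    {"tent", "stove", "novel"},
    {"tent", "lantern"},
    {"tent", "novel"},
    {"tent", "novel"},
]
category = {"tent": "camping", "stove": "camping", "lantern": "camping", "novel": "books"}
suggestions = recommend(orders, category, "tent")
```

In this toy data the novel is bought with the tent most often, but the same-category rule excludes it, so the stove and lantern are suggested instead.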
37
Client Profile – Customer Support Automation
• Increase Visibility into Customer Service Center
• Increase Speed of Customer Support
• Utilizing Text Mining Engines to Automate Processing of Customer Support (Email, Web Inquiries, etc.)
• Automating the Process of Rolling up Keywords into Concepts
• Customer Support Center has the Ability to View Trends in Minutes versus Weeks
• Improved Accuracy - Text Mining Engines Removed the Bias and Inaccuracies Often Occurring in Call Center Representative Notes and Tagging.
38
Client Profile – Key Driver Analysis
• Evaluate Customer Satisfaction Metrics
• Increase Customer Satisfaction
• Partnered with Apollo to Develop Market Research Database and Reporting
• Developed Models to Identify “Key” Satisfaction Drivers
• Successfully Identified Drivers to Increase Customer Satisfaction
• Delivered Driver Recommendations to Field Operations - Insight into Action
• Company-Wide (Sales, Marketing, Executive Level) Visibility into Customer Satisfaction Metrics