Embedded Automatic Model Training and Forecasting
in an Enterprise Software Application
(… or how to embed a data mining consultant in a box)
Presented to the SF Bay ACM Data Mining SIG, March 11, 2009, by Greg Makowski
Principal Consultant, Golden Data Mining
Outline
Challenge: How to automate not only forecasting, but model training?
Solution: Focus on a vertical market application
  Deeply investigate the business & technical issues
Result:
  An enterprise application
  Up to a 30% reduction in $ lost to over and under stock
Challenge: Business Pain Point
JDA Software (which owns the IP) has dozens of enterprise retail supply chain applications
The Replenishment software does a very good job keeping store shelves stocked at the right level when sales are steady
  Moves product from warehouse to DC to store
Sales are NOT STEADY during sales events!
PAIN POINT: The event planner has to estimate the lift in sales for every store-item combination
  (6k stores) * (1k to 4k items) = up to 24 million store-item lift estimates
Challenge: 16 Page Newspaper Insert
Retail (context)
Can vary by region or ZIP
Event Lift Forecasting (ELF)
Lift is a multiplier for the increase in sales over normal
  “Prod X in Store Y will sell 6.8 times more than normal”
Normal sales are measured around the event, for the same:
  Time period (e.g. Thu - Sun), a week before and after (non-overlapping)
  Store - product (SKU is a key for product)
[Figure labels: Lift, Event]
Challenge: Appropriate for Business User
Retail
A retail event planner
  Has revenue goals and a “budget” of discount $
  Has to get through a lot of detail quickly
  Does not typically create mathematical forecasts
Uses an enterprise application to lay out the event flyer about 3 weeks in advance
Decides for the event: departments / items / pricing / photos / language
Uses the software to specify SKUs, images and lay out the flyer
Challenge: How to Productize (Agile)?
Product Mgmt, Software Arch
This is not a one-off consulting project, but SW
Software engineering needs (get in the ballpark)
  Right starting position, metrics, use cases, data flow
  Support good Agile development process
Goals
  At least 90% software and 10% configuration, not repeated consulting projects
  Control the Total Cost of Ownership for the product
  RELIABLE when used by the business user, working at the level of detail that the user cares about
Challenge: Details we Have vs. Need to Start
Product Mgmt
Path to Solution
Product Mgmt, Data Mining
Customer led, product driven: design general
Can’t data mine without data
  Start data request process with several clients
  Jumpstart efforts with Monte Carlo
    Combine Census fields with noise to create a target
    The models and forecast matter less, the process MORE
Ask for business interviews
  Understand users, metrics, past challenges
What is the BATNA?
  Best Alternative, To A New Alternative (system)?
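The Monte Carlo jumpstart can be illustrated with a small sketch. It is not the project's actual generator: the field names (`median_income`, `pct_with_kids`), their ranges, and the formula for the target are all invented placeholders. The only idea taken from the slide is "census-like fields plus noise gives a target", so that the end-to-end process can be exercised before client data arrives.

```python
import random


def make_synthetic_records(n_records, noise_sd=0.25, seed=0):
    """Return fake store records whose lift target is a noisy function of census-like fields."""
    rng = random.Random(seed)  # seeded so the fake data is reproducible
    records = []
    for store_id in range(n_records):
        # Hypothetical census-style fields; names and ranges are placeholders.
        median_income = rng.uniform(25_000, 120_000)
        pct_with_kids = rng.uniform(0.10, 0.60)
        # Made-up relationship plus Gaussian noise, floored at zero.
        signal = 1.0 + 2.0 * pct_with_kids + median_income / 100_000
        lift = max(0.0, signal + rng.gauss(0.0, noise_sd))
        records.append({
            "store_id": store_id,
            "median_income": median_income,
            "pct_with_kids": pct_with_kids,
            "lift": lift,
        })
    return records


records = make_synthetic_records(1000)
print(len(records), "synthetic records; first lift =", round(records[0]["lift"], 2))
```

Because the target is synthetic, any model accuracy measured on it says nothing about real events; the value is in testing the data flow, preprocessing and model bookkeeping.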
Data Sources
Data Mining
Event Attributes (for the event planned in 3 weeks & past events)
  Pricing, placement (page #, position on a page)
  Products, departments, layout
  Store features, demographics of population in area
Past events
  Flyers may have 1, 8, 12, 16, 20, 64 pages
  Same week last year may have a different prod mix
  Calculate Lift for all store-items for all past events
    Normal sales (not during an event) near in time
    Event sales; Lift = (event sales) / (non-event sales)
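A minimal sketch of the lift calculation for one store-item and one event follows. The function name and arguments are hypothetical, and averaging the matching periods one week before and one week after the event is an assumption about how "normal sales near in time" would be combined.

```python
def event_lift(event_units, before_units, after_units):
    """Lift = event sales / normal sales for one store-item.

    Normal sales are taken as the mean of the same days of the week in the
    week before and the week after the event (an assumed combination rule).
    """
    normal = (before_units + after_units) / 2
    if normal <= 0:
        return None  # lift is undefined without baseline sales
    return event_units / normal


# 68 units during the event vs. 9 and 11 units in the surrounding weeks
print(event_lift(68, 9, 11))
```

Store-items with no baseline sales return `None` here rather than an infinite lift; how such records should be treated in training is left open.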
Iterative KDD Process
Data Mining
Knowledge Discovery in Databases (KDD)
1. Select Data for Analysis (from prior event app)
2. Exploratory Data Analysis (EDA)
3. Preprocessing (manipulating fields)
4. Model Building (Training DM algorithms)
5. Model Evaluation (apply to hold out data)
6. Post-process score to business value
7. Feed the next application (Lift / store-item)
Easiest to Automate From the Core
Data Mining, Product Mgmt
Go through the full process, automating
  Model building / evaluation
  EDA & Preprocessing
  Select past marketing campaigns
Hypothesis to Select Past Campaigns: 1) Most Similar Past Events
Data Mining
Attention: your expertise will be quizzed!
Hypothesis: a close fit to the new event is better
Compare high level event attributes
  Number of pages of the flyer
  Discount (average, max)
  “Primary” departments, sub-dept, category, sub-category
  … and so on
Use “fuzzy” Euclidean distance to match past events to the event planned in 3 weeks
Select the 1-10 most similar events in the last year
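The slide does not define "fuzzy" Euclidean distance, so the sketch below substitutes one plausible reading: a weighted Euclidean distance in which numeric attributes are scaled by an assumed range and categorical attributes contribute 0 on a match and 1 on a mismatch. The attribute names, ranges and weights are illustrative only.

```python
import math


def event_distance(planned, past, ranges, weights):
    """Weighted Euclidean distance between two events described as dicts."""
    total = 0.0
    for field, weight in weights.items():
        a, b = planned[field], past[field]
        if isinstance(a, str):
            diff = 0.0 if a == b else 1.0      # categorical: match or mismatch
        else:
            diff = (a - b) / ranges[field]     # numeric: scale by the field's range
        total += weight * diff * diff
    return math.sqrt(total)


def most_similar(planned, past_events, ranges, weights, k=10):
    """Return the k past events closest to the planned event."""
    return sorted(past_events,
                  key=lambda e: event_distance(planned, e, ranges, weights))[:k]


ranges = {"pages": 63, "avg_discount": 0.8}   # assumed spans: 1-64 pages, 0-80% off
weights = {"pages": 1.0, "avg_discount": 1.0, "primary_dept": 0.5}
planned = {"pages": 16, "avg_discount": 0.20, "primary_dept": "grocery"}
past_events = [
    {"name": "A", "pages": 64, "avg_discount": 0.40, "primary_dept": "toys"},
    {"name": "B", "pages": 12, "avg_discount": 0.25, "primary_dept": "grocery"},
    {"name": "C", "pages": 16, "avg_discount": 0.20, "primary_dept": "grocery"},
]
print([e["name"] for e in most_similar(planned, past_events, ranges, weights, k=2)])
```

Scaling by range keeps the page count from dominating the discount, which is the usual reason to normalize before taking a Euclidean distance.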
Hypothesis to Select Past Campaigns: 2) Select Broadly
Data Mining
Hypothesis: more training records provide a wide variety of behavior, and better generalization
Exclude past marketing events that are quite different (but be broadly inclusive)
If the planned event is 10-18 pages, exclude 1-2 and 64 page events
Audience Quiz: VOTE for what you expect
  1) Close fit, 2) Broad fit ?
Select Past Campaigns: Results & Why
Data Mining
Answer from testing:
  BROADLY selecting past marketing events to train for the planned event works much better
Why: Breadth leads to Robust Generalization
  Same sale last year was different in many ways
  Broad variety of price points / item or department
  Variety of items on cover
  Variation over geography
Exploratory Data Analysis (EDA)
Data Mining
Front cover items had a lift 5.1 times higher than the average elsewhere!
Lift as high as 130, after a Halloween candy sale
The top 5% of the records had roughly 90% of the lift (over all store-item combinations)
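A concentration figure of this kind can be checked with a few lines. This is a sketch under the assumption that "share of the lift" means the share of the summed lift values held by the highest-lift records; the function name is hypothetical.

```python
def top_share(values, top_fraction=0.05):
    """Fraction of the total held by the largest `top_fraction` of the values."""
    ordered = sorted(values, reverse=True)
    n_top = max(1, round(len(ordered) * top_fraction))  # at least one record
    total = sum(ordered)
    return sum(ordered[:n_top]) / total if total else 0.0


# Toy data: one extreme lift among 20 store-item records
lifts = [100.0] + [1.0] * 19
print(round(top_share(lifts), 3))
```

The same function applied to cash-weighted lift would give the dollar concentration instead of the unit concentration.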
Exploratory Data Analysis (EDA)
Data Mining, Retail
The Cash Flow is Very Concentrated
[Figure, two panels: "Range of Lift Values (The Top 5% Provides 88% of the Lift)" and "Range of Lift Values (Omitting the Largest)". x-axis: Bins of an Equal Number of Records; y-axis: Lift (Target); series: Lift, Baseline]
Test weight and target variations, lift and lift_log
Preprocessing - Categorical
Data Mining++
Average past Lift per category
  Percent off bin (e.g. 0%, 5%, 10%, 15% … 80%)
  Price Savings Bin (e.g. $2, $4, $6 …)
  Store hierarchy
  Product hierarchy (50k to 100k SKUs, 4-6 levels)
    Department, Sub-department, Category, Sub-Category
  Seasonality, time, month, week
  Reason codes (the event is a circular, clearance)
  Location on the page in the flyer (top right, top left..)
Multivariate combinations: powerful & scalable
  (price bin) + (page loc bin) + (sub-cat)
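Replacing a category, or a combination of categories, with its average past lift can be sketched as below. The record layout and field names (`price_bin`, `page_loc`, `sub_cat`) are assumptions for illustration; the 5-point percent-off bins follow the example on the slide.

```python
from collections import defaultdict


def percent_off_bin(pct_off, width=5):
    """Round a percent discount down to its bin, e.g. 12 -> 10 with 5-point bins."""
    return int(pct_off // width) * width


def mean_lift_by_key(records, key_fields):
    """Average past lift for each combination of the given categorical fields."""
    sums = defaultdict(lambda: [0.0, 0])       # key -> [sum of lift, record count]
    for rec in records:
        key = tuple(rec[f] for f in key_fields)
        sums[key][0] += rec["lift"]
        sums[key][1] += 1
    return {key: total / count for key, (total, count) in sums.items()}


past = [
    {"price_bin": 2, "page_loc": "top right", "sub_cat": "candy", "lift": 4.0},
    {"price_bin": 2, "page_loc": "top right", "sub_cat": "candy", "lift": 6.0},
    {"price_bin": 4, "page_loc": "top left", "sub_cat": "soda", "lift": 1.5},
]
combo = mean_lift_by_key(past, ("price_bin", "page_loc", "sub_cat"))
print(combo[(2, "top right", "candy")], percent_off_bin(12))
```

The averages should be computed only from events that precede the one being forecast; otherwise the target leaks into its own predictor.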
Preprocessing - Interactions
Data Mining++
Design Of Experiments (DOE)
Data Mining
Model Notebook (pictured in next slide)
  One row per model trained
  Input columns: data version, model parameters
  Output columns: training time, results in-sample, out of sample, gap (bigger is worse), and gap-penalized results
Sections per data mining algorithm, e.g.
  Stepwise Regression, Naïve Bayes
  Cubist (tree w/ regression in leaves)
  Neural Net
  TreeNet (from Salford Systems)
Model Notebook Tracks DOE
Data Mining++
Instead of Occam’s Razor
  Generalization Error = abs( in-sample result - out-of-sample result )
  Conservative Result = worst( in-sample, out-of-sample ) + Generalization Error

ANALYSIS ENGINE SETTINGS and MODEL RESULTS (Mean Abs Err, lower is good)

#  Engine  Parameter 1           Parameter 2       Comment           In Samp  Out of Samp  Gen: In-Out  Out + Gen
1  regr    Try target: LIFT_LOG                    58 vars selected  1.184    1.264        0.08         1.34
2  regr    Try target: LIFT_LOG  limit to 15 vars  limit to 15       1.21     1.289        0.08         1.37
3  regr    Try target: LIFT                        65 vars selected  1.732    2.654        0.92         3.58
4  regr    Try target: LIFT      limit to 15 vars  limit to 15       1.714    1.837        0.12         1.96
5  regr                                            60 vars selected  1.20     1.42         0.22         1.63

Start with unv4_trn, and set larger wgt's for larger lift values: wgt_2=1; IF(2<lift) wgt_2 = 2; IF(5<lift) wgt_2 = 3;
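The two notebook formulas and the weighting rule translate directly into code. This is a small sketch with hypothetical function names; since the results are errors, "worst" is taken to be the larger of the two values.

```python
def generalization_error(in_sample, out_of_sample):
    """Gap between in-sample and out-of-sample error (bigger is worse)."""
    return abs(in_sample - out_of_sample)


def conservative_result(in_sample, out_of_sample):
    """Worst of the two errors, penalized by the gap between them."""
    worst = max(in_sample, out_of_sample)   # errors: larger means worse
    return worst + generalization_error(in_sample, out_of_sample)


def training_weight(lift):
    """wgt_2 rule from the notebook: heavier weights for larger lift values."""
    weight = 1
    if lift > 2:
        weight = 2
    if lift > 5:
        weight = 3
    return weight


# Notebook rows 1 and 3: (in-sample, out-of-sample) mean absolute error
for in_err, out_err in [(1.184, 1.264), (1.732, 2.654)]:
    print(round(generalization_error(in_err, out_err), 2),
          round(conservative_result(in_err, out_err), 2))
```

Ranking models by the conservative result favors a slightly less accurate model with a small gap over one that looks better in-sample but degrades on held-out data, as row 3 illustrates.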
Data Mining Algorithm Improvements
Data Mining++
Cubist http://www.rulequest.com/cubist-info.html
  Ross Quinlan uses a “greedy algorithm” to select regression fields for each leaf
  Tested and changed to “stepwise regression” for each leaf
[Figure: tree diagram with nodes Split 1, Split 2, Split 3 and leaves Leaf 1 to Leaf 4]
Training Priority - a Complex Surface
Data Mining, Retail
[Figure: surface chart of Num Store-Items * Lift * Cash Flow ($0 to $180,000) over lift bins (lift to .55, 1.0, 1.4, 2.1, 4.1, 55) and cash-flow bins (cash to $6 up to cash to $7,647)]
Cash Flow = Non-Event Units/day * Price
Model Notebook: Example of Describing Models
Data Mining, Retail
|||||||||||||||||||||||||||||||||| Top 1/6 of most expensive items, $5.30+
||||||||||| Past lift by store, sub-dept, dept, front page
|||||||||| Average daily sales per item over prior events
||| Average price
| Item is located on the front page of the flyer
Number of Saturdays & Sundays in the event
Item comes from the Health and Beauty dept
Item in the Stationery department
Avg # items sold / day
Calculate $ of “Business Pain”
Data Mining, Retail
[Figure: error axis centered on zero error, with Over Stock on one side and Under Stock on the other]
Calculate $ of “Business Pain”
Data Mining, Retail
[Figure: same error axis (Over Stock, zero error, Under Stock), annotated "15% business pain $?" and "1% bus pain $?"]
Equal mistakes, Unequal PAIN in $
Calculate $ of “Business Pain”
Data Mining++, Retail
No way, that could get you fired!
New progress in getting feedback
[Figure: same error axis (Over Stock, zero error, Under Stock), annotated "30% bus pain $", "15% business pain $", "1% bus pain $", "4 week supply of SKU" and "30% off sale"]
Equal mistakes, Unequal PAIN in $
Best Models by Lift Correlation <> Best by $
Data Mining
The order of “best” models ranked by technical metrics (correlation, MAD) vs. the business pain metric didn’t match
  A HUGE mismatch!
Change error function of data mining algs
  “$ over stock and under stock”
Change Data Mining Algorithm Error Func
Data Mining++
Error function depends on knowing the threshold per SKU
  “4 weeks of normal sales volume for the SKU”
Neural Net (proprietary, from missile targeting)
  After epoch, i.e. forward pass of 1000 records, calculate this error to minimize
Stepwise Regression & Cubist Leaf Regr.
  Change optimization problem from an RMSE of the target to RMSE of this error function & target
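An asymmetric "$ over stock and under stock" error can be sketched as a piecewise function of the forecast error. Everything specific here is an assumption drawn from the slide annotations rather than a confirmed specification: that under stock costs 15% of the value of the missing units, that over stock within a 4-week supply costs 1%, that over stock beyond the 4-week threshold costs 30% (the size of a clearance discount), and that each rate applies to units times price.

```python
def business_pain(forecast_units, actual_units, unit_price, normal_weekly_units,
                  under_rate=0.15, over_rate=0.01, excess_rate=0.30,
                  threshold_weeks=4):
    """Dollar 'pain' of a forecast error for one store-item (assumed rates)."""
    error_units = forecast_units - actual_units
    if error_units < 0:
        # Under stock: lost sales on the units that were not on the shelf.
        return -error_units * unit_price * under_rate
    # Over stock: cheap to carry up to the threshold, costly beyond it.
    threshold = threshold_weeks * normal_weekly_units
    within = min(error_units, threshold)
    excess = error_units - within
    return unit_price * (within * over_rate + excess * excess_rate)


# Equal mistakes (20 units), unequal pain in $
print(business_pain(80, 100, 10.0, 10), business_pain(120, 100, 10.0, 10))
```

A function like this can replace a symmetric error when ranking candidate models, which is one way the ordering by business pain can differ from the ordering by correlation or MAD.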
Worry About Response Time
Product Mgmt, Retail
User Interface: 5 Levels of Complexity
Product Mgmt, Data Mining
Needs to be reliable for the simplest step
  Source data fields: use what is available & populated
  Ensure the minimum data enables a reliable system
  Use metadata to select fields (e.g. exclude low corr, empty)
Level 1: Train 6 models each for 3 fast engines, or with fast settings (e.g. more shallow trees) (~30 seconds)
Later Levels:
  Add more extensive search per engine of model parameters
  More models in DOE, use slower engines, stay time sensitive (~30 minutes to 2 hours)
How is ELF Software and Not Consulting?
Product Mgmt, Data Mining
Software install and configuration process
  Connect to Event Planning, Connect to Replenishment
  Use metadata tags on custom fields
    Not dependent on field names
    Semantic (e.g. spending) and analytic tags (categorical, source)
  Preprocessing executes if supporting data is available
  Installer validates by using ELF to create test models
End users create production models
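Selecting fields by metadata rather than by name might look like the sketch below. The metadata layout (`tags`, `populated`, `corr_with_lift`), the thresholds and the example field names are all hypothetical; the slides only state that fields carry semantic and analytic tags and that empty or low-correlation fields are excluded.

```python
def select_fields(field_metadata, required_tags=(), min_populated=0.5,
                  min_abs_corr=0.02):
    """Pick usable input fields from metadata, never from the field names."""
    selected = []
    for name, meta in field_metadata.items():
        if meta["populated"] < min_populated:
            continue                               # mostly empty
        if abs(meta["corr_with_lift"]) < min_abs_corr:
            continue                               # no usable signal
        if not set(required_tags) <= set(meta["tags"]):
            continue                               # missing a required tag
        selected.append(name)
    return selected


fields = {
    "cust_fld_17": {"tags": {"spending", "numeric"}, "populated": 0.98, "corr_with_lift": 0.21},
    "region_cd": {"tags": {"categorical", "source"}, "populated": 0.95, "corr_with_lift": 0.08},
    "legacy_note": {"tags": {"categorical"}, "populated": 0.03, "corr_with_lift": 0.10},
    "row_checksum": {"tags": {"numeric"}, "populated": 1.00, "corr_with_lift": 0.001},
}
print(select_fields(fields))
print(select_fields(fields, required_tags={"spending"}))
```

Because the logic never inspects a field name, a client's custom fields work as soon as they are tagged at install time.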
Result: Reduction in Business Pain
Retail, Data Mining
8 to 30% Reduction in Business Pain $
[Table: ELF, Model 117. Column headers: ELF stocking; ELF over stock; $ over stock; ELF HIGH over stock; $ High Over Stock; ELF under stock; $ under stock]
181 87 87$
190 31 31$
183 46 46$
115 77 233$
179 105 105$
191 109 109$
252 101 101$
176 40 40$
122 37 111$
169 6 6$
183 122 122$
119 37 112$
287 130 477$
412 141 281$
Result: Start Agile Process After…
Product Mgmt, Software Dev
Product Requirements Document (PRD)
Technical Specifications: data flow diagrams, use cases, business metric
Working Prototype, support for testing
Go through Agile & Scrum efforts w/ the software engineering group
Review, revise, evaluate vs. business metrics
Result: Patent Application Process
Product Mgmt, Data Mining
Provisional Patent http://www.uspto.gov/
  Re-write with help of patent attorney, very formal
  Application will not be published for 18 months
Ordinary Skill in the Art
  Written by Jeffrey D Ullman, Stanford Computer Science
  http://infolab.stanford.edu/~ullman/pub/focs00.html
The idea must be “novel,” “non obvious” & useful
  Novel: does not appear in previous literature
  Non obvious: would not be discovered by one of “ordinary skill in the art” when the idea is needed
How obvious is “obvious?” To how many of 100?
To What other Verticals Could This Apply?
Data Mining
It can apply where past examples, in volume, relate to future examples
Marketing / Advertising: (media independent)
  Finding new customers, clickers, buyers, spending
  Cross sell, up sell
  Customer Attrition (most likely to cancel)
Mortgage Bond pricing (help US out of this mess)
  Rating mortgages inside, forecasting prepayment & default rates
Many other verticals
Summary
How to automate? From the center out (like an onion)
  Narrow vertical application, known data source & feeds
How to select training data? Broadly
Best improvement?
  Optimize by what gets people promoted or fired
  Change DM alg. to optimize the business metric
How to make robust? Support, but not require, fields
  Heavy Research and Prototyping (R&P) before starting Agile
How to succeed in business software?
  Support end users at the level of complexity they want
  Help them succeed consistently and reliably
Questions & Answers?
[email protected]
(408) 781-6808 cell
This PPT will be posted on SF Bay ACM and LinkedIn, below
  http://sfbayacm.org/events/2009-03-11.php
  http://www.LinkedIn.com/in/GregMakowski
  http://fora.tv/ (Video company)
Future talks for ACM and ACM DM SIG
  http://www.sfbayacm.org/dmsig.php
Other talks
  http://www.meetup.com/Bay-Area-Collective-Intelligence/
  http://www.sdforum.org (business intelligence & other sigs)