Embedded Automatic Model Training and Forecasting in an Enterprise Software Application

Embedded Automatic Model Training and Forecasting in an Enterprise Software Application
(… or how to embed a data mining consultant in a box)
Presented to the SF Bay ACM Data Mining SIG, March 11, 2009, by Greg Makowski, Principal Consultant, Golden Data Mining

Description

http://www.sfbayacm.org/events/2009-03-11.php

Topic: How can the process of Knowledge Discovery in Databases be automated, competitive and reliable? One approach is to focus on a narrow vertical market application, with known data sources and data feeds. Then you can automate the Exploratory Data Analysis (EDA) and Preprocessing phases. But how do you automate the selection of training data? Can the enterprise application be installed and configured at a variety of clients without a Senior Knowledge Discovery Engineer? How can you minimize "worst case" results of such a system when used by a business user going through their normal business role? How can you deeply investigate and model "business values" (i.e. things that can get an end user promoted or fired) into the core of the data mining algorithms? This talk will answer these questions and more.

The patent-pending application, ELF, is an enterprise application in the retail supply chain vertical market. Before the development of this system, one enterprise application was used to lay out a weekly newspaper flier three weeks before the sales event, which in turn fed data into a replenishment application. The replenishment application kept products on the store shelves, with a minimal amount of over stock and under stock. The pain point was that the retail buyer had to manually estimate the sales lift, or the multiplier increase in sales, for every item in every store. While human expertise can be great, it isn't as scalable when applied to a sales event with 1,000 - 4,000 items on sale in 6,000 stores. ELF (Event Lift Forecasting) would import data from a planned event and automatically analyze and forecast the lift for each store-item combination. Data elements used included pricing, placement in the flier, store geography and demographics, seasonality, and product hierarchy. The resulting ELF system produced an 8-30% reduction in over stock and under stock costs, which is very significant given the low profit margins in the supply chain industry.

About the Speaker: Greg Makowski is a Principal Consultant of Golden Data Mining, in Los Altos, California. Since 1992, he has deployed over 70 data mining models for clients in targeted marketing, financial services, supply chain, e-commerce, and Internet advertising in North America, South America and Europe. He has applied a variety of data mining algorithms during these engagements and has experience using SQL, SAS, Java, and areas of Cloud Computing. Greg has eight years of experience in Product Management and over six years of experience working with start-ups. See also www.LinkedIn.com/in/GregMakowski

Transcript of Embedded Automatic Model Training and Forecasting in an Enterprise Software Application

Page 1

Embedded Automatic Model Training and Forecasting

in an Enterprise Software Application

(… or how to embed a data mining consultant in a box)

Presented to the SF Bay ACM Data Mining SIG, March 11, 2009, by Greg Makowski

Principal Consultant, Golden Data Mining

Page 2

Outline

Challenge: How to automate not only forecasting, but model training?

Solution: Focus on a vertical market application; deeply investigate the business & technical issues

Result: An enterprise application with up to a 30% reduction in $ lost to over and under stock


Page 3

Challenge: Business Pain Point

JDA Software (who owns the IP) has dozens of enterprise retail supply chain applications

The Replenishment software does a very good job keeping store shelves stocked at the right level when sales are steady

It moves product from warehouse to DC to store. Sales are NOT STEADY during sales events!

PAIN POINT: The event planner has to estimate the lift in sales for every store-item combination:

(6k stores) * (1k to 4k items) = up to 24 million store-item lift estimates.


Page 4

Challenge: 16 Page Newspaper Insert

Retail (context)


Can vary by region or ZIP

Page 5

Event Lift Forecasting (ELF)

Lift is a multiplier for the increase in sales over normal.

“Prod X in Store Y will sell 6.8 times more than normal”

Normal sales are measured around the event, for the same time period (i.e. Thu – Sun), a week before and after (non-overlapping), for the same store – product (SKU is a key for product).
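As a rough illustration of this definition (not the production ELF code), lift for each store-item can be computed from daily sales as the ratio of event-window sales to the average of the same window one week before and one week after; the table and column names below (store_id, sku, sale_date, units) are assumptions, not ELF's actual schema.

# Hedged sketch: lift = event sales / average of the same period one week
# before and one week after the event (non-overlapping windows).
# Column names are illustrative, not ELF's schema.
import pandas as pd

def event_lift(daily_sales: pd.DataFrame, event_start: str, event_end: str) -> pd.Series:
    """Return lift per (store_id, sku) for an event window, e.g. Thu-Sun."""
    sales = daily_sales.copy()
    sales["sale_date"] = pd.to_datetime(sales["sale_date"])
    start, end = pd.Timestamp(event_start), pd.Timestamp(event_end)
    week = pd.Timedelta(days=7)

    def window_sum(lo, hi):
        mask = (sales["sale_date"] >= lo) & (sales["sale_date"] <= hi)
        return sales[mask].groupby(["store_id", "sku"])["units"].sum()

    event = window_sum(start, end)
    baseline = (window_sum(start - week, end - week)
                .add(window_sum(start + week, end + week), fill_value=0)) / 2.0
    return (event / baseline).dropna()

# "Prod X in Store Y will sell 6.8 times more than normal" corresponds to a lift of 6.8:
# lift = event_lift(sales_df, "2009-03-05", "2009-03-08")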


Page 6

Challenge: Appropriate for Business User

Retail


A retail event planner:
Has revenue goals and a "budget" of discount $
Has to get through a lot of detail quickly
Does not typically create mathematical forecasts

Uses an enterprise application to lay out the event flyer about 3 weeks in advance

Decides for the event: departments / items / pricing / photos / language

Uses the software to specify SKUs, images and lay out the flyer


Page 7

Challenge: How to Productize (Agile)?

Product Mgmt, Software Arch


This is not a one-off consulting project, but software.
Software engineering needs (to get in the ballpark): the right starting position, metrics, use cases, data flow.
Support a good Agile development process.

Goals: At least 90% software and 10% configuration, not repeated consulting projects.

Control the Total Cost of Ownership for the product

RELIABLE when used by the business user, working at the level of detail that the user cares about


Page 8

Challenge: Details we Have vs. Need to Start

Product Mgmt


Page 9

Outline

Challenge: How to automate not only forecasting, but model training?

Solution: Focus on a vertical market application; deeply investigate the business & technical issues

Result: An enterprise application with up to a 30% reduction in $ lost to over and under stock


Page 10

Path to Solution

Product Mgmt, Data Mining


Customer-led, product-driven – design for the general case

Can't data mine without data: start the data request process with several clients; jumpstart efforts with Monte Carlo

Combine Census fields with noise to create a target; the models and forecast matter less – the process matters MORE (a rough sketch follows this list)

Ask for business interviews: understand users, metrics, past challenges

What is the BATNA? Best Alternative, To A New Alternative (system)?
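Before any client data arrives, the Monte Carlo jumpstart mentioned above can be mimicked roughly as below: blend a few Census-style fields with noise to manufacture a stand-in lift target, so the end-to-end process can be exercised. The field names and coefficients are invented for illustration.

# Hedged sketch of the Monte Carlo jumpstart: the synthetic target only needs to
# exercise the pipeline; the exact formula and field names are illustrative.
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n = 10_000

census = pd.DataFrame({
    "median_income": rng.normal(55_000, 15_000, n),
    "pct_families_with_kids": rng.uniform(0.1, 0.6, n),
    "population_density": rng.lognormal(6.0, 1.0, n),
})

# Target = weighted combination of Census fields + noise; the models and the
# forecast accuracy matter less here than proving out the KDD process end to end.
synthetic_lift = (
    1.0
    + 0.00002 * census["median_income"]
    + 3.0 * census["pct_families_with_kids"]
    + rng.normal(0.0, 0.5, n)
).clip(lower=0.1)

training_frame = census.assign(lift=synthetic_lift)
print(training_frame.describe())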


Page 11

Data Sources

Data Mining


Event attributes (for the event planned in 3 weeks, and for past events)

Pricing, placement (page #, position on a page); products, departments, layout; store features, demographics of the population in the area

Past events: flyers may have 1, 8, 12, 16, 20, or 64 pages; the same week last year may have a different product mix; calculate lift for all store-items for all past events

Normal sales (not during an event), near in time, and event sales; Lift = (event sales) / (non-event sales)


Page 12

Iterative KDD Process

Data Mining


Knowledge Discovery in Databases (KDD)

1. Select Data for Analysis (from prior event app)

2. Exploratory Data Analysis (EDA)
3. Preprocessing (manipulating fields)

4. Model Building (Training DM algorithms)

5. Model Evaluation (apply to hold-out data)

6. Post-process score to business value
7. Feed the next application (Lift / store-item)


Page 13

Easiest to Automate From the Core

Data Mining, Product Mgmt


Go through the full process, automating from the core outward: model building / evaluation; EDA & preprocessing; selecting past marketing campaigns


Page 14

Hypothesis to Select Past Campaigns: 1) Most Similar Past Events

Data Mining


Hypothesis: a close fit to the new event is better. Attention: your expertise will be quizzed!


Compare high-level event attributes: number of pages of the flyer; discount (average, max); "primary" departments, sub-departments, categories, sub-categories … and so on

Use a "fuzzy" Euclidean distance to match past events to the event planned in 3 weeks

Select the 1-10 most similar events in the last year
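A minimal sketch of this matching step, assuming each event is summarized by a few numeric attributes (pages, average and max discount, and so on): standardize the attributes, take a weighted ("fuzzy") Euclidean distance to the planned event, and keep the closest 1-10. The attribute names and weights are placeholders, not the patented matching logic.

# Hedged sketch: match past events to the planned event with a weighted
# ("fuzzy") Euclidean distance over standardized high-level attributes.
import numpy as np
import pandas as pd

ATTRS = ["num_pages", "avg_discount", "max_discount", "num_primary_depts"]  # illustrative
WEIGHTS = np.array([1.0, 1.0, 0.5, 0.5])                                    # illustrative weights

def most_similar_events(past_events: pd.DataFrame, planned_event: pd.Series, k: int = 10) -> pd.DataFrame:
    """Return the k past events closest to the planned event."""
    mu = past_events[ATTRS].mean()
    sigma = past_events[ATTRS].std().replace(0, 1.0)   # avoid divide-by-zero
    z_past = (past_events[ATTRS] - mu) / sigma
    z_plan = (planned_event[ATTRS] - mu) / sigma
    dist = np.sqrt(((z_past - z_plan) ** 2 * WEIGHTS).sum(axis=1))
    return past_events.assign(distance=dist).nsmallest(k, "distance")

# similar = most_similar_events(past_events_df, planned_event_row, k=5)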


Page 15

Hypothesis to Select Past Campaigns: 2) Select Broadly

Data Mining


Hypothesis: more training records provide a wide variety of behavior and better generalization

Exclude past marketing events that are quite different (but be broadly inclusive)

If the planned event is 10-18 pages, exclude 1-2 page and 64-page events

Audience Quiz: VOTE for what you expect – 1) Close fit, or 2) Broad fit?


Page 16

Select Past Campaigns: Results & Why

Data Mining


Answer from testing: BROADLY selecting past marketing events to train for the planned event works much better

Why: Breadth -> Robust Generalization. The same sale last year was different in many ways; a broad variety of price points per item or department; a variety of items on the cover; variation over geography


Page 17

Exploratory Data Analysis (EDA)

Data Mining


Front cover items had a lift 5.1 times higher than the average elsewhere!

Lift as high as 130 – an after-Halloween candy sale

The top 5% of the records had 90% of the lift (over all store-item combinations)


Page 18

Exploratory Data Analysis (EDA)

Data Mining, Retail


The Cash Flow is Very Concentrated

[Bar chart: Range of Lift Values (the top 5% provides 88% of the lift) – Lift (Target) per bin, for bins of an equal number of records, against a Lift baseline.]

[Bar chart: Range of Lift Values (omitting the largest) – the same axes with the largest values left out.]

Test weight and target variations, lift and lift_log

Page 19

Preprocessing - Categorical

Data Mining++


Average past lift per category
Percent-off bin (i.e. 0%, 5%, 10%, 15% … 80%)
Price savings bin (i.e. $2, $4, $6 …)
Store hierarchy
Product hierarchy (50k to 100k SKUs, 4-6 levels): Department, Sub-department, Category, Sub-Category
Seasonality, time, month, week
Reason codes (the event is a circular, clearance)
Location on the page in the flyer (top right, top left, …)
Multivariate combinations – powerful & scalable: (price bin) + (page loc bin) + (sub-cat)
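A hedged sketch of a few of the derived fields above - average past lift per sub-category (computed on training events only), the percent-off and dollar-savings bins, and the multivariate (price bin) + (page location) + (sub-category) combination. All column names are assumptions about the event-planning feed, not ELF's actual schema.

# Hedged sketch of the categorical preprocessing ideas; column names are
# placeholders for whatever the event-planning feed actually provides.
import pandas as pd

def add_derived_fields(train: pd.DataFrame, score: pd.DataFrame) -> pd.DataFrame:
    score = score.copy()

    # Average past lift per sub-category, learned on the training events only.
    avg_lift_by_subcat = train.groupby("sub_category")["lift"].mean()
    score["subcat_avg_past_lift"] = (
        score["sub_category"].map(avg_lift_by_subcat).fillna(train["lift"].mean())
    )

    # Percent-off bin (0%, 5%, 10%, ... 80%) and dollar-savings bin ($2, $4, $6, ...).
    score["pct_off_bin"] = (score["pct_off"] / 5).round().clip(0, 16) * 5
    score["savings_bin"] = (score["price_savings"] / 2).round().clip(lower=0) * 2

    # Multivariate combination: (price bin) + (page location) + (sub-category),
    # treated as a single powerful categorical key.
    score["price_loc_subcat"] = (
        score["savings_bin"].astype(str) + "|"
        + score["page_location"].astype(str) + "|"
        + score["sub_category"].astype(str)
    )
    return score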


Page 20

Preprocessing – Interactions

Data Mining++



Page 21

Design Of Experiments (DOE)

Data Mining


Model Notebook (pictured on the next slide): one row per model trained; input columns: data version, model parameters; output columns: training time, results in-sample and out-of-sample, gap (bigger is worse), and gap-penalized results

Sections per data mining algorithm, i.e. Stepwise Regression, Naïve Bayes, Cubist (tree w/ regression in the leaves), Neural Net, TreeNet (from Salford Systems)


Page 22

Model Notebook Tracks DOE

Data Mining++ (instead of Occam's Razor)

Generalization Error = abs(in-sample result – out-of-sample result)
Conservative Result = worst(in-sample, out-of-sample) + Generalization Error

Analysis engine settings and model results – Mean Abs Err (lower is better):

N | Eng  | Parameter 1          | Parameter 2      | Comment                                                                                                            | In Samp | Out of Samp | Gen: In-Out | Out + Gen
1 | regr | Try target: LIFT_LOG | 58 vars selected |                                                                                                                    | 1.184   | 1.264       | 0.08        | 1.34
2 | regr | Try target: LIFT_LOG | limit to 15 vars |                                                                                                                    | 1.21    | 1.289       | 0.08        | 1.37
3 | regr | Try target: LIFT     | 65 vars selected |                                                                                                                    | 1.732   | 2.654       | 0.92        | 3.58
4 | regr | Try target: LIFT     | limit to 15 vars |                                                                                                                    | 1.714   | 1.837       | 0.12        | 1.96
5 | regr |                      | 60 vars selected | Start with unv4_trn, set larger wgt's for larger lift values: wgt_2=1; IF(2<lift) wgt_2 = 2; IF(5<lift) wgt_2 = 3  | 1.20    | 1.42        | 0.22        | 1.63
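The gap-penalized ranking in the notebook follows directly from the two formulas on this slide, applied to the mean absolute error columns:

# The two slide formulas, applied to in-sample / out-of-sample mean absolute error:
#   Generalization Error = abs(in_sample - out_of_sample)
#   Conservative Result  = worst(in_sample, out_of_sample) + Generalization Error
def conservative_result(mae_in_sample: float, mae_out_of_sample: float) -> float:
    gen_err = abs(mae_in_sample - mae_out_of_sample)
    return max(mae_in_sample, mae_out_of_sample) + gen_err  # MAE: bigger is worse

# Row 1 of the notebook: 1.184 in-sample, 1.264 out-of-sample -> about 1.34
print(round(conservative_result(1.184, 1.264), 2))  # 1.34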

Page 23

Data Mining Algorithm Improvements

Data Mining++


Cubist http://www.rulequest.com/cubist-info.html

Ross Quinlan uses a "greedy algorithm" to select regression fields for each leaf; tested and changed to "stepwise regression" for each leaf

[Diagram: a small tree – Split 1 at the root, Splits 2 and 3 below it, and Leaves 1-4, each leaf holding its own regression.]
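The change described above - replacing the greedy per-leaf variable pick with stepwise selection - can be approximated by a simple forward-selection loop such as the sketch below (plain forward stepwise regression scored on held-out mean absolute error; an illustration, not RuleQuest's Cubist internals).

# Hedged sketch: forward stepwise regression for the records falling into one
# tree leaf, selecting variables by held-out mean absolute error.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error

def forward_stepwise(X_train, y_train, X_val, y_val, max_vars=15):
    """X_* are numpy arrays; returns (selected column indices, best validation MAE)."""
    selected, best_mae = [], np.inf
    remaining = list(range(X_train.shape[1]))
    while remaining and len(selected) < max_vars:
        scores = []
        for col in remaining:
            cols = selected + [col]
            model = LinearRegression().fit(X_train[:, cols], y_train)
            mae = mean_absolute_error(y_val, model.predict(X_val[:, cols]))
            scores.append((mae, col))
        mae, col = min(scores)
        if mae >= best_mae:          # stop when no candidate improves the leaf
            break
        best_mae, selected = mae, selected + [col]
        remaining.remove(col)
    return selected, best_mae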


Page 24

Training Priority – a Complex Surface

Data Mining, Retail

[3D chart: (Number of Store-Item records) * Lift * Cash Flow in $, from $0 to $180,000, binned by lift (bins up to 0.55, 1.0, 1.4, 2.1, 4.1 and 55) and by non-event cash flow (bins up to $6, $8.81, $12.54, $17.08, $22.89, $32.36, $48.03, $79.38, $182 and $7,647), where Cash Flow = Non-Event Units/day * Price.]
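A rough sketch of how such a surface can be tabulated, assuming per-record lift, non-event units per day and price are available; cash flow follows the slide's definition, and the bin edges are taken from the chart labels (otherwise illustrative).

# Hedged sketch: count store-item records in a lift x cash-flow grid, where
# Cash Flow = Non-Event Units/day * Price (the slide's definition).
import pandas as pd

LIFT_EDGES = [0, 0.55, 1.0, 1.4, 2.1, 4.1, 55, float("inf")]                            # from chart labels
CASH_EDGES = [0, 6, 8.81, 12.54, 17.08, 22.89, 32.36, 48.03, 79.38, 182, 7647, float("inf")]

def training_priority_surface(records: pd.DataFrame) -> pd.DataFrame:
    """Rows: lift bins; columns: cash-flow bins; values: number of store-item records."""
    df = records.copy()
    df["cash_flow"] = df["non_event_units_per_day"] * df["price"]
    df["lift_bin"] = pd.cut(df["lift"], LIFT_EDGES)
    df["cash_bin"] = pd.cut(df["cash_flow"], CASH_EDGES)
    return pd.crosstab(df["lift_bin"], df["cash_bin"])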

Page 25

Model Notebook: Example of Describing Models

Data Mining, Retail

Example of describing one model's important inputs (bar length ≈ relative importance):

|||||||||||||||||||||||||||||||||| Top 1/6 of most expensive items, $5.30+
||||||||||| Past lift by store, sub-dept, dept, front page
|||||||||| Average daily sales per item over prior events
||| Average price
| Item is located on the front page of the flyer
Number of Saturdays & Sundays in the event
Item comes from the Health and Beauty dept
Item in the Stationery department
Avg # items sold / day


Page 26

Calculate $ of “Business Pain”

Data Mining, Retail


[Diagram: distribution of forecast error, with zero error in the center, Over Stock on one side and Under Stock on the other.]


Page 27

Calculate $ of “Business Pain”

Data Mining, Retail


[Diagram: the same error distribution, now priced: an over-stock mistake costs roughly 1% in business pain $ (?), an under-stock mistake roughly 15% (?). Equal mistakes, unequal PAIN in $.]


Page 28

Calculate $ of “Business Pain”

Data Mining++, Retail


No way – that could get you fired! New progress in getting feedback

[Diagram: updated pain estimates after new feedback: an over-stock mistake ≈ 1% business pain $, an under-stock mistake 15-30% business pain $ (context: a 4-week supply of the SKU, a 30% off sale). Equal mistakes, unequal PAIN in $.]


Page 29

Best Models by Lift Correlation <> Best by $

Data Mining


The order of "best" models ranked by technical metrics (correlation, MAD) vs. the business pain metric didn't match – a HUGE mismatch!

Change the error function of the data mining algorithms to "$ over stock and under stock"
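The mismatch can be checked concretely by ranking the same candidate models both ways and comparing the orderings, for example with a rank correlation; the model names and numbers below are invented purely to illustrate the check.

# Hedged illustration: rank candidate models by a technical metric (MAD) and by
# the business-pain metric, then compare the two orderings. Numbers are made up.
from scipy.stats import spearmanr

models = ["regr_15vars", "cubist", "neural_net", "treenet", "naive_bayes"]
mad_error = [1.26, 1.18, 1.21, 1.15, 1.40]           # lower is better
business_pain_k = [510, 620, 430, 580, 700]          # $K over/under stock, lower is better

rho, _ = spearmanr(mad_error, business_pain_k)
print(f"rank agreement (Spearman rho) = {rho:.2f}")  # well below 1.0 -> the two orders disagree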


Page 30

Change Data Mining Algorithm Error Func

Data Mining++


The error function depends on knowing the threshold per SKU: "4 weeks of normal sales volume for the SKU"

Neural Net (proprietary, from missile targeting)

After an epoch, i.e. a forward pass of 1,000 records, calculate this error to minimize

Stepwise Regression & Cubist leaf regression: change the optimization problem from an RMSE of the target to an RMSE of this error function & the target
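A heavily hedged sketch of what an asymmetric business-pain error might look like, echoing the rough numbers from the earlier slides (over-stock pain on the order of 1% of the cash value, under-stock pain on the order of 15-30%, with a threshold of 4 weeks of normal sales for the SKU). The actual proprietary error functions used in ELF are not published, so every rate and cap below is an assumption.

# Hedged sketch of an asymmetric "$ of business pain" error, NOT the proprietary
# ELF error function. Pain rates and the 4-week threshold echo earlier slides.
import numpy as np

def business_pain_dollars(forecast_units, actual_units, price,
                          normal_units_per_day,
                          over_pain_rate=0.01, under_pain_rate=0.30):
    forecast_units = np.asarray(forecast_units, dtype=float)
    actual_units = np.asarray(actual_units, dtype=float)
    error_units = forecast_units - actual_units

    # Over-forecast -> over stock, capped here at ~4 weeks of normal supply for the SKU (assumption).
    over_units = np.clip(error_units, 0, 28 * np.asarray(normal_units_per_day))
    # Under-forecast -> under stock (lost sales during the event).
    under_units = np.clip(-error_units, 0, None)

    pain = over_units * price * over_pain_rate + under_units * price * under_pain_rate
    return pain.sum()

# A DM algorithm can then minimize RMSE of the target blended with this pain,
# rather than RMSE of lift alone (as the slide describes for regression and Cubist leaves).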


Page 31

Worry About Response Time

Product Mgmt, Retail



Page 32

User Interface: 5 Levels of Complexity

Product Mgmt, Data Mining


Needs to be reliable at the simplest step. Source data fields: use what is available & populated; ensure the minimum data enables a reliable system; use metadata to select fields (i.e. exclude low-correlation or empty fields) – a rough sketch follows this list

Level 1: Train 6 models each for 3 fast engines, or with fast settings (i.e. shallower trees) (~30 seconds)

Later levels: add a more extensive search per engine of model parameters, more models in the DOE, use slower engines, stay time sensitive (~30 minutes to 2 hours)
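The metadata-driven field selection mentioned in the first bullet group might look roughly like this: before any training level starts, drop source fields that are mostly empty or show near-zero correlation with the lift target. The threshold values are illustrative, not ELF's configuration.

# Hedged sketch of metadata-driven field selection: keep only source fields that
# are well populated and show at least weak correlation with the lift target.
# Threshold values are illustrative, not ELF's configuration.
import pandas as pd

def select_usable_fields(df: pd.DataFrame, target: str = "lift",
                         min_fill_rate: float = 0.70,
                         min_abs_corr: float = 0.02) -> list[str]:
    usable = []
    for col in df.columns:
        if col == target:
            continue
        if df[col].notna().mean() < min_fill_rate:          # mostly empty -> exclude
            continue
        if pd.api.types.is_numeric_dtype(df[col]):
            corr = df[col].corr(df[target])
            if pd.isna(corr) or abs(corr) < min_abs_corr:   # near-zero correlation -> exclude
                continue
        usable.append(col)
    return usable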


Page 33

How is ELF Software and Not Consulting?

Product Mgmt, Data Mining


Software install and configuration process: connect to Event Planning, connect to Replenishment; use metadata tags on custom fields

Not dependent on field names: semantic (i.e. spending) and analytic tags (categorical, source)

Preprocessing executes if supporting data is available. The installer validates by using ELF to create test models
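One way to picture the semantic and analytic tagging (purely illustrative; the deck does not show the product's real metadata schema): each custom field in a client's feed carries tags, and preprocessing keys off the tags rather than off field names.

# Hedged illustration: preprocessing keys off metadata tags, not field names,
# so the same ELF install works against differently-named client schemas.
FIELD_METADATA = {
    # client field name : semantic tag + analytic tags (all names are hypothetical)
    "PROMO_PCT_OFF":  {"semantic": "discount",  "analytic": {"numeric"}},
    "FLYER_PAGE_POS": {"semantic": "placement", "analytic": {"categorical"}},
    "CUST_SPEND_LY":  {"semantic": "spending",  "analytic": {"numeric", "source"}},
}

def fields_with_tag(metadata: dict, semantic: str) -> list:
    """Find whichever client fields carry a given semantic tag."""
    return [name for name, tags in metadata.items() if tags["semantic"] == semantic]

print(fields_with_tag(FIELD_METADATA, "discount"))  # ['PROMO_PCT_OFF']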

End users create production models


Page 34

Outline

Challenge: How to automate not only forecasting, but model training?

Solution: Focus on a vertical market application; deeply investigate the business & technical issues

Result: An enterprise application with up to a 30% reduction in $ lost to over and under stock


Page 35

Result: Reduction in Business Pain

Retail, Data Mining


8 to 30% Reduction in Business Pain $

[Table: per-test-event over-stock and under-stock results for ELF (e.g. Model 117) vs. a HIGH baseline forecast – columns for ELF over stocking, $ over stock, High over stock, $ High over stock, ELF under stock, $ under stock.]

Page 36

Result: Start Agile Process After…

Product Mgmt, Software Dev


Product Requirements Document (PRD)

Technical Specifications: data flow diagrams, use cases, business metric

Working Prototype, support for testing

Go through Agile & Scrum efforts w/ the software engineering group

Review, revise, evaluate vs. business metrics


Page 37

Result: Patent Application Process

Product Mgmt, Data Mining


Provisional Patent http://www.uspto.gov/

Re-write with the help of a patent attorney, very formal; the application will not be published for 18 months

"Ordinary Skill in the Art," written by Jeffrey D. Ullman, Stanford Computer Science: http://infolab.stanford.edu/~ullman/pub/focs00.html

The idea must be "novel," "non-obvious" & useful. Novel – does not appear in previous literature. Non-obvious – would not be discovered by one of "ordinary skill in the art" when the idea is needed.

How obvious is "obvious?" To how many of 100?

Page 38

To What other Verticals Could This Apply?

Data Mining


It can apply where past examples, in volume, relate to future examples. Marketing / Advertising (media independent):

Finding new customers, clickers, buyers, spending; cross-sell, up-sell; customer attrition (most likely to cancel)

Mortgage bond pricing (help the US out of this mess):

Rating the mortgages inside, forecasting prepayment & default rates

Many other verticals

Page 39

Summary

How to automate? From the center out (i.e. like an onion)

Narrow vertical application, known data sources & feeds

How to select training data? Broadly

Best improvement? Optimize by what gets people promoted or fired; change the DM algorithm to optimize the business metric

How to make it robust? Support, but do not require, fields; heavy Research and Prototyping (R&P) before starting Agile

How to succeed in business software? Support end users at the level of complexity they want; help them succeed consistently and reliably


Page 40

Questions & Answers?

[email protected], (408) 781-6808 cell

This PPT will be posted on SF Bay ACM and LinkedIn, below:
http://sfbayacm.org/events/2009-03-11.php
http://www.LinkedIn.com/in/GregMakowski
http://fora.tv/ (video company)

Future talks for ACM and the ACM DM SIG:
http://www.sfbayacm.org/dmsig.php

Other talks:
http://www.meetup.com/Bay-Area-Collective-Intelligence/
http://www.sdforum.org (business intelligence & other SIGs)
