Embedded Automatic Model Training And Forc In An Enterprise Sw Applic

Embedded Automatic Model Training and Forecasting in an Enterprise Software Application

(… or how to embed a data mining consultant in a box)

Presented to the SF Bay ACM Data Mining SIG, March 11, 2009, by Greg Makowski

Principal Consultant, Golden Data Mining

Outline

Challenge: How to automate not only forecasting, but model training?

Solution: Focus on a vertical market application; deeply investigate the business & technical issues

Result: An enterprise application; up to a 30% reduction in $ lost to over and under stock

Challenge: Business Pain Point

JDA Software (who owns the IP) has dozens of enterprise retail supply chain applications

The Replenishment software does a very good job keeping store shelves stocked at the right level when sales are steady

Moves product from warehouse to DC to store

Sales are NOT STEADY during sales events!

PAIN POINT: The event planner has to estimate the lift in sales for every store-item combination

(6k stores) * (1k to 4k items) = up to 24 million store-item lift estimates

Challenge: 16 Page Newspaper Insert

Retail (context)

Can vary by region or ZIP

Event Lift Forecasting (ELF)

Lift is a multiplier for the increase in sales over normal

“Prod X in Store Y will sell 6.8 times more than normal”

Normal sales are measured around the event, for the same:

time period (e.g. Thu – Sun), a week before and after (non-overlapping)

Store – product (SKU is a key for product)

[Figure: chart with axes labeled Lift and Event]
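The lift definition above can be sketched in a few lines of Python. This is a minimal illustration, not the ELF implementation: the function name, the argument names, and the sample unit counts are all assumptions made for the example. It averages the same weekdays one week before and one week after the event to get "normal" sales.

```python
from statistics import mean

def event_lift(event_units, before_units, after_units):
    """Lift = average daily event sales / average daily normal sales.

    event_units:  daily unit sales for one store-SKU during the event (e.g. Thu-Sun)
    before_units: the same weekdays one week before the event
    after_units:  the same weekdays one week after the event
    """
    normal = mean(list(before_units) + list(after_units))
    if normal <= 0:
        return None  # no baseline sales, so lift is undefined for this store-SKU
    return mean(event_units) / normal

# Hypothetical Thu-Sun unit sales for one store-SKU.
lift = event_lift(event_units=[30, 36, 40, 30],
                  before_units=[4, 5, 6, 5],
                  after_units=[5, 5, 6, 4])
print(round(lift, 1))  # 6.8
```

Returning None when there is no baseline is one possible choice; a production system would need a policy for new or slow-moving items.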

Challenge: Appropriate for Business User

Retail

A retail event planner

Has revenue goals and a “budget” of discount $

Has to get through a lot of detail quickly

Does not typically create mathematical forecasts

Uses an enterprise application to lay out the event flyer about 3 weeks in advance

Decides for the event: departments / items / pricing / photos / language

Uses the software to specify SKUs and images, and to lay out the flyer

Challenge: How to Productize (Agile)?

Product Mgmt / Software Arch

This is not a one-off consulting project, but SW

Software engineering needs (get in the ballpark)

right starting position, metrics, use cases, data flow

Support good Agile development process

Goals

At least 90% software and 10% configuration, not repeated consulting projects

Control the Total Cost of Ownership for the product

RELIABLE when used by the business user, working at the level of detail that the user cares about

Challenge: Details we Have vs. Need to Start

Product Mgmt


Path to Solution

Product Mgmt / Data Mining

Customer led, product driven – design general

Can’t data mine without data

Start data request process with several clients

Jumpstart efforts with Monte Carlo

Combine Census fields with noise to create a target

The models and forecast matter less – the process MORE

Ask for business interviews

Understand users, metrics, past challenges

What is the BATNA?

Best Alternative To A New Alternative (system)?
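The Monte Carlo jumpstart mentioned above (combine Census fields with noise to create a target) might look like the following sketch. The two Census-style fields, the coefficients, and the noise level are invented for illustration; the point is only to have records with a known signal so the surrounding process can be built before client data arrives.

```python
import random

def synthetic_lift(median_income, pct_households_with_kids, rng):
    """Made-up relationship: a signal from two Census-style fields plus noise."""
    signal = 1.0 + 0.00001 * median_income + 0.02 * pct_households_with_kids
    noise = rng.gauss(0.0, 0.25)
    return max(0.1, signal + noise)  # keep the fake lift positive

def make_records(n, seed=42):
    """Generate n synthetic store records with a fixed seed for repeatability."""
    rng = random.Random(seed)
    records = []
    for _ in range(n):
        income = rng.uniform(25_000, 120_000)   # hypothetical field
        kids = rng.uniform(10, 50)              # hypothetical field, in percent
        records.append({"median_income": income,
                        "pct_households_with_kids": kids,
                        "lift": synthetic_lift(income, kids, rng)})
    return records

records = make_records(1000)
```

Because the seed is fixed, every run produces the same records, which makes it easier to test the rest of the pipeline.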

Data Sources

Data Mining

Event Attributes (for the event planned in 3 weeks & past events)

Pricing, placement (page #, position on a page)

Products, departments, layout

Store features, demographics of population in area

Past events

Flyers may have 1, 8, 12, 16, 20, 64 pages

Same week last year may have a different prod mix

Calculate Lift for all store-items for all past events

Normal sales (not during an event) near in time

Event sales; Lift = (event sales) / (non-event sales)

Iterative KDD Process

Data Mining

Knowledge Discovery in Databases (KDD)

1. Select Data for Analysis (from prior event app)

2. Exploratory Data Analysis (EDA)

3. Preprocessing (manipulating fields)

4. Model Building (Training DM algorithms)

5. Model Evaluation (apply to hold out data)

6. Post-process score to business value

7. Feed the next application (Lift / store-item)

Easiest to Automate From the Core

Data Mining / Product Mgmt

Go through full process, automating

model building / evaluation

EDA & Preprocessing

Select past marketing campaigns

Hypothesis to Select Past Campaigns: 1) Most Similar Past Events

Data Mining

Attention: your expertise will be quizzed!

Hypothesis: a close fit to the new event is better

Compare high level event attributes

Number of pages of the flyer

Discount (average, max)

“Primary” departments, sub-dept, category, sub-category

… and so on

Use “fuzzy” Euclidean distance to match past events to the event planned in 3 weeks

Select the 1-10 most similar events in the last year
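The close-fit hypothesis can be sketched as a nearest-events search. The slides do not say what makes the distance "fuzzy", so this example assumes a weighted Euclidean distance over range-normalized attributes; the field names, ranges, weights, and sample events are hypothetical.

```python
import math

def event_distance(a, b, ranges, weights):
    """Weighted Euclidean distance over attributes scaled by their value range."""
    total = 0.0
    for field, weight in weights.items():
        diff = (a[field] - b[field]) / ranges[field]
        total += weight * diff * diff
    return math.sqrt(total)

def most_similar(planned, past_events, ranges, weights, k=10):
    """Return the k past events closest to the planned event."""
    ranked = sorted(past_events,
                    key=lambda e: event_distance(planned, e, ranges, weights))
    return ranked[:k]

# Hypothetical attribute ranges and importance weights.
ranges = {"pages": 63, "avg_discount": 0.8, "max_discount": 0.8}
weights = {"pages": 1.0, "avg_discount": 1.0, "max_discount": 0.5}

planned = {"name": "planned", "pages": 16, "avg_discount": 0.20, "max_discount": 0.50}
past = [
    {"name": "spring", "pages": 16, "avg_discount": 0.25, "max_discount": 0.50},
    {"name": "clearance", "pages": 64, "avg_discount": 0.40, "max_discount": 0.80},
    {"name": "weekly", "pages": 12, "avg_discount": 0.15, "max_discount": 0.40},
]
closest = most_similar(planned, past, ranges, weights, k=2)
print([e["name"] for e in closest])  # ['spring', 'weekly']
```

Categorical attributes such as the primary department would need their own distance rule, which is left out here.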

Hypothesis to Select Past Campaigns: 2) Select Broadly

Data Mining

Hypothesis: more training records provide a wide variety of behavior, and better generalization

Exclude past marketing events that are quite different (but be broadly inclusive)

If the planned event is 10-18 pages, exclude 1-2 and 64 page events

Audience Quiz: VOTE for what you expect: 1) Close fit, 2) Broad fit?

Select Past Campaigns: Results & Why

Data Mining

Answer from testing: BROADLY selecting past marketing events to train for the planned event works much better

Why: Breadth → Robust Generalization

Same sale last year was different in many ways

Broad variety of price points / item or department

Variety of items on cover

Variation over geography
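Broad selection amounts to a loose exclusion filter rather than a ranking. As a sketch, assuming page count is the only attribute checked and that the band of half to double the planned page count is an arbitrary choice for illustration:

```python
def select_broadly(planned_pages, past_events, min_ratio=0.5, max_ratio=2.0):
    """Keep every past event whose page count is within a wide band of the plan."""
    low = planned_pages * min_ratio
    high = planned_pages * max_ratio
    return [e for e in past_events if low <= e["pages"] <= high]

# Page counts taken from the flyer sizes listed in the deck.
past_events = [{"id": i, "pages": p} for i, p in enumerate([1, 8, 12, 16, 20, 64])]
training_events = select_broadly(16, past_events)
print([e["pages"] for e in training_events])  # [8, 12, 16, 20]
```

For a 16-page planned flyer this keeps the 8 to 20 page events and drops the 1-page and 64-page ones, in line with the example on the slide.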

Exploratory Data Analysis (EDA)

Data Mining

Front cover items had a lift 5.1 times higher than the average elsewhere!

Lift as high as 130 – after-Halloween candy sale

The top 5% of the records had 90% of the lift (over all store-item combinations)
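A concentration figure like "the top 5% of the records had 90% of the lift" can be checked with a short calculation. This sketch assumes "share of the lift" means the share of summed lift values, which the slides do not spell out, and the sample data is invented.

```python
def top_share(lifts, top_fraction=0.05):
    """Fraction of total lift contributed by the highest-lift records."""
    ordered = sorted(lifts, reverse=True)
    n_top = max(1, round(len(ordered) * top_fraction))
    return sum(ordered[:n_top]) / sum(ordered)

# Hypothetical data: 95 ordinary store-items and 5 extreme ones.
lifts = [1.0] * 95 + [130.0, 60.0, 40.0, 20.0, 10.0]
share = top_share(lifts)
print(round(share, 2))  # 0.73
```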


The Cash Flow is Very Concentrated

Data Mining / Retail

[Figure: two charts of Lift (Target) across 20 bins of an equal number of records, with series Lift and Baseline. Left: Range of Lift Values (The Top 5% Provides 88% of the Lift), y-axis 0 to 140. Right: Range of Lift Values (Omitting the Largest), y-axis 0 to 7.]

Test weight and target variations, lift and lift_log
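The two variations named here, a log-transformed target and record weights, can be sketched as follows. The weight steps follow the wgt_2 rule quoted in the model notebook slide; the use of the natural log for lift_log is an assumption, since the deck does not state the base or any offset.

```python
import math

def lift_log(lift):
    """Log-transformed target; assumes lift > 0 and a natural log."""
    return math.log(lift)

def training_weight(lift):
    """wgt_2 rule from the notebook: 1, then 2 if lift > 2, then 3 if lift > 5."""
    if lift > 5:
        return 3
    if lift > 2:
        return 2
    return 1

for lift in [0.9, 2.5, 6.8, 130.0]:
    print(lift, round(lift_log(lift), 3), training_weight(lift))
```

The log compresses the long right tail of lift values, while the weights push the training toward the rare high-lift records.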

Preprocessing – Categorical

Data Mining++

Average past Lift per category

Percent off bin (e.g. 0%, 5%, 10%, 15% … 80%)

Price Savings Bin (e.g. $2, $4, $6 …)

Store hierarchy

Product hierarchy (50k to 100k SKUs, 4-6 levels)

Department, Sub-department, Category, Sub-Category

Seasonality, time, month, week

Reason codes (the event is a circular, clearance)

Location on the page in the flyer (top right, top left, ...)

Multivariate combinations – powerful & scalable

(price bin) + (page loc bin) + (sub-cat)
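"Average past Lift per category" with multivariate keys can be sketched as a lookup table built from past records. The field names, the 5-point bin width, and the sample rows below are assumptions for illustration.

```python
from collections import defaultdict

def percent_off_bin(pct_off, width=5):
    """Round a percent-off value down to its bin (0, 5, 10, ...)."""
    return int(pct_off // width) * width

def average_lift_by_key(records, key_fields):
    """Average past lift for every combination of the given categorical fields."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for r in records:
        key = tuple(r[f] for f in key_fields)
        sums[key] += r["lift"]
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}

# Hypothetical past store-item records.
past = [
    {"pct_off": 12, "page_loc": "top right", "sub_cat": "candy", "lift": 4.0},
    {"pct_off": 14, "page_loc": "top right", "sub_cat": "candy", "lift": 6.0},
    {"pct_off": 30, "page_loc": "top left", "sub_cat": "pens", "lift": 2.0},
]
for r in past:
    r["pct_off_bin"] = percent_off_bin(r["pct_off"])

table = average_lift_by_key(past, ["pct_off_bin", "page_loc", "sub_cat"])
print(table[(10, "top right", "candy")])  # 5.0
```

At scoring time, the planned event's key is looked up in the table and the average becomes a numeric input; combinations never seen before need a fallback, such as the average for a shorter key.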

Preprocessing – Interactions

Data Mining++

Design Of Experiments (DOE)

Data Mining

Model Notebook (pictured in next slide)

One row per model trained

input columns: data version, model parameters

output columns: training time, results in-sample, out of sample, gap (bigger is worse), and gap penalized results

Sections per data mining algorithm, e.g.

Stepwise Regression, Naïve Bayes

Cubist (tree w/ regression in leaves)

Neural Net

TreeNet (from Salford Systems)

Model Notebook Tracks DOE

Data Mining++ (Instead of Occam’s Razor)

Generalization Error = abs( in sample res – out of sample res )

Conservative Result = worst( in, out samp ) + Generalization Err

MODEL RESULTS: Mean Abs Err (lower is better)

N in ser | Eng | parameter 1 | parameter 2 | comment | In Samp | Out of Samp | Gen: In-Out | Out + Gen

1 | regr | Try target: LIFT_LOG | | 58 vars selected | 1.184 | 1.264 | 0.08 | 1.34

2 | regr | Try target: LIFT_LOG | limit to 15 vars | limit to 15 | 1.21 | 1.289 | 0.08 | 1.37

3 | regr | Try target: LIFT | | 65 vars selected | 1.732 | 2.654 | 0.92 | 3.58

4 | regr | Try target: LIFT | limit to 15 vars | limit to 15 | 1.714 | 1.837 | 0.12 | 1.96

5 | regr | Start with unv4_trn, and set larger wgt's for larger lift values | wgt_2=1; IF(2<lift) wgt_2 = 2; IF(5<lift) wgt_2 = 3; | 60 vars selected | 1.20 | 1.42 | 0.22 | 1.63
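The two notebook formulas are easy to reproduce. The sketch below applies them to the first four rows of the table; since the metric is an error (mean absolute error), "worst" is taken to be the larger of the two values, which is an interpretation of the slide rather than something it states.

```python
def generalization_error(in_sample, out_sample):
    """Gap between in-sample and out-of-sample results."""
    return abs(in_sample - out_sample)

def conservative_result(in_sample, out_sample):
    """Worst of the two results, penalized by the gap (for an error metric)."""
    worst = max(in_sample, out_sample)
    return worst + generalization_error(in_sample, out_sample)

# (model number, in-sample MAE, out-of-sample MAE) from the notebook table.
notebook = [(1, 1.184, 1.264), (2, 1.21, 1.289), (3, 1.732, 2.654), (4, 1.714, 1.837)]

# Rank models from best to worst by the conservative result.
ranked = sorted(notebook, key=lambda row: conservative_result(row[1], row[2]))
for n, ins, outs in ranked:
    print(n,
          round(generalization_error(ins, outs), 2),
          round(conservative_result(ins, outs), 2))
```

Model 3 has a better in-sample error than several others would suggest, but its large gap pushes it to the bottom of the ranking, which is the behavior the penalty is meant to produce.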

Data Mining Algorithm Improvements

Data Mining++

Cubist http://www.rulequest.com/cubist-info.html

Ross Quinlan uses a “greedy algorithm” to select regression fields for each leaf

Tested and changed to “stepwise regression” for each leaf

[Figure: tree diagram, Split 1 branching to Split 2 and Split 3, ending in Leaf 1 through Leaf 4]

Training Priority – a Complex Surface

Data Mining / Retail

[Figure: 3-D surface of Num Store-Items * Lift * Cash Flow ($0 to $180,000), binned by lift (from “lift to .55” up to “lift to 55”) and by non-event cash flow (from “cash to $6” up to “cash to $7,647”). Cash Flow = Non-Event Units/day * Price]

Model Notebook: Example of Describing Models

Data Mining / Retail

|||||||||||||||||||||||||||||||||| Top 1/6 of most expensive items, $5.30+

||||||||||| Past lift by store, sub-dept, dept, front page

|||||||||| Average daily sales per item over prior events

||| Average price

| Item is located on the front page of the flyer

Number of Saturdays & Sundays in the event

Item comes from the Health and Beauty dept

Item in the Stationery department

Avg # items sold / day

Calculate $ of “Business Pain”

Data Mining / Retail

[Figure: error axis centered on zero error, with Over Stock on one side and Under Stock on the other]

[Figure, second build: the same axis annotated “15% business pain $?” and “1% bus pain $?”. Equal mistakes, Unequal PAIN in $]

Data Mining++ / Retail

No way – that could get you fired!

New progress in getting feedback

[Figure, third build: the same axis annotated “30% bus pain $”, “15% business pain $” and “1% bus pain $”, with markers for “4 week supply of SKU” and “30% off sale”. Equal mistakes, Unequal PAIN in $]

Best Models by Lift Correlation <> Best by $

Data Mining

The order of “best” models ranked by technical metrics (correlation, MAD) vs. the business pain metric didn’t match

A HUGE mismatch!

Change error function of data mining algs

“$ over stock and under stock”

Change Data Mining Algorithm Error Func

Data Mining++

Error function depends on knowing the threshold per SKU

“4 weeks of normal sales volume for the SKU”

Neural Net (proprietary, from missile targeting)

After epoch, i.e. forward pass of 1000 records, calculate this error to minimize

Stepwise Regression & Cubist Leaf Regr.

Change optimization problem from an RMSE of the target to RMSE of this error function & target
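A dollar-based error function of the kind described above could be sketched as a piecewise cost with a per-SKU threshold. The three rates below echo the percentages that appear on the business pain slides, but which rate belongs to which region is not recoverable from the transcript, so treat the rates, their assignment, and all names as placeholder assumptions. Only the shape comes from the talk: equal unit mistakes cost unequal dollars, and over stock beyond 4 weeks of normal sales is treated differently.

```python
def business_pain(forecast_units, actual_units, price, four_week_units,
                  under_rate=0.15, over_rate=0.01, high_over_rate=0.30):
    """Dollar cost of a forecast error for one store-SKU (rates are placeholders)."""
    error_units = forecast_units - actual_units
    if error_units == 0:
        return 0.0
    if error_units < 0:
        # Under stock: the forecast was too low, so sales were lost.
        return -error_units * price * under_rate
    if error_units <= four_week_units:
        # Mild over stock: sells through within 4 weeks of normal volume.
        return error_units * price * over_rate
    # Severe over stock: the excess beyond the threshold costs much more.
    normal_part = four_week_units * price * over_rate
    excess_part = (error_units - four_week_units) * price * high_over_rate
    return normal_part + excess_part

# Hypothetical SKU: $2.00 price, 50 units sold in 4 normal weeks, 100 actually sold.
for forecast in [90, 110, 200]:
    print(forecast, round(business_pain(forecast, 100, 2.0, 50), 2))
```

Summing this cost over all store-SKUs gives a single dollar figure per model, which can then replace correlation or MAD when ranking models.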

Worry About Response Time

Product Mgmt / Retail

User Interface: 5 Levels of Complexity

Needs to be made reliable for the simplest step

Source data fields: use what is available & populated

Ensure the minimum data enables a reliable system

Use metadata to select fields (e.g. exclude low corr, empty)

Level 1: Train 6 models each for 3 fast engines, or with fast settings (e.g. more shallow trees) (~30 seconds)

Later Levels: Add more extensive search per engine of model parameters, more models in DOE, use slower engines, stay time sensitive (~30 minutes to 2 hours)

How is ELF Software and Not Consulting?

Software install and configuration process

Connect to Event Planning, Connect to Replenishment

Use metadata tags on custom fields

Not dependent on field names

Semantic (e.g. spending) and analytic tags (categorical, source)

Preprocessing executes if supporting data is available

Installer validates by using ELF to create test models

End users create production models
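Selecting fields by metadata rather than by name might look like the sketch below. The field records, the tag vocabulary, and both thresholds are hypothetical; the deck only says that fields carry semantic and analytic tags and that empty or low-correlation fields are excluded.

```python
def select_inputs(fields, min_fill=0.5, min_abs_corr=0.02):
    """Keep fields that are populated enough and not nearly uncorrelated with lift."""
    selected = []
    for f in fields:
        if f["fill_rate"] < min_fill:
            continue  # mostly empty at this client
        if abs(f["corr_with_lift"]) < min_abs_corr:
            continue  # carries almost no signal
        selected.append(f)
    return selected

def fields_with_tag(fields, tag):
    """Find fields by what they mean, not by what they are called."""
    return [f["name"] for f in fields if tag in f["tags"]]

# Hypothetical custom fields as they might be tagged during installation.
fields = [
    {"name": "CUST_FLD_07", "tags": {"spending", "numeric"},
     "fill_rate": 0.98, "corr_with_lift": 0.21},
    {"name": "CUST_FLD_12", "tags": {"categorical", "source"},
     "fill_rate": 0.10, "corr_with_lift": 0.30},
    {"name": "CUST_FLD_19", "tags": {"categorical"},
     "fill_rate": 0.95, "corr_with_lift": 0.001},
]
usable = select_inputs(fields)
print(fields_with_tag(usable, "spending"))  # ['CUST_FLD_07']
```

Preprocessing steps can then be keyed on tags, so a step runs only when a field with the tag it needs is present and usable.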


Result: Reduction in Business Pain

Retail / Data Mining

8 to 30% Reduction in Business Pain $

[Table: ELF, Model 117. Column headers: ELF over stocking, $ over stock; ELF HIGH over stock, $ High Over Stock; ELF under stock, $ under stock. Values as extracted, in groups of three:]

181 87 87$; 190 31 31$; 183 46 46$; 115 77 233$; 179 105 105$; 191 109 109$; 252 101 101$; 176 40 40$; 122 37 111$; 169 6 6$; 183 122 122$; 119 37 112$; 287 130 477$; 412 141 281$

Result: Start Agile Process After…

Product Mgmt / Software Dev

Product Requirements Document (PRD)

Technical Specifications: data flow diagrams, use cases, business metric

Working Prototype, support for testing

Go through Agile & Scrum efforts w/ the software engineering group

Review, revise, evaluate vs. business metrics

Result: Patent Application Process

Provisional Patent http://www.uspto.gov/

Re-write with help of patent attorney, very formal

Application will not be published for 18 months

Ordinary Skill in the Art Written by…

Jeffrey D Ullman, Stanford Computer Science

http://infolab.stanford.edu/~ullman/pub/focs00.html

The idea must be “novel,” “non obvious” & useful

Novel – does not appear in previous literature

Non obvious – would not be discovered by one of “ordinary skill in the art” when the idea is needed

How obvious is “obvious?” To how many of 100?

To What Other Verticals Could This Apply?

Data Mining

It can apply where past examples, in volume, relate to future examples

Marketing / Advertising: (media independent)

Finding new customers, clickers, buyers, spending

Cross sell, up sell

Customer Attrition (most likely to cancel)

Mortgage Bond pricing (help US out of this mess)

rating mortgages inside, forecasting prepayment & default rates

Many other verticals

Summary

How to automate? From the center out (i.e. onion)

Narrow vertical application, known data source & feeds

How to select training data? Broadly

Best improvement? Optimize by what gets people promoted or fired

Change DM alg. to opt. bus metric

How to make robust? Support, but not require, fields

Heavy Research and Prototyping (R&P) before starting Agile

How to succeed in business software?

Support end users at the level of complexity they want

Help them succeed consistently and reliably

Questions & Answers?

[email protected] (408)781-6808 cell

This PPT will be posted on SF Bay ACM and LinkedIn, below

http://sfbayacm.org/events/2009-03-11.php

http://www.LinkedIn.com/in/GregMakowski

http://fora.tv/ (Video company)

Future talks for ACM and ACM DM SIG

http://www.sfbayacm.org/dmsig.php

Other talks

http://www.meetup.com/Bay-Area-Collective-Intelligence/

http://www.sdforum.org (business intelligence & other sigs)
