
Exploiting full potential of predictive analytics on small data to drive business outcomes

Adrian Foltyn, External Data Science Expert

TM Forum, Nice

15 May 2018

Perfectly Portioned Ingredients For 3-5 Meals Per Week

Personalised Fresh Food, Locally Sourced

Easily Managed Via Subscription Platform

1 Box Delivered Weekly To The Door

No Planning

No Shopping

No Waste

HelloFresh breaks the dinner routine by continuously innovating both service and product

Disrupting the supply chain by cutting middlemen, ensuring higher margins and fresher products


HelloFresh global footprint

How we use data science / machine learning

DATA SCIENCE @ HF

Fraud detection

Marketing attribution

Lifetime / churn prediction

Recommendation engines

Demand forecasting

minimize cost maximize revenue

Generalized Additive Models

Support Vector Regressions

Random Forests

Extreme Gradient Boosting

Bayesian networks

Collaborative filtering

Deep learning CNNs

ARIMA & other time series models

Hidden Markov Models

Graph databases

Myself: a drift between consulting and data science

▪ Quant methods and computational psychoacoustics

▪ Demand forecasting

▪ Market research & business intelligence

▪ Data Science in strategic consulting

▪ Data Science in-house

Why bother about small data? Isn’t it dead by now?

▪ Not all areas of business produce big data

▪ Several business functions use / produce both small and big data, e.g.:

‒ Demand planning / forecasting

‒ Attribution & ROI of marketing activities

▪ As a result, top-down and bottom-up models appear

‒ Top-down focuses on trends of phenomena, predictors, aggregated values

‒ Bottom-up drills down to individual customer / transaction etc.

▪ Both model types require validation and could provide it for each other….

Demand forecasting challenges in a subscription meal kit business

▪ Huge cost impact

▪ Short time series

▪ High variance of pausing, strong seasonal (holiday) effects

▪ New conversions highly dependent on marketing plans

▪ Certain predictors difficult to track (e.g. errors in delivery)

▪ Large set of numbers to be delivered

‒ Splits into box types, recipes, delivery day, delivery time, …

▪ Tech legacy

What can we cook from these ingredients? ☺

First decision: forecast top-down or bottom-up?

CustomerID | Weeks from activation | Weeks from last pause | Weeks from last meal swap | No. of meal swaps total | No. of boxes in total | Box type | ……. | Probability of getting a box
….. | ….. | ….. | ….. | ….. | ….. | ….. | ….. | 0.4
….. | ….. | ….. | ….. | ….. | ….. | ….. | ….. | 0.7
….. | ….. | ….. | ….. | ….. | ….. | ….. | ….. | 0.5
….. | ….. | ….. | ….. | ….. | ….. | ….. | ….. | 0.6
Total |  |  |  |  |  |  |  | 0.55

Sales (boxes) ** ~ Outlook of actives + Outlook of pauses + ………..

** Dummy data in all charts
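The bottom-up view can be rolled up to an aggregate figure in a few lines. This is a minimal sketch, assuming customers decide independently of each other and reusing the four dummy probabilities from the customer-level table; the function name `aggregate_bottom_up` is illustrative only.

```python
import math

def aggregate_bottom_up(probabilities):
    """Roll customer-level box probabilities up to an aggregate forecast.

    Assumes customers decide independently (a simplification).
    """
    expected_boxes = sum(probabilities)
    # Variance of a sum of independent Bernoulli variables
    variance = sum(p * (1 - p) for p in probabilities)
    return {
        "expected_boxes": expected_boxes,
        "mean_probability": expected_boxes / len(probabilities),
        "std_boxes": math.sqrt(variance),
    }

# Dummy probabilities from the customer-level table
result = aggregate_bottom_up([0.4, 0.7, 0.5, 0.6])
print(result)
```

The mean probability reproduces the "Total" row of the dummy table; a top-down model would instead regress the aggregate box count directly on aggregate predictors.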

Why do we need a top-down forecasting model?

Customer decision tree (each decision branches Y / N):

CANCEL? Should I stay (or should I go)?

PAUSE? Shall I take a break?

TRUST DEFAULT MEAL CHOICE? Do I care to see my options?

SWAP MEALS? Do I swap my meals?

• Each decision increases variance of final output

• In a bottom-up model those variances could mitigate each other or could explode…

• Top-down model (aggregate number of boxes) is much more stable

Methodological challenges of short time series forecasting

▪ Possible lack of time series effects

‒ no significant autocorrelation of sales

‒ still, burning need to control for trend in data / ‘baseline’

▪ Dynamic business growth introduces sales disruptions

▪ Lots of predictors are inter-correlated

‒ particularly true when some outlooks (early trends) of activeness, pausing and cancelling are available

Forecasting – like cooking – is a mixture of art and science ☺

To our surprise, we often found no time-related effects in the forecast target

Standard time series models cannot be the only tool we use…

Charts: boxes shipped in one of the countries **; autocorrelation and partial autocorrelation of the differenced time series
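A check for time-related effects of this kind can be sketched as follows. The weekly box counts are hypothetical dummy data, and the ±1.96/sqrt(n) band is the usual large-sample approximation, which is only rough for short series.

```python
import math

def difference(series):
    """First difference: removes a linear trend / changing baseline."""
    return [b - a for a, b in zip(series, series[1:])]

def autocorrelation(series, lag):
    """Sample autocorrelation at the given lag."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((x - mean) ** 2 for x in series)
    num = sum((series[t] - mean) * (series[t - lag] - mean) for t in range(lag, n))
    return num / denom

def significant_lags(series, max_lag):
    """Lags whose autocorrelation falls outside the approximate 95% band."""
    band = 1.96 / math.sqrt(len(series))
    return [k for k in range(1, max_lag + 1) if abs(autocorrelation(series, k)) > band]

# Hypothetical weekly box counts (dummy data)
weekly_boxes = [100, 104, 103, 109, 112, 111, 118, 117, 123, 126, 124, 131]
diffed = difference(weekly_boxes)
print(significant_lags(diffed, max_lag=4))
```

An empty or near-empty list of significant lags on the differenced series is the situation described here: little for an ARIMA-type model to exploit beyond the trend.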

We needed to introduce dummies for disruptions in sales time series


Histogram of weekly sales in country 1 **: mixture of 2 distributions due to major change in marketing spend

Time series of weekly sales in country 2 **: price cut effect = shift of entire sales
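A disruption dummy of this kind can be sketched with ordinary least squares. All numbers below (cut week, shift size, noise level) are hypothetical dummy data, and the plain linear trend stands in for whatever baseline the real model uses.

```python
import numpy as np

rng = np.random.default_rng(42)
n_weeks, cut_week = 60, 30

weeks = np.arange(n_weeks)
# Step dummy: 0 before the price cut, 1 from the cut week onwards
price_cut = (weeks >= cut_week).astype(float)
# Dummy data: linear trend + level shift of 100 boxes + noise
sales = 500 + 2.0 * weeks + 100.0 * price_cut + rng.normal(0, 5, n_weeks)

# Regress sales on intercept, trend and the disruption dummy
X = np.column_stack([np.ones(n_weeks), weeks, price_cut])
coef, *_ = np.linalg.lstsq(X, sales, rcond=None)
intercept, trend, shift = coef
print(f"estimated trend: {trend:.2f}, estimated shift: {shift:.1f}")
```

Without the dummy, the level shift would be absorbed into the trend estimate and distort forecasts on both sides of the disruption.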

Feature selection is limited by size of input data and requires regularization

▪ Lots of predictors are highly correlated, even after controlling for trend / baseline

▪ This calls for regularization both in the feature selection process and in the model itself

▪ We use Lasso in feature selection and cubic shrinkage in Generalized Additive Models

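Lasso-based feature selection on a short series with correlated predictors can be sketched as below. This is a plain coordinate-descent implementation written for illustration, not the tooling used in the talk; the data are synthetic, and the penalty `alpha=0.5` is an arbitrary assumption that would normally be tuned by cross-validation.

```python
import numpy as np

def soft_threshold(z, t):
    """Shrink z towards zero by t; values within [-t, t] become exactly 0."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_coordinate_descent(X, y, alpha, n_iter=200):
    """Minimise (1/2n)||y - Xw||^2 + alpha * ||w||_1 by coordinate descent."""
    n, p = X.shape
    w = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            # Residual with feature j's current contribution added back
            r_j = y - X @ w + X[:, j] * w[j]
            rho = X[:, j] @ r_j / n
            w[j] = soft_threshold(rho, alpha) / col_sq[j]
    return w

# Synthetic data: 104 weeks, 6 candidate predictors, 2 of them relevant
rng = np.random.default_rng(0)
n = 104
X_raw = rng.normal(size=(n, 6))
X_raw[:, 1] = X_raw[:, 0] + 0.1 * rng.normal(size=n)  # near-duplicate of predictor 0
y_raw = 3.0 * X_raw[:, 0] + 2.0 * X_raw[:, 3] + rng.normal(0, 0.5, n)

# Standardise predictors and centre the target before penalising
X = (X_raw - X_raw.mean(axis=0)) / X_raw.std(axis=0)
y = y_raw - y_raw.mean()

w = lasso_coordinate_descent(X, y, alpha=0.5)
selected = np.flatnonzero(w)
print("selected predictors:", selected)
```

The L1 penalty sets the coefficients of uninformative predictors exactly to zero, and among near-duplicate predictors it tends to keep one rather than spreading weight across both.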

Addressing non-linearity: Generalized Additive Models

• Introduced by Hastie and Tibshirani (1986 paper, 1990 book)

• ’a step taken from GLM towards non-parametric models’

• Instead of estimating parameters, for each variable GAM estimates a function composed of smoothing elements

• The functions f1, . . . , fp can be natural splines, smoothing splines, local regressions, technically even polynomials (not used)

• Parametric terms and 2D-smoothers are also allowed

Future sales, i.e. no. of boxes to be delivered X weeks from now =

b0 + f1(marketing spend) + f2(planned pauses to date) + f3(planned cancellations to date) + f4(holiday effect) + weekday effect + …
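The additive structure b0 + f1(x1) + f2(x2) + … can be illustrated with classical backfitting. This is a sketch only: a Gaussian kernel smoother stands in for the spline smoothers a GAM library would use, there is no shrinkage penalty, and the two predictors and their true shapes are synthetic assumptions.

```python
import numpy as np

def kernel_smooth(x, r, bandwidth):
    """Nadaraya-Watson smoother: Gaussian-weighted average of r at each x."""
    d = (x[:, None] - x[None, :]) / bandwidth
    w = np.exp(-0.5 * d ** 2)
    return (w @ r) / w.sum(axis=1)

def fit_additive(X, y, bandwidth=0.2, n_iter=20):
    """Backfitting for y ~ b0 + sum_j f_j(x_j); returns b0 and fitted f_j values."""
    n, p = X.shape
    b0 = y.mean()
    f = np.zeros((n, p))
    for _ in range(n_iter):
        for j in range(p):
            # Partial residual: what is left for f_j after the other terms
            partial = y - b0 - f.sum(axis=1) + f[:, j]
            f[:, j] = kernel_smooth(X[:, j], partial, bandwidth)
            f[:, j] -= f[:, j].mean()  # centre each f_j for identifiability
    return b0, f

# Synthetic data with two non-linear effects
rng = np.random.default_rng(1)
n = 200
X = rng.uniform(-2, 2, size=(n, 2))
y = 10 + np.sin(X[:, 0]) + X[:, 1] ** 2 + rng.normal(0, 0.1, n)

b0, f = fit_additive(X, y)
fitted = b0 + f.sum(axis=1)
r2 = 1 - np.sum((y - fitted) ** 2) / np.sum((y - y.mean()) ** 2)
print(f"intercept: {b0:.2f}, R^2: {r2:.3f}")
```

Each column of `f` holds one estimated smooth function evaluated at the observed points, so its shape can be plotted and discussed with the business separately from the other terms.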

Ensemble approach and root cause analysis

Charts: Backtest in country 1; Backtest in country 2

• Neither standard time series nor average ensemble forecast work

• Best forecast method selected by progressive cross validation is better (final.forecast)

• Frequent review based on backtesting and root-cause analysis is even better

Chart axis (both backtests): model error based on 16-week progressive cross-validation
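Selecting a forecast method by progressive (rolling-origin) cross-validation can be sketched as follows. The three candidate forecasters are deliberately simple stand-ins, not the models used in the talk, and the series is dummy data; only the 16-week evaluation window is taken from the slide.

```python
def naive(history):
    """Forecast the next value as the last observed value."""
    return history[-1]

def moving_average(history, window=4):
    """Forecast the next value as the mean of the last `window` values."""
    tail = history[-window:]
    return sum(tail) / len(tail)

def drift(history):
    """Last value plus the average historical step."""
    step = (history[-1] - history[0]) / (len(history) - 1)
    return history[-1] + step

def progressive_cv(series, forecasters, n_folds=16):
    """Mean absolute one-step-ahead error over the last n_folds points.

    At each step only data observed before the target week is used.
    """
    errors = {name: [] for name in forecasters}
    for t in range(len(series) - n_folds, len(series)):
        history, actual = series[:t], series[t]
        for name, forecast in forecasters.items():
            errors[name].append(abs(forecast(history) - actual))
    return {name: sum(e) / len(e) for name, e in errors.items()}

# Dummy data: steadily growing weekly sales
series = [100 + 5 * i for i in range(40)]
candidates = {"naive": naive, "moving_average": moving_average, "drift": drift}
mae = progressive_cv(series, candidates)
best = min(mae, key=mae.get)
print(best, mae)
```

The method with the lowest progressive error would supply the final forecast; re-running the comparison regularly mirrors the frequent backtesting review recommended here.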

Next step: predicting user-level demand with deep learning

CNNs

Factorization / Word2Vec

Marketing attribution problem

• Marketing spend for a company over multiple years, typically measured daily or weekly

• Question: how many conversions / how much revenue / how much CLV can be attributed to activity in each channel?

Marketing attribution challenges in most e-commerce businesses

▪ Huge cost impact

▪ Short time series

▪ Varying granularity of input data

▪ Very often marketing data stored by multiple people in obscure ways (sheets, docs, no standardization)

▪ Lack of full attribution models to cross check results

▪ Dealing with counter-intuitive results…

Is there a silver / golden bullet? ☺

Here we go again: top-down or bottom-up approach?

CustomerID | Touchpoint Paid-Social | Touchpt. Affiliates | Touchpt. Bloggers | Touchpt…. | Likelihood of outdoor exposure | Likelihood of TV exposure | ……. | Number of boxes over first year (CLV)
….. | ….. | ….. | ….. | ….. | ….. | ….. | ….. | 10.5
….. | ….. | ….. | ….. | ….. | ….. | ….. | ….. | 2.5
….. | ….. | ….. | ….. | ….. | ….. | ….. | ….. | 5.3
….. | ….. | ….. | ….. | ….. | ….. | ….. | ….. | 7.6
Total |  |  |  |  |  |  |  | 9.2

Number of boxes from newly acquired customers ~ Activity in TV ** + Activity in PaidSocial ** + ………..

Balancing business insight and simulation / prediction power

• Typically, the statistics used don’t align exactly with desired business outcomes

• There is usually an inverse relationship between how well the model predicts and how interpretable its components are

• In marketing attribution, forcing intuitive constraints (non-negative contribution of channels, concave shape of response = saturation etc.) often affects fit and predictive strength

• Hitting the sweet spot requires an iterative process of refining the model against business assumptions and usability / actionability
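The kind of constraints mentioned above can be sketched as a top-down regression with saturating channel responses and non-negative coefficients. Everything here is an illustrative assumption: the saturation form spend / (spend + half_point), the half-saturation points, the channel names and the dummy data. The fit uses projected gradient descent for non-negative least squares, not the PCA + Bayesian network + GAM model from the talk.

```python
import numpy as np

def saturate(spend, half_point):
    """Saturating (diminishing-returns) response: 0 at zero spend, approaches 1."""
    return spend / (spend + half_point)

def nonneg_least_squares(A, y, n_iter=5000):
    """Projected gradient descent for min ||A w - y||^2 subject to w >= 0."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2  # 1 / largest eigenvalue of A^T A
    w = np.zeros(A.shape[1])
    for _ in range(n_iter):
        # Gradient step, then projection onto the non-negative orthant
        w = np.maximum(0.0, w - step * (A.T @ (A @ w - y)))
    return w

# Dummy weekly data: TV drives conversions, paid social has no true effect
rng = np.random.default_rng(7)
n = 200
tv = rng.uniform(0, 100, n)
social = rng.uniform(0, 50, n)
conversions = 200 + 300 * saturate(tv, 30.0) + rng.normal(0, 5, n)

A = np.column_stack([np.ones(n), saturate(tv, 30.0), saturate(social, 15.0)])
w = nonneg_least_squares(A, conversions)
baseline, tv_effect, social_effect = w
print(f"baseline: {baseline:.1f}, TV: {tv_effect:.1f}, social: {social_effect:.1f}")
```

An unconstrained fit could assign a small negative coefficient to the channel with no true effect; the constraint removes such counter-intuitive results, usually at a small cost in fit.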

Simulator for marketing attribution & ROI purposes based on a PCA + Bayesian network + GAM model

Conclusions: Do’s for small data

▪ Combine bottom-up and top-down

▪ Experiment if no history available

▪ Ensemble your models wisely

▪ Back-test and root-cause-analyze

▪ Factor in iterations with business and make it part of model building

▪ Keep calm and explain discrepancies….

Forecasting / simulating is the art of saying what will happen and then explaining why it didn’t…

We’re hiring at HelloFresh!

▪ Data Scientists

‒ Python, R, Spark, Scala, ML + computer vision / NLP / other deep learning experience

▪ Machine Learning Engineers

‒ Python, Hadoop, Spark, Kafka, ML productionizing expertise

▪ Data Engineers

‒ Python, Hadoop, Spark, Kafka, Airflow, ETL experience

https://www.hellofresh.com/careers/

Thanks! Any Questions?
