Exploiting full potential of predictive analytics on small data to drive business
outcomes
Adrian Foltyn, External Data Science Expert
TM Forum, Nice
15 May 2018
Perfectly Portioned Ingredients For 3-5 Meals Per Week
Personalised Fresh Food, Locally Sourced
Easily Managed Via Subscription Platform
1 Box Delivered Weekly To The Door
No Planning
No Shopping
No Waste
HelloFresh breaks the dinner routine by continuously innovating both service and product
Disrupting the supply chain by cutting middlemen, ensuring higher margins and fresher products
HelloFresh global footprint
How we use data science / machine learning
DATA SCIENCE @ HF
Fraud detection
Marketing attribution
Lifetime / churn prediction
Recommendation engines
Demand forecasting
minimize cost maximize revenue
Generalized Additive Models
Support Vector Regressions
Random Forests
Extreme Gradient Boosting
Bayesian networks
Collaborative filtering
Deep learning CNNs
ARIMA & other time series models
Hidden Markov Models
Graph databases
Myself: a drift between consulting and data science
▪ Quant methods and computational psychoacoustics
▪ Demand forecasting
▪ Market research & business intelligence
▪ Data Science in strategic consulting
▪ Data Science in-house
Why bother about small data? Isn’t it dead by now?
▪ Not all areas of business produce big data
▪ Several business functions use / produce both small and big data, e.g.:
‒ Demand planning / forecasting
‒ Attribution & ROI of marketing activities
▪ As a result, top-down and bottom-up models appear
‒ Top-down focuses on trends of phenomena, predictors, aggregated values
‒ Bottom-up drills down to individual customer / transaction etc.
▪ Both model types require validation and could provide it for each other….
Demand forecasting challenges in a subscription meal kit business
▪ Huge cost impact
▪ Short time series
▪ High variance of pausing, strong seasonal (holiday) effects
▪ New conversions highly dependent on marketing plans
▪ Certain predictors difficult to track (e.g. errors in delivery)
▪ Large set of numbers to be delivered
‒ Splits into box types, recipes, delivery day, delivery time, …
▪ Tech legacy
What can we cook from these ingredients? ☺
First decision: forecast top-down or bottom-up?
CustomerID | Weeks from activation | Weeks from last pause | Weeks from last meal swap | No. of meal swaps total | No. of boxes in total | Box type | ……. | Probability of getting a box
….. | ….. | ….. | ….. | ….. | ….. | ….. | ….. | 0.4
….. | ….. | ….. | ….. | ….. | ….. | ….. | ….. | 0.7
….. | ….. | ….. | ….. | ….. | ….. | ….. | ….. | 0.5
….. | ….. | ….. | ….. | ….. | ….. | ….. | ….. | 0.6
Total | | | | | | | | 0.55
Sales (boxes)** ~ Outlook of actives + Outlook of pauses + ………..
** Dummy data in all charts
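The bottom-up view can be illustrated with a few lines of arithmetic. This is only a sketch using the four dummy probabilities from the table above: summing them gives the expected number of boxes from these customers, and their mean matches the 'Total' of 0.55. The function names are illustrative and not part of the original slides.

```python
# Dummy per-customer probabilities of getting a box, copied from the table above.
probabilities = [0.4, 0.7, 0.5, 0.6]


def expected_boxes(probs):
    """Bottom-up aggregate: expected number of boxes is the sum of probabilities."""
    return sum(probs)


def mean_probability(probs):
    """Average probability across customers (the 'Total' row in the dummy table)."""
    return sum(probs) / len(probs)


print(round(expected_boxes(probabilities), 2))
print(round(mean_probability(probabilities), 2))
```

A top-down model skips the customer level and regresses the aggregate number of boxes directly on aggregate predictors such as the outlooks of actives and pauses.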
Why do we need a top-down forecasting model?
[Diagram: chain of customer decisions, each with Y / N branches]
CANCEL? ("Should I stay (or should I go)?")
PAUSE? ("Shall I take a break?")
TRUST DEFAULT MEAL CHOICE? ("Do I care to see my options?")
SWAP MEALS? ("Do I swap my meals?")
• Each decision increases variance of final output
• In a bottom-up model those variances could mitigate each other or could explode…
• Top-down model (aggregate number of boxes) is much more stable
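The variance argument above can be made concrete with a small simulation. This is a sketch under assumed numbers, not HelloFresh data: 1000 customers, a 0.95 probability of not cancelling, a 0.8 probability of not pausing, and an optional shock shared by all customers (for example a holiday week). When decisions are independent, the errors largely cancel in the total; when a shared shock moves everyone at once, the spread of the weekly total grows several-fold.

```python
import random
import statistics


def simulate_total(n_customers, p_stay, p_no_pause, shared_shock_sd, rng):
    """One simulated week: boxes resulting from a chain of two decisions per customer."""
    # A shared shock moves every customer's pause probability in the same direction.
    shock = rng.gauss(0.0, shared_shock_sd)
    p_no_pause_week = min(1.0, max(0.0, p_no_pause + shock))
    boxes = 0
    for _ in range(n_customers):
        stays = rng.random() < p_stay
        does_not_pause = rng.random() < p_no_pause_week
        if stays and does_not_pause:
            boxes += 1
    return boxes


def weekly_sd(shared_shock_sd, n_weeks=300, seed=0):
    """Standard deviation of the weekly total over many simulated weeks."""
    rng = random.Random(seed)
    totals = [
        simulate_total(1000, 0.95, 0.8, shared_shock_sd, rng)
        for _ in range(n_weeks)
    ]
    return statistics.pstdev(totals)


sd_independent = weekly_sd(0.0)  # customer decisions are independent
sd_shared = weekly_sd(0.1)       # decisions share a common weekly shock
print(sd_independent, sd_shared)
```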
Methodological challenges of short time series forecasting
▪ Possible lack of time series effects
‒ no significant autocorrelation of sales
‒ still, burning need to control for trend in data / ‘baseline’
▪ Dynamic business growth introduces sales disruptions
▪ Lots of predictors are inter-correlated
‒ particularly true when some outlooks (early trends) of activeness, pausing and cancelling are available
Forecasting – like cooking – is a mixture of art and science ☺
To our surprise, we often found a lack of time-related effects in the forecast target
Standard time series models cannot be the only tool we use….
[Charts: Boxes shipped in one of the countries **; Autocorrelation of differenced time series; Partial autocorrelation of differenced time series]
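The check behind charts like these can be sketched in plain Python: difference the series to remove trend, compute the sample autocorrelation at a few lags, and compare it with the approximate 95% band for white noise, ±1.96/√n. The weekly numbers below are hypothetical and the function names are illustrative.

```python
def difference(series):
    """First differences: removes a linear trend from the series."""
    return [later - earlier for earlier, later in zip(series, series[1:])]


def autocorrelation(series, lag):
    """Sample autocorrelation of a series at the given lag."""
    n = len(series)
    mean = sum(series) / n
    denominator = sum((x - mean) ** 2 for x in series)
    if denominator == 0:
        return 0.0  # constant series: no variation to correlate
    numerator = sum(
        (series[t] - mean) * (series[t - lag] - mean) for t in range(lag, n)
    )
    return numerator / denominator


def significance_bound(n):
    """Approximate 95% band for the autocorrelation of white noise."""
    return 1.96 / n ** 0.5


# Hypothetical weekly box counts, for illustration only.
weekly_boxes = [100, 104, 103, 109, 112, 111, 117, 120, 118, 125, 127, 131]
diffs = difference(weekly_boxes)
bound = significance_bound(len(diffs))
for lag in (1, 2, 3):
    value = autocorrelation(diffs, lag)
    print(lag, round(value, 3), abs(value) > bound)
```

If no lag exceeds the band, the differenced series carries little usable time structure, which is why standard time series models alone are not enough.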
We needed to introduce dummies for disruptions in sales time series
[Chart: Histogram of weekly sales in country 1 **] Mixture of 2 distributions due to major change in marketing spend
[Chart: Time series of weekly sales in country 2 **] Price cut effect = shift of entire sales
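A disruption dummy of this kind can be sketched as a step variable in an ordinary least squares fit. The example assumes a known break week and uses noise-free dummy data (baseline 100, trend 2 per week, level shift of 50 from week 10), so the fit recovers those values; `fit_with_step_dummy` is an illustrative name, not something from the original talk.

```python
import numpy as np


def fit_with_step_dummy(sales, break_week):
    """OLS fit of sales on an intercept, a linear trend and a step dummy."""
    weeks = np.arange(len(sales), dtype=float)
    # The dummy is 0 before the disruption and 1 from the break week onwards.
    dummy = (weeks >= break_week).astype(float)
    design = np.column_stack([np.ones_like(weeks), weeks, dummy])
    coefficients, *_ = np.linalg.lstsq(
        design, np.asarray(sales, dtype=float), rcond=None
    )
    return coefficients  # intercept, weekly trend, level shift


# Dummy data: a level shift of 50 boxes starting in week 10.
sales = [100 + 2 * t + (50 if t >= 10 else 0) for t in range(20)]
intercept, trend, shift = fit_with_step_dummy(sales, break_week=10)
print(round(intercept, 2), round(trend, 2), round(shift, 2))
```

Without the dummy, the level shift would be absorbed into the trend estimate and bias the baseline.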
Feature selection is limited by size of input data and requires regularization
▪ Lots of predictors are highly correlated, even after controlling for trend / baseline
▪ This calls for regularization in both feature selection process and then in the model
▪ We use Lasso in feature selection and cubic shrinkage in Generalized Additive Models
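How Lasso performs feature selection can be sketched with a small coordinate descent solver. This is an illustration, not the implementation used in the talk: the objective is (1/2n)·||y − Xb||² + alpha·||b||₁, and the toy design is deliberately orthogonal so that the result can be checked by hand (real predictors, as noted above, are correlated). Weak effects are shrunk exactly to zero, which is what removes features.

```python
import numpy as np


def soft_threshold(value, threshold):
    """Shrink a value towards zero and set it to zero inside the threshold."""
    return np.sign(value) * max(abs(value) - threshold, 0.0)


def lasso_coordinate_descent(X, y, alpha, n_iter=200):
    """Minimise (1/2n)*||y - X b||^2 + alpha*||b||_1 by cyclic coordinate descent."""
    n, p = X.shape
    beta = np.zeros(p)
    column_scale = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            # Residual with feature j's current contribution added back in.
            residual = y - X @ beta + X[:, j] * beta[j]
            rho = X[:, j] @ residual / n
            beta[j] = soft_threshold(rho, alpha) / column_scale[j]
    return beta


# Toy orthogonal design: one strong predictor, one weak, one irrelevant.
X = np.array([[1.0, 1.0, 1.0],
              [1.0, -1.0, -1.0],
              [-1.0, 1.0, -1.0],
              [-1.0, -1.0, 1.0]])
y = 2.0 * X[:, 0] + 0.1 * X[:, 1]
beta = lasso_coordinate_descent(X, y, alpha=0.5)
selected = [j for j in range(X.shape[1]) if beta[j] != 0.0]
print(beta, selected)
```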
Addressing non-linearity: Generalized Additive Models
• Introduced by Hastie and Tibshirani in 1990
• ’a step taken from GLM towards non-parametric models’
• Instead of estimating parameters, for each variable GAM estimates a function composed of smoothing elements
• The functions f1, . . . , fp can be natural splines, smoothing splines, local regressions, technically even polynomials (not used)
• Parametric terms and 2D-smoothers are also allowed
Future sales, i.e. no. of boxes to be delivered X weeks from now =
b0 + f1(marketing spend) + f2(planned pauses to date) + f3(planned cancellations to date) + f4(holiday effect) + weekday effect + …
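The additive structure above can be sketched with a simple backfitting loop: each f_j is re-estimated in turn on the residuals left by the other terms. This is a simplified stand-in, not the model from the talk: cubic polynomials replace the penalised splines for brevity, the two predictors are hypothetical, and the dummy data is generated from a known additive function so that the fit can be checked.

```python
import numpy as np


def fit_additive_model(X, y, degree=3, n_iter=20):
    """Backfitting for y = b0 + f1(x1) + ... + fp(xp) with polynomial smoothers."""
    n, p = X.shape
    intercept = y.mean()
    fitted = np.zeros((n, p))  # column j holds the current estimate of f_j(x_j)
    for _ in range(n_iter):
        for j in range(p):
            # Residual after removing the intercept and every other component.
            partial_residual = y - intercept - fitted.sum(axis=1) + fitted[:, j]
            coefficients = np.polyfit(X[:, j], partial_residual, degree)
            component = np.polyval(coefficients, X[:, j])
            fitted[:, j] = component - component.mean()  # centre for identifiability
    return intercept, fitted


# Dummy data on a balanced grid: one curved effect and one linear effect.
grid = np.linspace(-1.0, 1.0, 7)
x1, x2 = np.meshgrid(grid, grid)
X = np.column_stack([x1.ravel(), x2.ravel()])
y = 10.0 + 3.0 * X[:, 0] ** 2 + 2.0 * X[:, 1]

intercept, fitted = fit_additive_model(X, y)
prediction = intercept + fitted.sum(axis=1)
print(float(np.max(np.abs(prediction - y))))
```

Each column of `fitted` can be plotted against its predictor to inspect the estimated shape of that effect, which is what makes additive models comparatively easy to discuss with the business.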
Ensemble approach and root cause analysis
• Neither standard time series nor average ensemble forecast works
• Best forecast method selected by progressive cross validation is better (final.forecast)
• Frequent review based on backtesting and root-cause analysis is even better
[Charts: Backtest in country 1; Backtest in country 2. Axis label: Model error based on 16-week progressive cross-validation]
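Selecting a forecast method by progressive cross-validation can be sketched as follows. Each candidate is refitted on the data up to a cut-off, asked for the next value, and scored on the following observation; the cut-off then moves forward one week. The two candidate forecasters, the one-step horizon and the dummy series are assumptions for illustration; only the 16-week window comes from the slide.

```python
def naive_last(history):
    """Forecast the next value as the last observed value."""
    return history[-1]


def moving_average(history, window=4):
    """Forecast the next value as the mean of the most recent observations."""
    recent = history[-window:]
    return sum(recent) / len(recent)


def progressive_cv_error(series, forecaster, n_folds):
    """Mean absolute one-step-ahead error over the last n_folds observations."""
    errors = []
    for cut in range(len(series) - n_folds, len(series)):
        prediction = forecaster(series[:cut])  # only data before the cut is used
        errors.append(abs(series[cut] - prediction))
    return sum(errors) / len(errors)


def select_best(series, forecasters, n_folds=16):
    """Return the name of the forecaster with the lowest progressive CV error."""
    scores = {
        name: progressive_cv_error(series, forecaster, n_folds)
        for name, forecaster in forecasters.items()
    }
    return min(scores, key=scores.get), scores


# Dummy series with a steady upward trend.
series = [100 + 5 * t for t in range(30)]
candidates = {"naive_last": naive_last, "moving_average": moving_average}
best, scores = select_best(series, candidates)
print(best, scores)
```

On a trending series the moving average lags behind, so the naive forecast wins here; the point is only that the choice is made on out-of-sample error rather than in-sample fit.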
Next step: predicting user-level demand with deep learning
CNNs
Factorization / Word2Vec
Marketing attribution problem
• Marketing spend for a company within multiple years, typically measured daily or weekly
• Question: how many conversions / how much revenue / how much CLV can be attributed to activity in each channel?
Marketing attribution challenges in most e-commerce businesses
▪ Huge cost impact
▪ Short time series
▪ Varying granularity of input data
▪ Very often marketing data stored by multiple people in obscure ways (sheets, docs, no standardization)
▪ Lack of full attribution models to cross check results
▪ Dealing with counterintuitive results…
Is there a silver / golden bullet? ☺
Here we go again: top-down or bottom-up approach?
CustomerID | Touchpoint Paid-Social | Touchpt. Affiliates | Touchpt. Bloggers | Touchpt…. | Likelihood of outdoor exposure | Likelihood of TV exposure | ……. | Number of boxes over first year (CLV)
….. | ….. | ….. | ….. | ….. | ….. | ….. | ….. | 10.5
….. | ….. | ….. | ….. | ….. | ….. | ….. | ….. | 2.5
….. | ….. | ….. | ….. | ….. | ….. | ….. | ….. | 5.3
….. | ….. | ….. | ….. | ….. | ….. | ….. | ….. | 7.6
Total | | | | | | | | 9.2
Number of boxes from newly acquired customers ~ Activity in TV** + Activity in PaidSocial** + ………..
Balancing business insight and simulation / prediction power
• Typically, the statistics used do not align exactly with the desired business outcomes
• There is usually an inverse relationship between how well the model predicts and how interpretable its components are
• In marketing attribution, forcing intuitive constraints (non-negative contribution of channels, concave shape of response = saturation etc.) often affects fit and predictive strength
• Hitting the sweet spot requires an iterative process of refining the model against business assumptions and usability / actionability
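The two intuitive constraints mentioned above can be sketched in a few lines: a saturating (diminishing-returns) transform of spend, and a least squares fit whose channel coefficients are forced to be non-negative. This is an illustration under assumptions, not the PCA + Bayesian network + GAM model from the talk: the saturation curve spend / (spend + half_saturation), the projected gradient solver and all numbers are hypothetical.

```python
import numpy as np


def saturate(spend, half_saturation):
    """Diminishing-returns response: 0 at zero spend, flattening towards 1."""
    spend = np.asarray(spend, dtype=float)
    return spend / (spend + half_saturation)


def nonnegative_least_squares(X, y, n_iter=5000):
    """Projected gradient descent for min 0.5*||y - X b||^2 subject to b >= 0."""
    step = 1.0 / np.linalg.eigvalsh(X.T @ X).max()
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        gradient = X.T @ (X @ beta - y)
        # Gradient step, then projection back onto the non-negative orthant.
        beta = np.maximum(0.0, beta - step * gradient)
    return beta


# Hypothetical weekly spend per channel and resulting conversions.
tv_spend = [0.0, 10.0, 20.0, 40.0, 80.0, 20.0]
social_spend = [5.0, 5.0, 10.0, 10.0, 20.0, 40.0]
conversions = np.array([12.0, 30.0, 45.0, 58.0, 75.0, 60.0])

features = np.column_stack([saturate(tv_spend, 20.0), saturate(social_spend, 10.0)])
contributions = nonnegative_least_squares(features, conversions)
print(contributions)
```

When the unconstrained estimate for a channel would be negative, the constrained fit sets it to zero and accepts a worse fit, which is the trade-off between interpretability and predictive strength described above.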
Simulator for marketing attribution & ROI purposes based on a PCA + Bayesian network + GAM model
Conclusions: Do’s for small data
▪ Combine bottom-up and top-down
▪ Experiment if no history available
▪ Ensemble your models wisely
▪ Back-test and root-cause-analyze
▪ Factor in iterations with business and make it part of model building
▪ Keep calm and explain discrepancies….
Forecasting / simulating is the art of saying what will happen and then
explaining why it didn’t…
We’re hiring at HelloFresh!
▪ Data Scientists
‒ Python, R, Spark, Scala, ML + computer vision / NLP / other deep learning experience
▪ Machine Learning Engineers
‒ Python, Hadoop, Spark, Kafka, ML productionizing expertise
▪ Data Engineers
‒ Python, Hadoop, Spark, Kafka, Airflow, ETL experience
https://www.hellofresh.com/careers/
Thanks! Any Questions?