Data Exploration at Ebiquity - The UK's Premier R User Group
Data Exploration at Ebiquity
Wojtek Kostelecki
2
Kaggle Dojo, R Dojo, Python Dojo
Kaggle Hackathon – Saturday 27 May
meetup.com/London-Kaggle-Meetup
3
Data exploration at Ebiquity (and in a hackathon)
Our hotel in Porto
I-COM GLOBAL SUMMIT (MARKETING DATA AND MEASUREMENT)
4
Intel Challenge Overview
Business Challenge:
What is the impact of social media discussions and brand health
indicators on advertising effectiveness for high-consideration purchases,
such as consumer PC sales in the US?
Prediction Challenge:
Predict the sales revenue by CPU brand/device brand combination by
month for Jan and Feb 2017
Time Challenge:
24 Hours
EBVERSE
5
Hadleyverse/Tidyverse
RAW DATA
6
~300 MB of data over ~400 files
directory of text files
ebdb::dir_stack(path, func)
multi-sheet excel file
ebdb::read_yougov(x)
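The ebdb package is in-house, so its internals are not shown here. As a rough illustration of what a directory-stacking helper of this shape might do, the sketch below reads every matching file with a caller-supplied reader and row-binds the results. The name `dir_stack_sketch`, the `pattern` argument and the `source_file` column are hypothetical and not part of ebdb.

```r
# Hypothetical stand-in for a directory-stacking helper.
# This is NOT the ebdb implementation, only a base-R sketch of the idea.
dir_stack_sketch <- function(path, func, pattern = "\\.txt$") {
  files <- list.files(path, pattern = pattern, full.names = TRUE)
  # Read each file with the supplied reader and record where it came from.
  pieces <- lapply(files, function(f) {
    df <- func(f)
    df$source_file <- basename(f)
    df
  })
  # Row-bind everything; assumes every file yields the same columns.
  do.call(rbind, pieces)
}

# Usage on a throwaway directory holding two small tab-delimited files.
tmp <- file.path(tempdir(), "dir_stack_demo")
dir.create(tmp, showWarnings = FALSE)
write.table(data.frame(week = 1:2, spend = c(10, 20)),
            file.path(tmp, "a.txt"), sep = "\t", row.names = FALSE)
write.table(data.frame(week = 3:4, spend = c(30, 40)),
            file.path(tmp, "b.txt"), sep = "\t", row.names = FALSE)
stacked <- dir_stack_sketch(tmp, function(f) read.delim(f))
```

Passing the reader as an argument keeps the stacking logic independent of the file format, which is presumably why the original signature takes `func`.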
GITLAB
7
8
data script sourced data
EBPLOT
9
EBPLOT – PC/PROCESSOR OFFLINE ADVERTISING IN UK (AS MONITORED BY EBIQUITY)
10
area_plot(df, "Week", "Spend")
EBPLOT – PC/PROCESSOR OFFLINE ADVERTISING IN UK (AS MONITORED BY EBIQUITY)
11
area_plot(df, "Week", "Spend", "Company")
EBPLOT – PC/PROCESSOR OFFLINE ADVERTISING IN UK (AS MONITORED BY EBIQUITY)
12
area_plot(df, "Week", "Spend", "Year", "Company", labels = TRUE)
EBPLOT – PC/PROCESSOR OFFLINE ADVERTISING IN UK (AS MONITORED BY EBIQUITY)
13
share_plot(df, "Year", "Spend", "Company")
EBPLOT – PC/PROCESSOR OFFLINE ADVERTISING IN UK (AS MONITORED BY EBIQUITY)
14
share_plot(df, "Year", "Spend", "Company", "Device")
EBPLOT – PC/PROCESSOR OFFLINE ADVERTISING IN UK (AS MONITORED BY EBIQUITY)
15
share_plot(df, "Week", "Spend", "Company")
EBPLOT – PC/PROCESSOR OFFLINE ADVERTISING IN UK (AS MONITORED BY EBIQUITY)
16
waterfall(df, "Device", "Spend", "Year")
MODELLING
17
PREDICTION CHALLENGE – FORECAST TWO MONTHS OF SALES
18
Time  Proc.  Manuf.     Dev.     Rev.  …  Microsoft Ad  Intel Ad  AMD Ad  Dell Ad
1     Intel  Microsoft  Laptop   ##    …  £0            £10k      £0      £0
2     Intel  Microsoft  Laptop   ##    …  £100k         0k        £0      £0
1     Intel  Asus       Laptop   ##    …  £0            £10k      £0      £0
2     Intel  Asus       Laptop   ##    …  £0            0k        £0      £0
1     Intel  Dell       Desktop  ##    …  £0            £10k      £0      £50k
2     Intel  Dell       Desktop  ##    …  £0            0k        £0      £50k
1     AMD    Dell       Laptop   ##    …  £0            £0        £10k    £50k
2     AMD    Dell       Laptop   ##    …  £0            £0        £10k    £50k
1     AMD    Lenovo     Laptop   ##    …  £0            £0        £10k    £0
2     AMD    Lenovo     Laptop   ##    …  £0            £0        £10k    £0
…     …      …          …        …     …  …             …         …       …
target
19
In House Weighted OLS with box-constrained optimization
• Estimate OLS
• Enforce coefficient boundary constraints
• Re-estimate OLS with boundary-touching coefficients fixed
• Release fixed coefficients if necessary
• Repeat
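The estimation loop above can be sketched in base R roughly as follows. This is an illustration under stated assumptions, not the in-house implementation: the function name `box_wls`, its arguments, and the rule for releasing a fixed coefficient (release when the gradient of the weighted residual sum of squares points back inside the box) are all choices made for this sketch.

```r
# Sketch of weighted OLS with box constraints via iterative fixing.
# Assumed release rule: free a fixed coefficient when the gradient says the
# fit would improve by moving it back inside its bounds.
box_wls <- function(X, y, w = rep(1, nrow(X)),
                    lower = rep(-Inf, ncol(X)), upper = rep(Inf, ncol(X)),
                    max_iter = 100, tol = 1e-8) {
  p  <- ncol(X)
  sw <- sqrt(w)
  Xw <- X * sw                      # scale each row by sqrt(weight)
  yw <- y * sw
  beta  <- pmin(pmax(rep(0, p), lower), upper)
  fixed <- rep(FALSE, p)
  for (iter in seq_len(max_iter)) {
    free <- !fixed
    if (any(free)) {
      # Estimate OLS on the free columns, holding fixed ones at their bounds.
      held <- ifelse(fixed, beta, 0)
      resid_y <- yw - drop(Xw %*% held)
      beta[free] <- qr.solve(Xw[, free, drop = FALSE], resid_y)
    }
    # Enforce bounds: clamp violators, mark them fixed, and re-estimate.
    viol <- free & (beta < lower - tol | beta > upper + tol)
    if (any(viol)) {
      beta[viol]  <- pmin(pmax(beta[viol], lower[viol]), upper[viol])
      fixed[viol] <- TRUE
      next
    }
    # Release check using the gradient of 0.5 * weighted RSS.
    grad <- -drop(crossprod(Xw, yw - drop(Xw %*% beta)))
    at_lower <- fixed & beta <= lower + tol
    at_upper <- fixed & beta >= upper - tol
    release  <- (at_lower & grad < -tol) | (at_upper & grad > tol)
    if (!any(release)) break
    fixed[release] <- FALSE
  }
  beta
}

# Simulated example: the third coefficient is truly negative but is
# constrained to be non-negative, so it should end up pinned at zero.
set.seed(1)
n <- 200
X <- cbind(1, rnorm(n), rnorm(n))
y <- 2 + 1.5 * X[, 2] - 0.8 * X[, 3] + rnorm(n, sd = 0.1)
b_box  <- box_wls(X, y, lower = c(-Inf, 0, 0))
b_free <- box_wls(X, y)
```

With no bounds supplied the function reduces to ordinary (weighted) least squares, which gives a simple sanity check against `lm.fit`.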
Variable w/transformation      Mod. Link  Var. Link  Coef. Min  Coef. Max
INTERCEPT                      Model
…                              …          …          …          …
amd_tv * (proc == "AMD")       Device     proc_tv    0
intel_tv * (proc == "Intel")   Device     proc_tv    0
…                              …          …          …          …
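To make the transformed variables in the specification concrete, here is a small base-R sketch of what a term such as `amd_tv * (proc == "AMD")` evaluates to. The toy values are invented for illustration, and the treatment of "Var. Link" as "terms with the same link name share one coefficient" is only one plausible reading of the table, not something the slides confirm.

```r
# Toy rows in the long layout of the modelling table (illustrative values).
df <- data.frame(
  proc     = c("Intel", "Intel", "AMD", "AMD"),
  amd_tv   = c(10, 10, 10, 10),
  intel_tv = c(20, 20, 20, 20)
)

# Logical indicators coerce to 0/1, so each product keeps the spend only
# on rows for the matching processor brand and zeroes it elsewhere.
df$amd_tv_gated   <- df$amd_tv   * (df$proc == "AMD")
df$intel_tv_gated <- df$intel_tv * (df$proc == "Intel")

# Assumed reading of "Var. Link": terms sharing a link name share one
# coefficient, which is equivalent to summing them into a single column.
df$proc_tv <- df$amd_tv_gated + df$intel_tv_gated
```

Under this reading, a lower bound of 0 on the linked coefficient would keep the estimated effect of a processor brand's own TV advertising from going negative.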
Model Specification
Model Estimation
BENCHMARKING
20
Using a 6000 x 1230 model matrix
Implementation with sparse matrices and repeated calculations moved up one level
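A rough sketch of the two ideas, using the Matrix package that ships with standard R installations. The data, formula and column names here are made up, and the design is much narrower than the 6000 x 1230 matrix benchmarked in the talk. It only illustrates the pattern: build the design matrix sparsely, compute the cross-products once, and reuse slices of them inside any re-estimation loop.

```r
library(Matrix)

set.seed(42)
n <- 6000
# Hypothetical factor-heavy data: the dummy columns are mostly zeros.
df <- data.frame(
  proc  = factor(sample(c("AMD", "Intel"), n, replace = TRUE)),
  manuf = factor(sample(paste0("m", 1:50), n, replace = TRUE)),
  tv    = runif(n)
)
y <- rnorm(n)

# Sparse design matrix: only the non-zero entries are stored.
X_sparse <- sparse.model.matrix(~ proc * manuf + tv, data = df)

# "Moved up one level": these cross-products are computed once, outside
# any iterative loop, instead of being recomputed on every pass.
XtX <- crossprod(X_sparse)
Xty <- drop(as.matrix(crossprod(X_sparse, y)))

# Inside a loop, solving for a subset of free columns only needs slices
# of the precomputed objects. Here every column is treated as free.
free <- seq_len(ncol(X_sparse))
beta <- drop(as.matrix(solve(XtX[free, free], Xty[free])))
```

Solving the normal equations from cached cross-products avoids touching the full 6000-row matrix on each iteration, at the cost of somewhat worse numerical conditioning than a QR-based fit.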
21