Post on 23-Feb-2016
1
Real-Time Big Data Analytics: From Deployment to Production
David Smith
Revolution Analytics
@revodavid
2
WHAT’S UP
WITH THAT?
3
REAL TIME
BIG DATA
PREDICTIVE ANALYTICS
Buzzword Bingo!
4
Photo: Sarah&Boston (flickr: pocheco) Creative Commons BY-SA 2.0
5
Predictive Analytics Model
Factors
Scores
"IO VAPOURA" by Jaya Prime flickr.com/photos/sanjayaprime/4924462993 CC-BY 2.0
Predictive Model
• Decision Tree
• Logistic Regression
• Neural Network
• K-means clustering
• Ensemble Model
Any known information
• User ID
• Browser
• Time/Date / Location
• Previous purchases
• Friend data
Prediction or Selection
• Product of most interest
• Offer of most likely sale
• Most relevant link
• Forecast sale value
• Optimal Bid
Scoring Rules
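As a rough illustration of how factors pass through a predictive model to become scores, here is a minimal sketch of scoring rules for a logistic regression model. The factor names and coefficients are invented for this example and do not come from the talk.

```python
import math

# Hypothetical coefficients, as if produced by an earlier model-fitting step.
INTERCEPT = -2.0
WEIGHTS = {
    "previous_purchases": 0.4,
    "is_mobile_browser": -0.3,
    "friends_who_bought": 0.25,
}

def score(factors):
    """Return the predicted probability of a sale for one set of factors."""
    # Missing factors default to 0 so partially known visitors can still be scored.
    z = INTERCEPT + sum(w * factors.get(name, 0.0) for name, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))

def best_offer(factors_by_offer):
    """Selection: pick the offer whose factors give the highest score."""
    return max(factors_by_offer, key=lambda offer: score(factors_by_offer[offer]))
```

The same pattern covers both outputs named on the slide: `score` yields a prediction, and `best_offer` turns several predictions into a selection.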
"CLOCK" by Heiko Klingele flickr.com/photos/divdax/3458668053/ CC-BY 2.0
6
Real-time Deployment
1. Data distillation
2. Model development and validation
3. Model deployment
4. Real-time model scoring
5. Model refresh
7
1. Data Distillation in Hadoop
Unstructured Data
Analytics Data Mart
Structured Data
Log Files
Sensor Streams
Language Text
HDFS Load
Map-Reduce
rmr
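To make the distillation step concrete, here is a small in-memory sketch of the map, shuffle, and reduce pattern that turns raw log lines into one structured row per user. It is plain Python standing in for a Hadoop job, not the rmr API, and the three-field log format (`user_id action amount`) is an assumption made for the example.

```python
from collections import defaultdict

def map_line(line):
    """Map step: parse one raw log line into (key, value) pairs."""
    parts = line.split()
    if len(parts) != 3:
        return []  # skip malformed lines
    user_id, action, amount = parts
    try:
        return [(user_id, (action, float(amount)))]
    except ValueError:
        return []  # skip lines whose amount is not numeric

def reduce_user(user_id, events):
    """Reduce step: collapse one user's events into a structured row."""
    purchases = [amt for action, amt in events if action == "purchase"]
    return {
        "user_id": user_id,
        "n_events": len(events),
        "n_purchases": len(purchases),
        "total_spend": sum(purchases),
    }

def distill(lines):
    """Run map, shuffle (group by key), and reduce in memory."""
    groups = defaultdict(list)
    for line in lines:
        for key, value in map_line(line):
            groups[key].append(value)
    return [reduce_user(k, v) for k, v in sorted(groups.items())]
```

On a real cluster the grouping inside `distill` would be handled by the framework's shuffle phase; only the map and reduce functions would be user code.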
8
2. The Model Development Cycle
• Feature Selection
• Sampling
• Aggregation
• Variable Transformation
• Model Estimation
• Model Refinement
• Model Comparison / Benchmarking
Structured Data
Predictive Model
R White Paper: bit.ly/r-is-hot
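The sampling, estimation, and comparison steps of the cycle can be sketched roughly as follows. The two candidate models (a majority-class baseline and a single-variable threshold rule) are toy stand-ins chosen for brevity, not the model families used in the talk, and all function names are hypothetical.

```python
import random

def split(rows, holdout_fraction=0.3, seed=0):
    """Sampling: shuffle reproducibly and hold out a fraction for benchmarking."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    cut = len(rows) - round(len(rows) * holdout_fraction)
    return rows[:cut], rows[cut:]

def fit_baseline(train):
    """Candidate A: always predict the most common training outcome."""
    majority = sum(y for _, y in train) * 2 >= len(train)
    return lambda x: int(majority)

def fit_threshold(train):
    """Candidate B: predict 1 when x exceeds the best training threshold."""
    best = max(sorted({x for x, _ in train}),
               key=lambda t: sum((x > t) == bool(y) for x, y in train))
    return lambda x: int(x > best)

def accuracy(model, rows):
    """Share of rows where the model's prediction matches the outcome."""
    return sum(model(x) == y for x, y in rows) / len(rows)

def benchmark(rows):
    """Model comparison: estimate on the training sample, compare on the holdout."""
    train, holdout = split(rows)
    candidates = {"baseline": fit_baseline(train), "threshold": fit_threshold(train)}
    return {name: accuracy(m, holdout) for name, m in candidates.items()}
```

Each row here is an `(x, y)` pair with a numeric factor and a 0/1 outcome. Comparing on held-out rows rather than the training sample is what keeps the benchmark honest when a refined model is more flexible than the baseline.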
9
3: Deployment Options
Unknown factors
• SQL / Rules Engine
• Code (C++, Java, R, Hadoop)
• PMML Engine
Factors known in advance
• Batch Lookup Tables
Factors
Scores
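The split between the two deployment situations might be sketched like this: when every factor combination is known in advance, scores can be computed in batch and served from a lookup table; otherwise the scoring code has to run at request time. The `model` function, its segments, and its numbers are invented for illustration.

```python
from itertools import product

def model(segment, hour):
    """Stand-in for a fitted model; the formula is made up for this example."""
    base = {"new": 0.10, "returning": 0.30}[segment]
    return round(base + (0.05 if 18 <= hour < 23 else 0.0), 2)

# Factors known in advance: score every combination in batch, then serve by lookup.
SEGMENTS = ("new", "returning")
LOOKUP = {(s, h): model(s, h) for s, h in product(SEGMENTS, range(24))}

def score_from_table(segment, hour):
    """Serve a precomputed score; no model evaluation at request time."""
    return LOOKUP[(segment, hour)]

def score_live(factors):
    """Factors not known in advance: evaluate the scoring code per request."""
    return model(factors["segment"], factors["hour"])
```

The table trades memory and a batch job for a constant-time lookup, which only works while the set of factor combinations stays small enough to enumerate.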
10
Why did I buy that blender?
• Just browsing in the mall
• TV ad / magazine ad
• Coupon in the mail
• “Just moved” promo email
• Webstore recommendation
• Browsing catalog
11
UpStream: Attribution Modeling
• ETL
• Marketing channel data
• Behavioral variables
• Promotional data
• Overlay data
• Exploratory data analysis
• Time-to-event models
• GAM survival models
• Scoring for inference
• Scoring for prediction
• 5 billion scores per day per retailer
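For a sense of scale, the 5 billion scores per day figure can be converted into an average rate and a per-score time budget. This sketch assumes load is spread evenly over the day, which the slide does not claim, and the worker count is a hypothetical parameter.

```python
SCORES_PER_DAY = 5_000_000_000  # figure from the slide
SECONDS_PER_DAY = 24 * 60 * 60

def average_rate(scores_per_day=SCORES_PER_DAY):
    """Average scores per second if load were spread evenly over the day."""
    return scores_per_day / SECONDS_PER_DAY

def time_budget_us(workers, scores_per_day=SCORES_PER_DAY):
    """Average microseconds available per score on each of `workers` parallel scorers."""
    return workers * 1_000_000 / average_rate(scores_per_day)
```

Under these assumptions the average comes to roughly 57,870 scores per second, or about 17 microseconds per score on a single scorer. Peak traffic would be higher than the average, so real capacity planning would need more headroom.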
UPSTREAM DATA FORMAT
CUSTOM VARIABLES (PMML)
4. Model Scoring
13
5. Model refresh
Factors
Scores
Actual Outcomes
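One way to decide when a refresh is due is to compare past scores against the outcomes that actually occurred. The sketch below uses the Brier score as the error measure and a fixed tolerance as the trigger; both are assumptions for illustration, since the slide does not name a metric or a refresh rule.

```python
def brier_score(scores, outcomes):
    """Mean squared gap between predicted probabilities and 0/1 outcomes."""
    if len(scores) != len(outcomes) or not scores:
        raise ValueError("need equal-length, non-empty scores and outcomes")
    return sum((s - y) ** 2 for s, y in zip(scores, outcomes)) / len(scores)

def needs_refresh(scores, outcomes, baseline_brier, tolerance=0.05):
    """Flag a refresh when recent error exceeds the validation-time error by `tolerance`."""
    return brier_score(scores, outcomes) > baseline_brier + tolerance
```

Here `baseline_brier` would be the error measured when the model was validated, so the check fires only when live performance has drifted noticeably from what was expected at deployment.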
14
[Chart: Big Data vs. Real Time]
Data labels: Kilobytes/Sec, Megabytes/Sec, Gigabytes, Terabytes, Petabytes, Exabytes
Time labels: Milliseconds, Seconds, Minutes, Hours
15
PREDICTIVE ANALYTICS
BIG DATA
REAL TIME
WHAT’S UP
WITH THAT?
16
www.revolutionanalytics.com +1 650 646 9545 Twitter: @RevolutionR
The leading enterprise provider of software and services for Open Source R
Real-Time Big Data Predictive Analytics: From Deployment to Production
Booth 618 / Office Hours Weds 1:30PM
David Smith
@revodavid