Big data real time R - useR! 2013 - David Smith
![Page 1: Big data real time R - useR! 2013 - David Smith](https://reader036.fdocuments.in/reader036/viewer/2022062617/54c62c134a7959b5078b457b/html5/thumbnails/1.jpg)
1
Big-data, real-time R?
Yes, you can!
David Smith, Revolution Analytics
@revodavid
![Page 2: Big data real time R - useR! 2013 - David Smith](https://reader036.fdocuments.in/reader036/viewer/2022062617/54c62c134a7959b5078b457b/html5/thumbnails/2.jpg)
2
REAL TIME
BIG DATA
PREDICTIVE ANALYTICS
Buzzword Bingo!
With R?
![Page 3: Big data real time R - useR! 2013 - David Smith](https://reader036.fdocuments.in/reader036/viewer/2022062617/54c62c134a7959b5078b457b/html5/thumbnails/3.jpg)
3
Real-time Deployment
1. Data distillation
2. Model development and validation
3. Model deployment
4. Real-time model scoring
5. Model refresh
![Page 4: Big data real time R - useR! 2013 - David Smith](https://reader036.fdocuments.in/reader036/viewer/2022062617/54c62c134a7959b5078b457b/html5/thumbnails/4.jpg)
4
Photo: Sarah&Boston (flickr: pocheco), Creative Commons BY-SA 2.0
“Big Data”
![Page 5: Big data real time R - useR! 2013 - David Smith](https://reader036.fdocuments.in/reader036/viewer/2022062617/54c62c134a7959b5078b457b/html5/thumbnails/5.jpg)
5
1. Data Distillation in Hadoop
Unstructured Data
Log Files
Sensor Streams
Language Text
HDFS Load
Map-Reduce
RHadoop rmr
Structured Data
Analytics Data Mart
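As a rough illustration of the distillation step, the sketch below mimics a map-reduce pass in plain Python: unstructured log lines go in, and one structured row per user comes out. The log format, the field names, and the `map_line`, `reduce_user`, and `distill` helpers are all assumptions made for this example; the slide itself only names HDFS, Map-Reduce, and RHadoop's rmr.

```python
from collections import defaultdict

# Hypothetical raw log lines: "<timestamp> <user_id> <action> <amount>"
RAW_LOGS = [
    "2013-07-10T09:00:01 u1 view 0",
    "2013-07-10T09:00:05 u2 purchase 19.99",
    "2013-07-10T09:01:12 u1 purchase 5.00",
    "malformed line",
    "2013-07-10T09:02:30 u1 view 0",
]

def map_line(line):
    """Map step: parse one line into (key, value) pairs, skipping bad lines."""
    parts = line.split()
    if len(parts) != 4:
        return []
    _, user, action, amount = parts
    try:
        return [(user, (action, float(amount)))]
    except ValueError:
        return []

def reduce_user(user, values):
    """Reduce step: collapse all events for one user into a structured row."""
    return {
        "user_id": user,
        "views": sum(1 for action, _ in values if action == "view"),
        "purchases": sum(1 for action, _ in values if action == "purchase"),
        "spend": round(sum(amt for action, amt in values if action == "purchase"), 2),
    }

def distill(lines):
    """Run map, group by key (the shuffle), then reduce each group."""
    grouped = defaultdict(list)
    for line in lines:
        for key, value in map_line(line):
            grouped[key].append(value)
    return [reduce_user(key, grouped[key]) for key in sorted(grouped)]

if __name__ == "__main__":
    for row in distill(RAW_LOGS):
        print(row)
```

The rows returned by `distill` stand in for the structured data that would land in an analytics data mart; in a real cluster the map and reduce functions would run in parallel over file blocks rather than over an in-memory list.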
![Page 6: Big data real time R - useR! 2013 - David Smith](https://reader036.fdocuments.in/reader036/viewer/2022062617/54c62c134a7959b5078b457b/html5/thumbnails/6.jpg)
6
2. The Model Development Cycle
Feature Selection
Sampling
Aggregation
Variable Transformation
Model Estimation
Model Refinement
Model Comparison / Benchmarking
Predictive Model
R White Paper: bit.ly/r-is-hot
Structured Data
![Page 7: Big data real time R - useR! 2013 - David Smith](https://reader036.fdocuments.in/reader036/viewer/2022062617/54c62c134a7959b5078b457b/html5/thumbnails/7.jpg)
7
Big-Data Predictive Models with ScaleR
![Page 8: Big data real time R - useR! 2013 - David Smith](https://reader036.fdocuments.in/reader036/viewer/2022062617/54c62c134a7959b5078b457b/html5/thumbnails/8.jpg)
8
3: Deployment Options
Unknown factors
SQL / Rules Engine
Code (C++, Java, R, Hadoop)
PMML Engine
Factors known in advance
Batch Lookup Tables
Factors
Scores
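The split between the two deployment routes can be sketched as follows. When every factor combination is known in advance, scores can be precomputed in batch and served from a lookup table; otherwise the model has to be evaluated at request time. The additive model, its weights, and the factor names below are hypothetical and chosen only to keep the example small.

```python
from itertools import product

# Hypothetical model: an additive score over two categorical factors.
WEIGHTS = {
    "browser": {"chrome": 0.2, "firefox": 0.1},
    "region": {"us": 0.5, "eu": 0.3},
}

def score(factors):
    """Score on demand: evaluate the model for one set of factors."""
    return sum(WEIGHTS[name][value] for name, value in factors.items())

def build_lookup_table():
    """Batch step: precompute the score for every known factor combination."""
    names = sorted(WEIGHTS)
    table = {}
    for values in product(*(sorted(WEIGHTS[name]) for name in names)):
        table[values] = score(dict(zip(names, values)))
    return names, table

if __name__ == "__main__":
    names, table = build_lookup_table()
    request = {"browser": "chrome", "region": "eu"}
    key = tuple(request[name] for name in names)
    print(table[key])      # lookup path: no model code runs at request time
    print(score(request))  # on-demand path: same value, computed per request
```

The lookup table trades storage for latency and only works while the set of factor combinations stays small and enumerable; the on-demand path covers the "unknown factors" case at the cost of running model code per request.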
![Page 9: Big data real time R - useR! 2013 - David Smith](https://reader036.fdocuments.in/reader036/viewer/2022062617/54c62c134a7959b5078b457b/html5/thumbnails/9.jpg)
9
4. Real-Time Scoring
Factors
Scores
“IO VAPOURA” by Jaya Prime, flickr.com/photos/sanjayaprime/4924462993, CC-BY 2.0
Decision Tree
Logistic Regression
Neural Network
K-means clustering
Ensemble Model
Predictive Model
User ID
Browser
Time/Date / Location
Previous purchases
Friend data
Any known information
Product of most interest
Offer of most likely sale
Most relevant link
Forecast sale value
Optimal Bid
Prediction or Selection
Scoring Rules
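A minimal sketch of the factors-to-scores flow, assuming the predictive model is a logistic regression (one of the model types listed above) and the selection is the "offer of most likely sale". The offers, coefficients, and factor names are invented for illustration and do not come from the talk.

```python
import math

# Hypothetical per-offer logistic regression coefficients.
OFFER_MODELS = {
    "blender": {"intercept": -2.0, "previous_purchases": 0.4, "is_mobile": -0.3},
    "toaster": {"intercept": -1.0, "previous_purchases": 0.1, "is_mobile": 0.2},
}

def sale_probability(coefs, factors):
    """Logistic regression score: sigmoid of intercept plus weighted factors."""
    z = coefs["intercept"] + sum(
        weight * factors.get(name, 0.0)
        for name, weight in coefs.items()
        if name != "intercept"
    )
    return 1.0 / (1.0 + math.exp(-z))

def best_offer(factors):
    """Scoring rule: pick the offer with the highest predicted sale probability."""
    scores = {
        offer: sale_probability(coefs, factors)
        for offer, coefs in OFFER_MODELS.items()
    }
    return max(scores, key=scores.get), scores

if __name__ == "__main__":
    offer, scores = best_offer({"previous_purchases": 6, "is_mobile": 1})
    print(offer, scores)
```

Because scoring here is only a handful of multiplications and one exponential, a fitted model exported as coefficients can be evaluated quickly per request; missing factors default to zero in this sketch, which is a simplifying assumption rather than a recommended policy.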
![Page 10: Big data real time R - useR! 2013 - David Smith](https://reader036.fdocuments.in/reader036/viewer/2022062617/54c62c134a7959b5078b457b/html5/thumbnails/10.jpg)
10
5. Model Refresh
Factors
Scores
Actual Outcomes
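One way to read the refresh loop is that logged scores are compared with actual outcomes, and the model is re-estimated once the gap grows too large. The sketch below uses the Brier score as the error measure and a fixed tolerance over a validation baseline; both choices, and the function names, are assumptions for illustration rather than the method described in the talk.

```python
def brier_score(scores, outcomes):
    """Mean squared gap between predicted probabilities and 0/1 outcomes."""
    if len(scores) != len(outcomes) or not scores:
        raise ValueError("scores and outcomes must be non-empty and equal length")
    return sum((s - o) ** 2 for s, o in zip(scores, outcomes)) / len(scores)

def needs_refresh(scores, outcomes, baseline, tolerance=0.05):
    """Flag a refresh when recent error exceeds the baseline by a margin."""
    return brier_score(scores, outcomes) > baseline + tolerance

if __name__ == "__main__":
    scores = [0.9, 0.8, 0.2, 0.1]
    # Outcomes that agree with the scores: no refresh needed.
    print(needs_refresh(scores, [1, 1, 0, 0], baseline=0.03))
    # Outcomes that have drifted away from the scores: refresh flagged.
    print(needs_refresh(scores, [0, 1, 1, 0], baseline=0.03))
```

In practice the comparison would run over a rolling window of recent traffic, and a flagged refresh would feed the new outcomes back into the model development step.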
![Page 11: Big data real time R - useR! 2013 - David Smith](https://reader036.fdocuments.in/reader036/viewer/2022062617/54c62c134a7959b5078b457b/html5/thumbnails/11.jpg)
11
Big Data: Kilobytes/Sec, Megabytes/Sec, Gigabytes, Terabytes, Petabytes, Exabytes
Real Time: Milliseconds, Seconds, Minutes, Hours
![Page 12: Big data real time R - useR! 2013 - David Smith](https://reader036.fdocuments.in/reader036/viewer/2022062617/54c62c134a7959b5078b457b/html5/thumbnails/12.jpg)
12
Real-World Examples
Revolution Analytics Case Studies
![Page 13: Big data real time R - useR! 2013 - David Smith](https://reader036.fdocuments.in/reader036/viewer/2022062617/54c62c134a7959b5078b457b/html5/thumbnails/13.jpg)
13
Why did I buy that blender?
Just browsing in the mall
TV ad / magazine ad
Coupon in the mail
“Just moved” promo email
Webstore recommendation
Browsing catalog
![Page 14: Big data real time R - useR! 2013 - David Smith](https://reader036.fdocuments.in/reader036/viewer/2022062617/54c62c134a7959b5078b457b/html5/thumbnails/14.jpg)
14
UpStream: Attribution Modeling
![Page 15: Big data real time R - useR! 2013 - David Smith](https://reader036.fdocuments.in/reader036/viewer/2022062617/54c62c134a7959b5078b457b/html5/thumbnails/15.jpg)
• ETL
• Marketing channel data
• Behavioral variables
• Promotional data
• Overlay data
• Exploratory data analysis
• Time-to-event models
• GAM survival models
• Scoring for inference
• Scoring for prediction
• 5 billion scores per day per retailer
UPSTREAM DATA FORMAT
CUSTOM VARIABLES (PMML)
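To put the "5 billion scores per day" figure in throughput terms, the arithmetic below converts it to an average rate of roughly 57,870 scores per second. The even spread of load across the day and the 1 ms per-score latency used in the example are assumptions, not numbers from the case study.

```python
import math

SCORES_PER_DAY = 5_000_000_000   # figure quoted on the slide
SECONDS_PER_DAY = 24 * 60 * 60   # 86,400

def scores_per_second(scores_per_day=SCORES_PER_DAY):
    """Average sustained rate, assuming load is spread evenly over the day."""
    return scores_per_day / SECONDS_PER_DAY

def workers_needed(latency_ms, scores_per_day=SCORES_PER_DAY):
    """Parallel scorers required if each one handles one request at a time."""
    per_worker_rate = 1000.0 / latency_ms  # scores per second per worker
    return math.ceil(scores_per_second(scores_per_day) / per_worker_rate)

if __name__ == "__main__":
    print(round(scores_per_second()))  # average scores per second
    print(workers_needed(1.0))         # workers at a hypothetical 1 ms per score
```

Real traffic is rarely uniform, so peak rates would be higher than this average and capacity would need headroom beyond what the sketch computes.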
![Page 16: Big data real time R - useR! 2013 - David Smith](https://reader036.fdocuments.in/reader036/viewer/2022062617/54c62c134a7959b5078b457b/html5/thumbnails/16.jpg)
16
ACI
Top-20 mutual fund company
$125B assets
Research and data-driven
Innovative
![Page 17: Big data real time R - useR! 2013 - David Smith](https://reader036.fdocuments.in/reader036/viewer/2022062617/54c62c134a7959b5078b457b/html5/thumbnails/17.jpg)
17
• Collaboration
• Speed
• Deployment Process
• Adoption
• Results
Analytics Function Library
rACI Package (w/ RevoR)
Model Building Function Library
Data Acquisition Function Library
Portfolio Optimization and Simulation API
Market Data from Thomson Reuters (QA-Direct)
American Century Quant Proprietary Data
Additional 3rd Party Data Vendors
Live Analytics
PRODUCTION MODEL GENERATION AND TRADING PROCESSES
Data Feeds
![Page 18: Big data real time R - useR! 2013 - David Smith](https://reader036.fdocuments.in/reader036/viewer/2022062617/54c62c134a7959b5078b457b/html5/thumbnails/18.jpg)
18
PREDICTIVE ANALYTICS
BIG DATA
REAL TIME
Yes You Can!
![Page 19: Big data real time R - useR! 2013 - David Smith](https://reader036.fdocuments.in/reader036/viewer/2022062617/54c62c134a7959b5078b457b/html5/thumbnails/19.jpg)
19
www.revolutionanalytics.com +1 650 646 9545 Twitter: @RevolutionR
The leading enterprise provider of software and services for Open Source R
Big-Data, Real-Time R?
Yes, you can!
David Smith
@revodavid