Is Revolution R Enterprise Faster than SAS? Benchmarking Results Revealed
RRE: Faster than SAS
Results from Benchmarking
Thomas W. Dinsmore, Revolution Analytics
John Wallace, DataSong
Polling Question
Do you currently use:
– A) R or Revolution R Enterprise (RRE)
– B) SAS
– C) Both
– D) Neither
Benchmarking RRE vs. SAS
Background
Approach
Results
Discussion
Revolution R Enterprise
Open source R
Commercially supported distribution
Enhanced for enterprise use:
– Scalable analytics
– Developer tools
– Integration tools
– Deployment tools
2012: Allstate Benchmark
Poisson Regression, 150MM rows
[Bar chart: Runtime, Minutes. SAS PROC GENMOD: 300; RRE: 6]
Criticism: “Apples to Oranges”
RRE: 20 cores; SAS: 16 cores
Most SAS/STAT PROCs (including PROC GENMOD) run single-threaded.
SAS/STAT: 91 PROCs
• 69 single-threaded
• 13 multi-threaded
• 9 distributed (if you license SAS HP Statistics)
2013: SAS Benchmark
PROC HPGENSELECT
– SAS/STAT
– SAS High Performance Statistics
Massive grid (140/144 nodes)
– 16 cores per node
– 2,240/2,304 cores
Conclusion: SAS on 2,304 cores is competitive with RRE on 20 cores.
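The core counts on this slide follow from the node counts, and the size of the hardware gap can be checked with a few lines of arithmetic. This is a minimal sketch using only the figures quoted above; the variable names are illustrative.

```python
# Figures quoted on the slide: 140 or 144 nodes, 16 cores per node,
# compared with RRE running on 20 cores.
cores_per_node = 16
node_counts = (140, 144)
rre_cores = 20

# Total cores in the SAS grid for each quoted node count.
grid_cores = [nodes * cores_per_node for nodes in node_counts]

# How many times more cores the larger grid has than the RRE setup.
core_ratio = max(grid_cores) / rre_cores

print(f"Grid cores: {grid_cores}")
print(f"Core ratio, SAS grid vs. RRE: {core_ratio:.1f}x")
```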
Honest Benchmarking
Compare RRE and SAS/STAT performance
– Same data
– Same environment
– Same tasks
Test under real-world conditions
Make the test fair and transparent
Data
Manufactured data
Reproducible in any environment
Designed to emulate “typical” working data
“Entity” tables: 1MM, 5MM rows
“Predict” tables: 10MM, 50MM rows
[Diagram: a fact ("Predict") table linked to Entity 1 and Entity 2 tables by the entity key; column counts shown: 571 columns, 21 columns]
Benchmarking Environment
SAS 9.4:
• Base
• STAT
• Grid Manager
Commodity servers:
• 4 cores
• 16GB memory
Gbit network
CentOS
RRE 7.0
Platform LSF 9
Analytic Tasks
Task | SAS Capability | RRE Capability
Descriptive Statistics | PROC SURVEYMEANS | rxSummary
Median and Deciles | PROC SURVEYMEANS | rxQuantile
Frequency Distribution | PROC FREQ | rxCube
Linear Regression (Numeric predictors) | PROC REG, HPREG | rxLinMod
Linear Regression (Mixed predictors) | PROC GENMOD | rxLinMod
Stepwise Linear (100 predictors) | PROC REG | rxLinMod/rxStepControl
Logistic Regression | PROC LOGISTIC | rxLogit
Generalized Linear | PROC GENMOD | rxGLM
K-Means Clustering | PROC FASTCLUS | rxKMeans
Score | PROC SCORE | rxPredict
Preparation
Generated data with randomized procedure
Loaded data into native formats:
– RRE: XDF file
– SAS: SAS DATA set
Generation and load times not included (no meaningful differences between RRE and SAS)
RRE: 42 Times Faster Than SAS 9.4
Complete script: ten analytic tasks. N=5,000,000
[Bar chart: Runtime, Seconds. SAS 9.4: 5,192 (~1 hour, 26 minutes); RRE: 124 (~2 minutes)]
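The headline multiple can be reproduced from the two runtimes on this slide. The sketch below is only a check of the arithmetic; the function names are illustrative and not part of either product.

```python
# Runtimes read from the slide, in seconds, for the complete
# ten-task script at N=5,000,000.
sas_seconds = 5192
rre_seconds = 124


def speedup(baseline: float, candidate: float) -> float:
    """Return how many times faster the candidate ran than the baseline."""
    return baseline / candidate


def as_minutes(seconds: float) -> float:
    """Convert seconds to minutes."""
    return seconds / 60


ratio = speedup(sas_seconds, rre_seconds)
print(f"Speed multiple: {ratio:.1f}x")
print(f"SAS: {as_minutes(sas_seconds):.1f} min, RRE: {as_minutes(rre_seconds):.1f} min")
```

Rounded to the nearest whole number, the ratio matches the 42x in the slide title.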
RRE: Linear Scalability
[Line chart: Runtime, Seconds vs. # Rows in Entity Table. RRE 7: 68 at 1MM rows, 124 at 5MM rows; SAS 9.4: 623 at 1MM rows, 5,192 at 5MM rows]
RRE: consistent performance with increased data volume.
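One way to read the scalability chart is to compare how much the runtime grows when the row count grows fivefold. The sketch below assumes the chart labels map to RRE at 68 and 124 seconds and SAS at 623 and 5,192 seconds for 1MM and 5MM rows; that mapping is inferred from the surrounding slides, not stated on this one.

```python
# Chart values in seconds, keyed by rows in the entity table.
# Which label belongs to which point is an assumption.
runtimes = {
    "RRE 7": {1_000_000: 68, 5_000_000: 124},
    "SAS 9.4": {1_000_000: 623, 5_000_000: 5192},
}


def growth(series: dict) -> tuple:
    """Return (row growth factor, runtime growth factor) from smallest to largest size."""
    small, large = min(series), max(series)
    return large / small, series[large] / series[small]


results = {name: growth(series) for name, series in runtimes.items()}
for name, (rows_x, time_x) in results.items():
    print(f"{name}: {rows_x:.0f}x rows -> {time_x:.1f}x runtime")
```

Under that reading, runtime grows more slowly than the data for RRE and faster than the data for SAS.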
RRE: Up to 350X Faster Than SAS
[Bar chart: RRE Speed Multiple by task, N=5MM]
Task | RRE Speed Multiple
Stats | 213
Quintiles | 185
Freq | 351
Lin Reg 1 | 39
Lin Reg 2 | 37
Step Lin | 19
Logistic | 58
GLM | 18
Kmeans 1 | 101
Kmeans 2 | 32
Why is RRE faster than SAS?
RRE supports scalable computing out of the box
– Multi-threaded processing
– Distributed processing
Legacy SAS is mostly single-threaded
– DATA Step processing
– Most SAS/STAT PROCs
SAS HP PROCs
9 new SAS PROCs
Bundled into SAS 9.4
Designed for scalability
Multiple operating modes:
– Single machine
– Distributed (must license SAS HP Statistics)
HP PROCs: Minimal Improvement
Linear regression, 20 predictors. N=5,000,000
[Bar chart: Runtime, Seconds. RRE: rxLinMod: 6.8; SAS: PROC REG: 267.17; SAS: PROC HPREG: 253.82]
HPREG running in single machine mode.
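The "minimal improvement" can be put in numbers. The sketch below assumes 267.17 seconds is PROC REG, 253.82 seconds is PROC HPREG, and 6.8 seconds is rxLinMod; the slide text does not state which SAS value belongs to which procedure, so that assignment is an assumption.

```python
# Runtimes in seconds from the chart. The assignment of the two
# SAS values to PROC REG and PROC HPREG is an assumption.
proc_reg = 267.17
proc_hpreg = 253.82
rx_lin_mod = 6.8

# Fraction of PROC REG runtime saved by switching to PROC HPREG.
hpreg_improvement = (proc_reg - proc_hpreg) / proc_reg

# How many times faster rxLinMod ran than each SAS procedure.
reg_vs_rre = proc_reg / rx_lin_mod
hpreg_vs_rre = proc_hpreg / rx_lin_mod

print(f"HPREG improvement over REG: {hpreg_improvement:.1%}")
print(f"rxLinMod vs. PROC REG: {reg_vs_rre:.1f}x")
print(f"rxLinMod vs. PROC HPREG: {hpreg_vs_rre:.1f}x")
```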
Summary
RRE is faster than Legacy SAS:
– Same tasks
– Same hardware
RRE speed:
– Efficient engineering
– Multi-threaded and distributed processing
SAS performance claims:
– Massive hardware requirements
– Force you to license more software from SAS
– Don’t apply to Legacy SAS
Polling Question
Which of the following analytic software benefits is most important to you:
– A) Completing projects faster
– B) Building better predictive models
– C) High performance with low infrastructure costs
John Wallace, Founder & CEO
DataSong at a Glance
Background
Approaching $1 trillion in revenue analyzed. $3 billion in marketing spend under our lens.
Experienced 60+ person team based in San Francisco with offices in Seattle, Los Angeles, Singapore, and India.
Founded in 2003 with a proven history of solving difficult analytics problems. Evolved from consulting through close partnerships with our clients.
Our Offerings
Customer interaction insight that powers applications for customer-level revenue attribution, targeting, media optimization.
Descriptive and predictive modeling of hidden trends and relationships in big data.
Custom development including applications, process automation, and decision support solutions.
DataSong Offerings
Hosted Applications
● Revenue Attribution
● Customer Targeting
● Marketing Planning
We know Big Data. We analyze and provide the “so what”.
DataSong Architecture
• ETL
• N marketing channels
• Behavioral variables
• Promotional data
• Overlay data
• Functions to read Hadoop output; xdf creation
• Exploratory data analysis
• GAM survival models
• Scoring for inference
• Scoring for prediction
• 5 billion scores per day per customer
DATASONG DATA FORMAT (DDF)
CUSTOM VARIABLES (PMML)
Where Speed Matters
3 key dimensions
● how many rows
● how many variables
● how many iterations of a model
Trade-offs for speed
● Sampling variance
● Test fewer features
● Have less understanding of the signal
The third dimension (model iterations) means we must multiply any benchmark runtime by N, the number of iterations.
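To illustrate the multiply-by-N point, the sketch below scales the two complete-script runtimes from the benchmark slides (5,192 and 124 seconds) by a hypothetical iteration count. The value N=50 is an assumption chosen for illustration; it does not come from the presentation.

```python
def total_runtime_seconds(single_run_seconds: float, iterations: int) -> float:
    """Total time when one benchmarked run must be repeated for each model iteration."""
    return single_run_seconds * iterations


# Hypothetical number of model iterations (not from the presentation).
n_iterations = 50

# Single-run runtimes in seconds, taken from the benchmark slides.
sas_total = total_runtime_seconds(5192, n_iterations)
rre_total = total_runtime_seconds(124, n_iterations)

print(f"SAS: {sas_total / 3600:.1f} hours for {n_iterations} iterations")
print(f"RRE: {rre_total / 3600:.1f} hours for {n_iterations} iterations")
```

The ratio between the two tools stays the same, but the absolute gap grows with N, which is what turns a per-run difference into days of elapsed time.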
Thank You