Is Revolution R Enterprise Faster than SAS? Benchmarking Results Revealed

RRE: Faster than SAS. Results from Benchmarking. Thomas W. Dinsmore, Revolution Analytics; John Wallace, DataSong.

Page 1: Is Revolution R Enterprise Faster than SAS? Benchmarking Results Revealed

RRE: Faster than SAS

Results from Benchmarking

Thomas W. Dinsmore, Revolution Analytics

John Wallace, DataSong

Page 2

Polling Question

Do you currently use:

– A) R or Revolution R Enterprise (RRE)

– B) SAS

– C) Both

– D) Neither

Page 3

Benchmarking RRE vs. SAS

Background

Approach

Results

Discussion

Page 4

Revolution R Enterprise

Open source R

Commercially supported distribution

Enhanced for enterprise use:

– Scalable analytics

– Developer tools

– Integration tools

– Deployment tools

Page 5

2012: Allstate Benchmark

Poisson Regression, 150MM rows

[Bar chart: Runtime, Minutes (axis 0 to 300)]

RRE: 6 minutes

SAS PROC GENMOD: 300 minutes

Page 6

Criticism: “Apples to Oranges”

20 cores (RRE) vs. 16 cores (SAS)

Page 7

Most SAS/STAT PROCs (including PROC GENMOD) run single-threaded.

SAS/STAT: 91 PROCs

• 69 single-threaded

• 13 multi-threaded

• 9 distributed (if you license SAS HP Statistics)
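As a quick sanity check on the breakdown above, the following minimal sketch sums the three categories and expresses each as a share of the total. The counts come from the slide; the variable names are illustrative only.

```python
# PROC counts by execution mode, as stated on the slide.
proc_counts = {
    "single-threaded": 69,
    "multi-threaded": 13,
    "distributed": 9,
}

# The three categories should add up to the 91 PROCs quoted for SAS/STAT.
total = sum(proc_counts.values())

# Share of each mode, in percent, rounded to one decimal place.
shares = {mode: round(100 * n / total, 1) for mode, n in proc_counts.items()}

print(f"Total PROCs: {total}")
for mode, pct in shares.items():
    print(f"{mode}: {pct}%")
```

On these numbers, roughly three quarters of the SAS/STAT PROCs run single-threaded.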

Page 9

2013: SAS Benchmark

PROC HPGENSELECT

– SAS/STAT

– SAS High Performance Statistics

Massive grid (140/144 nodes)

– 16 cores per node

– 2,240/2,304 cores

Conclusion: SAS on 2,304 cores is competitive with RRE on 20 cores.
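The core counts on this slide follow directly from the node counts. A minimal sketch of that arithmetic, using only the figures quoted above (the function and constant names are illustrative):

```python
CORES_PER_NODE = 16   # from the slide
RRE_CORES = 20        # core count quoted for RRE in the conclusion


def grid_cores(nodes, cores_per_node=CORES_PER_NODE):
    """Total cores in a grid of identical nodes."""
    return nodes * cores_per_node


for nodes in (140, 144):
    cores = grid_cores(nodes)
    ratio = cores / RRE_CORES
    print(f"{nodes} nodes -> {cores} cores ({ratio:.1f}x the RRE core count)")
```

The larger SAS configuration therefore uses a bit over 100 times as many cores as the RRE setup it is compared with.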

Page 10

Honest Benchmarking

Compare RRE and SAS/STAT performance

– Same data

– Same environment

– Same tasks

Test under real-world conditions

Make the test fair and transparent

Page 11

Data

Manufactured data

Reproducible in any environment

Designed to emulate “typical” working data

“Entity” tables: 1MM, 5MM rows

“Predict” tables: 10MM, 50MM rows

[Diagram: Predict (fact) table linked to Entity 1 and Entity 2 tables by entity key. Table widths shown: 571 columns and 21 columns.]

Page 12

Benchmarking Environment

SAS 9.4:

• Base

• STAT

• Grid Manager

Commodity servers:

• 4 cores

• 16 GB memory

Gbit network

CentOS

RRE 7.0

Platform LSF 9

Page 13

Analytic Tasks

Task | SAS Capability | RRE Capability
Descriptive Statistics | PROC SURVEYMEANS | rxSummary
Median and Deciles | PROC SURVEYMEANS | rxQuantile
Frequency Distribution | PROC FREQ | rxCube
Linear Regression (Numeric predictors) | PROC REG, HPREG | rxLinMod
Linear Regression (Mixed predictors) | PROC GENMOD | rxLinMod
Stepwise Linear (100 predictors) | PROC REG | rxLinMod/rxStepControl
Logistic Regression | PROC LOGISTIC | rxLogit
Generalized Linear | PROC GENMOD | rxGLM
K-Means Clustering | PROC FASTCLUS | rxKMeans
Score | PROC SCORE | rxPredict

Page 14

Preparation

Generated data with randomized procedure

Loaded data into native formats:

– RRE: XDF file

– SAS: SAS DATA set

Generation and load times not included

No meaningful differences

Page 15

RRE: 42 Times Faster Than SAS 9.4

Complete script: ten analytic tasks, N=5,000,000.

[Bar chart: Runtime, Seconds (axis 0 to 6,000)]

RRE: 124 seconds (~2 minutes)

SAS 9.4: 5,192 seconds (~1 hour, 26 minutes)
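The headline multiple can be reproduced from the two runtimes on this slide. The sketch below is only a worked calculation; the helper names are illustrative and not part of either product.

```python
def speed_multiple(baseline_seconds, candidate_seconds):
    """How many times faster the candidate is than the baseline."""
    return baseline_seconds / candidate_seconds


def as_h_m_s(seconds):
    """Split a duration in whole seconds into (hours, minutes, seconds)."""
    hours, remainder = divmod(int(seconds), 3600)
    minutes, secs = divmod(remainder, 60)
    return hours, minutes, secs


SAS_SECONDS = 5192   # complete script, SAS 9.4, from the slide
RRE_SECONDS = 124    # complete script, RRE, from the slide

print(f"Speed multiple: {speed_multiple(SAS_SECONDS, RRE_SECONDS):.1f}x")
print("SAS runtime (h, m, s):", as_h_m_s(SAS_SECONDS))
print("RRE runtime (h, m, s):", as_h_m_s(RRE_SECONDS))
```

The ratio rounds to the 42x quoted in the slide title, and the durations match the "~2 minutes" and "~1 hour, 26 minutes" annotations.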

Page 16

RRE: Linear Scalability

[Line chart: Runtime, Seconds vs. # Rows in Entity Table (0 to 5,000,000); series: RRE 7, SAS 9.4; data labels: 68, 124, 623, 5,192]

RRE: consistent performance with increased data volume.
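One way to read the scaling chart is to compare how much runtime grows when the row count grows. The sketch below assumes that the data labels 68 and 623 belong to the 1MM-row Entity table and that 124 and 5,192 belong to the 5MM-row table (the latter pair matches the complete-script results reported for N=5,000,000). The pairing of the two smaller labels is an assumption, since the transcript does not preserve the chart layout.

```python
# Runtime in seconds keyed by rows in the Entity table.
# ASSUMPTION: 68 and 623 are the 1MM-row results; 124 and 5,192 are the
# 5MM-row results. The pairing is inferred, not stated in the transcript.
runtimes = {
    "RRE 7": {1_000_000: 68, 5_000_000: 124},
    "SAS 9.4": {1_000_000: 623, 5_000_000: 5192},
}


def growth(points):
    """Return (row growth factor, runtime growth factor) between two sizes."""
    (rows_lo, t_lo), (rows_hi, t_hi) = sorted(points.items())
    return rows_hi / rows_lo, t_hi / t_lo


for product, points in runtimes.items():
    row_factor, time_factor = growth(points)
    print(f"{product}: {row_factor:.0f}x rows -> {time_factor:.2f}x runtime")
```

Under that assumption, RRE runtime grows by less than 2x for 5x the rows, while SAS runtime grows by more than 8x.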

Page 17

RRE: Up to 350X Faster Than SAS

[Bar chart: RRE Speed Multiple by task, N=5MM]

Task | RRE Speed Multiple
Stats | 213
Quintiles | 185
Freq | 351
Lin Reg 1 | 39
Lin Reg 2 | 37
Step Lin | 19
Logistic | 58
GLM | 18
Kmeans 1 | 101
Kmeans 2 | 32

Page 18

Why is RRE faster than SAS?

RRE supports scalable computing out of the box

– Multi-threaded processing

– Distributed processing

Legacy SAS is mostly single-threaded

– DATA Step processing

– Most SAS/STAT PROCs

Page 19

SAS HP PROCs

9 new SAS PROCs

Bundled into SAS 9.4

Designed for scalability

Multiple operating modes:

– Single machine

– Distributed (must license SAS HP Statistics)

Page 20

HP PROCs: Minimal Improvement

Linear regression, 20 predictors, N=5,000,000

[Bar chart: Runtime, Seconds (axis 0 to 300); series: SAS: PROC HPREG, SAS: PROC REG, RRE: rxLinMod; data labels: 6.8, 267.17, 253.82]

HPREG running in single machine mode.
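The size of the "minimal improvement" can be estimated from the three data labels. The sketch below assumes 6.8 seconds is the rxLinMod result and that 267.17 and 253.82 seconds are the two SAS procedures. The transcript does not preserve which SAS label belongs to PROC REG and which to PROC HPREG, so the calculation only compares the two SAS timings with each other and with RRE.

```python
# ASSUMPTION: 6.8 s is RRE rxLinMod; the other two labels are the SAS runs.
# Which of the two is PROC REG vs. PROC HPREG is not recoverable here.
sas_times = (267.17, 253.82)
rre_time = 6.8

slower, faster = max(sas_times), min(sas_times)

# Relative gap between the two SAS procedures, in percent of the slower run.
gap_pct = 100 * (slower - faster) / slower

# How many times faster RRE is than each SAS timing.
multiples = [round(t / rre_time, 1) for t in sas_times]

print(f"Gap between the two SAS runs: {gap_pct:.1f}%")
print(f"RRE speed multiples vs. the SAS runs: {multiples}")
```

The two SAS timings differ by about 5%, while either one is well over 30 times the RRE timing.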

Page 21

Summary

RRE is faster than Legacy SAS:

– Same tasks

– Same hardware

RRE speed:

– Efficient engineering

– Multi-threaded and distributed processing

SAS performance claims:

– Massive hardware requirements

– Force you to license more software from SAS

– Don’t apply to Legacy SAS

Page 22

Polling Question

Which of the following analytic software benefits is most important to you:

– A) Completing projects faster

– B) Building better predictive models

– C) High performance with low infrastructure costs

Page 23

John Wallace, Founder & CEO

Page 24

DataSong at a Glance

Background

Approaching $1 trillion in revenue analyzed. $3 billion in marketing spend under our lens.

Experienced 60+ person team based in San Francisco with offices in Seattle, Los Angeles, Singapore, and India.

Founded in 2003 with a proven history of solving difficult analytics problems. Evolved from consulting through close partnerships with our clients.

Our Offerings

Customer interaction insight that powers applications for customer-level revenue attribution, targeting, and media optimization.

Descriptive and predictive modeling of hidden trends and relationships in big data.

Custom development including applications, process automation, and decision support solutions.

Page 25

DataSong Offerings

Hosted Applications

● Revenue Attribution

● Customer Targeting

● Marketing Planning

We know Big Data. We analyze and provide the “so what”.

Page 26

DataSong Architecture

• ETL

• N marketing channels

• Behavioral variables

• Promotional data

• Overlay data

• Functions to read Hadoop output; xdf creation

• Exploratory data analysis

• GAM survival models

• Scoring for inference

• Scoring for prediction

• 5 billion scores per day per customer

DATASONG DATA FORMAT (DDF)

CUSTOM VARIABLES (PMML)

Page 27

Where Speed Matters

3 key dimensions

● how many rows

● how many variables

● how many iterations of a model

Trade-offs for speed

● Sampling variance

● Test fewer features

● Have less understanding of the signal

This 3rd dimension means we must multiply any benchmark by N
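To make the "multiply by N" point concrete, the sketch below scales the complete-script runtimes reported earlier in the deck (124 seconds for RRE, 5,192 seconds for SAS 9.4) by a number of model iterations. The value N = 50 is purely hypothetical and chosen for illustration; it is not from the presentation.

```python
def total_runtime_hours(seconds_per_run, n_iterations):
    """Wall-clock hours for n_iterations sequential runs of one benchmark."""
    return seconds_per_run * n_iterations / 3600


# Per-run benchmark times in seconds, from the complete-script results.
per_run_seconds = {"RRE": 124, "SAS 9.4": 5192}

N = 50  # HYPOTHETICAL number of model iterations, for illustration only

for product, seconds in per_run_seconds.items():
    hours = total_runtime_hours(seconds, N)
    print(f"{product}: {N} iterations -> {hours:.2f} hours")
```

With this hypothetical N, a gap measured in minutes per run becomes a gap between under two hours and roughly three days of compute.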

Page 30

Thank You