Using R for Big Data Analytics

Using R for Big Data Analytics. Christopher Nguyen, PhD, Co-Founder & CEO. Presented on December 6, 2013.

Transcript of Using R for Big Data Analytics

Page 1: Using R for Big Data Analytics

Using R for Big Data Analytics

Christopher Nguyen, PhD Co-Founder & CEO

Presented on December 6, 2013

Page 2: Using R for Big Data Analytics

R on Big Data

R is standard

R is single-core and single-threaded

R scales poorly

R collapses when it meets Big Data

Page 3: Using R for Big Data Analytics

Big Data R Options Available

Multi-threaded, multi-core, out-of-memory R: bigmemory, biglm, parallel, …

RDBMS with statistical support: HP Vertica, SAP HANA, Oracle R, ...

R on MapReduce: Revolution R, RHIPE, RHive, ...

In-memory parallel R on Big Data: BigR
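To make the first option above concrete, here is a minimal sketch of multi-core R using the `parallel` package that ships with base R. It is not taken from the talk: the workload (a per-group regression on the built-in `mtcars` data), the helper name `fit_slope`, and the choice of two workers are all assumptions made for illustration. It only shows how work can be spread across cores on one machine; the data still has to fit in memory, and it says nothing about how BigR or the other listed products work.

```r
library(parallel)  # part of the base R distribution

# Hypothetical workload: one small regression per cylinder group of mtcars.
chunks <- split(mtcars, mtcars$cyl)

# Illustrative helper (not from the talk): slope of mpg on wt for one chunk.
fit_slope <- function(df) unname(coef(lm(mpg ~ wt, data = df))["wt"])

# Standard R: chunks are processed one after another on a single core.
serial <- lapply(chunks, fit_slope)

# parallel: the same chunks are handed to two worker processes.
cl <- makeCluster(2)
par_res <- parLapply(cl, chunks, fit_slope)
stopCluster(cl)

# Both approaches should give the same per-group slopes.
print(all.equal(serial, par_res))
```

On a dataset this small the cluster start-up cost outweighs any gain; the pattern only pays off when each chunk carries substantial work.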

Page 4: Using R for Big Data Analytics

Comparison Matrix

Technology                  Big Data Storage  Speed            Scalability  In-memory Computing  Standard R User Experience
bigmemory, biglm, parallel  No                Very slow        Low          No                   Yes
RDBMS with stat support     Proprietary       High, if native  Medium       Variable             No if UDF
R on MapReduce              HDFS              Slow             High         No                   No
Adatao BigR                 HDFS, RDBMS       Very high        High         Yes                  Yes

Page 5: Using R for Big Data Analytics

Concrete Examples

Page 6: Using R for Big Data Analytics

Adatao Live Demo at the First Apache Spark Summit, San Francisco, CA, December 2, 2013