Enabling R for Big Data with PL/R and PivotalR
Real World Examples on Hadoop & MPP Databases
Woo J. Jung, Principal Data Scientist, Pivotal Labs
2015-07-15
© Copyright 2015 Pivotal. All rights reserved.
Still can’t believe we did this. Truly exciting. All In On Open Source
Pivotal Big Data Suite
Platform
Data Science Toolkit: Key Tools, Key Languages (SQL)
How Pivotal Data Scientists Select Which Tool to Use
1. Prototype in R or directly in MADlib/PivotalR
2. Is the algorithm of choice available in MADlib/PivotalR?
   – Yes: Build final set of models in MADlib/PivotalR
   – No: Do opportunities for explicit parallelization exist?
      – Yes: Build final set of models in PL/R
      – No: Connect to Pivotal via ODBC
Optimized for both algorithm efficiency and code overhead
MADlib: Toolkit for Advanced Big Data Analytics
• Better Parallelism
  – Algorithms designed to leverage MPP or Hadoop architecture
• Better Scalability
  – Algorithms scale as your data set scales
  – No data movement
• Better Predictive Accuracy
  – Using all data, not a sample, may improve accuracy
• Open Source
  – Available for customization and optimization by user
madlib.net
http://doc.madlib.net/latest/
PivotalR: Bringing MADlib and HAWQ to a Familiar R Interface
• Challenge: Want to harness the familiarity of R’s interface and the performance & scalability benefits of in-DB/in-Hadoop analytics
• Simple solution: Translate R code into SQL

PivotalR:
  d <- db.data.frame("houses")
  houses_linregr <- madlib.lm(price ~ tax + bath + size, data = d)

SQL Code:
  SELECT madlib.linregr_train('houses',
                              'houses_linregr',
                              'price',
                              'ARRAY[1, tax, bath, size]');

http://cran.r-project.org/web/packages/PivotalR/index.html
https://pivotalsoftware.github.io/gp-r/
https://github.com/pivotalsoftware/PivotalR
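To make the "translate R code into SQL" idea concrete, here is a minimal sketch in plain R that turns a model formula into the `madlib.linregr_train` call shown on the slide. The helper `formula_to_linregr_sql` is hypothetical and written only for illustration; it is not PivotalR's actual implementation, which handles far more cases.

```r
# Hypothetical helper for illustration only (not part of PivotalR).
# Builds the SQL string for a MADlib linear regression from an R formula.
formula_to_linregr_sql <- function(formula, source_table, output_table) {
  tm <- terms(formula)
  response <- deparse(formula[[2]])          # left-hand side, e.g. "price"
  predictors <- attr(tm, "term.labels")      # right-hand side terms
  has_intercept <- attr(tm, "intercept") == 1
  # MADlib expects the intercept as an explicit constant 1 in the array
  independent <- paste0(
    "ARRAY[",
    paste(c(if (has_intercept) "1", predictors), collapse = ", "),
    "]"
  )
  sprintf("SELECT madlib.linregr_train('%s', '%s', '%s', '%s');",
          source_table, output_table, response, independent)
}

sql <- formula_to_linregr_sql(price ~ tax + bath + size,
                              "houses", "houses_linregr")
cat(sql, "\n")
```

The point of the sketch is the design choice: only a short SQL string travels to the database, while the table named in it never leaves.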
PivotalR Design Overview
[Diagram: R client running PivotalR over RPostgreSQL ("No data here") connected to Database/Hadoop w/ MADlib ("Data lives here")]
1. R → SQL
2. SQL to execute
3. Computation results
• Call MADlib’s in-DB machine learning functions directly from R
• Syntax is analogous to native R function
• Data doesn’t need to leave the database
• All heavy lifting, including model estimation & computation, is done in the database
More Piggybacking
What is Data Parallelism?
• Little or no effort is required to break up the problem into a number of parallel tasks, and there exists no dependency (or communication) between those parallel tasks
• Also known as ‘explicit parallelism’
• Examples:
  – Have each person in this room weigh themselves: Measure each person’s weight in parallel
  – Count a deck of cards by dividing it up between people in this room: Count in parallel
  – MapReduce
  – apply() family of functions in R
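The card-counting example can be sketched with the apply() family in a few lines of R. The four "counters" are hypothetical; each one works on its own hand with no communication, and the partial counts are combined at the end.

```r
deck <- 1:52

# Deal the cards round-robin to four hypothetical counters.
hands <- split(deck, rep_len(1:4, length(deck)))

# Each counter counts only their own hand; the tasks are independent.
counts <- vapply(hands, length, integer(1))

# Combine the partial results.
total <- sum(counts)
print(counts)
print(total)
```

Because the per-hand tasks share nothing, the `vapply` call could be swapped for a parallel map (for example `parallel::mclapply` on platforms that support forking) without changing the logic.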
Procedural Language R (PL/R)
SQL & R
• Parallelized model building in the R language
• Originally developed by Joe Conway for PostgreSQL
• Parallelized by virtue of piggybacking on distributed architectures
http://pivotalsoftware.github.io/gp-r/
Parallelized Analytics in Pivotal via PL/R: An Example
SQL & R
• Parsimonious – R piggy-backs on Pivotal’s parallel architecture
• Minimize data movement
• Build predictive model for each state in parallel
[Figure: each state's data (TN, CA, NY, PA, TX, CT, NJ, IL, MA, WA) is fit to its own model in parallel]
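As a rough local analogue of "one model per state", the sketch below fits an independent linear model to each state's rows in plain R. The data are simulated and the column names (`state`, `x`, `y`) are assumptions made for illustration; in PL/R the database would hand each group to R, whereas here `split` and `lapply` stand in for that step.

```r
set.seed(42)

# Simulated data for three hypothetical states, 20 rows each.
states <- c("TN", "CA", "NY")
true_slope <- c(TN = 1, CA = 2, NY = 3)
sales <- data.frame(state = rep(states, each = 20), x = runif(60))
sales$y <- unname(true_slope[sales$state]) * sales$x + rnorm(60, sd = 0.1)

# One independent task per state: no state's fit depends on another's.
by_state <- split(sales, sales$state)
models <- lapply(by_state, function(d) lm(y ~ x, data = d))

# Collect one row of coefficients per state, like a plain result table.
coefs <- t(vapply(models, coef, numeric(2)))
print(round(coefs, 2))
```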
Parallelized R via PL/R: One Example of Its Use
• With placeholders in SQL, write functions in the native R language
• Accessible, powerful modeling framework
• Execute PL/R function
• Plain and simple table is returned
Examples of Usage
Pivotal Data Science: Areas of Expertise
Pivotal Data Science: Packaged Services
• Analytics Roadmap
• Prioritized Opportunities
• Architectural Recommendations
• Hands-on training
• Hosted data on Pivotal Data stack
• Results review & assessment
• On-site MPP analytics training
• Analytics tool-kit
• Quick insight (2 weeks)
• Prof. services
• Data science model building
• Ready-to-deploy model(s)
• Prof. services
• Data science model building
• Ready-to-deploy model(s)
LAB PRIMER (2-Week Roadmapping)
LAB 600 (6-Week Lab)
LAB 1200 (12-Week Lab)
LAB 100 (Analytics Bundle)
DATA JAM (Internal DS Contest)
The Internet of Things: Smart Meter Analytics
Engagement Summary
• Objective
  – Build key foundations of a data-driven framework for anomaly detection to leverage in revenue protection initiatives
• Results
  – With limited access to limited data, our models (FFT and Time Series Analysis) identified 191K potentially anomalous meters (7% of all meters).
• High Performance
  – Pivotal Big Data Suite including MADlib and PL/R
  – 90 seconds to compute FFT for over 3.1 million meters (~3.5 billion readings) → 0.0288 ms/meter
  – ~36 minutes to compute time series models for over 3.1 million meters (~3.5 billion readings) → 0.697 ms/meter
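The per-meter figures are simply wall-clock time divided by meter count. A quick check in R, using the rounded count of 3.1 million from the slide (the exact count is not given, so this is an approximation):

```r
# Inputs as stated on the slide; the meter count is rounded ("over 3.1 million").
meters <- 3.1e6
fft_seconds <- 90
ts_minutes <- 36

# Convert to milliseconds and divide by the number of meters.
fft_ms_per_meter <- fft_seconds * 1000 / meters
ts_ms_per_meter <- ts_minutes * 60 * 1000 / meters

print(round(c(fft = fft_ms_per_meter, time_series = ts_ms_per_meter), 4))
```

With the rounded count the FFT figure comes out near 0.029 ms/meter, slightly above the quoted 0.0288, which is what one would expect if the true count is somewhat more than 3.1 million.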
Anomaly Detection Methodology & Results
All Data (4.5 million meters / ~20 billion meter readings)
Step 1: Select Data for Advanced Modeling (3.1 million meters / ~3.5 billion meter readings)
Step 2: Detect Anomalies From Frequency Domain Analysis (547K)
Step 3: Detect Anomalies From Time Domain Analysis (485K)
Step 4: Detect Anomalies From Combined Analysis (191K)
-- create type to store frequency, spec, and max freq
create type fourier_type AS (
  freq text, spec text, freq_with_maxspec float8);

-- create PL/R function to compute the periodogram and return the frequency with maximum spectral density
create or replace function pgram_concise(tsval float8[])
RETURNS float8 AS
$$
  # raw periodogram of the detrended series, without padding or plotting
  rpgram <- spec.pgram(tsval, fast = FALSE, plot = FALSE, detrend = TRUE)
  # which.max() returns a single index even when the maximum is tied,
  # so the function always returns one float8 value
  freq_with_maxspec <- rpgram$freq[which.max(rpgram$spec)]
  return(freq_with_maxspec)
$$
LANGUAGE 'plr';

-- execute function
create table pg_gram_results
as select geo_id, meter_id, pgram_concise(load_ts) as freq_with_maxspec FROM
meter_data distributed by (geo_id, meter_id);
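The body of the PL/R function can be tried outside the database. The sketch below runs the same `spec.pgram` call on a synthetic hourly series with a 24-hour cycle; the series, its length (1008 hours), and the noise level are assumptions made for illustration, not the engagement data.

```r
set.seed(1)

# 42 days of hypothetical hourly readings with a daily cycle plus noise.
hours <- 0:1007
load_ts <- 2 + sin(2 * pi * hours / 24) + rnorm(length(hours), sd = 0.2)

# Same call as in the PL/R function body.
pg <- spec.pgram(load_ts, fast = FALSE, plot = FALSE, detrend = TRUE)

# Frequency (cycles per hour) with the largest spectral density,
# converted to a period in hours.
dominant_freq <- pg$freq[which.max(pg$spec)]
dominant_period_hours <- 1 / dominant_freq
print(dominant_period_hours)  # close to 24 for this synthetic series
```

Taking the reciprocal of the dominant frequency gives a periodicity in hours, the quantity that a "periodicity in hours" histogram summarizes across meters.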
Most Households Use Energy in Daily or Half-Daily Cycles
• Dominant periodicity (i.e. the period of the frequency with maximum spectral density) of each meter is computed
• ~80% of all households show daily or half-daily patterns of energy usage
• ~20% of all households show anomalous patterns of energy usage
• Flag meters falling into the 20% as potentially anomalous meters
• Follow-up Items: Event type of the anomalies w.r.t. Revenue Protection to be determined with additional data & models
[Figure: histogram "Estimated Periodicity of Meters"; x-axis: Periodicity in Hours (0 to 1000), y-axis: Number of Meters (0 to 2,000,000); peaks labeled at 12 and 24 hours; bars at 546 and 1092 hours annotated "Meters with Anomalous Usage Patterns (~20% of all meters)"]
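One simple way to express the flagging rule in R is shown below. The meter IDs, periods, and the half-hour tolerance are hypothetical values chosen for illustration; the slides do not state the exact rule that was used.

```r
# Hypothetical dominant periods (in hours) for six meters.
period_hours <- c(m1 = 24, m2 = 12, m3 = 24, m4 = 546, m5 = 1092, m6 = 12)

# Treat daily and half-daily cycles as the expected patterns.
expected_periods <- c(12, 24)
tolerance <- 0.5  # assumed slack, in hours

# A meter is "ordinary" if its period is near any expected period.
near_expected <- vapply(
  period_hours,
  function(p) any(abs(p - expected_periods) <= tolerance),
  logical(1)
)

flagged <- names(period_hours)[!near_expected]
anomalous_share <- mean(!near_expected)
print(flagged)
print(anomalous_share)
```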
Irregular Patterns of Energy Consumption Displayed by Detected Anomalous Meters
[Figure: two time-series panels, "FFT Analysis: Time Series of an Ordinary Meter" and "FFT Analysis: Time Series of an Anomalous Meter"; x-axis: Elapsed Hours (0 to 1000), y-axis: Load]
Parallelize the Generation of Visualizations
Demand Modeling & What-If Scenario Analysis
Scalable Algorithm Development Using R
Prototyping Dashboards on RShiny
Engagement Overview
• Compose rich set of reusable data assets from disparate LOBs and make available for ongoing analysis & reporting
• Build parallelized demand models for 100+ products & locations
• Develop scalable Hierarchical/Multilevel Bayesian Modeling algorithm (Gibbs Sampling)
• Construct framework & prototype app for what-if scenario analysis in RShiny
Customer’s Business Goal: Make data-driven decisions about how to allocate resources for planning & inventory management
Overview of Hierarchical Linear Model
• Likelihood
• Priors for parameters
• Priors for hyperparameters
• Posterior ∝ Likelihood × Priors for parameters × Priors for hyperparameters
This joint posterior distribution does not take the form of a known probability density, thus it is a challenge to draw samples from it directly → However, the full conditional posterior distributions follow known probability densities (Gibbs Sampling)
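The model equations did not survive in this transcript. As a stand-in, here is a deliberately small textbook example of the same structure, not the model used in the engagement: group means $\theta_j$ with a shared mean $\mu$, and both variances $\sigma^2$ and $\tau^2$ assumed known. All symbols are assumptions introduced for illustration.

```latex
\begin{align*}
y_{ij} \mid \theta_j &\sim \mathcal{N}(\theta_j, \sigma^2),
  \quad i = 1,\dots,n_j,\; j = 1,\dots,J
  && \text{(likelihood)} \\
\theta_j \mid \mu &\sim \mathcal{N}(\mu, \tau^2)
  && \text{(prior for parameters)} \\
p(\mu) &\propto 1
  && \text{(prior for hyperparameter)} \\
p(\theta, \mu \mid y) &\propto
  \prod_{j=1}^{J} \Big[ \mathcal{N}(\theta_j \mid \mu, \tau^2)
  \prod_{i=1}^{n_j} \mathcal{N}(y_{ij} \mid \theta_j, \sigma^2) \Big]
  && \text{(joint posterior)} \\
\theta_j \mid \mu, y &\sim \mathcal{N}\!\left(
  \frac{n_j \bar{y}_j / \sigma^2 + \mu / \tau^2}{n_j / \sigma^2 + 1 / \tau^2},\;
  \frac{1}{n_j / \sigma^2 + 1 / \tau^2} \right)
  && \text{(full conditional)} \\
\mu \mid \theta &\sim \mathcal{N}\!\left(
  \frac{1}{J} \sum_{j=1}^{J} \theta_j,\; \frac{\tau^2}{J} \right)
  && \text{(full conditional)}
\end{align*}
```

In this toy case the joint posterior is itself Gaussian, so Gibbs sampling is not strictly needed; once the variances also receive priors, the joint form is no longer standard while each full conditional still is, which is the situation the slide describes. Note that each $\theta_j$ is drawn independently of the other groups given $\mu$.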
Game Plan
1. Figure out which components of the Gibbs Sampler can be “embarrassingly” parallelized, i.e. the key building blocks
   – Mostly matrix algebra calculations & draws from full conditional distributions, parallelized by Product-Location
2. Build functions (i.e. in PL/R) for each of the building blocks
3. Build a “meta-function” that ties together each of the functions in (2) to run a Gibbs Sampler
4. Run functions for K iterations, monitor convergence, summarize results
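The four steps can be sketched in plain R on a toy model. This is not the engagement's sampler: it assumes a simple normal model with group means `theta_j ~ N(mu, tau2)`, observations `y_ij ~ N(theta_j, sigma2)`, a flat prior on `mu`, and both variances treated as known. All names and values are hypothetical. The per-group draw is the piece that could be run in parallel by group (here, by `vapply`).

```r
set.seed(123)

# Simulated data: J groups with n observations each (hypothetical values).
J <- 8; n <- 30
sigma2 <- 1; tau2 <- 4   # variances treated as known to keep the sketch short
true_theta <- rnorm(J, mean = 10, sd = sqrt(tau2))
y <- lapply(true_theta, function(th) rnorm(n, mean = th, sd = sqrt(sigma2)))
ybar <- vapply(y, mean, numeric(1))

# Building block 1: draw one group's mean from its full conditional.
# Given mu, groups are independent, so this step parallelizes by group.
draw_theta <- function(ybar_j, n_j, mu) {
  prec <- n_j / sigma2 + 1 / tau2
  m <- (n_j * ybar_j / sigma2 + mu / tau2) / prec
  rnorm(1, mean = m, sd = sqrt(1 / prec))
}

# Building block 2: draw the shared mean given all group means.
draw_mu <- function(theta) {
  rnorm(1, mean = mean(theta), sd = sqrt(tau2 / length(theta)))
}

# Meta-function: alternate the building blocks for K iterations.
run_gibbs <- function(K, burn_in = 500) {
  mu <- mean(ybar)
  mu_draws <- numeric(K)
  theta_draws <- matrix(NA_real_, nrow = K, ncol = J)
  for (k in seq_len(K)) {
    theta <- vapply(ybar, draw_theta, numeric(1), n_j = n, mu = mu)
    mu <- draw_mu(theta)
    mu_draws[k] <- mu
    theta_draws[k, ] <- theta
  }
  keep <- seq(burn_in + 1, K)
  list(mu = mu_draws[keep], theta = theta_draws[keep, , drop = FALSE])
}

# Run, then summarize the retained draws.
fit <- run_gibbs(K = 2500)
posterior_mean_mu <- mean(fit$mu)
posterior_mean_theta <- colMeans(fit$theta)
print(round(posterior_mean_mu, 2))
```

Convergence monitoring is left out of the sketch; in practice one would inspect trace plots or run several chains before summarizing.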
Examples of Building Block Functions
Meta-Function & Execution
PivotalR & RShiny
[Diagram: RShiny and RShiny Server running PivotalR over RPostgreSQL ("No data here") connected to Database/Hadoop w/ MADlib ("Data lives here"); flow: R → SQL, SQL to execute, Computation results]
• Data doesn’t need to leave the database
• All heavy lifting, including model estimation & computation, is done in the database
Next Steps
• Continue to build even more PivotalR wrapper functions
• Identify more areas where core R functions can be re-leveraged and made scalable via PivotalR
• Explore, learn, and share notes with other packages like PivotalR
• Explore closer integration with Spark, MLlib, H2O
• PL/R wrappers directly from R
Thank You
Have Any Questions?
http://blog.pivotal.io/data-science-pivotal
Check out the Pivotal Data Science Blog!
Additional References
• PivotalR
  – http://cran.r-project.org/web/packages/PivotalR/PivotalR.pdf
  – https://github.com/pivotalsoftware/PivotalR
  – Video Demo
• PL/R & General Pivotal+R Interoperability
  – http://pivotalsoftware.github.io/gp-r/
• MADlib
  – http://madlib.net/
  – http://doc.madlib.net/latest/
BUILT FOR THE SPEED OF BUSINESS