Applications in R - Success and Lessons Learned from the Marketplace
Applications in R Success and Lessons Learned from the Marketplace
David Smith, Chief Community Officer, Revolution Analytics
Neera Talbert, VP Professional Services, Revolution Analytics
July 29, 2014
Agenda
Introduction to R
Growth of R
Applications of R
Q&A
David Smith
Chief Community Officer
@revodavid
Editor, blog.revolutionanalytics.com
Co-author, “Introduction to R”
OUR COMPANY
The leading provider
of advanced analytics
software and services
based on open source R,
since 2007
OUR SOFTWARE
The only Big Data, Big
Analytics software platform
based on the data science
language R
SOME KUDOS
Visionary
Gartner Magic Quadrant
for Advanced Analytics
Platforms, 2014
What is R?
Most widely used data analysis software
• Used by 2M+ data scientists, statisticians and analysts
Most powerful statistical programming language
• Flexible, extensible and comprehensive for productivity
Create beautiful and unique data visualizations
• As seen in New York Times, Twitter and Flowing Data
Thriving open-source community
• Leading edge of analytics research
Fills the talent gap
• New graduates prefer R
www.revolutionanalytics.com/what-r
Poll #1
What data analysis software is used where you work?
R’s popularity is growing rapidly
R Usage Growth: Rexer Data Miner Survey, 2007-2013
Language Popularity: IEEE Spectrum Top Programming Languages, July 2014 (#9: R)
R is among the highest-paid IT skills in the US
• Dice Tech Salary Survey, January 2014
• O’Reilly Strata 2013 Data Science Salary Survey
Technical Support for Open Source R AdviseR™ from Revolution Analytics
Technical support for open source R, from the R experts.
Email and phone support 8AM-6PM, Mon-Fri
Support for R, validated packages, and third-party software
connections
On-line case management and knowledgebase
Access to technical resources, documentation and user forums
Exclusive on-line webinars from community experts
Guaranteed response times
Also available: expert hands-on and on-line training for R, from
Revolution Analytics AcademyR.
www.revolutionanalytics.com/AdviseR
R SUPPORT 12 MONTHS
$795 PER USER
Applications of R
• Exploratory Data
Analysis
• Experimental Analysis
“Generally, we use R to move
fast when we get a new data
set. With R, we don’t need to
develop custom tools or write
a bunch of code. Instead, we
can just go about cleaning
and exploring the data.” —
Solomon Messing, data
scientist at Facebook
• Big-Data Visualization
“It resonated with
many people. It's not
just a pretty picture,
it's a reaffirmation of
the impact we have
in connecting
people, even across
oceans and
borders.” — Paul
Butler, data
scientist, Facebook
“The great beauty of R is that you can modify it to do all sorts of things.”
- Hal Varian, Chief Economist, Google
• Advertising Effectiveness
“R is really important to the point that it's hard to overvalue it.”
- Daryl Pregibon, Head of Statistics, Google
• Economic forecasting
• Data Visualization • Semantic clustering
“A common pattern for me is that I'll code a MapReduce
job in Scala, do some simple command-line munging on
the results, pass the data into Python or R for further
analysis, pull from a database to grab some extra fields,
and so on, often integrating what I find into some
machine learning models in the end” — Ed Chen, Data
Scientist, Twitter
City of Chicago
Public Health
• Food poisoning monitor
The New York Times
Interactive Features
• Election Forecast
• Dialect Quiz
Data Journalism
• NFL Draft Picks
• Wealth distribution in USA
The New York Times
Data Visualization
• Facebook IPO
• Baseball legends
Video Gaming
• Multiplayer Matchmaking
• Player Churn
• Game design
• Difficulty curve
• Level trouble-spots
• In-game purchase optimization
• Fraud detection
• Player communities
• Game Analysis
Video Games
Housing
• Crime mapping
• choroplethr package
“The core innovation that Zillow
offers are its advanced
statistical predictive products,
including the Zestimate®, the
Rent Zestimate and the ZHVI®
family of real estate indexes. By
using R in production as well as
research, Zillow maximizes
flexibility and minimizes the
latency in rolling out updates
and new products.”
• Statistical forecasting
Real Estate
Finance and Banking
• Credit Risk Analysis • Financial Networks
Pharmaceuticals
“R use at the FDA is completely
acceptable and has not caused
any problems.” — Dr Jae
Brodsky, Office of
Biostatistics, Food and Drug
Administration
Regulatory Drug Approvals
• Reproducible research
• Accurate, reliable and consistent statistical analysis
• Internal reporting (Section 508 compliance)
Power
“We’ve combined Revolution R
Enterprise and Hadoop to build and
deploy customized exploratory data
analysis and GAM survival models for
our marketing performance
management and attribution platform.
Given that our data sets are already in
the terabytes and are growing rapidly,
we depend on Revolution R Enterprise’s
scalability and power – we saw about
a 4x performance improvement on 50
million records. It works brilliantly.”
- CEO, John Wallace, DataSong
4X performance 50M records scored daily
Scalability
“We’ve been able to scale our solution to a
problem that’s so big that most companies could
not address it. If we had to go with a different
solution we wouldn’t be as efficient as we are
now.”
- SVP Analytics, Kevin Lyons, eXelate
Terabytes of data from 200+ data sources
Tens of thousands of attributes
Hundreds of millions of scores daily
2X data, 2X attributes, no impact on performance
Performance
“We need a high-performance
analytics infrastructure because
marketing optimization is a lot like
financial trading. By watching the
market constantly for data or market
condition updates, we can now
identify opportunities for our clients
that would otherwise be lost.”
- Chief Analytics Officer, Leon Zemel,
[x+1]
Marketing Analytics
All of Open Source R plus:
Big Data scalability
High-performance analytics
Development and deployment
tools
Data source connectivity
Application integration framework
Multi-platform architecture
Technical Support
Available training and services
Revolution R Enterprise is the Big Data Big Analytics Platform
Poll #2
What kinds of R projects are underway where you work?
Neera Talbert, VP Big Data & Advanced Analytic Services
Leads Services at Revolution Analytics
Fifteen years of experience in the business analytics software industry
Works with Fortune 500 companies to define analytics strategy, implement analytics-based
decision making, reduce decision latency, and increase speed of decision making
– Analytics, business intelligence, big data analytics, risk
– Customer intelligence, supply chain, manufacturing, retail, oil & gas, public sector.
Organizational Readiness
“There will be almost half a million jobs in five years, and a
shortage of up to 190,000 qualified data scientists, plus a
need for 1.5 million executives and support staff who have
an understanding of data”
McKinsey Global Institute
April 2013
Opportunity to develop talent
Data Scientist: “the sexiest job of the 21st
century” - Harvard Business Review
A cross between computer engineers,
statisticians and business analysts – people who
ask good questions and are open to working with
unstructured information
Universities can’t produce them fast enough –
need 60% more resources – McKinsey Global
Institute
Our Philosophy
“The Hands-on exercises were the best part of Revolution Analytics
training”
- A participant from a global telecom company
Course Catalog
www.revolutionanalytics.com/AcademyR
RRE Certification Testing
Demonstrate your R and RRE programming knowledge
– Fundamentals in R Language
– Data Management in Revolution R Enterprise
– Modeling in Revolution R Enterprise
Independently proctored exam – online and onsite
Training Data Science team for Big Data Analytics
“Given that our data sets are already in the
terabytes and are growing rapidly, we depend
on Revolution R Enterprise’s scalability
and power. We saw about a 4x
performance improvement on 50 million
records. It works brilliantly.”
CEO, John Wallace
(DataSong formerly named UpStream)
4X performance 50M+ records scored daily
Key Technology: Revolution R Enterprise and Hadoop, replacing SAS and Open Source R
Outcomes: Massively scalable infrastructure to support attribution and optimization at an individual customer level (segments of one) for clients such as Williams-Sonoma. Client saved $250K in one campaign.
Rapid development and deployment of customer-specific models, using innovative analytic techniques such as big data GAM Survival models
Bottom Line: Driving revenue lift and cost savings through marketing optimization
Profile: Multi-channel marketing attribution
and analytics software developer and service
provider. Growing, innovative, cost-conscious.
Model Development for Supply Chain Analytics with Hadoop
Profile: The Application Development team worked with
Revolution Analytics Consultants to build a cloud-based supply
chain analytics platform
Key Technology and Services: R for Big Data Analytics, Consulting, Training
Analytic Approach: Aggregate data from 15 data
sources, including ERP data, store sales data and
sales forecast data, for 25,000 store locations and 50 SKUs.
Run the data nightly through 6 forecast models and the
order-planning models, with back tests and validation. Worked
with client to establish big data environment and
models that will generate 6.5 billion computations
daily by end of the year (in a 4-hour window for
processing). Scale and performance will allow new
capabilities such as seasonality, promotions and
incentives.
>Sales and Demand Data Analysis
>R/RRE Model Development
Bottom line: Work with client to develop predictive models, starting with rigorous forecasts across various models, generating forecast statistics and scoring each model against historical data to come up with the best fit. The forecast is input into an order-planning model that generates recommendations to optimize product distribution and ensure in-stock rate targets are achieved so that the right amount of product is in the right location at the right time.
“The amount of analytic horsepower required for this application cannot be supported by traditional means; it would require millions of dollars of hardware. R + Hadoop is allowing us to have the compute capacity to run 6.5 billion computations on a nightly basis to generate order plans for our clients.” VP Application Development
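The model-scoring step described on this slide (forecast with several models, score each against historical data, keep the best fit) can be sketched as a small backtest. This is a minimal illustration in Python rather than R. The three toy models, the invented sales series and the use of mean absolute error as the score are all assumptions made for the example, not the client's actual models or data.

```python
# Hypothetical sales history with a repeating 4-period pattern (invented data).
sales = [100, 120, 90, 110, 104, 124, 93, 115, 108, 129, 97, 119]

def naive(history):
    """Forecast the next value as the last observed value."""
    return history[-1]

def moving_average(history, window=3):
    """Forecast the next value as the mean of the last `window` values."""
    return sum(history[-window:]) / window

def seasonal_naive(history, season=4):
    """Forecast the next value as the value one season ago."""
    return history[-season]

def backtest(model, series, start):
    """Mean absolute error of one-step-ahead forecasts from `start` onward.

    At each step the model only sees data before the point it forecasts.
    """
    errors = [abs(series[t] - model(series[:t])) for t in range(start, len(series))]
    return sum(errors) / len(errors)

models = {
    "naive": naive,
    "moving_average": moving_average,
    "seasonal_naive": seasonal_naive,
}

# Score every candidate against history and keep the lowest-error model.
scores = {name: backtest(model, sales, start=4) for name, model in models.items()}
best_model = min(scores, key=scores.get)
print(scores, best_model)
```

In a production setting the winning model's forecast would then feed the order-planning step; here the selection logic is the only part shown.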
Confidential – Do Not Distribute
Model Development for Vehicle Data Analysis
Profile: The Analytics R&D team of the multinational
automobile manufacturer worked with Revolution Analytics
Consultants to perform Survival Analysis, and to build and
deploy Decision Trees and Time Series models
Key Technology and Services: Revolution R Enterprise for Big Data Analytics, Consulting, Training
Analytic Approach – Warranty Data Analysis: Estimating the life of an automobile component using Survival Analysis with Cox proportional hazards. Models are trained using historical data consisting of warranty claims and region- and weather-related variables such as snow, rain and temperature.
Outcome: New analytics paradigm for existing processes introduced, with potential for millions of dollars in cost savings through improved warranty contracts, and re-designed automobile components.
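To make the warranty approach concrete, here is a minimal sketch of a Cox proportional hazards fit with a single covariate, written in plain Python rather than R. Everything in it is an assumption for illustration: the ten invented component records, the binary "snowy region" covariate, and the hand-rolled Newton-Raphson fit of the partial likelihood, which ignores tied event times. A real analysis would use an established survival library and many covariates.

```python
import math

def fit_cox_single_covariate(times, events, x, max_iter=50, tol=1e-9):
    """Estimate the Cox coefficient for one covariate.

    Maximizes the partial likelihood by Newton-Raphson. Assumes no tied
    event times. `events` is 1 for an observed claim, 0 for censored.
    """
    beta = 0.0
    for _ in range(max_iter):
        gradient = 0.0
        information = 0.0
        for t_i, d_i, x_i in zip(times, events, x):
            if not d_i:
                continue  # censored records contribute only through risk sets
            s0 = s1 = s2 = 0.0
            for t_j, x_j in zip(times, x):
                if t_j >= t_i:  # component j is still at risk at time t_i
                    w = math.exp(beta * x_j)
                    s0 += w
                    s1 += w * x_j
                    s2 += w * x_j * x_j
            mean_x = s1 / s0
            gradient += x_i - mean_x
            information += s2 / s0 - mean_x ** 2
        if information <= 0.0:
            break
        step = gradient / information
        beta += step
        if abs(step) < tol:
            break
    return beta

# Invented data: months until warranty claim (or censoring) per component.
months = [6, 9, 13, 15, 21, 24, 30, 33, 36, 40]
claimed = [1, 1, 1, 0, 1, 1, 0, 1, 0, 0]
snowy_region = [1, 1, 0, 1, 1, 0, 0, 1, 0, 0]

beta = fit_cox_single_covariate(months, claimed, snowy_region)
hazard_ratio = math.exp(beta)  # relative claim hazard, snowy vs. non-snowy
print(beta, hazard_ratio)
```

A hazard ratio above 1 in this toy data would suggest components in snowy regions fail sooner, which is the kind of signal that could inform warranty terms or component redesign.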
>Warranty & Sensor Data Analysis
>R/Revolution R Enterprise Training
Analytic Approach – Sensor Data Analysis: Use sensor data from vehicle components to build Decision Trees for classification, and to establish range of predicted values for sensor readings so that actual readings can be analyzed for outliers.
Bottom line: New analytics initiative for building an intelligent automobile system that’s capable of guiding the driver upon detection of anomalies in driving patterns.
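The idea of establishing an expected range for sensor readings and checking actual readings against it can be sketched as follows. This is a simplified Python illustration, not the Decision Tree models named on the slide: it assumes a plain mean plus or minus three standard deviations band, and the temperature-like readings and function names are invented for the example.

```python
import statistics

def expected_range(history, k=3.0):
    """Return (low, high) as mean +/- k sample standard deviations."""
    mean = statistics.mean(history)
    sd = statistics.stdev(history)
    return mean - k * sd, mean + k * sd

def flag_outliers(readings, low, high):
    """Return (index, value) pairs for readings outside [low, high]."""
    return [(i, r) for i, r in enumerate(readings) if r < low or r > high]

# Invented historical readings from a hypothetical component sensor.
history = [90.1, 89.8, 90.5, 91.0, 89.6, 90.3, 90.0, 90.7, 89.9, 90.2]
low, high = expected_range(history)

# New readings to screen against the expected range.
new_readings = [90.4, 89.7, 97.5, 90.1, 84.0]
outliers = flag_outliers(new_readings, low, high)
print((low, high), outliers)
```

In the approach described above, the range would come from model predictions conditioned on operating conditions rather than from a single global band.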
“The consultants and training instructors from Revolution Analytics were very knowledgeable and supported me very well. I am looking forward to taking my learnings to the larger analytics team at my company.” Senior Researcher, Analytics R&D
R Package Validation
Profile: The Clinical Trials Analytics team at the
multinational biopharmaceutical company moved from
SAS to R to develop big data analytics for Clinical Trials
Key Technology and Services: RUnit testing framework, Revolution R Enterprise (RRE) and open source R
Approach: Validate third party (user-contributed) R packages from CRAN by executing unit and regression tests for functions both in the stated base package and its dependent packages.
Outcome: Client moving from SAS to RRE for new analytics initiatives for improved performance and cost savings, and requires validation for user contributed packages for reliability and compliance.
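The validation approach (unit and regression tests run against package functions) can be illustrated with a small test suite. The project on this slide used the RUnit framework in R; the sketch below uses Python's built-in unittest module instead, purely to show the pattern. The function under test (statistics.mean) and the reference values are stand-ins chosen for the example and are assumed to have been recorded from a previously validated release.

```python
import statistics
import unittest

# Hypothetical reference cases: (input data, previously validated output).
REFERENCE_CASES = [
    ([1.0, 2.0, 3.0, 4.0], 2.5),
    ([10.0, 10.0, 10.0], 10.0),
    ([2.5, 3.5], 3.0),
]

class MeanRegressionTest(unittest.TestCase):
    def test_matches_reference_values(self):
        # Regression test: outputs must still match the recorded values.
        for data, expected in REFERENCE_CASES:
            with self.subTest(data=data):
                self.assertAlmostEqual(statistics.mean(data), expected, places=9)

    def test_rejects_empty_input(self):
        # Unit test: documented error behavior must be preserved.
        with self.assertRaises(statistics.StatisticsError):
            statistics.mean([])

def run_validation():
    """Run the suite and report whether every test passed."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(MeanRegressionTest)
    result = unittest.TextTestRunner(verbosity=0).run(suite)
    return result.wasSuccessful()

validated = run_validation()
print(validated)
```

Validating a package and its dependencies would repeat this pattern for each exported function, keeping the test results as evidence for compliance review.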
Challenge: The Clinical Trials Analytics team had “big data”
and “big computation” challenges, and needed a
centralized, scalable, and high-performance platform to
concurrently run the analytic models for faster analysis.
Bottom Line: Revolution R Enterprise acts as their
statistical analytics platform providing a centralized and
scalable platform for tens of data scientists and analysts.
User-contributed, Open Source R package
validation for Clinical Trial compliance to
support move from SAS to R & RRE
Model Optimization for Customer Analytics
Profile: The advanced analytics & IT Infrastructure teams at the
Las Vegas-based gaming corporation build and deploy
analytical models for internal customers such as Marketing &
Sales.
Key Technology and Services: Hadoop, Open Source R, Consulting and Training
Analytic Approach: Assess the end-to-end flow of the current Guest Scoring model, and re-write the existing rmr/R code using optimization techniques.
Outcome: 84% reduction in run time of the Guest Scoring model, which helps the gaming company target customers with a customized marketing campaign within minutes of a guest performing a new activity, such as checking into the hotel or buying tickets to a show.
Challenge: The IT Infrastructure team at the company was
challenged to support innovative, R-powered big data analytics
initiatives and needed to optimize their Analytics and Visualization
architecture.
Bottom line: Revolution Analytics consultants helped re-write R
analytics running inside Hadoop to achieve superior performance
and as a second project, designed a big data architecture
incorporating Cloudera, Teradata, Alteryx and Tableau
“Excellent work, Revolution!! We’re very glad that you came
on board to help us. Revolution Consultants get an A+.”
Technical Program Manager, Big Data Initiatives
> 84% improvement in performance &
reliability of Guest Scoring model
> Multi-layer big data infrastructure
architecture design
Revolution Analytics Services Overview
Training
• On-Site or Remote Classes
• Classroom or Self Paced
• Standard or Tailored
Project Services
• Analytics Strategy
• Analytics Architecture
• Full Life Cycle Projects
• Application Migration
• Proof of concept
• Staff Augmentation
• Package Certification
Quick Start Services
• Pre-production
• Jumpstart value
• Combines software, training, and services
Post Go-Live Support
• Technical Account Management
• On-going Training
Poll #3
What's the biggest R need at your company?
Why are so many companies using R?
Big Data
Data Science
Competition and Innovation
Open Source
Ecosystem
Q&A / Resources
What is R? revolutionanalytics.com/what-is-r
Companies using R revolutionanalytics.com/companies-using-r
AcademyR training revolutionanalytics.com/AcademyR
AcademyR Certification revolutionanalytics.com/AcademyR-certification
Contact Revolution Analytics revolutionanalytics.com/contact-us
Thank you
Join us August 7th at 10:00 AM, Pacific, for our
Moving from SAS to R webinar. Please visit
our website to register.
www.revolutionanalytics.com, 1.855.GET.REVO, Twitter: @RevolutionR