Be Heroic - Xomnia · Challenge: Quickly prototype analytics processes for Under Armour wearable...
Be Heroic
Analytics. For Anyone.
Turn Data into Action
2
Analytics on big data is no longer just a
competitive advantage.
It’s a Business Requirement.
Progressive businesses must accelerate time-to-value not only to thrive, but survive.
3
Unlike traditional analytics providers, RapidMiner enables anyone to make the
most of all data in all environments, creating a powerful advantage from the
wisdom of over 250,000 users.
RapidMiner is the industry's easiest-to-use
Modern Analytics Platform that
significantly accelerates productivity – from data prep to predictive action.
Built by data scientists for data scientists, business analysts, and developers.
4
The Analytics Spectrum
SQL Analytics: Count, Mean
Descriptive Statistics: Univariate distribution, Central tendency, Dispersion
Data Mining: Association rules, Clustering, Feature extraction
Predictive Analytics: Classification, Regression, Time series, Text, Spatial, Machine learning
Simulation: Monte Carlo, Agent based modeling, Discrete event modeling
Optimization: Linear optimization, Non-linear optimization
Business Intelligence Advanced Analytics
5
Modern
Key Trends & Drivers in Modern Analytics Market
Forces: Internet of Things, Consumerization, Mass Personalization
Business: Accelerate time-to-value, Maximize business value, Simplify getting to value
Technology: Big Data, New compute engines, Cloud
6
Evolving Advanced Analytics Market
Traditional: Limitations
• Limited handling of variety of data sources
• Legacy compute engines
• On-premises, if not offline
Modern: Limitless
• Big Data
• New compute engines
• Cloud
7
Advanced Analytics Market Maturity
Traditional: Lagging innovation
Modern: High-velocity innovation
8
Traditional vs. Modern Analytics Market
Magic Quadrant for Advanced Analytics Platforms, February 2014
[Figure: quadrant chart. Axes: Completeness of vision, Ability to execute. Quadrants: Challengers, Leaders, Niche players, Visionaries. Vendors plotted: Revolution Analytics, Alteryx, RapidMiner, IBM, SAS, StatSoft, SAP, Angoss, Knime, Actuate, InfoCentricity, Microsoft, Alpine Data Labs, Megaputer, Oracle, FICO.]
9
Skills Gap in the Modern Analytics Market
Data Science Skills Gap in 2018 (McKinsey 2012)
[Diagram: Math + Domain Expertise + Computer Science. Roles shown: Data Scientist, Statistician, Actuarial, Quant.]
Searching for Unicorns
McKinsey projects by 2018 there will be a shortage of 1.7M professionals with analytics expertise in the U.S. alone.
10
Unlocking Value with Modern Analytics
[Diagram: Math + Domain Expertise + Computer Science. Roles shown: Business Analysts; Next Generation Data Scientist (aka: the hero you are looking for); Data Scientist; Statistician; Actuarial; Quant.]
11
Enter RapidMiner. Analytics. For Anyone.
Accelerate: Pre-Built Models, One-Click Deployments
Connect: All Data, All Environments
Simplify: Code Free, Wisdom of Crowds
12
Wisdom of Crowds
How do we create data science heroes?
1. Anonymously collect analytic models from analysts across the enterprise
2. Store them in a knowledge base of analytic best practices
3. Use machine learning algorithms to recommend and empower any user at any skill level to become a data science hero
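The collect, store and recommend steps above can be illustrated with a minimal sketch. The slides do not describe RapidMiner's actual recommendation algorithm, so this assumes the simplest possible approach: counting which operator most often follows another in previously collected processes. The example processes and the helper names `build_knowledge_base` and `recommend_next` are hypothetical.

```python
from collections import Counter, defaultdict

def build_knowledge_base(processes):
    """Count, for each operator, which operators followed it in collected processes."""
    successors = defaultdict(Counter)
    for operators in processes:
        for current, following in zip(operators, operators[1:]):
            successors[current][following] += 1
    return successors

def recommend_next(knowledge_base, operator, k=1):
    """Return the k operators most often seen right after `operator`."""
    counts = knowledge_base.get(operator, Counter())
    return [name for name, _ in counts.most_common(k)]

# Hypothetical, anonymized processes: each is an ordered list of operator names.
collected_processes = [
    ["Read CSV", "Replace Missing Values", "Decision Tree", "Apply Model"],
    ["Read CSV", "Replace Missing Values", "Naive Bayes", "Apply Model"],
    ["Read CSV", "Normalize", "Decision Tree", "Apply Model"],
]

knowledge_base = build_knowledge_base(collected_processes)
suggestion = recommend_next(knowledge_base, "Read CSV")
print(suggestion)
```

A production recommender would likely use richer context than the immediately preceding operator; the point here is only that recommendations can be derived from stored usage patterns.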
13
RapidMiner Modern Analytics Platform
RapidMiner Studio: Code free design of your analytics using 1500+ operators
RapidMiner Radoop: Push down computations to where your data lives
RapidMiner Streams: Analyze streaming data while in motion
RapidMiner Cloud: Elastic compute environment for high performance analytics
RapidMiner Server: Enterprise analytics environment for integration with business processes
[Diagram: Design, Orchestrate, Compute, Consume. Components: Studio (code free GUI, engine), Server (engine, Web Services API), Radoop (engine), Streams (engine), Cloud (engine). Compute modes: In-Memory, In-Database, In-Hadoop, In-Stream. Users: Business Analysts, Data Scientists, Business Users. Consumers: Machine, Web App, Custom App, Biz App, Viz, BI.]
14
RapidMiner Radoop Architecture
• Code free design in RapidMiner with 70+ operators
• One-click push down to Hadoop environment
• Optimized distributed execution in Hadoop environment
[Diagram: Visual development (Radoop, Studio, Server): Data Integration, Data Discovery, Data Prep, Model Building, Model Validation, Model Scoring. Programming code (Hadoop environment): HDFS, YARN, MapReduce, Hive (SQL), Impala (in-memory SQL), Pig (scripting), Mahout (machine learning), Spark (MLlib).]
15
RapidMiner Streams Architecture
• Code free design in RapidMiner leveraging 1500+ operators
• One-click push down to Storm environment
• Distributed execution in Storm environment
[Diagram: Visual development (Streams, Studio, Server): Data Integration, Data Prep, Model Scoring. Components: Apache Storm cluster (nodes, engine), Storm topology (spout, bolts), message broker (Apache Kafka or Amazon SQS), applications, Cassandra, MongoDB, SOLR, Redis. Flow labels: push, pull, store, deploy process as topology, monitor and manage.]
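The spout and bolt structure of a Storm topology can be sketched in plain Python. This is only an analogy built from generators, not the Apache Storm API and not how RapidMiner Streams is implemented; the function names, the record fields and the fixed threshold standing in for a trained model are all assumptions made for illustration.

```python
def spout(messages):
    """Source of the topology: emit one record per message pulled from a broker."""
    for message in messages:
        yield message

def prep_bolt(stream):
    """Data prep step: parse the raw string reading into a number."""
    for record in stream:
        yield {"device": record["device"], "value": float(record["value"])}

def scoring_bolt(stream, threshold=100.0):
    """Scoring step: a fixed threshold stands in for a trained model."""
    for record in stream:
        yield {**record, "alert": record["value"] > threshold}

# Hypothetical messages, as they might arrive from a message broker.
broker_messages = [
    {"device": "sensor-1", "value": "72.5"},
    {"device": "sensor-2", "value": "130"},
]

# Chain spout -> bolt -> bolt; records flow through one at a time.
results = list(scoring_bolt(prep_bolt(spout(broker_messages))))
for result in results:
    print(result)
```

In a real Storm deployment each stage would run in parallel across cluster nodes; the generator chain only shows how records pass from a source through successive processing steps.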
16
RapidMiner Server Architecture
Web Services API
• Integrate analytic models into any type of application
• RESTful API
App Designer
• Build web based predictive reports
• Create ad hoc reports
• Build predictive apps without coding
• Embed app into business process
Shared Repository
• Collaborate on analytic processes
• Share data, processes and models
User Management
• Create and manage users, roles and access rights
Scheduler
• Execute analytics at certain times or automate repeating execution patterns
Engine
• High-performance compute engine for distributed and remote work
[Diagram: Server (Web Services API, App Designer, Shared Repository, User Management, Scheduler, Engine) serving Viz, BI, Custom App, Biz App, Machine, Web App.]
17
RapidMiner Modern Analytics Flow
Data Sources
• Work with any data, from any source
Compute Engines
• Work in any environment, at any time
Model Building & Scoring
• Data Integration, Discovery & Preparation
• Model Building, Validation & Scoring
Model Deployment
• Deploy models any way you want
Model Consumption
• Embed your insights and take action
18
One Platform To Rule Them All
Model Building and Scoring (1500+ operators, 200+ community contributed operators)
Data Integration
• 50+ data connectors with access to 100+ sources including 40+ file types
• Any data type: Structured, Semi-structured, Unstructured, Binary
• 700+ data parsing, data blending, data cleansing, transformations, aggregations, set operations, rotations, filtering, outlier detection, value type transformations, feature creation, window functions, feature extraction
• 20+ process control structures
Data Discovery
• 25+ interactive data visualizations including: Data tables, Scatter matrices, Bubble charts, Parallel coordinates, Deviation plots, 3-D scatter plots, Density plots, Histograms, Survey plots, Andrews curve, Quartile, Pareto charts, Network & tree visualizations
• 30+ image format exports
Data Preparation
• 45+ feature selection, automatic & manual
• 20+ missing value replacement & imputation
• 96+ feature creation, automatic & manual
• 20+ anomaly & outlier detection
• 60+ dimension reduction / feature selection
• 20+ segmentation & clustering
• 80+ processing & feature extraction from unstructured data
Model Building
• 25+ statistical
• 250+ machine learning
• 200+ association mining, frequent item set, similarity computation, feature weighting
• 10+ ensemble and hierarchical models
• 10+ model and parameter optimization
• Automatic model fitting
• Integration of 3rd party analytics, optimization solvers or simulation tools
Model Validation
• 10+ cross validation
• 20+ visual evaluation
• 30+ numerical / nominal / categorical model performance criteria
• 10+ significance tests
• 5+ optimal threshold cutoff for binomial classes
• 5+ cluster performance measures
Model Scoring
• Model scoring for all applicable model building
Add’l Analytics
• 50+ text analytics
• 15+ web mining
• 30+ image / audio / video mining
• 85+ time series
• 30+ financial & economics
19
Data Sources (access to 450+ data sources)
Flat Files
Text Files
Databases via JDBC or ODBC
Database & Cube Queries
Hadoop Sources
NoSQL database
Cloud Data Sources
Web Services
Mail Services
Work with Any Data, from Any Source
MDX
Web pages Web services
POP3 IMAP
Logos & icons represent a partial list. Full list available upon request.
20
Compute Engines (in-memory, in-SQL, in-database, in-cluster, in-Hadoop, in-Cloud, in-stream)
In-memory
In-SQL & In-database
In-cluster
In-Hadoop
In-Cloud
In-stream
Work in Any Environment, at Any Time
21
Deploy Models Any Way You Want
Model Deployment
Model Scheduling
• Scheduled model execution
Model Publishing
Publishing model results via web services API into:
• Web services
• Web application
• Business application (ERP, CRM, Marketing Automation, etc.)
• Machines
• Streaming application
• Rule engines
• Complex event processing
• Business intelligence
• Data visualization
• Cloud application
• Custom application
Model Embedding
Embedding of model via Java API into:
• Any application
• Callable from any application
Model Export
• PMML export
22
Model Consumption
Business Intelligence
CRM
Marketing Automation
Cloud Applications (connecting to 300+ cloud services)
ERP
Custom: Custom web applications, web portals
Embed Your Insights and Take Immediate Action
23
Get to Meaningful Business Value in a Snap
Accelerate: Drop development time from days to minutes
Connect: Automate data integration
Simplify: Make data science accessible to all
24
Design Your Analytics. Coding Not Required.
Liberate your business analysts with a code free
environment
Leverage the wisdom of over 250,000 users
worldwide
Supercharge your results with 1500+ analytic operators
Boost your data science knowledge
with interactive help
25
Machine Learning on Hadoop
How do we become big data heroes?
Pushing data prep and machine learning into Hadoop clusters is complex and requires coding. Not a viable option!
Push computations into Hadoop clusters from a code free environment. Heroes use RapidMiner!
26
Use Case Example: Churn Prevention with Hadoop
Task: Separate loyal customers from customers who are likely to churn.
Solution with Hadoop + Mahout + (a lot of) custom coding
Disconnected individuals get bogged down in endless process, coding and queries.
In the meantime, your competition beats you to the punch.
1. Define a schema and create tables for customer data, past transactions, service usage log
files, and so on. Manually list columns, types, defining separator characters, etc.
2. Write HiveQL queries (or Pig scripts or other code) to aggregate transactions and service
logs for each customer and calculate attributes describing them
3. Implement and execute a custom MapReduce job to convert data to Mahout’s input format
4. Run the Mahout Naïve Bayes algorithm with proper parameters from the command line
5. Repeat each step for the customers you want to apply the model on
6. Implement and execute a custom MapReduce job to convert predictions back into a
delimited format
7. Export the result from HDFS
8. Import the result into an RDBMS
Timeline: DAY 1, DAY 3, DAY 12, DAY 18
TIME: 3 WEEKS
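To make the core of the manual workflow concrete, here is a single-machine sketch of steps 2, 4 and 5: aggregating transactions per customer, training a naive Bayes classifier, and applying it to new customers. It is an illustration under stated assumptions, not the Hadoop or Mahout code the slide refers to: the Gaussian naive Bayes is written from scratch and stands in for Mahout's implementation, and the customer IDs, amounts and function names are hypothetical.

```python
import math
from collections import defaultdict

def aggregate(transactions):
    """Roll raw (customer_id, amount) rows up into (count, total spend) per customer."""
    totals = defaultdict(lambda: [0, 0.0])
    for customer_id, amount in transactions:
        totals[customer_id][0] += 1
        totals[customer_id][1] += amount
    return {cid: (count, spend) for cid, (count, spend) in totals.items()}

def train_naive_bayes(features, labels):
    """Fit a class prior plus a per-attribute mean and variance for each class."""
    rows_by_class = defaultdict(list)
    for customer_id, label in labels.items():
        rows_by_class[label].append(features[customer_id])
    model = {}
    for label, rows in rows_by_class.items():
        stats = []
        for column in zip(*rows):
            mean = sum(column) / len(column)
            # Small constant keeps the variance strictly positive.
            variance = sum((x - mean) ** 2 for x in column) / len(column) + 1e-6
            stats.append((mean, variance))
        model[label] = (len(rows) / len(labels), stats)
    return model

def predict(model, row):
    """Return the class with the highest log posterior for one attribute row."""
    best_label, best_score = None, -math.inf
    for label, (prior, stats) in model.items():
        score = math.log(prior)
        for x, (mean, variance) in zip(row, stats):
            score += -0.5 * math.log(2 * math.pi * variance)
            score -= (x - mean) ** 2 / (2 * variance)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical history: frequent, high-spend customers stayed; the others churned.
history = [("a", 40.0)] * 6 + [("b", 50.0)] * 5 + [("c", 20.0)] + [("d", 15.0)] * 2
labels = {"a": "loyal", "b": "loyal", "c": "churn", "d": "churn"}

features = aggregate(history)
model = train_naive_bayes(features, labels)

# Repeat the aggregation for the customers to be scored, then apply the model.
new_customers = aggregate([("x", 15.0)] + [("y", 43.0)] * 6)
predictions = {cid: predict(model, row) for cid, row in new_customers.items()}
print(predictions)
```

On a cluster, each of these functions corresponds to a separate job plus format conversions between them, which is where most of the effort in the eight-step list goes.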
27
Use Case Example: Churn Prevention with Hadoop
Task: Separate loyal customers from customers who are likely to churn.
Solution with RapidMiner
Your team designs the process in collaboration with each other just like they would on a white board. And then you press “play”. That’s it.
1. Combine data from Hadoop and any traditional source
2. Train model in distributed Hadoop cluster
3. Apply model in RapidMiner and integrate seamlessly
TIME: 10 MINUTES
28
RapidMiner Radoop consistently delivers performance increases of up to 4,000% compared to pure scripting approaches*
*RapidMiner results compared against traditional Hadoop approaches including data integration, data prep, modeling, deployment and maintenance.
29
RapidMiner Fills In The Skills Gap
[Diagram: Math + Domain Expertise + Computer Science. Roles shown: Business Analysts; Next Generation Data Scientists (this is the realm of heroes); Data Scientist; Statistician; Actuarial; Quant.]
30
Companies Around The World Use RapidMiner
Oil & Gas, Chemicals
Retail
Software & Analytics
Consumer Products
Business Services
Pharma & Healthcare
Manufacturing
Aerospace
Technology
Entertainment
Consulting
Government & Defense
Academia
Financial Services
31
Signature Customers
32
Data Science Hero Spotlight
Process Customer Feedback In Multiple Languages To Increase Retention Rates
“Business executives, who hold the power to allocate text analytics resources, are beginning to see and realize the benefits to help better focus and solve business problems.”
-- Han-Sheong Lai, Director of Operational Excellence & Customer Advocacy
Accelerate: Process massive amounts of text at high speed
Connect: Analyze multiple silos of global customer data
Simplify: Automatically determine intent-to-churn
Challenge: Applying basic voice-of-the-customer-concepts and text analytics to customer feedback in over 60 countries worldwide.
Solution: Use RapidMiner’s Platform to detect churn and identify customer service issues regardless of time, location or language.
150,000 customer comments and tweets in almost every language processed on RapidMiner
33
Data Science Hero Spotlight
Quickly Prototype Analytics Models for Under Armour Challenge Wearables Data
“RapidMiner is extremely powerful, has the best operators, and can handle Big Data from wearables. It also allows us to rapidly prototype sophisticated analytics, machine learning and classification applications, saving time and money.”
-- Kevin Logan, CEO
Accelerate: Prototype multiple analytics processes quickly and easily
Connect: Analyze Big Data from wearable devices
Simplify: Use code free, drag and drop GUI for analytics
Challenge: Quickly prototype analytics processes for Under Armour wearable data, for the Under Armour39 Challenge.
Solution: Use RapidMiner’s code free, drag and drop GUI to quickly design 11 analytics processes, iterate them for optimization, and win the challenge.
1.8M data points analyzed, per hour, by the Under Armour39 wearable
34
Data Science Hero Spotlight
Track Data from Millions of Companies to Identify Critical Economic Drivers
“We benefit from the public availability of extensions and the RapidMiner Marketplace. We can easily search for what others have designed in RapidMiner, and use the extensions that are a fit for us.”
-- Tom Gatten, CEO
Accelerate: Prototype analytics and visualizations quickly
Connect: Analyze data from the digital footprint of UK businesses
Simplify: RapidMiner Marketplace public extensions
Challenge: Monitor corporate performance data in real time, and identify correlations, outliers, and economic drivers.
Solution: Use RapidMiner’s algorithms for rapid prototyping and visualizations for correlations, and to identify outlying, unusual data.
4.5M subject matter experts’ content analyzed in the United Kingdom
35
Data Science Hero Spotlight
Search Millions of Patents Online and Automatically Mine Image Data
“Some years ago (the patent team) had tried a dedicated patent classification tool that didn’t work - RapidMiner does. It provides a framework for substantially reducing the time it takes us to find interesting patents.”
-- Thomas Hartmann, Business Engineer
Accelerate: Automatically mine millions of online patent images
Connect: Search through a wide variety of online data sources
Simplify: No programming required to connect insights to action
Challenge: Search millions of patents online and automatically mine image data for applicable information.
Solution: Use RapidMiner text and image mining to quickly and easily identify several thousand images of interest.
1M+ detailed patent records mined online, including images
36
Data Science Hero Spotlight: Television Broadcasters Project
Drive Broadcast Revenues and Customer Retention with Streaming, Real-Time Analytics
“RapidMiner allows us to leverage Big Data, in real-time, for the TV industry.”
-- Avi Bernstein, Professor at the University of Zurich, Department of Informatics
Accelerate: Personalized recommendations in less than five seconds
Connect: Stream and analyze from set-top boxes, mobile devices and PCs
Simplify: Code free design of streaming analytics
Challenge: Better understand TV viewing habits to prevent churn and optimize advertising.
Solution: Process streaming Big Data from three million TV viewers, in real-time, to make program content recommendations and target advertising.
<5s time to generate high value activities based on predictive analytics
37
Don’t Take It From Us
“RapidMiner was most frequently selected based on ease of use, license cost, and speed of model development/ability to build large numbers of models. A number of templates guide users on the most common set of predictive use cases. Customer references cite high levels of satisfaction with the data access, data filtering and manipulation, predictive analytics and further advanced analytics components of the product.”
-- Gareth Herschel, Research Director
“Radoop also makes an eponymous product, focused on Hadoop analytics functionality, that is also visually-oriented and is 'powered by' RapidMiner itself, making the union quite logical.”
-- Andrew Brust, Research Director
“RapidMiner is an excellent data mining and statistics platform with a large following. With version 6 the product and company became much more commercial, and the recent acquisition of Radoop puts it in the big data league.”
-- Martin Butler, Research Director
38
"Customer references cite high levels of satisfaction with the data access, data filtering and manipulation, predictive analytics and further advanced analytics components of the product.”
Recognized Leader in Advanced Analytics
As of February 2014. Gartner Magic Quadrant for Advanced Analytics Platforms (Feb. ’14). www.rapidminer.com/gartner2014
[Figure: quadrant chart. Axes: Completeness of vision, Ability to execute. Quadrants: Challengers, Leaders, Niche players, Visionaries. Vendors plotted: Revolution Analytics, Alteryx, RapidMiner, IBM, SAS, StatSoft, SAP, Angoss, Knime, Actuate, InfoCentricity, Microsoft, Alpine Data Labs, Megaputer, Oracle, FICO.]
39
Industries
Manufacturing Government
Retail/CPG Automotive
Financial Life Science
Utilities/Energy Telecom
Our History
RapidMiner was born from a data science project at the University of Dortmund, Germany, by Ingo Mierswa, Ralf Klinkenberg and Simon Fischer. Initially known as YALE in 2001, the product led to Rapid-I, a company founded by Ingo and Ralf in 2007. Later, the company was renamed to RapidMiner and in 2012, global HQ were established in Cambridge, Massachusetts, USA.
Customers: 600+ worldwide
Corporate Locations: North America | EMEA
Investors
• Earlybird Venture Capital
• Open Ocean Capital
Our Milestones
• 2007: Open Source (5,000 global users)
• 2010: Open Core (30,000 global users)
• 2013: Business Source (150,000 global users)
• 2014: Big Data & Cloud (250,000 global users)
40
www.rapidminer.com
Activating the data science hero in every business analyst!