Agile Deployment of Predictive Analytics on Hadoop
© 2012 Datameer, Inc. All rights reserved. Page 1
Agile Deployment of Predictive Analytics on Hadoop
Faster Insights through Open Standards
Hadoop Summit 2012
Today’s Session
Ulrich Rueckert, Data Scientist, Datameer
Michael Zeller, CEO, Zementis
After this session, you will be able to…
1. Effectively deliver predictive solutions combining:
   a. R, KNIME & Others [Model Development]
   b. Zementis Universal PMML Plug-in [Model Deployment & Execution]
   c. Datameer [Scalable Hadoop Infrastructure]
2. Identify PMML as a vendor-neutral & open standard to:
   a. Incorporate predictive models from virtually any commercial vendor or open source tool
   b. Apply such models on Big Data
3. Leverage a lightweight, agile deployment process for predictive analytics to:
   a. Accelerate time-to-market
   b. Lower cost and complexity
   c. Reuse existing predictive assets
Who is Datameer?
§ “Business Intelligence on top of Hadoop”
§ Established 2009 by Hadoop and enterprise software veterans
§ Offices in Silicon Valley, New York and Germany
§ Some customers: [customer logos not reproduced in this transcript]
Who is Zementis?
§ Focus on “Operational Predictive Analytics”
§ Offices in San Diego and Hong Kong
§ Predictive Analytics Software Technology:
  • ADAPA® Decision Engine (Predictive Models and Rules)
  • ADAPA Add-in for Excel
  • PMML Converter
  • Universal PMML Plug-in (UPPI)
§ Global Partner Network
Big Data and Analytics
§ People and Sensor Data
  • Transaction records
  • Social media
  • Climate information
  • Mobile GPS signals
  • Healthcare
  • Smart Grid
§ Benefits from Analytics
  • Descriptive Analytics answers “What happened?”
  • Predictive Analytics answers “What will happen next?”
90% of today’s data was created in the last 2 years
Operational Predictive Analytics
[Chart: Score Distribution, 1st Lien Stand-Alone Loans. X-axis: Score (50 to 1000); Y-axis: % Within Class (0% to 14%); series: Goods, Bads, each with a polynomial trend line.]
[Chart: % of Delinquent Loans per Month. X-axis: Months (Jan to Nov); Y-axis: % of Delinquent Loans (0 to 90); legend: 700, 750, 800, 850, 900, 950.]
From Model Building to Deployment
[Diagram: Model Building tools export PMML (models) to the Datameer Server, where UPPI handles Model Deployment, Integration and Execution.]
Simple Deployment & Execution
1. Upload PMML file(s) in DAS
2. PMML turns into custom function
3. Seamlessly score data in Datameer
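The "PMML turns into custom function" step can be pictured with a small Python sketch. This is not how UPPI or Datameer works internally; it is a stand-in that assumes a hand-written PMML 4.1 linear regression document and handles only `NumericPredictor` terms. The model, the field names (`recency`, `orders`, `value`) and the helper `pmml_to_function` are hypothetical.

```python
import xml.etree.ElementTree as ET

# PMML 4.1 namespace as published by the Data Mining Group.
PMML_NS = {"p": "http://www.dmg.org/PMML-4_1"}

# Hypothetical model, written by hand for illustration only.
MODEL_XML = """<PMML version="4.1" xmlns="http://www.dmg.org/PMML-4_1">
  <Header/>
  <DataDictionary numberOfFields="3">
    <DataField name="recency" optype="continuous" dataType="double"/>
    <DataField name="orders" optype="continuous" dataType="double"/>
    <DataField name="value" optype="continuous" dataType="double"/>
  </DataDictionary>
  <RegressionModel functionName="regression">
    <MiningSchema>
      <MiningField name="recency"/>
      <MiningField name="orders"/>
      <MiningField name="value" usageType="predicted"/>
    </MiningSchema>
    <RegressionTable intercept="10.0">
      <NumericPredictor name="recency" coefficient="-0.5"/>
      <NumericPredictor name="orders" coefficient="2.0"/>
    </RegressionTable>
  </RegressionModel>
</PMML>"""


def pmml_to_function(pmml_text):
    """Turn a simple PMML linear regression into a Python scoring function."""
    root = ET.fromstring(pmml_text)
    table = root.find("p:RegressionModel/p:RegressionTable", PMML_NS)
    intercept = float(table.get("intercept"))
    coefficients = {
        predictor.get("name"): float(predictor.get("coefficient"))
        for predictor in table.findall("p:NumericPredictor", PMML_NS)
    }

    def score(record):
        # Linear regression: intercept plus the weighted sum of the inputs.
        return intercept + sum(c * record[name] for name, c in coefficients.items())

    return score


# "Upload" the model once, then score any number of records with it.
score = pmml_to_function(MODEL_XML)
rows = [{"recency": 4.0, "orders": 3.0}, {"recency": 20.0, "orders": 0.0}]
scores = [score(row) for row in rows]
```

The point of the sketch is the separation of concerns: the scoring side reads only the PMML file and needs no knowledge of the tool that produced it.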
PMML: Predictive Model Markup Language
• PMML is an XML-based language used to define statistical and data mining models and to share these between compliant applications.
• Mature standard developed by the DMG (Data Mining Group) to avoid proprietary issues and incompatibilities and to deploy models.
• Supported by all leading data mining tools, commercial and open-source.
• Allows for the clear separation of tasks: Model development vs. model deployment.
• Eliminates the need for custom code and proprietary model deployment solutions.
• Uniform deployment platform ensures scalability and reliability of model execution.
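Because PMML is plain XML, a consuming application can inspect a model without the tool that built it. The sketch below assumes a hand-written PMML 4.1 document; the model name, field names and the helper `describe` are hypothetical, and only Python's standard library is used.

```python
import xml.etree.ElementTree as ET

NS = {"p": "http://www.dmg.org/PMML-4_1"}

# Hypothetical PMML document, for illustration only.
PMML_DOC = """<PMML version="4.1" xmlns="http://www.dmg.org/PMML-4_1">
  <Header description="Illustrative response model"/>
  <DataDictionary numberOfFields="3">
    <DataField name="age" optype="continuous" dataType="double"/>
    <DataField name="income" optype="continuous" dataType="double"/>
    <DataField name="response" optype="continuous" dataType="double"/>
  </DataDictionary>
  <RegressionModel modelName="campaign_response" functionName="regression">
    <MiningSchema>
      <MiningField name="age"/>
      <MiningField name="income"/>
      <MiningField name="response" usageType="predicted"/>
    </MiningSchema>
    <RegressionTable intercept="0.5">
      <NumericPredictor name="age" coefficient="0.01"/>
      <NumericPredictor name="income" coefficient="0.00001"/>
    </RegressionTable>
  </RegressionModel>
</PMML>"""


def describe(pmml_text):
    """Summarize a PMML document: version, declared fields, and model element."""
    root = ET.fromstring(pmml_text)
    fields = [f.get("name") for f in root.findall("p:DataDictionary/p:DataField", NS)]
    # PMML model elements are named by type: RegressionModel, TreeModel, and so on.
    model = next(child for child in root if child.tag.endswith("Model"))
    return {
        "version": root.get("version"),
        "fields": fields,
        "model_type": model.tag.split("}")[1],
        "model_name": model.get("modelName"),
    }


summary = describe(PMML_DOC)
```

Here `summary` lists the fields declared in the data dictionary and reports the model element as `RegressionModel`, which is the kind of metadata a deployment platform needs before it can map input columns to model inputs.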
PMML book available on Amazon.com
PMML: Predictive Model Management
Integrating across all systems and processes
[Diagram: PMML connects applications (CRM, ERP, Excel, etc.), business processes, and cloud platforms (IBM SmartCloud, Amazon EC2).]
PMML: One Standard, One Process
[Diagram: PMML as the common format shared by service providers, divisions, external vendors, and applications.]
Demo Setup
§ End-to-end “Model Development Lifecycle”
§ PMML Standard as the “Glue”
[Diagram: lifecycle phases: Data Analysis; Model Design, Development and Test; Model Deployment (Universal PMML Plug-In). Goals: Understand Client’s Data; Build Model(s) to Unlock Hidden Value; Demonstrate Model Performance; Real-time Process Improvement and ROI.]
Demo: Annual Marketing Campaign
§ Which customers should we target?
§ Split 2011 results into training and test sets
§ Learn model on training set
§ Apply model on test set
§ Fine-tune model until evaluation shows success
§ Apply final model on 2012 customer list
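[Diagram: 2011 Campaign Results are split into a Subset for Training and a Subset for Testing. The Prediction Model passes through Model Evaluation to become the Fine-Tuned Prediction Model, which is applied to the 2012 Customer List to produce Campaign Candidates.]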
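The workflow on this slide can be sketched in a few lines of Python. The data here is synthetic and the "model" is a single spend threshold, chosen only to keep the example short; the demo itself used real modeling tools, and the field names (`spend`, `responded`) and helper functions are hypothetical. The fine-tuning loop is left out.

```python
import random


def split(records, train_fraction, seed):
    """Shuffle a copy of the records and cut it into training and test subsets."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def accuracy(threshold, records):
    """Fraction of records where 'spend >= threshold' matches the actual response."""
    hits = sum((r["spend"] >= threshold) == r["responded"] for r in records)
    return hits / len(records)


def learn_threshold(train):
    """Pick the spend threshold that classifies the training set best."""
    candidates = sorted({r["spend"] for r in train})
    return max(candidates, key=lambda t: accuracy(t, train))


# Synthetic stand-in for the 2011 campaign results (assumption: higher
# spenders responded more often).
rng = random.Random(42)
results_2011 = []
for _ in range(200):
    spend = rng.uniform(0, 1000)
    responded = rng.random() < (0.8 if spend > 500 else 0.2)
    results_2011.append({"spend": spend, "responded": responded})

# Split, learn on the training subset, evaluate on the held-out test subset.
train, test = split(results_2011, 0.7, seed=1)
threshold = learn_threshold(train)
test_accuracy = accuracy(threshold, test)

# Apply the final model to the (synthetic) 2012 customer list.
customers_2012 = [{"id": i, "spend": rng.uniform(0, 1000)} for i in range(10)]
campaign_candidates = [c["id"] for c in customers_2012 if c["spend"] >= threshold]
```

Evaluating on the held-out test subset, rather than on the training subset, is what makes the accuracy figure a fair estimate of how the model will behave on the 2012 list.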
Summary
Avoid Vendor Lock-in
• Open Standards vs. Proprietary Code
• Best-of-Breed Tool Set
Hadoop-based Scoring Paradigm
• Minimize Data Movement
• Massively Parallel Execution
• Scale with Business Demand
Ease of Use, Fast ROI
• Leverage Datameer UI
• Deploy in Minutes vs. Months
• No Coding Skills Required
Online Resources
§ Learn More About PMML
  • Data Mining Group website: http://www.dmg.org
  • Join LinkedIn PMML Discussion Group: http://www.linkedin.com/groupRegistration?gid=2328634
  • Articles, on-line videos, blogs: http://www.zementis.com/community.htm
§ Product Info
  • On Demand Webinar: http://data.datameer.com/power-of-big-data-insights-of-predictive-analytics/
  • UPPI for Datameer: http://www.zementis.com/DAS-plugin.htm