Zementis © - Confidential
Dr. Michael Zeller, Zementis, Inc.
Big Data Science Meetup, August 25, 2012
www.zementis.com | @Zementis
Predictive Analytics in a Heterogeneous World of Tools and Big Data Platforms
Software Technology
‐ ADAPA® Decision Engine
‐ UPPI Universal PMML Plug‐in
‐ ADAPA Add‐in for Excel
‐ ADAPA Control Center
‐ PMML Converter
‐ Transformations Generator
Consulting Services
‐ Data Mining
‐ Predictive Analytics
‐ Statistical Data Analysis
‐ Business Rules Development
‐ Predictive Solutions, e.g.: Credit Risk Assessment, Customer Preferences, Credit Card Fraud, Predictive Maintenance, Quality Control, Healthcare Fraud/Abuse
San Diego and Hong Kong
Operational Predictive Analytics
About Zementis: Highlights, Use Cases and Examples
Peer-reviewed Articles and Publications Available at http://www.zementis.com/
• R-Journal & ACM SIGKDD Explorations Journal
• KDD Conference Panel & Report / PMML Workshop
• LinkedIn PMML Discussion Group, ~3000 members
• PMML Book: “PMML in Action: Unleashing the Power of Open Standards for Data Mining and Predictive Analytics”
Global Partner Network
Today’s Focus: Business Value of Predictive Analytics
Figure: Score Distribution, 1st Lien Stand-Alone Loans. X-axis: Score (50 to 1000); Y-axis: % Within Class (0% to 14%); series: Goods, Bads, each with a polynomial trend line.
Figure: % of Delinquent Loans per Month. X-axis: Months (Jan to Nov); Y-axis: % of Delinquent Loans (0 to 90); legend: 700, 750, 800, 850, 900, 950.
Operational Predictive Analytics
PMML: Predictive Model Markup Language
PMML is an XML-based language used to define statistical and data mining models and to share these between compliant applications.
Mature standard developed by the DMG (Data Mining Group) to avoid proprietary issues and incompatibilities and to deploy models.
Supported by all leading data mining tools, commercial and open-source.
Allows for the clear separation of tasks: Model development vs. model deployment.
Eliminates the need for custom code and proprietary model deployment solutions.
Uniform deployment platform ensures scalability and reliability of model execution.
Models and Transformations
PMML defines a standard not only to represent models, but also data handling and data transformations (pre- and post-processing).
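To make the "XML-based language" point concrete, here is a minimal sketch of a PMML document and a tiny consumer for it. The model itself (a one-variable linear regression with field names x and y) is a made-up illustration, not a model from the talk, and the consumer handles only this one model type; a real scoring engine covers far more of the standard.

```python
import xml.etree.ElementTree as ET

# Toy PMML document. The model, field names, and coefficients are hypothetical.
PMML_DOC = """<PMML xmlns="http://www.dmg.org/PMML-4_1" version="4.1">
  <Header description="toy linear regression"/>
  <DataDictionary numberOfFields="2">
    <DataField name="x" optype="continuous" dataType="double"/>
    <DataField name="y" optype="continuous" dataType="double"/>
  </DataDictionary>
  <RegressionModel functionName="regression">
    <MiningSchema>
      <MiningField name="x"/>
      <MiningField name="y" usageType="predicted"/>
    </MiningSchema>
    <RegressionTable intercept="1.5">
      <NumericPredictor name="x" coefficient="2.0"/>
    </RegressionTable>
  </RegressionModel>
</PMML>"""

# Namespace prefix used in the ElementTree path expressions below.
NS = {"p": "http://www.dmg.org/PMML-4_1"}


def load_regression(pmml_text):
    """Read the intercept and the numeric coefficients from the PMML text."""
    root = ET.fromstring(pmml_text)
    table = root.find("p:RegressionModel/p:RegressionTable", NS)
    intercept = float(table.get("intercept"))
    coefficients = {
        pred.get("name"): float(pred.get("coefficient"))
        for pred in table.findall("p:NumericPredictor", NS)
    }
    return intercept, coefficients


def score(pmml_text, record):
    """Evaluate intercept + sum(coefficient * value) for one input record."""
    intercept, coefficients = load_regression(pmml_text)
    return intercept + sum(c * record[name] for name, c in coefficients.items())


print(score(PMML_DOC, {"x": 3.0}))  # 1.5 + 2.0 * 3.0
```

The point of the sketch is the separation of tasks: the tool that wrote the XML and the code that scores with it share nothing except the file.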
Individual Model
Input Data → PMML File (Input Validation → Pre‐Processing → Core Model → Post‐Processing) → Scored Data
‐ Input Validation: Missing Values, Invalid Values, Outliers
‐ Pre‐Processing: Normalize, Discretize, Bin, Map, etc.
‐ Core Model: Neural Nets, Trees, Regression, SVM, Clustering, etc.
‐ Post‐Processing: Scaling, Thresholds, etc.
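The four stages of an individual model can be sketched as plain functions chained together. Everything specific here is an assumption for illustration: the field names, replacement values, valid ranges, coefficients, the 0 to 1000 score scale, and the 0.5 threshold. In PMML these settings would be declared in the model file rather than in code.

```python
import math

# Hypothetical settings; a PMML file would carry the equivalent declarations.
MISSING_REPLACEMENT = {"income": 40000.0, "age": 35.0}
VALID_RANGE = {"income": (0.0, 250000.0), "age": (18.0, 100.0)}
COEFFICIENTS = {"income": 1.2, "age": -0.4}
INTERCEPT = 0.1


def validate(record):
    """Input validation: fill missing values, clamp outliers to the valid range."""
    clean = {}
    for field, (low, high) in VALID_RANGE.items():
        value = record.get(field)
        if value is None:
            value = MISSING_REPLACEMENT[field]
        clean[field] = min(max(value, low), high)
    return clean


def preprocess(record):
    """Pre-processing: min-max normalize every field to the interval [0, 1]."""
    return {
        field: (value - VALID_RANGE[field][0])
        / (VALID_RANGE[field][1] - VALID_RANGE[field][0])
        for field, value in record.items()
    }


def core_model(features):
    """Core model: logistic regression that returns a probability."""
    z = INTERCEPT + sum(COEFFICIENTS[f] * v for f, v in features.items())
    return 1.0 / (1.0 + math.exp(-z))


def postprocess(probability, threshold=0.5):
    """Post-processing: scale to a 0-1000 score and apply a decision threshold."""
    return {"score": round(probability * 1000), "approve": probability >= threshold}


def score(record):
    """Run one record through all four stages in order."""
    return postprocess(core_model(preprocess(validate(record))))


# A record with a missing income and an out-of-range age still gets scored.
print(score({"income": None, "age": 500}))
```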
Model Composition
Input Data → PMML File (Input Validation → Pre‐Processing → Model 1, Model 2, Model 3 → Voting) → Scored Data
‐ Scores from all models are computed
‐ Voting: Majority Voting, Weighted Voting, Weighted Average, etc.
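The three combination methods named on the slide can be sketched in a few lines. The labels, scores, and weights below are invented for illustration; the functions only show the arithmetic of combining outputs that have already been computed by every model.

```python
from collections import Counter


def majority_vote(labels):
    """Return the class label predicted by the largest number of models."""
    return Counter(labels).most_common(1)[0][0]


def weighted_vote(labels, weights):
    """Return the class label with the largest total weight behind it."""
    totals = {}
    for label, weight in zip(labels, weights):
        totals[label] = totals.get(label, 0.0) + weight
    return max(totals, key=totals.get)


def weighted_average(scores, weights):
    """Combine numeric scores as a weight-normalized average."""
    return sum(s * w for s, w in zip(scores, weights)) / sum(weights)


# Hypothetical outputs of Model 1, Model 2, and Model 3 for one record.
labels = ["fraud", "ok", "ok"]
weights = [0.6, 0.2, 0.1]

print(majority_vote(labels))           # two of three models say "ok"
print(weighted_vote(labels, weights))  # the heavily weighted model says "fraud"
print(weighted_average([1.0, 3.0], [1, 3]))
```

Note that majority and weighted voting can disagree on the same inputs, which is why the choice of method belongs in the model file alongside the models.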
Data Driven Model Segmentation
Input Data → PMML File (Input Validation → Pre‐Processing → Predicate‐based Model Selection → Model 1, Model 2, or Model 3) → Scored Data
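Predicate-based model selection can be sketched as an ordered list of (predicate, model) pairs in which the first matching predicate decides which model scores the record. The segments, field names, and the three stand-in models are assumptions for illustration only.

```python
def small_us_model(record):
    """Hypothetical Model 1: flat low risk for small US transactions."""
    return 0.05


def large_us_model(record):
    """Hypothetical Model 2: risk grows with the transaction amount."""
    return min(1.0, record["amount"] / 100000.0)


def default_model(record):
    """Hypothetical Model 3: fallback used when no other segment applies."""
    return 0.5


# Ordered segments: the first predicate that evaluates to True selects the model.
SEGMENTS = [
    (lambda r: r["region"] == "US" and r["amount"] < 1000, small_us_model),
    (lambda r: r["region"] == "US", large_us_model),
    (lambda r: True, default_model),
]


def score(record):
    """Score with the model of the first segment whose predicate matches."""
    for predicate, model in SEGMENTS:
        if predicate(record):
            return model(record)
    raise ValueError("no segment matched the record")


print(score({"region": "US", "amount": 250}))
print(score({"region": "US", "amount": 50000}))
print(score({"region": "HK", "amount": 250}))
```

Unlike model composition, only one model runs per record here, so the cost of scoring does not grow with the number of segments.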
One Standard, One Process
PMML
‐ Applications
‐ External Vendors
‐ Service Providers
‐ Divisions
PMML: Predictive Model Management
Integrating across all systems and processes
PMML
‐ Applications: CRM, ERP, Excel, etc.
‐ Business Process
‐ Cloud Computing, Virtual Server, …
‐ In-database, Hadoop, …
ADAPA Predictive Analytics Decision Engine Overview
ADAPA Scoring Engine
‐ Business Rules
‐ Auditing & Reporting
‐ Predictive Analytics
‐ Web Services & Java API
Predictive Models, Business Rules
Data, Enhanced Decisioning
What is ADAPA?
Adaptive Decision And Predictive Analytics
‐ Scalable Execution Platform: Execute your models in real‐time and on demand. Score in single decision or batch mode.
‐ Environment to Manage Predictive Models: Deploy one or many models in the same engine. Manage and maintain models through a web console.
‐ Framework for SOA‐based IT Integration: Completely standards‐based models and API. Easily integrated into your existing infrastructure.
‐ ADAPA is not ...: ADAPA is not a model development environment. Use best‐of‐breed commercial or open source tools.
From Model Building to Model Deployment
Model Building → Model Deployment
ADAPA Deployment Options
‐ Amazon EC2
‐ IBM SmartCloud
‐ Private Cloud
‐ In‐house
‐ As Embedded Java Library
‐ OEM / White Label
Universal PMML Plug-in for “Big Data” Scoring
Model Building → Model Deployment → Integration / Execution
In‐database & Hadoop
‐ Turn PMML into UDFs
‐ Deploy PMML files & UDF stubs
‐ Write SQL against UDFs
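The "write SQL against UDFs" idea can be illustrated with SQLite from the Python standard library, which lets a Python function be registered as a SQL function. This is only an analogy for the workflow, not the Universal PMML Plug-in itself: the risk_score function, its coefficients, and the applicants table are hypothetical stand-ins for a function that would be generated from a PMML file.

```python
import math
import sqlite3


def risk_score(income, debt):
    """Hypothetical stand-in for a scoring function generated from a PMML model."""
    z = -1.0 + 3.0 * (debt / income)
    return 1.0 / (1.0 + math.exp(-z))


conn = sqlite3.connect(":memory:")

# Register the Python function as a two-argument SQL function (the "UDF").
conn.create_function("risk_score", 2, risk_score)

conn.execute("CREATE TABLE applicants (id INTEGER, income REAL, debt REAL)")
conn.executemany(
    "INSERT INTO applicants VALUES (?, ?, ?)",
    [(1, 50000.0, 5000.0), (2, 30000.0, 27000.0)],
)

# Scoring is now an ordinary SQL query; the data never leaves the database.
rows = conn.execute(
    "SELECT id, risk_score(income, debt) AS score FROM applicants ORDER BY id"
).fetchall()

for applicant_id, score in rows:
    print(applicant_id, round(score, 3))
```

The design benefit is that analysts keep working in SQL while the model logic stays behind a single function name.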
ADAPA & Universal Plug-In Overview: Features and Model Types
The Plug-in delivers a wide range of predictive analytics for high performance scoring, including:
• Decision Trees for classification and regression
• Neural Network Models: Back-Propagation, Radial-Basis Function, and Neural-Gas
• Support Vector Machines for regression, binary and multi-class classification
• Linear and Logistic Regression (binary and multinomial)
• Naïve Bayes Classifiers
• General and Generalized Linear Models
• Cox Regression Models
• Rule Set Models (flat decision trees)
• Clustering Models: Distribution-Based, Center-Based, and 2-Step Clustering
• Scorecards (including reason codes)
• Association Rules
• Multiple Models: Model ensemble, segmentation, chaining and composition
It also implements a data dictionary, missing and invalid value handling, and data pre-processing.
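As one example from the list above, a scorecard with reason codes can be sketched as follows. The characteristics, bins, point values, baselines, and reason code names are all invented. Ranking reasons by how far the awarded points fall below a per-characteristic baseline is one common convention and is assumed here; it is not taken from the talk.

```python
# Hypothetical scorecard: each characteristic has a baseline and binned attributes.
# Each bin is (predicate, points awarded, reason code).
SCORECARD = {
    "initial_score": 300,
    "characteristics": [
        {
            "field": "age",
            "baseline": 60,
            "bins": [
                (lambda v: v < 30, 20, "RC_AGE"),
                (lambda v: v >= 30, 60, "RC_AGE"),
            ],
        },
        {
            "field": "utilization",
            "baseline": 80,
            "bins": [
                (lambda v: v <= 0.5, 80, "RC_UTIL"),
                (lambda v: v > 0.5, 10, "RC_UTIL"),
            ],
        },
    ],
}


def score(record, max_reasons=2):
    """Return (total score, reason codes ordered by largest shortfall first)."""
    total = SCORECARD["initial_score"]
    shortfalls = []
    for characteristic in SCORECARD["characteristics"]:
        value = record[characteristic["field"]]
        for predicate, points, reason in characteristic["bins"]:
            if predicate(value):
                total += points
                # Shortfall: how many points below the baseline this bin awarded.
                shortfalls.append((characteristic["baseline"] - points, reason))
                break
    ranked = sorted((s for s in shortfalls if s[0] > 0), reverse=True)
    return total, [reason for _, reason in ranked[:max_reasons]]


print(score({"age": 25, "utilization": 0.9}))
print(score({"age": 40, "utilization": 0.2}))
```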
ADAPA & Universal Plug-In Overview: Broad Compatibility across PMML Versions & Vendors
Universal PMML Plug-in Includes PMML Conversion, Validation & Correction
• Consumes PMML Versions 2.0 … 4.0
• Validates and Corrects Known Issues
• Ensures Compatibility with Vendors
• Invisible & Seamless to User
Case Study – ADAPA in the Financial Industry: Fraud & Risk Scoring
Scoring Bureau
IT Service Provider
Financial Institution
ADAPA Scoring Engine
Online Transactions Decision Management
ADAPA Real-time Decision Management: Sensor & device data processing
Energy
Biometrics
IP Network Security
Rotating Equipment
ADAPA Scoring Engine
ADAPA Case Study – iPhone Mobile Scoring: “On-the-Cloud” with Zementis ADAPA
ERP, SCM, CRM, Legacy, Others
Batch / Real-time Business Intelligence Hub
PMML Model Upload
Scores & Recommendation
Inquiry
DynaMine Data Mining Automation
Inquiry
Customer Information
PMML – Why Should You Care?
Best Practices for Predictive Analytics and Data Mining
Vendor Neutral Standard
• Open Standards vs. Proprietary Code
• Select Best‐of‐Breed Tool Set
• Avoid Vendor Lock‐in
Time‐to‐Market Agility
• Deploy in Minutes vs. Months
• Facilitate Clear Requirements & Communication
• Scale with Business Demand
Platform Independent Deployment
• Big Data & Real‐Time
• In‐Database & Hadoop
• Server, Cloud & SaaS
Thank You!
U.S.A Headquarters
6125 Cornerstone Court East, Suite 250, San Diego, CA, 92121
Tel: +1 619 330-0780, Fax: +1 858 535-0227
Asia Office
19/F., Unit A, Ho Lee Commercial Building, 38-44 D’Aguilar Street, Central, Hong Kong (S.A.R.)
Tel: +852 2868-0878, Fax: +852 2845-6027
E-mail: [email protected]