1. 2015 ligaDATA, Inc. All Rights Reserved. Driving Business
Value Through Real-Time Decisioning Solutions. July 2015. Download,
Forums, Docs, Events: http://Kamanja.org
2. Outline: Motivation; Case Study of a Modeling Department;
Review of Data Mining and Big Data Tools; Solution: Predictive Model
Markup Language (PMML); Reviewing the Big Data Space and Real Time;
Kamanja Integration (Open Source PMML); Use Cases, Demo,
Architecture.
3. Audience Survey (show of hands). Data Mining Experience: __%
read or heard about; __% class or competition; __% put a model into
production; __% have put 10+ models in production; __% have put 75+
models in production. Big Data Experience: __% read or heard about;
__% class or exploration project; __% put a system into production;
__% system with 3+ OSS in production; __% system with 6+ OSS or PB+
in production. Extensive Data Mining AND Big Data Experience: __%
with 10+ models AND 3+ OSS; __% with 75+ models AND 6+ OSS / PB+.
Overlap on extensive experience is rare. This is what Kamanja helps
with.
4. Case Study of a Modeling Department: Financial Fraud
Detection. CONTEXT: 3 modelers and 2 data infrastructure people in
the department. Over 3 dozen predictive models in production, with
high $$$$ and visibility. A separate Operations group deployed the
models. PROBLEM: Models were getting stale. The team was "spinning
plates" between short-term solutions. 2 months for a full model
training investigation. 2 months to put a model into production
(OUCH). The preprocessing and model scoring had to be completely
re-coded. Operations had one process to deploy a regression and a
different process to deploy a decision tree.
5. Challenges of Managing a Department of Models: Integration &
Deployment. Source:
http://www.kdnuggets.com/polls/2015/analytics-data-mining-data-science-software-used.html
6. Challenges of Managing a Department of Models: Integration &
Deployment. Source:
http://www.kdnuggets.com/2015/06/data-mining-data-science-tools-associations.html
7. Challenges of Managing a Department of Models: Integration &
Deployment. Source:
http://www.kdnuggets.com/2015/06/data-mining-data-science-tools-associations.html
8. Predictive Model Markup Language: PMML is an Integration &
Deployment Solution. PMML Producers (18 companies): R (Rattle,
PMML)*, RapidMiner, KNIME*. PMML Consumers (12 companies): Zementis,
IBM SPSS, KNIME, Microstrategy, SAS, Kamanja* (Open Source), Spark
(MLlib)*. Also listed: Weka*, SAS Enterprise Miner. (* = Open
Source.) Supported model types. PREDICTIVE: Naive Bayes, Neural Net,
Regression, Rules, Scorecard, Sequence, SVM, Time Series, Trees.
DESCRIPTIVE / OTHER: Association Rules; Cluster, K-Nearest
Neighbors; Text Models; model ensembles & composition (e.g. Gradient
Boosting).
9. Case Study of a Modeling Department: Financial Fraud
Detection. SOLUTION OBJECTIVES: Decrease the time spent putting
models into production (increasing analysis time). Support a wider
variety of algorithms and software (increasing accuracy).
10. Case Study of a Modeling Department: Financial Fraud
Detection. SOLUTION OBJECTIVES: Decrease the time spent putting
models into production (increasing analysis time). Support a wider
variety of algorithms and software (increasing accuracy). SOLUTION:
Train models in SAS Enterprise Miner and R (PMML Producers). Score
models with a RESTful call to a PMML Consumer (Zementis). Predictive
Model Markup Language (PMML) is a type of XML. RESULT: PUT MODELS
INTO PRODUCTION 4 TO 20 TIMES FASTER! MORE ACCURATE, by supporting
more software & algorithms! Greatly increased throughput of training
new models!
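Since PMML is a type of XML, a trained model is just a document that
any consumer can parse and score. The sketch below is a minimal
illustration of that idea, not the Zementis or Kamanja scoring path:
it embeds a toy PMML regression model and scores one record using
only the Python standard library. The field names (amount,
tx_per_hour, risk), the coefficients, and the score function are all
invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical toy model: field names and coefficients are invented.
PMML_DOC = """<PMML version="4.2" xmlns="http://www.dmg.org/PMML-4_2">
  <Header description="toy risk score (illustrative only)"/>
  <DataDictionary numberOfFields="3">
    <DataField name="amount" optype="continuous" dataType="double"/>
    <DataField name="tx_per_hour" optype="continuous" dataType="double"/>
    <DataField name="risk" optype="continuous" dataType="double"/>
  </DataDictionary>
  <RegressionModel functionName="regression">
    <MiningSchema>
      <MiningField name="amount"/>
      <MiningField name="tx_per_hour"/>
      <MiningField name="risk" usageType="target"/>
    </MiningSchema>
    <RegressionTable intercept="0.5">
      <NumericPredictor name="amount" coefficient="0.002"/>
      <NumericPredictor name="tx_per_hour" coefficient="0.25"/>
    </RegressionTable>
  </RegressionModel>
</PMML>"""

# PMML elements live in a versioned XML namespace.
NS = {"p": "http://www.dmg.org/PMML-4_2"}


def score(pmml_text, record):
    """Score one record: intercept + sum(coefficient * field value)."""
    root = ET.fromstring(pmml_text)
    table = root.find("p:RegressionModel/p:RegressionTable", NS)
    total = float(table.get("intercept"))
    for predictor in table.findall("p:NumericPredictor", NS):
        name = predictor.get("name")
        total += float(predictor.get("coefficient")) * float(record[name])
    return total


if __name__ == "__main__":
    print(score(PMML_DOC, {"amount": 100.0, "tx_per_hour": 2.0}))
```

Because the producer and the consumer only share the XML document,
the training tool can change without any re-coding on the scoring
side, which is the point the case study makes.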
11. Outline: Motivation; Case Study of a Modeling Department;
Review of Data Mining and Big Data Tools; Solution: Predictive Model
Markup Language (PMML); Other Uses of PMML; Reviewing the Big Data
Space and Real Time; Kamanja Integration (Open Source PMML); Use
Cases, Demo, Architecture.
12. Other Uses of PMML or Real-Time Decisioning. Complex Event
Processing (CEP): possibly 100s of concurrent data streams; apply
rule logic, select, aggregate; select an action on elements in the
stream. Enterprise Applications: during a customer call or chat,
recommendations to improve service; card transaction: offer a credit
increase; web application: pre-approval; web transaction: recommend
other product(s); MOOC: customize training speed for the student.
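The CEP steps on this slide (select, aggregate, apply rule logic,
select an action) can be sketched in a few lines. This is a minimal
single-stream illustration, not Kamanja code; the event fields, the
1000.0 threshold, and the action name are assumptions chosen to match
the "card transaction: offer credit increase" example.

```python
from collections import defaultdict


def decide(events, threshold=1000.0):
    """Yield (customer, action) when running card spend crosses threshold.

    The threshold and the event layout are hypothetical.
    """
    totals = defaultdict(float)   # aggregate: running spend per customer
    offered = set()               # customers who already received an offer
    for event in events:
        # Select: only card transactions are relevant to this rule.
        if event["type"] != "card_transaction":
            continue
        customer = event["customer"]
        totals[customer] += event["amount"]
        # Rule logic: fire once per customer when the total is reached.
        if totals[customer] >= threshold and customer not in offered:
            offered.add(customer)
            yield customer, "offer_credit_increase"


if __name__ == "__main__":
    stream = [
        {"type": "card_transaction", "customer": "alice", "amount": 600.0},
        {"type": "web_visit", "customer": "alice"},
        {"type": "card_transaction", "customer": "bob", "amount": 50.0},
        {"type": "card_transaction", "customer": "alice", "amount": 450.0},
    ]
    for customer, action in decide(stream):
        print(customer, action)
```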
13. Real Time Open Source Systems (OSS): Kamanja and Spark are
Good Complements.
14. OSS Technology Stack: Kamanja Integrates With. Stack layers
shown: High-level languages / abstractions, Search, Real Time
Streaming, Compute Fabric, Security, Data Store, Real Time Computing,
Resource Management, Batch Computing. Components shown: Kamanja;
HBase, Cassandra, InfluxDB, HDFS; Cloud, EC2, Private Clusters;
Kerberos; Zookeeper; Kafka, MQ.
15. OSS Technology Stack (can be confusing): Upcoming
Integrations. Create an adaptor to add others. Same stack layers as
slide 14. Components shown: Kamanja; HBase, Cassandra, InfluxDB,
HDFS; Cloud, EC2, Private Clusters; Kerberos; Zookeeper, Yarn;
Kafka, MQ; Spark; MLlib.
16. OSS Technology Stack (can be confusing). There are currently
300+ Apache projects! Create an adaptor to add others. Same stack
layers as slide 14. Components shown: Kamanja; HBase, Cassandra,
InfluxDB, HDFS; Cloud, EC2, Private Clusters; Kerberos; Zookeeper,
Yarn, Mesos; Kafka, MQ; Spark, Storm; Hadoop, Tez; MLlib, Mahout,
Oozie, Pig, Impala, Hive; Solr, Elastic Search.
17. Higher Requirements in Financial Services or Health Care,
Compared to Social Media or Web Apps. What is different about these
industries? Legal compliance to meet exacting technical standards.
Losing (or duplicating) a bank transaction. Losing a medical record.
Executives or employees can GO TO JAIL. Regulatory requirements
demand 100% data protection: Security, Auditability, Lineage, ZERO
data loss.
18. Driving Digital Adoption: Bank Call Center. Business
Objectives: reduce the operating cost of calling centers; increase
customer adoption of digital channels; interact with the customer at
the point of transaction. IT Objectives: implement a cost-effective
and scalable platform; satisfy financial services security and
compliance requirements; integrate with existing core systems.
Results: migrated from mass generalized communications to real-time
personalized alerts; increased messaging effectiveness, with a 400%
lift in conversion to digital; $8.6M savings in the first year; full
integration with the mainframe; leverages streaming transaction and
customer account data.
19. Medical Company Use of Kamanja: Lines of Business. Run Medco
models, supplying clients with intelligence based upon model
findings (using multi-tenant deployment when appropriate). Run
customer models on Medco hardware (on a Medco-owned customer private
net). Consult/partner with Medco customers, providing software
solutions to be run on the customer net.
20. Medical Company Use of Kamanja. Clinicians (knowledge
experts) develop heuristic-based rule set models. The initial model
was a COPD (Chronic Obstructive Pulmonary Disease) risk assessment.
Models are expressed in a Domain Specific Language (DSL) they
developed. DSL models are transformed to PMML for Kamanja. Models
consume the current message plus prior related messages over a
look-back period. The assertions about a patient are saved in the
database (beyond standard PMML). Medco plans to integrate the DSL
with their ontology data modeling effort. The goal is to generate
new models as their medical world ontology evolves.
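The look-back behavior described on this slide (a model sees the
current message plus prior related messages, and its assertion is
saved per patient) can be sketched as follows. This is only an
illustration under stated assumptions: the LookbackModel class, the
30-day window, the "at least 2 related events" rule, and the event
code "E1" are all invented, carry no clinical meaning, and are not
the DSL or PMML used in the deployment. Messages are assumed to
arrive in day order.

```python
from collections import defaultdict, deque


class LookbackModel:
    """Toy rule: assert when enough related events fall in the window."""

    def __init__(self, lookback_days=30, min_events=2):
        self.lookback_days = lookback_days
        self.min_events = min_events
        self.history = defaultdict(deque)  # patient_id -> (day, code) pairs
        self.assertions = {}               # stand-in for the database

    def consume(self, patient_id, day, code):
        """Add the current message, drop expired ones, evaluate the rule."""
        window = self.history[patient_id]
        window.append((day, code))
        # Expire messages older than the look-back period.
        while window and window[0][0] < day - self.lookback_days:
            window.popleft()
        related = [d for d, c in window if c == code]
        asserted = len(related) >= self.min_events
        # Save the latest assertion for this patient.
        self.assertions[patient_id] = asserted
        return asserted


if __name__ == "__main__":
    model = LookbackModel()
    print(model.consume("p1", day=1, code="E1"))
    print(model.consume("p1", day=20, code="E1"))
    print(model.consume("p1", day=60, code="E1"))
```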
21. Representative Kamanja Solution Architecture. Components
shown: DATA SOURCES (Mainframe DB2, DW, Customer Preferences);
SERVER INTEGRATION (CDC); Kafka Inbound Queue; Kamanja (Security
Management, Error Management, Metadata Service/Cache, Storage
Service/Cache, Message Construction, Decision Engine, Output
Handler, Transform, Compute, MakeObj, Parallel DAG Executor, DAG
Optimizer, DAG Generator, Change Listener, Output Generator, Output
Distributor); Kafka Outbound Queue; DECISION GATEWAYS (DBs, Apps,
Notification Engine); HBase (history); HDFS (long-term storage);
Zookeeper (resource management).
22. Performance Characteristics. Performance: throughput of a
million messages/second; uses commodity hardware. Scalability:
linear scalability vertically and horizontally; data partitioning
support; runtime multi-model optimizations to support thousands of
models; consistent performance on hundreds of models and thousands
of rules; built for IoT data volumes.
23. Kamanja Execution Flow on a Node. Components shown: Kafka
Queue, Input Adapter, Kamanja Engine, Data Transformer, Model
Executor, Output Dispatcher, Output Adapter, Next Process, Storage
Adapter, Data History (Cassandra, HBase), Metadata (expanded in next
slides).
24. Metadata API Elements. Functions (in PMML, or in Scala as
User Defined Functions (UDFs)). Models (PMML rule sets, e.g. fraud,
attrition). Messages (from the input queue, real-time records).
Containers (e.g. a record or lookup table to provide context,
priors). Types (e.g. array of patients, doctors, types of
containers). Concepts (PMML-created fields, preprocessing, scores).
25. Metadata API Subsystems. Components shown: Metadata;
Metadata API; Scoring Engine Manager (within a model); Model Manager
(activate, control a DAG of many models); Configuration (Cluster,
Engine, Model Compilation); Kamanja Engine. Clients via REST API:
PMML Producer or application; Admin App, used by DevOps; Activate
PMML Model or DAG.
26. Kamanja Runtime Model Execution. Components shown: Model
Runtime, Transformer, Data History (HBase, Cassandra, ..), Storage
Adaptor(s), Metadata Instance, Model Factory, Model Object, Msg.
Steps: 1) The message is received by the runtime engine. 2) Metadata
is checked to see which model is interested in the message. 3) The
model object is instantiated. 4) The message is saved in history.
5) The model is executed on the message object. 6) The output of the
model is returned to the engine.
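The six steps above can be sketched as a small in-memory engine. This
is a simplified illustration, not Kamanja's API: the Engine and
ThresholdModel classes, the message layout, and the use of a plain
list as the history store are all assumptions made for the example.

```python
class ThresholdModel:
    """Hypothetical model: flags messages whose amount exceeds a limit."""

    def __init__(self, limit):
        self.limit = limit

    def execute(self, message):
        return {"model": "threshold", "alert": message["amount"] > self.limit}


class Engine:
    def __init__(self):
        self.metadata = {}  # message type -> list of model factories
        self.history = []   # stand-in for HBase/Cassandra via an adaptor

    def register(self, message_type, factory):
        """Record in metadata that a model is interested in a message type."""
        self.metadata.setdefault(message_type, []).append(factory)

    def process(self, message):
        # 1) The message is received by the runtime engine.
        # 2) Metadata is checked for models interested in this message.
        factories = self.metadata.get(message["type"], [])
        # 3) A model object is instantiated by each factory.
        models = [make_model() for make_model in factories]
        # 4) The message is saved in history.
        self.history.append(message)
        # 5) Each model is executed on the message object, and
        # 6) the outputs are returned to the caller (the engine loop).
        return [model.execute(message) for model in models]


if __name__ == "__main__":
    engine = Engine()
    engine.register("transaction", lambda: ThresholdModel(limit=500))
    print(engine.process({"type": "transaction", "amount": 900}))
    print(engine.process({"type": "login"}))
```

Keeping the "which model wants this message" lookup in metadata,
rather than inside each model, is what lets models be added or
activated without changing the engine loop.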
27. What happens when a node goes down? If the node that crashes
is a Kamanja Slave node, the Kamanja Leader node rebalances over all
remaining Kamanja nodes. If the Kamanja Leader node goes down, the
next node on the list becomes the Leader and then rebalances. Each
message is processed EXACTLY ONCE: a bank needs to process a
transaction ONCE AND ONLY ONCE. The state of every message is
tracked through each step. COMPARE TO: Spark and Storm would execute
each message AT LEAST ONCE (but may process a message 2, 3 or 4
times); the expectation is for the application to handle possible
duplicates.
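The failover rule on this slide (a slave failure triggers a
rebalance; a leader failure promotes the next node on the list, which
then rebalances) can be sketched as below. This is a toy model under
stated assumptions: the Cluster class, the round-robin partition
assignment, and the node names are invented, and the sketch does not
show how exactly-once processing is achieved.

```python
class Cluster:
    """Toy cluster: the first node in the ordered list acts as leader."""

    def __init__(self, nodes, partitions):
        self.nodes = list(nodes)            # ordered succession list
        self.partitions = list(partitions)  # units of work to spread out
        self.assignment = {}
        self.rebalance()

    @property
    def leader(self):
        return self.nodes[0]

    def rebalance(self):
        """Spread partitions round-robin over the surviving nodes."""
        if not self.nodes:
            raise RuntimeError("no nodes left to take over partitions")
        self.assignment = {
            partition: self.nodes[i % len(self.nodes)]
            for i, partition in enumerate(self.partitions)
        }

    def node_down(self, node):
        """Remove a failed node; if it led, the next node on the list leads."""
        self.nodes.remove(node)
        self.rebalance()


if __name__ == "__main__":
    cluster = Cluster(["n1", "n2", "n3"], partitions=range(6))
    cluster.node_down("n2")   # slave failure: leader n1 rebalances
    print(cluster.leader, cluster.assignment)
    cluster.node_down("n1")   # leader failure: n3 takes over
    print(cluster.leader, cluster.assignment)
```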
28. Kamanja Integration Points. Provided with an
enterprise-friendly license (no GPL license "virus" to infect the
entire system). Adaptors for any data flow: Kafka, IBM's MQ, HBase,
Cassandra, InfluxDB, Zookeeper, Spark. User Defined Functions:
provide a JAR file or Scala function. Custom Java Model: can skip
PMML and leverage Adaptors and UDFs; import generated Java code.
29. Deploy Predictive Models and Rules in 1/100th the time it
takes today. Kamanja is an open source, real-time decisioning
engine. Hardened to meet the strictest requirements of Financial
Services and Healthcare, and scalable to handle IoT. Kamanja enables
Developers and Data Scientists to reduce the time to deploy Rules
and Predictive Models. Kamanja integrates with your Big Data
ecosystem.
30. Planned Kamanja Differentiation. Model management, enabling
DevOps for models: automated testing, validation, deployment and
rollback; A/B testing to competitively roll out model updates;
scheduling. Enterprise-Level Security and Multi-Tenancy: integration
using Kerberos; role-based security for model management; security
at the field level for models (need to know/access); multi-tenancy
to partition internal groups into different tenancies; data
isolation, resource management, SLA support. Data Integration:
built-in integrations for social data and third-party data; can
consume 100s of different event and document types.
31. Planned Kamanja Differentiation. Performance and Scale:
dynamic scaling, enlarging and shrinking as needed based on load;
leap in performance by generating native code (vs. Java); cost-aware
execution in cloud environments. Extensive integrations with
enterprise queue, storage and indexing: MQ, HBase, Cassandra, RDBMS,
Elastic Search, Zookeeper. Domain-specific libraries and model
templates to speed up preprocessing, business logic and algorithms.
32. Try out Kamanja. Download, Forums, Docs, Events:
http://Kamanja.org