Kamanja: Driving Business Value through Real-Time Decisioning Solutions

Transcript of Kamanja: Driving Business Value through Real-Time Decisioning Solutions

  1. Driving Business Value Through Real-Time Decisioning Solutions. July 2015. ligaDATA. Download, Forums, Docs, Events: http://Kamanja.org. © 2015 ligaDATA, Inc. All Rights Reserved.
  2. Outline. Motivation; Case Study: Modeling Department; Review Mining and Big Data Tools; Solution: Predictive Model Markup Language (PMML); Reviewing Big Data Space and Real Time; Kamanja Integration (Open Source PMML); Use Cases, Demo, Architecture.
  3. Audience Survey (show of hands). Data Mining Experience: __% read or heard about; __% class or competition; __% put a model into production; __% have put 10+ models in production; __% have put 75+ models in production. Big Data Experience: __% read or heard about; __% class or exploration project; __% put a system into production; __% system with 3+ OSS in production; __% system with 6+ OSS or PB+ in production. Extensive Data Mining AND Big Data Experience: __% with 10+ models AND 3+ OSS; __% with 75+ models AND 6+ OSS / PB+. Overlap on extensive experience is rare. This is what Kamanja helps with.
  4. Case Study of a Modeling Department: Financial Fraud Detection. CONTEXT: 3 modelers and 2 data infrastructure people in the department; over 3 dozen predictive models in production, with high $$$$ and visibility; a separate Operations group deploying models. PROBLEM: models were getting stale; spinning plates between short-term solutions; 2 months for a full model training investigation; 2 months to put a model into production (OUCH); had to completely re-code the preprocessing and model scoring; Operations had one process to deploy a regression and a different process to deploy a decision tree.
  5. Challenges of Managing a Department of Models: Integration & Deployment. http://www.kdnuggets.com/polls/2015/analytics-data-mining-data-science-software-used.html
  6. Challenges of Managing a Department of Models: Integration & Deployment. http://www.kdnuggets.com/2015/06/data-mining-data-science-tools-associations.html
  7. Challenges of Managing a Department of Models: Integration & Deployment. http://www.kdnuggets.com/2015/06/data-mining-data-science-tools-associations.html
  8. Predictive Model Markup Language: PMML is an Integration & Deployment Solution. PMML Producers (18 companies): R (Rattle, PMML)*, RapidMiner, KNIME*. PMML Consumers (12 companies): Zementis, IBM SPSS, KNIME, MicroStrategy, SAS, Kamanja* (Open Source), Spark (MLlib)*. Also shown: Weka*, SAS Enterprise Miner. (* = Open Source.) PREDICTIVE: Naive Bayes, Neural Net, Regression, Rules, Scorecard, Sequence, SVM, Time Series, Trees. DESCRIPTIVE / OTHER: Association Rules, Cluster, K-Nearest Neighbors, Text Models, model ensembles & composition (e.g. Gradient Boosting).
  9. Case Study of a Modeling Department: Financial Fraud Detection. SOLUTION OBJECTIVES: decrease time spent putting models into production (increase analysis time); support a wider variety of algorithms and software (increase accuracy).
  10. Case Study of a Modeling Department: Financial Fraud Detection. SOLUTION OBJECTIVES: decrease time spent putting models into production (increase analysis time); support a wider variety of algorithms and software (increase accuracy). SOLUTION: train models in SAS Enterprise Miner or R (PMML Producers); score models with a RESTful call to a PMML Consumer (Zementis); Predictive Model Markup Language (PMML) is a type of XML. RESULT: PUT MODELS INTO PRODUCTION 4 TO 20 TIMES FASTER! By supporting more software & algorithms, MORE ACCURATE! Greatly increased throughput of training new models!
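Because PMML is plain XML, the idea of "train in one tool, score in another" can be illustrated with a few lines of code. The following is a minimal sketch, not how Zementis or Kamanja score models: the PMML document, the field names `amount` and `risk`, and the `score` helper are all invented for illustration, and only the simplest case (one `RegressionTable` with `NumericPredictor` terms) is handled.

```python
import xml.etree.ElementTree as ET

# Hypothetical one-predictor regression written as PMML 4.2 (illustration only).
PMML_DOC = """<PMML version="4.2" xmlns="http://www.dmg.org/PMML-4_2">
  <Header description="illustrative fraud-risk regression"/>
  <DataDictionary numberOfFields="2">
    <DataField name="amount" optype="continuous" dataType="double"/>
    <DataField name="risk" optype="continuous" dataType="double"/>
  </DataDictionary>
  <RegressionModel functionName="regression">
    <MiningSchema>
      <MiningField name="amount"/>
      <MiningField name="risk" usageType="target"/>
    </MiningSchema>
    <RegressionTable intercept="0.5">
      <NumericPredictor name="amount" coefficient="0.25"/>
    </RegressionTable>
  </RegressionModel>
</PMML>"""

# PMML elements live in a versioned namespace, so lookups must be qualified.
NS = {"p": "http://www.dmg.org/PMML-4_2"}


def score(pmml_text, record):
    """Return intercept + sum(coefficient * value) for the first RegressionTable."""
    root = ET.fromstring(pmml_text)
    table = root.find(".//p:RegressionTable", NS)
    total = float(table.get("intercept"))
    for predictor in table.findall("p:NumericPredictor", NS):
        total += float(predictor.get("coefficient")) * record[predictor.get("name")]
    return total


print(score(PMML_DOC, {"amount": 4.0}))  # prints 1.5
```

A real PMML consumer also handles data transformations, missing values, exponents, categorical predictors and the other model types; the point here is only that the model travels as a document rather than as re-coded scoring logic.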
  11. Outline. Motivation; Case Study: Modeling Department; Review Mining and Big Data Tools; Solution: Predictive Model Markup Language (PMML); Other Uses of PMML; Reviewing Big Data Space and Real Time; Kamanja Integration (Open Source PMML); Use Cases, Demo, Architecture.
  12. Other Uses of PMML or Real-Time Decisioning. Complex Event Processing (CEP): possibly 100s of concurrent data streams; apply rule logic, select, aggregate; select action on elements in stream. Enterprise Applications: during a customer call or chat, recommendations to improve service; card transaction, offer a credit increase; web application, pre-approval; web transaction, recommend other product(s); MOOC, customize training speed for the student.
  13. Real Time Open Source Systems (OSS): Kamanja and Spark are good Complements.
  14. OSS Technology Stack: Kamanja Integrates With. Diagram layers: High level languages / abstractions, Search, Real Time Streaming, Real Time Computing, Batch Computing, Data Store, Resource Management, Security, Compute Fabric. Components shown: Kamanja; HBase, Cassandra, InfluxDB; HDFS; Kafka, MQ; Zookeeper; Kerberos; Cloud, EC2, Private Clusters.
  15. OSS Technology Stack (can be confusing): Upcoming Integrations. Diagram layers: High level languages / abstractions, Search, Real Time Streaming, Real Time Computing, Batch Computing, Data Store, Resource Management, Security, Compute Fabric. Components shown: Kamanja; Spark, MLlib; Yarn; HBase, Cassandra, InfluxDB; HDFS; Kafka, MQ; Zookeeper; Kerberos; Cloud, EC2, Private Clusters. Create an adaptor to add others.
  16. OSS Technology Stack (can be confusing): There are currently 300+ Apache projects! Diagram layers: High level languages / abstractions, Search, Real Time Streaming, Real Time Computing, Batch Computing, Data Store, Resource Management, Security, Compute Fabric. Components shown: Kamanja; Spark, Storm; MLlib, Mahout, Oozie, Pig, Impala, Hive; Solr, Elastic Search; Hadoop, Tez; Mesos, Yarn; HBase, Cassandra, InfluxDB; HDFS; Kafka, MQ; Zookeeper; Kerberos; Cloud, EC2, Private Clusters. Create an adaptor to add others.
  17. Higher Requirements in Financial Services or Health Care, Compared to Social Media or Web Apps. What is different about these industries? Legal compliance to meet exacting technical standards; losing (or duplicating) a bank transaction; losing a medical record; executives or employees can GO TO JAIL. Regulatory requirements demand 100% data protection: Security, Auditability, Lineage, ZERO data loss.
  18. Driving Digital Adoption: Bank Call Center. Business Objectives: reduce operating cost of call centers; increase customer adoption of digital channels; interact with the customer at the point of transaction. IT Objectives: implement a cost-effective and scalable platform; satisfy financial services security and compliance requirements; integrate with existing core systems. Results: migrated from mass generalized communications to real-time personalized alerts; increased messaging effectiveness, with a 400% lift in conversion to digital; $8.6M savings in the first year; full integration with the mainframe; leverages streaming transaction and customer account data.
  19. Medical Company use of Kamanja: Lines of Business. Run Medco models, supplying clients with intelligence based upon model findings (using multi-tenant deployment when appropriate). Run customer models on Medco hardware (on a Medco-owned customer private net). Consult/partner with Medco customers, providing software solutions to be run on the customer's net.
  20. Medical Company use of Kamanja. Clinicians (knowledge experts) develop heuristic-based rule set models. The initial model was a COPD (Chronic Obstructive Pulmonary Disease) risk assessment. Models are expressed with a Domain Specific Language (DSL) they developed. DSL models are transformed to PMML for Kamanja. Models consume current + prior related messages over a look-back period. Save the assertions about a patient in the database (beyond standard PMML). Medco plans to integrate the DSL with their ontology data modeling effort. The goal is to generate new models as their medical world ontology evolves.
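The idea of a model consuming "current + prior related messages over a look-back period" can be sketched as a per-key sliding window. This is an illustrative sketch only, not Kamanja's or Medco's implementation: `LookBackHistory`, `repeated_symptom`, the 30-day window and the message fields are all hypothetical, the rule is not a clinical one, and timestamps are assumed to arrive in order for each key.

```python
from collections import defaultdict, deque

DAY = 24 * 3600  # seconds


class LookBackHistory:
    """Keeps each key's recent messages so a rule sees current + prior events."""

    def __init__(self, look_back_seconds):
        self.look_back_seconds = look_back_seconds
        self._by_key = defaultdict(deque)

    def add(self, key, timestamp, message):
        """Store the message, evict expired ones, return the window (oldest first)."""
        window = self._by_key[key]
        window.append((timestamp, message))
        # Drop everything older than the look-back period relative to this message.
        while window and window[0][0] < timestamp - self.look_back_seconds:
            window.popleft()
        return [msg for _, msg in window]


def repeated_symptom(window, symptom, min_count=2):
    """Hypothetical heuristic: fire when the symptom recurs inside the window."""
    return sum(1 for msg in window if msg.get("symptom") == symptom) >= min_count


history = LookBackHistory(look_back_seconds=30 * DAY)
first = history.add("patient-1", 0 * DAY, {"symptom": "cough"})
second = history.add("patient-1", 10 * DAY, {"symptom": "cough"})
later = history.add("patient-1", 45 * DAY, {"symptom": "cough"})

print(repeated_symptom(first, "cough"))   # False: only one event so far
print(repeated_symptom(second, "cough"))  # True: two events within 30 days
print(repeated_symptom(later, "cough"))   # False: earlier events have expired
```

In a deployed system the window would live in the history store (the slides name HBase and Cassandra) rather than in process memory.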
  21. Representative Kamanja Solution Architecture. Diagram components: DATA SOURCES (Mainframe DB2, DW, Customer Preferences); SERVER INTEGRATION (CDC); Kafka Inbound Queue; Kamanja (Security Management, Error Management, Metadata Service/Cache, Storage Service/Cache, Message Construction, Decision Engine, Output Handler, Transform, Compute, MakeObj, Parallel DAG Executor, DAG Optimizer, DAG Generator, Change Listener, Output Generator, Output Distributor); Kafka Outbound Queue; DECISION GATEWAYS (DBs, Apps, Notification Engine); HBase (history); HDFS (long-term storage); Zookeeper (resource management).
  22. Performance Characteristics. Performance: throughput of a million messages/second; uses commodity hardware. Scalability: linear scalability vertically and horizontally; data partitioning support; runtime multi-model optimizations to support thousands of models; consistent performance on hundreds of models and thousands of rules; built for IoT data volumes.
  23. Kamanja Execution Flow on a Node. Diagram components: Kafka Queue; Input Adapter; Kamanja Engine (Data Transformer, Model Executor, Output Dispatcher); Output Adapter; Next Process; Storage Adapter; Data History (Cassandra, HBase); Metadata (expanded in next slides).
  24. Metadata API Elements. Metadata: Functions (in PMML, or Scala as User Defined Functions (UDFs)); Models (PMML Rule Set, e.g. fraud, attrition); Messages (from the input queue, real-time records); Containers (e.g. a record or lookup table to provide context, priors); Types (e.g. array of patients, Drs, types of containers); Concepts (PMML-created fields, preprocessing, scores).
  25. Metadata API Subsystems. Diagram components: Metadata; Metadata API; Scoring Engine Manager (within a model); Model Manager (activate, control a DAG of many models); PMML Producer, or application; Admin App, used by DevOps; Activate PMML Model or DAG; REST API; Configuration (Cluster, Engine, Model Compilation); Kamanja Engine.
  26. Kamanja Runtime Model Execution. Diagram components: Model Runtime; Transformer; Data History (HBase, Cassandra, ..); Msg; Storage Adaptor(s); Metadata Instance; Model Object; Model Factory. Steps: 1) Message is received by the runtime engine. 2) Metadata is checked to see which model is interested in the message. 3) Model object is instantiated. 4) Message is saved in history. 5) Model is executed on the message object. 6) Output of the model is returned to the engine.
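The six runtime steps on this slide can be traced in a small sketch. Everything here is a hypothetical stand-in rather than Kamanja's API: `Engine`, `ThresholdModel`, the `type` and `amount` message fields, and the in-memory `history` list (in place of HBase or Cassandra) are invented for illustration. Skipping messages that no model is interested in is also an assumption.

```python
class ThresholdModel:
    """Hypothetical model: flags transactions above a fixed amount."""

    def __init__(self, limit):
        self.limit = limit

    def execute(self, message):
        return {"alert": message["amount"] > self.limit}


class Engine:
    """Toy runtime that follows the six steps listed on the slide."""

    def __init__(self):
        self.metadata = {}  # message type -> model factory
        self.history = []   # in-memory stand-in for the history store

    def register(self, message_type, factory):
        self.metadata[message_type] = factory

    def process(self, message):
        # 1) The message is received by the runtime engine.
        # 2) Metadata is checked to see which model is interested in it.
        factory = self.metadata.get(message["type"])
        if factory is None:
            return None  # assumption: uninteresting messages are skipped
        # 3) The model object is instantiated by its factory.
        model = factory()
        # 4) The message is saved in history.
        self.history.append(message)
        # 5) The model is executed on the message object.
        output = model.execute(message)
        # 6) The output of the model is returned to the engine's caller.
        return output


engine = Engine()
engine.register("transaction", lambda: ThresholdModel(limit=1000))
print(engine.process({"type": "transaction", "amount": 2500}))  # {'alert': True}
print(engine.process({"type": "login"}))                        # None
```

Looking up a factory by message type keeps the engine independent of any particular model class, which is one plausible reading of the Model Factory box in the diagram.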
  27. What happens when a node goes down? If the node that crashes is a Kamanja Slave node, the Kamanja Leader node rebalances over all Kamanja nodes. If the Kamanja Leader node goes down, the next node on the list becomes the Leader, then rebalances. Each message is processed EXACTLY ONCE: a bank needs to process a transaction ONCE AND ONLY ONCE. Look at the state of every message through each step. COMPARE TO: Spark and Storm would execute each message AT LEAST ONCE (but may process a message 2, 3 or 4 times); the expectation is for the application to handle possible duplicates.
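When a platform only guarantees at-least-once delivery, "the application handles possible duplicates" usually means making the handler idempotent. The sketch below shows that application-side pattern; it is not a description of how Kamanja achieves exactly-once processing. `IdempotentHandler` and the `id` and `amount` fields are hypothetical.

```python
class IdempotentHandler:
    """Applies each message at most once, keyed by a unique message id."""

    def __init__(self):
        self.seen_ids = set()
        self.balance = 0

    def handle(self, message):
        """Apply the message; return False if it was a redelivered duplicate."""
        if message["id"] in self.seen_ids:
            return False
        self.seen_ids.add(message["id"])
        self.balance += message["amount"]
        return True


# The third delivery repeats transaction t1, as can happen after a node failure.
deliveries = [
    {"id": "t1", "amount": 100},
    {"id": "t2", "amount": 50},
    {"id": "t1", "amount": 100},
]

handler = IdempotentHandler()
results = [handler.handle(message) for message in deliveries]
print(results, handler.balance)  # [True, True, False] 150
```

In practice the set of seen ids has to be durable and updated atomically with the side effect; otherwise a crash between the two steps reintroduces the duplicate or loses the message.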
  28. Kamanja Integration Points. We provide an enterprise-friendly license (no GPL license virus to infect the entire system). Adaptors for any data flow: Kafka, IBM's MQ, HBase, Cassandra, InfluxDB, Zookeeper, Spark. User Defined Functions: provide a JAR file or Scala function. Custom Java Model: can skip PMML and leverage Adaptors and UDFs; import generated Java code.
  29. Deploy Predictive Models and Rules in 1/100th the time it takes today. Kamanja is an open source, real-time decisioning engine. Hardened to meet the strictest requirements of Financial Services and Healthcare, and scalable to handle IoT. Kamanja enables developers and data scientists to reduce the time to deploy rules and predictive models. Kamanja integrates with your Big Data ecosystem.
  30. Planned Kamanja Differentiation. Model management, enabling DevOps for models: DevOps with automated testing, validation, deployment and rollback; A/B testing to competitively roll out model updates; scheduling. Enterprise-Level Security and Multi-Tenancy: integration using Kerberos; role-based security for model management; security at field level for models, need to know/access; multi-tenancy to partition internal groups into different tenancies; data isolation, resource management, SLA support. Data Integration: built-in integrations for social data and third-party data; can consume 100s of different event and document types.
  31. Planned Kamanja Differentiation. Performance and Scale: dynamic scaling, enlarging and shrinking as needed, based on load; a leap in performance by generating native code (vs. Java); cost-aware execution in a cloud environment. Extensive integrations with enterprise queue, storage and indexing: MQ, HBase, Cassandra, RDBMS, Elastic Search, Zookeeper. Domain-specific libraries and model templates to speed up preprocessing, business logic and algorithms.
  32. Try out Kamanja. Download, Forums, Docs, Events: http://Kamanja.org