Low-Latency Machine Learning Applications · Advanced Machine Learning & AI Time-Series Analysis....
Transcript of Low-Latency Machine Learning Applications · Advanced Machine Learning & AI Time-Series Analysis....
© 2018 HAZELCAST | 1
THE LEADING IN-MEMORY COMPUTING PLATFORM
Low-Latency Machine Learning Applications
John DesJardins – CTO N. America
© 2018 HAZELCAST | 2
AI Techniques Continue to Expand & Evolve
AI
Machine Learning
SupervisedLearning
Classification
Regression
UnsupervisedLearning
Dimensionality Reduction
Clustering
ReinforcementLearning
Deep Learning
Simulation
Each Innovation Introduces New Challenges in Scalability of Compute & Storage
Image/Video ProcessingUnstructured DataFraud & Anomaly Detection
Predicting TrendsStructured Numeric Data
Image ProcessingUnstructured Data Feature Extraction
Data ExplorationFeature Extraction
Images, Video, AudioAdvanced Machine Learning & AITime-Series Analysis
Online Machine Learning within an In-Memory PlatformMove from Reactive to Pro-Active
Take Action before negative impact or ahead of opportunity
Ingest Classify Predict Pro-ActEnrich
Context Meaning
Low-Latency Data Grid Data at Rest
Low-Latency Stream Processing - Data in Motion
Models
No SQLData Lake
Batch MLModel Training
Hazelcast In-Memory Platform – Fast Data
Offline – Slow DataData at Rest
Hazelcast IMDG
Hazelcast Jet
Online Inference & Nearline Machine Learning
Ingest Classify Predict Pro-ActEnrich
Context Meaning
Low-Latency Data Grid Data at Rest
Low-Latency Stream Processing - Data in Motion
Models
No SQLData Lake
Batch MLModel Training
Offline – Slow DataData at Rest
Hazelcast IMDG
Hazelcast Jet
Nearline Re-training
© 2019 HAZELCAST Confidential & Proprietary | 5
Card-ProcessingInfrastructure
Time-Based
SLA
Swipe
Response
The Hazelcast DifferenceExample: Credit Card Processing
Majority of Time Consumed in Network Transit
MillisecondsInitial Processing:Microseconds
FraudDetectionAlgorithm
Tiny Window of TimeFor Accurate Processing
Time
# ofCard
Terminals
Traditional
eCommerce
iPhones
Square
Payment Evolution
Time
# ofTransactions � Performance at massive scale
� Increase in fraud attempts
Business Challenge
Performance At Scalegives time for
Multiple Algorithms
Business Opportunity: TIME
© 2018 Hazelcast Inc. Confidential & Proprietary
Hazelcast - High Performance Platform
Secure | Manage | OperateEmbeddable | Scalable | Low-Latency
Secure | Resilient | Distributed
Ingest & TransformEvents, Connectors, Filtering
CombineJoin, Enrich, Group, Aggregate
StreamWindowing, Event-Time Processing
Compute & ActDistributed & Parallel Computations
Live StreamsKafka, JMS,
Sensors, Feeds
DatabasesJDBC,
Relational, NoSQL,Change Events
FilesHDFS, Flat Files,
Logs, File watcher
ApplicationsSockets
Mobile
Apps
Commerce
Communities
Social
AnalyticsVisualization
Data Lake
IntegrateAPIs, Microservices, Notifications
CommunicateSerialization, Protocols
Store/UpdateCaching, CRUD Persistence
ComputeQuery, Process, Execute
IMDG In-Memory Data Grid
Jet In-Memory Streams
ScaleClustering & Cloud, High Density
ReplicateWAN Replication, Partitioning
Management Center
SecurePrivacy, Authentication,
Authorization
AvailableRolling Upgrades, Hot Restart
SecurePrivacy, Authentication,
Authorization
AvailableJob Elasticity, Graceful shutdown
© HAZELCAST Confidential & Proprietary | 7
Messaging(Kafka, JMS)
loT, Social
CustomConnector
Enterprise Applications
Hazelcast IMDG
Filesystem(files, watcher)
Object Store(HDFS, S3, ..)
Database (JDBC, Redis,
Mongo, )
Messaging
Alerts
Enterprise Applications
Object Store
Filesystem
Database
Hazelcast
Interactive Analytics
Enrichment
RemoteIMDG Database
TransformComputeStreamCombineIngest
Source Output
Inference
In-memOperational
Storage
Join, Enrich,Group,
Aggregate
Windowing, Event-TimeProcessing
Distributed and Parallel
Computations
Filter, Clean, Convert
Predict, Score Apply ML
ML or business rules
Database Events
Publish
In-memory, Subscriber
Notifications
Model Service
InferenceCo-located Model
Local IMDG
© 2018 Hazelcast Inc. Confidential & Proprietary
Evolution of Stream Processing
1st Gen (2000s)Hadoop(batch) or Apama(CEP)hard choices
Distributed Batch Compute – MapReduce – scaled, parallelized, distributed, resilient, - not real-timeorSiloed, Real-time – Complex Event Processing – specialized languages, not resilient, not distributed(single instance), hard to scale, fast, but brittle, proprietary
2nd Gen (2014)Sparkhard to manage
Micro-batch distributed – heavy weight, complex to manage, not elastic, require large dedicated environments with many moving parts, not Cloud-friendly, not low-latency
3rd Gen (2017 Jet & Flink)flexible & scalableTrue “Fast Data”
Distributed, real-time streaming – highly parallel, true streams, advanced techniques (Directed Acyclic Graph) enabling reliable distributed job executionFlexible deployment - Cloud-native, elastic, embeddable, light-weight, supports serverless, fog & edge.Low-latency Streaming, ETL, and fast-batch processing, built on proven data grid
© 2018 Hazelcast Inc. Confidential & Proprietary
Streaming Performance
*
* Spark had all performance options, including Tungsten, turned on
© 2018 Hazelcast Inc. Confidential & Proprietary
Give Hazelcast a try
Hazelcast.org – Data Grid
https://github.com/hazelcast
Jet.hazelcast.org – Streaming Analytics
Demos: https://jet.hazelcast.org/demos/
My info: •John DesJardins
• [email protected] - email
•@johnmdesjardins - Twitter
© 2018 HAZELCAST | 11
Thank You