Open Data Hub - A Deep Dive - assets.openshift.com · Large-scale data access Multi-user Jupyter...
Transcript of Open Data Hub - A Deep Dive - assets.openshift.com · Large-scale data access Multi-user Jupyter...
![Page 1: Open Data Hub - A Deep Dive - assets.openshift.com · Large-scale data access Multi-user Jupyter Used for data science and research Monitoring and alerting toolkit Used to diagnose](https://reader034.fdocuments.in/reader034/viewer/2022042306/5ed188300ebe3b04cd2fd94a/html5/thumbnails/1.jpg)
Open Data Hub - A Deep DiveOpenShift Commons Gathering San Francisco
Sherard GriffinSenior ManagerAI CoE
Václav PavlinPrincipal Software EngineerAI CoE
Peter MacKinnonPrincipal Software EngineerAI CoE
1
Sherard GriffinSenior Manager, Red Hat AI Center of Excellence
![Page 2: Open Data Hub - A Deep Dive - assets.openshift.com · Large-scale data access Multi-user Jupyter Used for data science and research Monitoring and alerting toolkit Used to diagnose](https://reader034.fdocuments.in/reader034/viewer/2022042306/5ed188300ebe3b04cd2fd94a/html5/thumbnails/2.jpg)
What Does a Data Scientist Want?
As a Data Scientist, I want a
“self-service cloud like” experience
for my Machine Learning projects,
where I can access a rich set of
modelling frameworks, data, and
computational resources, share
and collaborate with colleagues,
and deliver my work into
production with speed, agility and
repeatability to drive business
value!
Self service portal to select ML frameworks, data access
Perform ML Modelling
Inferencing w/ hardware acceleration
ML model deployment in app dev
2
![Page 3: Open Data Hub - A Deep Dive - assets.openshift.com · Large-scale data access Multi-user Jupyter Used for data science and research Monitoring and alerting toolkit Used to diagnose](https://reader034.fdocuments.in/reader034/viewer/2022042306/5ed188300ebe3b04cd2fd94a/html5/thumbnails/3.jpg)
Current Situation and Challenges
DATA SCIENTIST DATA SCIENTIST DATA SCIENTIST
➔ Team(s) of Data Scientists and Developers
➔ Sharing and collaboration, if any, is difficult,
manual, error prone and take time
➔ Access to limited non-shared resources
means modeling takes long times or can’t
achieve desired accuracy
➔ Delivering into models into production is a
challenge
3
![Page 4: Open Data Hub - A Deep Dive - assets.openshift.com · Large-scale data access Multi-user Jupyter Used for data science and research Monitoring and alerting toolkit Used to diagnose](https://reader034.fdocuments.in/reader034/viewer/2022042306/5ed188300ebe3b04cd2fd94a/html5/thumbnails/4.jpg)
Why OpenShift And Cloud Platforms for ML Workloads?
EXISTING AUTOMATION
TOOLSETS
SCM(GIT)
CI/CD
SERVICE LAYER
PERSISTENTSTORAGE
REGISTRY
RHEL
NODE
c
RHEL
NODE
RHEL
NODE
RHEL
NODE
RHEL
NODE
RHEL
NODE
C
C
C C
C
C
C CC C
RED HATENTERPRISE LINUX
MASTER
API/AUTHENTICATION
DATA STORE
SCHEDULER
HEALTH/SCALING
PHYSICAL VIRTUAL PRIVATE PUBLIC HYBRID
DATA SCIENTIST
ML deployed across clouds, data center,
and edge
ML services, load balanced
and scaled
ML microservices scheduled and
orchestrated on shared resources
ML in Production
4
![Page 5: Open Data Hub - A Deep Dive - assets.openshift.com · Large-scale data access Multi-user Jupyter Used for data science and research Monitoring and alerting toolkit Used to diagnose](https://reader034.fdocuments.in/reader034/viewer/2022042306/5ed188300ebe3b04cd2fd94a/html5/thumbnails/5.jpg)
Connected Drive & Autonomous Driving
Autonomous Driving
Data driven diagnosis of “Sepsis”
Digital banking with personalized services
Oil & Gas Exploration
5
AI/ML on OpenShift Momentum is Strong
![Page 6: Open Data Hub - A Deep Dive - assets.openshift.com · Large-scale data access Multi-user Jupyter Used for data science and research Monitoring and alerting toolkit Used to diagnose](https://reader034.fdocuments.in/reader034/viewer/2022042306/5ed188300ebe3b04cd2fd94a/html5/thumbnails/6.jpg)
Data Acquisition & Preparation
ML Modelling (Selection, Training,
Testing)
ML Model Deployment in
App. Dev. Process
Data Engineer
Data Scientists
App Developer
IT Operations
BusinessObjectives
Data
Business Leadership
Business Leadership
Intelligent applicationsto achieve
business outcomes
6
Machine Learning Pipeline
![Page 7: Open Data Hub - A Deep Dive - assets.openshift.com · Large-scale data access Multi-user Jupyter Used for data science and research Monitoring and alerting toolkit Used to diagnose](https://reader034.fdocuments.in/reader034/viewer/2022042306/5ed188300ebe3b04cd2fd94a/html5/thumbnails/7.jpg)
Open Data Hub Project - OpenDataHub.io● Community meta-operator that integrates best open source AI/ML/Data projects● Blueprint architecture for end to end AI/ML on OpenShift ● Used as Red Hat’s internal data science and AI platform● Open Data Hub Architecture: https://opendatahub.io/docs/architecture.html
Data Acquisition & Preparation
ML Model Selection, Training, Testing
ML Model Deployment in App. Dev. Process
![Page 8: Open Data Hub - A Deep Dive - assets.openshift.com · Large-scale data access Multi-user Jupyter Used for data science and research Monitoring and alerting toolkit Used to diagnose](https://reader034.fdocuments.in/reader034/viewer/2022042306/5ed188300ebe3b04cd2fd94a/html5/thumbnails/8.jpg)
Upstream/Community Projects in the AI/ML Space
ML toolkit and lifecycle Kube
ML-as-a-service platform based on OpenShift, Ceph, Kafka, JupyterHub,
Spark,
Home for k8s community to share operators for various apps/tools
NVIDIA NGC GPU optimized
and curated
Tensorflow
Others
8
![Page 9: Open Data Hub - A Deep Dive - assets.openshift.com · Large-scale data access Multi-user Jupyter Used for data science and research Monitoring and alerting toolkit Used to diagnose](https://reader034.fdocuments.in/reader034/viewer/2022042306/5ed188300ebe3b04cd2fd94a/html5/thumbnails/9.jpg)
Open Data Hub v0.4 OperatorAvailable Now at OpenDataHub.io
● Unified analytics engine
● Large-scale data access
● Multi-user Jupyter● Used for data
science and research
● Monitoring and alerting toolkit
● Used to diagnose problems
● Analytics platform for all metrics
● Query, visualize and alert on metrics
● Deploying machine learning models as micro-services
● Full model lifecycle management
● Distributed Object Store● S3 Interface
● Distributed event streaming
● Pub/Sub Messaging
● Container-native workflow engine
● Declaratively deploy ML pipelines and models
9
![Page 10: Open Data Hub - A Deep Dive - assets.openshift.com · Large-scale data access Multi-user Jupyter Used for data science and research Monitoring and alerting toolkit Used to diagnose](https://reader034.fdocuments.in/reader034/viewer/2022042306/5ed188300ebe3b04cd2fd94a/html5/thumbnails/10.jpg)
Open Data Hub Blueprint
10
![Page 11: Open Data Hub - A Deep Dive - assets.openshift.com · Large-scale data access Multi-user Jupyter Used for data science and research Monitoring and alerting toolkit Used to diagnose](https://reader034.fdocuments.in/reader034/viewer/2022042306/5ed188300ebe3b04cd2fd94a/html5/thumbnails/11.jpg)
Pooling and Sharing Resources with JupyterHub on OpenShift
Jupyter Hub DATA SCIENTIST
DATA SCIENTIST
DATA SCIENTIST
DATA SCIENTIST DATA SCIENTIST DATA SCIENTIST
11
![Page 12: Open Data Hub - A Deep Dive - assets.openshift.com · Large-scale data access Multi-user Jupyter Used for data science and research Monitoring and alerting toolkit Used to diagnose](https://reader034.fdocuments.in/reader034/viewer/2022042306/5ed188300ebe3b04cd2fd94a/html5/thumbnails/12.jpg)
Demo
https://console-openshift-console.apps.cluster-raleigh-8a2a.raleigh-8a2a.open.redhat.com/operatorhub/ns/opendatahub-user1
12
![Page 13: Open Data Hub - A Deep Dive - assets.openshift.com · Large-scale data access Multi-user Jupyter Used for data science and research Monitoring and alerting toolkit Used to diagnose](https://reader034.fdocuments.in/reader034/viewer/2022042306/5ed188300ebe3b04cd2fd94a/html5/thumbnails/13.jpg)
OpenDataHub.io
Thank you
13