Infrastructure Agnostic Machine Learning Workload Deployment
Transcript of Infrastructure Agnostic Machine Learning Workload Deployment
Infrastructure Agnostic Machine Learning Workload Deployment
Abi Akogun Data Science Consultant (MavenCode)
Charles Adetiloye ML Platforms Engineer (MavenCode)
About MavenCodeMavenCode is an Artificial Intelligence Solutions company located in Dallas, Texas - We do training, product development, and consulting services in the following areas:
● Provisioning Scalable Data Processing Pipelines on Cloud Infrastructure
● Development & Deployment of Machine Learning and Artificial Intelligence Platforms
● Streaming and Big Data Analytics Edge-IoT and Sensors
About The Presenters
Charles Adetiloye is an ML Platforms Engineer
at MavenCode. He has well over 15 years of
experience building large-scale, distributed
applications. He has extensive experience
working and consulting with several companies
implementing production grade ML and AI
platforms
twitter.com/cadetiloye
Abiodun Akogun is a Machine Learning and Data
Science Consultant at Mavencode. He has extensive
experience building and deploying large-scale Machine
Learning Applications in different industries that
include Healthcare, Finance, Telecommunications, and
Insurance. He has experience solving several business
problems using Data Analytics, Sentiment Analysis,
Topic Modelling, Named Entity Recognition(N.E.R),
Opinion Mining, Data Mining, Time Series, Spatial
Statistics and Marketing Analytics
twitter.com/akogz
Agenda
▪ Overview of Machine Learning Model Deployment Workflow
▪ Various Approaches to model training, management, and serving in the Cloud
▪ Deploying Machine Learning Workloads in the Cloud
▪ Implementing Feature Storage backend for ML model training
▪ Running Spark Workloads for ML training on Kubernetes with Kubeflow
Overview of Machine Learning Deployment Workflow
Data Sourcing
Pre Processing
Feature Engineering
Model Training /
Evaluation
Model Scoring /Management
Model Inferencing
Machine Learning Workload Deployment
Data Sourcing
Pre Processing
Feature Engineering
Model Training /
Evaluation
Model Scoring /Management
Model Inferencing
Google Cloud AWS Azure On Prem
Machine Learning Deployment Effort
Data Verification
Configuration
FeatureExtraction
Data ValidationMachine Resource
Management
Serving Infrastructure Monitoring
Analysis Tool
Machine Learning Code
Data Preparation +Storage
Efficient Compute Resource Management
Overview of Machine Learning Deployment Workflow
Data Sourcing
Pre Processing
Feature Engineering
Model Training /
Evaluation
Model Scoring /Management
Model Inferencing
32%
10%
36%
2% 4%
16%
A Typical Machine Learning Developer Workflow
Data Sourcing
Pre Processing
Feature Engineering
Model Training /
Evaluation
Model Scoring
/Management
Model Inferencing
Azure Storage
Google Storage
AWS S3 Storage
Raw Data Transformation Processed Data
Storage Compute1 2
Google Cloud AI AWS Sage Maker Azure ML
Data Scientist / ML Engineers works on pulling or processing data first before starting ML training on a Managed Cloud Service
Raw Data Processing and Transformation Pipeline
Cloud Training Platforms
What Enterprise Machine Learning Workflow In the Cloud Looks Like!
Data Sourcing
Pre Processing
Feature Engineering
Azure Storage
Google Storage
AWS S3 Storage
Raw Data Transformation Processed Data
Storage Compute1 2
Team A
Team B
Team C
Team D
Google Cloud AI
AWS SageMaker
AWS SageMaker
Azure ML
Running ML workflow across the enterprise with multiple teams using different Cloud Provider technology stacks
Implementing Machine Learning solutions in the cloud comes at a cost, with cost of Compute and Storage on top of the list.
If we plan to be Cloud Neutral, can we abstract our ● Machine Learning Compute Workload→Kubernetes?● Machine Storage → Feature Store?
Google Cloud AI AWS Sage Maker Azure ML
A Typical Machine Learning Developer Workflow
Data Sourcing
Pre Processing
Feature Engineering
Model Training /
Evaluation
Model Scoring /Management
Model Inferencing
Azure Storage
Google Storage
AWS S3 Storage
Data Source Transformation Processed Data
Storage Compute1 2
Towards A Cloud Neutral ML Deployment Environment
Data Sourcing Pre ProcessingFeature Engineering
Model Training / Evaluation
Model Scoring /Management
Model Inferencing
Storage Compute1 2
Feature Store
Kubernetes
Why the need for Cloud Agnostic Deployment Infrastructure?
● Makes it easier to migrate workloads in a Hybrid Cloud Environment
● We are not tied to particular Cloud Infrastructure technology stack
● It’s easier to Implement best practice patterns and solutions
● Your team will have a common base denominator for all Enterprise ML workload
● Easy to control cost, manage utilization and forecast demand
Cloud Agnostic Machine Learning Development
Data Sourcing Pre ProcessingFeature Engineering
Model Training / Evaluation
Model Scoring /Management
Model Inferencing
Storage Compute1 2
Feature Store
Kubernetes
Azure StorageGoogle StorageAWS S3 Storage
What’s Feature Store All about?A Feature is a measurable observable attribute that is part of the input to a
Machine Learning Model.
Model Training
X1
X2
X3
Xn
[Feature Vector]
Model
What’s Feature Store All about?
Model Training
X1
X2
X3
Xn
[Feature Vector]
Model
Model 1
Features are derived from
● Raw Datastore
● Streaming Datasource
● Aggregates of Raw Inputs
● Windows (mins, hourly, daily, weekly)
Features Change Over time!
Model Training
X1
X2
X3
Xn
X1
X2
X3
Xn
X1
X2
X3
Xn
Time
Machine Learning Feature Store● Makes it easy to operationalize our ML workload, most importantly Data
Management and Storage for Model training
● Features can be shared easily amon teams running different Model
training pipelines
● We can get to version of datasets and track changes easily
● Consistency in Feature input attributes between Model Training and
Serving
● Offline Feature Store → Batching Training
● Online Feature Store → Inferencing / Serving
Types Of Feature Store
Implementing Offline Feature Storage with Apache Hudi
Azure Storage
Google StorageAWS S3 Storage
Streaming Source
Batch Job Operations
Datasource with Streaming sources like MQTT, Kafka, Pubsub etc
Batch Operations on Databases, FileStorage, Distributed Storage etc
Feature Store
Workflow Scheduling Orchestration with Kubeflow Pipelines or Airflow Dags on Kubernetes
Feature Store Implementation on any of the Major Cloud Storage
● A need for a Unified Platform where new data can be made available in addition to historical data within minutes.
● The need for a quick computation (or derivation ) of Feature vectors in other to make them available for our model input.
● Incremental Versioning of our Feature collections so that we can time-travel and use a particular set of features for Model training.
● Our Hudi dataset can be stored in Azure, Google Cloud, AWS cloud storage layer.
● Easy to implement all our code and everything we need to do with Spark and PySpark
Why did we use Apache Hudi?
Getting Data into Hudi Feature Store with Kubeflow Pipelineimport kfpfrom kfp import components
KafkaDatastreamer_op = kfp.components.create_component_from_func(KafkaDatastreamer,base_image="python:3.7.1”)
ValidatorOnSchema_op = kfp.components.create_component_from_func(ValidatorOnSchema,base_image="python:3.7.1")
PreProcessor_op = kfp.components.create_component_from_func(PreProcessor,base_image="python:3.7.1")
HudiTableWriter_op= kfp.components.create_component_from_func(HudiTableWriter, base_image="mavencode.io/spark:v3.1.1")
The Hudi Data Store writer
Configure the Spark Session with the packages needed to run hudi and avro
Hudi configuration Options
Writing the data into our Hudi data store in the right format
Cloud Agnostic Machine Learning Development
Data Sourcing Pre ProcessingFeature Engineering
Model Training / Evaluation
Model Scoring /Management
Model Inferencing
Storage Compute1 2
Feature Store
Kubernetes
Cloud Native ML Workload Deployment with Operators on Kubeflow
Cloud Native ML Training Deployment
● Containerized Workload
● Scalable + Can Run in Distributed Mode
● Efficient Compute Utilization
● Language Agnostic!
Machine Learning Operators with Kubeflow onKubernetes
● An Machine Learning Operator helps the deployment monitoring and management a model training life-cycle
● Some ML Operators found in Kubeflow are:○ TF-operator → Tensorflow Job○ Pytorch-operator → Pytorch Job○ Xgboost-operator → Xgboost Job○ Spark-operator → Spark and Spark ML Jobs
Cloud Agnostic Machine Learning Development
MLOps Model Training and Deployment Platform
Kubeflow Jupyter NoteBook Kubeflow Jupyter NoteBook Kubeflow Jupyter NoteBook Kubeflow Jupyter NoteBook
Namespace Namespace Namespace Namespace
Auto-Scalable CPU Node Pool Auto-Scalable GPU Node Pool
Spark Operator Spark Operator TensorFlow Operator Tensorflow Operator
Cloud Infrastructure Layer Running
Auto Scaling Node Pools Running Kubernetes
Machine Learning Operators running with Kubeflow
Feature Store
Using Spark Operator for Training ML Steps
PySpark ML Code
Containerizethe Python
Code
Create SparkApplication Kubernetes YAML
Deployment
Apply Deployment to
Kubernetes
Spark Operator on Kubernetes
API
Scheduler
OR OR OR
Spark Driver
Executors
Elastic Compute Resource ML Jobs
API
Scheduler
OR OR OR
kubectl apply -f ...
Deployment Configuration YAML
Spark Application Config that describes the job and the namespace where the job will run
Container that will run our Spark ML Code
Spark Drive and Executor Configuration
Connecting to Feature Store with Kubeflow Pipeline
Cost comparison with Managed Cloud service on AWS
30%
100%
15s
66s
Compute Utilization Cost Compute Startup Uptime Team Agility & Productivity
6x Productivity
Managed Services Running on AWS
Kubeflow + S3 Feast Storage ML workload
Summary● Implementing a Cloud neutral ML deployment approach
simplifies most of the complexities in a Multi-Cloud
environment
● After the initial hump, learning curve and the overall
team efficiency improves significantly
● Teams is not locked in to a particular Cloud
Infrastructure stack
● Easy to control cost and forecast future capacity
demands
THANK YOU!
Thank You!
If you are interested in learning more about how to run your Machine Learning Workloads on any Cloud Infrastructure or Onprem reach out to us
Drop us a mail [email protected]
Visit Us Onlinehttps://www.mavencode.com
Follow Ushttps://www.twitter.com/mavencode