Presentación de PowerPoint - AI & Big Data Congress
Transcript of Presentación de PowerPoint - AI & Big Data Congress
![Page 1: Presentación de PowerPoint - AI & Big Data Congress](https://reader031.fdocuments.in/reader031/viewer/2022012020/616893c8d394e9041f70c61f/html5/thumbnails/1.jpg)
portada
![Page 2: Presentación de PowerPoint - AI & Big Data Congress](https://reader031.fdocuments.in/reader031/viewer/2022012020/616893c8d394e9041f70c61f/html5/thumbnails/2.jpg)
WORKSHOP 2MODELOS DE MACHINE LEARNING EN PRODUCCIÓN:
DIFERENTES ENFOQUES Y EL ROL DE LAS HERRAMIENTAS DE GESTIÓN DEL CICLO DE VIDA ML
![Page 3: Presentación de PowerPoint - AI & Big Data Congress](https://reader031.fdocuments.in/reader031/viewer/2022012020/616893c8d394e9041f70c61f/html5/thumbnails/3.jpg)
Rohit KumarResearcher, Data Science and Big Data Analytics UnitEurecateurecat.org | @Eurecat_News | @Eurecat_Events
![Page 4: Presentación de PowerPoint - AI & Big Data Congress](https://reader031.fdocuments.in/reader031/viewer/2022012020/616893c8d394e9041f70c61f/html5/thumbnails/4.jpg)
BIG DATA & AI Congress 2019
MACHINE LEARNING MODELS IN PRODUCTION
Dr. Rohit Kumar
Researcher, Eurecat
![Page 5: Presentación de PowerPoint - AI & Big Data Congress](https://reader031.fdocuments.in/reader031/viewer/2022012020/616893c8d394e9041f70c61f/html5/thumbnails/5.jpg)
5BigData & AI Congress 2019
Who is this guy!!
Physicist Java Developer/System Architect (6+ years) Big Data Researcher
(5+ years)
I am not a ML expert or a Data Scientist!!I am just an engineer who can follow complex maths…
![Page 6: Presentación de PowerPoint - AI & Big Data Congress](https://reader031.fdocuments.in/reader031/viewer/2022012020/616893c8d394e9041f70c61f/html5/thumbnails/6.jpg)
6BigData & AI Congress 2019
Quick Poll?
How many Data scientist?
How many Data engineer?
How many DevOps?
How many managers/architects?
Rockstars who does everything?
![Page 7: Presentación de PowerPoint - AI & Big Data Congress](https://reader031.fdocuments.in/reader031/viewer/2022012020/616893c8d394e9041f70c61f/html5/thumbnails/7.jpg)
7BigData & AI Congress 2019
Quick Feedback: How many have this issue?
![Page 8: Presentación de PowerPoint - AI & Big Data Congress](https://reader031.fdocuments.in/reader031/viewer/2022012020/616893c8d394e9041f70c61f/html5/thumbnails/8.jpg)
8BigData & AI Congress 2019
What I am going to talk about!!
Data wrangling Training Prediction
ML Life Cycle!! Not Data Science Life Cycle..
Acknowledgment: https://www.kdnuggets.com/2019/06/approaches-deploying-machine-learning-production.html by Julien Kervizic
![Page 9: Presentación de PowerPoint - AI & Big Data Congress](https://reader031.fdocuments.in/reader031/viewer/2022012020/616893c8d394e9041f70c61f/html5/thumbnails/9.jpg)
9BigData & AI Congress 2019
Training
1. One Off training
Where do you save your notebook/code?Where and how do you save your models?
![Page 10: Presentación de PowerPoint - AI & Big Data Congress](https://reader031.fdocuments.in/reader031/viewer/2022012020/616893c8d394e9041f70c61f/html5/thumbnails/10.jpg)
10BigData & AI Congress 2019
Training
2. Batch training (continuous learning system)
Not simply retraining the model weights !!
Train Predict
But also feature processing, feature selection, model selections and parameter optimization.
![Page 11: Presentación de PowerPoint - AI & Big Data Congress](https://reader031.fdocuments.in/reader031/viewer/2022012020/616893c8d394e9041f70c61f/html5/thumbnails/11.jpg)
11BigData & AI Congress 2019
AutoML
AutoWEKA is an approach for the simultaneous selection of a machine learning algorithm and its hyperparameters; combined withthe WEKA package it automatically yields good models for a wide variety of data sets.
TPOT is a data-science assistant which optimizes machine learningpipelines using genetic programming. (Good for Python users)
H2O AutoML provides automated model selection and ensembling forthe H2O machine learning and data analytics platform. (Nice for Java users)
TransmogrifAI is an AutoML library running on top of Spark.
![Page 12: Presentación de PowerPoint - AI & Big Data Congress](https://reader031.fdocuments.in/reader031/viewer/2022012020/616893c8d394e9041f70c61f/html5/thumbnails/12.jpg)
12BigData & AI Congress 2019
To manage different data workflows
To automate Machine Learning
![Page 13: Presentación de PowerPoint - AI & Big Data Congress](https://reader031.fdocuments.in/reader031/viewer/2022012020/616893c8d394e9041f70c61f/html5/thumbnails/13.jpg)
13BigData & AI Congress 2019
![Page 14: Presentación de PowerPoint - AI & Big Data Congress](https://reader031.fdocuments.in/reader031/viewer/2022012020/616893c8d394e9041f70c61f/html5/thumbnails/14.jpg)
14BigData & AI Congress 2019
Azure ML
Offcourse there are easy to use fancy platforms !! (if you have money and do not want to get your hands too dirty !!)
![Page 15: Presentación de PowerPoint - AI & Big Data Congress](https://reader031.fdocuments.in/reader031/viewer/2022012020/616893c8d394e9041f70c61f/html5/thumbnails/15.jpg)
15BigData & AI Congress 2019
Training
3. Real time trainingTrain
..
Incremental Training using “Online Machine Learning Algorithms”
![Page 16: Presentación de PowerPoint - AI & Big Data Congress](https://reader031.fdocuments.in/reader031/viewer/2022012020/616893c8d394e9041f70c61f/html5/thumbnails/16.jpg)
16BigData & AI Congress 2019
Quick Poll: What kind of ML training approach you have used?
A. One Off training.
B. Batch training adhoc
C. Batch training using some platform.
D. Real time training
![Page 17: Presentación de PowerPoint - AI & Big Data Congress](https://reader031.fdocuments.in/reader031/viewer/2022012020/616893c8d394e9041f70c61f/html5/thumbnails/17.jpg)
17BigData & AI Congress 2019
Saving your Models
In Python world:1) Pickle: 2) Joblib: significantly faster on large numpy arrays
In R world:1) saveRDS/readRDS – can be assigned to a new model variable2) save/load – returns the same model variable
In Java (H20 world):1) Pojo : Plan old java object2) Mojo: Model ObJect, Optimized
![Page 18: Presentación de PowerPoint - AI & Big Data Congress](https://reader031.fdocuments.in/reader031/viewer/2022012020/616893c8d394e9041f70c61f/html5/thumbnails/18.jpg)
18BigData & AI Congress 2019
Saving your Models
“Change, which is the only constant in life, can be secured and facilitated bystandardization.”
![Page 19: Presentación de PowerPoint - AI & Big Data Congress](https://reader031.fdocuments.in/reader031/viewer/2022012020/616893c8d394e9041f70c61f/html5/thumbnails/19.jpg)
19BigData & AI Congress 2019
Saving your Models
ONNX (Open Neural Network Exchange) format: developed byFacebook and Microsoft in collaboration it is an open-source standard for representing deep learning models that will let you transfer them between CNTK, Caffe2 and PyTorch.
Link: https://github.com/onnx/tutorials
TensorFlow PyTorch
![Page 20: Presentación de PowerPoint - AI & Big Data Congress](https://reader031.fdocuments.in/reader031/viewer/2022012020/616893c8d394e9041f70c61f/html5/thumbnails/20.jpg)
20BigData & AI Congress 2019
Saving your Models
![Page 21: Presentación de PowerPoint - AI & Big Data Congress](https://reader031.fdocuments.in/reader031/viewer/2022012020/616893c8d394e9041f70c61f/html5/thumbnails/21.jpg)
21BigData & AI Congress 2019
![Page 22: Presentación de PowerPoint - AI & Big Data Congress](https://reader031.fdocuments.in/reader031/viewer/2022012020/616893c8d394e9041f70c61f/html5/thumbnails/22.jpg)
22BigData & AI Congress 2019
Saving your Models
PMML (Predictive model markup language): Developed by Data Mining Group (DMG) a consortium of commercial and open-source data mining companies and supported by IBM, AWS and Google.
PFA (Portable Format for Analytics): it is more flexible (using JSON instead of XML) and also handling data preparation along with the modelsthemselves.
![Page 23: Presentación de PowerPoint - AI & Big Data Congress](https://reader031.fdocuments.in/reader031/viewer/2022012020/616893c8d394e9041f70c61f/html5/thumbnails/23.jpg)
23BigData & AI Congress 2019
Saving your Models
MLWriter/MLReader: Inherente to Spark
Supports not only saving a Model but a ML Pipeline.
The result of save for pipeline model is a JSON file for metadata whileParquet for model data, e.g. coefficients.
![Page 24: Presentación de PowerPoint - AI & Big Data Congress](https://reader031.fdocuments.in/reader031/viewer/2022012020/616893c8d394e9041f70c61f/html5/thumbnails/24.jpg)
24BigData & AI Congress 2019
Is a model or Pipeline saved using Apache Spark ML persistence in Sparkversion X loadable by Spark version Y?• Major versions: No guarantees, but best-effort.
• Minor and patch versions: Yes; (backwards compatible)
• Note about the format: There are no guarantees for a stable persistence format, but model loading itself is designed to be backwards compatible.
Does a model or Pipeline in Spark version X behave identically in Spark versionY?
• Major versions: No guarantees, but best-effort.
• Minor and patch versions: Identical behavior, except for bug fixes.
![Page 25: Presentación de PowerPoint - AI & Big Data Congress](https://reader031.fdocuments.in/reader031/viewer/2022012020/616893c8d394e9041f70c61f/html5/thumbnails/25.jpg)
25BigData & AI Congress 2019
Quick Poll: Machine Learning Deployment
A. One time PPT/PDF/Report for Managers.
B. Deployed in Server for Batch Prediction.
C. Deployed in a streaming platform for real time prediction.
![Page 26: Presentación de PowerPoint - AI & Big Data Congress](https://reader031.fdocuments.in/reader031/viewer/2022012020/616893c8d394e9041f70c61f/html5/thumbnails/26.jpg)
26BigData & AI Congress 2019
Batch vs. Real-time Prediction
Load implications Infrastructure Implications
Cost Implications Evaluation Implication
![Page 27: Presentación de PowerPoint - AI & Big Data Congress](https://reader031.fdocuments.in/reader031/viewer/2022012020/616893c8d394e9041f70c61f/html5/thumbnails/27.jpg)
27BigData & AI Congress 2019
Batch Prediction
Data
Load Model
Featureextracted
SaveApply to the modelon data
Used by other systems
Use workflow managers like Airflow, or as simple as crontab etc.
![Page 28: Presentación de PowerPoint - AI & Big Data Congress](https://reader031.fdocuments.in/reader031/viewer/2022012020/616893c8d394e9041f70c61f/html5/thumbnails/28.jpg)
28BigData & AI Congress 2019
Real Time Prediction
1. Database Trigger based prediction
PostGres supprts native Python integration and can run Python script on trig
Ms SQL Server has Machine Learning Services (In-Database)
Teradata can run R/Python script through an external script command.
Oracle supports PMML model through its data mining extension.
![Page 29: Presentación de PowerPoint - AI & Big Data Congress](https://reader031.fdocuments.in/reader031/viewer/2022012020/616893c8d394e9041f70c61f/html5/thumbnails/29.jpg)
29BigData & AI Congress 2019
Real Time Prediction
2. Message queue based prediction service
A combination of Kafka for message queue and Spark Streaming forprocessing.
Azure eventhub with Azure functions
Google Pub/Sub with Apache Beam.
![Page 30: Presentación de PowerPoint - AI & Big Data Congress](https://reader031.fdocuments.in/reader031/viewer/2022012020/616893c8d394e9041f70c61f/html5/thumbnails/30.jpg)
30BigData & AI Congress 2019
Real Time Prediction
3. Web services based deployment
Get unique ids and let web service pull related data from backend system and make prediction.
Get the complete payload with data and make the prediction.
A combination of both.
![Page 31: Presentación de PowerPoint - AI & Big Data Congress](https://reader031.fdocuments.in/reader031/viewer/2022012020/616893c8d394e9041f70c61f/html5/thumbnails/31.jpg)
31BigData & AI Congress 2019
Real Time Prediction
3.1 Using Functions!!
+ ML Net
![Page 32: Presentación de PowerPoint - AI & Big Data Congress](https://reader031.fdocuments.in/reader031/viewer/2022012020/616893c8d394e9041f70c61f/html5/thumbnails/32.jpg)
32BigData & AI Congress 2019
![Page 33: Presentación de PowerPoint - AI & Big Data Congress](https://reader031.fdocuments.in/reader031/viewer/2022012020/616893c8d394e9041f70c61f/html5/thumbnails/33.jpg)
33BigData & AI Congress 2019
Real Time Prediction
3.2 Using Container
Ununiform environments across models. Ununiform library requirements across models. Ununiform resource requirements across models. Scaling at model level
![Page 34: Presentación de PowerPoint - AI & Big Data Congress](https://reader031.fdocuments.in/reader031/viewer/2022012020/616893c8d394e9041f70c61f/html5/thumbnails/34.jpg)
34BigData & AI Congress 2019
Real Time Prediction
3.3 Directly In-App
Google ML Kit or the likes of Caffe2or Apple’s core ML allows to leveragemodels within native applications
Tensorflow.js or ONNXJs allow running modelsdirectly on browser
![Page 35: Presentación de PowerPoint - AI & Big Data Congress](https://reader031.fdocuments.in/reader031/viewer/2022012020/616893c8d394e9041f70c61f/html5/thumbnails/35.jpg)
35BigData & AI Congress 2019
Real Time Prediction
3.4 Using Notebooks!!!!
![Page 36: Presentación de PowerPoint - AI & Big Data Congress](https://reader031.fdocuments.in/reader031/viewer/2022012020/616893c8d394e9041f70c61f/html5/thumbnails/36.jpg)
36BigData & AI Congress 2019
So Far So Good!!
![Page 37: Presentación de PowerPoint - AI & Big Data Congress](https://reader031.fdocuments.in/reader031/viewer/2022012020/616893c8d394e9041f70c61f/html5/thumbnails/37.jpg)
37BigData & AI Congress 2019
Why ML Life Cycle is more complicated than SDLC!
Variety of tools and packages at disposal.
Hard to track useful experiments.
Reproducibility is hard.
Deploying is hard.
![Page 38: Presentación de PowerPoint - AI & Big Data Congress](https://reader031.fdocuments.in/reader031/viewer/2022012020/616893c8d394e9041f70c61f/html5/thumbnails/38.jpg)
38BigData & AI Congress 2019
ML Life Cycle Management Platforms
Azure ML services (launched in 2015)
![Page 39: Presentación de PowerPoint - AI & Big Data Congress](https://reader031.fdocuments.in/reader031/viewer/2022012020/616893c8d394e9041f70c61f/html5/thumbnails/39.jpg)
39BigData & AI Congress 2019
ML Life Cycle Management Platforms
Amazon SageMaker (launched in 2017)
![Page 40: Presentación de PowerPoint - AI & Big Data Congress](https://reader031.fdocuments.in/reader031/viewer/2022012020/616893c8d394e9041f70c61f/html5/thumbnails/40.jpg)
40BigData & AI Congress 2019
ML Life Cycle Management Platforms
Google AI (launched in 2018)
![Page 41: Presentación de PowerPoint - AI & Big Data Congress](https://reader031.fdocuments.in/reader031/viewer/2022012020/616893c8d394e9041f70c61f/html5/thumbnails/41.jpg)
41BigData & AI Congress 2019
ML Life Cycle Management Platforms
Cloudera DS Workbench (launched in 2018)
![Page 42: Presentación de PowerPoint - AI & Big Data Congress](https://reader031.fdocuments.in/reader031/viewer/2022012020/616893c8d394e9041f70c61f/html5/thumbnails/42.jpg)
42BigData & AI Congress 2019
ML Life Cycle Management Platforms
Databricks Managed ML Flow (launched in 2019) : ML Flow in itself is a library and not a hosted platform.
![Page 43: Presentación de PowerPoint - AI & Big Data Congress](https://reader031.fdocuments.in/reader031/viewer/2022012020/616893c8d394e9041f70c61f/html5/thumbnails/43.jpg)
43BigData & AI Congress 2019
ML Life Cycle Management Platforms
![Page 44: Presentación de PowerPoint - AI & Big Data Congress](https://reader031.fdocuments.in/reader031/viewer/2022012020/616893c8d394e9041f70c61f/html5/thumbnails/44.jpg)
44BigData & AI Congress 2019
![Page 45: Presentación de PowerPoint - AI & Big Data Congress](https://reader031.fdocuments.in/reader031/viewer/2022012020/616893c8d394e9041f70c61f/html5/thumbnails/45.jpg)
portada