Machine Learning Lifecycle - boss-workshop.github.io

58
Simplifying the Machine Learning Lifecycle

Transcript of Machine Learning Lifecycle - boss-workshop.github.io

Page 1: Machine Learning Lifecycle - boss-workshop.github.io

Simplifying the Machine Learning Lifecycle

Page 2: Machine Learning Lifecycle - boss-workshop.github.io

Agenda/ Broad Adoption of ML … and its issues

/ The need for standardization

/ ML development challenges

/ How MLflow tackles these

Page 3: Machine Learning Lifecycle - boss-workshop.github.io

Login to Databricks Community Edition

• Sign up for Databricks Community Edition for free • We will use this for the tutorial• Once you sign up, you can continue to use it to learn and

experiment on a dedicated data sciences engineering environment

https://databricks.com/try

Page 4: Machine Learning Lifecycle - boss-workshop.github.io

Go to databricks.com/try

Page 5: Machine Learning Lifecycle - boss-workshop.github.io

Sign up for Community Edition

Page 6: Machine Learning Lifecycle - boss-workshop.github.io

Sign up for Community Edition

Page 7: Machine Learning Lifecycle - boss-workshop.github.io

Log into DBCE

Page 8: Machine Learning Lifecycle - boss-workshop.github.io

Create a Cluster on DBCE

Page 9: Machine Learning Lifecycle - boss-workshop.github.io

Create a Cluster on DBCE

Page 10: Machine Learning Lifecycle - boss-workshop.github.io

Create a Cluster on DBCE

Page 11: Machine Learning Lifecycle - boss-workshop.github.io

Create a Cluster on DBCE

Page 12: Machine Learning Lifecycle - boss-workshop.github.io

Attach a Notebook to your Cluster

Page 13: Machine Learning Lifecycle - boss-workshop.github.io

Attach a Notebook to your Cluster

Page 14: Machine Learning Lifecycle - boss-workshop.github.io

Attach a Notebook to your Cluster

Page 15: Machine Learning Lifecycle - boss-workshop.github.io

Attach a Notebook to your Cluster

Page 16: Machine Learning Lifecycle - boss-workshop.github.io

Broad Adoption of ML

and many many more customers in different industries and segments

Internet of ThingsDigital PersonalizationHealthcare and Genomics Fraud Prevention

Huge disruptive innovations are affecting most enterprises on the planet

Page 17: Machine Learning Lifecycle - boss-workshop.github.io

MLCode

ConfigurationData Collection

Data Verification

Feature Extraction

Machine Resource

Management

Analysis Tools

ProcessManagement Tools

ServingInfrastructure

Monitoring

“Hidden Technical Debt in Machine Learning Systems,” Google NIPS 2015

Only a small fraction of real-world ML systems is composed of the ML code, as shown by the small green box in the middle. The required surrounding infrastructure is vast and complex.

Hardest Part of ML isn’t ML, it’s Data

Page 18: Machine Learning Lifecycle - boss-workshop.github.io

Data & ML Tech and People are in Silos

DATA ENGINEERS

xDATA SCIENTISTS

Page 19: Machine Learning Lifecycle - boss-workshop.github.io

ML Lifecycle is Manual, Inconsistent and Disconnected

● Ad hoc approach to track experiments

● Very hard to reproduce experiments

Prep Data● Multiple tightly coupled

deployment options ● Different monitoring approach

for each framework

Build Model Deploy Model● Low level integrations for

Data and ML● Difficult to track data used

for a model

Page 20: Machine Learning Lifecycle - boss-workshop.github.io

The need for standardization

Page 21: Machine Learning Lifecycle - boss-workshop.github.io

Day in the life of a data scientist (tracking edition)

Elasticnet model (alpha=0.01, l1_ratio=1.0): RMSE: ?? MAE: 51.051828604086325 R2: 0.3951809598912357

Elasticnet model (alpha=?, l1_ratio=0.75): RMSE: 65.28994906390733 MAE: 53.759148284349266 R2: ??

Elasticnet model (alpha=0.01, l1_ratio=?): RMSE: 71.40362571026475 MAE: ?? R2: 0.2291130640003659

Page 22: Machine Learning Lifecycle - boss-workshop.github.io

Day in the life of a data scientist (tracking edition)

Elasticnet model (alpha=0.01, l1_ratio=1.0): RMSE: ?? MAE: 51.051828604086325 R2: 0.3951809598912357

Elasticnet model (alpha=?, l1_ratio=0.75): RMSE: 65.28994906390733 MAE: 53.759148284349266 R2: ??

Elasticnet model (alpha=0.01, l1_ratio=?): RMSE: 71.40362571026475 MAE: ?? R2: 0.2291130640003659

Did anything change in the feature engineering?

Page 23: Machine Learning Lifecycle - boss-workshop.github.io

Day in the life of a data scientist (tracking edition)

Elasticnet model (alpha=0.01, l1_ratio=1.0): RMSE: ?? MAE: 51.051828604086325 R2: 0.3951809598912357

Elasticnet model (alpha=?, l1_ratio=0.75): RMSE: 65.28994906390733 MAE: 53.759148284349266 R2: ??

Elasticnet model (alpha=0.01, l1_ratio=?): RMSE: 71.40362571026475 MAE: ?? R2: 0.2291130640003659

How did the hyperparameters change?

Page 24: Machine Learning Lifecycle - boss-workshop.github.io

Day in the life of a data scientist (tracking edition)

Elasticnet model (alpha=0.01, l1_ratio=1.0): RMSE: ?? MAE: 51.051828604086325 R2: 0.3951809598912357

Elasticnet model (alpha=?, l1_ratio=0.75): RMSE: 65.28994906390733 MAE: 53.759148284349266 R2: ??

Elasticnet model (alpha=0.01, l1_ratio=?): RMSE: 71.40362571026475 MAE: ?? R2: 0.2291130640003659

What data was this model trained on?

Page 25: Machine Learning Lifecycle - boss-workshop.github.io

Day in the life of a data scientist (tracking edition)

Elasticnet model (alpha=0.01, l1_ratio=1.0): RMSE: ?? MAE: 51.051828604086325 R2: 0.3951809598912357

Elasticnet model (alpha=?, l1_ratio=0.75): RMSE: 65.28994906390733 MAE: 53.759148284349266 R2: ??

Elasticnet model (alpha=0.01, l1_ratio=?): RMSE: 71.40362571026475 MAE: ?? R2: 0.2291130640003659

How did the offline metrics change?

Page 26: Machine Learning Lifecycle - boss-workshop.github.io

Day in the life of a data scientist (tracking edition)

Elasticnet model (alpha=0.01, l1_ratio=1.0): RMSE: ?? MAE: 51.051828604086325 R2: 0.3951809598912357

Elasticnet model (alpha=?, l1_ratio=0.75): RMSE: 65.28994906390733 MAE: 53.759148284349266 R2: ??

Elasticnet model (alpha=0.01, l1_ratio=?): RMSE: 71.40362571026475 MAE: ?? R2: 0.2291130640003659

What else am I missing?

Page 27: Machine Learning Lifecycle - boss-workshop.github.io

The difference between releasing Software and deploying ML Models

Page 28: Machine Learning Lifecycle - boss-workshop.github.io

Write code

Software

Write unit tests

Send for review

Get approvals

Commit

Release testing

Release

Page 29: Machine Learning Lifecycle - boss-workshop.github.io

Write code

Software ML Models

Write unit tests

Send for review

Get approvals

Commit

Release testing

Release

Analyze data

Put data into the right format

Write model code

Train and evaluate model

Experiment with params, model structure

Deploy … by email?

Monitor performance and trigger retraining

Page 30: Machine Learning Lifecycle - boss-workshop.github.io

Write code

Software ML Models

Write unit tests

Send for review

Get approvals

Commit

Release testing

Release

Analyze data

Put data into the right format

Write model code

Train and evaluate model

Experiment with params, model structure

Deploy … by email?

Monitor performance and trigger retraining

Meet a functional specification

Optimize a metric, e.g. CTR

Goal

Page 31: Machine Learning Lifecycle - boss-workshop.github.io

Write code

Software ML Models

Write unit tests

Send for review

Get approvals

Commit

Release testing

Release

Analyze data

Put data into the right format

Write model code

Train and evaluate model

Experiment with params, model structure

Deploy … by email?

Monitor performance and trigger retraining

Meet a functional specification

Optimize a metric, e.g. CTR

Goal

Depends on code Depends on data, code, model, params,

...

Quality

Page 32: Machine Learning Lifecycle - boss-workshop.github.io

Write code

Software ML Models

Write unit tests

Send for review

Get approvals

Commit

Release testing

Release

Analyze data

Put data into the right format

Write model code

Train and evaluate model

Experiment with params, model structure

Deploy … by email?

Monitor performance and trigger retraining

Meet a functional specification

Optimize a metric, e.g. CTR

Goal

Depends on code Depends on data, code, model, params,

...

Quality

Typically one software stack

Combination of many libraries, tools,

...

Tools

Page 33: Machine Learning Lifecycle - boss-workshop.github.io

Write code

Software ML Models

Write unit tests

Send for review

Get approvals

Commit

Release testing

Release

Analyze data

Put data into the right format

Write model code

Train and evaluate model

Experiment with params, model structure

Deploy … by email?

Monitor performance and trigger retraining

Meet a functional specification

Optimize a metric, e.g. CTR

Goal

Depends on code Depends on data, code, model, params,

...

Quality

Typically one software stack

Combination of many libraries, tools,

...

Tools

Works deterministically

Keeps changing with data, etc.

Outcome

Page 34: Machine Learning Lifecycle - boss-workshop.github.io

In summary, deploying ML Models is hard!

Page 35: Machine Learning Lifecycle - boss-workshop.github.io

ML Lifecycle and Challenges

Delta

Tuning Model Mgmt

Raw Data ETL TrainFeaturize Score/ServeBatch + Realtime

Monitor Alert, Debug

Deploy

AutoML, Hyper-p. search

Experiment Tracking

Remote Cloud Execution

Project Mgmt(scale teams)

Model Exchange

DataDrift

ModelDrift

Orchestration (Airflow, Jobs)

A/BTesting

CI/CD/Jenkins push to prod

Feature Repository

Lifecycle mgmt.

RetrainUpdate FeaturesProduction Logs

Zoo of Ecosystem Frameworks

Collaboration Scale Governance

An open source platform for the machine learning lifecycle

Page 36: Machine Learning Lifecycle - boss-workshop.github.io

Introducing MLflowUnveiled in June 2018, MLflow is the only open source framework designed to manage the complete Machine Learning Lifecycle.

ModelRegistry

Page 37: Machine Learning Lifecycle - boss-workshop.github.io

Introducing MLflowUnveiled in June 2018, MLflow is the only open source framework designed to manage the complete Machine Learning Lifecycle.

Page 38: Machine Learning Lifecycle - boss-workshop.github.io

140120100806040200

0 5 10 15 20 25 30 35 40 45

Months since Project Launch

# of

Con

trib

utor

s

Page 39: Machine Learning Lifecycle - boss-workshop.github.io

Components

Tracking

Record and queryexperiments: code,

data, config, results

Projects

Packaging formatfor reproducible

runs on any platform

Models

General format that standardizes

deployment paths

Model Registry

Centralized and collaborative

model lifecycle management

mlflow.org github.com/mlflow twitter.com/MLflow

Page 40: Machine Learning Lifecycle - boss-workshop.github.io

Components

Tracking

Record and queryexperiments: code,

data, config, results

Projects

Packaging formatfor reproducible

runs on any platform

Models

General format that standardizes

deployment paths

Model Registry

Centralized and collaborative

model lifecycle management

new

mlflow.org github.com/mlflow twitter.com/MLflow

Page 41: Machine Learning Lifecycle - boss-workshop.github.io

Tracking

Notebooks

Local Apps

Cloud Jobs

UI

API

Tracking Server

Parameters Metrics Artifacts

ModelsMetadata Spark Data Source

Page 42: Machine Learning Lifecycle - boss-workshop.github.io

Key Concepts in Tracking

Parameters: key-value inputs to your codeMetrics: numeric values (can update over time)Artifacts: arbitrary files, including modelsSource: what code ran?

Page 43: Machine Learning Lifecycle - boss-workshop.github.io

# Scikit Learn Linear Regression via ElasticNet lr = ElasticNet(alpha=alpha, l1_ratio=l1_ratio, random_state=42) lr.fit(train_x, train_y)

# Predict predicted_qualities = lr.predict(test_x)

# Evaluate Metrics (rmse, mae, r2) = eval_metrics(test_y, predicted_qualities)

Page 44: Machine Learning Lifecycle - boss-workshop.github.io

with mlflow.start_run() as run:

# Scikit Learn Linear Regression via ElasticNet lr = ElasticNet(alpha=alpha, l1_ratio=l1_ratio, random_state=42) lr.fit(train_x, train_y)

# Predict predicted_qualities = lr.predict(test_x)

# Evaluate Metrics (rmse, mae, r2) = eval_metrics(test_y, predicted_qualities)

# Log mlflow.log_param("alpha", alpha) ...

Page 45: Machine Learning Lifecycle - boss-workshop.github.io

GitHub Demohttps://github.com/dennyglee/mlflow-diabetes-example

Page 46: Machine Learning Lifecycle - boss-workshop.github.io

Comparing Runs Contour Plot

Page 47: Machine Learning Lifecycle - boss-workshop.github.io

Projects

Project Spec

Code MetadataConfig

Local Execution

Remote Execution

Page 48: Machine Learning Lifecycle - boss-workshop.github.io

Example MLflow Projectmy_project/├── MLproject│ │ │ │ │├── conda.yaml├── main.py└── model.py ...

conda_env: conda.yaml

entry_points: main: parameters: training_data: path lambda: {type: float, default: 0.1} command: python main.py {training_data} {lambda}

$ mlflow run git://<my_project>

mlflow.run(“git://<my_project>”, ...)

Page 49: Machine Learning Lifecycle - boss-workshop.github.io

Model Format

Flavor 2Flavor 1

Simple model flavors usable by many tools

Containers

Batch & Stream Scoring

Cloud Inference Services

In-Line Code

Models

Page 50: Machine Learning Lifecycle - boss-workshop.github.io

Example MLflow Modelmy_model/├── MLmodel│ │ │ │ │└── estimator/ ├── saved_model.pb └── variables/ ...

Usable by tools that understandTensorFlow model format

Usable by any tool that can runPython (Docker, Spark, etc!)

run_id: 769915006efd4c4bbd662461time_created: 2018-06-28T12:34flavors: tensorflow: saved_model_dir: estimator signature_def_key: predict python_function: loader_module: mlflow.tensorflow

Page 51: Machine Learning Lifecycle - boss-workshop.github.io

Automated Jobs

REST Serving

Downstream Users

Reviewers + CI/CD Tools

Model Registry

Experimental Staging A/B Tests Production

Model RegistryData Scientists Deployment Engineers

Page 52: Machine Learning Lifecycle - boss-workshop.github.io

Model Registry: Benefits

One Collaborative Hub

● Central Model Repository

● Overview of versions in Staging/Production/etc.

● Search/filter/pagination1

2

3

3

1

2

3

Page 53: Machine Learning Lifecycle - boss-workshop.github.io

Model Registry: Benefits

2

1

Management of the entire ML Lifecycle (MLOps)

● Overview of active model versions and their deployment stage

● Request/Approval workflow for transitioning deployment stages

1

2

Page 54: Machine Learning Lifecycle - boss-workshop.github.io

1

Visibility

● Full activity log of stage transition requests, approvals, etc.

1

Model Registry: Benefits

Page 55: Machine Learning Lifecycle - boss-workshop.github.io

1.a

1.b

1.c

Governance and Auditability

● Full provenance from Model marked production in the Registry to …

●○ Run that produced the model○ Notebook that produced the

run○ Exact revision history of the

notebook that produced the run

1.a

1.b

1.c

Model Registry: Benefits

Page 56: Machine Learning Lifecycle - boss-workshop.github.io

Notebook Demohttps://github.com/dennyglee/tech-talks/blob/master/sa

mples/MLflow%20Diabetes%20Example%20(with%20MLflow%20Registry).ipynb

Page 57: Machine Learning Lifecycle - boss-workshop.github.io

Towards more principled Data Science and ML

: An Open Source ML Platform

mlflow.org github.com/mlflow twitter.com/MLflow

Page 58: Machine Learning Lifecycle - boss-workshop.github.io

Hands-on Workshop

bit.ly/mlflow-boss-2020