Posted on 20-May-2020
Productionizing your Machine Learning Models
Jon Peck
Making state-of-the-art algorithms
discoverable and accessible to everyone
Fullstack Developer & Advocate
jpeck@algorithmia.com
@peckjon
bit.ly/AI-DW-19
The Problem: ML is in a huge growth phase, difficult/expensive for DevOps to keep up
Initially:
● A few models, a couple frameworks, 1-2 languages
● Dedicated hardware or VM Hosting
● IT Team or self-managed DevOps
● High time-to-deploy, manual discoverability
● Few end-users, heterogeneous APIs (if any)
Pretty soon:
● > 9,500 algorithms (95k versions) on many runtimes / frameworks
● > 100k algorithm developers: heterogeneous, largely unpredictable
● Each algorithm: 1 to 1,000 calls/second, with high variance
● Need auto-deploy, discoverability, low (10-15ms) latency
● Common API, composability, fine-grained security
Challenges of deploying models in the enterprise
Machine learning
● CPU / GPU / Specialized hardware
● Multiple frameworks, languages,
dependencies
● Called from different devices &
architectures
“Snowflake” environments
● Unique cloud hardware and services
● DevOps teams not used to the specific
considerations of ML hosting
Security and Audit
● Stringent security and access controls
● “Who called what when” for audit & compliance
Uncharted territory
● Deployment is a new problem for data science teams; not a lot of literature / examples
● Redundant work across teams, lack of re-use
● New experience buying & managing
infrastructure or working w/ DevOps team
● How to handle chargebacks and billing
"Expecting your engineering and DevOps teams to deploy ML models well is like showing up at SeaWorld with a giraffe and expecting them to handle it, since they already handle large mammals."
MACHINE LEARNING !=
PRODUCTION MACHINE LEARNING
Training vs Production
Data Scientists build and iterate over a model until it is ready to move to production.
DevOps manages servers, task scheduling, etc. to support execution of concurrent models.
TRAINING                INFERENCE
Long compute cycle      Short compute bursts
Fixed load (inelastic)  Elastic
Stateful                Stateless
Single user             Many users
Users and Services run models ad-hoc (need: elasticity), and rarely from the same language they’re developed in (need: APIs)
Deploying Models: raw server or cloud VM
1. Set up server
   ○ Select proper balance of CPU, GPU, memory, cost
   ○ Laborious to configure the first time, but fairly easy to replicate
   ○ Expensive for higher-powered machines (especially GPUs)
2. Create microservice
   ○ Write API wrapper (e.g., Flask)
   ○ Will be usable from any language or environment
   ○ How to secure, meter, and disseminate?
3. Add scaling
   ○ Cloud VMs can scale by adding more copies (usually billed per machine-hour)
   ○ Write/configure automation to predict load and create VMs
4. Repeat for each unique environment
   ○ Separate server for each model?
   ○ Or deal with dependency and resource conflicts?
Flask image source: Jeff Klukas
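Step 2 above (wrap the model in an API) can be sketched with a minimal Flask microservice. The `predict` function here is a hypothetical stand-in for a real model's inference call; the route and payload shape are illustrative, not a prescribed contract.

```python
from flask import Flask, request, jsonify

app = Flask(__name__)

def predict(features):
    # Stand-in for a real model's predict(); returns a dummy score.
    return sum(features) / max(len(features), 1)

@app.route("/predict", methods=["POST"])
def predict_endpoint():
    # Accept a JSON payload like {"features": [1.0, 2.0, 3.0]}.
    payload = request.get_json(force=True)
    score = predict(payload.get("features", []))
    return jsonify({"score": score})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)
```

This makes the model callable from any language over HTTP, but note it answers none of the slide's follow-up questions: securing, metering, and scaling are all still on you.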
Deploying Models: serverless functions
● Initially, this looks great
  ○ Simple setup: just fill out a function body
  ○ Automatic API wrappers or configurable API gateway
  ○ No DevOps: maintenance handled by provider
  ○ Instant, elastic scaling (big cost savings)
  ○ Cheap: usually billed per-second, and free when not in use
● But there are some significant limitations
  ○ Not optimized for ML
  ○ Languages: Node & some Python, Java, C#
  ○ Limited dependency support
  ○ No GPUs!
  ○ Max execution time: 5-15 minutes
  ○ Little/no consumer-facing UI
What should a mature solution have?
● Broad lang & lib support: any language & dependencies
● GPU support: fast exec & memory for GPU models
● Elasticity & concurrency: instantly scale up/down with demand; many copies of different models
● Automatic API: data scientists not responsible for serializing JSON or managing server frameworks
● Pipelining: common API across models, data passing
● Built-in security: auth, process isolation, user data
● Long timeouts: predictions may take ms or an hour
● Versioning and Grouping: public / private / group visibility of models, all old versions executable
(no broken services)
● Portability: run in-house or on any cloud(s)
● Discoverability / model-management UI: find & share well-described models, “run an example”,
cut-and-paste API code in every language
Building it: start with containers, add scaling / replication

[Architecture diagram: a User hits a Web Load Balancer and an API Load Balancer, which front the Web Servers and API Servers; those dispatch to N workers in each cloud region (Cloud Region #1, Cloud Region #2), where every worker runs Docker containers for algorithm #1 through algorithm #n.]
● ML models as serverless microservices: allows isolation, promotes model re-use and modularity
● Ability to replicate containers and move between regions allows for scaling, portability, low latency
Design containers to support all languages, flexible enough to add any library
[Diagram: a FoodClassifier model composed from FruitClassifier and VeggieClassifier containers]
...don't forget to make GPU versions, too
Make it easy for data scientists to add new models
● Continuous Deployment speeds production: Git code management, develop locally or in a Web IDE
● User and group namespaces, private / public / group visibility, pricing & dept chargebacks
Add pipelining and intelligent orchestration
Known:
1. Typical execution path
2. Compute & memory per algo
Optimize for:
1. Minimum network latency
2. Maximum throughput
3. Minimum resource use

[Diagram: pipeline stages A → B → C, each with its own CPU, memory, GPU, and I/O profile]

cat foo.txt | keyword.sh | ranker.sh
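The shell pipe above is the mental model: each model is a stage whose output feeds the next. A toy sketch in Python, with `extract_keywords` and `rank` as hypothetical stand-ins for the `keyword.sh` and `ranker.sh` stages:

```python
from collections import Counter

def extract_keywords(text):
    # Hypothetical stage A: naive keyword extraction (words longer than 3 chars).
    return [w.strip(".,").lower() for w in text.split() if len(w) > 3]

def rank(keywords):
    # Hypothetical stage B: rank keywords by frequency, most common first.
    return [word for word, _ in Counter(keywords).most_common()]

def pipeline(text):
    # Equivalent of: cat foo.txt | keyword.sh | ranker.sh
    return rank(extract_keywords(text))

print(pipeline("deep learning models love deep learning"))
# -> ['deep', 'learning', 'models', 'love']
```

An orchestrator that knows each stage's compute and memory profile can co-locate chatty stages (minimizing network latency) and fan out the expensive ones (maximizing throughput).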
Support standardized versioning
● Semantic versioning for models, just like any other software (1.2.x)
● All versions of a model are runnable at any time
● Compare versions of the model to verify and see changes in performance (speed, accuracy), and manage model drift
● App devs can stay a version behind, or use different versions for different contexts
● Rolling, non-interruptive deployments: model improvements that don't break existing code
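The "stay a version behind" pattern hinges on resolving a pin like `1.2.x` to the newest matching release. A minimal sketch (the function name and version list are illustrative):

```python
def resolve(requested, available):
    """Resolve a pinned version like '1.2.x' to the newest matching release."""
    major, minor, _ = requested.split(".")
    matches = [v for v in available
               if v.split(".")[0] == major and v.split(".")[1] == minor]
    if not matches:
        return None
    # Compare numerically so '1.2.10' beats '1.2.9'.
    return max(matches, key=lambda v: [int(part) for part in v.split(".")])

versions = ["1.1.0", "1.2.0", "1.2.9", "1.2.10", "2.0.0"]
print(resolve("1.2.x", versions))  # -> 1.2.10
```

Because every old version stays executable, callers pinned to `1.2.x` pick up patch releases automatically but never break on a `2.0.0` rollout.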
Provide logging and analytics
Key production metrics:
● Latency
● Resources used (CPU/GPU, I/O)
● System capacity
● Scale-up and scale-down events
● Authentication
● API timing metrics and calls
● Error rates
But also:
● Which teams are using the models
● Which applications are using them
● Billing & chargebacks
● Understand whether AI investments are paying off
● See business impact across the organization
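Most of the per-call metrics above (latency, call counts, error rates) can be captured at the serving layer with a simple decorator. A sketch, assuming each model endpoint is a plain Python callable; the metric names and `fruit_classifier` model are hypothetical:

```python
import time
from collections import defaultdict

# Per-model counters: call count, error count, and cumulative latency.
metrics = defaultdict(lambda: {"calls": 0, "errors": 0, "total_ms": 0.0})

def instrument(model_name):
    """Decorator recording calls, errors, and latency for one model."""
    def wrap(fn):
        def inner(*args, **kwargs):
            start = time.perf_counter()
            try:
                return fn(*args, **kwargs)
            except Exception:
                metrics[model_name]["errors"] += 1
                raise
            finally:
                metrics[model_name]["calls"] += 1
                metrics[model_name]["total_ms"] += (time.perf_counter() - start) * 1000
        return inner
    return wrap

@instrument("fruit_classifier")
def classify(x):
    return "apple" if x > 0.5 else "banana"

classify(0.9)
classify(0.1)
print(metrics["fruit_classifier"]["calls"])  # -> 2
```

The business-level questions (which team, which application, chargebacks) then reduce to tagging each call with the caller's API key and aggregating the same counters per key.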
Build abstraction layers for all infrastructure providers

Service         | AWS                   | GCP             | Azure         | VMware / On-prem
Compute         | EC2                   | Compute Engine  | VM            | ESX
Autoscaling     | Autoscaling Group     | Autoscaler      | Scale Set     | Orchestrator / BYO
Load Balancing  | Elastic Load Balancer | Load Balancer   | Load Balancer | NSX / BYO
Database        | RDS                   | Cloud SQL       | Azure SQL DB  | BYO
Object Storage  | S3                    | Cloud Storage   | Azure Blobs   | BYO
Block Storage   | Elastic Block Store   | Persistent Disk | Azure Disks   | VMFS
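In code, each row of that table becomes an interface the platform programs against, with one concrete class per provider. A minimal sketch for the object-storage row, using an in-memory stand-in where a real `S3Store` would call boto3 (class and method names are illustrative):

```python
from abc import ABC, abstractmethod

class ObjectStore(ABC):
    """Provider-agnostic object storage; subclasses wrap S3, Cloud Storage, Azure Blobs, etc."""
    @abstractmethod
    def get(self, key: str) -> bytes: ...
    @abstractmethod
    def put(self, key: str, data: bytes) -> None: ...

class InMemoryStore(ObjectStore):
    # Test/demo backend; an S3Store would call boto3 here instead.
    def __init__(self):
        self._blobs = {}
    def get(self, key):
        return self._blobs[key]
    def put(self, key, data):
        self._blobs[key] = data

def load_records(store: ObjectStore) -> bytes:
    # Platform code depends only on the interface, never the provider SDK.
    return store.get("records.csv")

store = InMemoryStore()
store.put("records.csv", b"a,b\n1,2\n")
print(load_records(store))
```

Swapping clouds (or running on-prem) then means swapping which concrete class is wired in at startup, not touching platform code.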
Expose user-friendly storage abstraction
# No storage abstraction
s3 = boto3.client("s3")
obj = s3.get_object(Bucket="bucket-name", Key="records.csv")
data = obj["Body"].read()
# With storage abstraction
data = client.file("blob://records.csv").get()
s3://foo/bar
blob://foo/bar
hdfs://foo/bar
dropbox://foo/bar
etc.
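Under the hood, a client like that can dispatch on the URI scheme. A sketch, where the backends are hypothetical placeholders returning labels instead of real blob contents:

```python
from urllib.parse import urlparse

# Hypothetical registry mapping URI schemes to backend fetchers;
# real entries would wrap boto3, hdfs clients, the Dropbox SDK, etc.
BACKENDS = {
    "blob": lambda path: f"<platform blob: {path}>",
    "s3":   lambda path: f"<s3 object: {path}>",
}

def get_file(uri):
    """Dispatch a 'scheme://path' URI to the matching storage backend."""
    parsed = urlparse(uri)
    try:
        backend = BACKENDS[parsed.scheme]
    except KeyError:
        raise ValueError(f"unsupported scheme: {parsed.scheme}")
    return backend(parsed.netloc + parsed.path)

print(get_file("blob://records.csv"))
```

Adding a new provider is then one registry entry; model code keeps calling `client.file(...)` unchanged.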
Build a model portfolio UI for easy discovery & testing
● Models are only as useful as their docs: creators write descriptions which live with the model
● Categories / tags / search for users to find the models they need (and see only the ones allowed)
● Test models right inside the catalog, before integrating into app code
● Encourage model re-use and improve efficiency across teams, while respecting access rights
Design a consistent API with clients in every language
● Models are often written in one language but consumed in another (or many)
● Provide cut-and-paste code for any model / language combination
● ZERO time from model deployment to usability: drastically reduce the length of the total dev pipeline
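Generated clients in every language stay consistent because they all derive the same endpoint from the same model reference. A sketch of that mapping (the URL shape is illustrative, not the exact Algorithmia scheme):

```python
def endpoint(base_url, model_ref):
    """Map a model reference like 'user/FruitClassifier/1.2.0' to its REST endpoint.

    Every generated client (Python, Java, JS, ...) applies this same rule,
    so cut-and-paste snippets work identically across languages.
    """
    owner, name, version = model_ref.split("/")
    return f"{base_url}/v1/algo/{owner}/{name}/{version}"

print(endpoint("https://api.example.com", "user/FruitClassifier/1.2.0"))
# -> https://api.example.com/v1/algo/user/FruitClassifier/1.2.0
```

Because the mapping is mechanical, a model is callable the instant it is deployed: no per-language integration work sits between deployment and first request.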
Make the public platform available to anyone, anywhere
ALGORITHMIA ENTERPRISE - your company's private ML inventory & model-as-a-service platform
● Deploy: develop models in any language, framework, or infrastructure
● Scale: expose models as highly-reliable versioned APIs that autoscale to 100s of reqs/second
● Discover: describe your model in a central catalog where peers can easily discover & use it
● Monitor: house thousands of models under one roof with a uniform REST interface and a central dashboard
Make your platform deployable on any org private cloud
Try it yourself: deploy a model on Algorithmia
http://bit.ly/algodev -> digit_recognition
Looking for more?
Jon Peck Developer Advocate
FREE STUFF
$50 free at Algorithmia.com signup code: AI-DW-19
WE ARE HIRING
algorithmia.com/jobs ● Seattle or Remote ● Bright, collaborative env ● Unlimited PTO ● Dog-friendly
jpeck@algorithmia.com
@peckjon
bit.ly/AI-DW-19
THANK YOU!
Tell the world that the future is here