Posted on 20-May-2020
Productionizing your Machine Learning Models
Jon Peck
Making state-of-the-art algorithms
discoverable and accessible to everyone
Fullstack Developer & Advocate
jpeck@algorithmia.com
@peckjon
bit.ly/AI-DW-19
The Problem: ML is in a huge growth phase, difficult/expensive for DevOps to keep up
Initially:
● A few models, a couple frameworks, 1-2 languages
● Dedicated hardware or VM Hosting
● IT Team or self-managed DevOps
● High time-to-deploy, manual discoverability
● Few end-users, heterogeneous APIs (if any)
Pretty soon:
● > 9,500 algorithms (95k versions) on many runtimes / frameworks
● > 100k algorithm developers: heterogeneous, largely unpredictable
● Each algorithm: 1 to 1,000 calls/second, with high variance
● Need auto-deploy, discoverability, low (10-15ms) latency
● Common API, composability, fine-grained security
Challenges of deploying models in the enterprise
Machine learning
● CPU / GPU / Specialized hardware
● Multiple frameworks, languages,
dependencies
● Called from different devices &
architectures
“Snowflake” environments
● Unique cloud hardware and services
● DevOps teams not used to the specific
considerations of ML hosting
Security and Audit
● Stringent security and access controls
● “Who called what when” for audit & compliance
Uncharted territory
● Deployment is a new problem for data science teams; not a lot of literature / examples
● Redundant work across teams, lack of re-use
● New experience buying & managing
infrastructure or working w/ DevOps team
● How to handle chargebacks and billing
"Expecting your engineering and DevOps teams to deploy ML models well is like showing up at SeaWorld with a giraffe and expecting them to handle it, since they already handle large mammals."
MACHINE LEARNING !=
PRODUCTION MACHINE LEARNING
Training vs Production
Data Scientists build and iterate over a model until it is ready to move to production.
DevOps manages servers, task scheduling, etc. to support execution of concurrent models.
TRAINING                INFERENCE
Long compute cycle      Short compute bursts
Fixed load (inelastic)  Elastic
Stateful                Stateless
Single user             Many users
Users and Services run models ad-hoc (need: elasticity), and rarely from the same language they’re developed in (need: APIs)
Deploying Models: raw server or cloud VM
1. Set up server
   ○ Select proper balance of CPU, GPU, memory, cost
   ○ Laborious to configure the first time, but fairly easy to replicate
   ○ Expensive for higher-powered machines (especially GPUs)
2. Create microservice
   ○ Write API wrapper (e.g., Flask)
   ○ Will be usable from any language or environment
   ○ How to secure, meter, and disseminate?
3. Add scaling
   ○ Cloud VMs can scale by adding more copies (usually billed per machine-hour)
   ○ Write/configure automation to predict load and create VMs
4. Repeat for each unique environment
   ○ Separate server for each model?
   ○ Or deal with dependency and resource conflicts?
Flask image source: Jeff Klukas
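Step 2 above (wrap the model in an API) can be sketched with a minimal Flask microservice. The `predict` function here is a hypothetical stand-in for a real model's inference call; the route and payload shape are illustrative, not a prescribed contract.

```python
from flask import Flask, request, jsonify

app = Flask(__name__)

def predict(features):
    # Stand-in for a real model's predict(); returns a dummy score.
    return sum(features) / max(len(features), 1)

@app.route("/predict", methods=["POST"])
def predict_endpoint():
    # Accept a JSON payload like {"features": [1.0, 2.0, 3.0]}.
    payload = request.get_json(force=True)
    score = predict(payload.get("features", []))
    return jsonify({"score": score})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)
```

This makes the model callable from any language over HTTP, but note it answers none of the slide's follow-up questions: securing, metering, and scaling are all still on you.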
Deploying Models: serverless functions
● Initially, this looks great
  ○ Simple setup: just fill out a function body
  ○ Automatic API wrappers or configurable API gateway
  ○ No DevOps: maintenance handled by provider
  ○ Instant, elastic scaling (big cost savings)
  ○ Cheap: usually billed per-second, and free when not in use
● But there are some significant limitations
  ○ Not optimized for ML
  ○ Languages: Node & some Python, Java, C#
  ○ Limited dependency support
  ○ No GPUs!
  ○ Max execution time: 5-15 minutes
  ○ Little/no consumer-facing UI
What should a mature solution have?
● Broad lang & lib support: any language & dependencies
● GPU support: fast exec & memory for GPU models
● Elasticity & concurrency: instantly scale up/down with demand; many copies of different models
● Automatic API: data scientists not responsible for serializing JSON or managing server frameworks
● Pipelining: common API across models, data passing
● Built-in security: auth, process isolation, user data
● Long timeouts: predictions may take ms or an hour
● Versioning and Grouping: public / private / group visibility of models, all old versions executable
(no broken services)
● Portability: run in-house or on any cloud(s)
● Discoverability / model-management UI: find & share well-described models, “run an example”,
cut-and-paste API code in every language
Building it: start with containers, add scaling / replication

[Architecture diagram: a User hits a Web Load Balancer and an API Load Balancer, which front the Web Servers and API Servers; those dispatch to N workers in each cloud region (Cloud Region #1, Cloud Region #2), where every worker runs Docker containers for algorithm #1 through algorithm #n.]
● ML models as serverless microservices: allows isolation, promotes model re-use and modularity
● Ability to replicate containers and move between regions allows for scaling, portability, low latency
Design containers to support all languages, flexible enough to add any library
[Diagram: a FoodClassifier model composed from FruitClassifier and VeggieClassifier containers]
...don't forget to make GPU versions, too
Make it easy for data scientists to add new models
● Continuous Deployment speeds production: Git code management, develop locally or in a Web IDE
● User and group namespaces, private / public / group visibility, pricing & dept chargebacks
Add pipelining and intelligent orchestration
Known:
1. Typical execution path
2. Compute & memory per algo
Optimize for:
1. Minimum network latency
2. Maximum throughput
3. Minimum resource use

[Diagram: pipeline stages A → B → C, each with its own CPU, memory, GPU, and I/O profile]

cat foo.txt | keyword.sh | ranker.sh
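The shell pipe above is the mental model: each model is a stage whose output feeds the next. A toy sketch in Python, with `extract_keywords` and `rank` as hypothetical stand-ins for the `keyword.sh` and `ranker.sh` stages:

```python
from collections import Counter

def extract_keywords(text):
    # Hypothetical stage A: naive keyword extraction (words longer than 3 chars).
    return [w.strip(".,").lower() for w in text.split() if len(w) > 3]

def rank(keywords):
    # Hypothetical stage B: rank keywords by frequency, most common first.
    return [word for word, _ in Counter(keywords).most_common()]

def pipeline(text):
    # Equivalent of: cat foo.txt | keyword.sh | ranker.sh
    return rank(extract_keywords(text))

print(pipeline("deep learning models love deep learning"))
# -> ['deep', 'learning', 'models', 'love']
```

An orchestrator that knows each stage's compute and memory profile can co-locate chatty stages (minimizing network latency) and fan out the expensive ones (maximizing throughput).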
Support standardized versioning
● Semantic versioning for models, just like any other software (1.2.x)
● All versions of a model are runnable at any time
● Compare versions of the model to verify and see changes in performance (speed, accuracy), and manage model drift
● App devs can stay a version behind, or use different versions for different contexts
● Rolling, non-interruptive deployments: model improvements that don't break existing code
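The "stay a version behind" pattern hinges on resolving a pin like `1.2.x` to the newest matching release. A minimal sketch (the function name and version list are illustrative):

```python
def resolve(requested, available):
    """Resolve a pinned version like '1.2.x' to the newest matching release."""
    major, minor, _ = requested.split(".")
    matches = [v for v in available
               if v.split(".")[0] == major and v.split(".")[1] == minor]
    if not matches:
        return None
    # Compare numerically so '1.2.10' beats '1.2.9'.
    return max(matches, key=lambda v: [int(part) for part in v.split(".")])

versions = ["1.1.0", "1.2.0", "1.2.9", "1.2.10", "2.0.0"]
print(resolve("1.2.x", versions))  # -> 1.2.10
```

Because every old version stays executable, callers pinned to `1.2.x` pick up patch releases automatically but never break on a `2.0.0` rollout.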
Provide logging and analytics
Key production metrics:
● Latency
● Resources used (CPU/GPU, I/O)
● System capacity
● Scale-up and scale-down events
● Authentication
● API timing metrics and calls
● Error rates
But also:
● Which teams are using the models
● Which applications are using them
● Billing & chargebacks
● Understand whether AI investments are paying off
● See business impact across the organization
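Most of the per-call metrics above (latency, call counts, error rates) can be captured at the serving layer with a simple decorator. A sketch, assuming each model endpoint is a plain Python callable; the metric names and `fruit_classifier` model are hypothetical:

```python
import time
from collections import defaultdict

# Per-model counters: call count, error count, and cumulative latency.
metrics = defaultdict(lambda: {"calls": 0, "errors": 0, "total_ms": 0.0})

def instrument(model_name):
    """Decorator recording calls, errors, and latency for one model."""
    def wrap(fn):
        def inner(*args, **kwargs):
            start = time.perf_counter()
            try:
                return fn(*args, **kwargs)
            except Exception:
                metrics[model_name]["errors"] += 1
                raise
            finally:
                metrics[model_name]["calls"] += 1
                metrics[model_name]["total_ms"] += (time.perf_counter() - start) * 1000
        return inner
    return wrap

@instrument("fruit_classifier")
def classify(x):
    return "apple" if x > 0.5 else "banana"

classify(0.9)
classify(0.1)
print(metrics["fruit_classifier"]["calls"])  # -> 2
```

The business-level questions (which team, which application, chargebacks) then reduce to tagging each call with the caller's API key and aggregating the same counters per key.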
Build abstraction layers for all infrastructure providers

Service         | AWS                   | GCP             | Azure         | VMware / On-prem
Compute         | EC2                   | Compute Engine  | VM            | ESX
Autoscaling     | Autoscaling Group     | Autoscaler      | Scale Set     | Orchestrator / BYO
Load Balancing  | Elastic Load Balancer | Load Balancer   | Load Balancer | NSX / BYO
Database        | RDS                   | Cloud SQL       | Azure SQL DB  | BYO
Object Storage  | S3                    | Cloud Storage   | Azure Blobs   | BYO
Block Storage   | Elastic Block Store   | Persistent Disk | Azure Disks   | VMFS
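In code, each row of that table becomes an interface the platform programs against, with one concrete class per provider. A minimal sketch for the object-storage row, using an in-memory stand-in where a real `S3Store` would call boto3 (class and method names are illustrative):

```python
from abc import ABC, abstractmethod

class ObjectStore(ABC):
    """Provider-agnostic object storage; subclasses wrap S3, Cloud Storage, Azure Blobs, etc."""
    @abstractmethod
    def get(self, key: str) -> bytes: ...
    @abstractmethod
    def put(self, key: str, data: bytes) -> None: ...

class InMemoryStore(ObjectStore):
    # Test/demo backend; an S3Store would call boto3 here instead.
    def __init__(self):
        self._blobs = {}
    def get(self, key):
        return self._blobs[key]
    def put(self, key, data):
        self._blobs[key] = data

def load_records(store: ObjectStore) -> bytes:
    # Platform code depends only on the interface, never the provider SDK.
    return store.get("records.csv")

store = InMemoryStore()
store.put("records.csv", b"a,b\n1,2\n")
print(load_records(store))
```

Swapping clouds (or running on-prem) then means swapping which concrete class is wired in at startup, not touching platform code.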
Expose user-friendly storage abstraction
# No storage abstraction
s3 = boto3.client("s3")
obj = s3.get_object(Bucket="bucket-name", Key="records.csv")
data = obj["Body"].read()
# With storage abstraction
data = client.file("blob://records.csv").get()
s3://foo/bar
blob://foo/bar
hdfs://foo/bar
dropbox://foo/bar
etc.
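Under the hood, a client like that can dispatch on the URI scheme. A sketch, where the backends are hypothetical placeholders returning labels instead of real blob contents:

```python
from urllib.parse import urlparse

# Hypothetical registry mapping URI schemes to backend fetchers;
# real entries would wrap boto3, hdfs clients, the Dropbox SDK, etc.
BACKENDS = {
    "blob": lambda path: f"<platform blob: {path}>",
    "s3":   lambda path: f"<s3 object: {path}>",
}

def get_file(uri):
    """Dispatch a 'scheme://path' URI to the matching storage backend."""
    parsed = urlparse(uri)
    try:
        backend = BACKENDS[parsed.scheme]
    except KeyError:
        raise ValueError(f"unsupported scheme: {parsed.scheme}")
    return backend(parsed.netloc + parsed.path)

print(get_file("blob://records.csv"))
```

Adding a new provider is then one registry entry; model code keeps calling `client.file(...)` unchanged.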
Build a model portfolio UI for easy discovery & testing
● Models are only as useful as their docs: creators write descriptions which live with the model
● Categories / tags / search for users to find the models they need (and see only the ones allowed)
● Test models right inside the catalog, before integrating into app code
● Encourage model re-use and improve efficiency across teams, while respecting access rights
Design a consistent API with clients in every language
● Models are often written in one language but consumed in another (or many)
● Provide cut-and-paste code for any model / language combination
● ZERO time from model deployment to usability: drastically reduce the length of the total dev pipeline
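Generated clients in every language stay consistent because they all derive the same endpoint from the same model reference. A sketch of that mapping (the URL shape is illustrative, not the exact Algorithmia scheme):

```python
def endpoint(base_url, model_ref):
    """Map a model reference like 'user/FruitClassifier/1.2.0' to its REST endpoint.

    Every generated client (Python, Java, JS, ...) applies this same rule,
    so cut-and-paste snippets work identically across languages.
    """
    owner, name, version = model_ref.split("/")
    return f"{base_url}/v1/algo/{owner}/{name}/{version}"

print(endpoint("https://api.example.com", "user/FruitClassifier/1.2.0"))
# -> https://api.example.com/v1/algo/user/FruitClassifier/1.2.0
```

Because the mapping is mechanical, a model is callable the instant it is deployed: no per-language integration work sits between deployment and first request.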
Make the public platform available to anyone, anywhere
ALGORITHMIA ENTERPRISE - your company's private ML inventory & model-as-a-service platform
● Deploy: develop models in any language, framework, or infrastructure
● Scale: expose models as highly-reliable versioned APIs that autoscale to 100s of reqs/second
● Discover: describe your model in a central catalog where peers can easily discover & use it
● Monitor: house thousands of models under one roof with a uniform REST interface and a central dashboard
Make your platform deployable on any org private cloud
Try it yourself: deploy a model on Algorithmia
http://bit.ly/algodev -> digit_recognition
Looking for more?
Jon Peck Developer Advocate
FREE STUFF
$50 free at Algorithmia.com signup code: AI-DW-19
WE ARE HIRING
algorithmia.com/jobs ● Seattle or Remote ● Bright, collaborative env ● Unlimited PTO ● Dog-friendly
jpeck@algorithmia.com
@peckjon
bit.ly/AI-DW-19
THANK YOU!
Tell the world that the future is here