Monitoring of Machine Learning Models Instrumentation, … · 2020-03-22 · Sounds a lot more like...

Page 1: Monitoring of Machine Learning Models Instrumentation, … · 2020-03-22 · Sounds a lot more like a DevOps problem then a Machine Learning problem to me-e. But really, in general,

Instrumentation, Observability, and Monitoring of Machine Learning Models

About Me

● Google Engineer (2007-2011)

● Cloudera’s Director of Data Science (2011-2015)

● Slack’s Director of Data Engineering (2015-2017)

● Slack Engineer (now)

The Genesis of This Talk

Machine Learning In the Wild

Data Science Meets DevOps

Some History

Logs via the ELK Stack

Metrics with Prometheus

Prometheus Architecture

Traces

A Word About Cardinality

Make Good Decisions By Avoiding Bad Decisions

The ML Test Score

The Map Is Not The Territory

Monitor Model Decay

Build Lots of Models

Deploy Your Models Like They Are Code*

Stand On The Shoulders of Giants

● Ensembles

● Experiments

● Dark Tests

● Canary

● Sanity Checks

Tag All The Things

Circle of Competence

Garbage In...

Linking Online and Offline Metrics

Handling Cross-Language Feature Engineering

Know Your Dependencies

Monitoring For Critical Slices

Second-Order Thinking

On Razors

http://slack.com/careers
