Machine Learning at Netflix Scale
Aish Fenton Manager - Research Engineering @aishfenton
Everything is a recommendation
Top Picks for Aish
Movies based on books
Because you watched Bob’s Burgers
Rank based on your taste
75% of plays come from homepage
Back Story…
Proxy question: ▪ Accuracy in predicted rating ▪ Improve by 10% = $1 million!
What we were interested in: ▪ High quality recommendations
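The Netflix Prize scored "accuracy in predicted rating" as root-mean-square error (RMSE) between predicted and actual ratings. A minimal sketch of that metric and of a relative improvement over a baseline follows; the rating values are invented for illustration and are not from the talk.

```python
import math

def rmse(predicted, actual):
    """Root-mean-square error between two equal-length rating lists."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty lists of equal length")
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical ratings, invented for illustration only.
actual = [4.0, 3.0, 5.0, 2.0]
baseline = [3.0, 3.0, 4.0, 3.0]    # stand-in for an existing system
candidate = [3.5, 3.0, 4.5, 2.5]   # stand-in for a new model

baseline_rmse = rmse(baseline, actual)
candidate_rmse = rmse(candidate, actual)

# Relative reduction in RMSE; 0.10 would correspond to "improve by 10%".
improvement = 1.0 - candidate_rmse / baseline_rmse
```

A lower RMSE only says the predicted ratings are closer to the actual ones, which is why the slide calls it a proxy for the real goal of high quality recommendations.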
[Figure: predicted vs. actual ratings]
SVD, RBMs
Top two results still used in production!
2006 → 2013
• > 44M members
• > 40 countries
• > 5B hours in Q3 2013
• Log 100B events/day
• 31.62% of peak US downstream traffic
Data and Models
▪ > 40M subscribers
▪ Ratings: ~5M/day
▪ Searches: >3M/day
▪ Plays: > 50M/day
▪ Streamed hours: 5B hours in Q3 2013
Geo Info
Time
Impressions
Device Info
Metadata
Social
Ratings
Demographics
Member Behavior
Plays
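[Figure: rating matrix R factorized into user factors U (u1 u2 u3) and movie factors M (m1 m2 m3)]
Latent User Vector: Aish
Latent Item Vector: House of Cards
R entry (Aish, House of Cards): 3.53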
My rating for House of Cards = Mean Rating + My Bias + Movie Bias + Interaction
3.55 = 2.50 + (-1.5) + 1.2 + pq
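The decomposition on this slide (mean rating + my bias + movie bias + interaction) can be sketched as a small function, where the interaction term pq is the dot product of the latent user vector and the latent item vector. The mean and the two biases are the slide's values; the latent vectors are hypothetical, picked so that pq = 1.35, the value the slide's arithmetic implies.

```python
def predict_rating(global_mean, user_bias, item_bias, user_vec, item_vec):
    """Mean rating + user bias + item bias + dot product of latent vectors."""
    if len(user_vec) != len(item_vec):
        raise ValueError("latent vectors must have the same length")
    interaction = sum(p * q for p, q in zip(user_vec, item_vec))
    return global_mean + user_bias + item_bias + interaction

# Hypothetical latent vectors (not from the talk); their dot product is 1.35.
p_aish = [0.9, 0.5, 1.0]
q_house_of_cards = [1.0, 0.5, 0.2]

# 2.50, -1.5 and 1.2 are the mean rating, "my bias" and movie bias on the slide.
rating = predict_rating(2.50, -1.5, 1.2, p_aish, q_house_of_cards)
```

With these inputs the function reproduces the slide's 3.55, up to floating-point rounding.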
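[Figure: rating tensor R factorized into user factors U (u1 u2 u3), movie factors M (m1 m2 m3), and time factors T (t1 t2 t3); labels: Aish, House of Cards, Time; values shown: 3.53, 2.35, 1.34]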
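Adding a time factor T alongside U and M turns the matrix factorization into a tensor factorization. The slides do not say which form is used; the sketch below assumes a CP-style model, in which the score is the sum over latent dimensions of u_k * m_k * t_k. All vectors are hypothetical.

```python
def predict_with_time(user_vec, item_vec, time_vec):
    """CP-style score: sum over latent dimensions of u_k * m_k * t_k."""
    if not (len(user_vec) == len(item_vec) == len(time_vec)):
        raise ValueError("all three latent vectors must have the same length")
    return sum(u * m * t for u, m, t in zip(user_vec, item_vec, time_vec))

# Hypothetical latent vectors, invented for illustration only.
u_aish = [1.0, 0.5, 2.0]
m_house_of_cards = [2.0, 1.0, 0.5]
t_neutral = [1.0, 1.0, 1.0]   # all ones: reduces to the plain dot product
t_other = [0.5, 2.0, 1.0]     # a different time context reweights each dimension

score_neutral = predict_with_time(u_aish, m_house_of_cards, t_neutral)
score_other = predict_with_time(u_aish, m_house_of_cards, t_other)
```

The same user and movie get different scores in different time contexts, which is the point of adding the third factor.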
▪ Matrix/Tensor Factorization
▪ Regression models (Logistic, Linear, Elastic nets)
▪ Factorization Machines
▪ Restricted Boltzmann Machines
▪ Markov Chains & other graph models
▪ Clustering / Topic Models
▪ Neural Networks
▪ Association Rules
▪ GBDT/RF
▪ …
[Chart: Improvement Over Baseline, axis from 0% to 300%; bars: Popularity, + Ratings, + More Features & Optimized Models]
Anatomy of a Machine Learning Platform
Problem
Data
Experiment Offline
Produce Model
Test / Metrics
Near-line
Online
UI Clients
Event Distribution
Online Algs
Model Trainer
Pre-compute
AB Test Metrics
API Layer
Monitoring
Offline
Hadoop / Data Warehouse
Experimentation Platform
S3 / HDFS
Offline Metrics
Query Tools
Models
Models
▪ App Logs ▪ User Actions
▪ Ratings ▪ Plays ▪ Queue Adds
▪ Algo Actions ▪ Impressions (Presentation Bias)
▪ Context ▪ Device Info ▪ User Demographics ▪ Social ▪ Time
▪ …
Many different types of data…
Embedded
Embedded
Weights
Real-time popularity of movie
Example: Neural Network Training
[Figure: neural network with Input, Hidden Layers, and Output; parameters θ]
1,536 cores
G2 Instances: $0.60 per hour
But… things can go astray
[Figure: factorization of R into U and M, with labels Pre-compute and Online; user vectors u1 u2 u3]
Aish played HoC
Publish new model for Aish
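The slides do not describe how the new model for Aish is computed after a play event. One common approach, shown here purely as an assumption, is to keep the pre-computed item factors fixed and apply a single stochastic gradient step to the user's latent vector. The function name, learning rate, regularization weight, and all values below are hypothetical.

```python
def sgd_user_update(user_vec, item_vec, observed, learning_rate=0.05, reg=0.02):
    """One regularized SGD step on the user vector; the item vector stays fixed."""
    if len(user_vec) != len(item_vec):
        raise ValueError("latent vectors must have the same length")
    predicted = sum(p * q for p, q in zip(user_vec, item_vec))
    error = observed - predicted
    # Gradient step for squared error with L2 regularization on the user vector.
    return [p + learning_rate * (error * q - reg * p)
            for p, q in zip(user_vec, item_vec)]

# Hypothetical values: the play event is encoded as an observed score of 1.0.
u_aish = [0.1, 0.2]
m_house_of_cards = [1.0, 0.5]   # assumed to be pre-computed and held fixed

u_aish_new = sgd_user_update(u_aish, m_house_of_cards, observed=1.0)

score_before = sum(p * q for p, q in zip(u_aish, m_house_of_cards))
score_after = sum(p * q for p, q in zip(u_aish_new, m_house_of_cards))
```

Because only one short vector changes, an update like this is cheap enough to run per event, while the item factors are retrained on a slower schedule.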
Aish Fenton @aishfenton https://www.linkedin.com/profile/view?id=47917219