Real-Time Analytics @ Netflix

Posted on 16-Apr-2017


Real-Time Analytics @ Netflix

Cody Rioux - @codyrioux
Real-Time Analytics - Insight Engineering

Overview
● Real-Time Analytics
    ○ Anomaly / Outlier Detection
    ○ Canary Analysis
● Architecture
● Challenges
    ○ Cold Start
    ○ Concept Drift
    ○ Configuration
    ○ Change Deployment
    ○ User Acceptance

We are drowning in information but starved for knowledge. - John Naisbitt

Real-Time Analytics

Real-Time Analytics
● Part of Insight Engineering.
● Build systems that make intelligent decisions about our operational environment.
    ○ Make decisions in near real-time.
    ○ Automate actions in the production environment.
● Support operational availability and reliability.

One of these things is not like the others.

Anomaly and Outlier Detection

Unexpected value for a given generating mechanism.

Terminology

Outlier Anomaly
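
The definition above (an unexpected value for a given generating mechanism) can be illustrated with a minimal sketch. It assumes the generating mechanism is summarized by the median and the median absolute deviation (MAD) of recent values. The function name `mad_outliers`, the 3.5 threshold, and the sample data are illustrative assumptions, not taken from the talk.

```python
from statistics import median

def mad_outliers(values, threshold=3.5):
    """Return indices of values whose modified z-score exceeds `threshold`."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []  # no spread: the score is undefined, so flag nothing
    # 0.6745 scales the MAD so the score is comparable to a z-score
    # when the data are roughly normal.
    return [i for i, v in enumerate(values)
            if 0.6745 * abs(v - med) / mad > threshold]

latencies_ms = [101, 99, 100, 102, 98, 100, 250, 101]  # hypothetical data
print(mad_outliers(latencies_ms))
```

Median-based statistics are used here because a single extreme value barely moves them, unlike the mean and standard deviation.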

Good builds gone bad!

Automated Canary Analysis

Netflix Canary Release Process
[Diagram labels: Customers; Load Balancer; Old Version (v1.0); New Version (Canary - v1.1); Old Version (Control - v1.0); 88 Servers; 6 Servers; 6 Servers; Metrics; Analysis]
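
The analysis step in the canary process compares metrics from the canary against metrics from the control. The talk does not say how that comparison is scored, so the following is only a sketch under assumptions: each metric passes when the canary median is within a relative tolerance of the control median, and the score is the fraction of passing metrics. The function name `canary_score`, the 20% tolerance, and the metric values are hypothetical.

```python
from statistics import median

def canary_score(control, canary, tolerance=0.2):
    """Return the fraction of metrics whose canary median stays within
    `tolerance` (relative) of the control median."""
    passed = 0
    for name, control_samples in control.items():
        base = median(control_samples)
        observed = median(canary[name])
        if base == 0:
            within = observed == 0
        else:
            within = abs(observed - base) / abs(base) <= tolerance
        passed += within
    return passed / len(control)

# Hypothetical samples collected from the control and canary clusters.
control = {"latency_ms": [100, 102, 98], "error_rate": [0.010, 0.012, 0.011]}
canary = {"latency_ms": [105, 103, 101], "error_rate": [0.030, 0.028, 0.031]}
score = canary_score(control, canary)
print(score)
```

Comparing against a control of the same size as the canary, rather than against the full old cluster, keeps the two sample sets comparable.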

A Data Scientist’s capability to extract value from data is largely coupled with the maturity of the data platform of its company. - Robert Chang

Analytic Architecture

First Generation Architecture
[Diagram labels: Customers; CSI (REST)]

Second Generation Architecture
[Diagram labels: Customers; Load Balancer; OpenPy; OpenPy; ...; Models; RTA Data Poller; Telemetry Data]

Heating things up beginning at absolute zero

Challenge Zero: Cold Start

New Architecture
[Diagram labels: Customers; Load Balancer; OpenPy; OpenPy; ...; Models; RTA Data Poller; Telemetry Data; Data Tagger; Data Store]
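
The diagram adds a Data Tagger and a Data Store to address cold start. The talk does not describe their interfaces, so this is a sketch assuming the tagger attaches a label to a window of telemetry and the store later returns labeled windows for a metric. `TaggedWindow`, `DataStore`, and the label values are hypothetical names.

```python
from dataclasses import dataclass, field

@dataclass
class TaggedWindow:
    metric: str
    values: list
    label: str  # e.g. "normal" or "anomalous"; the label set is an assumption

@dataclass
class DataStore:
    """In-memory stand-in for the data store in the diagram."""
    windows: list = field(default_factory=list)

    def tag(self, metric, values, label):
        """Record a labeled window of telemetry."""
        self.windows.append(TaggedWindow(metric, list(values), label))

    def training_set(self, metric):
        """Return (values, label) pairs available for a metric."""
        return [(w.values, w.label) for w in self.windows if w.metric == metric]

store = DataStore()
store.tag("cpu", [0.41, 0.43, 0.40], "normal")
store.tag("cpu", [0.42, 0.97, 0.95], "anomalous")
store.tag("latency_ms", [100, 101, 99], "normal")
print(len(store.training_set("cpu")))
```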

Change in data stream over time.

Challenge One: Concept Drift

Solutions: Concept Drift

Monitoring the behavior of analytics and soliciting user feedback.
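
One way to monitor the behavior of an analytic is to watch how often it fires. The sketch below assumes an expected alert rate is known and treats a sustained rate several times higher as a sign that the data stream has drifted away from what the model learned. The class name `AlertRateMonitor`, the window size, and the factor of 3 are illustrative assumptions.

```python
from collections import deque

class AlertRateMonitor:
    """Track how often an analytic fires over the last `window` decisions."""

    def __init__(self, expected_rate, window=100, factor=3.0):
        self.expected_rate = expected_rate
        self.factor = factor
        self.decisions = deque(maxlen=window)

    def record(self, fired):
        self.decisions.append(bool(fired))

    def drifting(self):
        if len(self.decisions) < self.decisions.maxlen:
            return False  # not enough history yet
        rate = sum(self.decisions) / len(self.decisions)
        return rate > self.expected_rate * self.factor

monitor = AlertRateMonitor(expected_rate=0.01, window=50)
for i in range(50):
    monitor.record(i % 5 == 0)  # the analytic fires on 20% of decisions
print(monitor.drifting())
```

User feedback could feed the same monitor, for example by recording only alerts that users marked as false positives.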

New Architecture
[Diagram labels: Customers; Load Balancer; OpenPy; OpenPy; ...; Models; RTA Data Poller; Telemetry Data; Data Tagger; Data Store; Feedback]

Removing the burden of configuration complexity for the user.

Challenge Two: Configuration

Configurations are complex.

Assumptions and meta-analytics eliminate user burden.

General is inherently more complex than specific.
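
As a rough illustration of replacing user configuration with assumptions drawn from the data, the sketch below inspects a metric's values and derives settings a user would otherwise have to supply. The function name `infer_config`, the configuration keys, and the three metric kinds are hypothetical; the talk does not specify which properties its meta-analytics infer.

```python
def infer_config(values):
    """Derive detector settings from the data rather than from the user."""
    lo, hi = min(values), max(values)
    config = {"non_negative": lo >= 0}
    if all(float(v).is_integer() for v in values):
        config["kind"] = "count"   # whole numbers, e.g. requests per minute
    elif 0 <= lo and hi <= 1:
        config["kind"] = "ratio"   # bounded in [0, 1], e.g. an error rate
    else:
        config["kind"] = "gauge"   # anything else, e.g. latency
    return config

print(infer_config([0.01, 0.02, 0.015]))
print(infer_config([3, 7, 4]))
```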


Progress is impossible without change, and those who cannot change their minds cannot change anything - George Bernard Shaw

Challenge Three: Change Deployment


REST API
/REST/v1/anomaly
/REST/v2/anomaly
/REST/v3/anomaly
...
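
The versioned endpoints above suggest that each released analytic stays reachable after a newer one ships. A minimal sketch of that idea, assuming a simple path-to-function table: the endpoint paths come from the slide, while `detect_v1`, `detect_v2`, `handle`, and the detection rules themselves are hypothetical.

```python
def detect_v1(values):
    """v1 (hypothetical): flag values above a fixed threshold."""
    return [v > 100 for v in values]

def detect_v2(values):
    """v2 (hypothetical): flag values more than 50% above the mean."""
    mean = sum(values) / len(values)
    return [v > 1.5 * mean for v in values]

# Each published version stays routable after a newer one ships,
# so existing callers keep getting the behavior they were built against.
ROUTES = {
    "/REST/v1/anomaly": detect_v1,
    "/REST/v2/anomaly": detect_v2,
}

def handle(path, values):
    try:
        analytic = ROUTES[path]
    except KeyError:
        raise LookupError(f"unknown endpoint: {path}") from None
    return analytic(values)

data = [90, 95, 110, 400]
print(handle("/REST/v1/anomaly", data))
print(handle("/REST/v2/anomaly", data))
```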

New Architecture
[Diagram labels: Customers; Models; Mantis (Stream Processor); Spark; Telemetry Data; Data Tagger; Data Store; Feedback; Fact Table; Versioned JAR Files]

“Machine learning is really good at partially solving just about any problem.” - cdixon

Challenge Four: User Acceptance

User Acceptance
● Understandable analytics.
● Favor probabilities for inputs and outputs.
● Conceptual documentation.
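
To illustrate favoring probabilities for outputs, the sketch below reports a value between 0 and 1 instead of a raw score. It assumes the metric is roughly normal with a known mean and standard deviation, which is a simplifying assumption and not something the talk states. The function name `anomaly_probability` is hypothetical.

```python
import math

def anomaly_probability(value, mean, std):
    """Probability that a normal draw lies closer to the mean than `value`.

    Values near 1 indicate that `value` is unusually far from the mean.
    """
    z = abs(value - mean) / std
    return math.erf(z / math.sqrt(2))

print(round(anomaly_probability(130, mean=100, std=10), 4))
```

A user can set a threshold such as 0.99 on this output without knowing how the underlying score is computed.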


Our platform is less bad than it used to be. :)

Recap


Recap
● Cold Start: Data Tagging
● Concept Drift: Feedback Loop
● Configuration: Assumptions and Meta Analytics
● Change Deployment: Versioned Analytics
● User Acceptance: Docs, probabilities, understandable analytics.

Literature

Machine Learning: The High-Interest Credit Card of Technical Debt (Sculley et al., 2014)

Literature
● Practical Machine Learning: A New Look at Anomaly Detection (Dunning, 2014)
● Distinguishing cause from effect using observational data: methods and benchmarks (Mooij et al., 2014)
● Enhancing Performance Prediction Robustness by Combining Analytical Modeling and Machine Learning (Didona et al., 2015)

Questions?
crioux@netflix.com
@codyrioux
linkedin.com/in/codyrioux