Netflix: Amazon S3 & Amazon Elastic MapReduce to Monitor at Gigascale (BDT302) | AWS re:Invent 2013

description

How does Netflix stay on top of the operations of its Internet service with millions of users and billions of metrics? With Atlas, its own massively distributed, large-scale monitoring system. Come learn how Netflix built Atlas with multiple processing pipelines using Amazon S3 and Amazon EMR to provide low-latency access to billions of metrics while supporting query-time aggregation along multiple dimensions.

Transcript of Netflix: Amazon S3 & Amazon Elastic MapReduce to Monitor at Gigascale (BDT302) | AWS re:Invent 2013

Page 1: Netflix: Amazon S3 & Amazon Elastic MapReduce to Monitor at Gigascale (BDT302) | AWS re:Invent 2013

© 2013 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc.

Roy Rapoport

November 14, 2013

Deft Data at Netflix: Using Amazon S3 and Amazon Elastic

Friday, November 15, 13

Page 10: Netflix: Amazon S3 & Amazon Elastic MapReduce to Monitor at Gigascale (BDT302) | AWS re:Invent 2013

A Word About Me …

• About 20 years in technology
• Systems engineering, networking, software development, QA, release management
• Time at Netflix: 1599 days (4y:4m:15d)
• Before at Netflix: Service Delivery in the IT/Ops, troubleshooter, Builder of Python Things[tm]
• Current role: Cloud Monitoring
  • We build platforms
  • Sometimes we make them easy to use

Page 16: Netflix: Amazon S3 & Amazon Elastic MapReduce to Monitor at Gigascale (BDT302) | AWS re:Invent 2013

A Word About Netflix … Just the Stats

• 16 years
• 2000+ employees
• 40 million users
• 5x10^9 hours/quarter

Page 21: Netflix: Amazon S3 & Amazon Elastic MapReduce to Monitor at Gigascale (BDT302) | AWS re:Invent 2013

A Word About Netflix … Freedom and Responsibility Culture

• Optimize speed of innovation
  Constrain availability
  Cost will be what cost will be
• Hire smart (experienced) people
  Get out of their way
• Anti-process bias

Page 30: Netflix: Amazon S3 & Amazon Elastic MapReduce to Monitor at Gigascale (BDT302) | AWS re:Invent 2013

A Word About Netflix … Technology and Operations

• Service Oriented Architecture
• Decentralized Operations. You
  • Build
  • Test
  • Deploy
  • Set up alerting and monitoring
  • Wake up at 2AM

Page 36: Netflix: Amazon S3 & Amazon Elastic MapReduce to Monitor at Gigascale (BDT302) | AWS re:Invent 2013

A Word About Netflix … Technology and Operations

• AWS-based for 100% of streaming*
• Huge expansion
  • Customer Growth
  • New markets
  • Metrics

Page 43: Netflix: Amazon S3 & Amazon Elastic MapReduce to Monitor at Gigascale (BDT302) | AWS re:Invent 2013

In the Old Days … Our Old Alerting System

• Enterprise IT Solution
• Managed by the Enterprise IT Alerting People
• File Tickets
• Send alerts to NOC
• Completely separate from telemetry system

Copyright: State Library of Victoria Collections. CC Attribution 2.0 License

Copyright USAID Microlinks. CC Attribution 2.0 License

Copyright: http://www.flickr.com/photos/s_w_ellis CC Attribution 2.0 License

Page 52: Netflix: Amazon S3 & Amazon Elastic MapReduce to Monitor at Gigascale (BDT302) | AWS re:Invent 2013

In the Old Days … Our Old Telemetry System

• Spare-time effort by a lone sysadmin
• Loved by developers
• Custom TCP protocol
• RRD file back-end storage
• Mostly Perl
• Datacenter-bound (and limited)
• Starting to falter under metrics growth

Copyright: http://www.flickr.com/photos/acme CC Attribution 2.0 License

Page 57: Netflix: Amazon S3 & Amazon Elastic MapReduce to Monitor at Gigascale (BDT302) | AWS re:Invent 2013

Speaking of Growth

By way of comparison

• Every person in the world
  • twice
• Every smartphone in the world
  • ten times

Page 59: Netflix: Amazon S3 & Amazon Elastic MapReduce to Monitor at Gigascale (BDT302) | AWS re:Invent 2013

So We Built Something Better

UI Layer Fronts Multiple Systems

[Diagram: UI on top of Atlas, Epic, and CloudWatch]

Copyright: http://www.flickr.com/photos/76651030@N02/ CC Attribution 2.0 License

Page 60: Netflix: Amazon S3 & Amazon Elastic MapReduce to Monitor at Gigascale (BDT302) | AWS re:Invent 2013

So We Built Something Better

Clear Regional Separation

• And aggregation

[Diagram: global above us-east-1, us-west-1, us-west-2, eu-west-1]
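One way to picture regional separation with a global aggregate is the minimal sketch below. The region names come from the slide; the sample datapoints, the `global_sum` helper, and the idea that the global view is a timestamp-wise sum of the regional views are illustrative assumptions, not a description of how Atlas actually implements it.

```python
from collections import defaultdict

# Hypothetical per-region datapoints: region -> {timestamp_seconds: value}.
# Each region only ever holds its own data (the "clear separation").
regional = {
    "us-east-1": {0: 120.0, 60: 130.0},
    "us-west-1": {0: 15.0, 60: 18.0},
    "us-west-2": {0: 40.0, 60: 42.0},
    "eu-west-1": {0: 75.0, 60: 70.0},
}


def global_sum(per_region):
    """Combine regional series into one global series by summing per timestamp."""
    total = defaultdict(float)
    for series in per_region.values():
        for timestamp, value in series.items():
            total[timestamp] += value
    return dict(total)


global_view = global_sum(regional)
```

A regional dashboard would read `regional["us-west-1"]` directly, while a global dashboard would read `global_view`.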

Page 61: Netflix: Amazon S3 & Amazon Elastic MapReduce to Monitor at Gigascale (BDT302) | AWS re:Invent 2013

So We Built Something Better

Localized Node/Metric Identification

Before:

I think You’re Bob

Here’s a metric!

OK!

Now:

I’m Bob. Here’s a metric!
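The "Now" half of this slide, where the node states its own identity along with the metric, could look roughly like the sketch below. The tag keys are borrowed from the "What's a Metric?" slides; the payload shape, the `build_report` helper, and the sample values are hypothetical and are not taken from the talk.

```python
# Hypothetical identity a node might learn about itself at startup.
SELF_IDENTITY = {
    "nf.node": "i-097c0e52",
    "nf.region": "us-west-1",
    "nf.zone": "us-west-1b",
}


def build_report(identity, name, value, timestamp):
    """Attach the sender's own identity tags to a datapoint before sending it.

    The receiver no longer has to guess who the sender is ("I think you're
    Bob"); the datapoint carries that information itself ("I'm Bob").
    """
    tags = dict(identity)  # copy so the shared identity is not mutated
    tags["name"] = name
    return {"tags": tags, "timestamp": timestamp, "value": value}


# Timestamp is an arbitrary epoch-seconds value chosen for illustration.
report = build_report(SELF_IDENTITY, "requests", 42.0, 1384473600)
```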

Page 81: Netflix: Amazon S3 & Amazon Elastic MapReduce to Monitor at Gigascale (BDT302) | AWS re:Invent 2013

So We Built Something Better

What’s a Metric?

• com.netflix.eds.nccp.successful.requests.uiversion.nccprt-authorization.devtypid-101.clver-PHL_0AB.uiver-UI_169_mid.geo-US
• 256 characters aren’t enough!
• This is better:

nf.app        wp
nf.cluster    wp-batch
nf.asg        wp-batch-v163
nf.ami        ami-aa5166ef
nf.node       i-097c0e52
nf.region     us-west-1
nf.zone       us-west-1b
nf.country    us
class         nccp
type          request
action        authorization
devtype       101
clver         PHL_0AB
uiversion     UI_169_mid
geo           us
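To make the contrast concrete, here is a small sketch that models the same metric both ways. The flat name and the tag keys and values are copied from the slide; the `select` and `count_by` helpers and the second sample metric are assumptions added for illustration only.

```python
# The old style: every dimension is packed into one dotted name.
FLAT_NAME = (
    "com.netflix.eds.nccp.successful.requests.uiversion."
    "nccprt-authorization.devtypid-101.clver-PHL_0AB."
    "uiver-UI_169_mid.geo-US"
)

# The new style: each dimension is its own key/value tag.
TAGS = {
    "nf.app": "wp",
    "nf.cluster": "wp-batch",
    "nf.asg": "wp-batch-v163",
    "nf.ami": "ami-aa5166ef",
    "nf.node": "i-097c0e52",
    "nf.region": "us-west-1",
    "nf.zone": "us-west-1b",
    "nf.country": "us",
    "class": "nccp",
    "type": "request",
    "action": "authorization",
    "devtype": "101",
    "clver": "PHL_0AB",
    "uiversion": "UI_169_mid",
    "geo": "us",
}


def select(metrics, wanted):
    """Return the metrics whose tags contain every key/value pair in `wanted`."""
    return [m for m in metrics if all(m.get(k) == v for k, v in wanted.items())]


def count_by(metrics, key):
    """Count metrics per distinct value of one tag."""
    counts = {}
    for m in metrics:
        counts[m.get(key)] = counts.get(m.get(key), 0) + 1
    return counts


# A hypothetical second metric from another zone and node.
other = {**TAGS, "nf.zone": "us-west-1c", "nf.node": "i-00000000"}
metrics = [TAGS, other]
in_zone_b = select(metrics, {"nf.region": "us-west-1", "nf.zone": "us-west-1b"})
per_zone = count_by(metrics, "nf.zone")
```

With tags, filtering or grouping on any one dimension is a dictionary lookup; with the flat name, the same question requires parsing a naming convention out of a string.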

Page 87: Netflix: Amazon S3 & Amazon Elastic MapReduce to Monitor at Gigascale (BDT302) | AWS re:Invent 2013

So We Built Something Better

Powerful queries

• Make the complex possible
• Make the simple … sort of hard

Copyright: Kurt Moerman. CC Attribution 2.0 License

http://atlas/api/v1/graph?q=nf.region,us-west-1,:eq,nf.app,employeeinfo,:eq,:and,name,employeeinfo_api,:eq,:and,:sum&e=now-5m&s=e-3h
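The `q` parameter above reads like a comma-separated postfix (stack) expression: plain words are pushed, and words starting with `:` pop their operands. The sketch below is a toy evaluator for just the three operators that appear in this URL. The tuple-based tree, the function names, and the sample series are assumptions for illustration; this is not Atlas's parser.

```python
QUERY = (
    "nf.region,us-west-1,:eq,nf.app,employeeinfo,:eq,:and,"
    "name,employeeinfo_api,:eq,:and,:sum"
)


def parse(expr):
    """Turn a postfix query string into a nested tuple tree."""
    stack = []
    for token in expr.split(","):
        if token == ":eq":
            value, key = stack.pop(), stack.pop()
            stack.append(("eq", key, value))
        elif token == ":and":
            right, left = stack.pop(), stack.pop()
            stack.append(("and", left, right))
        elif token == ":sum":
            stack.append(("sum", stack.pop()))
        else:
            stack.append(token)  # plain word: push as a literal
    if len(stack) != 1:
        raise ValueError("expression did not reduce to a single result")
    return stack[0]


def matches(node, tags):
    """Check whether one series' tags satisfy an eq/and condition tree."""
    if node[0] == "eq":
        return tags.get(node[1]) == node[2]
    if node[0] == "and":
        return matches(node[1], tags) and matches(node[2], tags)
    raise ValueError("unsupported condition: %r" % (node[0],))


def evaluate(tree, series):
    """Apply a ("sum", condition) tree to a list of (tags, value) pairs."""
    op, condition = tree
    if op != "sum":
        raise ValueError("only :sum is supported in this sketch")
    return sum(value for tags, value in series if matches(condition, tags))


base = {"nf.region": "us-west-1", "nf.app": "employeeinfo", "name": "employeeinfo_api"}
series = [
    (base, 3.0),
    ({**base, "nf.region": "us-east-1"}, 5.0),  # filtered out by region
    (base, 4.0),
]
tree = parse(QUERY)
total = evaluate(tree, series)
```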


Page 89: Netflix: Amazon S3 & Amazon Elastic MapReduce to Monitor at Gigascale (BDT302) | AWS re:Invent 2013

So We Built Something Better

Powerful queries

http://atlas/api/v1/graph?q=nf.region,us-west-1,:eq,nf.app,employeeinfo,:eq,:and,name,employeeinfo_api,:eq,:and,:sum,(,nf.zone,),:by&e=now-5m&s=e-3h
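Compared with the earlier query, this URL only adds `(,nf.zone,),:by` after `:sum`, which reads like "group the sum by `nf.zone`". A minimal sketch of that grouping follows; the `sum_by` helper and the sample data are hypothetical and only illustrate the idea of query-time aggregation along a chosen dimension.

```python
from collections import defaultdict


def sum_by(series, keys):
    """Sum values per distinct combination of the given tag keys.

    `series` is a list of (tags, value) pairs; the result maps a tuple of
    tag values (one per key in `keys`) to the summed value for that group.
    """
    groups = defaultdict(float)
    for tags, value in series:
        group = tuple(tags.get(k) for k in keys)
        groups[group] += value
    return dict(groups)


# Hypothetical datapoints for one application spread over two zones.
series = [
    ({"nf.app": "employeeinfo", "nf.zone": "us-west-1a"}, 3.0),
    ({"nf.app": "employeeinfo", "nf.zone": "us-west-1b"}, 5.0),
    ({"nf.app": "employeeinfo", "nf.zone": "us-west-1a"}, 4.0),
]
by_zone = sum_by(series, ["nf.zone"])
```

Grouping by a different tag, or by several at once, only changes the `keys` argument, which is the flexibility the tag model buys.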


Page 91: Netflix: Amazon S3 & Amazon Elastic MapReduce to Monitor at Gigascale (BDT302) | AWS re:Invent 2013

So We Built Something Better

Powerful queries

http://atlas/api/v1/graph?q=sps,nf.cluster,(,nccp-legacy,nccp-modern,),:in,nccprt,(,NCCPLicense,com_netflix_streaming_nccp_request_license,),:in,:and,stat,SuccessfulRequests,:eq,:and,device.rollup,3ds,:eq,:and,:sum,:set,entering_trough,sps,:get,1h,:offset,0.95,:mul,sps,:get,:gt,:set,smoothed,sps,:get,10,0.1,0.02,:des,:set,low_volume,smoothed,:get,-0.005,:mul,0.1,:add,:set,mid_volume,smoothed,:get,-0.00125,:mul,0.1,:add,:set,base,0.06,:set,min_pct,1,smoothed,:get,20,:lt,low_volume,:get,:mul,smoothed,:get,80,:lt,mid_volume,:get,:mul,:add,entering_trough,:get,0.05,:mul,:add,base,:get,:add,:sub,10,0.1,0.02,:des,:set,sps,:get,$(device.rollup)SPS,:legend,min_pct,:get,smoothed,:get,:mul,lowerbound,:legend,sps,:get,min_pct,:get,smoothed,:get,:mul,:lt,5,:rolling-count,2,:ge,:vspan,60,:alpha,$(device.rollup),:legend
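The long query above stores a `smoothed` series computed with `:des` and later compares `sps` against a fraction of it. Assuming `:des` denotes double exponential smoothing and that `0.1` and `0.02` are its alpha and beta (the slide does not explain the arguments, and the leading `10` is ignored here), the smoothing step could be sketched as Holt's linear method. The function name and initialization below are illustrative choices, not Atlas's implementation.

```python
def double_exponential_smoothing(values, alpha, beta):
    """Holt's linear (double exponential) smoothing.

    level_t = alpha * x_t + (1 - alpha) * (level_{t-1} + trend_{t-1})
    trend_t = beta * (level_t - level_{t-1}) + (1 - beta) * trend_{t-1}

    Initialized with level = x_0 and trend = x_1 - x_0 (one common choice).
    Returns the smoothed level for every input point.
    """
    if len(values) < 2:
        return list(values)
    level = values[0]
    trend = values[1] - values[0]
    smoothed = [level]
    for x in values[1:]:
        previous_level = level
        level = alpha * x + (1 - alpha) * (previous_level + trend)
        trend = beta * (level - previous_level) + (1 - beta) * trend
        smoothed.append(level)
    return smoothed


# A perfectly linear series is tracked exactly; a spike is damped.
linear = double_exponential_smoothing([0.0, 2.0, 4.0, 6.0, 8.0], 0.1, 0.02)
spiky = double_exponential_smoothing([10.0, 10.0, 50.0, 10.0], 0.1, 0.02)
```

A lower bound such as the query's `min_pct * smoothed` can then be compared against the raw series point by point to flag unusually low values.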


Page 92: Netflix: Amazon S3 & Amazon Elastic MapReduce to Monitor at Gigascale (BDT302) | AWS re:Invent 2013

So We Built Something Better UA E C

glus us us e

Friday, November 15, 13

Page 93: Netflix: Amazon S3 & Amazon Elastic MapReduce to Monitor at Gigascale (BDT302) | AWS re:Invent 2013

So We Built Something Better UA E C

glus us us eRidiculous Read Volume:

• Engage

• Graphs and Dashboards

Friday, November 15, 13

Page 94: Netflix: Amazon S3 & Amazon Elastic MapReduce to Monitor at Gigascale (BDT302) | AWS re:Invent 2013

So We Built Something Better UA E C

glus us us eRidiculous Read Volume:

• Engage

• Graphs and Dashboards• Alerting

Friday, November 15, 13

Page 95: Netflix: Amazon S3 & Amazon Elastic MapReduce to Monitor at Gigascale (BDT302) | AWS re:Invent 2013

So We Built Something Better UA E C

glus us us eRidiculous Read Volume:

• Engage

• Graphs and Dashboards• Alerting• Automated Canaries

Friday, November 15, 13

Page 96: Netflix: Amazon S3 & Amazon Elastic MapReduce to Monitor at Gigascale (BDT302) | AWS re:Invent 2013

So We Built Something Better UA E C

glus us us eRidiculous Read Volume:

• Engage

• Graphs and Dashboards• Alerting• Automated Canaries• Capacity Analytics

Friday, November 15, 13

Page 97: Netflix: Amazon S3 & Amazon Elastic MapReduce to Monitor at Gigascale (BDT302) | AWS re:Invent 2013

So We Built Something Better UA E C

glus us us eRidiculous Read Volume:

• Engage

• Graphs and Dashboards• Alerting• Automated Canaries• Capacity Analytics• Special Projects

Friday, November 15, 13

Page 98: Netflix: Amazon S3 & Amazon Elastic MapReduce to Monitor at Gigascale (BDT302) | AWS re:Invent 2013

So We Built Something Better

Ridiculous Read Volume:
• Engage
• Graphs and Dashboards
• Alerting
• Automated Canaries
• Capacity Analytics
• Special Projects
• BI

Page 109: Netflix: Amazon S3 & Amazon Elastic MapReduce to Monitor at Gigascale (BDT302) | AWS re:Invent 2013

So We Built Something Better

[Architecture diagram. Components shown: client instance, publish cluster, Amazon S3, poller cluster, several backend instances, regional endpoint, global endpoint.]

Page 112: Netflix: Amazon S3 & Amazon Elastic MapReduce to Monitor at Gigascale (BDT302) | AWS re:Invent 2013

That Sounds Great!
Surely there are no problems

Copyright: http://www.flickr.com/photos/lainetrees/

CC Attribution 2.0 License

Page 121: Netflix: Amazon S3 & Amazon Elastic MapReduce to Monitor at Gigascale (BDT302) | AWS re:Invent 2013

That Sounds Great!
Surely there are no problems

• Speed is hard
• Speed at volume is harder
• We looked at spinning disks
• Memory’s the way to go
• m2.4xlarge
• This is operational data
  • People want it available, fast
  • Operations have short memories

20,160 m2.4xlarge
$32,094,720 upfront
$8,005,939/month
per region, with no redundancy
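To put the slide's totals in per-instance terms, the arithmetic can be sketched as below. Only the three figures come from the slide; the first-year total simply assumes the monthly figure recurs for twelve months, which is a reading of the slide rather than something it states.

```python
# Figures quoted on the slide (per region, no redundancy).
INSTANCES = 20_160
UPFRONT_USD = 32_094_720
MONTHLY_USD = 8_005_939

upfront_per_instance = UPFRONT_USD / INSTANCES
monthly_per_instance = MONTHLY_USD / INSTANCES

# Assumption: the monthly charge recurs unchanged for 12 months.
first_year_total = UPFRONT_USD + 12 * MONTHLY_USD

print(f"upfront per instance: ${upfront_per_instance:,.2f}")
print(f"monthly per instance: ${monthly_per_instance:,.2f}")
print(f"first-year total:     ${first_year_total:,}")
```

That works out to $1,592 upfront and roughly $397 per month for each instance.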

Page 122: Netflix: Amazon S3 & Amazon Elastic MapReduce to Monitor at Gigascale (BDT302) | AWS re:Invent 2013

That Sounds Great!
Surely there are no problems

Copyright: http://www.flickr.com/photos/amenk/
CC Attribution 2.0 License

Page 132: Netflix: Amazon S3 & Amazon Elastic MapReduce to Monitor at Gigascale (BDT302) | AWS re:Invent 2013

That Doesn’t Sound Great!
• If only we could reduce it …
• “Reduce”? Get it? Get it?
• Our granularity is two-dimensional
• We can reduce on either dimension
• Some tags make sense for very rapid reduction
  • Hystrix
  • nf.node
• Sometimes a lot (vhs)
• Sometimes a little (Cassandra)

[Figure: chart with axes Step size (time) and Dimensionality (tags)]

Page 142: Netflix: Amazon S3 & Amazon Elastic MapReduce to Monitor at Gigascale (BDT302) | AWS re:Invent 2013

A Reductive Approach
• For a series of values, reduce and keep:
  • minimum
  • maximum
  • total
  • count
• Example:
  • 3, 5, 9, 14, 20: min 3, max 20, tot 51, count 5
• Allows for sense of scale
• Allows for arbitrary further reduction w/o loss of precision
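The "further reduction without loss of precision" point can be illustrated with a small sketch: two partial summaries merge into exactly the summary of the combined series. The `Summary` class and its method names are hypothetical and chosen for this illustration; they are not taken from the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Summary:
    """The four values kept per reduced series: min, max, total, count."""
    minimum: float
    maximum: float
    total: float
    count: int

    @classmethod
    def of(cls, values):
        values = list(values)
        return cls(min(values), max(values), sum(values), len(values))

    def merge(self, other):
        # Each field combines independently, so merging summaries gives the
        # same result as summarizing the concatenated raw values.
        return Summary(
            min(self.minimum, other.minimum),
            max(self.maximum, other.maximum),
            self.total + other.total,
            self.count + other.count,
        )

    @property
    def mean(self):
        return self.total / self.count


whole = Summary.of([3, 5, 9, 14, 20])
parts = Summary.of([3, 5]).merge(Summary.of([9, 14, 20]))
print(whole)
print(parts == whole)
```

Because total and count are both kept, an average can still be derived after any number of merges, which is not true if only per-interval averages are stored.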

Page 149: Netflix: Amazon S3 & Amazon Elastic MapReduce to Monitor at Gigascale (BDT302) | AWS re:Invent 2013

Reduction: Policy
• Policy-driven EMR engine
• Four possible actions
  • preserve
  • drop
  • consolidate
  • rollup

Copyright: http://www.flickr.com/photos/bagaball/
CC Attribution 2.0 License

Page 150: Netflix: Amazon S3 & Amazon Elastic MapReduce to Monitor at Gigascale (BDT302) | AWS re:Invent 2013

Reduction: Policy

{
  "rules" : [
    {
      "operations" : [ { "op" : "drop" } ],
      "query" : "nf.app,api,:eq,class,(,LastMinuteFailRatio,SLA,NetflixSimpleDBService,),:in,:and"
    },
    {
      "operations" : [
        {
          "config" : { "keys" : [ "nf.node", "device", "nf.country" ] },
          "op" : "rollup"
        }
      ],
      "query" : ":true"
    }
  ]
}
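As a rough illustration of how the second rule might behave, the sketch below assumes that `rollup` removes the listed tag keys and combines any series whose remaining tags then coincide. That reading, the `rollup` function, the summing of values, and the sample datapoints are all assumptions made for this example; the talk does not show the engine's implementation. Only the `:true` (match everything) query is handled.

```python
import json
from collections import defaultdict

# The rollup rule from the policy above.
POLICY = json.loads("""
{ "rules" : [ { "operations" : [ { "config" : { "keys" : [ "nf.node", "device", "nf.country" ] },
                                   "op" : "rollup" } ],
                "query" : ":true" } ] }
""")

# Made-up sample datapoints (tags, value); not from the talk.
SAMPLES = [
    ({"name": "requests", "nf.app": "api", "nf.node": "i-01", "device": "ps3"}, 4.0),
    ({"name": "requests", "nf.app": "api", "nf.node": "i-02", "device": "ps3"}, 6.0),
    ({"name": "errors", "nf.app": "api", "nf.node": "i-01", "device": "wii"}, 1.0),
]


def rollup(samples, keys):
    """Drop the given tag keys, then sum values whose remaining tags coincide."""
    combined = defaultdict(float)
    for tags, value in samples:
        kept = tuple(sorted((k, v) for k, v in tags.items() if k not in keys))
        combined[kept] += value
    return dict(combined)


reduced = SAMPLES
for rule in POLICY["rules"]:
    for op in rule["operations"]:
        # Assumption: ":true" matches every series, so no matcher is needed here.
        if op["op"] == "rollup" and rule["query"] == ":true":
            reduced = rollup(SAMPLES, set(op["config"]["keys"]))

for tags, value in reduced.items():
    print(dict(tags), value)
```

Under these assumptions three per-node series collapse to two, which is the kind of dimensionality reduction the earlier slides describe for high-cardinality tags such as `nf.node`.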

Page 153: Netflix: Amazon S3 & Amazon Elastic MapReduce to Monitor at Gigascale (BDT302) | AWS re:Invent 2013

Amazon EMR

[Architecture diagram. Components shown: client instance, publish cluster, Amazon S3, poller cluster, EMR Driver, 6H cluster, 4D cluster, 18D cluster, Historical cluster, three as-needed clusters, regional endpoint, global endpoint. Arrows are labeled metrics, query, and response; steps are numbered 1 through 5.]

Page 162: Netflix: Amazon S3 & Amazon Elastic MapReduce to Monitor at Gigascale (BDT302) | AWS re:Invent 2013

Reduction: Benefits
• Indefinite storage in Amazon S3
• Fear of commitment achievement: Unlocked
• Can be aggressive about hiding metrics
• High granularity for special days
• Automated for regular operations*
• Not in critical path for visibility SLA
• Firewalls accidental metric explosions
• Huge efficiency gains

Copyright: http://www.flickr.com/photos/dr_pete/
CC Attribution 2.0 License

Page 163: Netflix: Amazon S3 & Amazon Elastic MapReduce to Monitor at Gigascale (BDT302) | AWS re:Invent 2013

Reduction: Efficiency

Copyright: http://www.flickr.com/photos/sebrenner/

CC Attribution 2.0 License

Page 165: Netflix: Amazon S3 & Amazon Elastic MapReduce to Monitor at Gigascale (BDT302) | AWS re:Invent 2013

Reduction: Efficiency

                     6H        4D       18D       HISTORY
Time Horizon         6 Hours   4 Days   18 Days   3 Months
Size                 600       512      180       12
Instances Per Hour   100       5        0         0
% Reduction          0         95       100       100

Page 176: Netflix: Amazon S3 & Amazon Elastic MapReduce to Monitor at Gigascale (BDT302) | AWS re:Invent 2013

Previews
• Self-service for special requests
• Different instance types
  • cr1.8xlarge
  • hi1.4xlarge
• Multi-tiered metric visibility

Copyright: http://www.flickr.com/photos/creativealan/
CC Attribution 2.0 License

Page 181: Netflix: Amazon S3 & Amazon Elastic MapReduce to Monitor at Gigascale (BDT302) | AWS re:Invent 2013

Growth Redux

[Chart: metrics (M) by date]

Date     Metrics (M)
5/11     2
8/11     2.5
9/11     10
1/12     15
4/12     18
8/12     30
11/12    55
1/13     90
5/13     212
10/13    728
1/14     1,200

Page 192: Netflix: Amazon S3 & Amazon Elastic MapReduce to Monitor at Gigascale (BDT302) | AWS re:Invent 2013

And a Last Word About Costs

• Priorities Reminder
• Speed of Innovation
• Availability
• Cost

• Never intended to lower costs
• Cloud migration
• Additional features
• Massive Performance

Page 194: Netflix: Amazon S3 & Amazon Elastic MapReduce to Monitor at Gigascale (BDT302) | AWS re:Invent 2013

EMR FTW

Page 196: Netflix: Amazon S3 & Amazon Elastic MapReduce to Monitor at Gigascale (BDT302) | AWS re:Invent 2013

Please give us your feedback on this presentation

As a thank you, we will select prize winners daily for completed surveys!

BDT302 Thank You
