Netflix: Amazon S3 & Amazon Elastic MapReduce to Monitor at Gigascale (BDT302) | AWS re:Invent 2013
Date posted: 16-Sep-2014
Transcript of Netflix: Amazon S3 & Amazon Elastic MapReduce to Monitor at Gigascale (BDT302) | AWS re:Invent 2013
© 2013 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc.
Roy Rapoport
November 14, 2013
Deft Data at Netflix: Using Amazon S3 and Amazon Elastic
Friday, November 15, 13
A Word About Me …
• About 20 years in technology
• Systems engineering, networking, software development, QA, release management
• Time at Netflix: 1599 days (4y:4m:15d)
• Before at Netflix: Service Delivery in the IT/Ops, troubleshooter, Builder of Python Things[tm]
• Current role: Cloud Monitoring
• We build platforms
• Sometimes we make them easy to use
A Word About Netflix … Just the Stats
• 16 years
• 2000+ employees
• 40 million users
• 5x10^9 hours/quarter
A Word About Netflix … Freedom and Responsibility Culture
• Optimize speed of innovation
  Constrain availability
  Cost will be what cost will be
• Hire smart (experienced) people
  Get out of their way
• Anti-process bias
A Word About Netflix … Technology and Operations
• Service Oriented Architecture
• Decentralized Operations. You
  • Build
  • Test
  • Deploy
  • Set up alerting and monitoring
  • Wake up at 2AM
A Word About Netflix … Technology and Operations
• AWS-based for 100% of streaming*
• Huge expansion
  • Customer Growth
  • New markets
  • Metrics
In the Old Days … Our Old Alerting System
• Enterprise IT Solution
• Managed by the Enterprise IT Alerting People
• File Tickets
• Send alerts to NOC
• Completely separate from telemetry system
Copyright: State Library of Victoria Collections. CC Attribution 2.0 License
Copyright USAID Microlinks. CC Attribution 2.0 License
Copyright: http://www.flickr.com/photos/s_w_ellis CC Attribution 2.0 License
In the Old Days … Our Old Telemetry System
• Spare-time effort by a lone sysadmin
• Loved by developers
• Custom TCP protocol
• RRD file back-end storage
• Mostly Perl
• Datacenter-bound (and limited)
• Starting to falter under metrics growth
Copyright: http://www.flickr.com/photos/acme CC Attribution 2.0 License
Speaking of Growth
By way of comparison
• Every person in the world
  • twice
• Every smartphone in the world
  • ten times
So We Built Something Better
UI Layer Fronts Multiple Systems
[Diagram: UI layered on top of Atlas, Epic, and CloudWatch]
Copyright: http://www.flickr.com/photos/76651030@N02/ CC Attribution 2.0 License
So We Built Something Better
Clear Regional Separation
• And aggregation
[Diagram: global, above us-east-1, us-west-1, us-west-2, eu-west-1]
So We Built Something Better
Localized Node/Metric Identification
Before:
I think you’re Bob
Here’s a metric!
OK!
Now:
I’m Bob. Here’s a metric!
So We Built Something Better
What’s a Metric?
• com.netflix.eds.nccp.successful.requests.uiversion.nccprt-authorization.devtypid-101.clver-PHL_0AB.uiver-UI_169_mid.geo-US
• 256 characters aren’t enough!
• This is better:
nf.app      wp
nf.cluster  wp-batch
nf.asg      wp-batch-v163
nf.ami      ami-aa5166ef
nf.country  us
nf.region   us-west-1
nf.zone     us-west-1b
nf.node     i-097c0e52
class       nccp
type        request
action      authorization
devtype     101
clver       PHL_0AB
uiversion   UI_169_mid
geo         us
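The contrast between the dotted name and the tag list can be sketched in a few lines. This is only an illustration, assuming a metric's identity is held as a plain dictionary of the tag keys and values shown on the slide; the `matches` helper is hypothetical and is not an Atlas API.

```python
# Tag-based identity for one metric, using the key/value pairs from the slide.
metric_tags = {
    "nf.app": "wp",
    "nf.cluster": "wp-batch",
    "nf.asg": "wp-batch-v163",
    "nf.ami": "ami-aa5166ef",
    "nf.country": "us",
    "nf.region": "us-west-1",
    "nf.zone": "us-west-1b",
    "nf.node": "i-097c0e52",
    "class": "nccp",
    "type": "request",
    "action": "authorization",
    "devtype": "101",
    "clver": "PHL_0AB",
    "uiversion": "UI_169_mid",
    "geo": "us",
}


def matches(tags, wanted):
    """Return True if every wanted key/value pair is present in tags."""
    return all(tags.get(key) == value for key, value in wanted.items())


# Select on any subset of dimensions, in any order.
print(matches(metric_tags, {"nf.region": "us-west-1", "type": "request"}))
print(matches(metric_tags, {"nf.zone": "us-west-1a"}))
```

With a dotted name, the same selection would depend on knowing which position in the string holds which dimension; with tags, each dimension is addressed by its key.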
So We Built Something Better
Powerful queries
• Make the complex possible
• Make the simple … sort of hard
http://atlas/api/v1/graph?q=nf.region,us-west-1,:eq,nf.app,employeeinfo,:eq,:and,name,employeeinfo_api,:eq,:and,:sum&e=now-5m&s=e-3h
http://atlas/api/v1/graph?q=nf.region,us-west-1,:eq,nf.app,employeeinfo,:eq,:and,name,employeeinfo_api,:eq,:and,:sum,(,nf.zone,),:by&e=now-5m&s=e-3h
http://atlas/api/v1/graph?q=sps,nf.cluster,(,nccp-legacy,nccp-modern,),:in,nccprt,(,NCCPLicense,com_netflix_streaming_nccp_request_license,),:in,:and,stat,SuccessfulRequests,:eq,:and,device.rollup,3ds,:eq,:and,:sum,:set,entering_trough,sps,:get,1h,:offset,0.95,:mul,sps,:get,:gt,:set,smoothed,sps,:get,10,0.1,0.02,:des,:set,low_volume,smoothed,:get,-0.005,:mul,0.1,:add,:set,mid_volume,smoothed,:get,-0.00125,:mul,0.1,:add,:set,base,0.06,:set,min_pct,1,smoothed,:get,20,:lt,low_volume,:get,:mul,smoothed,:get,80,:lt,mid_volume,:get,:mul,:add,entering_trough,:get,0.05,:mul,:add,base,:get,:add,:sub,10,0.1,0.02,:des,:set,sps,:get,$(device.rollup)SPS,:legend,min_pct,:get,smoothed,:get,:mul,lowerbound,:legend,sps,:get,min_pct,:get,smoothed,:get,:mul,:lt,5,:rolling-count,2,:ge,:vspan,60,:alpha,$(device.rollup),:legend
Copyright: Kurt Moerman CC Attribution 2.0 License
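The first query above reads as a comma-separated postfix (stack) expression: plain words are pushed, and words starting with a colon consume what is on the stack. The following is a rough sketch of how such an expression could be evaluated, not the Atlas implementation. It handles only `:eq`, `:and`, and `:sum`; `compile_query`, `run`, and the sample series values are invented for illustration.

```python
def compile_query(expr):
    """Evaluate a comma-separated postfix expression into (aggregate, predicate)."""
    stack = []
    for token in expr.split(","):
        if token == ":eq":
            value, key = stack.pop(), stack.pop()
            stack.append(lambda tags, k=key, v=value: tags.get(k) == v)
        elif token == ":and":
            right, left = stack.pop(), stack.pop()
            stack.append(lambda tags, l=left, r=right: l(tags) and r(tags))
        elif token == ":sum":
            stack.append(("sum", stack.pop()))
        else:
            stack.append(token)  # plain word: a tag key or a tag value
    (result,) = stack  # a well-formed query leaves exactly one item
    return result


def run(expr, series):
    """Sum the values of every series whose tags satisfy the query."""
    aggregate, predicate = compile_query(expr)
    assert aggregate == "sum"
    return sum(value for tags, value in series if predicate(tags))


QUERY = ("nf.region,us-west-1,:eq,nf.app,employeeinfo,:eq,:and,"
         "name,employeeinfo_api,:eq,:and,:sum")

# Made-up sample data: (tags, latest value) pairs.
SAMPLE = [
    ({"nf.region": "us-west-1", "nf.app": "employeeinfo", "name": "employeeinfo_api"}, 4.0),
    ({"nf.region": "us-west-1", "nf.app": "employeeinfo", "name": "employeeinfo_api"}, 6.0),
    ({"nf.region": "us-east-1", "nf.app": "employeeinfo", "name": "employeeinfo_api"}, 100.0),
    ({"nf.region": "us-west-1", "nf.app": "other", "name": "employeeinfo_api"}, 50.0),
]

print(run(QUERY, SAMPLE))
```

Only the first two sample series satisfy all three `:eq` conditions, so only their values are summed.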
So We Built Something Better
Ridiculous Read Volume:
• Engage
  • Graphs and Dashboards
  • Alerting
  • Automated Canaries
  • Capacity Analytics
  • Special Projects
  • BI
So We Built Something Better
[Architecture diagram: client instance, publish cluster, Amazon S3, poller cluster, backend instances, regional endpoint, global endpoint]
That Sounds Great! Surely there are no problems
• Speed is hard
• Speed at volume is harder
• We looked at spinning disks
• Memory’s the way to go
• m2.4xlarge
• This is operational data
• People want it available, fast
• Operations have short memories
20,160 m2.4xlarge
$32,094,720 upfront
$8,005,939/month
per region
with no redundancy
Copyright: http://www.flickr.com/photos/lainetrees/ CC Attribution 2.0 License
Copyright: http://www.flickr.com/photos/amenk/ CC Attribution 2.0 License
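The slide's totals can be broken down per instance with simple arithmetic. The three input figures below are taken from the slide; the per-instance and first-12-months numbers are derived here and do not appear in the talk.

```python
# Figures from the slide (per region, with no redundancy).
INSTANCES = 20_160
UPFRONT_TOTAL_USD = 32_094_720
MONTHLY_TOTAL_USD = 8_005_939

# Derived values (not stated on the slide).
upfront_per_instance = UPFRONT_TOTAL_USD / INSTANCES
monthly_per_instance = MONTHLY_TOTAL_USD / INSTANCES
first_12_months_total = UPFRONT_TOTAL_USD + 12 * MONTHLY_TOTAL_USD

print(f"upfront per instance:  ${upfront_per_instance:,.2f}")
print(f"monthly per instance:  ${monthly_per_instance:,.2f}")
print(f"first 12 months total: ${first_12_months_total:,}")
```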
That Doesn’t Sound Great!
• If only we could reduce it …
• “Reduce”? Get it? Get it?
• Our granularity is two-dimensional
• We can reduce on either dimension
• Some tags make sense for very rapid reduction
  • Hystrix
  • nf.node
    • Sometimes a lot (vhs)
    • Sometimes a little (Cassandra)
[Chart axes: Step size (time) vs. Dimensionality (tags)]
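The two reduction axes can be sketched as two small functions: one that drops a tag (dimensionality) and one that widens the step size (time). This is an illustration under assumptions, not the pipeline described in the talk: data points are keyed by `(timestamp, tags)`, values are combined by summing, and the node ids and values are made up.

```python
from collections import defaultdict


def tagset(tags):
    """Hashable form of a tag dictionary."""
    return frozenset(tags.items())


def drop_tag(points, tag):
    """Reduce dimensionality: remove one tag and sum points that become identical."""
    out = defaultdict(float)
    for (ts, tags), value in points.items():
        kept = frozenset((k, v) for k, v in tags if k != tag)
        out[(ts, kept)] += value
    return dict(out)


def widen_step(points, step):
    """Reduce time resolution: sum points that fall into the same step-sized bucket."""
    out = defaultdict(float)
    for (ts, tags), value in points.items():
        out[(ts - ts % step, tags)] += value
    return dict(out)


# Made-up data: two nodes reporting at t=0s and t=60s.
POINTS = {
    (0, tagset({"nf.app": "wp", "nf.node": "i-1"})): 1.0,
    (0, tagset({"nf.app": "wp", "nf.node": "i-2"})): 2.0,
    (60, tagset({"nf.app": "wp", "nf.node": "i-1"})): 3.0,
    (60, tagset({"nf.app": "wp", "nf.node": "i-2"})): 4.0,
}

by_app = drop_tag(POINTS, "nf.node")   # 4 points become 2
coarse = widen_step(by_app, 120)       # 2 points become 1
print(len(POINTS), len(by_app), len(coarse))
```

Dropping `nf.node` shrinks the data in proportion to the number of nodes behind each remaining tag combination, which is why the saving varies from one application to another.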
A Reductive Approach
• For a series of values, reduce and keep:
  • minimum
  • maximum
  • total
  • count
• Example:
  • 3,5,9,14,20: min 3, max 20, tot 51, count 5
• Allows for sense of scale
• Allows for arbitrary further reduction w/o loss of precision
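The four retained statistics can be merged again and again without consulting the raw values, which is what makes further reduction lossless for them. A minimal sketch, using the slide's example series; the `Summary` class and its method names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Summary:
    minimum: float
    maximum: float
    total: float
    count: int

    @classmethod
    def of(cls, values):
        """Summarize a non-empty series of raw values."""
        values = list(values)
        return cls(min(values), max(values), sum(values), len(values))

    def merge(self, other):
        """Combine two summaries without needing the raw values."""
        return Summary(
            min(self.minimum, other.minimum),
            max(self.maximum, other.maximum),
            self.total + other.total,
            self.count + other.count,
        )

    @property
    def mean(self):
        return self.total / self.count


whole = Summary.of([3, 5, 9, 14, 20])
parts = Summary.of([3, 5]).merge(Summary.of([9, 14, 20]))

print(whole)           # min 3, max 20, total 51, count 5, as on the slide
print(parts == whole)  # merging partial summaries gives the same result
```

The mean stays recoverable as total divided by count, while the minimum and maximum preserve the sense of scale that an average alone would hide.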
Reduction: Policy
• Policy-driven EMR engine
• Four possible actions
  • preserve
  • drop
  • consolidate
  • rollup
Copyright: http://www.flickr.com/photos/bagaball/ CC Attribution 2.0 License
Reduction: Policy
{
  "rules" : [
    {
      "operations" : [ { "op" : "drop" } ],
      "query" : "nf.app,api,:eq,class,(,LastMinuteFailRatio,SLA,NetflixSimpleDBService,),:in,:and"
    },
    {
      "operations" : [ { "config" : { "keys" : [ "nf.node", "device", "nf.country" ] }, "op" : "rollup" } ],
      "query" : ":true"
    }
  ]
}
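A policy document like the one above could be applied roughly as follows. This is a sketch under several assumptions rather than the EMR job from the talk: the first matching rule wins, `rollup` removes the listed tag keys and sums the values that collapse together, and only `drop` and `rollup` are handled (the slide's JSON shows no `preserve` or `consolidate` rule). The helpers `compile_query` and `apply_policy` and the sample metrics are invented for illustration.

```python
import json
from collections import defaultdict

POLICY = json.loads("""
{ "rules": [
  { "operations": [{"op": "drop"}],
    "query": "nf.app,api,:eq,class,(,LastMinuteFailRatio,SLA,NetflixSimpleDBService,),:in,:and" },
  { "operations": [{"config": {"keys": ["nf.node", "device", "nf.country"]}, "op": "rollup"}],
    "query": ":true" }
]}
""")


def compile_query(expr):
    """Turn a postfix query (:eq, :in, :and, :true only) into a predicate on tags."""
    stack = []
    tokens = iter(expr.split(","))
    for token in tokens:
        if token == "(":
            items = []
            for item in tokens:  # consume list items up to the closing paren
                if item == ")":
                    break
                items.append(item)
            stack.append(items)
        elif token == ":true":
            stack.append(lambda tags: True)
        elif token == ":eq":
            value, key = stack.pop(), stack.pop()
            stack.append(lambda tags, k=key, v=value: tags.get(k) == v)
        elif token == ":in":
            values, key = stack.pop(), stack.pop()
            stack.append(lambda tags, k=key, vs=values: tags.get(k) in vs)
        elif token == ":and":
            right, left = stack.pop(), stack.pop()
            stack.append(lambda tags, l=left, r=right: l(tags) and r(tags))
        else:
            stack.append(token)
    (predicate,) = stack
    return predicate


def apply_policy(policy, metrics):
    """Apply the first matching rule to each (tags, value) pair."""
    rules = [(compile_query(r["query"]), r["operations"]) for r in policy["rules"]]
    reduced = defaultdict(float)
    for tags, value in metrics:
        for predicate, operations in rules:
            if not predicate(tags):
                continue
            keep = True
            for operation in operations:
                if operation["op"] == "drop":
                    keep = False
                elif operation["op"] == "rollup":
                    hidden = set(operation["config"]["keys"])
                    tags = {k: v for k, v in tags.items() if k not in hidden}
            if keep:
                reduced[frozenset(tags.items())] += value
            break  # assumption: the first matching rule wins
    return dict(reduced)


# Made-up metrics: one matches the drop rule, two roll up across nodes.
METRICS = [
    ({"nf.app": "api", "class": "SLA", "nf.node": "i-1"}, 1.0),
    ({"nf.app": "api", "class": "nccp", "nf.node": "i-1"}, 2.0),
    ({"nf.app": "api", "class": "nccp", "nf.node": "i-2"}, 3.0),
]

result = apply_policy(POLICY, METRICS)
print(result)
```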
Amazon EMR
[Architecture diagram: client instance, publish cluster, Amazon S3, poller cluster, EMR Driver, 6H cluster, 4D cluster, 18D cluster, Historical cluster, as-needed clusters, regional endpoint, global endpoint; arrows labeled metrics, query, response; steps numbered 1 to 5]
Reduction: Benefits
• Indefinite storage in Amazon S3
• Fear of commitment achievement: Unlocked
• Can be aggressive about hiding metrics
• High granularity for special days
• Automated for regular operations*
• Not in critical path for visibility SLA
• Firewalls accidental metric explosions
• Huge efficiency gains
Copyright: http://www.flickr.com/photos/dr_pete/
CC Attribution 2.0 License
Reduction: Efficiency
Copyright: http://www.flickr.com/photos/sebrenner/
CC Attribution 2.0 License
                      6H        4D        18D       HISTORY
Time Horizon          6 Hours   4 Days    18 Days   3 Months
Size                  600       512       180       12
Instances Per Hour    100       5         0         0
% Reduction           0         95        100       100
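The "% Reduction" row is consistent with measuring each tier's instances per hour against the 6H tier's 100. That reading is an assumption, since the slide does not state the formula, but it is quick to check:

```python
# Values transcribed from the "Reduction: Efficiency" table.
tiers = ["6H", "4D", "18D", "HISTORY"]
instances_per_hour = [100, 5, 0, 0]

# Assumption: the 6H tier is the baseline the other tiers are compared to.
baseline = instances_per_hour[0]
reduction = [round(100 * (1 - n / baseline)) for n in instances_per_hour]

for tier, pct in zip(tiers, reduction):
    print(f"{tier}: {pct}% reduction")
```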
Previews
• Self-service for special requests
• Different instance types
  • cr1.8xlarge
  • hi1.4xlarge
• Multi-tiered metric visibility
Copyright: http://www.flickr.com/photos/creativealan/
CC Attribution 2.0 License
Growth Redux
[Chart: metrics (M) over time]
Date          5/11  8/11  9/11  1/12  4/12  8/12  11/12  1/13  5/13  10/13  1/14
Metrics (M)   2     2.5   10    15    18    30    55     90    212   728    1,200
And a Last Word About Costs
• Priorities Reminder
• Speed of Innovation
• Availability
• Cost
• Never intended to lower costs
• Cloud migration
• Additional features
• Massive Performance
EMR FTW
Please give us your feedback on this presentation
As a thank you, we will select prize winners daily for completed surveys!
BDT302 Thank You