Gluecon 2013: Netflix API Crash Course
Netflix API Crash Course: Building & Running the API in 30 Minutes
Ben Schmaus, NetflixMay 2013, Gluecon
[email protected] / @schmaus
Streaming TV Shows & Movies Globally
> 1000 Devices
1/3 of Internet traffic at peak
Programmer not Distributor
More than 36 million subscribers in over 40 countries
How does the API fit into the picture?
Personalization Engine, User Info, Movie Metadata, Ratings, Similar Movies, Instant Queue, A/B Test Engine
API
API: Enable UX Innovation
Insulate from Failure
> 2 Billion Requests per Day
Growth Over Time
Automation
Visibility
Operational awareness
Balance speed& quality
How's the API put together?
ELB → Routing Cluster
Mid-tier Services (Backend App Clusters)
+ API Layer
Inside an API
App Server
RxJava
Hystrix
Service Client 1, Service Client 2, … Service Client N
Hystrix + RxJava Service Layer
Service Client (provided JAR)
Application Service
/device/endpoint (provided script)
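Hystrix wraps each service-client call in a command with a timeout, a circuit breaker, and a fallback, so one failing mid-tier service can't take the whole API down. A minimal Python sketch of that pattern (the class and its parameters are illustrative, not Hystrix's actual API):

```python
import time

class CircuitBreaker:
    """Sketch of the Hystrix circuit-breaker pattern: after a run of
    failures, stop calling the primary and serve the fallback until a
    cooldown expires, then allow a trial call."""

    def __init__(self, failure_threshold=3, reset_timeout=30.0):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, primary, fallback):
        # While open, short-circuit straight to the fallback.
        if self.opened_at is not None:
            if time.time() - self.opened_at < self.reset_timeout:
                return fallback()
            self.opened_at = None  # half-open: allow one trial call
            self.failures = 0
        try:
            result = primary()
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = time.time()  # trip the breaker
            return fallback()
        self.failures = 0
        return result
```

Real Hystrix commands also bound latency with timeouts and isolate calls in thread pools; this sketch shows only the fail-fast-and-fallback behavior that insulates the API layer from a sick dependency.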
Service
UI Teams
Mid-tier Service Teams
API Team
Continually changing UI scripts and mid-tier services
Functionality, resiliency, and performance drift over time
Deployment & Ops
REMOVE MANUAL WORK pushing code to multiple AWS regions/clusters
ENABLE RAPID DEPLOYMENT of code despite limited visibility into how it's changed
KEEP TEAM INFORMED about what's happening in prod
MITIGATE RISK of systemic failure
Tools
End-to-end Traceability Using Python/Java Glue
Code Flow
Run 1% of your traffic on the new code and see how it does
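Sending 1% of traffic to the new code can be as simple as a weighted random choice at the routing tier. A minimal sketch (the function and cluster names are illustrative, not Netflix's routing code):

```python
import random

def pick_cluster(canary_weight=0.01, rng=random.random):
    """Route roughly `canary_weight` of requests to the canary cluster
    running the new AMI; everything else goes to current production."""
    return "api-canary" if rng() < canary_weight else "api-prod"
```

Injecting `rng` keeps the choice testable; in production the weight would come from config so the canary share can be dialed up as confidence grows.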
API ami-123 vs. API ami-456
2xx, 4xx, 5xx
latency, busy threads, load, …
Manually looking at graphs, SSH-ing into servers, and grep-ing logs doesn't scale (although we used to do that)
Confidence score for each AMI based on comparison of 1000+ metrics
Scannable visualization of the metric space, ordered from more important to less important
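One way to collapse 1000+ metric comparisons into a single confidence score is to score each metric by how far the canary drifts from the baseline, then take a weighted average so the important metrics count more. A hedged sketch (the scoring formula and weights are illustrative, not Netflix's actual algorithm):

```python
def metric_score(baseline, canary, tolerance=0.2):
    """Score one metric: 1.0 when the canary is within `tolerance`
    of the baseline, decaying toward 0.0 as it drifts further."""
    if baseline == 0:
        return 1.0 if canary == 0 else 0.0
    deviation = abs(canary - baseline) / baseline
    return max(0.0, 1.0 - max(0.0, deviation - tolerance))

def confidence(baseline_metrics, canary_metrics, weights):
    """Weighted average of per-metric scores across the metric space."""
    total = sum(weights.values())
    return sum(
        weights[name] * metric_score(baseline_metrics[name], canary_metrics[name])
        for name in weights
    ) / total
```

With error-rate metrics weighted heavily, a canary that triples its 5xx rate drags the score down even if latency looks fine, which is exactly the kind of drift a human scanning graphs would miss.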
Cross-reference Jira; link to code diffs
Track lib changes
Easy to access report artifacts for each AMI
Your basic red/black push
Doing red/black by hand for multiple clusters across multiple regions is not fun
Automate multi-cluster/region pushes
Don't forget to automate rollbacks, too!
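A red/black push brings up the new cluster, shifts traffic, and keeps the old cluster around disabled, so rollback is just re-enabling it. Automating that across regions means one orchestration loop that also unwinds itself on failure. A sketch of that loop; the `cloud` client and its `launch`/`healthy`/`rollback` methods are hypothetical stand-ins for whatever deployment API you drive:

```python
def red_black_push(cloud, regions, clusters, new_ami):
    """Push `new_ami` to every cluster in every region; if any push
    fails its health check, roll back that push and every earlier
    one, newest first. `cloud` is a hypothetical deployment client."""
    pushed = []
    for region in regions:
        for cluster in clusters:
            # Launch the new (red) group; keep a handle to the old (black) one.
            old_group = cloud.launch(region, cluster, new_ami)
            if not cloud.healthy(region, cluster):
                cloud.rollback(region, cluster, old_group)
                for r, c, o in reversed(pushed):
                    cloud.rollback(r, c, o)
                return False
            pushed.append((region, cluster, old_group))
    return True
```

The key design point is that rollback is symmetric with the push: because the old groups are retained rather than torn down, undoing a bad multi-region push is the same cheap traffic flip in reverse.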
$Who, $What, $Where, $When
e.g., "bschmaus, ami-123, Sandbox Canary, 2013-05-06 19:05"
Latest prod change in chat topic
Quickly see status of all clusters in a region
What the #%*! just happened!?
Historical & realtime metrics, sort realtime by error/request rate
Distributed grep + tail
2013-05-09.20:38:54 MX 200 us-east-1c i-1824cb73 i-1c61b77f prod NFPS3-001-8G50FJCX... 288404769389848058 90ms api-global.netflix.com GET /tvui/release/470/plus/pathEvaluator
amazon.ami-id: ami-502eb039
amazon.availability-zone: us-east-1c
amazon.instance-id: i-1824cb73
amazon.instance-type: m2.2xlarge
amazon.local-ipv4: 10.6.213.112
amazon.public-hostname: ec2-54-243-4-69.compute-1.amazonaws.com
amazon.public-ipv4: 54.243.4.69
cookie_esn: NFPS3-001-8G50FJCX...
country: MX
currentTime: 1368131934468
duration-millis: 90
esn: NFPS3-001-8G50FJCX...
geo.city: CIUDADOBREGON...
$ ./simple_stream.py -f -q 'e["country"]=="MX" && e["esn"]==~/NFPS3.*/' -r us
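The query above filters a live stream of parsed log events by field predicates (country equals MX, ESN matching /NFPS3.*/). The same idea in plain Python, assuming events are dicts with the fields shown in the log record above (`matches` and `filter_stream` are illustrative names, not `simple_stream.py` internals):

```python
import re

def matches(event, country=None, esn_pattern=None):
    """Return True if the event satisfies every given predicate,
    e.g. country == "MX" and esn matching /NFPS3.*/."""
    if country is not None and event.get("country") != country:
        return False
    if esn_pattern is not None and not re.match(esn_pattern, event.get("esn", "")):
        return False
    return True

def filter_stream(events, **predicates):
    """Lazily yield only matching events -- a smaller haystack."""
    return (e for e in events if matches(e, **predicates))
```

Because the generator is lazy, the filter can sit directly on a tailed stream and hand back only the needles instead of shipping whole logs around.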
Go for the haystack handing you the needle
Or at least be able to make smaller haystacks
Continuously experiment to make hard things easier
Even with the best tools, building software is hard work.
Great engineers build great software.
Want to help us build the API?
[email protected] / @schmaus