Netflix in the Cloud - QCon San Francisco
![Page 1](https://reader034.fdocuments.in/reader034/viewer/2022042223/5ec98f6cf931947a177dd15f/html5/thumbnails/1.jpg)
Netflix in the Cloud
Nov 3, 2010 Adrian Cockcroft @adrianco #netflixcloud acockcroft@netflix.com
http://www.linkedin.com/in/adriancockcroft
![Page 2](https://reader034.fdocuments.in/reader034/viewer/2022042223/5ec98f6cf931947a177dd15f/html5/thumbnails/2.jpg)
With more than 16 million subscribers in the United States and Canada, Netflix, Inc. is the world’s leading Internet subscription service for enjoying movies and TV shows.
![Page 3](https://reader034.fdocuments.in/reader034/viewer/2022042223/5ec98f6cf931947a177dd15f/html5/thumbnails/3.jpg)
Why Give This Talk?
![Page 4](https://reader034.fdocuments.in/reader034/viewer/2022042223/5ec98f6cf931947a177dd15f/html5/thumbnails/4.jpg)
Netflix is Path-finding
The Cloud ecosystem is evolving very fast
Share with and learn from the cloud community
![Page 5](https://reader034.fdocuments.in/reader034/viewer/2022042223/5ec98f6cf931947a177dd15f/html5/thumbnails/5.jpg)
We want to use clouds, not build them
Cloud technology should be a commodity
Public cloud and open source for agility and scale
![Page 6](https://reader034.fdocuments.in/reader034/viewer/2022042223/5ec98f6cf931947a177dd15f/html5/thumbnails/6.jpg)
We are looking for talent
Netflix wants to connect with the very best engineers
![Page 7](https://reader034.fdocuments.in/reader034/viewer/2022042223/5ec98f6cf931947a177dd15f/html5/thumbnails/7.jpg)
Why Use AWS?
![Page 8](https://reader034.fdocuments.in/reader034/viewer/2022042223/5ec98f6cf931947a177dd15f/html5/thumbnails/8.jpg)
We stopped building our own datacenters
Capacity growth rate is accelerating, unpredictable
Product launch spikes - iPhone, Wii, PS3, XBox
Datacenter is a large, inflexible capital commitment
![Page 9](https://reader034.fdocuments.in/reader034/viewer/2022042223/5ec98f6cf931947a177dd15f/html5/thumbnails/9.jpg)
Customers Q3 year/year +52% Total and +145% Streaming
[Chart: customers by quarter, 2009Q2 to 2010Q3; vertical axis from 0 to 18]
Source: http://ir.netflix.com
![Page 10](https://reader034.fdocuments.in/reader034/viewer/2022042223/5ec98f6cf931947a177dd15f/html5/thumbnails/10.jpg)
Leverage AWS Scale “the biggest public cloud”
AWS investment in tooling and automation
AWS zones for high availability, scalability
AWS skills are common on resumes…
![Page 11](https://reader034.fdocuments.in/reader034/viewer/2022042223/5ec98f6cf931947a177dd15f/html5/thumbnails/11.jpg)
Leverage AWS Feature Set “two years ahead of the others”
EC2, S3, SDB, SQS, EBS, EMR, ELB, ASG, IAM, RDB
![Page 12](https://reader034.fdocuments.in/reader034/viewer/2022042223/5ec98f6cf931947a177dd15f/html5/thumbnails/12.jpg)
“The cloud lets its users focus on delivering differentiating business value instead of wasting valuable resources on the undifferentiated heavy lifting that makes up most of IT infrastructure.”
Werner Vogels
Amazon CTO
![Page 13](https://reader034.fdocuments.in/reader034/viewer/2022042223/5ec98f6cf931947a177dd15f/html5/thumbnails/13.jpg)
Netflix Deployed on AWS
Content: Video Masters, EC2, S3, CDN
Logs: S3, EMR Hadoop, Hive, Business Intelligence
Play: DRM, CDN routing, Bookmarks, Logging
WWW: Search, Movie Choosing, Ratings, Similars
API: Metadata, Device Config, TV Movie Choosing, Mobile iPhone
![Page 14](https://reader034.fdocuments.in/reader034/viewer/2022042223/5ec98f6cf931947a177dd15f/html5/thumbnails/14.jpg)
Movie Encoding farm (2009)
• Tens of thousands of videos
• Thousands of EC2 instances
• Encoding apps on MS Windows
• ~100 speed/format permutations
• Petabytes of S3
• Content Delivery Networks
“Netflix is one of the largest customers of the biggest CDNs, Akamai and Limelight”
Content: Video Masters, EC2, S3, CDN
![Page 15](https://reader034.fdocuments.in/reader034/viewer/2022042223/5ec98f6cf931947a177dd15f/html5/thumbnails/15.jpg)
Hadoop - Elastic Map-Reduce (2009)
• Web Access Logs
• Streaming Service Logs
• Terabyte per day scale
• Easy Hadoop via Amazon EMR
• Hive SQL “Data Mart”
• Gateway to Datacenter BI
Slideshare.net talks:
evamtse “Netflix: Hive User Group” http://slidesha.re/aqJLAC
adrianco “Crunch Your Data In The Cloud” http://slidesha.re/dx4oCK
Logs: S3, EMR Hadoop, Hive, Business Intelligence
![Page 16](https://reader034.fdocuments.in/reader034/viewer/2022042223/5ec98f6cf931947a177dd15f/html5/thumbnails/16.jpg)
Streaming Service Back-end (early 2010)
• PC/Mac Silverlight Player Support
• Highly available “play button”
• DRM Key Management
• Generate route to stream on CDN
• Lookup bookmark for user/movie
• Update bookmark for user/movie
• Log quality of service
Play: DRM, CDN routing, Bookmarks, Logging
![Page 17](https://reader034.fdocuments.in/reader034/viewer/2022042223/5ec98f6cf931947a177dd15f/html5/thumbnails/17.jpg)
Web site, a page at a time (through 2010)
• Clean presentation layer rewrite
• Search auto-complete
• Search backend and landing page
• Movie and genre choosing
• Star ratings and recommendations
• Similar movies
• Page by page to 80% of views
(leave account signup in DC)
WWW: Search, Movie Choosing, Ratings, Similars
![Page 18](https://reader034.fdocuments.in/reader034/viewer/2022042223/5ec98f6cf931947a177dd15f/html5/thumbnails/18.jpg)
API for TV devices and iPhone etc. (2010)
• REST API: developer.netflix.com
• Interfaces to everything else
• TV Device Configuration
• Personalized movie choosing
• iPhone Launch in the cloud only
“Netflix is an API for streaming to TVs (we also do DVDs and a web site)”
API: Metadata, Device Config, TV Movie Choosing, Mobile iPhone
![Page 19](https://reader034.fdocuments.in/reader034/viewer/2022042223/5ec98f6cf931947a177dd15f/html5/thumbnails/19.jpg)
Netflix EC2 Instances per Account
[Chart labels: Encoding, Test and Production, Log Analysis]
![Page 20](https://reader034.fdocuments.in/reader034/viewer/2022042223/5ec98f6cf931947a177dd15f/html5/thumbnails/20.jpg)
Learnings…
![Page 21](https://reader034.fdocuments.in/reader034/viewer/2022042223/5ec98f6cf931947a177dd15f/html5/thumbnails/21.jpg)
Datacenter oriented tools don’t work
Ephemeral instances
High rate of change
![Page 22](https://reader034.fdocuments.in/reader034/viewer/2022042223/5ec98f6cf931947a177dd15f/html5/thumbnails/22.jpg)
Cloud Tools Don’t Scale for Enterprise
Too many are “Startup” oriented
Built our own tools
Drove vendors hard
![Page 23](https://reader034.fdocuments.in/reader034/viewer/2022042223/5ec98f6cf931947a177dd15f/html5/thumbnails/23.jpg)
“fork-lifted” apps don’t work well
Fragile
Too many datacenter oriented assumptions
![Page 24](https://reader034.fdocuments.in/reader034/viewer/2022042223/5ec98f6cf931947a177dd15f/html5/thumbnails/24.jpg)
Faster to re-code from scratch
• Re-architected and re-wrote most of the code
• Fine grain web services
• Leveraged many open source Java projects
• Systematically instrumented
• “NoSQL” SimpleDB backend
![Page 25](https://reader034.fdocuments.in/reader034/viewer/2022042223/5ec98f6cf931947a177dd15f/html5/thumbnails/25.jpg)
“In the datacenter, robust code is best practice. In the cloud, it’s essential.”
![Page 26](https://reader034.fdocuments.in/reader034/viewer/2022042223/5ec98f6cf931947a177dd15f/html5/thumbnails/26.jpg)
Takeaway
Netflix is path-finding the use of public AWS cloud to replace in-house IT for non-trivial applications with hundreds of developers and thousands of systems.
(Pause for questions before we dive into details)
![Page 27](https://reader034.fdocuments.in/reader034/viewer/2022042223/5ec98f6cf931947a177dd15f/html5/thumbnails/27.jpg)
What, Why and How?
The details…
![Page 28](https://reader034.fdocuments.in/reader034/viewer/2022042223/5ec98f6cf931947a177dd15f/html5/thumbnails/28.jpg)
Synopsis
• The Goals
  – Faster, Scalable, Available and Productive
• Anti-patterns and Cloud Architecture
  – The things we wanted to change and why
• Cloud Bring-up Strategy
  – Developer Transitions and Tools
• Roadmap and Next Steps
![Page 29](https://reader034.fdocuments.in/reader034/viewer/2022042223/5ec98f6cf931947a177dd15f/html5/thumbnails/29.jpg)
Goals
• Faster
  – Lower latency than the equivalent datacenter web pages and API calls
  – Measured as mean and 99th percentile
  – For both first hit (e.g. home page) and in-session hits for the same user
• Scalable
  – Avoid needing any more datacenter capacity as subscriber count increases
  – No central vertically scaled databases
  – Leverage AWS elastic capacity effectively
• Available
  – Substantially higher robustness and availability than datacenter services
  – Leverage multiple AWS availability zones
  – No scheduled down time, no central database schema to change
• Productive
  – Optimize agility of a large development team with automation and tools
  – Leave behind complex tangled datacenter code base (~8 year old architecture)
  – Enforce clean layered interfaces and re-usable components
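The "Faster" goal is tracked as both a mean and a 99th percentile. The sketch below shows why both numbers are useful. It is an illustration only: the nearest-rank percentile definition and the sample latencies are assumptions, and the slides do not say how the measurement was actually computed.

```python
import math

def mean(samples):
    """Arithmetic mean of latency samples (milliseconds)."""
    return sum(samples) / len(samples)

def percentile(samples, pct):
    """Nearest-rank percentile: smallest sample with at least pct% of values at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))  # 1-based rank
    return ordered[rank - 1]

# Hypothetical data: 98 fast requests and 2 slow ones.
latencies_ms = [10] * 98 + [500] * 2

print(mean(latencies_ms))            # modest mean hides the slow tail
print(percentile(latencies_ms, 50))  # median is a fast request
print(percentile(latencies_ms, 99))  # 99th percentile lands on a slow request
```

With this data the mean stays low while the 99th percentile reports the slow tail, which is the behaviour a per-user latency goal cares about.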
![Page 30](https://reader034.fdocuments.in/reader034/viewer/2022042223/5ec98f6cf931947a177dd15f/html5/thumbnails/30.jpg)
Cloud Architecture Patterns
Where do we start?
![Page 31](https://reader034.fdocuments.in/reader034/viewer/2022042223/5ec98f6cf931947a177dd15f/html5/thumbnails/31.jpg)
Datacenter Anti-Patterns
What do we currently do in the datacenter that prevents us from meeting our goals?
![Page 32](https://reader034.fdocuments.in/reader034/viewer/2022042223/5ec98f6cf931947a177dd15f/html5/thumbnails/32.jpg)
Rewrite from Scratch
Not everything is cloud specific
Pay down technical debt
Robust patterns
![Page 33](https://reader034.fdocuments.in/reader034/viewer/2022042223/5ec98f6cf931947a177dd15f/html5/thumbnails/33.jpg)

| Old Datacenter | New Cloud Arch |
| --- | --- |
| Central SQL Database | Distributed Key/Value NoSQL |
| Sticky In-Memory Session | Shared Memcached Session |
| Chatty Protocols | Latency Tolerant Protocols |
| Tangled Service Interfaces | Layered Service Interfaces |
| Instrumented Code | Instrumented Service Patterns |
| Fat Complex Objects | Lightweight Serializable Objects |
| Components as Jar Files | Components as Services |
![Page 34](https://reader034.fdocuments.in/reader034/viewer/2022042223/5ec98f6cf931947a177dd15f/html5/thumbnails/34.jpg)
The Central SQL Database
• Datacenter has a central database
  – Everything in one place is convenient until it fails
  – Customers, movies, history, configuration
• Schema changes require downtime
Anti-pattern impacts scalability, availability
![Page 35](https://reader034.fdocuments.in/reader034/viewer/2022042223/5ec98f6cf931947a177dd15f/html5/thumbnails/35.jpg)
The Distributed Key-Value Store
• Cloud has many key-value data stores
  – More complex to keep track of, do backups etc.
  – Each store is much simpler to administer
  – Joins take place in Java code
• No schema to change, no scheduled downtime
• Latency for Memcached vs. Oracle vs. SimpleDB
  – Memcached is dominated by network latency <1ms
  – Oracle for simple queries is a few milliseconds
  – SimpleDB has replication and REST overheads >10ms
DBA
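With separate key-value stores there is no SQL engine to join across them, so the join moves into application code. A minimal sketch of that idea follows, written in Python for brevity although the slide refers to Java. The two stores are plain dictionaries and all names and data are hypothetical.

```python
# Two independent key-value stores, modelled here as dicts (hypothetical data).
customers = {"c1": {"name": "Ann"}, "c2": {"name": "Bob"}}
ratings = {
    "c1": [{"movie": "m1", "stars": 5}],
    "c3": [{"movie": "m2", "stars": 2}],  # no matching customer record
}

def join_customer_ratings(customer_ids):
    """Join in application code: one lookup per store, combined per key."""
    rows = []
    for cid in customer_ids:
        customer = customers.get(cid)
        if customer is None:
            continue  # inner-join semantics: skip keys with no customer record
        for rating in ratings.get(cid, []):
            rows.append((customer["name"], rating["movie"], rating["stars"]))
    return rows

print(join_customer_ratings(["c1", "c2", "c3"]))
```

Each lookup here would be a network round trip in a real deployment, which is why the latency figures on the slide matter for this pattern.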
![Page 36](https://reader034.fdocuments.in/reader034/viewer/2022042223/5ec98f6cf931947a177dd15f/html5/thumbnails/36.jpg)
The Sticky Session
• Datacenter Sticky Load Balancing
  – Efficient caching for low latency
  – Tricky session handling code
  – Middle tier load balancer has issues in practice
• Encourages concentrated functionality
  – one service that does everything
Anti-pattern impacts productivity, availability
![Page 37](https://reader034.fdocuments.in/reader034/viewer/2022042223/5ec98f6cf931947a177dd15f/html5/thumbnails/37.jpg)
The Shared Session
• Cloud Uses Round-Robin Load Balancing
  – Simple request-based code
  – External shared caching with memcached
• More flexible fine grain services
  – Works better with auto-scaled instance counts
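The shared-session pattern can be sketched as stateless handlers behind a round-robin balancer, with session state held in an external cache. In the illustration below, `SharedCache` is a dictionary standing in for memcached, and the class and method names are hypothetical rather than taken from the Netflix code.

```python
import itertools

class SharedCache:
    """Stand-in for an external memcached cluster (a plain dict here)."""
    def __init__(self):
        self._data = {}
    def get(self, key):
        return self._data.get(key)
    def set(self, key, value):
        self._data[key] = value

class Server:
    """Stateless request handler: all session state lives in the shared cache."""
    def __init__(self, name, cache):
        self.name = name
        self.cache = cache
    def handle(self, session_id):
        count = (self.cache.get(session_id) or 0) + 1
        self.cache.set(session_id, count)
        return self.name, count

cache = SharedCache()
servers = [Server("a", cache), Server("b", cache), Server("c", cache)]
round_robin = itertools.cycle(servers)

# Four requests from one user land on different servers but see one session.
results = [next(round_robin).handle("user-42") for _ in range(4)]
print(results)
```

Because no server holds state of its own, instances can be added or removed by an auto-scaler without breaking any user's session.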
![Page 38](https://reader034.fdocuments.in/reader034/viewer/2022042223/5ec98f6cf931947a177dd15f/html5/thumbnails/38.jpg)
Chatty Opaque and Brittle Protocols
• Datacenter service protocols
  – Assumed low latency for many simple requests
• Based on serializing existing Java objects
  – Inefficient formats
  – Incompatible when definitions change
Anti-pattern causes productivity, latency and availability issues
![Page 39](https://reader034.fdocuments.in/reader034/viewer/2022042223/5ec98f6cf931947a177dd15f/html5/thumbnails/39.jpg)
Robust and Flexible Protocols
• Cloud service protocols
  – JSR311/Jersey is used for REST/HTTP service calls
  – Custom client code includes service discovery
  – Support complex data types in a single request
• Apache Avro
  – Evolved from Protocol Buffers and Thrift
  – Includes JSON header defining key/value protocol
  – Avro serialization is half the size and several times faster than Java serialization, but more work to code
![Page 40](https://reader034.fdocuments.in/reader034/viewer/2022042223/5ec98f6cf931947a177dd15f/html5/thumbnails/40.jpg)
Persisted Protocols
• Persist Avro in Memcached
  – Save space/latency (zigzag encoding, half the size)
  – Less brittle across versions
  – New keys are ignored
  – Missing keys are handled cleanly
• Avro protocol definitions
  – Can be written in JSON or generated from POJOs
  – It’s hard, needs better tooling
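The version tolerance described above (new keys ignored, missing keys handled cleanly) can be illustrated without the Avro library. The sketch below is not Avro's API: it uses plain dictionaries and a hypothetical reader-side schema with defaults to mimic how a reader can stay compatible with records written by older or newer code.

```python
# Reader-side schema: field name -> default used when the field is missing.
READER_SCHEMA_V1 = {"movie_id": None, "position_sec": 0}

def read_record(stored, reader_schema):
    """Keep only fields the reader knows; fill absent ones from defaults."""
    return {name: stored.get(name, default) for name, default in reader_schema.items()}

# Written by a newer service version: carries an extra "device" key.
newer = {"movie_id": "m1", "position_sec": 120, "device": "tv"}
# Written by an older service version: has no "position_sec" key.
older = {"movie_id": "m2"}

print(read_record(newer, READER_SCHEMA_V1))  # extra key is dropped
print(read_record(older, READER_SCHEMA_V1))  # missing key takes its default
```

Avro itself achieves this through schema resolution between the writer's and reader's schemas; the dictionaries above only show the observable effect.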
![Page 41](https://reader034.fdocuments.in/reader034/viewer/2022042223/5ec98f6cf931947a177dd15f/html5/thumbnails/41.jpg)
Tangled Service Interfaces
• Datacenter implementation is exposed
  – Oracle SQL queries mixed into business logic
• Tangled code
  – Deep dependencies, false sharing
• Data providers with sideways dependencies
  – Everything depends on everything else
Anti-pattern affects productivity, availability
![Page 42](https://reader034.fdocuments.in/reader034/viewer/2022042223/5ec98f6cf931947a177dd15f/html5/thumbnails/42.jpg)
Untangled Service Interfaces
• New Cloud Code With Strict Layering
  – Compile against interface jar
  – Can use Spring runtime binding to enforce
• Service interface is the service
  – Implementation is completely hidden
  – Can be implemented locally or remotely
  – Implementation can evolve independently
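"Service interface is the service" means callers depend on an abstract contract, and a local or remote implementation can be swapped in behind it. The sketch below shows the shape of that idea in Python; the slides describe Java interface jars, and `RatingsService`, its implementations, and the `transport` callable are all hypothetical names.

```python
from abc import ABC, abstractmethod

class RatingsService(ABC):
    """The interface callers build against; no implementation details leak."""
    @abstractmethod
    def get_rating(self, customer_id, movie_id):
        ...

class LocalRatingsService(RatingsService):
    """In-process implementation backed by a dict."""
    def __init__(self, data):
        self._data = data
    def get_rating(self, customer_id, movie_id):
        return self._data.get((customer_id, movie_id))

class RemoteRatingsService(RatingsService):
    """Remote implementation; `transport` is any callable that performs the call."""
    def __init__(self, transport):
        self._transport = transport
    def get_rating(self, customer_id, movie_id):
        return self._transport("get_rating", customer_id, movie_id)

def stars_label(service: RatingsService, customer_id, movie_id):
    """Caller code depends only on the interface."""
    rating = service.get_rating(customer_id, movie_id)
    return "unrated" if rating is None else "*" * rating

local = LocalRatingsService({("c1", "m1"): 4})
remote = RemoteRatingsService(lambda op, customer, movie: 2)  # fake network call
print(stars_label(local, "c1", "m1"), stars_label(remote, "c1", "m1"))
```

The caller `stars_label` is unchanged whichever implementation is bound, which is what lets an implementation evolve independently.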
![Page 43](https://reader034.fdocuments.in/reader034/viewer/2022042223/5ec98f6cf931947a177dd15f/html5/thumbnails/43.jpg)
Untangled Service Interfaces
Two layers:
• SAL - Service Access Library
  – Basic serialization and error handling
  – REST or POJOs defined by data provider
• ESL - Extended Service Library
  – Caching, conveniences
  – Can combine several SALs
  – Exposes faceted type system (described later)
  – Interface defined by data consumer in many cases
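The two layers can be sketched as a thin access class per data provider (SAL) and a consumer-facing class that combines several of them and adds caching (ESL). Everything below is an assumed illustration: `DictSAL`, `CustomerProfileESL`, and the dictionary backends are hypothetical stand-ins, not the Netflix libraries.

```python
class DictSAL:
    """Service Access Library stand-in: one data provider, basic error handling."""
    def __init__(self, backend):
        self._backend = backend
        self.calls = 0  # counts provider round trips
    def fetch(self, key):
        self.calls += 1
        return self._backend.get(key)  # None signals "not found"

class CustomerProfileESL:
    """Extended Service Library: combines two SALs and caches the result."""
    def __init__(self, identity_sal, ratings_sal):
        self._identity = identity_sal
        self._ratings = ratings_sal
        self._cache = {}
    def profile(self, customer_id):
        if customer_id not in self._cache:
            self._cache[customer_id] = {
                "name": self._identity.fetch(customer_id),
                "ratings": self._ratings.fetch(customer_id) or [],
            }
        return self._cache[customer_id]

identity = DictSAL({"c1": "Ann"})
ratings = DictSAL({"c1": [5, 3]})
esl = CustomerProfileESL(identity, ratings)
first = esl.profile("c1")
second = esl.profile("c1")  # served from the ESL cache, no SAL calls
print(first, identity.calls, ratings.calls)
```

The consumer shapes the ESL interface (here, a combined profile), while each SAL stays a direct mirror of what its provider exposes.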
![Page 44](https://reader034.fdocuments.in/reader034/viewer/2022042223/5ec98f6cf931947a177dd15f/html5/thumbnails/44.jpg)
Service Interaction Pattern Swimlane Diagram
![Page 45](https://reader034.fdocuments.in/reader034/viewer/2022042223/5ec98f6cf931947a177dd15f/html5/thumbnails/45.jpg)
Service Architecture Patterns
• Internal Interfaces Between Services
  – Common patterns as templates
  – Highly instrumented, observable, analytics
  – Service Level Agreements – SLAs
• Library templates for generic features
  – Instrumented Netflix Base Servlet template
  – Instrumented generic client interface template
  – Instrumented S3, SimpleDB, Memcached clients
![Page 46](https://reader034.fdocuments.in/reader034/viewer/2022042223/5ec98f6cf931947a177dd15f/html5/thumbnails/46.jpg)
Service Request Instruments Every Step in the call
CLIENT Request Start Timestamp, Request End Timestamp
Client outbound serialize start timestamp
Client outbound serialize end timestamp
Client network send timestamp
Service network receive timestamp
Service inbound serialize start timestamp
Service inbound serialize end timestamp
SERVICE execute request start timestamp, execute request end timestamp
Service outbound serialize start timestamp
Service outbound serialize end timestamp
Service network send timestamp
Client network receive timestamp
Inbound deserialize start timestamp
Inbound deserialize end timestamp
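Once every step is timestamped, a request can be broken down into per-stage durations. The sketch below shows that arithmetic. The field names are shortened, hypothetical versions of the slide's labels, the millisecond values are invented, and comparing client and service timestamps assumes the two hosts have synchronized clocks.

```python
# Timestamps in milliseconds since request start (invented values).
trace = {
    "request_start": 0,
    "client_serialize_start": 1, "client_serialize_end": 3,
    "client_network_send": 4, "service_network_receive": 9,
    "service_inbound_start": 9, "service_inbound_end": 11,
    "execute_start": 11, "execute_end": 31,
    "service_outbound_start": 31, "service_outbound_end": 33,
    "service_network_send": 34, "client_network_receive": 39,
    "client_deserialize_start": 39, "client_deserialize_end": 41,
    "request_end": 42,
}

# Each stage is the gap between a pair of timestamps.
STAGES = {
    "client serialize": ("client_serialize_start", "client_serialize_end"),
    "network out": ("client_network_send", "service_network_receive"),
    "service inbound": ("service_inbound_start", "service_inbound_end"),
    "execute": ("execute_start", "execute_end"),
    "service outbound": ("service_outbound_start", "service_outbound_end"),
    "network back": ("service_network_send", "client_network_receive"),
    "client deserialize": ("client_deserialize_start", "client_deserialize_end"),
    "total": ("request_start", "request_end"),
}

def stage_durations(trace):
    """Duration of each stage in milliseconds."""
    return {stage: trace[end] - trace[start] for stage, (start, end) in STAGES.items()}

durations = stage_durations(trace)
print(durations)
```

A breakdown like this separates time spent in serialization, on the network, and in the service itself, which a single end-to-end number cannot do.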
![Page 47](https://reader034.fdocuments.in/reader034/viewer/2022042223/5ec98f6cf931947a177dd15f/html5/thumbnails/47.jpg)
Boundary Interfaces
• Isolate teams from external dependencies
  – Fake SAL built by cloud team
  – Real SAL provided by data provider team later
  – ESL built by cloud team using faceted objects
• Fake data sources allow development to start
  – e.g. Fake Identity SAL for a test set of customers
  – Development solidifies dependencies early
  – Helps external team provide the right interface
![Page 48](https://reader034.fdocuments.in/reader034/viewer/2022042223/5ec98f6cf931947a177dd15f/html5/thumbnails/48.jpg)
One Object That Does Everything
• Datacenter uses a few big complex objects
  – Movie and Customer objects are the foundation
  – Good choice for a small team and one instance
  – Problematic for large teams and many instances
• False sharing causes tangled dependencies
  – Unproductive re-integration work
Anti-pattern impacting productivity and availability
![Page 49](https://reader034.fdocuments.in/reader034/viewer/2022042223/5ec98f6cf931947a177dd15f/html5/thumbnails/49.jpg)
An Interface For Each Component
• Cloud uses faceted Video and Visitor
  – Basic types hold only the identifier
  – Facets scope the interface you actually need
  – Each component can define its own facets
• No false-sharing and dependency chains
  – Type manager converts between facets as needed
  – video.asA(PresentationVideo) for www
  – video.asA(MerchableVideo) for middle tier
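The faceted-object idea can be sketched as a basic type that holds only an identifier, plus a type manager that builds whichever facet a component asks for. The Python below mirrors the slide's `video.asA(...)` call as `as_a`; the `TypeManager` registration scheme, the facet fields, and the sample data are assumptions made for illustration.

```python
class Video:
    """Basic type: holds only the identifier."""
    def __init__(self, video_id, type_manager):
        self.video_id = video_id
        self._types = type_manager
    def as_a(self, facet_cls):
        return self._types.convert(self, facet_cls)

class PresentationVideo:
    """Facet for the web tier: only what a page needs."""
    def __init__(self, video_id, title):
        self.video_id, self.title = video_id, title

class MerchableVideo:
    """Facet for the middle tier: only what merchandising needs."""
    def __init__(self, video_id, genre):
        self.video_id, self.genre = video_id, genre

class TypeManager:
    """Maps a facet class to a loader that builds it from the identifier."""
    def __init__(self):
        self._loaders = {}
    def register(self, facet_cls, loader):
        self._loaders[facet_cls] = loader
    def convert(self, video, facet_cls):
        return self._loaders[facet_cls](video.video_id)

titles = {"v1": "Example Title"}  # hypothetical data sources
genres = {"v1": "Drama"}
types = TypeManager()
types.register(PresentationVideo, lambda vid: PresentationVideo(vid, titles[vid]))
types.register(MerchableVideo, lambda vid: MerchableVideo(vid, genres[vid]))

video = Video("v1", types)
print(video.as_a(PresentationVideo).title, video.as_a(MerchableVideo).genre)
```

Each component sees only the facet it registered, so a change to merchandising fields cannot force a rebuild of presentation code.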
![Page 50](https://reader034.fdocuments.in/reader034/viewer/2022042223/5ec98f6cf931947a177dd15f/html5/thumbnails/50.jpg)
Software Architecture Patterns
• Object Models
  – Basic and derived types, facets, serializable
  – Pass by reference within a service
  – Pass by value between services
• Computation and I/O Models
  – Service Execution using Best Effort
  – Common thread pool management
![Page 51](https://reader034.fdocuments.in/reader034/viewer/2022042223/5ec98f6cf931947a177dd15f/html5/thumbnails/51.jpg)
Netflix Systems Architecture
![Page 52](https://reader034.fdocuments.in/reader034/viewer/2022042223/5ec98f6cf931947a177dd15f/html5/thumbnails/52.jpg)
[Architecture diagram; labels: Front End ELB, API Proxy, API ELB, Discovery Service, API, AWS EC2, Netflix Data Center, memcached, S3, Oracle, SQS, EBS, Component Services, AWS Storage, SimpleDB, Replication]
![Page 53](https://reader034.fdocuments.in/reader034/viewer/2022042223/5ec98f6cf931947a177dd15f/html5/thumbnails/53.jpg)
Netflix Undifferentiated Lifting
• Middle Tier Load Balancing
• Discovery (local DNS)
• Encryption Services
• Caching
• Distributed App Management
We want cloud vendors to do all this for us as well!
![Page 54](https://reader034.fdocuments.in/reader034/viewer/2022042223/5ec98f6cf931947a177dd15f/html5/thumbnails/54.jpg)
Load Balancing in AWS
• Middle tier currently not supported in AWS
  – ELB are public-facing only
  – Cannot apply security group settings
• ELB vertical scalability for concentrated clients
  – Too few proxy IP addresses leads to hot spots
• ELB needs support for balancing heuristics
  – Proportional balance across Availability Zones
  – Weighted Least connections, Weighted Round Robin
• Zone aware routing
  – Default to instances in the same Availability Zone
  – Falls back to cross-zone on failure
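The zone-aware routing rule above (prefer the caller's own Availability Zone, fall back to other zones on failure) can be sketched as a small client-side balancer. This is an assumed illustration: the class name, the health set passed in by the caller, and the instance ids are hypothetical, and it is not how ELB works.

```python
import itertools

class ZoneAwareBalancer:
    """Round-robin within the caller's zone, falling back to other zones."""
    def __init__(self, local_zone, instances):
        # instances: list of (instance_id, zone) tuples
        self.local_zone = local_zone
        self.instances = instances
        self._counter = itertools.count()

    def choose(self, healthy):
        """Pick an instance from the `healthy` set of instance ids."""
        alive = [(i, z) for i, z in self.instances if i in healthy]
        local = [i for i, z in alive if z == self.local_zone]
        candidates = local or [i for i, _ in alive]  # cross-zone fallback
        if not candidates:
            raise RuntimeError("no healthy instances")
        return candidates[next(self._counter) % len(candidates)]

lb = ZoneAwareBalancer(
    "us-east-1a",
    [("i-1", "us-east-1a"), ("i-2", "us-east-1a"), ("i-3", "us-east-1b")],
)
print([lb.choose({"i-1", "i-2", "i-3"}) for _ in range(3)])  # stays in zone a
print(lb.choose({"i-3"}))  # zone a is down, falls back to zone b
```

Staying in-zone keeps latency low and avoids cross-zone traffic in the common case, while the fallback preserves availability when a whole zone fails.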
![Page 55](https://reader034.fdocuments.in/reader034/viewer/2022042223/5ec98f6cf931947a177dd15f/html5/thumbnails/55.jpg)
Discovery
• Discovery Service (Redundant instances per zone)
  – Simple REST interface
  – Cloud apps register with Discovery
• Apps send heartbeats every 30 sec to renew lease
  – App evicted after 3 missed heartbeats
  – Can re-register if the problem was transient
• Apps can store custom metadata
  – Version number, AMI id, Availability Zone, etc.
• Software Round-robin Load Balancer
  – Query Discovery for instances of specific application
  – Baked into Netflix REST client (JSR311/Jersey based)
AWS Middle-tier ELB would eliminate most use cases
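The lease mechanics on this slide (register, heartbeat every 30 seconds, eviction after 3 missed heartbeats, round-robin over what remains) can be sketched as an in-memory registry. The sketch is an assumption-laden illustration rather than the real Discovery service: the class and method names are hypothetical, time is passed in explicitly, and the exact eviction boundary (more than 90 seconds of silence) is a guess.

```python
HEARTBEAT_SEC = 30
MAX_MISSED = 3

class DiscoveryRegistry:
    """In-memory lease table keyed by (app, instance_id)."""
    def __init__(self):
        self._leases = {}  # (app, instance_id) -> (last_heartbeat, metadata)
        self._next = {}    # app -> round-robin position

    def register(self, app, instance_id, now, metadata=None):
        self._leases[(app, instance_id)] = (now, metadata or {})

    def heartbeat(self, app, instance_id, now):
        """Renew the lease; returns False if the instance must re-register."""
        key = (app, instance_id)
        if key not in self._leases:
            return False
        self._leases[key] = (now, self._leases[key][1])
        return True

    def evict_expired(self, now):
        """Drop instances silent for longer than MAX_MISSED heartbeat periods."""
        limit = HEARTBEAT_SEC * MAX_MISSED
        expired = [k for k, (seen, _) in self._leases.items() if now - seen > limit]
        for key in expired:
            del self._leases[key]

    def instances(self, app):
        return sorted(i for (a, i) in self._leases if a == app)

    def next_instance(self, app):
        """Software round-robin over the currently registered instances."""
        ids = self.instances(app)
        if not ids:
            return None
        pos = self._next.get(app, 0)
        self._next[app] = pos + 1
        return ids[pos % len(ids)]
```

A client would call `next_instance("api")` before each request; an instance that was evicted during a transient failure sees `heartbeat` return False and registers again.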
![Page 56](https://reader034.fdocuments.in/reader034/viewer/2022042223/5ec98f6cf931947a177dd15f/html5/thumbnails/56.jpg)
Database Migration
• Why SimpleDB?
  – No DBAs in the cloud, Amazon hosted service
  – Work started two years ago, fewer viable options
  – Worked with Amazon to speed up and scale SimpleDB
• Alternatives?
  – Investigating adding Cassandra and Membase to the mix
  – Need several options to match use cases well
• Detailed SimpleDB Advice
  – Sid Anand - QConSF Nov 5th – Netflix’ Transition to High Availability Storage Systems
  – Blog - http://practicalcloudcomputing.com/
  – Download Paper PDF - http://bit.ly/bhOTLu
![Page 57](https://reader034.fdocuments.in/reader034/viewer/2022042223/5ec98f6cf931947a177dd15f/html5/thumbnails/57.jpg)
Tools and Automation
• Developer and Build Tools
  – Jira, Eclipse, Hudson, Ivy, Artifactory
  – Builds, creates .war file, .rpm, bakes AMI and launches
• Custom Netflix Application Console
  – AWS Features at Enterprise Scale (hide the keys!)
  – Auto Scaler Group is unit of deployment to production
• Open Source + Support
  – Apache, Tomcat, OpenJDK, CentOS
• Monitoring Tools
  – Keynote – service monitoring and alerting
  – AppDynamics – Developer focus for cloud
  – EpicNMS – flexible data collection and plots http://epicnms.com
  – Nimsoft NMS – ITOps focus for Datacenter + Cloud alerting
![Page 58](https://reader034.fdocuments.in/reader034/viewer/2022042223/5ec98f6cf931947a177dd15f/html5/thumbnails/58.jpg)
Current Status
![Page 59](https://reader034.fdocuments.in/reader034/viewer/2022042223/5ec98f6cf931947a177dd15f/html5/thumbnails/59.jpg)
WWW Page by Page during Q2/Q3/Q4
• Simplest possible page first
  – Minimal dependencies
• Add pages as dependent services come online
• Home page – most complex and highest traffic
• Leave low traffic pages for later cleanup
Gradual migration from Datacenter pages
![Page 60](https://reader034.fdocuments.in/reader034/viewer/2022042223/5ec98f6cf931947a177dd15f/html5/thumbnails/60.jpg)
Big-Bang Transition
• iPhone Launch (August/Sept)
  – No capacity in the datacenter, cloud only
  – App Store gates release, not gradual, can’t back out
  – Market is huge (existing and new customers)
  – Has to work at large scale on day one
• Datacenter Shadow Redirect Technique
  – Used to stress back-end and data sources
• SOASTA Cloud Based Load Generation
  – Used to stress test API and end-to-end functionality
![Page 61](https://reader034.fdocuments.in/reader034/viewer/2022042223/5ec98f6cf931947a177dd15f/html5/thumbnails/61.jpg)
Current Work for Cloud Platform
• Drive latency and availability goals
  – More aggressive caching
  – Fault and latency robustness
• Logging and monitoring portal/dashboards
  – Working to integrate tools and data sources
  – Need better observability and automation
• Evaluating a range of NoSQL choices
  – Broad set of use cases, no single winner
  – Good topic for another talk…
![Page 62](https://reader034.fdocuments.in/reader034/viewer/2022042223/5ec98f6cf931947a177dd15f/html5/thumbnails/62.jpg)
Wrap Up
![Page 63](https://reader034.fdocuments.in/reader034/viewer/2022042223/5ec98f6cf931947a177dd15f/html5/thumbnails/63.jpg)
Next Few Years…
• “System of Record” moves to Cloud
  – Master copies of data live only in the cloud, with backups etc.
  – Cut the datacenter to cloud replication link
• International Expansion
  – Global Clouds
  – Rapid deployments to new markets
• GPU Clouds optimized for video encoding
• Cloud Standardization
  – Cloud features and APIs should be a commodity not a differentiator
  – Differentiate on scale and quality of service
  – Competition also drives cost down
  – Higher resilience
  – Higher scalability
We would prefer to be an insignificant customer in a giant cloud
![Page 64](https://reader034.fdocuments.in/reader034/viewer/2022042223/5ec98f6cf931947a177dd15f/html5/thumbnails/64.jpg)
Remember the Goals
Faster
Scalable
Available
Productive
Track progress against these goals
![Page 65](https://reader034.fdocuments.in/reader034/viewer/2022042223/5ec98f6cf931947a177dd15f/html5/thumbnails/65.jpg)
Takeaway
Netflix is path-finding the use of public AWS cloud to replace in-house IT for non-trivial applications with hundreds of developers and thousands of systems.
http://www.linkedin.com/in/adriancockcroft @adrianco #netflixcloud acockcroft@netflix.com