Low-Latency Datacenters - Stanford University...2016/04/12 · Network Latency (Round Trip) April...
Low-Latency Datacenters
John Ousterhout
The Datacenter Revolution
Phase 1: Scale
● How to use 10,000 servers for a single application?
● New storage systems:
  § Bigtable, HDFS, ...
● New models of computation:
  § MapReduce, Spark, ...
● But, latencies high:
  § Network round-trips: 0.5 ms
  § Disk: 10 ms
● Interactive apps can’t access much data
Phase 2: Low Latency
● Latencies dropping dramatically
● Network round-trips:
  § 500 µs → 2.5 µs?
● Storage:
  § 10 ms → 1 µs?
● Potential: new applications (collaboration?)
● Challenge: need a new software stack
April 12, 2016 Low-Latency Datacenters Slide 2
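A rough back-of-the-envelope sketch of why "interactive apps can’t access much data" in Phase 1, and how that changes in Phase 2. The latencies are the ones quoted on the slide; the 100 ms interaction budget and the assumption that every access depends on the previous one (so accesses cannot overlap) are illustrative assumptions, not figures from the talk.

```python
# Assumption (not from the slides): an interactive request has about 100 ms
# to respond, and accesses are strictly sequential (each depends on the last).
BUDGET_US = 100_000  # 100 ms expressed in microseconds

# Latencies quoted on the slide, in microseconds.
latencies_us = {
    "network round-trip, Phase 1 (0.5 ms)": 500.0,
    "disk, Phase 1 (10 ms)": 10_000.0,
    "network round-trip, Phase 2 (2.5 us)": 2.5,
    "storage, Phase 2 (1 us)": 1.0,
}


def sequential_accesses(budget_us: float, latency_us: float) -> int:
    """Number of back-to-back accesses that fit in the time budget."""
    return int(budget_us // latency_us)


accesses = {
    name: sequential_accesses(BUDGET_US, latency)
    for name, latency in latencies_us.items()
}
for name, count in accesses.items():
    print(f"{name}: {count:,} sequential accesses in 100 ms")
```

Under these assumptions, the same time budget that allows only a few hundred dependent network accesses in Phase 1 allows tens of thousands in Phase 2.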
Network Latency (Round Trip)
| Component | 2010 | Possible Today | 5-10 Years |
|---|---|---|---|
| Switching fabric | 100-300 µs | 5 µs | 0.2 µs |
| Software | 50 µs | 2 µs | 1 µs |
| NICs | 8-128 µs | 3 µs | 0.2 µs |
| Propagation delay | 1 µs | 1 µs | 1 µs |
| Total | 200-400 µs | 11 µs | 2.4 µs |

(Within a datacenter, 100K servers)
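The totals in the table can be cross-checked by summing the per-component figures. The sketch below only restates the slide's numbers; the dictionary layout and the column keys (`"2010"`, `"today"`, `"future"`) are naming choices made for this example.

```python
# Round-trip latency per component in microseconds, copied from the table.
# Each entry is a (low, high) range; single values are written as (v, v).
components = {
    "Switching fabric": {"2010": (100, 300), "today": (5, 5), "future": (0.2, 0.2)},
    "Software": {"2010": (50, 50), "today": (2, 2), "future": (1, 1)},
    "NICs": {"2010": (8, 128), "today": (3, 3), "future": (0.2, 0.2)},
    "Propagation delay": {"2010": (1, 1), "today": (1, 1), "future": (1, 1)},
}


def total(column: str) -> tuple:
    """Sum the low and high ends of one column, rounded to 0.1 microseconds."""
    low = sum(row[column][0] for row in components.values())
    high = sum(row[column][1] for row in components.values())
    return round(low, 1), round(high, 1)


totals = {column: total(column) for column in ("2010", "today", "future")}
for column, (low, high) in totals.items():
    print(f"{column}: {low} to {high} microseconds")
```

The "Possible Today" and "5-10 Years" columns sum exactly to the stated 11 µs and 2.4 µs. The 2010 components sum to 159-479 µs, so the slide's 200-400 µs total is presumably a rounded estimate rather than a strict sum.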
Storage Latency
Disk: 5–10 ms
Flash: 50–500 µs
Nonvolatile memory (e.g. 3D XPoint): 1–10 µs
Low-Latency Software Stack
● Existing software stacks highly layered
  § Great for software structuring
  § Layer crossings add latency
  § Slow networks and disks hide software latency
● Can’t achieve low latency with today’s stacks
  § Death by a thousand cuts
  § Networks:
    ● Complex OS protocol stacks
    ● Marshaling/serialization costs
  § Storage systems:
    ● OS file system overheads
Need significant changes to software stacks
Reducing Software Stack Latency
High Latency
1. Optimize layers (specialize?)
2. Eliminate layers
3. Bypass layers
The RAMCloud Storage System
● New class of storage for low-latency datacenters:
  § All data in DRAM at all times
  § Low latency: 5-10 µs remote access
  § Large scale: 1000-10,000 servers
● Durability/availability equivalent to replicated disk
● 1000x improvements in:
  § Performance
  § Energy/op (relative to disk-based storage)
[Figure: RAMCloud architecture. 1000 – 100,000 application servers (each an application plus library) connect over the datacenter network to 1000 – 10,000 storage servers (each a master plus backup) and a coordinator.]
Thread Scheduling
● Traditional kernel-based thread scheduling is breaking down:
  § Context switches too expensive
  § Applications don’t know how many cores are available (can’t match workload concurrency to available cores)
  § Kernel may preempt threads at inconvenient points
● Fine-grained thread scheduling must move to applications
  § Kernel allocates cores to apps over longer time intervals
  § Kernel asks application to release cores
● Arachne project: core-aware thread scheduling
  § Partial design, implementation beginning
  § Initial performance result: 9 ns context switches!
[Figure: diagram with the labels Application, Operating System, and Thread Scheduling]
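The division of labor described on this slide (the kernel grants cores, the application schedules its own threads on them and gives a core back when asked) can be illustrated with a toy model. This is a minimal sketch, not Arachne's design or API: the class `CoreAwareScheduler`, its methods, and the use of Python generators as cooperative user-level threads are all assumptions made for illustration, and the "cores" are simulated run queues rather than real CPUs.

```python
from collections import deque
from typing import Deque, Generator, List

UserThread = Generator[None, None, None]


class CoreAwareScheduler:
    """Toy application-level scheduler with one run queue per granted core."""

    def __init__(self, granted_cores: int) -> None:
        # The application knows exactly how many cores it currently holds.
        self.queues: List[Deque[UserThread]] = [deque() for _ in range(granted_cores)]

    def spawn(self, thread: UserThread) -> None:
        # Place the new thread on the least-loaded core.
        min(self.queues, key=len).append(thread)

    def release_core(self) -> None:
        # The (simulated) kernel asked for a core back: give up the last core
        # and migrate its threads to the cores that remain.
        if len(self.queues) <= 1:
            raise RuntimeError("cannot release the last core")
        for thread in self.queues.pop():
            self.spawn(thread)

    def run(self) -> None:
        # Each core runs one thread until it yields. The switch happens
        # entirely in the application, at a point the thread chose itself.
        while any(self.queues):
            for queue in self.queues:
                if not queue:
                    continue
                thread = queue.popleft()
                try:
                    next(thread)
                    queue.append(thread)
                except StopIteration:
                    pass  # thread finished


def worker(name: str, steps: int, log: List[str]) -> UserThread:
    for i in range(steps):
        log.append(f"{name}:{i}")
        yield  # cooperative yield point


log: List[str] = []
sched = CoreAwareScheduler(granted_cores=2)
for name in ("a", "b", "c"):
    sched.spawn(worker(name, 2, log))
sched.release_core()  # kernel reclaims one core before the threads run
sched.run()
print(log)
```

Because threads yield only at points they choose, the model never preempts a thread at an inconvenient point, and the application can rebalance its work when the number of cores changes.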
New Datacenter Transport
● TCP optimized for:
  § Throughput, not latency
  § Long-haul networks (high latency)
  § Congestion throughout
  § Modest # connections/server
● Future datacenters:
  § High performance networking fabric:
    ● Low latency
    ● Multi-path
  § Congestion primarily at edges
  § Many connections/server (1M?)
Need new transport protocol
[Figure: servers attached to top-of-rack switches, which connect to the datacenter network; congestion at edges (host-TOR links)]
Homa: New Transport Protocol
● Greatest obstacle to low latency:
  § Congestion at receiver’s link
  § Large messages delay small ones
● Solution: drive congestion control from receiver
  § Schedule incoming traffic
  § Prioritize small messages
  § Take advantage of priorities in network
● Implemented at user level
  § Designed for kernel bypass, polling-based approach
● Status:
  § Evaluating scheduling algorithms via simulation
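One way a receiver might "schedule incoming traffic" and "prioritize small messages" is to grant its link, one chunk at a time, to whichever inbound message has the fewest bytes remaining. The slide does not specify the scheduling algorithm (it is described as still under evaluation), so the shortest-remaining-first policy below, the function name `receiver_schedule`, the message ids, and the fixed grant size are all assumptions for illustration.

```python
import heapq
from typing import Dict, List, Tuple


def receiver_schedule(messages: Dict[str, int], grant_bytes: int) -> List[Tuple[str, int]]:
    """Return the sequence of (message id, bytes granted) a receiver would issue.

    messages maps a hypothetical message id to its total size in bytes.
    The receiver always grants to the message with the fewest bytes remaining.
    """
    heap = [(size, msg_id) for msg_id, size in messages.items()]
    heapq.heapify(heap)
    grants: List[Tuple[str, int]] = []
    while heap:
        remaining, msg_id = heapq.heappop(heap)
        sent = min(grant_bytes, remaining)
        grants.append((msg_id, sent))
        if remaining > sent:
            # Message not finished: requeue it with its reduced remaining size.
            heapq.heappush(heap, (remaining - sent, msg_id))
    return grants


grants = receiver_schedule(
    {"large": 3000, "small": 500, "medium": 1500},
    grant_bytes=1000,
)
for msg_id, nbytes in grants:
    print(f"grant {nbytes} bytes to {msg_id}")
```

In this model the small message completes first instead of waiting behind the large one, which is the effect the slide is after.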
Conclusion
● Interesting times for datacenter software
● Revisit fundamental system design decisions
● Exploring from several different angles
● Will the role of the OS change fundamentally?
February 24, 2016 Platform Lab Introduction Slide 12
New Platform Lab
Platforms
Large Systems Collaboration
Create the next generation of platforms to stimulate new classes of applications
Platform Lab Faculty
John Ousterhout, Faculty Director
Mendel Rosenblum
Keith Winstein
Guru Parulkar, Executive Director
Bill Dally
Phil Levis
Sachin Katti
Christos Kozyrakis
Nick McKeown
Theme: Swarm Collaboration Infrastructure
Wired/Wireless Networks
Next-Generation Datacenter Clusters (Cloud/Edge)
Device Swarms
Platform Lab Affiliates
Questions/Comments?
Does Low Latency Matter?
Potential: enable new data-intensive applications
● Application characteristics
  § Collect many small pieces of data from different sources
  § Irregular access patterns
  § Need interactive/real-time response
● Candidate applications
  § Large-scale graph algorithms (machine learning?)
  § Collaboration at scale
Large-Scale Collaboration
Data for one user
Gmail: email for one user
Facebook: 50-500 friends
Morning commute: 10,000-100,000 cars
“Region of Consciousness”