Helios: A Hybrid Electrical/Optical Switch Architecture for Modular Data Centers
Nathan Farrington, George Porter, Sivasankar Radhakrishnan, Hamid Hajabdolali Bazzaz, Vikram Subramanya,
Yeshaiahu Fainman, George Papen, and Amin Vahdat
Nathan Farrington 2
Electrical Packet Switch
• $500/port
• 10 Gb/s fixed rate
• 12 W/port
• Requires transceivers
• Per-packet switching
• For bursty, uniform traffic
Optical Circuit Switch
• $500/port
• Rate free
• 240 mW/port
• No transceivers
• 12 ms switching time
• For stable, pair-wise traffic
2010-09-02 SIGCOMM
Outline: Intro, Technology, Analysis, Data Plane, Control Plane, Experimental Setup, Evaluation, Related Work, Conclusion
Optical Circuit Switch
Figure labels: Lenses; Fixed Mirror; Mirrors on Motors; Glass Fiber Bundle; Input 1; Output 1; Output 2; Rotate Mirror
1. Full crossbar switch
2. Does not decode packets
3. Needs external scheduler
Wavelength Division Multiplexing
Figure labels: Electrical Packet Switch (ports 1 to 8); 10G WDM Optical Transceivers; WDM MUX; Superlink (80G); Optical Circuit Switch (No Transceivers Required); WDM DEMUX
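The 80G figure follows from multiplexing eight 10G wavelengths onto one fiber. A minimal sketch of that arithmetic, assuming every wavelength runs at the same fixed rate; the helper name `superlink_capacity_gbps` is hypothetical and not part of the talk:

```python
def superlink_capacity_gbps(num_wavelengths: int,
                            rate_per_wavelength_gbps: float = 10.0) -> float:
    """Aggregate capacity of a WDM superlink (hypothetical helper).

    Assumes each wavelength carries the same fixed rate, as with the
    10G WDM transceivers named on the slide.
    """
    if num_wavelengths < 1:
        raise ValueError("a superlink needs at least one wavelength")
    return num_wavelengths * rate_per_wavelength_gbps


# Eight 10G transceivers multiplexed onto one fiber.
print(superlink_capacity_gbps(8))
```

Because the optical circuit switch does not decode packets, it switches the whole superlink as a unit regardless of how many wavelengths it carries.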
Stability Increases with Aggregation
Inter-Thread -> Inter-Process -> Inter-Server -> Inter-Rack -> Inter-Pod -> Inter-Data Center
Where is the Sweet Spot?
1. Enough Stability
2. Enough Traffic
Example: N=64 pods * k=1024 hosts/pod = 64K hosts total; 8 wavelengths
Bisection Bandwidth   10% Electrical           100% Electrical   Helios Example                   Helios vs. 100% Electrical
                      (10:1 Oversubscribed)                      (10% Electrical + 90% Optical)
Cost                  $6.3 M                   $62.2 M           $22.1 M                          2.8x Less
Power                 96.5 kW                  950.3 kW          157.2 kW                         6.0x Less
Cables                6,656                    65,536            14,016                           4.7x Less
Figure labels: Fewer Core Switches; N pods, k-ports each; Less than k switches, N-ports each
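The "x Less" column can be reproduced by dividing the 100% electrical figures by the Helios figures. A short check, with the numbers copied from the table and all variable names chosen here for illustration:

```python
# Figures copied from the comparison table; dictionary keys are illustrative.
full_electrical = {"cost_musd": 62.2, "power_kw": 950.3, "cables": 65536}
helios = {"cost_musd": 22.1, "power_kw": 157.2, "cables": 14016}


def savings_ratio(baseline: dict, candidate: dict) -> dict:
    """How many times smaller each candidate figure is than the baseline."""
    return {key: baseline[key] / candidate[key] for key in baseline}


ratios = savings_ratio(full_electrical, helios)
for key, value in ratios.items():
    print(f"{key}: {value:.1f}x less")
```

The cable count for the 100% electrical design equals the host count in the example (64 * 1024 = 65,536).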
Figure labels: Pod 1, Pod 2, Pod 3; EPS (three 10G links); OCS (three 80G links)
1. Setup a Circuit
• Pod 1 -> 2: Capacity = 10G, Demand = 10G, Throughput = 10G
• Pod 1 -> 3: Capacity = 80G, Demand = 80G, Throughput = 80G
2. Traffic Patterns Change
• Pod 1 -> 2: Capacity = 10G, Demand changes from 10G to 80G, Throughput = 10G
• Pod 1 -> 3: Capacity = 80G, Demand changes from 80G to 10G, Throughput = 10G
3. Break a Circuit
• Capacity, demand, and throughput are unchanged from step 2
4. Setup a Circuit
• Pod 1 -> 2: Capacity = 80G, Demand = 80G, Throughput = 80G
• Pod 1 -> 3: Capacity = 10G, Demand = 10G, Throughput = 10G
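The numbers in this walkthrough follow from throughput being the smaller of capacity and demand. A minimal sketch under a simplifying assumption made here, not stated on the slides: traffic to the pod that holds the circuit sees only the 80G circuit, and traffic to any other pod sees only the 10G packet-switched path. The function and pod names are illustrative.

```python
EPS_GBPS = 10  # packet-switched path capacity in the example
OCS_GBPS = 80  # circuit (superlink) capacity in the example


def throughput(demand: dict, circuit_to: str) -> dict:
    """Throughput from Pod 1 to each destination pod.

    Assumption: the destination holding the circuit gets OCS capacity,
    every other destination gets EPS capacity.
    """
    result = {}
    for dst, wanted in demand.items():
        capacity = OCS_GBPS if dst == circuit_to else EPS_GBPS
        result[dst] = min(capacity, wanted)
    return result


before = throughput({"pod2": 10, "pod3": 80}, circuit_to="pod3")  # initial circuit
stale = throughput({"pod2": 80, "pod3": 10}, circuit_to="pod3")   # demand changed
after = throughput({"pod2": 80, "pod3": 10}, circuit_to="pod2")   # circuit moved
print(before, stale, after)
```

With the stale circuit both destinations are held to 10G; moving the circuit to Pod 2 restores 80G where the demand now is.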
Figure labels: Pod 1, Pod 2, Pod 3; EPS (three 10G links); OCS (three 80G links); three Pod Switch Managers; Circuit Switch Manager; Topology Manager
Outline of Control Loop
1. Estimate traffic demand
2. Compute optimal topology for maximum throughput
3. Program the pod switches and circuit switches
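One pass of the three-step loop can be sketched as a function that takes each step as a callable. This is an illustrative skeleton only: the type aliases, the function name `control_loop_iteration`, and the choice to skip reprogramming when the topology is unchanged are assumptions made here, not details from the talk.

```python
from typing import Callable, Dict, Tuple

Demand = Dict[Tuple[int, int], float]  # (src_pod, dst_pod) -> Gb/s (assumed shape)
Topology = Dict[int, int]              # src_pod -> dst_pod of its circuit (assumed shape)


def control_loop_iteration(
    estimate_demand: Callable[[], Demand],
    compute_topology: Callable[[Demand], Topology],
    program_switches: Callable[[Topology], None],
    current: Topology,
) -> Topology:
    """Run steps 1 to 3 once and return the topology now in effect."""
    demand = estimate_demand()          # 1. estimate traffic demand
    target = compute_topology(demand)   # 2. compute topology for max throughput
    if target != current:               # assumption: only reprogram on change
        program_switches(target)        # 3. program pod and circuit switches
    return target


# Usage with stub steps: give the circuit to the heaviest pod pair.
programmed = []
new = control_loop_iteration(
    estimate_demand=lambda: {(1, 2): 80.0, (1, 3): 10.0},
    compute_topology=lambda d: {max(d, key=d.get)[0]: max(d, key=d.get)[1]},
    program_switches=programmed.append,
    current={1: 3},
)
print(new, programmed)
```

Passing the steps in as callables keeps the loop independent of how demand is measured or how switches are programmed.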
1. Estimate Traffic Demand
Question: Will this flow use more bandwidth if we give it more capacity?
1. Identify elephant flows (mice don’t grow)
Problem: Measurements are biased by current topology
2. Pretend all hosts are connected to an ideal crossbar switch
3. Compute the max-min fair bandwidth fixpoint
Mohammad Al-Fares, Sivasankar Radhakrishnan, Barath Raghavan, Nelson Huang, and Amin Vahdat. Hedera: Dynamic Flow Scheduling for Data Center Networks. In NSDI ’10.
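Steps 2 and 3 can be illustrated with a small fixpoint iteration in the spirit of the cited Hedera estimator. This is a simplified sketch, not the authors' code: it assumes one flow per (source, destination) host pair, normalizes every host link to capacity 1.0, and uses the hypothetical name `estimate_demands`.

```python
def estimate_demands(flows, capacity=1.0, max_rounds=100):
    """Max-min fair demand for each (src, dst) flow on an ideal crossbar."""
    demand = {f: 0.0 for f in flows}
    converged = {f: False for f in flows}
    sources = {src for src, _ in flows}
    sinks = {dst for _, dst in flows}

    for _ in range(max_rounds):
        previous = dict(demand)

        # Sender phase: split leftover sender capacity among unconverged flows.
        for src in sources:
            out = [f for f in flows if f[0] == src]
            fixed = sum(demand[f] for f in out if converged[f])
            free = [f for f in out if not converged[f]]
            for f in free:
                demand[f] = (capacity - fixed) / len(free)

        # Receiver phase: cap flows at an oversubscribed receiver.
        for dst in sinks:
            incoming = [f for f in flows if f[1] == dst]
            if sum(demand[f] for f in incoming) <= capacity:
                continue
            limited = list(incoming)
            share = capacity / len(limited)
            while True:
                small = [f for f in limited if demand[f] < share]
                if not small:
                    break
                limited = [f for f in limited if f not in small]
                if not limited:
                    break
                used = sum(demand[f] for f in incoming if f not in limited)
                share = (capacity - used) / len(limited)
            for f in limited:
                demand[f] = share
                converged[f] = True

        if demand == previous:  # fixpoint reached
            break
    return demand


example = estimate_demands([(0, 2), (1, 2), (0, 1)])
print(example)
```

In the example, hosts 0 and 1 both send to host 2, so each of those flows is held to half of host 2's link; host 0's remaining capacity goes to its flow toward host 1, and all three flows settle at 0.5. The estimate does not depend on the current topology because no real link capacities enter the calculation.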
2. Compute Optimal Topology
1. Formulate as instance of max-weight perfect matching problem on bipartite graph
2. Solve with Edmonds algorithm
Figure: bipartite graph with Source Pods 1 to 4 on one side and Destination Pods 1 to 4 on the other
a) Pods do not send traffic to themselves
b) Edge weights represent interpod demand
c) Algorithm is run iteratively for each circuit switch, making use of the previous results
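Points a) to c) can be illustrated on a four-pod example. The sketch below substitutes an exhaustive search over permutations for Edmonds algorithm, which is only practical for a handful of pods, and it assumes that "making use of the previous results" means subtracting each assigned circuit's capacity from the remaining demand. Function names and the demand matrix are illustrative.

```python
from itertools import permutations


def best_matching(demand):
    """Max-weight perfect matching of source pods to destination pods.

    Brute force stand-in for Edmonds algorithm; needs at least two pods.
    """
    best, best_weight = None, -1.0
    for perm in permutations(range(len(demand))):
        if any(src == dst for src, dst in enumerate(perm)):
            continue  # a) pods do not send traffic to themselves
        weight = sum(demand[src][dst] for src, dst in enumerate(perm))  # b)
        if weight > best_weight:
            best, best_weight = perm, weight
    return best


def assign_circuits(demand, num_switches, circuit_gbps):
    """c) Run the matching once per circuit switch on the leftover demand."""
    remaining = [row[:] for row in demand]
    plan = []
    for _ in range(num_switches):
        match = best_matching(remaining)
        plan.append(match)
        for src, dst in enumerate(match):
            remaining[src][dst] = max(0.0, remaining[src][dst] - circuit_gbps)
    return plan


# demand[src][dst] in Gb/s between four pods (made-up numbers).
demand = [
    [0, 80, 10, 5],
    [10, 0, 80, 5],
    [5, 10, 0, 80],
    [80, 5, 10, 0],
]
plan = assign_circuits(demand, num_switches=2, circuit_gbps=80)
print(plan)
```

Each tuple in `plan` maps source pod index to destination pod index for one circuit switch. The first switch serves the four 80G pairs; the second is matched against what is left.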
Example: Compute Optimal Topology
Traditional Network Helios Network
100% bisection bandwidth (240 Gb/s)
Hardware
• 24 servers
  – HP DL380
  – 2 socket (E5520) Nehalem
  – Dual Myricom 10G NICs
• 7 switches
  – One Dell 1G 48-port
  – Three Fulcrum 10G 24-port
  – One Glimmerglass 64-port optical circuit switch
  – Two Cisco Nexus 5020 10G 52-port
Traditional Network
Figure labels: Hash Collisions; TCP/IP Overhead
190 Gb/s Peak
171 Gb/s Avg
Helios Network (Baseline)
160 Gb/s Peak
43 Gb/s Avg
Port Debouncing
Figure: timeline, Time (s) from 0.0 to 2.0
1. Layer 1 PHY signal locked (bits are detected)
2. Switch thread wakes up and polls for PHY status
• Makes note to enable link after 2 seconds
3. Switch thread enables Layer 2 link
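The three steps amount to a small polling state machine. The sketch below is a model written for illustration, not switch firmware: the class name `DebouncedPort`, the `poll` interface, and the choice to reset the timer when the PHY signal is lost are all assumptions.

```python
class DebouncedPort:
    """Toy model of a port that enables Layer 2 only after a debounce delay."""

    def __init__(self, debounce_s: float = 2.0):
        self.debounce_s = debounce_s
        self.enable_at = None   # time at which the link may be enabled
        self.link_up = False

    def poll(self, now: float, phy_locked: bool) -> bool:
        """Called by the switch thread; returns whether the L2 link is up."""
        if not phy_locked:
            # Assumption: losing the PHY signal clears the pending timer.
            self.enable_at = None
            self.link_up = False
            return False
        if self.enable_at is None:
            # Step 2: PHY is locked, make a note to enable the link later.
            self.enable_at = now + self.debounce_s
        if now >= self.enable_at:
            # Step 3: the debounce interval has passed, enable Layer 2.
            self.link_up = True
        return self.link_up


port = DebouncedPort()
states = [port.poll(t, phy_locked=True) for t in (0.0, 1.0, 2.0)]
print(states)
```

Setting `debounce_s` to 0.0 models a port with debouncing disabled: the link comes up on the first poll after the PHY locks, so a newly set up circuit carries traffic sooner.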
Without Debouncing
160 Gb/s Peak
87 Gb/s Avg
Without EDC
160 Gb/s Peak
142 Gb/s Avg
Software Limitation
27 ms Gaps
Bidirectional Circuits
Figure labels: Optical Circuit Switch; three Pod Switches, each with RX and TX
Unidirectional Circuits
Figure labels: Optical Circuit Switch; three Pod Switches, each with RX and TX
Unidirectional Scheduler: 142 Gb/s Avg
Bidirectional Scheduler: 100 Gb/s Avg
Daisy Chain Needed for Good Performance for Arbitrary Traffic Patterns
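A three-pod ring shows why a daisy chain helps. The sketch below is a toy model built on assumptions made here: one TX and one RX port per pod on a single circuit switch, demand already capped at circuit capacity, and exhaustive search instead of a real scheduler. With bidirectional circuits a pod's TX and RX must go to the same peer; with unidirectional circuits they can go to different peers.

```python
from itertools import permutations


def unidirectional_best(demand):
    """Best served demand when each pod's TX and RX are switched separately."""
    return max(
        sum(demand[src][dst] for src, dst in enumerate(perm))
        for perm in permutations(range(len(demand)))
        if all(src != dst for src, dst in enumerate(perm))
    )


def bidirectional_best(demand):
    """Best served demand when circuits must connect pods in pairs."""
    def search(free):
        if len(free) < 2:
            return 0
        first, rest = free[0], free[1:]
        best = search(rest)  # option: leave `first` without a circuit
        for i, other in enumerate(rest):
            pair = demand[first][other] + demand[other][first]
            best = max(best, pair + search(rest[:i] + rest[i + 1:]))
        return best

    return search(tuple(range(len(demand))))


# Ring traffic: pod 0 -> 1, pod 1 -> 2, pod 2 -> 0, 80 Gb/s each (made-up).
ring = [
    [0, 80, 0],
    [0, 0, 80],
    [80, 0, 0],
]
print(unidirectional_best(ring), bidirectional_best(ring))
```

In this toy case the unidirectional schedule forms a ring and serves all 240 Gb/s of demand, while pairing pods bidirectionally can serve only one 80 Gb/s flow. The measured averages on the slide come from the real testbed, not from this model.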
Traffic Stability and Throughput
System                        Link Technology                                   Modifications Required                            Working Prototype
Helios (SIGCOMM ’10)          Optics w/ WDM: 10G-180G (CWDM), 10G-400G (DWDM)   Switch Software                                   Glimmerglass, Fulcrum
c-Through (SIGCOMM ’10)       Optics (10G)                                      Host OS                                           Emulation
Flyways (HotNets ’09)         Wireless (1G, 10m)                                Unspecified
IBM System-S (GLOBECOM ’09)   Optics (10G)                                      Host Application; Specific to Stream Processing   Calient, Nortel
HPC (SC ’05)                  Optics (10G)                                      Host NIC Hardware
“Why Packet Switching?”
“The conventional wisdom [of 1985 is] that packet switching is poorly suited to the needs of telephony . . .”
Jonathan Turner. “Design of an Integrated Services Packet Network”. IEEE J. on Selected Areas in Communications, SAC-4 (8), Nov 1986.
Conclusion
• Helios: a scalable, energy-efficient network architecture for modular data centers
• Large cost, power, and cabling complexity savings
• Dynamically and automatically provisions bisection bandwidth at runtime
• Does not require end-host modifications or switch hardware modifications
• Deployable today using commercial components
• Uses the strengths of circuit switching to compensate for the weaknesses of packet switching, and vice versa