Reporter: Fuchao Zhou
Department of Computer Science
A Scalable, Commodity Data Center Network Architecture
Mohammad Al-Fares, Alexander Loukissas, Amin Vahdat
SIGCOMM’08
Problem
• How to design a data center network architecture
-- Scalable interconnection bandwidth
-- Without incurring tremendous cost
-- Compatible with hosts running Ethernet and IP
Existing solutions
• Using specialized hardware and communication protocols such as InfiniBand and Myrinet
-- More expensive, because they rely on high-end switches
-- Not natively compatible with TCP/IP applications
• Using commodity Ethernet switches and routers to interconnect cluster machines
-- Needs an appropriate network topology
-- Bandwidth scales poorly with cluster size
-- Cost increases non-linearly with cluster size
Existing solutions
• Typical architectures today
-- Two-level trees of switches or routers (support 5K to 8K hosts)
-- Three-level trees of switches or routers
• Disadvantages
-- may support only 50% of the aggregate bandwidth available at the edge of the network
-- incur tremendous cost ($37M to support 27,648 hosts)
Proposed solution
• k-ary fat-tree topology
-- k pods, each containing two layers of k/2 switches
-- (k/2)^2 k-port core switches
-- supports k^3/4 hosts (a 48-ary fat-tree supports 27,648 hosts)
• Advantages
-- non-blocking
-- all switching elements are identical ($8.64M to support 27,648 hosts)
-- compatible with hosts running Ethernet and IP
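The counts on this slide follow directly from k. A minimal sketch that computes them; the function name `fat_tree_size` and the "edge"/"aggregation" labels for the two pod layers are illustrative choices, not taken from the slides:

```python
def fat_tree_size(k: int) -> dict:
    """Element counts for a k-ary fat-tree built only from identical k-port switches."""
    if k <= 0 or k % 2:
        raise ValueError("k must be a positive even number")
    half = k // 2
    return {
        "pods": k,
        "edge_switches_per_pod": half,         # lower pod layer
        "aggregation_switches_per_pod": half,  # upper pod layer
        "core_switches": half * half,          # (k/2)^2
        "hosts": k ** 3 // 4,                  # k pods * k/2 edge switches * k/2 hosts
        "total_switches": k * k + half * half, # k switches per pod, plus the core
    }
```

For k = 48 this gives 27,648 hosts and 576 core switches, matching the slide; k = 4 gives the 16-host configuration used in the prototype.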
Static Routing method
• Two-level routing table
-- goal: maximum bisection bandwidth in this network
• IP addressing
-- Core switches: 10.k.j.i
-- Pod switches: 10.pod.switch.1
-- Hosts: 10.pod.switch.ID
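A small sketch that enumerates the addressing scheme for a given k. The value ranges are assumptions consistent with the example addresses on the next slide: j and i each run over [1, k/2], pod switches are numbered 0 to k-1 with the edge (lower) layer taking 0 to k/2 - 1, and host IDs run from 2 to k/2 + 1. The function name is hypothetical.

```python
def fat_tree_addresses(k: int) -> dict:
    """Enumerate core, pod-switch, and host addresses for a k-ary fat-tree."""
    half = k // 2
    # Core switches: 10.k.j.i, where (j, i) is the switch's position in the core grid.
    core = [f"10.{k}.{j}.{i}" for j in range(1, half + 1) for i in range(1, half + 1)]
    # Pod switches: 10.pod.switch.1, switch in [0, k-1].
    pod_switches = [f"10.{pod}.{sw}.1" for pod in range(k) for sw in range(k)]
    # Hosts hang off edge switches only (assumed to be switches 0 .. k/2 - 1).
    hosts = [
        f"10.{pod}.{sw}.{host_id}"
        for pod in range(k)
        for sw in range(half)
        for host_id in range(2, half + 2)
    ]
    return {"core": core, "pod_switches": pod_switches, "hosts": hosts}
```

With k = 4 this yields 16 hosts, including 10.0.1.2 and 10.2.0.3 from the routing example.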
Static Routing example
• Packet from 10.0.1.2 to host 10.2.0.3
• Packet from 10.0.1.3 to host 10.2.0.2
[Figure: paths of the two packets through the fat-tree]

Upper-layer switch in pod 0
Prefix          Output port
10.0.0.0/24     0
10.0.1.0/24     1
0.0.0.0/0       (suffix table)
  Suffix        Output port
  0.0.0.2/8     2
  0.0.0.3/8     3

Core switch
Prefix          Output port
10.0.0.0/16     0
10.1.0.0/16     1
10.2.0.0/16     2
10.3.0.0/16     3

Upper-layer switch in pod 2
Prefix          Output port
10.2.0.0/24     0
10.2.1.0/24     1
0.0.0.0/0       (suffix table)
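A minimal sketch of the two-level lookup, using the pod-0 upper-layer table from this slide (prefixes 10.0.0.0/24 and 10.0.1.0/24, suffixes 0.0.0.2/8 and 0.0.0.3/8). The table layout and the `lookup` name are assumptions for illustration; the standard-library `ipaddress` module handles prefix matching, and a suffix /8 is matched against the low-order byte (the host ID).

```python
import ipaddress

# A port number marks a terminating prefix; None marks the 0.0.0.0/0 entry,
# which defers to the secondary suffix table.
PREFIX_TABLE = [
    (ipaddress.ip_network("10.0.0.0/24"), 0),
    (ipaddress.ip_network("10.0.1.0/24"), 1),
    (ipaddress.ip_network("0.0.0.0/0"), None),
]
# (suffix value, suffix length in bits, output port): 0.0.0.2/8 and 0.0.0.3/8
SUFFIX_TABLE = [
    (2, 8, 2),
    (3, 8, 3),
]


def lookup(dst: str) -> int:
    """Return the output port for dst: longest prefix first, then longest suffix."""
    addr = ipaddress.ip_address(dst)
    _, prefix_port = max(
        ((net, p) for net, p in PREFIX_TABLE if addr in net),
        key=lambda entry: entry[0].prefixlen,
    )
    if prefix_port is not None:
        return prefix_port  # terminating prefix: destination is inside this pod
    value = int(addr)
    matches = [
        (length, out)
        for suffix, length, out in SUFFIX_TABLE
        if value & ((1 << length) - 1) == suffix
    ]
    if not matches:
        raise LookupError(f"no suffix entry matches {dst}")
    return max(matches)[1]  # longest matching suffix wins
```

Intra-pod destinations stop at a /24 prefix, while traffic leaving the pod is spread over the upward ports by host ID, so different hosts use different uplinks.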
Dynamic Routing methods
• Flow classification
1. Recognize subsequent packets of the same flow and forward them on the same outgoing port to avoid packet reordering;
2. Periodically reassign output ports so that flows stay fairly distributed across output ports as flow sizes change dynamically.
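A sketch of the two steps above for one switch. Assumptions: a flow is keyed by (source, destination); a new flow goes to the currently least-loaded port; and the periodic step moves the largest flow on the busiest port that is smaller than the busiest-to-idlest gap, which is one simple heuristic and not necessarily the paper's exact rule. The class and method names are hypothetical.

```python
from collections import defaultdict


class FlowClassifier:
    """Hypothetical per-switch flow classifier for the upward-facing ports."""

    def __init__(self, ports):
        self.ports = list(ports)
        self.flow_port = {}                 # flow key -> assigned output port
        self.flow_bytes = defaultdict(int)  # bytes seen per flow this interval

    def port_load(self):
        """Bytes forwarded on each port during the current interval."""
        load = {p: 0 for p in self.ports}
        for key, port in self.flow_port.items():
            load[port] += self.flow_bytes[key]
        return load

    def forward(self, src, dst, size):
        """Return the output port; all packets of one flow share a port."""
        key = (src, dst)
        if key not in self.flow_port:
            load = self.port_load()
            self.flow_port[key] = min(self.ports, key=lambda p: load[p])
        self.flow_bytes[key] += size
        return self.flow_port[key]

    def rebalance(self):
        """Periodic step: move at most one flow from the busiest to the idlest port."""
        load = self.port_load()
        busiest = max(self.ports, key=lambda p: load[p])
        idlest = min(self.ports, key=lambda p: load[p])
        gap = load[busiest] - load[idlest]
        # Only a flow smaller than the gap is moved, so the imbalance shrinks.
        candidates = [k for k, p in self.flow_port.items()
                      if p == busiest and 0 < self.flow_bytes[k] < gap]
        if candidates:
            victim = max(candidates, key=lambda k: self.flow_bytes[k])
            self.flow_port[victim] = idlest
        self.flow_bytes.clear()  # start a new measurement interval
```

Moving a flow can itself reorder that flow's packets once, which is why reassignment is periodic rather than per packet.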
Dynamic Routing methods
• Flow scheduling (with a central scheduler)
Method 1 (notification):
1. Edge switches detect any outgoing large flow.
2. They periodically send notifications to a central scheduler.
3. The central scheduler orders a re-assignment.
Method 2 (monitoring):
1. A central scheduler tracks all active large flows.
2. It assigns them non-conflicting paths if possible.
3. The scheduler maintains Boolean state for all links (whether each link is available to carry a large flow).
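A sketch of the scheduler side, assuming an edge switch's notification arrives as a call carrying the flow's candidate paths (each path a tuple of link names). Computing those candidates from the topology is left out, and all names here are hypothetical.

```python
class CentralScheduler:
    """Hypothetical central scheduler that places large flows on non-conflicting paths."""

    def __init__(self, links):
        # Boolean state per link: True means already reserved for a large flow.
        self.reserved = {link: False for link in links}
        self.flow_path = {}  # large flow -> tuple of links it reserves

    def notify_large_flow(self, flow, candidate_paths):
        """Handle an edge switch's notification; return the assigned path or None."""
        if flow in self.flow_path:
            return self.flow_path[flow]  # already on a conflict-free path
        for path in candidate_paths:
            if not any(self.reserved[link] for link in path):
                for link in path:
                    self.reserved[link] = True
                self.flow_path[flow] = path
                return path
        return None  # no conflict-free path: leave the flow where it is

    def release(self, flow):
        """Free the links of a large flow that has finished."""
        for link in self.flow_path.pop(flow, ()):
            self.reserved[link] = False
```

With two core paths between a pair of pods, the first two large flows get disjoint paths and a third waits until one of them is released.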
Fault-Tolerance
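• Simple failure broadcast protocol
-- Each switch maintains a Bidirectional Forwarding Detection (BFD) session with each of its neighbors to detect link or switch failures (D. Katz, D. Ward. BFD for IPv4 and IPv6, 2008)
• Two classes of failures
-- between lower-layer and upper-layer switches within a pod
-- between upper-layer and core switches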
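BFD declares a session down when no control packet arrives within the detection time, roughly the detect multiplier times the negotiated transmit interval. The sketch below keeps only that timeout rule and is not the full RFC 5880 state machine; the 300 ms interval, the multiplier of 3, and the class name are illustrative assumptions.

```python
class NeighborLivenessMonitor:
    """Simplified, BFD-inspired liveness check (not the full BFD state machine)."""

    def __init__(self, tx_interval_ms: int = 300, detect_mult: int = 3):
        # A neighbor is declared down after detect_mult missed intervals.
        self.detection_time_ms = tx_interval_ms * detect_mult
        self.last_heard_ms: dict[str, int] = {}

    def heard_from(self, neighbor: str, now_ms: int) -> None:
        """Record a control packet received from neighbor at time now_ms."""
        self.last_heard_ms[neighbor] = now_ms

    def down_neighbors(self, now_ms: int) -> list[str]:
        """Neighbors silent for longer than the detection time, sorted by name."""
        return sorted(
            n for n, t in self.last_heard_ms.items()
            if now_ms - t > self.detection_time_ms
        )
```

A switch that sees a neighbor in `down_neighbors` would then broadcast the failure to the other affected switches.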
Fault-Tolerance based on flow classification (1)
• Outgoing inter- and intra-pod traffic originating from the edge switch
• Intra-pod traffic using the upper-layer switch as an intermediary
• Inter-pod traffic coming into the upper-layer switch
Fault-Tolerance based on flow classification (2)
• Outgoing inter-pod traffic
• Incoming inter-pod traffic
Fault-Tolerance based on flow scheduling
• Simpler
• The scheduler marks any link reported to be down as busy or unavailable
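A standalone sketch of that rule, assuming the scheduler keeps a set of busy-or-unavailable links, each large flow's current path, and per-flow candidate paths in preference order. The failed link is simply added to the busy set, and any large flow crossing it is re-placed on a free alternative. The function name and data layout are hypothetical.

```python
def handle_link_failure(busy, flow_paths, failed_link, alternatives):
    """Mark failed_link unavailable and re-place every large flow that used it.

    busy: set of links marked busy or unavailable (the Boolean link state).
    flow_paths: large flow -> tuple of links it currently uses.
    alternatives: large flow -> candidate paths, in preference order.
    Returns {flow: new_path} for the flows that were moved.
    """
    busy.add(failed_link)  # a down link is treated exactly like a busy one
    moved = {}
    for flow, path in list(flow_paths.items()):
        if failed_link not in path:
            continue
        for link in path:
            if link != failed_link:
                busy.discard(link)  # free the healthy part of the broken path
        new_path = next(
            (p for p in alternatives.get(flow, ())
             if not any(link in busy for link in p)),
            None,
        )
        if new_path is None:
            del flow_paths[flow]  # no free path: stop tracking a reservation
            continue
        busy.update(new_path)
        flow_paths[flow] = new_path
        moved[flow] = new_path
    return moved
```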
Limitations
• The performance evaluation uses only a small prototype of the architecture: 4 pods (16 hosts)
• The fat-tree topology has a large wiring overhead
-- 3k^3/4 cables for a k-ary fat-tree
-- e.g. k = 48, supporting 27,648 hosts: 3 * 48^3/4 = 82,944 cables
• The changes required to commodity switches need to be considered
-- they don't support the dynamic routing techniques
-- they don't support two-level routing tables
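The 3k^3/4 figure comes from counting links tier by tier, assuming one cable per host-to-switch or switch-to-switch link: each of the k pods has k/2 edge switches with k/2 hosts and k/2 uplinks each, and each of the (k/2)^2 core switches uses all k of its ports, one per pod.

```latex
\begin{aligned}
\text{host-to-edge links} &= k \cdot \tfrac{k}{2} \cdot \tfrac{k}{2} = \tfrac{k^3}{4} \\
\text{edge-to-aggregation links} &= k \cdot \tfrac{k}{2} \cdot \tfrac{k}{2} = \tfrac{k^3}{4} \\
\text{aggregation-to-core links} &= \left(\tfrac{k}{2}\right)^2 \cdot k = \tfrac{k^3}{4} \\
\text{total} &= \tfrac{3k^3}{4}, \qquad k = 48:\ \tfrac{3 \cdot 110{,}592}{4} = 82{,}944
\end{aligned}
```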
Limitations
• Dynamic routing techniques also have limitations
-- the flow classifier has only local knowledge
-- a centralized scheduler with global knowledge may be infeasible for large, arbitrary networks
• Two-level routing alone cannot avoid local congestion without a dynamic routing technique
Q&A
Extra slides