Reporter: Fuchao Zhou

Department of Computer Science A Scalable, Commodity Data Center Network Architecture Mohammad Al-Fares Alexander Loukissas Amin Vahdat SIGCOMM’08 Reporter: Fuchao Zhou

Transcript of Reporter: Fuchao Zhou

Page 1: Reporter: Fuchao Zhou

Department of Computer Science

A Scalable, Commodity Data Center Network Architecture

Mohammad Al-Fares, Alexander Loukissas, Amin Vahdat

SIGCOMM’08

Reporter: Fuchao Zhou

Page 2: Reporter: Fuchao Zhou

Problem

• How to design a data center network architecture
-- Scalable interconnection bandwidth
-- Without incurring tremendous cost
-- Compatibility with hosts running Ethernet and IP

Page 3: Reporter: Fuchao Zhou

Existing solutions

• Using specialized hardware and communication protocols such as InfiniBand and Myrinet
-- More expensive because they rely on high-end switches
-- Not natively compatible with TCP/IP applications
• Using commodity Ethernet switches and routers to interconnect cluster machines
-- Need an appropriate network topology
-- Bandwidth scales poorly with cluster size
-- Cost increases non-linearly with cluster size

Page 4: Reporter: Fuchao Zhou

Existing solutions

• Typical architectures today
-- Two-level trees of switches or routers (support 5K to 8K hosts)
-- Three-level trees of switches or routers
• Disadvantages
-- support only 50% of the bandwidth available at the edge of the network
-- incur tremendous cost ($37M to support 27,648 hosts)

Page 5: Reporter: Fuchao Zhou

Proposed solution

• k-ary fat-tree topology
-- k pods, each containing two layers of k/2 switches
-- (k/2)^2 k-port core switches
-- supports k^3/4 hosts (a 48-ary fat-tree supports 27,648 hosts)
• Advantages
-- non-blocking
-- all switching elements are identical ($8.64M to support 27,648 hosts)
-- compatible with hosts running Ethernet and IP
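The counts on this slide all follow from k. Below is a minimal Python sketch of that arithmetic. The function name `fat_tree_size` and the unit price of $3,000 per switch are assumptions made for illustration; the price is not stated on the slide and is chosen only because it reproduces the $8.64M figure.

```python
def fat_tree_size(k, switch_price=3000):
    """Element counts for a k-ary fat-tree built from identical k-port switches.

    switch_price is an assumed unit price in USD, not a value from the slides.
    """
    if k % 2:
        raise ValueError("k must be even")
    edge = k * (k // 2)          # k pods, k/2 lower-layer switches each
    aggregation = k * (k // 2)   # k pods, k/2 upper-layer switches each
    core = (k // 2) ** 2         # (k/2)^2 core switches
    switches = edge + aggregation + core
    return {
        "pods": k,
        "pod_switches": edge + aggregation,
        "core_switches": core,
        "switches": switches,
        "hosts": k ** 3 // 4,
        "cost_usd": switches * switch_price,
    }


if __name__ == "__main__":
    print(fat_tree_size(48))
```

For k = 48 this gives 2,880 switches in total, of which 576 are core switches.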

Page 6: Reporter: Fuchao Zhou

Static Routing method

• Two-level routing table
-- aims at the maximum bisection bandwidth of this network
• IP address
-- Core switches: 10.k.j.i
-- Pod switches: 10.pod.switch.1
-- Hosts: 10.pod.switch.ID
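The addressing scheme can be enumerated mechanically. The sketch below is illustrative only: the function name `fat_tree_addresses` is hypothetical, and the index ranges (pod and switch numbered from 0, core indices j and i from 1 to k/2, host IDs from 2 to k/2+1, hosts attached only to the k/2 lower-layer switches of a pod) are assumptions based on the numbering convention of the paper rather than anything stated on the slide.

```python
def fat_tree_addresses(k):
    """Enumerate addresses for a k-ary fat-tree (index ranges are assumptions)."""
    half = k // 2
    # Core switches: 10.k.j.i with j, i in 1..k/2
    core = [f"10.{k}.{j}.{i}"
            for j in range(1, half + 1)
            for i in range(1, half + 1)]
    # Pod switches: 10.pod.switch.1 with pod, switch in 0..k-1
    pod_switches = [f"10.{pod}.{sw}.1"
                    for pod in range(k)
                    for sw in range(k)]
    # Hosts: 10.pod.switch.ID, attached to lower-layer switches 0..k/2-1
    hosts = [f"10.{pod}.{sw}.{host_id}"
             for pod in range(k)
             for sw in range(half)
             for host_id in range(2, half + 2)]
    return core, pod_switches, hosts


if __name__ == "__main__":
    core, pod_switches, hosts = fat_tree_addresses(4)
    print(core)
    print(len(pod_switches), len(hosts))
```

With k = 4 this yields 4 core switches, 16 pod switches, and 16 hosts.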

Page 7: Reporter: Fuchao Zhou

Static Routing example

Packet from 10.0.1.2 to host 10.2.0.3
Packet from 10.0.1.3 to host 10.2.0.2

[Figure: fat-tree diagram tracing the two packets; surviving labels: 10.0.1.3, 10.2.1.3, 10.0.3.1, 10.2.3.1]

Prefix          Output port
10.0.0.0/24     0
10.0.1.0/24     1
0.0.0.0/0       (suffix table)

Suffix          Output port
0.0.0.2/8       2
0.0.0.3/8       3

Prefix          Output port
10.0.0.0/16     0
10.1.0.0/16     1
10.2.0.0/16     2
10.3.0.0/16     3

Prefix          Output port
10.2.0.0/24     0
10.2.1.0/24     1
0.0.0.0/0
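A two-level lookup can be sketched in a few lines of Python. The sketch uses the first table of this slide and assumes that the two /8 suffix entries form the secondary table reached through the 0.0.0.0/0 prefix, which is one reading of the flattened slide. The names `PREFIXES`, `SUFFIXES`, and `lookup` are hypothetical.

```python
import ipaddress

# Primary table: prefix -> output port. None means "consult the suffix table".
PREFIXES = [
    ("10.0.0.0/24", 0),
    ("10.0.1.0/24", 1),
    ("0.0.0.0/0", None),
]
# Secondary table: rightmost 8 bits of the address (0.0.0.x/8) -> output port.
SUFFIXES = {2: 2, 3: 3}


def lookup(dst):
    """Return the output port for destination address dst."""
    addr = ipaddress.ip_address(dst)
    # Longest-prefix match over the primary table.
    matches = [(ipaddress.ip_network(prefix), port)
               for prefix, port in PREFIXES
               if addr in ipaddress.ip_network(prefix)]
    _, port = max(matches, key=lambda m: m[0].prefixlen)
    if port is not None:
        return port
    # No terminating prefix matched: match on the host ID suffix instead.
    return SUFFIXES[int(addr) & 0xFF]


if __name__ == "__main__":
    for dst in ("10.0.1.2", "10.2.0.3", "10.2.0.2"):
        print(dst, "->", lookup(dst))
```

Traffic that stays inside the pod terminates on a /24 prefix, while inter-pod traffic is spread over the uplinks according to the destination host ID.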

Page 8: Reporter: Fuchao Zhou

Dynamic Routing methods

• Flow classification
1. Recognize subsequent packets of the same flow and forward them to the same outgoing port to avoid packet reordering.
2. Periodically reassign output ports to ensure a fair distribution of flows across output ports in the face of dynamically changing flow sizes.
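The two steps can be sketched as a small Python class. This is a simplified illustration, not the algorithm of the paper: identifying a flow by its (source, destination) pair, placing new flows on the least-loaded port, and moving at most one flow per rebalancing round are all assumptions, and the class name `FlowClassifier` is hypothetical.

```python
class FlowClassifier:
    """Keeps each flow on one port and periodically evens out port load."""

    def __init__(self, ports):
        self.ports = list(ports)
        self.port_of = {}   # flow -> assigned output port
        self.size_of = {}   # flow -> bytes seen so far

    def load(self, port):
        return sum(size for flow, size in self.size_of.items()
                   if self.port_of[flow] == port)

    def forward(self, src, dst, nbytes):
        """Step 1: packets of a known flow always leave on the same port."""
        flow = (src, dst)
        if flow not in self.port_of:
            self.port_of[flow] = min(self.ports, key=self.load)
            self.size_of[flow] = 0
        self.size_of[flow] += nbytes
        return self.port_of[flow]

    def rebalance(self):
        """Step 2: move one flow from the busiest to the idlest port."""
        busiest = max(self.ports, key=self.load)
        idlest = min(self.ports, key=self.load)
        gap = self.load(busiest) - self.load(idlest)
        # Only flows smaller than the gap reduce the imbalance when moved.
        movable = [f for f, p in self.port_of.items()
                   if p == busiest and self.size_of[f] < gap]
        if movable:
            self.port_of[max(movable, key=self.size_of.get)] = idlest
```

Calling `rebalance()` from a timer corresponds to the periodic reassignment; a moved flow may see brief reordering, which is why only one flow is moved per round in this sketch.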

Page 9: Reporter: Fuchao Zhou

Dynamic Routing methods

• Flow scheduling (with a central scheduler)

Method 1 (notification):
1. Edge switches detect any outgoing large flow.
2. They send notifications to a central scheduler periodically.
3. The central scheduler orders a re-assignment.

Method 2 (monitor):
1. A central scheduler tracks all active large flows.
2. It assigns them non-conflicting paths if possible.
3. The scheduler maintains Boolean state for all links.
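The Boolean link state can be illustrated with a short Python sketch. It is a simplification under stated assumptions: only pod-to-core links are modelled, a path is identified by the core switch it crosses, and cores are searched in index order. The class name `CentralScheduler` and its methods are hypothetical.

```python
class CentralScheduler:
    """Places large inter-pod flows on core switches with free links."""

    def __init__(self, num_pods, num_cores):
        self.num_cores = num_cores
        # reserved[(direction, pod, core)] is True while a large flow holds the link.
        self.reserved = {(d, p, c): False
                         for d in ("up", "down")
                         for p in range(num_pods)
                         for c in range(num_cores)}
        self.links_of = {}   # flow -> links it currently holds

    def place(self, flow, src_pod, dst_pod):
        """Return a core switch with a non-conflicting path, or None."""
        for core in range(self.num_cores):
            links = [("up", src_pod, core), ("down", dst_pod, core)]
            if not any(self.reserved[link] for link in links):
                for link in links:
                    self.reserved[link] = True
                self.links_of[flow] = links
                return core
        return None

    def release(self, flow):
        """Free the links of a flow that has finished."""
        for link in self.links_of.pop(flow, []):
            self.reserved[link] = False
```

When `place` returns None, no conflict-free path exists and the flow would keep its current path in this sketch.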

Page 10: Reporter: Fuchao Zhou

Fault-Tolerance

• Simple failure broadcast protocol
-- Each switch maintains a Bidirectional Forwarding Detection (BFD) session (D. Katz, D. Ward. BFD for IPv4 and IPv6, 2008)

• Two classes of failures

Page 11: Reporter: Fuchao Zhou

Fault-Tolerance based on the flow classification(1)

• Outgoing inter- and intra-pod traffic originating from the edge switch
• Intra-pod traffic using the upper-layer switch as an intermediary
• Inter-pod traffic coming into the upper-layer switch

Page 12: Reporter: Fuchao Zhou

Fault-Tolerance based on the flow classification(2)

Outgoing inter-pod traffic

Incoming inter-pod traffic

Page 13: Reporter: Fuchao Zhou

Fault-Tolerance based on the flow scheduling

• Simpler
• The scheduler marks any link reported to be down as busy or unavailable
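Treating a failed link as busy means path selection needs no separate failure logic. A minimal Python sketch, assuming link state is a dictionary keyed by hypothetical (pod, core) pairs; the function names `report_failure` and `usable_cores` are illustrative and not from the slides.

```python
def report_failure(link_busy, pod, core):
    """Mark a failed pod-to-core link as busy so no path uses it."""
    link_busy[(pod, core)] = True


def usable_cores(link_busy, src_pod, dst_pod, num_cores):
    """Cores whose links to both pods are neither reserved nor down."""
    return [core for core in range(num_cores)
            if not link_busy.get((src_pod, core), False)
            and not link_busy.get((dst_pod, core), False)]


if __name__ == "__main__":
    state = {}
    report_failure(state, pod=2, core=1)
    print(usable_cores(state, src_pod=0, dst_pod=2, num_cores=4))
```

Any path through core 1 to pod 2 is excluded after the failure report, exactly as if a large flow had reserved the link.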

Page 14: Reporter: Fuchao Zhou

Limitations

• The performance evaluation uses only a prototype of the architecture consisting of 4 pods (16 hosts)
• The fat-tree topology incurs wiring overhead
-- 3k^3/4 cables for a k-ary fat-tree
-- e.g. k=48, supporting 27,648 hosts: 3*48^3/4 = 82,944 cables
• How many changes the commodity switches need should be considered
-- they don't support the dynamic routing techniques
-- they don't support the two-level routing table
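The cable count can be checked layer by layer. The short Python sketch below assumes one cable per host, a full mesh between the two switch layers inside each pod, and one link from every core switch to every pod; the function name `fat_tree_cables` is hypothetical.

```python
def fat_tree_cables(k):
    """Total cables in a k-ary fat-tree, summed over the three link layers."""
    host_to_edge = k ** 3 // 4                # one cable per host
    edge_to_aggregation = k * (k // 2) ** 2   # (k/2)^2 links inside each of k pods
    aggregation_to_core = (k // 2) ** 2 * k   # each core switch links to all k pods
    return host_to_edge + edge_to_aggregation + aggregation_to_core


if __name__ == "__main__":
    print(fat_tree_cables(48))
```

Each layer contributes k^3/4 cables, which is where the factor of 3 comes from.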

Page 15: Reporter: Fuchao Zhou

Limitations

• Dynamic routing techniques also have limitations
-- the flow classifier has only local knowledge available
-- a centralized scheduler with global knowledge may be infeasible for large arbitrary networks
• The two-level routing solution cannot avoid local congestion without a dynamic routing technique

Page 16: Reporter: Fuchao Zhou

Q&A

Page 17: Reporter: Fuchao Zhou

Extra slides