
Data Center Networking

CS 6250: Computer Networking, Fall 2011

Page 2

Cloud Computing

• Elastic resources
  – Expand and contract resources
  – Pay-per-use
  – Infrastructure on demand

• Multi-tenancy
  – Multiple independent users
  – Security and resource isolation
  – Amortize the cost of the (shared) infrastructure

• Flexible service management
  – Resiliency: isolate failures of servers and storage
  – Workload movement: move work to other locations

Page 3

Data Center Trends

By J. Nicholas Hoover, InformationWeek, June 17, 2008

Data centers will grow larger and larger in the coming cloud computing era in order to benefit from economies of scale, moving from tens of thousands to hundreds of thousands of servers. Google's mega data center in Finland (see the link below), for example, represents an investment of roughly 200 million euros.

The most important concerns in data center management:
– Economies of scale
– High utilization of equipment
– Maximizing revenue
– Amortizing administration costs
– Low power consumption (http://www.energystar.gov/ia/partners/prod_development/downloads/EPA_Datacenter_Report_Congress_Final1.pdf)

http://gigaom.com/cloud/google-builds-megadata-center-in-finland/

http://www.datacenterknowledge.com

Page 4

Cloud Service Models

• Software as a Service (SaaS)
  – Provider licenses applications to users as a service
  – e.g., customer relationship management, email, …
  – Avoids costs of installation, maintenance, patches, …

• Platform as a Service (PaaS)
  – Provider offers a software platform for building applications
  – e.g., Google's App Engine
  – Avoids worrying about the scalability of the platform

• Infrastructure as a Service (IaaS)
  – Provider offers raw computing, storage, and network
  – e.g., Amazon's Elastic Compute Cloud (EC2)
  – Avoids buying servers and estimating resource needs

Page 5

Multi-Tier Applications

• Applications consist of tasks
  – Many separate components
  – Running on different machines

• Commodity computers
  – Many general-purpose computers
  – Not one big mainframe
  – Easier scaling

[Diagram: a front-end server fans out to a tree of aggregators, which in turn fan out to workers.]

Page 6

Enabling Technology: Virtualization

• Multiple virtual machines run on one physical machine
• Applications run unmodified, as on a real machine
• A VM can migrate from one computer to another

Page 7

Status Quo: Virtual Switch in Server


Page 8

Top-of-Rack Architecture

• Rack of servers
  – Commodity servers
  – And a top-of-rack switch

• Modular design
  – Preconfigured racks
  – Power, network, and storage cabling

• Aggregate to the next level

Page 9

Modularity

• Containers

• Many containers


Page 10


Data Center Challenges

• Traffic load balancing
• Support for VM migration
• Achieving high bisection bandwidth
• Power savings / cooling
• Network management (provisioning)
• Security (dealing with multiple tenants)

Page 11

Data Center Costs (Monthly Costs)

• Servers: 45%
  – CPU, memory, disk

• Infrastructure: 25%
  – UPS, cooling, power distribution

• Power draw: 15%
  – Electrical utility costs

• Network: 15%
  – Switches, links, transit

http://perspectives.mvdirona.com/2008/11/28/CostOfPowerInLargeScaleDataCenters.aspx

Page 12

Common Data Center Topology

[Diagram: traffic from the Internet enters the data center through layer-3 core routers, passes through layer-2/3 aggregation switches, and reaches the servers through layer-2 access switches.]

Page 13

Data Center Network Topology

[Diagram: the Internet connects to two core routers (CR), which connect to access routers (AR); below each AR, Ethernet switches (S) aggregate racks of application servers (A).]

Key: CR = Core Router, AR = Access Router, S = Ethernet Switch, A = Rack of app. servers

~1,000 servers/pod

Page 14

Requirements for Future Data Centers

• To keep up with the trend toward mega data centers, DCN technology should meet the following requirements:
  – High scalability
  – Transparent VM migration (high agility)
  – Easy deployment, requiring little human administration
  – Efficient communication
  – Loop-free forwarding
  – Fault tolerance

• Current DCN technology cannot meet these requirements:
  – Layer 3 protocols cannot support transparent VM migration.
  – Current layer 2 protocols are not scalable, due to the size of the forwarding table and native broadcasting for address resolution.

Page 15


Problems with Common Topologies

• Single point of failure

• Oversubscription of links higher up in the topology (see the sketch after this list)

• Tradeoff between cost and provisioning
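To make oversubscription concrete, here is a minimal sketch with assumed port counts and link speeds (none of these numbers come from the slides; they are illustrative assumptions):

```python
# Hypothetical oversubscription at a top-of-rack switch.
# All numbers below are assumptions for illustration only.
servers_per_rack = 40     # assumed servers in one rack
server_link_gbps = 1.0    # assumed 1 Gbps NIC per server
uplinks = 2               # assumed uplinks toward the aggregation layer
uplink_gbps = 10.0        # assumed 10 Gbps per uplink

demand = servers_per_rack * server_link_gbps   # 40 Gbps of worst-case demand
supply = uplinks * uplink_gbps                 # 20 Gbps of uplink capacity
print(f"oversubscription ~ {demand / supply:.0f}:1")   # -> 2:1
```

The same arithmetic, compounded at each higher layer, is what produces the large ratios shown on the next slide.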

Page 16

Capacity Mismatch

[Diagram: the same tree topology annotated with its oversubscription ratios, which worsen toward the top: roughly 5:1 at the top-of-rack level, 40:1 at the aggregation level, and 200:1 at the core.]

Page 17

Data-Center Routing

[Diagram: the same topology split into layers. The core routers (CR) and access routers (AR) form the DC-Layer 3 portion; the Ethernet switches (S) and server racks (A) below form the DC-Layer 2 portion.]

Key: CR = Core Router (L3), AR = Access Router (L3), S = Ethernet Switch (L2), A = Rack of app. servers

~1,000 servers/pod == IP subnet

Page 18

Reminder: Layer 2 vs. Layer 3

• Ethernet switching (layer 2)
  – Cheaper switch equipment
  – Fixed addresses and auto-configuration
  – Seamless mobility, migration, and failover

• IP routing (layer 3)
  – Scalability through hierarchical addressing
  – Efficiency through shortest-path routing
  – Multipath routing through equal-cost multipath

• So, as in enterprises…
  – Data centers often connect layer-2 islands by IP routers

Page 19


Need for Layer 2

• Certain monitoring apps require servers with the same role to be on the same VLAN

• Using the same IP address on dual-homed servers

• Allows organic growth of server farms

• Migration is easier

Page 20

Review of Layer 2 & Layer 3

• Layer 2
  – One spanning tree for the entire network
    • Prevents loops
    • Ignores alternate paths

• Layer 3
  – Shortest-path routing between source and destination
  – Best-effort delivery

Page 21

Fat-Tree-Based Solution

• Connect end hosts together using a fat-tree topology
  – Infrastructure consists of cheap devices

• Each port supports the same speed as an end host
  – All devices can transmit at line speed if packets are distributed over the available paths
  – A fat tree built from k-port switches can support k³/4 hosts (worked out below)
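A quick sanity check of the k³/4 formula (a minimal sketch; k = 48 is the example commonly cited from the fat-tree paper):

```python
# Hosts supported by a fat tree built from k-port switches: k^3 / 4.
def fat_tree_hosts(k: int) -> int:
    return k ** 3 // 4

for k in (4, 24, 48):
    print(f"k = {k:2d} -> {fat_tree_hosts(k):,} hosts")
# k =  4 -> 16 hosts
# k = 24 -> 3,456 hosts
# k = 48 -> 27,648 hosts
```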

Page 22


Fat-Tree Topology

Page 23


Problems with a Vanilla Fat-tree

• Layer 3 will use only one of the existing equal-cost paths

• Packet re-ordering occurs if layer 3 blindly takes advantage of path diversity

Page 24

Modified Fat Tree

• Enforce a special addressing scheme in the DC (sketched below)
  – Allows hosts attached to the same switch to route only through that switch
  – Allows intra-pod traffic to stay within the pod
  – Addresses have the form unused.PodNumber.SwitchNumber.EndHost

• Use two-level lookups to distribute traffic and maintain packet ordering
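A minimal sketch of the addressing scheme, assuming the "unused" leading octet is 10 (private 10.0.0.0/8 space, as in the fat-tree paper); the helper function itself is hypothetical:

```python
# Fat-tree host addresses take the form unused.pod.switch.host.
def host_address(pod: int, switch: int, host: int) -> str:
    # 10 is assumed for the unused octet (private 10.0.0.0/8 space).
    return f"10.{pod}.{switch}.{host}"

print(host_address(pod=2, switch=0, host=3))   # 10.2.0.3
```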

Page 25

Two-Level Lookups

• The first level is a prefix lookup
  – Used to route down the topology to an end host

• The second level is a suffix lookup (see the sketch below)
  – Used to route up toward the core
  – Diffuses and spreads out traffic
  – Maintains packet ordering by always using the same port for the same end host
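Here is a toy illustration of a two-level table, with hypothetical entries for one pod switch. A miss in the prefix table falls through to the suffix table, so traffic leaving the pod is spread by destination-host suffix while packets to any one host always take the same port:

```python
# First level: prefix match routes *down* toward end hosts (hypothetical entries).
prefix_table = {
    "10.2.0": 0,   # /24 toward subnet switch 0, output port 0
    "10.2.1": 1,   # /24 toward subnet switch 1, output port 1
}
# Second level: suffix match on the host octet routes *up* toward the core.
suffix_table = {2: 2, 3: 3}   # host .2 -> uplink 2, host .3 -> uplink 3

def lookup(dst_ip: str) -> int:
    o1, o2, o3, o4 = dst_ip.split(".")
    prefix = f"{o1}.{o2}.{o3}"
    if prefix in prefix_table:          # first level: route down within the pod
        return prefix_table[prefix]
    return suffix_table[int(o4)]        # second level: diffuse upward by suffix

print(lookup("10.2.0.3"))   # 0 -> stays in the pod
print(lookup("10.4.1.3"))   # 3 -> inter-pod, pinned to uplink 3 (ordering kept)
```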

Page 26

Diffusion Optimizations

• Flow classification (sketched below)
  – Eliminates local congestion
  – Assigns traffic to ports on a per-flow basis instead of a per-host basis

• Flow scheduling
  – Eliminates global congestion
  – Prevents long-lived flows from sharing the same links
  – Assigns long-lived flows to different links
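Flow classification can be sketched as hashing a flow's identifying fields so that all packets of one flow share one uplink while distinct flows spread across uplinks. This is a simplified sketch; the mechanism in the fat-tree paper also periodically rebalances port assignments:

```python
import hashlib

def uplink_for_flow(src_ip: str, dst_ip: str,
                    src_port: int, dst_port: int, num_uplinks: int) -> int:
    # Hash the flow's fields: the same flow always maps to the same uplink
    # (no reordering), while different flows spread across uplinks.
    key = f"{src_ip}:{src_port}->{dst_ip}:{dst_port}".encode()
    digest = hashlib.sha1(key).digest()
    return int.from_bytes(digest[:4], "big") % num_uplinks

print(uplink_for_flow("10.0.1.2", "10.2.0.3", 5555, 80, num_uplinks=2))
```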

Page 27

Drawbacks

• No inherent support for VLAN traffic

• The data center has a fixed size

• Connectivity to the Internet is not addressed

• Wastes address space
  – Requires NAT at the border

Page 28

Data Center Traffic Engineering

Challenges and Opportunities


Page 29

Wide-Area Network

[Diagram: clients reach replicated data centers over the Internet. A DNS server performs DNS-based site selection, directing each client to the router and servers of one data center.]

Page 30

Wide-Area Network: Ingress Proxies

[Diagram: the same wide-area setup, with ingress proxies interposed between the clients and the data centers' routers.]

Page 31

Traffic Engineering Challenges

• Scale
  – Many switches, hosts, and virtual machines

• Churn
  – Large numbers of component failures
  – Virtual machine (VM) migration

• Traffic characteristics
  – High traffic volume and dense traffic matrices
  – Volatile, unpredictable traffic patterns

• Performance requirements
  – Delay-sensitive applications
  – Resource isolation between tenants

Page 32

Traffic Engineering Opportunities

• Efficient network
  – Low propagation delay and high capacity

• Specialized topology
  – Fat tree, Clos network, etc.
  – Opportunities for hierarchical addressing

• Control over both network and hosts
  – Joint optimization of routing and server placement
  – Can move network functionality into the end host

• Flexible movement of workload
  – Services replicated at multiple servers and data centers
  – Virtual machine (VM) migration

Page 33

PortLand: Main Idea

[Diagram: adding a new host and transferring a packet in a PortLand fabric.]

Key features:
– A layer 2 protocol based on a tree topology
– PMACs (pseudo MACs) encode position information (see the sketch below)
– Data forwarding proceeds based on PMACs
– Edge switches are responsible for the mapping between PMACs and AMACs (actual MACs)
– The fabric manager is responsible for address resolution
– Edge switches make PMACs invisible to end hosts
– Each switch node can identify its own position by itself
– The fabric manager keeps information about the overall topology; when a fault occurs, it notifies the affected nodes
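A sketch of how a PMAC packs position information into 48 bits. The pod.position.port.vmid layout with 16/8/8/16-bit fields follows the PortLand paper; the helper and the example values are hypothetical:

```python
def pmac(pod: int, position: int, port: int, vmid: int) -> str:
    """Pack pod.position.port.vmid (16/8/8/16 bits) into a 48-bit PMAC."""
    value = (pod << 32) | (position << 24) | (port << 16) | vmid
    return ":".join(f"{b:02x}" for b in value.to_bytes(6, "big"))

# Host in pod 2, on the edge switch at position 1, switch port 0, VM id 1:
print(pmac(pod=2, position=1, port=0, vmid=1))   # 00:02:01:00:00:01
```

Because the PMAC is hierarchical, core and aggregation switches can forward on the pod and position fields alone, keeping their forwarding tables small.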

Page 34

Questions from the Discussion Board

• Questions about the fabric manager
  o How can the fabric manager be made robust?
  o How can a scalable fabric manager be built?
  Redundant deployment or a clustered fabric manager could be a solution.

• Question about the realism of the baseline tree topology
  o Is the tree topology common in the real world?
  Yes, the multi-rooted tree has been a traditional data center topology.
  [A Scalable, Commodity Data Center Network Architecture, Mohammad Al-Fares et al., SIGCOMM '08]

Page 35

Discussion Points

• Is PortLand applicable to other topologies?
  The idea of central ARP management could be applicable. To solve the forwarding-loop problem, a TRILL-like header would be necessary. Would PMACs still be beneficial? They would require a larger forwarding table.

• Questions about the benefits of VM migration
  o VM migration helps reduce traffic going through aggregation/core switches.
  o What about changes in user requirements?
  o What about power consumption?

• Etc.
  o Feasibility of mimicking PortLand in a layer 3 protocol.
  o What about using pseudo IP addresses?
  o Delay to boot up a whole data center.

Page 36

Status Quo: Conventional DC Network

Reference – "Data Center: Load Balancing Data Center Services," Cisco, 2004

[Diagram: the conventional topology again. Core routers (CR) and access routers (AR) form the DC-Layer 3 portion; Ethernet switches (S) and racks of application servers (A) form the DC-Layer 2 portion.]

Key: CR = Core Router (L3), AR = Access Router (L3), S = Ethernet Switch (L2), A = Rack of app. servers

~1,000 servers/pod == IP subnet

Page 37

Conventional DC Network Problems

[Diagram: the conventional topology annotated with oversubscription ratios of roughly 5:1, 40:1, and 200:1 moving up the tree, highlighting two problems:]

• Dependence on high-cost proprietary routers

• Extremely limited server-to-server capacity

Page 38

And More Problems …

[Diagram: two IP subnets (VLAN #1 and VLAN #2) hang off different access routers, with ~200:1 oversubscription above them.]

• Resource fragmentation, significantly lowering utilization (and cost-efficiency)

Page 39

And More Problems …

[Diagram: the same two-subnet picture as on the previous slide.]

• Resource fragmentation, significantly lowering cloud utilization (and cost-efficiency)

• Complicated manual L2/L3 reconfiguration

Page 40

All We Need Is Just a Huge L2 Switch, or an Abstraction of One

[Diagram: the conventional multi-tier topology is replaced conceptually by a single huge layer-2 switch to which every server rack attaches directly.]

Page 41

VL2 Approach

• Layer 2 based, using (future) commodity switches

• Two-level hierarchy:
  – access switches (top of rack)
  – load-balancing switches

• Eliminate spanning tree
  – Flat routing
  – Allows the network to take advantage of path diversity

• Prevent MAC address learning
  – 4D architecture to distribute data-plane information
  – ToR switches: only need to learn addresses of the intermediate switches
  – Core switches: learn addresses of the ToR switches

• Support efficient grouping of hosts (VLAN replacement)

Page 42


VL2

Page 43

VL2 Components

• Top-of-rack (ToR) switch
  – Aggregates traffic from the 20 end hosts in a rack
  – Performs IP-to-MAC translation

• Intermediate switch
  – Disperses traffic and balances it among switches
  – Used for Valiant load balancing

• Decision element
  – Places routes in switches
  – Maintains a directory service of IP-to-MAC mappings

• End host
  – Performs IP-to-MAC lookups

Page 44

Routing in VL2

• The end host checks its flow cache for the MAC of the flow
  – If not found, it asks its agent to resolve it
  – The agent returns the MACs for the destination server and for the intermediate switches

• Traffic is sent to the top-of-rack switch, triple-encapsulated (see the sketch below)

• Traffic is sent to an intermediate switch

• Traffic is sent to the top-of-rack switch of the destination
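A sketch of the resolve-and-encapsulate step at the sending host. The directory contents, names, and helper are hypothetical; in VL2 this logic lives in a shim in the host's networking stack:

```python
import random

# Hypothetical agent state: application address (AA) -> locator info.
directory = {
    "10.0.0.5": {"dst_tor": "LA-ToR-7",
                 "intermediates": ["LA-Int-1", "LA-Int-2"]},
}

def encapsulate(dst_aa: str, payload: bytes):
    entry = directory[dst_aa]                          # agent lookup, cached per flow
    waypoint = random.choice(entry["intermediates"])   # Valiant LB waypoint
    # Outermost to innermost: intermediate switch, destination ToR, server AA.
    return (waypoint, entry["dst_tor"], dst_aa, payload)

print(encapsulate("10.0.0.5", b"data"))
```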

Page 45

The Illusion of a Huge L2 Switch

1. L2 semantics

2. Uniform high capacity

3. Performance isolation


Page 46

Objectives and Solutions

Objective                                   Approach                                             Solution
1. Layer-2 semantics                        Employ flat addressing                               Name-location separation & resolution service
2. Uniform high capacity between servers    Guarantee bandwidth for hose-model traffic           Flow-based random traffic indirection (Valiant LB)
3. Performance isolation                    Enforce hose model using existing mechanisms only    TCP

Page 47

Name/Location Separation

[Diagram: servers x, y, and z sit behind ToR switches ToR1–ToR4. A directory service maps each server name to its current ToR (x → ToR2, y → ToR3, z → ToR3); senders look up the destination's ToR and encapsulate packets toward it. When z migrates to ToR4, only the directory entry changes.]

• Servers use flat names
• Switches run link-state routing and maintain only the switch-level topology
• Copes with host churn with very little overhead (see the sketch below)

• Allows the use of low-cost switches
• Protects the network and hosts from host-state churn
• Obviates host and switch reconfiguration
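A minimal sketch of the directory's role in migration (the server names and ToR labels mirror the figure; the helper functions are hypothetical):

```python
# Application addresses (AAs) stay fixed; only the AA -> ToR mapping changes.
directory = {"x": "ToR2", "y": "ToR3", "z": "ToR3"}

def resolve(aa: str) -> str:
    return directory[aa]            # lookup & response

def migrate(aa: str, new_tor: str) -> None:
    directory[aa] = new_tor         # one entry changes; no switch reconfiguration

print(resolve("z"))                 # ToR3
migrate("z", "ToR4")                # z's VM moves to another rack
print(resolve("z"))                 # ToR4 -- senders now reach the new location
```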

Page 48

Clos Network Topology

[Diagram: a Clos network. Each ToR switch connects 20 servers; K aggregation (Aggr) switches with D ports each connect the ToRs to the intermediate (Int) switches, supporting 20·(DK/4) servers in total.]

Offers huge aggregate capacity and multiple paths at modest cost:

D (# of 10G ports)    Max DC size (# of servers)
48                    11,520
96                    46,080
144                   103,680
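The table follows directly from the 20·(DK/4) formula when K = D, as a quick check confirms:

```python
# Servers in the VL2 Clos network: 20 * (D * K / 4),
# with 20 servers per ToR; the slide's table assumes K = D.
def clos_servers(d: int, k: int) -> int:
    return 20 * (d * k // 4)

for d in (48, 96, 144):
    print(f"D = {d:3d} -> {clos_servers(d, d):,} servers")
# D =  48 -> 11,520 servers
# D =  96 -> 46,080 servers
# D = 144 -> 103,680 servers
```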

Page 49

Valiant Load Balancing: Indirection

[Diagram: a packet from x to y is sent up to a random intermediate switch, addressed by an anycast address I_ANY, and then down to the destination's ToR (T1–T6); separate links are used for up paths and down paths.]

[ ECMP + IP Anycast ]
• Harness huge bisection bandwidth
• Obviate esoteric traffic engineering or optimization
• Ensure robustness to failures
• Work with switch mechanisms available today

Requirements:
1. Must spread traffic
2. Must ensure destination independence

These are met by equal-cost multipath (ECMP) forwarding, and the scheme copes with arbitrary traffic matrices with very little overhead (a sketch follows).
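A sketch of the indirection step (switch names are hypothetical): each flow is bounced off a randomly chosen intermediate switch, which makes the load on the core independent of the traffic matrix:

```python
import random

INTERMEDIATES = ["Int-1", "Int-2", "Int-3"]   # hypothetical core switches

def vlb_path(src_tor: str, dst_tor: str) -> list:
    """Route a flow up to a random intermediate, then down to the destination."""
    waypoint = random.choice(INTERMEDIATES)   # destination-independent choice
    return [src_tor, waypoint, dst_tor]

print(vlb_path("T1", "T5"))   # e.g. ['T1', 'Int-2', 'T5']
```

In practice VL2 gets this randomization for free from ECMP plus an IP anycast address shared by all intermediate switches, as the slide notes.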

Page 50

Properties of Desired Solutions

• Backwards compatible with existing infrastructure
  – No changes to applications
  – Support for layer 2 (Ethernet)

• Cost-effective
  – Low power consumption and heat emission
  – Cheap infrastructure

• Allows host communication at line speed
• Zero configuration
• No loops
• Fault-tolerant

Page 51

Research Questions

• What topology to use in data centers?
  – Reducing wiring complexity
  – Achieving high bisection bandwidth
  – Exploiting the capabilities of optics and wireless

• Routing architecture?
  – Flat layer-2 network vs. hybrid switch/router
  – Flat vs. hierarchical addressing

• How to perform traffic engineering?
  – Over-engineering vs. adapting to load
  – Server selection, VM placement, or optimizing routing

• Virtualization of NICs, servers, switches, …

Page 52

Research Questions

• Rethinking TCP congestion control?
  – Low propagation delay and high bandwidth
  – The "incast" problem, leading to bursty packet loss

• Division of labor for TE, access control, …
  – VM, hypervisor, ToR, and core switches/routers

• Reducing energy consumption
  – Better load balancing vs. selectively shutting down equipment

• Wide-area traffic engineering
  – Selecting the least-loaded or closest data center

• Security
  – Preventing information leakage and attacks