XCo: Coordination to Prevent Network Congestion in Cloud ...dcm/Teaching/CDA5532-Cloud... · XCo...

28
XCo: Explicit Coordination to Prevent Network Fabric Congestion in Cloud Computing Cluster Platforms Presented by Wei Dai

Transcript of XCo: Coordination to Prevent Network Congestion in Cloud ...dcm/Teaching/CDA5532-Cloud... · XCo...

Page 1: XCo: Coordination to Prevent Network Congestion in Cloud ...dcm/Teaching/CDA5532-Cloud... · XCo –Explicit Coordination • Coordinate network transmissions from multiple VMs to

XCo: Explicit Coordination to Prevent Network Fabric Congestion in Cloud Computing Cluster 

Platforms

Presented by Wei Dai

Page 2: XCo: Coordination to Prevent Network Congestion in Cloud ...dcm/Teaching/CDA5532-Cloud... · XCo –Explicit Coordination • Coordinate network transmissions from multiple VMs to

Reasons for Congestion in CloudReasons for Congestion in Cloud

• Cloud operators use virtualization to consolidateCloud operators use virtualization to consolidate thousands of VMs on shared hardware platforms due to cost concerns.

• Most VMs host service‐oriented applications that are ppinherently communication intensive.

2

Page 3: XCo: Coordination to Prevent Network Congestion in Cloud ...dcm/Teaching/CDA5532-Cloud... · XCo –Explicit Coordination • Coordinate network transmissions from multiple VMs to

Reasons for Congestion in CloudReasons for Congestion in Cloud

• Cloud computing infrastructures consist of large dataCloud computing infrastructures consist of large data center clusters using commodity servers and networking hardware.

Pros:Cheap

Easy to install and manage

Can be shared by a wide range of network services and protocolsprotocols

3

Page 4: XCo: Coordination to Prevent Network Congestion in Cloud ...dcm/Teaching/CDA5532-Cloud... · XCo –Explicit Coordination • Coordinate network transmissions from multiple VMs to

Reasons for Congestion in CloudReasons for Congestion in Cloud

• Cloud computing infrastructures consist of large data p g gcenter clusters using commodity servers and networking hardware.Cons:Higher latencySmaller / lower‐performance packet buffersSmaller / lower‐performance packet buffers

• Switch buffers can easily become overwhelmed by y yhigh‐throughput traffic that can be bursty and synchronized, leading to significant packet losses.

4

Page 5: XCo: Coordination to Prevent Network Congestion in Cloud ...dcm/Teaching/CDA5532-Cloud... · XCo –Explicit Coordination • Coordinate network transmissions from multiple VMs to

Types of CongestionTypes of Congestion

• TCP throughput collapse (also known as Incast)TCP throughput collapse (also known as Incast)

Well‐known example of congestion experienced by barrier‐synchronized traffic, e.g. synchronous reads in networked storage

• Congestion caused by non‐TCP traffic, e.g. UDP

• Congestion caused by traffic not TCP‐friendly

voice/video over IP, and peer‐to‐peer traffic

• Congestion caused by large number of short TCP sessions

5

Page 6: XCo: Coordination to Prevent Network Congestion in Cloud ...dcm/Teaching/CDA5532-Cloud... · XCo –Explicit Coordination • Coordinate network transmissions from multiple VMs to

How to Solve the ProblemHow to Solve the Problem

• Root cause: transient overload of buffers withinRoot cause: transient overload of buffers within switches

• Hardware and software mechanisms are hard to deploy at scale.

• Ethernet flow control in IEEE 802.3x helps in low‐end edge switches, but is counter‐productive in backbone switches.

6

Page 7: XCo: Coordination to Prevent Network Congestion in Cloud ...dcm/Teaching/CDA5532-Cloud... · XCo –Explicit Coordination • Coordinate network transmissions from multiple VMs to

How to Solve the ProblemHow to Solve the Problem

• Current industry practice:Current industry practice:Add higher capacity network switches

Multi‐port network cards

Physically separate networks for data and control traffic

• Drawback: increase cost and complexity without addressing the root cause

7

Page 8: XCo: Coordination to Prevent Network Congestion in Cloud ...dcm/Teaching/CDA5532-Cloud... · XCo –Explicit Coordination • Coordinate network transmissions from multiple VMs to

XCo – Explicit CoordinationXCo Explicit Coordination

• Coordinate network transmissions from multipleCoordinate network transmissions from multiple VMs to avoid throughput collapse and increase network utilization

• Advantages: simple, effective, feasible, and g pindependent of switch‐level hardware support, transparent implementation without modifying any 

l d d l k happlications, standard protocols, network switches or VMs

8

Page 9: XCo: Coordination to Prevent Network Congestion in Cloud ...dcm/Teaching/CDA5532-Cloud... · XCo –Explicit Coordination • Coordinate network transmissions from multiple VMs to

XCo – Explicit CoordinationXCo Explicit Coordination

9

Page 10: XCo: Coordination to Prevent Network Congestion in Cloud ...dcm/Teaching/CDA5532-Cloud... · XCo –Explicit Coordination • Coordinate network transmissions from multiple VMs to

Central ControllerCentral Controller

• Resides in the same switched network as other nodes

• Takes as input:Switch interconnection topology and link capacitiesSwitch interconnection topology and link capacitiesLocation of VMs on physical nodesCurrent traffic matrix of the networkAdministrative policiesAdministrative policies

• Whenever detects congestion buildup at any link, computes and sends transmission directives to local coordinators at each end‐host that is contributing to the congestion

10

Page 11: XCo: Coordination to Prevent Network Congestion in Cloud ...dcm/Teaching/CDA5532-Cloud... · XCo –Explicit Coordination • Coordinate network transmissions from multiple VMs to

Local CoordinatorLocal Coordinator

• Intercepts and regulates the outgoing trafficIntercepts and regulates the outgoing traffic aggregates (VM‐to‐VM flows) from all VMs within the corresponding end‐host according to transmission directives 

• Provides traffic feedback to the central controller

• The specific regulation pattern is dictated by transmission directives.

11

Page 12: XCo: Coordination to Prevent Network Congestion in Cloud ...dcm/Teaching/CDA5532-Cloud... · XCo –Explicit Coordination • Coordinate network transmissions from multiple VMs to

Transmission DirectivesTransmission Directives

• Explicit instructions for transmissionExplicit instructions for transmission

• Various forms:Various forms:

Explicit timeslice scheduling

which V2V flow transmits when and for how longwhich V2V flow transmits when and for how long

Explicit rate limiting

t h t t V2V fl h ld t it f th t Nat what rate a V2V flow should transmit for the next N ms

Combination of the above two or other forms

12

Page 13: XCo: Coordination to Prevent Network Congestion in Cloud ...dcm/Teaching/CDA5532-Cloud... · XCo –Explicit Coordination • Coordinate network transmissions from multiple VMs to

Explicit Timeslice SchedulingExplicit Timeslice Scheduling

13

Page 14: XCo: Coordination to Prevent Network Congestion in Cloud ...dcm/Teaching/CDA5532-Cloud... · XCo –Explicit Coordination • Coordinate network transmissions from multiple VMs to

Work ConservationWork Conservation

• Some nodes may finish early with their timeslice.y y

• Local coordinators return the remaining part of timesliceb k l llback to central controller.

• Central controller then permits another node to transmit• Central controller then permits another node to transmit.

• Local coordinators introduce a small hysteresis delay y ybefore returning the timeslice, in case that more packets might arrive during the delay.

14

Page 15: XCo: Coordination to Prevent Network Congestion in Cloud ...dcm/Teaching/CDA5532-Cloud... · XCo –Explicit Coordination • Coordinate network transmissions from multiple VMs to

Experimental SetupsExperimental Setups

15

Page 16: XCo: Coordination to Prevent Network Congestion in Cloud ...dcm/Teaching/CDA5532-Cloud... · XCo –Explicit Coordination • Coordinate network transmissions from multiple VMs to

Impact of Ethernet CongestionImpact of Ethernet Congestion

16

Page 17: XCo: Coordination to Prevent Network Congestion in Cloud ...dcm/Teaching/CDA5532-Cloud... · XCo –Explicit Coordination • Coordinate network transmissions from multiple VMs to

Performance Evaluation of XCoPerformance Evaluation of XCo

17

Page 18: XCo: Coordination to Prevent Network Congestion in Cloud ...dcm/Teaching/CDA5532-Cloud... · XCo –Explicit Coordination • Coordinate network transmissions from multiple VMs to

Impact of Ethernet CongestionImpact of Ethernet Congestion

18

Page 19: XCo: Coordination to Prevent Network Congestion in Cloud ...dcm/Teaching/CDA5532-Cloud... · XCo –Explicit Coordination • Coordinate network transmissions from multiple VMs to

Performance Evaluation of XCoPerformance Evaluation of XCo

19

Page 20: XCo: Coordination to Prevent Network Congestion in Cloud ...dcm/Teaching/CDA5532-Cloud... · XCo –Explicit Coordination • Coordinate network transmissions from multiple VMs to

Impact of Ethernet CongestionImpact of Ethernet Congestion

20

Page 21: XCo: Coordination to Prevent Network Congestion in Cloud ...dcm/Teaching/CDA5532-Cloud... · XCo –Explicit Coordination • Coordinate network transmissions from multiple VMs to

Performance Evaluation of XCoPerformance Evaluation of XCo

21

Page 22: XCo: Coordination to Prevent Network Congestion in Cloud ...dcm/Teaching/CDA5532-Cloud... · XCo –Explicit Coordination • Coordinate network transmissions from multiple VMs to

Experimental SetupExperimental Setup

22

Page 23: XCo: Coordination to Prevent Network Congestion in Cloud ...dcm/Teaching/CDA5532-Cloud... · XCo –Explicit Coordination • Coordinate network transmissions from multiple VMs to

Impact of Ethernet CongestionImpact of Ethernet Congestion

23

Page 24: XCo: Coordination to Prevent Network Congestion in Cloud ...dcm/Teaching/CDA5532-Cloud... · XCo –Explicit Coordination • Coordinate network transmissions from multiple VMs to

Performance Evaluation of XCoPerformance Evaluation of XCo

24

Page 25: XCo: Coordination to Prevent Network Congestion in Cloud ...dcm/Teaching/CDA5532-Cloud... · XCo –Explicit Coordination • Coordinate network transmissions from multiple VMs to

Live VMMigrationLive VM Migration

25

Page 26: XCo: Coordination to Prevent Network Congestion in Cloud ...dcm/Teaching/CDA5532-Cloud... · XCo –Explicit Coordination • Coordinate network transmissions from multiple VMs to

Fairness among V2V FlowsFairness among V2V Flows

26

Page 27: XCo: Coordination to Prevent Network Congestion in Cloud ...dcm/Teaching/CDA5532-Cloud... · XCo –Explicit Coordination • Coordinate network transmissions from multiple VMs to

Work ConservationWork Conservation 

27

Page 28: XCo: Coordination to Prevent Network Congestion in Cloud ...dcm/Teaching/CDA5532-Cloud... · XCo –Explicit Coordination • Coordinate network transmissions from multiple VMs to

ReferenceReference

• XCo: explicit coordination to prevent network fabric congestion in cloud computing cluster platformsVijay Shankar Rajanna, Smit Shah, Anand Jahagirdar, Christopher Lemoine, and Kartik Gopalan

Proceedings of the 19th ACM International Symposium onProceedings of the 19th ACM International Symposium on High Performance Distributed Computing, June 2010 

28