UNDERSTANDING TCP INCAST THROUGHPUT COLLAPSE IN DATACENTER NETWORKS Presenter: Aditya Agarwal Tyler Maclean

Page 1:

UNDERSTANDING TCP INCAST THROUGHPUT COLLAPSE IN DATACENTER NETWORKS

Presenters: Aditya Agarwal, Tyler Maclean

Page 2:

MOTIVATION/IMPORTANCE

Internet datacenters support a myriad of services and applications (Google, Microsoft, Yahoo, Amazon).

The vast majority of datacenters use TCP for communication between nodes.

The unique workload, scale, and environment of Internet datacenters violate the WAN assumptions on which TCP was originally designed. For example, RTO = 200 ms (the default value in Linux) is 2-3 orders of magnitude greater than the RTT in the datacenter.
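As a rough check of the "2-3 orders of magnitude" claim, the sketch below compares the 200 ms RTO with two illustrative datacenter RTTs. The RTT values (1 ms and 100 µs) are assumptions chosen for illustration and do not come from the slides.

```python
import math

DEFAULT_RTO_S = 0.200  # 200 ms, the Linux default quoted on the slide


def orders_of_magnitude(rto_s, rtt_s):
    """Return log10 of the ratio between the RTO and the RTT."""
    return math.log10(rto_s / rtt_s)


# Assumed RTTs (not from the slides): 1 ms and 100 microseconds.
for rtt_s in (1e-3, 1e-4):
    gap = orders_of_magnitude(DEFAULT_RTO_S, rtt_s)
    print(f"RTT = {rtt_s * 1e6:.0f} us: RTO is about 10^{gap:.1f} times larger")
```

Under these assumed RTTs the gap falls between two and a bit over three orders of magnitude, which is in line with the figure quoted on the slide.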

Page 3:

WHAT IS THE PROBLEM

Incast communication pattern:

[Figure: a client connected through a switch to multiple servers]

Try to understand TCP incast throughput collapse.

Show that the problem is general.

Develop an analytical model.

Modify TCP and verify that the modifications work.

Page 4:

THE CONTRIBUTIONS

Reproduce the problem in our own experimental testbeds and demonstrate the generality of incast.

Propose a quantitative model that accounts for some of the observed incast behavior.

Implement several intuitive modifications to the TCP stack in Linux, and show that some modifications are more helpful than others.

Page 5:

ROADMAP

Experiment setting: workload

Experiment results: initial findings, in-depth analysis

Quantitative models

Conclusions

Page 6:

WORKLOAD SETTING

MapReduce-like application:

The receiver requests k blocks of data from S storage servers.

Each block of data is striped across the S storage servers.

Each server responds with a "fixed" amount of data (fixed-fragment workload).

The client won't request block k+1 until all the fragments of block k have been received.

Setting: k = 100, S = 1-48, fragment size: 256 KB
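To make the fixed-fragment workload concrete, here is a minimal sketch that computes how much data the client receives for a given number of servers, plus a lower bound on transfer time. The function names and the 1 Gbps link rate are assumptions for illustration; only k = 100 and the 256 KB fragment size come from the slide.

```python
FRAGMENT_BYTES = 256 * 1024  # fragment size from the slide (256 KB)
BLOCKS = 100                 # k from the slide


def total_bytes(servers, blocks=BLOCKS, fragment=FRAGMENT_BYTES):
    """Bytes received by the client: one fragment per server per block."""
    return servers * blocks * fragment


def ideal_transfer_seconds(servers, link_bps=1e9):
    """Lower bound on transfer time if the client link were the only limit.

    link_bps is an assumed value (1 Gbps), not taken from the slides.
    """
    return total_bytes(servers) * 8 / link_bps


for s in (1, 8, 48):
    print(s, total_bytes(s), round(ideal_transfer_seconds(s), 3))
```

Because each server always sends a full fragment, the total data per block grows linearly with S in this workload.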

Page 7:

DETER NETWORK SECURITY TESTBED

400 PCs, located at USC ISI and UC Berkeley.

Supported operating systems include Linux, FreeBSD, and Windows.

Page 8:

INITIAL RESULTS

Page 9:

Different senders experience long, synchronized TCP retransmission timeout (RTO) events.

RTO = 200 ms (default value, intended for a WAN environment)

Page 10:

MINOR AND INTUITIVE MODIFICATIONS

Decrease the minimum RTO timer from 200 ms.

Randomize the minimum RTO timer.

Use a smaller multiplier for the RTO exponential backoff.

Randomize the multiplier for the RTO exponential backoff.
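The modifications above all touch the same small calculation. The sketch below is one way to picture the knobs. It is not the Linux implementation or the authors' patch: the base timeout follows the general SRTT + 4 * RTTVAR form from RFC 6298, and the function name, the `jitter` parameter, and the uniform randomization scheme are hypothetical.

```python
import random


def rto_after_backoffs(srtt, rttvar, min_rto=0.200, multiplier=2.0,
                       backoffs=0, jitter=0.0, rng=None):
    """Return the retransmission timeout in seconds after `backoffs` timeouts.

    min_rto    : minimum RTO (200 ms default on the slide)
    multiplier : exponential backoff factor (2 is the standard value)
    jitter     : hypothetical knob, randomizes min_rto by up to +/- this fraction
    """
    rng = rng or random.Random()
    # Base timeout in the style of RFC 6298.
    base = srtt + 4.0 * rttvar
    # Randomized floor (assumed scheme, for illustration only).
    floor = min_rto * (1.0 + rng.uniform(-jitter, jitter))
    rto = max(base, floor)
    # Each consecutive timeout multiplies the RTO by `multiplier`.
    return rto * multiplier ** backoffs


# With datacenter-scale RTTs the floor dominates, so min_rto sets the timeout.
print(rto_after_backoffs(srtt=100e-6, rttvar=50e-6))
print(rto_after_backoffs(srtt=100e-6, rttvar=50e-6, min_rto=0.001))
```

With RTTs in the hundreds of microseconds the computed base is far below the floor, which is why lowering the minimum RTO changes the timeout directly while the backoff multiplier matters only when timeouts repeat.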

Page 11:

INITIAL RESULTS

Smaller multiplier for the RTO exponential backoff: useless.

Randomizing the multiplier for the RTO exponential backoff: useless.

There are only a tiny number of exponential backoffs during the entire transfer.

Page 12:

INITIAL RESULTS

Randomizing the RTO timer: useless, but also no penalty.

Because the servers share the same switch, all subsequent switch buffer overflow events will be synchronized for all senders. (???)

Page 13:

ANALYSIS IN DEPTH

Different RTO timers. Observations:

The initial goodput minimum occurs at the same number of servers.

With a larger minimum RTO timer value, the maximum goodput occurs at a larger number of senders.

A smaller RTO timer value gives a faster goodput "recovery" rate.

The rate of decrease after the local maximum is the same across different minimum RTO settings.

Page 14:

DELAY ACKS AND HIGH RESOLUTION TIMERS

Improvements proposed by [11]:

Turn off delayed ACKs (the default delayed ACK threshold is 40 ms).

Use high-resolution timers.

Page 15:

CONGESTION WINDOWS WITH/WITHOUT DELAY ACKS

Page 16:

SMOOTHED RTT WITH/WITHOUT DELAY ACKS

Page 17:

DIFFERENT WORKLOAD

Page 18:

SUB-OPTIMAL BEHAVIOR WITH REGARDS TO DELAYED ACKS IS WORKLOAD INDEPENDENT.

Page 19:

CANNOT MATCH THE RESULTS IN PREVIOUS WORK [11]

Page 20:

SMOOTHED RTT WITH/WITHOUT DELAY ACKS

Page 21:

QUANTITATIVE MODELS

Net goodput:

$$\frac{D}{L + (R \cdot r)}$$

$$\frac{S \cdot D}{L + (R \cdot r)}$$

D: total amount of data to be sent, 100 blocks of 256 KB

L: total transfer time of the workload without any RTO events

R: the number of RTO events during the transfer

S: number of servers

r: the value of the minimum RTO timer
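The net goodput expression can be evaluated directly. The sketch below assumes the denominator is L + R * r, that is, the ideal transfer time plus the time spent waiting in R timeouts of length r. The function name and the sample values for L and R are hypothetical and are not measurements from the talk.

```python
def net_goodput(D, L, R, r, S=1):
    """Goodput in bytes per second under the model S * D / (L + R * r).

    D : bytes to send (100 blocks of 256 KB on the slide)
    L : transfer time in seconds with no RTO events
    R : number of RTO events during the transfer
    r : minimum RTO timer value in seconds
    S : number of servers (S = 1 gives the first form, D / (L + R * r))
    """
    return S * D / (L + R * r)


D = 100 * 256 * 1024  # bytes, from the slide
L = 0.25              # seconds, assumed for illustration
R = 10                # RTO events, assumed for illustration

for r in (0.200, 0.001):
    mbps = net_goodput(D, L, R, r) * 8 / 1e6
    print(f"min RTO = {r * 1000:.0f} ms -> {mbps:.0f} Mbps")
```

With these assumed inputs, the same number of timeouts costs far less goodput at a 1 ms minimum RTO than at 200 ms, because the R * r term shrinks relative to L.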

Page 22:

FIT THE CURVE OF THE NUMBER OF RTO EVENTS

Page 23:

EQUATION OF L

I is the inter-packet waiting time

Page 24:

HOW GOOD IS THEIR ANALYSIS MODEL?

Page 25:

FURTHER ANALYSIS ON R AND I

The number of RTO events is similar for different RTO values (200 ms and 1 ms).

The inter-packet waiting time is very different for different RTO values (200 ms and 1 ms).

Page 26:

QUALITATIVE REFINEMENT FOR THEIR MODEL

As the number of senders increases, the number of RTO events per sender increases. Beyond a certain number of senders, the number of RTO events is constant.

When a network resource becomes saturated, it is saturated at the same time for all senders.

After a congestion event, the senders enter the TCP RTO state. The RTO timer expires at each sender with a uniform distribution in time and a constant delay after the congestion event.

T increases as the number of senders increases; however, T is bounded.

Page 27:

MORE EXPLANATIONS

A smaller minimum RTO timer value means larger goodput values at the initial minimum.

The initial goodput minimum occurs at the same number of senders, regardless of the value of the minimum RTO timer.

The second-order goodput peak occurs at a higher number of senders for a larger RTO timer value.

The smaller the RTO timer value, the faster the rate of recovery between the goodput minimum and the second-order goodput maximum.

After the second-order goodput maximum, the slope of the goodput decrease is the same for different RTO timer values.

Page 28:

CONCLUSIONS

Study the dynamics of incast.

Propose a simple mathematical model to explain the observed trends.

Account for the difference between their observations and those in previous work.