1 EE384Y: Packet Switch Architectures Part II Sizing Router Buffers (Recent work by Guido Appenzeller) Nick McKeown Professor of Electrical Engineering and Computer Science, Stanford University [email protected] http://www.stanford.edu/~nickm


High Performance Switching and Routing
Telecom Center Workshop: Sept 4, 1997.

EE384Y: Packet Switch Architectures
Part II

Sizing Router Buffers

(Recent work by Guido Appenzeller)

Nick McKeown
Professor of Electrical Engineering and Computer Science, Stanford University

[email protected]
http://www.stanford.edu/~nickm


How much Buffer does a Router need?

Universally applied rule-of-thumb: a router needs a buffer of size

$B = 2T \times C$

• 2T is the round-trip propagation time (or just 250ms)
• C is the capacity of the outgoing link

Background
• Mandated in backbone and edge routers.
• Appears in RFPs and IETF architectural guidelines.
• Has major consequences for router design.
• Comes from dynamics of TCP congestion control.
• Villamizar and Song: “High Performance TCP in ANSNET”, CCR, 1994.
• Based on 2 to 16 TCP flows at speeds of up to 40 Mb/s.
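The rule-of-thumb B = 2T × C is easy to evaluate numerically. The following is a minimal sketch; the function name is made up for illustration, and the inputs are the 250ms round-trip time quoted above and a 10Gb/s link.

```python
def rule_of_thumb_buffer_bits(rtt_seconds, capacity_bps):
    """Rule-of-thumb buffer size B = 2T x C, in bits.

    rtt_seconds  -- round-trip propagation time 2T
    capacity_bps -- capacity C of the outgoing link, in bits per second
    """
    return rtt_seconds * capacity_bps


if __name__ == "__main__":
    buffer_bits = rule_of_thumb_buffer_bits(0.250, 10e9)
    print(buffer_bits / 1e9, "Gbit")        # 2.5 Gbit
    print(buffer_bits / 8 / 1e6, "Mbytes")  # 312.5 Mbytes
```

At 10Gb/s this gives about 300Mbytes of buffering.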


Example

10Gb/s linecard or router
• Requires 300Mbytes of buffering.
• Read and write new packet every 32ns.

Memory technologies
• SRAM: require 80 devices, 1kW, $2000.
• DRAM: require 4 devices, but too slow.

Problem gets harder at 40Gb/s
• Hence RLDRAM, FCRAM, etc.
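One way to arrive at the 32ns figure is to assume minimum-size 40-byte packets arriving back to back. The packet size is an assumption of this sketch, not something the slide states, and the function name is hypothetical.

```python
def packet_time_ns(packet_bytes, line_rate_bps):
    """Time to receive one packet at line rate, in nanoseconds."""
    return packet_bytes * 8 * 1e9 / line_rate_bps


if __name__ == "__main__":
    # Assumed worst case: 40-byte packets arriving back to back.
    print(packet_time_ns(40, 10e9), "ns at 10Gb/s")
    print(packet_time_ns(40, 40e9), "ns at 40Gb/s")
```

Under the same assumption the time per packet shrinks by a factor of four at 40Gb/s, which is why the memory problem gets harder.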


TCP

TCP adapts to congestion
• Sender sends packets, receiver sends ACKs
• Sending rate is controlled by Window W
• At any time, only W unacknowledged packets may be outstanding

W is adjusted for each packet (in CA mode):
• If ACK received: W = W + 1/W (W = W + 1 for each W packets)
• If packet is lost: W = W/2 (W halved in case of loss)

The sending rate of TCP is:

$R = \frac{W}{RTT}$
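The window rules for congestion avoidance can be sketched in a few lines. This is an illustration of the two update rules and the rate equation only, not a TCP implementation; the function names are invented for the example.

```python
def on_ack(window):
    """Congestion avoidance: each ACK grows the window by 1/W."""
    return window + 1.0 / window


def on_loss(window):
    """A lost packet halves the window."""
    return window / 2.0


def sending_rate(window, rtt_seconds):
    """Sending rate R = W / RTT, in packets per second."""
    return window / rtt_seconds


if __name__ == "__main__":
    w = 10.0
    for _ in range(10):          # one window's worth of ACKs
        w = on_ack(w)
    print("after 10 ACKs:", w)   # close to 11: about +1 per W packets
    w = on_loss(w)
    print("after a loss:", w)
    print("rate at RTT 100ms:", sending_rate(w, 0.1), "packets/s")
```

The growth per window of ACKs is slightly below one packet because 1/W shrinks as W grows.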


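Single TCP Flow
Router with large enough buffers for full link utilization

[Figure: Source, router with buffer B, and Dest, with link capacities C' > C and C. Plots over time t: window size, between $W_{max}/2$ and $W_{max}$; buffer size and RTT.]

For every W ACKs received, send W+1 packets

$R = \frac{W}{RTT}$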


Over-buffered Link


Under-buffered Link


Buffer = Rule-of-thumb

Interval magnified on next slide


Microscopic TCP Behavior
When sender pauses, buffer drains

[Figure labels: one RTT; Drop]


Origin of rule-of-thumb

Before and after reducing window size, the sending rate of the TCP sender is the same:

$R_{old} = R_{new}$

Inserting the rate equation we get

$\frac{W_{old}}{RTT_{old}} = \frac{W_{new}}{RTT_{new}}$

The RTT is part round-trip propagation delay 2T and part queueing delay B/C. We know that after reducing the window, the queueing delay is zero, and that the new window is half the old one:

$\frac{W_{old}}{2T + B/C} = \frac{W_{old}/2}{2T}$

Solving for B:

$B = 2T \times C$
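The argument can be checked numerically. The sketch below assumes that, just before a loss, the window equals the packets in flight on the link (2T × C) plus a full buffer B, and that the link stays busy as long as the halved window still covers 2T × C. Function names and numbers are illustrative only.

```python
def min_buffer_packets(two_t_seconds, capacity_pps):
    """Rule-of-thumb buffer B = 2T x C, in packets."""
    return two_t_seconds * capacity_pps


def link_stays_busy(buffer_pkts, two_t_seconds, capacity_pps):
    """True if halving the window at a full buffer still fills the link."""
    pipe = two_t_seconds * capacity_pps   # packets in flight at full rate
    w_max = pipe + buffer_pkts            # window when the buffer is just full
    return w_max / 2 >= pipe


if __name__ == "__main__":
    two_t, c = 0.25, 1000                  # 250ms, 1000 packets/s (made-up values)
    b = min_buffer_packets(two_t, c)
    print("B = 2T x C =", b, "packets")
    print("buffer B keeps link busy:", link_stays_busy(b, two_t, c))
    print("buffer 0.4 B keeps link busy:", link_stays_busy(0.4 * b, two_t, c))
```

Any buffer below 2T × C leaves the halved window too small to fill the link, which is the under-buffered case.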


Rule-of-thumb

• Rule-of-thumb makes sense for one flow
• Typical backbone link has > 20,000 flows
• Does the rule-of-thumb still hold?

Answer:
• If flows are perfectly synchronized, then Yes.
• If flows are desynchronized, then No.


Buffer size is height of sawtooth

[Figure: buffer occupancy sawtooth over time t, ranging from 0 to B.]


If flows are synchronized

• Aggregate window has same dynamics
• Therefore buffer occupancy has same dynamics
• Rule-of-thumb still holds.

[Figure: window sawtooth over time t, oscillating between $W_{max}/2$ and $W_{max}$.]


Two TCP Flows
Two TCP flows can synchronize


If flows are not synchronized

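• Aggregate window has less variation
• Therefore buffer occupancy has less variation
• The more flows, the smaller the variation
• Rule-of-thumb does not hold.

[Figure: windows over time t; individual windows between $W_{max}/2$ and $W_{max}$, with the minimum and maximum of the aggregate window marked.]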


If flows are not synchronized

[Figure: window sawtooth between $W_{max}/2$ and $W_{max}$; probability distribution; buffer size from 0 to B.]


Quantitative Model

Model congestion window of a flow as random variable: model $W_i(t)$ as

$P[W_i = x] = f_{W_i}(x)$, where $E[W_i] = \mu_W$ and $\mathrm{var}[W_i] = \sigma_W^2$

For many de-synchronized flows
• We assume congestion windows are independent
• All congestion windows have the same probability distribution

Now central limit theorem gives us queue length distribution

$\sum_{i=1}^{n} W_i(t) \rightarrow n \mu_W + \sqrt{n}\, \sigma_W\, N(0,1)$
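The central limit argument can be illustrated with a small Monte Carlo sketch. It assumes each window is uniformly distributed between W_max/2 and W_max (a sawtooth sampled at a random phase) and that windows are independent; both the distribution and the function names are assumptions of this example.

```python
import math
import random
import statistics


def aggregate_window_samples(n_flows, w_max, n_samples, seed=0):
    """Sample the sum of n_flows independent windows, each uniform on [w_max/2, w_max]."""
    rng = random.Random(seed)
    return [
        sum(rng.uniform(w_max / 2, w_max) for _ in range(n_flows))
        for _ in range(n_samples)
    ]


def clt_prediction(n_flows, w_max):
    """Mean n*mu_W and standard deviation sqrt(n)*sigma_W of the aggregate window."""
    mu_w = 0.75 * w_max                     # mean of a uniform on [w_max/2, w_max]
    sigma_w = (w_max / 2) / math.sqrt(12)   # standard deviation of that uniform
    return n_flows * mu_w, math.sqrt(n_flows) * sigma_w


if __name__ == "__main__":
    for n in (1, 16, 256):
        samples = aggregate_window_samples(n, 20.0, 2000)
        mean, std = statistics.mean(samples), statistics.stdev(samples)
        _, std_pred = clt_prediction(n, 20.0)
        print(n, "flows: relative spread", std / mean, "predicted std", std_pred)
```

The relative spread of the aggregate window shrinks roughly as 1/√n, which is why the buffer needed to absorb it shrinks as well.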


Required buffer size

[Figure: required buffer size, comparing $\frac{2T \times C}{\sqrt{n}}$ with simulation.]


Required buffer size

[Figure: required buffer size for 98.0%, 99.5% and 99.9% link utilization, compared with $\frac{2T \times C}{\sqrt{n}}$.]


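Small buffers help short flows
Average flow completion times of 14 packet flows that share a congested bottleneck link with long-lived flows.

[Figure: average flow completion time for buffer sizes $B = \frac{2T \times C}{\sqrt{n}}$ and $B = 2T \times C$.]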


Experiments with backbone router
GSR 12000, OC3 Line Card

TCP      Router Buffer                     Link Utilization
Flows    x 2T×C/√n    Pkts    RAM          Model     Sim       Exp
100      0.5 x        64      1Mb          96.9%     94.7%     94.9%
100      1 x          129     2Mb          99.9%     99.3%     98.1%
100      2 x          258     4Mb          100%      99.9%     99.8%
100      3 x          387     8Mb          100%      99.8%     99.7%
400      0.5 x        32      512kb        99.7%     99.2%     99.5%
400      1 x          64      1Mb          100%      99.8%     100%
400      2 x          128     2Mb          100%      100%      100%
400      3 x          192     4Mb          100%      100%      99.9%

Thanks: Experiments conducted by Paul Barford and Joel Sommers, U of Wisconsin


What about Short Flows?

So far we assumed long flows in congestion avoidance mode. What if traffic is mainly short flows in slow-start?

Answer: Behavior is different, but
• In mixes of flows, long flows drive buffer requirements
• Required buffer for short flows is independent of line speed and RTT (same for 1Mbit/s or 40 Gbit/s)


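A single, short-lived TCP flow
Flow length 62 packets, RTT ~140 ms

[Figure: packets sent over time, from syn to fin ack received. Bursts of 2, 4, 8, 16 and 32 packets, spaced one RTT apart. The total duration is the Flow Completion Time (FCT).]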
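The burst pattern of a short flow in slow-start can be reproduced with a short sketch. It assumes the first burst is 2 packets and each following burst doubles, as in the 62-packet example; sending the remainder as a smaller final burst is an assumption made here for flow lengths that do not fit the pattern exactly.

```python
def slow_start_bursts(flow_length, first_burst=2):
    """Split a flow of flow_length packets into slow-start bursts (2, 4, 8, ...)."""
    bursts = []
    remaining = flow_length
    size = first_burst
    while remaining > 0:
        burst = min(size, remaining)   # last burst may be shorter (assumption)
        bursts.append(burst)
        remaining -= burst
        size *= 2                      # window doubles every RTT in slow-start
    return bursts


if __name__ == "__main__":
    print(slow_start_bursts(62))   # [2, 4, 8, 16, 32]
    print(slow_start_bursts(14))   # [2, 4, 8]
```

The number of bursts approximates the number of round trips spent sending data, which dominates the flow completion time of a short flow.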


Modelling TCP
Flows vs. independent bursts

Inter-burst arrival time is greater than buffer size. Therefore, we assume bursts are independent.

Poisson arrivals of flows
• Arrivals of length $L_{flow}$ (the flow length in packets)
• $S_i = L_{flow}$
• $\lambda_{flow} = \frac{\rho C}{L_{flow}}$

Poisson arrivals of bursts
• Four different Poisson arrival processes of lengths 2, 4, ...
• $S_i \in \{2, 4, 8, 16\}$
• $\lambda_{burst} = \frac{\rho C}{E[S_i]} = \frac{\lambda_{flow} L_{flow}}{E[S_i]}$


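The M/G/1 Model

TCP traffic is modelled as an M/G/1 arrival process:
• Poisson arrivals of jobs of size $S_i \in \{2, 4, 8, 16, ...\}$
• with an arrival rate of $\lambda_{burst}$
• $\rho = \lambda_{burst} E[S_i]$ is the load (with the link capacity taken as one packet per unit time)

Average queue length in jobs is:

$E[N_Q] = \frac{\lambda_{burst}^2 E[S^2]}{2(1-\rho)} = \frac{\rho^2 E[S^2]}{2(1-\rho) E[S]^2}$

This gives us an average queue length in packets of

$E[Q] = E[N_Q]\, E[S] = \frac{\rho^2 E[S^2]}{2(1-\rho) E[S]}$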
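The average queue length can be evaluated for a given burst size distribution. This sketch reads the formula as E[Q] = ρ² E[S²] / (2 (1 − ρ) E[S]); the function name is hypothetical, and the equal burst-size probabilities in the example are an arbitrary assumption, not data from the experiments.

```python
def mg1_average_queue_packets(load, burst_pmf):
    """Average M/G/1 queue length in packets.

    load      -- rho, must satisfy 0 <= rho < 1
    burst_pmf -- dict mapping burst size (packets) to its probability
    """
    if not 0 <= load < 1:
        raise ValueError("load must be in [0, 1)")
    e_s = sum(size * p for size, p in burst_pmf.items())          # E[S]
    e_s2 = sum(size * size * p for size, p in burst_pmf.items())  # E[S^2]
    return load ** 2 * e_s2 / (2 * (1 - load) * e_s)


if __name__ == "__main__":
    # Assumed for illustration: bursts of 2, 4, 8, 16 packets, equally likely.
    pmf = {2: 0.25, 4: 0.25, 8: 0.25, 16: 0.25}
    for rho in (0.5, 0.8, 0.9):
        print("load", rho, "-> E[Q] =", mg1_average_queue_packets(rho, pmf), "packets")
```

Note that the result depends only on the load and the burst size distribution, not on the line speed or the RTT.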

Let's see if this works in practice...


Average Queue length

capacity: C = 40Mbit/s
load: ρ = 0.8

for length $L_{flow}$ = 50 packets: 400Mbit, average 100 flows/second, completion time 400ms


Queue Distribution

To determine the required buffer, we need the queue distribution. Or at least the tail end of the queue distribution.

[Figure: queue distribution P(Q = x) against Q, with buffer size B marked; the tail beyond B is packet loss, $P_{Drop}$.]

● For M/G/1 queues there is no general solution for the queue distribution.

● We did two things (details are in the paper):

– Use M/G/1 processor sharing model (bad)
– Use Frank Kelly's effective bandwidth (good)


In Summary

Buffer size is dictated by long TCP flows.

10Gb/s linecard with 200,000 x 56kb/s flows
• Rule-of-thumb: Buffer = 2.5Gbits
  – Requires external, slow DRAM
• Becomes: Buffer = 6Mbits
  – Can use on-chip, fast SRAM
  – Completion time halved for short-flows

40Gb/s linecard with 40,000 x 1Mb/s flows
• Rule-of-thumb: Buffer = 10Gbits
• Becomes: Buffer = 50Mbits
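The summary numbers follow from dividing the rule-of-thumb by √n. The sketch below assumes a round-trip time of 250ms for both linecards, which matches the rule-of-thumb buffer sizes quoted; the function names are illustrative.

```python
import math


def rule_of_thumb_bits(rtt_seconds, capacity_bps):
    """Rule-of-thumb buffer B = 2T x C, in bits."""
    return rtt_seconds * capacity_bps


def small_buffer_bits(rtt_seconds, capacity_bps, n_flows):
    """Buffer for n desynchronized long flows: B = 2T x C / sqrt(n), in bits."""
    return rtt_seconds * capacity_bps / math.sqrt(n_flows)


if __name__ == "__main__":
    for capacity, flows in ((10e9, 200_000), (40e9, 40_000)):
        old = rule_of_thumb_bits(0.250, capacity)
        new = small_buffer_bits(0.250, capacity, flows)
        print(f"{capacity / 1e9:.0f}Gb/s, {flows} flows: "
              f"{old / 1e9:.1f}Gbit -> {new / 1e6:.1f}Mbit")
```

For the 10Gb/s case this gives roughly 5.6Mbit, which the summary rounds to 6Mbits; the 40Gb/s case gives 50Mbit.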