1 IP routers with memory that runs slower than the line rate Nick McKeown Assistant Professor of...


1

High Performance Switching and Routing
Telecom Center Workshop: Sept 4, 1997.

IP routers with memory that runs slower than the line rate

Nick McKeown
Assistant Professor of Electrical Engineering and Computer Science, Stanford University

nickm@stanford.edu
http://www.stanford.edu/~nickm

2

Outline

• Trends in packet switch design
• Additional problem: “Data rates may soon exceed memory bandwidth”
• The Fork-Join Router & Parallel Packet Switches

3

First Packet Switches: Shared Memory

[Figure: Inputs 1, 2, …, N and Outputs 1, 2, …, N attached to a single shared memory.]

Large, single, dynamically allocated memory buffer:
– N writes per “cell” time
– N reads per “cell” time

Limited by memory bandwidth.

A large body of work has shown how such a switch can provide:
– Fairness
– Delay guarantees
– Delay variation control
– Loss guarantees
– Statistical guarantees
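To make the memory-bandwidth limit concrete, here is a minimal sketch of the arithmetic implied by "N writes and N reads per cell time". The function name and the example port count and line rate are illustrative assumptions, not figures from the talk.

```python
def shared_memory_bandwidth_gbps(num_ports: int, line_rate_gbps: float) -> float:
    """Memory bandwidth a shared-memory switch must sustain.

    In each cell time the single buffer absorbs up to N writes (one per
    input) and N reads (one per output), so it must run at 2 * N * R.
    """
    return 2 * num_ports * line_rate_gbps


# Hypothetical example values (not from the slides): 16 ports at 2.5 Gb/s.
required = shared_memory_bandwidth_gbps(16, 2.5)
print(f"Required memory bandwidth: {required} Gb/s")
```

The requirement grows with both the port count and the line rate, which is why the single shared buffer becomes the bottleneck first.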

4

Later Packet Switches: Single-stage crossbar with CIOQ and VOQs

– 1 write per “cell” time
– 1 read per “cell” time
– Rate of writes/reads determined by switch fabric speedup

[Figure: Each linecard performs Lookup & Drop Policy, holds Virtual Output Queues, and performs Output Scheduling. The linecards connect through a bufferless switch core made up of the Switch Fabric and Switch Arbitration.]

5

Myths about CIOQ-based crossbar switches

1. “Input-queued crossbars have low throughput”
– An input-queued crossbar can have as high throughput as any switch.

2. “Crossbars don’t support multicast traffic well”
– A crossbar inherently supports multicast efficiently.

3. “Crossbars don’t scale well”
– Today, it is the number of chip I/Os, not the number of crosspoints, that limits the size of a switch fabric. Expect 5Tb/s crossbar switches.

6

Myths about CIOQ-based crossbar switches (2)

4. “Crossbar switches can’t support delay/QoS guarantees”

– With an internal speedup of 2, a CIOQ switch can (in theory) precisely emulate a shared memory switch for all traffic.

7

What makes sense today?

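| | Shared Memory | Input Queued | CIOQ | Multistage |
|---|---|---|---|---|
| Blocking | No | No | No | Yes |
| Speedup | High | High | Small | High |
| Emulation of SM | Yes | No | Yes | No |
| Multicast | Good | Good | Good | Poor |
| Resequencing | No | No | No | Yes |
| Power | Low | OK | OK | High |
| Packaging | - | OK | OK | Complex |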

8

Summary of trend

[Figure: Trend diagram with steps labeled 1 and 2, showing a switch with Inputs 1, 2, …, N and Outputs 1, 2, …, N, and a Switch Fabric with Switch Arbitration.]

– Limited by: memory bandwidth (~50Gb/s)
– Limited by: per-cell arbitration, power (~5Tb/s)
– Higher capacity: multistage (Clos, Banyan, Toroidal, …)
– Less frequent arbitration

9

Buffer Memory: How Fast Can I Make a Packet Buffer?

[Figure: An external line (e.g. OC768c) and the switch fabric each connect over a 64-byte wide bus to a buffer memory built from 10ns on-chip DRAM.]

Rough estimate:
– 10ns per memory operation.
– Two memory operations per packet.
– Therefore, maximum ~26Gb/s.
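The rough estimate can be reproduced with a few lines of arithmetic. This sketch assumes that each memory operation moves one full 64-byte bus word and that the two operations per packet are one write and one read; the function and constant names are illustrative.

```python
BUS_WIDTH_BYTES = 64   # from the slide: 64-byte wide bus
MEMORY_OP_NS = 10      # from the slide: 10ns per memory operation
OPS_PER_PACKET = 2     # assumed: one write on arrival, one read on departure


def max_buffer_rate_gbps(bus_width_bytes: int, memory_op_ns: float,
                         ops_per_packet: int) -> float:
    """Upper bound on the line rate a single packet buffer can sustain."""
    bits_per_packet = bus_width_bytes * 8
    ns_per_packet = memory_op_ns * ops_per_packet
    # Bits per nanosecond is numerically equal to Gb/s.
    return bits_per_packet / ns_per_packet


rate = max_buffer_rate_gbps(BUS_WIDTH_BYTES, MEMORY_OP_NS, OPS_PER_PACKET)
print(f"Maximum buffer rate: {rate:.1f} Gb/s")
```

With the slide's numbers this gives 512 bits every 20ns, which is the ~26Gb/s quoted above.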

10

How can we make routers with 40Gb/s, 160Gb/s, … interfaces?

11

Higher capacity and higher linerates

[Figure: The same trend diagram (Inputs 1, 2, …, N; Outputs 1, 2, …, N; Switch Fabric; Switch Arbitration), now with steps labeled 1, 2 and 3.]

– Limited by: memory bandwidth (~50Gb/s)
– Limited by: per-cell arbitration, power (~5Tb/s)
– Multistage: less frequent arbitration
– More parallelism: Fork-Join Router
– Higher capacity, higher linerates

12

Fork-Join Router

How can we:
– Increase capacity.
– Reduce power per subsystem.

While at the same time…
– Keep the system simple.
– Support line rates faster than memory bandwidth.
– Provide delay guarantees.

Increase parallelism.

Multiple racks.

Single-stage buffering.

Pkt-by-pkt load balancing.

Hmmm….?

13

The Fork-Join Router

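[Figure: Lines 1…N, each at rate R, on either side of k parallel routers (Router 1, 2, …, k). The stages that connect the lines to the routers are labeled “Bufferless”.]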

14

The Fork-Join Router

• Advantages
– Single stage of buffering
– k-fold lower power per subsystem
– k-fold lower memory bandwidth
– k-fold lower forwarding table lookup rate

15

The Fork-Join Router

• Questions
– Switching: What is the performance?
– Forwarding Lookups: How do they work?

16

A Parallel Packet Switch

[Figure: Inputs 1…N at rate R are spread across k parallel Output Queued Switches (1, 2, …, k), whose outputs are recombined onto outputs 1…N at rate R.]

Arriving packet tagged with egress port
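The structure in this slide can be sketched as a toy model: each arriving cell is tagged with its egress port and forked to one of k output-queued layers. The round-robin layer choice below is a hypothetical placeholder policy chosen for simplicity, not the load-balancing algorithm from the talk, and all class and method names are illustrative.

```python
from collections import deque


class ParallelPacketSwitch:
    """Toy model: N ports, k output-queued layers, round-robin spreading."""

    def __init__(self, num_ports: int, num_layers: int):
        self.num_layers = num_layers
        # One FIFO per (layer, egress port), as in an output-queued switch.
        self.queues = [[deque() for _ in range(num_ports)]
                       for _ in range(num_layers)]
        # Assumed policy: each input cycles through the layers in order.
        self.next_layer = [0] * num_ports

    def arrive(self, ingress: int, egress: int, cell) -> int:
        """Fork a cell tagged with its egress port to one layer."""
        layer = self.next_layer[ingress]
        self.next_layer[ingress] = (layer + 1) % self.num_layers
        self.queues[layer][egress].append(cell)
        return layer


pps = ParallelPacketSwitch(num_ports=4, num_layers=3)
chosen = [pps.arrive(ingress=0, egress=2, cell=n) for n in range(6)]
print(chosen)
```

This naive spreading makes no promise about ordering or work conservation at the join, which is exactly what the performance questions that follow are about.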

17

Performance Questions

1. Can it be work-conserving?
2. Can it emulate a single big output queued switch?
3. Can it support delay guarantees, strict-priorities, WFQ, …?

18

Work Conservation

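[Figure: A line at rate R connects to each of k Output Queued Switches (1, 2, …, k) over internal links at rate R/k. The links leaving an input are labeled “Input Link Constraint”; the links arriving at an output are labeled “Output Link Constraint”.]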

19

Work Conservation

[Figure: Worked example on the same arrangement: numbered cells (1 to 5) are spread over layers 1, 2, …, k across links at rate R/k, subject to the Output Link Constraint.]

20

Work Conservation

[Figure: Parallel packet switch with inputs 1…N and outputs 1…N at rate R, k Output Queued Switches (1, 2, …, k), and internal links running at S(R/k).]
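One way to read the link constraints in the Work Conservation slides: an internal link running at S(R/k) is busy for about k/S external cell times per cell, so an input cannot send to the same layer again until that time has passed. The sketch below encodes that interpretation. The time-slot model (one slot is one cell time at rate R) and the function names are assumptions made for illustration.

```python
import math


def layer_busy_slots(k: int, speedup: float = 1.0) -> int:
    """Cell times an internal link at rate S*(R/k) needs to carry one cell."""
    return math.ceil(k / speedup)


def available_layers(now: int, last_sent: list, k: int,
                     speedup: float = 1.0) -> list:
    """Layers this input may send to at time slot `now`.

    last_sent[l] is the slot in which the input last sent a cell to
    layer l, or None if it has never used that layer.
    """
    gap = layer_busy_slots(k, speedup)
    return [layer for layer in range(k)
            if last_sent[layer] is None or now - last_sent[layer] >= gap]


# Hypothetical state: k = 4 layers, current slot 5.
history = [None, 4, 1, 2]
print(available_layers(5, history, k=4))             # no speedup
print(available_layers(5, history, k=4, speedup=2))  # links twice as fast
```

Raising S shortens the blocked interval, which leaves more layers to choose from in every slot. The same reasoning applies at the outputs for the output link constraint.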

21

Precise Emulation of an Output Queued Switch

[Figure: Is a Parallel Packet Switch with ports 1…N equivalent (“= ?”) to an Output Queued Switch with ports 1…N?]

22

Parallel Packet Switch Theorems

1. If S > 2k/(k+2) ≈ 2, then a parallel packet switch can be work-conserving for all traffic.

2. If S > 2k/(k+2) ≈ 2, then a parallel packet switch can precisely emulate a FCFS output-queued switch for all traffic.

23

Parallel Packet Switch Theorems

3. If S > 3k/(k+3) ≈ 3, then a parallel packet switch can precisely emulate a switch with WFQ, strict priorities, and other types of QoS, for all traffic.
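The speedup bounds 2k/(k+2) and 3k/(k+3) stay strictly below 2 and 3 and approach them as k grows. A short sketch tabulating them for a few layer counts; the function names and the choice of k values are illustrative.

```python
from fractions import Fraction


def fcfs_speedup_bound(k: int) -> Fraction:
    """Bound 2k/(k+2) from the work-conservation and FCFS theorems."""
    return Fraction(2 * k, k + 2)


def qos_speedup_bound(k: int) -> Fraction:
    """Bound 3k/(k+3) from the WFQ / strict-priority theorem."""
    return Fraction(3 * k, k + 3)


# Example layer counts chosen for illustration only.
for k in (2, 4, 8, 16, 64):
    print(f"k={k:3d}  2k/(k+2)={float(fcfs_speedup_bound(k)):.3f}  "
          f"3k/(k+3)={float(qos_speedup_bound(k)):.3f}")
```

For small k the required speedup is noticeably lower than the asymptotic value, so quoting "2" and "3" is the conservative, large-k reading of the theorems.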

24

Parallel Packet Switch Theorems

4. If S ≥ 1, then a parallel packet switch with a small co-ordination buffer at rate R can precisely emulate a FCFS switch for all traffic.

25

Co-ordination buffers

[Figure: Parallel packet switch with lines at rate R, k Output Queued Switches (1, 2, …, k), and internal links at rate R/k. Two co-ordination buffers are shown, each of size Nk.]

26

Parallel Packet Switch Theorems

5. If S > 2, then a parallel packet switch with a small co-ordination buffer at rate R can precisely emulate a switch with WFQ, strict priorities, and other types of QoS, for all traffic.

27

The Fork-Join Router

• Questions
– Switching: What is the performance?
– Forwarding Lookups: How do they work?

28

The Fork-Join Router: Lookahead Forwarding Table Lookups

– Packet tagged with egress port at next router
– Lookup performed in parallel at rate R/k
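To give a feel for what "rate R/k" means for the lookup engine, the sketch below converts a line rate into lookups per second and divides it across k routers. The 40-byte minimum packet size, the 160Gb/s line rate, and k = 8 are assumptions chosen for illustration, not values stated for this slide.

```python
def lookups_per_second(line_rate_gbps: float, packet_bytes: int) -> float:
    """Worst-case lookups per second if every packet is `packet_bytes` long."""
    return line_rate_gbps * 1e9 / (packet_bytes * 8)


def per_router_lookups_per_second(line_rate_gbps: float, packet_bytes: int,
                                  k: int) -> float:
    """Lookup rate each of the k parallel routers sees, i.e. at rate R/k."""
    return lookups_per_second(line_rate_gbps, packet_bytes) / k


# Assumed example: 160 Gb/s line, 40-byte packets, k = 8 parallel routers.
total = lookups_per_second(160, 40)
each = per_router_lookups_per_second(160, 40, 8)
print(f"Line rate lookups: {total / 1e6:.1f} M/s, per router: {each / 1e6:.2f} M/s")
```

Each router only has to keep up with its own share of the packets, so the lookup hardware can run k times slower than a single engine sitting on the line.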

29

The Fork-Join Router

[Figure: The Fork-Join Router, with lines 1…N at rate R on either side of k parallel routers (Router 1, 2, …, k).]

• Possibly >100Tb/s aggregate capacity
• Linerates in excess of 100Gb/s