IP routers with memory that runs slower than the line rate
1
High Performance Switching and Routing
Telecom Center Workshop: Sept 4, 1997.
IP routers with memory that runs slower than the line rate
Nick McKeown
Assistant Professor of Electrical Engineering and Computer Science, Stanford University
www.stanford.edu/~nickm
2
Outline
• Trends in packet switch design
• Additional problem: “Data rates may soon exceed memory bandwidth”
• The Fork-Join Router & Parallel Packet Switches
3
First Packet Switches: Shared Memory
[Figure: inputs 1 to N and outputs 1 to N all attached to one shared memory.]
Large, single dynamically allocated memory buffer:
– N writes per “cell” time
– N reads per “cell” time
Limited by memory bandwidth.
A large body of work has proven and made possible:
– Fairness
– Delay Guarantees
– Delay Variation Control
– Loss Guarantees
– Statistical Guarantees
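The "N writes plus N reads per cell time" rule turns into a simple bandwidth requirement. The sketch below assumes every port runs at the same line rate R and ignores any per-cell overhead; the function names are illustrative only.

```python
def shared_memory_bandwidth_gbps(n_ports: int, line_rate_gbps: float) -> float:
    """Memory bandwidth a shared-memory switch needs.

    In every cell time each of the N inputs writes one cell into the single
    buffer and each of the N outputs reads one cell out of it, so the memory
    must sustain N writes plus N reads at line rate R: 2 * N * R.
    """
    return 2 * n_ports * line_rate_gbps


def max_line_rate_gbps(n_ports: int, memory_bw_gbps: float) -> float:
    """Inverse view: the fastest per-port line rate a given memory can serve."""
    return memory_bw_gbps / (2 * n_ports)


if __name__ == "__main__":
    # Hypothetical numbers: 16 ports at 2.5 Gb/s need 80 Gb/s of memory bandwidth.
    print(shared_memory_bandwidth_gbps(16, 2.5))  # 80.0
    print(max_line_rate_gbps(16, 80.0))           # 2.5
```

Because the requirement grows with both N and R, a fixed memory bandwidth caps the product N × R, which is why this design is "limited by memory bandwidth".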
4
Later Packet Switches: Single-stage crossbar with CIOQ (combined input and output queueing) and VOQs (virtual output queues)
– 1 write per “cell” time, 1 read per “cell” time
– Rate of writes/reads determined by switch fabric speedup
[Figure: each linecard performs lookup & drop policy, holds virtual output queues, and does output scheduling; the linecards connect through a bufferless switch core containing the switch fabric and switch arbitration.]
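A minimal sketch of the virtual output queues on one linecard, plus a deliberately naive arbiter standing in for the switch arbitration block. The class and function names are hypothetical, and the greedy arbiter is only for illustration; it is not the scheduler any particular switch uses.

```python
from collections import deque


class VirtualOutputQueues:
    """Buffer at one input, holding a separate FIFO per output (VOQs).

    A cell waiting for a busy output sits in its own queue, so it cannot
    hold up cells behind it that are headed to other outputs, which is the
    head-of-line blocking a single FIFO per input suffers from.
    """

    def __init__(self, n_outputs: int) -> None:
        self.queues = [deque() for _ in range(n_outputs)]

    def enqueue(self, cell, output: int) -> None:
        # Write side: the arriving cell goes into the VOQ for its output.
        self.queues[output].append(cell)

    def requests(self) -> list:
        # Outputs this input asks the switch arbiter for (non-empty VOQs).
        return [j for j, q in enumerate(self.queues) if q]

    def dequeue(self, output: int):
        # Read side: the arbiter connected this input to `output`.
        return self.queues[output].popleft()


def greedy_match(inputs: list) -> dict:
    """Toy switch arbitration, for illustration only.

    Scans the inputs in order and grants each one the first output it
    requests that no earlier input has taken. Practical arbiters (for
    example iSLIP, or maximum-weight matching) make better choices.
    """
    taken = set()
    match = {}
    for i, voq in enumerate(inputs):
        for j in voq.requests():
            if j not in taken:
                taken.add(j)
                match[i] = j
                break
    return match


if __name__ == "__main__":
    a, b = VirtualOutputQueues(3), VirtualOutputQueues(3)
    a.enqueue("a0", 0)
    a.enqueue("a2", 2)
    b.enqueue("b0", 0)
    print(greedy_match([a, b]))  # {0: 0}
```

In the usage example the greedy arbiter connects only one input, although pairing input 0 with output 2 and input 1 with output 0 would have served both; this is the kind of inefficiency better matching algorithms avoid.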
5
Myths about CIOQ-based crossbar switches
1. “Input-queued crossbars have low throughput”
– An input-queued crossbar can have as high throughput as any switch.
2. “Crossbars don’t support multicast traffic well”
– A crossbar inherently supports multicast efficiently.
3. “Crossbars don’t scale well”
– Today, it is the number of chip I/Os, not the number of crosspoints, that limits the size of a switch fabric. Expect 5Tb/s crossbar switches.
6
Myths about CIOQ-based crossbar switches (2)
4. “Crossbar switches can’t support delay/QoS guarantees”
– With an internal speedup of 2, a CIOQ switch can (in theory) precisely emulate a shared memory switch for all traffic.
7
What makes sense today?
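| | Shared Memory | Input Queued | CIOQ | Multistage |
|---|---|---|---|---|
| Blocking | No | No | No | Yes |
| Speedup | High | High | Small | High |
| Emulation of SM | Yes | No | Yes | No |
| Multicast | Good | Good | Good | Poor |
| Resequencing | No | No | No | Yes |
| Power | Low | OK | OK | High |
| Packaging | - | OK | OK | Complex |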
8
Summary of trend
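[Figure: the progression of switch architectures, from N inputs and N outputs sharing one memory to a crossbar with a switch fabric and switch arbitration.]
1. Shared memory. Limited by: memory bandwidth, ~50Gb/s.
2. Crossbar with switch fabric and switch arbitration. Limited by: per-cell arbitration and power, ~5Tb/s.
Higher capacity: multistage fabrics (Clos, Banyan, Toroidal, …) with less frequent arbitration.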
9
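Buffer Memory: How Fast Can I Make a Packet Buffer?
[Figure: external line (e.g. OC768c) → 64-byte wide bus → buffer memory (10ns on-chip DRAM) → 64-byte wide bus → switch fabric]
Rough Estimate:
– 10ns per memory operation.
– Two memory operations per packet (one write, one read).
– Each operation moves 64 bytes (512 bits) over the bus, so 512 bits / (2 × 10ns) = 25.6Gb/s.
– Therefore, maximum ~26Gb/s.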
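The rough estimate can be reproduced in a few lines. This sketch assumes, as the slide does, that each memory operation moves one 64-byte cell across the bus and that every packet is written once and read once; the OC-768 constant is the standard SONET line rate, added only for comparison.

```python
def buffer_line_rate_gbps(bus_bytes: int, access_ns: float, ops_per_packet: int = 2) -> float:
    """Rough packet-buffer estimate from the slide.

    Assumes each memory operation moves one bus-width word (a 64-byte cell)
    and every packet needs one write and one read.
    """
    bits_per_op = bus_bytes * 8
    # 1 bit per ns is 1 Gb/s, so bits / ns gives Gb/s directly.
    return bits_per_op / (access_ns * ops_per_packet)


OC768_GBPS = 39.81312  # SONET OC-768 line rate in Gb/s

if __name__ == "__main__":
    rate = buffer_line_rate_gbps(64, 10.0)
    print(f"{rate:.1f} Gb/s")  # 25.6 Gb/s
    print(rate >= OC768_GBPS)  # False: one such buffer cannot keep up with OC768c
```

Halving the access time or doubling the bus width doubles the estimate, but under these assumptions a single 10ns, 64-byte-wide buffer falls well short of a 40Gb/s line.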
10
How can we make routers with 40Gb/s, 160Gb/s,… interfaces?
11
Higher capacity and higher line rates
[Figure: (1) shared memory, limited by memory bandwidth (~50Gb/s); (2) a crossbar with switch fabric and switch arbitration, limited by per-cell arbitration and power (~5Tb/s); multistage fabrics with less frequent arbitration; and (3) more parallelism: the Fork-Join Router. Axes: higher capacity, higher line rates.]
12
Fork-Join Router
How can we:
– Increase capacity. → Increase parallelism.
– Reduce power per subsystem. → Multiple racks.
While at the same time…
– Keep the system simple. → Single-stage buffering.
– Support line rates faster than memory bandwidth. → Pkt-by-pkt load balancing.
– Provide delay guarantees. → Hmmm….?
13
The Fork-Join Router
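[Figure: N input ports (1 to N), each at rate R, fan out to k parallel routers (1, 2, …, k) and fan back in to N output ports, each at rate R; the fan-out and fan-in stages are bufferless.]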
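Packet-by-packet load balancing over the k routers can be sketched as a bufferless fork stage. Round-robin dispatch is an assumption made here for simplicity (the slides only say "pkt-by-pkt load balancing"), and `ForkStage` is a hypothetical name.

```python
from itertools import cycle


class ForkStage:
    """Hypothetical fork stage of a fork-join router.

    Spreads arriving packets, one at a time, over k parallel routers in
    round-robin order, so each router receives about 1/k of the traffic
    (rate R/k). Round-robin is an assumption made for this sketch; the
    slides only say packet-by-packet load balancing.
    """

    def __init__(self, k: int) -> None:
        self.k = k
        self._order = cycle(range(k))
        # Each list stands in for the input of one parallel router.
        self.per_router = [[] for _ in range(k)]

    def fork(self, packet) -> int:
        # No buffering in the fork itself: the packet is handed straight
        # to the next router in the rotation.
        router = next(self._order)
        self.per_router[router].append(packet)
        return router


if __name__ == "__main__":
    stage = ForkStage(k=3)
    print([stage.fork(p) for p in range(7)])  # [0, 1, 2, 0, 1, 2, 0]
```

Round-robin keeps the per-router rate close to R/k regardless of destination, but it says nothing about where each packet must leave; that is what the switching questions on the following slides address.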
14
The Fork-Join Router
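• Advantages
– Single stage of buffering
– Power per subsystem reduced by a factor of k
– Memory bandwidth per subsystem reduced by a factor of k
– Forwarding table lookup rate per subsystem reduced by a factor of k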
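Because each subsystem only has to keep up with R/k, one can ask how many parallel routers a given line rate needs. This sketch assumes the ~26Gb/s (25.6Gb/s) rough buffer estimate from the packet-buffer slide is the per-router limit, and it ignores any internal speedup S; `min_parallel_routers` is a hypothetical helper.

```python
import math


def min_parallel_routers(line_rate_gbps: float, memory_limit_gbps: float = 25.6) -> int:
    """Smallest k for which each of the k routers, running at R/k, stays
    within the line rate one packet buffer can sustain (default: the
    ~25.6 Gb/s rough estimate for a 10 ns, 64-byte-wide memory)."""
    return math.ceil(line_rate_gbps / memory_limit_gbps)


if __name__ == "__main__":
    for r in (40, 160):
        k = min_parallel_routers(r)
        print(f"R = {r} Gb/s -> k = {k}, R/k = {r / k:.1f} Gb/s")
```

Under these assumptions a 40Gb/s interface needs k = 2 and a 160Gb/s interface needs k = 7; with a speedup S > 1 on the internal links, k would have to grow accordingly.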
15
The Fork-Join Router
• Questions
– Switching: What is the performance?
– Forwarding Lookups: How do they work?
16
A Parallel Packet Switch
[Figure: N inputs (1 to N) at rate R connected to k output-queued switches (1, 2, …, k) operating in parallel, which connect to N outputs (1 to N) at rate R.]
Arriving packet tagged with egress port.
17
Performance Questions
1. Can it be work-conserving?
2. Can it emulate a single big output-queued switch?
3. Can it support delay guarantees, strict priorities, WFQ, …?
18
Work Conservation
[Figure: a rate-R input link feeds k output-queued switches (1, 2, …, k) through internal links of rate R/k each, and the k switches feed a rate-R output link through links of rate R/k each. Labels: Input Link Constraint, Output Link Constraint.]
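The input link constraint can be made concrete with a small model of one demultiplexer. The sketch assumes time is slotted in cell times, that an internal link stays busy for k/S cell times after carrying a cell, and that the demultiplexer picks the lowest-numbered free layer (an arbitrary policy). It does not model the output link constraint, which is the harder part of work conservation.

```python
def assign_layers(arrival_slots: list, k: int, speedup: float = 1.0) -> list:
    """Sketch of the input link constraint at one PPS demultiplexer.

    The input link delivers at most one cell per cell time (rate R). Each
    internal link to a layer runs at S*R/k, so after carrying a cell it is
    busy for k/S cell times. A cell may only be sent to a layer whose link
    is free when it arrives. The "lowest-numbered free layer" choice is an
    arbitrary policy for this sketch.
    """
    busy_until = [0.0] * k
    choice = []
    for t in arrival_slots:
        free = [layer for layer in range(k) if busy_until[layer] <= t]
        # With one arrival per cell time and S >= 1, some link is always free.
        if not free:
            raise RuntimeError(f"no free internal link at cell time {t}")
        layer = free[0]
        busy_until[layer] = t + k / speedup
        choice.append(layer)
    return choice


if __name__ == "__main__":
    print(assign_layers(list(range(8)), k=3))  # [0, 1, 2, 0, 1, 2, 0, 1]
```

With S = 1 an input sending a cell every cell time is forced to rotate through all k layers; with a larger speedup (for example S = 2, k = 4) links free up sooner and the demultiplexer gains freedom in which layer it picks, which is what the speedup theorems exploit.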
19
Work Conservation
[Figure: a worked example in which numbered cells arriving on a rate-R input are spread over the k output-queued switches, whose links run at R/k, illustrating the output link constraint.]
20
Work Conservation
[Figure: the parallel packet switch with N inputs and N outputs at rate R and k output-queued switches (1, 2, …, k); every internal link now runs at S(R/k), i.e. with an internal speedup of S.]
21
Precise Emulation of an Output Queued Switch
[Figure: an N × N output-queued switch shown next to an N × N parallel packet switch, asking whether the two behave identically (= ?).]
22
Parallel Packet Switch Theorems
1. If $S > \frac{2k}{k+2} \approx 2$ then a parallel packet switch can be work-conserving for all traffic.
2. If $S > \frac{2k}{k+2} \approx 2$ then a parallel packet switch can precisely emulate a FCFS output-queued switch for all traffic.
23
Parallel Packet Switch Theorems
3. If $S > \frac{3k}{k+3} \approx 3$ then a parallel packet switch can precisely emulate a switch with WFQ, strict priorities, and other types of QoS, for all traffic.
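The bounds in the theorems above share the form ck/(k+c), with c = 2 or c = 3. The short sketch below (function names are illustrative) evaluates them exactly with Python's `fractions.Fraction` to show why they are written as "≈ 2" and "≈ 3".

```python
from fractions import Fraction


def pps_speedup_bound(c: int, k: int) -> Fraction:
    """Speedup bound c*k/(k+c) appearing in the theorems above.

    c = 2: work conservation and FCFS output-queued emulation.
    c = 3: emulation of WFQ, strict priorities and other QoS disciplines.
    Since c*k/(k+c) = c / (1 + c/k), the bound is always below c and
    approaches c as the number of layers k grows.
    """
    return Fraction(c * k, k + c)


def speedup_suffices(s: float, c: int, k: int) -> bool:
    # The theorems use a strict inequality: S > c*k/(k+c).
    return s > pps_speedup_bound(c, k)


if __name__ == "__main__":
    for k in (2, 4, 8, 32, 1024):
        print(k, float(pps_speedup_bound(2, k)), float(pps_speedup_bound(3, k)))
```

For k = 2 the first bound is exactly 1, for k = 8 it is 8/5, and for large k it approaches 2 from below; so a speedup of S = 2 (or S = 3 for the QoS case) satisfies the corresponding theorem for every k.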
24
Parallel Packet Switch Theorems
4. If $S \ge 1$ then a parallel packet switch with a small co-ordination buffer at rate R can precisely emulate a FCFS switch for all traffic.
25
Co-ordination buffers
[Figure: co-ordination buffers operating at rate R on both sides of the k output-queued switches (1, 2, …, k), whose internal links run at R/k; each co-ordination buffer has size Nk.]
26
Parallel Packet Switch Theorems
5. If $S > 2$ then a parallel packet switch with a small co-ordination buffer at rate R can precisely emulate a switch with WFQ, strict priorities, and other types of QoS, for all traffic.
27
The Fork-Join Router
• Questions
– Switching: What is the performance?
– Forwarding Lookups: How do they work?
28
The Fork-Join Router: Lookahead Forwarding Table Lookups
Packet tagged with egress port at next router
Lookup performed in parallel at rate R/k
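One way to picture lookahead lookups: each router switches the packet on the tag it already carries, then looks up the egress port the packet will use at the next router and rewrites the tag. This is a hypothetical sketch; it assumes a router holds per-neighbour tables (`next_hop_tables`, keyed by its own egress port), and an exact-match dict stands in for real longest-prefix matching.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    dst: str
    egress_tag: int  # egress port at the receiving router, written by the router upstream


def lookahead_forward(pkt: Packet, next_hop_tables: dict) -> int:
    """One lookahead forwarding step at a router (hypothetical sketch).

    1. Switch on the tag the upstream router already attached, so no lookup
       sits on the rate-R path through the fork stage.
    2. Look up where the packet will leave the *next* router and re-tag it.
    In a fork-join router this second step would run in each of the k
    parallel routers, each handling about 1/k of the packets (rate R/k).
    """
    out_port = pkt.egress_tag
    # The router reached through out_port determines which table applies.
    pkt.egress_tag = next_hop_tables[out_port][pkt.dst]
    return out_port


if __name__ == "__main__":
    # Hypothetical tables: for each of our egress ports, the egress port the
    # neighbouring router on that port will use for each destination.
    tables = {3: {"10.0.0.1": 5}, 0: {"10.0.0.1": 2}}
    p = Packet(dst="10.0.0.1", egress_tag=3)
    print(lookahead_forward(p, tables), p.egress_tag)  # 3 5
```

The design choice being illustrated is that the expensive lookup moves off the rate-R fork stage and into the k slower routers, where it only has to keep up with R/k.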
29
The Fork-Join Router
[Figure: the fork-join router, with N ports (1 to N) at rate R spread over k parallel routers (1, 2, …, k).]
• Possibly >100Tb/s aggregate capacity
• Line rates in excess of 100Gb/s