Nick McKeown Spring 2012 Lecture 4 Parallelizing an OQ Switch EE384x Packet Switch Architectures.
Nick McKeown
Spring 2012
Lecture 4
Parallelizing an OQ Switch
EE384x Packet Switch Architectures
Scaling an OQ Switch
[Figure: two cases, "one output" and "many outputs", each labelled 1 to k, with ports 1 to N.]
Not so clear.
Work conserving if memory b/w >= R(N+1)
At most two memory operations per time slot: 1 write and 1 read
Parallel OQ Switch
May not be work-conserving
[Figure: parallel OQ switch with N = 3 ports (A, B, C) and k = 3 memories, shown at time slots 1, 2 and 3 with packets A5 to A8, B5, B6, C5 and C6.]
Constant size packets
Problem
How can we design a work-conserving parallel OQ switch from slower parallel memories?
Work Conserving
Theorem (sufficiency)
A parallel output-queued switch is work-conserving with 3N - 1 memories, each able to perform at most one memory operation per time slot.
Re-stating the Problem
1. There are K cages which can contain an infinite number of pigeons.
2. Assume that time is slotted, and in any one time slot
   a. At most N pigeons can arrive and at most N can depart.
   b. At most 1 pigeon can enter or leave a cage via a pigeon hole.
   c. The time slot at which arriving pigeons will depart is known.
3. For any switch: what is the minimum K such that all N pigeons can be immediately placed in a cage when they arrive, and can depart at the right time?
Only one packet can enter or leave a memory at time t
Intuition for Theorem
Only one packet can enter a memory at time t
[Figure: a memory at Time = t, with packets labelled DT = t + X, DT = t + X and DT = t.]
Only one packet can enter or leave a memory at any time
Proof of Theorem
When a packet arrives in a time slot, it must choose a memory not chosen by:
1. The N - 1 other packets that arrive in that time slot.
2. The N other packets that depart in that time slot.
3. The N - 1 other packets that can depart in the same future time slot as this packet.
Proof
These three constraints rule out at most (N - 1) + N + (N - 1) = 3N - 2 memories. By the pigeon-hole principle, the switch can therefore be work-conserving if there are 3N - 1 memories, each able to perform at most one memory operation per time slot.
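The counting argument can be checked with a small simulation. The sketch below is illustrative only and not from the lecture: the function name `simulate_psm`, the random traffic model, the `load` parameter and the first-fit memory choice are all assumptions. It also assumes each packet departs no earlier than the slot after it arrives, with FIFO departure times per output.

```python
import random

def simulate_psm(n_ports, n_slots, k=None, load=0.9, seed=0):
    """Return True if every arriving packet finds a conflict-free memory.

    Hypothetical model: k memories, each doing at most one read or write
    per time slot; k defaults to 3N - 1 as in the theorem.
    """
    k = 3 * n_ports - 1 if k is None else k
    rng = random.Random(seed)
    next_dt = [0] * n_ports   # next unreserved departure slot per output
    reads_at = {}             # departure slot -> memories read in that slot
    for t in range(n_slots):
        # Memories read in this slot by departing packets (at most N).
        busy_now = set(reads_at.pop(t, ()))
        for _ in range(n_ports):              # at most N arrivals per slot
            if rng.random() >= load:
                continue                      # no arrival on this input
            out = rng.randrange(n_ports)
            # Assumption: a packet departs no earlier than the next slot.
            dt = max(t + 1, next_dt[out])
            next_dt[out] = dt + 1
            # Exclude memories busy now (reads and earlier writes in this
            # slot) and memories already holding a packet that departs at dt.
            conflicts = busy_now | reads_at.get(dt, set())
            free = [m for m in range(k) if m not in conflicts]
            if not free:
                return False
            m = free[0]                       # first fit; any free memory works
            busy_now.add(m)
            reads_at.setdefault(dt, set()).add(m)
    return True
```

With the default k = 3N - 1 the function should always return True, since at most 3N - 2 memories are ever excluded. Calling it with a deliberately small k, for example `simulate_psm(2, 10, k=1, load=1.0)`, returns False as soon as two packets arrive in the same slot.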
A Parallel Shared Memory Switch
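[Figure: parallel shared memory switch with ports A, B and C at rate R; arriving packets (A5, A4, ...) are written to the memories and departing packets (B1, C1, A1, C3, ...) are read from them.]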
From Theorem 1 with N = 3, k = 7 memories don’t suffice, but 3N - 1 = 8 memories do.
[Figure: memories 1 to K = 8 holding packets C3, B3, C1, A1, A3 and B1.]
At most one operation (a write or a read) per memory per time slot
Distributed Shared Memory Switch
The central memories are distributed to the line cards and shared. Memory and line cards can be added incrementally.
From Theorem 1, the switch is work-conserving if we have a total of 3N - 1 memories, each able to perform one operation per time slot, i.e. a total memory bandwidth of approximately 3NR.
[Figure: Line Card 1, Line Card 2, ..., Line Card N, each with its own memories and an external line of rate R, connected by a switch fabric.]
Switch bandwidth
What switch bandwidth does the DSM switch need in order to be work-conserving?
Theorem (sufficiency)
A switch bandwidth of 4NR is sufficient for a distributed shared memory switch to be work-conserving.
Proof
There are a maximum of 3 memory accesses and 1 external line access per time slot.
Switch Algorithm
What switching algorithm allows the DSM switch to be work-conserving?
1. Shared bus: No algorithm needed.
2. Crossbar switch: Algorithm needed because only permutations are allowed.
Theorem
An edge coloring algorithm can switch packets for a work-conserving distributed shared memory switch.
Proof
König’s theorem: Any bipartite graph with maximum degree Δ has an edge coloring with Δ colors.
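To illustrate how an edge coloring yields crossbar configurations, here is a minimal sketch of the standard alternating-path construction behind König’s theorem. It is not necessarily the algorithm used in the lecture. The names `edge_color_bipartite` and `is_proper` are hypothetical, and the graph model is an assumption: left vertices are sources, right vertices are destinations, each edge is one transfer to schedule, and each color is one permutation (crossbar configuration).

```python
def edge_color_bipartite(n_left, n_right, edges):
    """Color a bipartite multigraph with max-degree-many colors.

    edges is a list of (u, v) pairs, u on the left and v on the right.
    Returns (delta, color) where color[e] is the color of edges[e].
    """
    deg_l, deg_r = [0] * n_left, [0] * n_right
    for u, v in edges:
        deg_l[u] += 1
        deg_r[v] += 1
    delta = max(deg_l + deg_r, default=0)
    at_l = [[None] * delta for _ in range(n_left)]   # at_l[u][c] = edge index
    at_r = [[None] * delta for _ in range(n_right)]  # at_r[v][c] = edge index
    color = [None] * len(edges)
    for e, (u, v) in enumerate(edges):
        a = at_l[u].index(None)   # a color free at u
        b = at_r[v].index(None)   # a color free at v
        if at_r[v][a] is not None:
            # Color a is taken at v: collect the a/b alternating path
            # starting at v, then swap a and b along it to free a at v.
            path, x, on_right, c = [], v, True, a
            while True:
                f = at_r[x][c] if on_right else at_l[x][c]
                if f is None:
                    break
                path.append(f)
                x = edges[f][0] if on_right else edges[f][1]
                on_right = not on_right
                c = b if c == a else a
            for f in path:        # clear the old colors first
                fu, fv = edges[f]
                at_l[fu][color[f]] = None
                at_r[fv][color[f]] = None
            for f in path:        # then record the swapped colors
                fu, fv = edges[f]
                color[f] = b if color[f] == a else a
                at_l[fu][color[f]] = f
                at_r[fv][color[f]] = f
        color[e] = a
        at_l[u][a] = e
        at_r[v][a] = e
    return delta, color

def is_proper(edges, color):
    """Check that no two edges sharing an endpoint have the same color."""
    seen = set()
    for (u, v), c in zip(edges, color):
        if ("L", u, c) in seen or ("R", v, c) in seen:
            return False
        seen.add(("L", u, c))
        seen.add(("R", v, c))
    return True

# Example request graph for N = 3; repeated pairs are parallel edges.
edges = [(0, 0), (0, 1), (0, 1), (1, 0), (1, 2), (2, 0), (2, 2), (1, 1)]
delta, color = edge_color_bipartite(3, 3, edges)
```

Each color class is a matching, so all edges of one color can be served by a single crossbar configuration, and Δ configurations serve every transfer.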
Summary - Switches with 100% throughput
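Name | Fabric | # Mem. | Mem. BW | Total Mem. BW | Switch BW | Switch Algorithm
OQ | Bus | N | (N+1)R | N(N+1)R | NR | None
Shared Mem. | Bus | 1 | 2NR | 2NR | 2NR | None
IQ | Crossbar | N | 2R | 2NR | NR | MWM
CIOQ (Cisco GSR) | Crossbar | 2N | 3R | 6NR | 2NR | Maximal
CIOQ (Cisco GSR) | Crossbar | 2N | 3R | 6NR | 3NR | Time Reserve*
PSM | Bus | k | 3NR/k | 3NR | 3NR | C. Sets
DSM (Juniper M-series) | Xbar | N | 3R | 3NR | 4NR | Edge Color
DSM (Juniper M-series) | Xbar | N | 3R | 3NR | 6NR | C. Sets
DSM (Juniper M-series) | Xbar | N | 4R | 4NR | 4NR | C. Sets
PPS - OQ | Clos | Nk | 2R(N+1)/k | 2N(N+1)R | 4NR | C. Sets
PPS | Clos | Nk | 4NR/k | 4NR | 4NR | C. Sets
PPS |  | Nk | 2NR/k | 2NR | 2NR | None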