High Speed Stable Packet Switches

High Speed Stable Packet Switches Shivendra S. Panwar Joint work with : Yihan Li, Yanming Shen and H. Jonathan Chao New York State Center for Advanced Technology in Telecommunications (CATT) Electrical and Computer Engineering Dept. Polytechnic University, New York


Transcript of High Speed Stable Packet Switches

Page 1: High Speed Stable Packet Switches

High Speed Stable Packet Switches

Shivendra S. Panwar

Joint work with: Yihan Li, Yanming Shen and H. Jonathan Chao

New York State Center for Advanced Technology in Telecommunications (CATT) Electrical and Computer Engineering Dept.

Polytechnic University, New York http://catt.poly.edu/CATT/panwar.html

Page 2: High Speed Stable Packet Switches

Overview

- Switching technology continues to be one of the bottlenecks in the development of broadband networks.
- Fixed-length switching technology achieves high switching efficiency for high-speed packet switches.
- Virtual Output Queueing (VOQ) switches can achieve 100% throughput without speedup.
- Two approaches to resolve the output contention in a VOQ switch:
  - Matching algorithms
  - Load-balanced switch
- Switch performance:
  - Throughput
  - Packet delay

Page 3: High Speed Stable Packet Switches

Buffering in a Packet Switch

High-speed fixed-length packet switching:

- Input Queuing (IQ)
  - Easy to implement
  - HOL blocking, throughput 58.6%
- Output Queuing (OQ)
  - 100% throughput
  - Internal speedup of N
- Virtual Output Queuing (VOQ)
  - Overcomes HOL blocking
  - No speedup requirement

[Figure: buffering diagrams with ports labeled 1 to 4]

Page 4: High Speed Stable Packet Switches

Matching Algorithms

- Scheduling in a VOQ switch
- Stable matching schemes for VOQ switching
  - Maximum Weight Matching (MWM)
  - Maximal Matching, iSLIP
  - Other algorithms with 100% throughput and no speedup
- Polling system based matching
  - Exhaustive Service Matching with Hamiltonian Walk
  - Limited Service Matching
  - Average delay analysis

Page 5: High Speed Stable Packet Switches

Scheduling in a VOQ Switch

- Scheduling is needed to avoid output contention.
- A scheduling problem can be modeled as a matching problem in a bipartite graph.
  - An input and an output are connected by an edge if the corresponding VOQ is not empty.
  - Each edge may have a weight, which can be:
    - The length of the VOQ
    - The age of the HOL cell

Page 6: High Speed Stable Packet Switches

Maximum Weight Matching (MWM)

- MWM always finds a match with the maximum weight.
- Complexity of O(N^3).
- Stable with 100% throughput under all admissible traffic.

[Figure: weighted bipartite graph. Weight of the match: 25]
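To illustrate what MWM computes, here is a minimal Python sketch that finds the maximum weight match by brute force over all N! permutations. It is only meant for tiny examples and does not reach the O(N^3) complexity quoted on the slide, which requires an assignment algorithm such as the Hungarian method. The function name and the sample queue matrix are hypothetical, and the edge weight is assumed to be the VOQ length.

```python
from itertools import permutations


def max_weight_match(weights):
    """Return (match, weight); match[i] is the output assigned to input i.

    weights[i][j] is the weight of VOQ (i, j), e.g. its length (0 if empty).
    """
    n = len(weights)
    best_match, best_weight = None, -1
    for perm in permutations(range(n)):
        total = sum(weights[i][perm[i]] for i in range(n))
        if total > best_weight:
            best_match, best_weight = list(perm), total
    return best_match, best_weight


# Hypothetical 3x3 VOQ occupancy (not the example from the slide figure).
queues = [[7, 4, 0],
          [0, 8, 5],
          [10, 0, 2]]
match, weight = max_weight_match(queues)
print(match, weight)
```

For this sample matrix the best match sends input 0 to output 1, input 1 to output 2 and input 2 to output 0, for a total weight of 19.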

Page 7: High Speed Stable Packet Switches

Maximal Matching

- Maximal matching
  - Add connections incrementally, without removing connections made earlier.
  - By the end of the operation, no further connection can be added without removing an existing one.
  - Stable with a speedup of 2.
  - Complexity O(N log N).
- Multiple iterative matching
  - Use multiple iterations to converge to a maximal matching.
  - iSLIP and DRRM
  - The complexity of each iteration is O(log N).
  - O(log N) iterations are needed to converge on a maximal matching.

[Figure: weighted bipartite graph with a maximal matching. Weight of the match: 23]
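The difference between a maximal and a maximum weight match can be seen in a small sketch. The code below builds a maximal matching greedily, adding edges without ever removing one. Visiting edges heaviest-first is an assumption made for this illustration, not something the slide prescribes, and the function name and sample matrix are hypothetical.

```python
def greedy_maximal_matching(weights):
    """Build a maximal matching by adding edges one at a time.

    weights[i][j] > 0 means VOQ (i, j) is non-empty. Edges are visited
    heaviest first (an illustrative choice); an edge is added only if both
    its input and its output are still free, and nothing is ever removed.
    """
    n = len(weights)
    edges = sorted(
        ((weights[i][j], i, j)
         for i in range(n) for j in range(n) if weights[i][j] > 0),
        reverse=True,
    )
    match = {}            # input -> output
    used_outputs = set()
    for _, i, j in edges:
        if i not in match and j not in used_outputs:
            match[i] = j
            used_outputs.add(j)
    return match


# Hypothetical 3x3 VOQ occupancy.
queues = [[7, 4, 0],
          [0, 8, 5],
          [10, 0, 2]]
match = greedy_maximal_matching(queues)
print(match, sum(queues[i][j] for i, j in match.items()))
```

Here the greedy match has weight 18 and leaves input 0 unmatched, because both outputs it has cells for are already taken. No edge can be added, so the match is maximal, yet a match of weight 19 exists for the same queues.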

Page 8: High Speed Stable Packet Switches

iSLIP

Step 1: Request
- Each input sends a request to every output for which it has a queued cell.

Step 2: Grant
- If an output receives multiple requests, it chooses the one that appears next in a fixed round-robin schedule.
- The output arbiter pointer is incremented by one location beyond the granted input if, and only if, the grant is accepted in step 3.

Step 3: Accept
- If an input receives multiple grants, it accepts the one that appears next in a fixed round-robin schedule.
- The input arbiter pointer is incremented by one location beyond the accepted output.

[Figure: request, grant and accept phases between inputs and outputs]
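The request, grant and accept steps can be sketched as a single iSLIP iteration in Python. This is a simplified illustration under stated assumptions: ports are numbered from 0, only one iteration is run, and the function and variable names are hypothetical rather than taken from the talk.

```python
def islip_iteration(has_cell, grant_ptr, accept_ptr):
    """Run one request/grant/accept round and return {input: output}.

    has_cell[i][j] is True if VOQ (i, j) holds at least one cell.
    grant_ptr[j] and accept_ptr[i] are the round-robin pointers of output j
    and input i; they are updated in place, and only for accepted grants.
    """
    n = len(has_cell)

    # Step 1: each input requests every output for which it has a cell.
    requests = {j: [i for i in range(n) if has_cell[i][j]] for j in range(n)}

    # Step 2: each output grants the requesting input closest to its pointer.
    grants = {}  # input -> outputs that granted it
    for out in range(n):
        if requests[out]:
            chosen = min(requests[out],
                         key=lambda inp: (inp - grant_ptr[out]) % n)
            grants.setdefault(chosen, []).append(out)

    # Step 3: each input accepts the granting output closest to its pointer.
    match = {}
    for inp, outs in grants.items():
        out = min(outs, key=lambda o: (o - accept_ptr[inp]) % n)
        match[inp] = out
        grant_ptr[out] = (inp + 1) % n    # moved only because grant accepted
        accept_ptr[inp] = (out + 1) % n
    return match


has_cell = [[True, True, False],
            [True, False, False],
            [False, False, True]]
grant_ptr, accept_ptr = [0, 0, 0], [0, 0, 0]
print(islip_iteration(has_cell, grant_ptr, accept_ptr), grant_ptr, accept_ptr)
```

In this example output 1 grants input 0, but input 0 accepts output 0 instead, so the pointer of output 1 stays where it was. That conditional update is what desynchronizes the output arbiters over time.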

Page 9: High Speed Stable Packet Switches

Achieving 100% Throughput without Speedup

- Algorithms with memory
  - [Tassiulas] Compare the latest schedule to a randomly generated match and select the one with the higher weight as the new match; complexity O(log N).
  - [Giaccone et al] Derandomized algorithm, using a Hamiltonian walk; complexity O(log N).
  - Other algorithms, with higher complexity, take into account the latest schedule, its neighbors, and the arrival pattern.
  - SERENA, complexity O(N).
- Polling system based matching algorithms
  - Low complexity: HE-iSLIP, O(log N).
  - Low packet delay
    - Much lower than other O(log N) algorithms.
    - Comparable to higher complexity algorithms.
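The "algorithm with memory" idea attributed to Tassiulas can be sketched in a few lines: keep the previous schedule unless a randomly drawn match is heavier. The sketch below assumes that the weight of a match is the sum of the matched VOQ lengths and that the random match is a uniformly shuffled permutation; the function names are hypothetical.

```python
import random


def match_weight(match, queues):
    """Weight of a match: sum of the lengths of the VOQs it serves."""
    return sum(queues[i][j] for i, j in enumerate(match))


def randomized_with_memory(prev_match, queues, rng=random):
    """Return the heavier of the previous match and a random match."""
    candidate = list(range(len(queues)))
    rng.shuffle(candidate)  # candidate[i] is the output offered to input i
    if match_weight(candidate, queues) > match_weight(prev_match, queues):
        return candidate
    return list(prev_match)


queues = [[7, 4, 0],
          [0, 8, 5],
          [10, 0, 2]]
schedule = [0, 1, 2]
rng = random.Random(1)
for _ in range(20):
    schedule = randomized_with_memory(schedule, queues, rng)
print(schedule, match_weight(schedule, queues))
```

Because the previous schedule is always one of the two candidates, the weight of the schedule never decreases for a fixed queue state, which is the memory effect the slide refers to.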

Page 10: High Speed Stable Packet Switches

The Architecture of a Cell-Based VOQ Switch

[Figure: cell-based VOQ switch. Inputs 1 to 4, each with an ISM and VOQs 1 to N, connect through the switch fabric to Outputs 1 to 4, each with an ORM]

- Input Segmentation Module (ISM): segments packets into fixed-length cells.
- Output Reassembly Module (ORM): reassembles cells into packets.
- Previous work
  - Tries to find a good match in each (cell) time slot. Cells in the same packet are interrupted during transmission.
  - Considered cell delay, not packet delay.

Page 11: High Speed Stable Packet Switches

Polling System Based Matching

- Exhaustive Service Matching
  - Inspired by exhaustive service polling systems.
  - All the cells in the corresponding VOQ are served after an input and an output are matched.
  - Slot times wasted to achieve an input-output match are amortized over all the cells waiting in the VOQ instead of only one.
  - Cells within the same packet are transferred continuously.
- A Hamiltonian walk is used to guarantee stability.
  - A Hamiltonian walk is a walk which visits every vertex of a graph exactly once.
  - For an NxN switch, each possible match is visited exactly once in every N! time slots.
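A walk over all N! matches can be sketched with a generator that cycles through the permutations of the outputs. The visiting order used here (the lexicographic order produced by `itertools.permutations`) is an assumption for illustration; the talk does not specify the order, and a practical scheduler would likely step between neighboring matches with constant work per slot.

```python
from itertools import islice, permutations


def hamiltonian_walk(n):
    """Yield H(t) for t = 0, 1, 2, ...

    Each value is a tuple h with h[i] the output matched to input i.
    Every one of the n! matches appears exactly once per n! time slots.
    """
    while True:
        for perm in permutations(range(n)):
            yield perm


# The first 3! = 6 slots of a 3x3 switch visit six distinct matches.
first_cycle = list(islice(hamiltonian_walk(3), 6))
print(first_cycle)
```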

Page 12: High Speed Stable Packet Switches

Exhaustive Service Matching with Hamiltonian Walk (EMHW)

- EMHW
  - Let S(t) be the match at time t.
  - At time t+1, generate match Z(t+1) by the Exhaustive Service Matching algorithm based on S(t), and H(t+1) by the Hamiltonian walk.
  - Let

    $$S(t+1) = \arg\max_{S \in \{Z(t+1),\, H(t+1)\}} \langle S, Q(t+1) \rangle$$

    where <S, Q(t+1)> is the weight of S at time t+1.
- Stability
  - Theorem 2: An EMHW is stable under any admissible Bernoulli i.i.d. input traffic.
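The selection step of EMHW, choosing between Z(t+1) and H(t+1) by weight, can be written as a short sketch. It assumes that the weight <S, Q> is the sum of the queue lengths Q served by S, and that ties go to Z(t+1); both the tie rule and the function names are hypothetical.

```python
def weight(match, queues):
    """<S, Q>: sum of the lengths of the VOQs served by match S."""
    return sum(queues[i][j] for i, j in enumerate(match))


def emhw_select(z_match, h_match, queues):
    """Return S(t+1), the heavier of Z(t+1) and H(t+1) under Q(t+1)."""
    # max() returns the first maximal item, so ties are resolved toward Z.
    return max((z_match, h_match), key=lambda s: weight(s, queues))


queues = [[7, 4, 0],
          [0, 8, 5],
          [10, 0, 2]]
z = (0, 1, 2)  # hypothetical output of exhaustive service matching
h = (1, 2, 0)  # hypothetical current match of the Hamiltonian walk
print(emhw_select(z, h, queues))
```

In this example H(t+1) has weight 19 against 17 for Z(t+1), so the Hamiltonian walk match is selected for this slot.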

Page 13: High Speed Stable Packet Switches

Implementation Complexity of EMHW

- Implementation complexity
  - EMHW: O(log N) for HE-iSLIP.
  - HE-iSLIP: only one iteration is needed to achieve 100% throughput and low packet delay.
- Comparison with the Derandomized Matching Algorithm (O(log N))
  - The weight of the schedule generated by EMHW is always larger than or equal to that of the schedule generated by the derandomized matching algorithm.
  - Theorem: Suppose the schedule at time t is M(t), and at time t+1 the schedules generated by the derandomized matching algorithm and by EMHW are S_d(t+1) and S(t+1), respectively. Then it is always true that

    $$\langle S_d(t+1), Q(t+1) \rangle \le \langle S(t+1), Q(t+1) \rangle.$$

Page 14: High Speed Stable Packet Switches

Limited Service Matching

- An approximation to EMHW with lower complexity.
- When a VOQ is under service, a limit on the maximum number of cells that can be served continuously is enforced by means of a counter.
- No Hamiltonian walk is used.
- Limited Service Matching DRRM can be implemented in a distributed manner.
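The counter mentioned above can be sketched as a small piece of per-input state. This is only an illustration of the service limit, not of the full matching algorithm: the class name, its methods and the rule for when the current match is kept are all hypothetical.

```python
class LimitedServiceInput:
    """Per-input state: which VOQ is in service and for how many cells."""

    def __init__(self, limit):
        self.limit = limit    # maximum cells served back to back
        self.current = None   # output whose VOQ is currently in service
        self.served = 0       # consecutive cells served from that VOQ

    def keep_current(self, voq_lengths):
        """True if the current match may be kept for the next slot."""
        return (self.current is not None
                and voq_lengths[self.current] > 0
                and self.served < self.limit)

    def serve(self, output, voq_lengths):
        """Send one cell from VOQ `output`, restarting the counter if needed."""
        if output != self.current:
            self.current, self.served = output, 0
        voq_lengths[output] -= 1
        self.served += 1


voqs = [5, 1]
port = LimitedServiceInput(limit=2)
port.serve(0, voqs)
port.serve(0, voqs)
print(port.keep_current(voqs))  # limit reached, so the match is released
```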

Page 15: High Speed Stable Packet Switches

Simulated Delay Performance

- Packet delay: the sum of cell delay and reassembly delay.
  - Cell delay: measured from VOQ to destination output.
  - Reassembly delay: time spent in an ORM, often ignored in other work.

[Figure: cell-based VOQ switch with Inputs 1 to 4, ISMs, VOQs 1 to N, switch fabric, ORMs and Outputs 1 to 4]

Page 16: High Speed Stable Packet Switches

Traffic Patterns in Simulations

- Uniform traffic
  - Pattern 1: packet size is 1 cell.
  - Pattern 2: packet size is 10 cells.
  - Pattern 3: packet size is variable with an average of 10 cells (Internet packet size distribution).
- Nonuniform traffic
  - Diagonal traffic, packet size is 1 cell.
  - Hotspot traffic, packet size is 1 cell.

Page 17: High Speed Stable Packet Switches

Performance Summary

Scheme   | Complexity | Stable | Packet delay performance
HE-iSLIP | O(log N)   | Yes    | Lowest when packet size is larger than 1 cell.
iSLIP    | O(log N)   | No     | Always higher than HE-iSLIP.
DERAND   | O(log N)   | Yes    | Highest for all traffic patterns.
SERENA   | O(N)       | Yes    | Lower than HE-iSLIP only under nonuniform diagonal traffic.
MWM      | O(N^3)     | Yes    | Lowest when packet size is 1 cell.

Page 18: High Speed Stable Packet Switches

Packet Delay under Uniform Traffic

- Pattern 1: packet size is 1 cell.

[Figure: packet delay curves for MWM, HE-iSLIP, SERENA and iSLIP]

Page 19: High Speed Stable Packet Switches

Packet Delay under Uniform Traffic

- Pattern 2: packet length is 10 cells.
- Pattern 3: packet length is variable, the average is 10 cells.

[Figure: two panels, one per pattern, each with packet delay curves for MWM, HE-iSLIP, SERENA and iSLIP]

Page 20: High Speed Stable Packet Switches

When Packet Length is Larger Than 1 Cell

- Why does HE-iSLIP have a lower packet delay than MWM?
- For example, when packet length is 10 cells:

[Figure: two panels, cell delay and reassembly delay, each comparing HE-iSLIP and MWM]

- Low cell delay and low reassembly delay are both needed for low packet delay.

Page 21: High Speed Stable Packet Switches

Performance Analysis: Average Delay of E-iSLIP (1)

- Exhaustive random polling system model
  - Symmetric system: only one input needs to be considered.
  - N VOQs per input with an exhaustive service policy: an exhaustive service polling system with N stations.
  - The service order of the VOQs is not fixed: a random polling system. Assume all VOQs have the same probability of being selected for service after a VOQ is served.

Page 22: High Speed Stable Packet Switches

Performance Analysis: Average Delay of E-iSLIP (2)

- Switch-over time S
- Average delay T [Levy]

[Equations: expressions for E(S) and E(S^2) in terms of Q, and for E(T) in terms of N, r = E(S) and Var(S) = E(S^2) - E^2(S)]

Page 23: High Speed Stable Packet Switches

Performance Analysis: Average Delay of E-iSLIP (3)

When N is large:

[Equation: approximation of E(T) in terms of N and E(S)]

Page 24: High Speed Stable Packet Switches

EMHW Summary

- Exhaustive Service Matching with Hamiltonian walk (EMHW)
  - Stable under any admissible Bernoulli i.i.d. input traffic.
  - HE-iSLIP, complexity O(log N)
- Packet delay performance
  - Compared to iSLIP (O(log N)), the Derandomized Algorithm (O(log N)), SERENA (O(N)) and MWM (O(N^3)), under uniform traffic:
    - HE-iSLIP has lower packet delay than the Derandomized algorithm and SERENA.
    - HE-iSLIP has lower packet delay than MWM for typical packet sizes.
    - HE-iSLIP has low cell delay and low reassembly delay, which lead to low packet delay.

Page 25: High Speed Stable Packet Switches

Load-Balanced Switches

- Switch architecture
- Packet out-of-sequence issue
- Packet scheduling in the first-stage switch
- Packet delay performance
- Conclusions

Page 26: High Speed Stable Packet Switches

Motivation

- Challenges in designing a high-performance switch
  - Scalable
    - Scale to a high number of linecards and to high linecard speeds
  - No centralized scheduler
  - Low complexity
  - Provide performance guarantees
    - 100% throughput guarantee; no commercial switch today has a throughput guarantee.
- Load-balanced switch
  - 100% throughput for a broad class of traffic
  - No centralized scheduler needed, scalable

Page 27: High Speed Stable Packet Switches

Work on Load-balanced Switches

- Original load-balanced switch (Computer Communications, Chang)
  - 100% throughput
  - Unbounded out-of-sequence delay
- FCFS (First come first served) (Computer Communications, Chang)
  - Jitter control mechanism
  - Increases the average delay
- EDF (Earliest deadline first) (Computer Communications, Chang)
  - Reduces the average delay
  - High complexity
- Mailbox switch (Infocom 2003, Chang)
  - Prevents packets from being out-of-sequence
  - Not 100% throughput
- FFF (Full frames first) (Infocom 2002, McKeown)
  - Frame-based
  - No need for resequencing
  - Requires multi-stage buffer communication
- FOFF (Full ordered frames first) (Sigcomm 2003, McKeown)
  - Frame-based
  - Maximum resequencing delay N^2
  - Bandwidth wastage

Page 28: High Speed Stable Packet Switches

Byte-Focal Switch Architecture

[Figure: Byte-Focal switch architecture. Arrivals enter the input VOQs (i,1) to (i,N) at each input i, cross the 1st stage switch fabric to the second-stage VOQs (j,1) to (j,N) at each center port j, then cross the 2nd stage switch fabric to a re-sequencing buffer at each output k]

Page 29: High Speed Stable Packet Switches

Switching fabric

- Deterministic and periodic connection pattern
- At both stages, at time slot t, input i is connected to output j with j = (i+t) mod N

[Figure: connection patterns at t=0, t=1, t=2 and t=3]
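The periodic connection pattern j = (i+t) mod N is easy to tabulate. The sketch below assumes ports are numbered from 0 to N-1; the function name is hypothetical.

```python
def connection_pattern(n, t):
    """Return a list c where input i is connected to output c[i] at slot t."""
    return [(i + t) % n for i in range(n)]


# A 4x4 fabric cycles through four patterns, as in the t=0..3 figure.
for t in range(4):
    print(t, connection_pattern(4, t))
```

Over any N consecutive time slots, every input is connected to every output exactly once, which is why no scheduler is needed to configure the fabric.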

Page 30: High Speed Stable Packet Switches

Load-Balancing

- Packets in first-stage VOQ(i,k) are sent in round-robin order to the second-stage VOQs (1,k), (2,k), (3,k), ..., (N,k).

[Figure: first-stage VOQ(i,k), 1st stage switch fabric, second-stage VOQs (1,k) to (N,k), 2nd stage switch fabric]

Page 31: High Speed Stable Packet Switches

Out-of-Sequence Problem

- Maximum re-sequencing delay = N^2

[Figure: packets 1 and 2 from the ith input port are sent through the 1st stage switch fabric at time t to the jth and (j+1)th central buffers, and can leave the 2nd stage switch fabric at a later time out of order]

Page 32: High Speed Stable Packet Switches

Resequencing Buffer Design

[Figure: Byte-Focal switch architecture (input VOQs, 1st stage switch fabric, second-stage VOQs, 2nd stage switch fabric, re-sequencing buffers), highlighting the resequencing buffer design]

Page 33: High Speed Stable Packet Switches

Resequencing Buffer Design

[Figure: resequencing buffer with queues for input ports i1, i2, ..., indexed by center stage queue (j1, j1+1, j1+2 and j2, j2+1, j2+2), and a departure queue holding the indices i2 and i1]

- The HOF packet arrived: the index joins the departure queue.
- For i2, j2 is the HOF packet, so the index also joins the departure queue.
- The next packet via center stage queue j has not arrived.

Page 34: High Speed Stable Packet Switches

First Stage Scheduling

[Figure: Byte-Focal switch architecture (input VOQs, switch fabric, second-stage VOQs, switch fabric, re-sequencing buffers), highlighting the first stage scheduling algorithms]

Page 35: High Speed Stable Packet Switches

First Stage Scheduling

- At time t, input port i is connected to the jth input port of the 2nd stage.
- Eligible cells: cells that can be sent to the jth input port of the 2nd stage.
- There are up to N HOL eligible cells; the arbiter chooses one of them.
- Problem: which VOQ to serve?

[Figure: the arbiter at input port i chooses an eligible cell and sends it through the 1st stage switch fabric to the jth input port of the 2nd stage switch fabric]

Page 36: High Speed Stable Packet Switches

First Stage Scheduling

- Round-robin
  - The arbiter at each input port selects VOQs in round-robin order.
  - Unstable under non-uniform traffic.
- Longest queue first
  - The arbiter at each input port chooses to serve the longest queue.
  - High complexity, not practical.
- Fixed threshold scheme
  - Set a threshold, N; a queue that exceeds the threshold has a higher priority.
  - Serve the high priority queue first.
- Dynamic threshold scheme
  - Set the threshold as TH = Q(t)/N.
  - Otherwise the same as the fixed threshold scheme.
- The longest queue first, fixed threshold and dynamic threshold schemes have been proved to be stable in our work.
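The dynamic threshold rule can be sketched as an arbiter decision for one input port. Several points are assumptions, since the slide does not spell them out: Q(t) is taken to be the total number of cells queued at the input, "exceeds" is read as strictly greater, and round-robin order is used to break ties within a priority class. The function name and arguments are hypothetical.

```python
def dynamic_threshold_pick(voq_lengths, eligible, rr_ptr):
    """Choose which VOQ to serve at one input port, or None if idle.

    voq_lengths[k]: cells queued in VOQ k at this input.
    eligible: indices of VOQs whose HOL cell may be sent in this slot.
    rr_ptr: round-robin pointer used to break ties (assumed policy).
    """
    n = len(voq_lengths)
    threshold = sum(voq_lengths) / n  # TH = Q(t)/N, Q(t) assumed = total
    candidates = [k for k in eligible if voq_lengths[k] > 0]
    if not candidates:
        return None
    high = [k for k in candidates if voq_lengths[k] > threshold]
    pool = high or candidates         # high priority queues go first
    return min(pool, key=lambda k: (k - rr_ptr) % n)


print(dynamic_threshold_pick([1, 9, 0, 2], eligible=[0, 1, 2, 3], rr_ptr=0))
```

With lengths [1, 9, 0, 2] the threshold is 3, so only VOQ 1 is high priority and it is served. When all queues are equally long, none exceeds the threshold and the choice falls back to plain round-robin.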

Page 37: High Speed Stable Packet Switches

Simulation Settings

- Uniform i.i.d.: $\lambda_{ij} = \rho / N$.
- Diagonal i.i.d.: $\lambda_{ii} = \rho / 2$, $\lambda_{ij} = \rho / 2$ for j = (i+1) mod N. This is a very skewed loading, since input i has packets only for outputs i and (i+1) mod N.
- Hot-spot: $\lambda_{ii} = \rho / 2$, $\lambda_{ij} = \rho / (2(N-1))$ for i ≠ j. This type of traffic is more balanced than diagonal traffic, but more unbalanced than uniform traffic.
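The three traffic patterns can be generated as rate matrices. The sketch assumes the usual reading of the definitions above: the uniform rate is rho/N for every pair, the diagonal pattern puts rho/2 on outputs i and (i+1) mod N, and the hot-spot pattern puts rho/2 on output i and rho/(2(N-1)) on every other output, where rho is the input load. The function name and the 0-based port numbering are hypothetical.

```python
def traffic_matrix(pattern, n, rho):
    """Return an n x n matrix of arrival rates from input i to output j."""
    rates = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if pattern == "uniform":
                rates[i][j] = rho / n
            elif pattern == "diagonal":
                if j == i or j == (i + 1) % n:
                    rates[i][j] = rho / 2
            elif pattern == "hotspot":
                rates[i][j] = rho / 2 if i == j else rho / (2 * (n - 1))
            else:
                raise ValueError(f"unknown pattern: {pattern}")
    return rates


for name in ("uniform", "diagonal", "hotspot"):
    print(name, traffic_matrix(name, 4, 0.8)[0])
```

In all three cases every row and every column sums to rho, so the traffic is admissible for any load rho below 1.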

Page 38: High Speed Stable Packet Switches

Average Cell Delay

- At low load, 2-stage switching has a larger delay than iSLIP, but a smaller delay at high load.
- FOFF shows a large delay even at low load due to the bandwidth wasted when transferring partial frames.

[Figure: Average delay under uniform traffic with switch size of 32. Average delay versus load for Output-buffered, iSLIP, LQF, Dynamic threshold, Fixed threshold, Round-robin and FOFF. LQF: Longest queue first. FOFF: Full ordered frames first]

Page 39: High Speed Stable Packet Switches

Dynamic Threshold

- The LQF scheme has the best delay performance, but it is not practical due to its high implementation complexity.
- The performance of the dynamic threshold scheme is comparable with that of the LQF scheme.
- Compared with the fixed threshold scheme, the dynamic threshold scheme adapts to non-uniform input loadings, thus achieving better delay performance while maintaining low complexity.
- The remaining slides focus on the dynamic threshold scheme.

Page 40: High Speed Stable Packet Switches

Dynamic Threshold

- As the input traffic changes from uniform to hotspot to diagonal (hence less balanced), the dynamic threshold scheme can achieve good performance, especially for the diagonal traffic.
- The Byte-Focal switch performs load-balancing at the first stage, thus achieving good performance even under extremely non-uniform traffic.

[Figure: The average delay of the dynamic threshold scheme under different input traffic patterns with a switch size of 32. Average packet delay versus load for Diagonal, Hot-spot and Uniform traffic]

Page 41: High Speed Stable Packet Switches

Delays with Different Switch Sizes

- The average delays are almost linearly dependent on the switch size.

[Figure: average delay versus switch size for Diagonal, Uniform and Hot-spot traffic]

Page 42: High Speed Stable Packet Switches

3-stage Delays

- 3 delay components:
  - first stage queueing delay
  - second stage queueing delay
  - resequencing delay
- The first stage queueing delay and the second stage queueing delay are comparable.
- The resequencing delay is much smaller than the other two delays.

[Figure: The three components of the total delay with switch size of 32. Average delay versus load for resequencing delay, second stage delay, first stage delay and total delay]

Page 43: High Speed Stable Packet Switches

Variable Size Packet Scheduling

- Two approaches: cell-mode and packet-mode
  - The average delay is the same.
- Combining packet-mode scheduling and the dynamic threshold scheme gives the packet-mode dynamic threshold algorithm:
  1. At each time slot, if it is in the middle of a packet, keep serving this queue.
  2. If not, apply the dynamic threshold scheme.
- The resequencing delay and the reassembly delay overlap.
  - The sum of the resequencing delay and the reassembly delay is bounded by an expression in N^2, N and k_max, where k_max is the maximum packet length.
  - The additional delay due to packet reassembly is reduced.
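The two-step packet-mode rule can be sketched as a small arbiter that keeps serving a VOQ until the packet in progress is finished. This is a simplified model under stated assumptions: it ignores which second-stage port is connected in a given slot, it takes the dynamic threshold to be the average VOQ occupancy in cells, and it breaks ties by lowest index. All class, method and function names are hypothetical.

```python
from collections import deque


def dynamic_threshold_choice(cells):
    """Pick a non-empty VOQ, preferring those above the average occupancy."""
    nonempty = [k for k, c in enumerate(cells) if c > 0]
    if not nonempty:
        return None
    threshold = sum(cells) / len(cells)
    high = [k for k in nonempty if cells[k] > threshold]
    return (high or nonempty)[0]


class PacketModeArbiter:
    def __init__(self, num_voqs):
        # Each VOQ holds the number of cells left in each queued packet.
        self.voqs = [deque() for _ in range(num_voqs)]
        self.current = None  # VOQ whose packet is in progress, if any

    def enqueue(self, k, packet_cells):
        self.voqs[k].append(packet_cells)

    def serve_one_cell(self):
        """Serve one cell and return the VOQ it came from (None if idle)."""
        if self.current is None:
            # Step 2: between packets, apply the dynamic threshold scheme.
            cells = [sum(q) for q in self.voqs]
            self.current = dynamic_threshold_choice(cells)
            if self.current is None:
                return None
        # Step 1: in the middle of a packet, keep serving the same queue.
        k = self.current
        self.voqs[k][0] -= 1
        if self.voqs[k][0] == 0:
            self.voqs[k].popleft()
            self.current = None
        return k


arb = PacketModeArbiter(3)
arb.enqueue(0, 2)
arb.enqueue(1, 3)
first = arb.serve_one_cell()
arb.enqueue(2, 10)  # a long packet arrives mid-transfer
print(first, arb.serve_one_cell(), arb.serve_one_cell())
```

The second cell still comes from VOQ 0 even though VOQ 2 is now far longer, because the 2-cell packet is not finished. Only at the packet boundary does the threshold rule move service to VOQ 2.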

Page 44: High Speed Stable Packet Switches

Packet Delay

- Used Internet packet length distribution: trimodal distribution.
- As with the cell delay, the delay performance is not degraded under non-uniform traffic.

[Figure: average delay versus load for Uniform, Hot-spot and Diagonal traffic]

Page 45: High Speed Stable Packet Switches

Packet Delay

- Packet delay increases with the packet length, and also with the packet length variance, but the dependence is weak.

[Figure: average packet delay versus load for packet length = 1, packet length = 10, variable packet length and packet length = 30]

Page 46: High Speed Stable Packet Switches

Conclusion

- The maximum resequencing delay is N^2.
- The dynamic threshold scheme is practical and has good delay performance.
- The time complexity of the resequencing buffer is O(1).
- The switch does not need communication between linecards.
- It achieves uniformly good delay performance over a wide range of traffic matrices.
- It achieves good performance with low complexity.