EE382C Lecture 4 - Stanford Universitycva.stanford.edu/classes/ee382c/lectures/L4.pdf · EE 382C -...

49
EE 382C - S11 - Lecture 4 1 EE382C Lecture 4 High-Radix and Non-Blocking Networks 4/7/11

Transcript of EE382C Lecture 4 - Stanford Universitycva.stanford.edu/classes/ee382c/lectures/L4.pdf · EE 382C -...

Page 1: EE382C Lecture 4 - Stanford Universitycva.stanford.edu/classes/ee382c/lectures/L4.pdf · EE 382C - S11 - Lecture 4 1 EE382C Lecture 4 ... g 288 648 1352 2048 group 2139 2625 2601

EE 382C - S11 - Lecture 4 1

EE382C

Lecture 4

High-Radix and Non-Blocking Networks

4/7/11

Page 2: EE382C Lecture 4 - Stanford Universitycva.stanford.edu/classes/ee382c/lectures/L4.pdf · EE 382C - S11 - Lecture 4 1 EE382C Lecture 4 ... g 288 648 1352 2048 group 2139 2625 2601

EE 382C - S11 - Lecture 4 2

Question of the day

• What topology has an average hop count, Havg for load-

balanced traffic that is close to the logd/2N bound and is

able to route arbitrary traffic with an Hmax of twice this

amount?

Page 3: EE382C Lecture 4 - Stanford Universitycva.stanford.edu/classes/ee382c/lectures/L4.pdf · EE 382C - S11 - Lecture 4 1 EE382C Lecture 4 ... g 288 648 1352 2048 group 2139 2625 2601

EE 382C - S11 - Lecture 4 3

High-Radix Networks

Page 4: EE382C Lecture 4 - Stanford Universitycva.stanford.edu/classes/ee382c/lectures/L4.pdf · EE 382C - S11 - Lecture 4 1 EE382C Lecture 4 ... g 288 648 1352 2048 group 2139 2625 2601

EE 382C - S11 - Lecture 4

Bandwidth Trend (ISCA ’05)

0.1

1

10

100

1000

10000

1985 1990 1995 2000 2005 2010

year

ban

dw

idth

per

rou

ter

no

de (

Gb

/s)

Torus Routing Chip

Intel iPSC/2

J-Machine

CM-5

Intel Paragon XP

Cray T3D

MIT Alewife

IBM Vulcan

Cray T3E

SGI Origin 2000

AlphaServer GS320

IBM SP Switch2

Quadrics QsNet

Cray X1

Velio 3003

IBM HPS

SGI Altix 3000

Cray XT3

YARC

Page 5: EE382C Lecture 4 - Stanford Universitycva.stanford.edu/classes/ee382c/lectures/L4.pdf · EE 382C - S11 - Lecture 4 1 EE382C Lecture 4 ... g 288 648 1352 2048 group 2139 2625 2601

EE 382C - S11 - Lecture 4

Router bandwidth

Router

Router

5

Page 6: EE382C Lecture 4 - Stanford Universitycva.stanford.edu/classes/ee382c/lectures/L4.pdf · EE 382C - S11 - Lecture 4 1 EE382C Lecture 4 ... g 288 648 1352 2048 group 2139 2625 2601

EE 382C - S11 - Lecture 4

Router

As bandwidth increases …

Router

6

Page 7: EE382C Lecture 4 - Stanford Universitycva.stanford.edu/classes/ee382c/lectures/L4.pdf · EE 382C - S11 - Lecture 4 1 EE382C Lecture 4 ... g 288 648 1352 2048 group 2139 2625 2601

EE 382C - S11 - Lecture 4

Router

Router

Low-Radix vs. High-Radix Router

Router

Low-radix (small number of fat ports) High-radix (large number of skinny ports)

7

Page 8: EE382C Lecture 4 - Stanford Universitycva.stanford.edu/classes/ee382c/lectures/L4.pdf · EE 382C - S11 - Lecture 4 1 EE382C Lecture 4 ... g 288 648 1352 2048 group 2139 2625 2601

EE 382C - S11 - Lecture 4

Latency vs. Radix

0

50

100

150

200

250

300

0 50 100 150 200 250

radix

late

nc

y (

nse

c)

2003 technology 2010 technology

Optimal radix ~ 40

Optimal radix ~ 128

Serialization latency increases

Header latency

decreases

Page 9: EE382C Lecture 4 - Stanford Universitycva.stanford.edu/classes/ee382c/lectures/L4.pdf · EE 382C - S11 - Lecture 4 1 EE382C Lecture 4 ... g 288 648 1352 2048 group 2139 2625 2601

EE 382C - S11 - Lecture 4

Determining Optimal Radix

Latency = Header Latency + Serialization Latency

= H tr + L / b

= 2trlogkN + 2kL / B

Optimal radix

k log2 k = (B tr log N) / L

where k = radixB = total BandwidthN = # of nodesL = message size

Aspect Ratio

9

Page 10: EE382C Lecture 4 - Stanford Universitycva.stanford.edu/classes/ee382c/lectures/L4.pdf · EE 382C - S11 - Lecture 4 1 EE382C Lecture 4 ... g 288 648 1352 2048 group 2139 2625 2601

EE 382C - S11 - Lecture 4

Higher Aspect Ratio, Higher Optimal Radix

1996

2003

2010

1991

1

10

100

1000

10 100 1000 10000

Aspect Ratio

Op

tim

al R

ad

ix (

k)

10

Page 11: EE382C Lecture 4 - Stanford Universitycva.stanford.edu/classes/ee382c/lectures/L4.pdf · EE 382C - S11 - Lecture 4 1 EE382C Lecture 4 ... g 288 648 1352 2048 group 2139 2625 2601

KN – The Ultimate High-Radix Network

EE 382C - S11 - Lecture 4 11

Page 12: EE382C Lecture 4 - Stanford Universitycva.stanford.edu/classes/ee382c/lectures/L4.pdf · EE 382C - S11 - Lecture 4 1 EE382C Lecture 4 ... g 288 648 1352 2048 group 2139 2625 2601

High-Radix Butterfly

• Just build a butterfly with large k

• For k = 128

– 128 in 1 stage

– 16K in 2 stages

– 2M in 3 stages

• But – vulnerable to adversarial

traffic

EE 382C - S11 - Lecture 4 12

Page 13: EE382C Lecture 4 - Stanford Universitycva.stanford.edu/classes/ee382c/lectures/L4.pdf · EE 382C - S11 - Lecture 4 1 EE382C Lecture 4 ... g 288 648 1352 2048 group 2139 2625 2601

High-Radix Clos

• Not vulnerable to adversarial traffic,

• But twice the number of stages – even on benign traffic.

EE 382C - S11 - Lecture 4 13

Page 14: EE382C Lecture 4 - Stanford Universitycva.stanford.edu/classes/ee382c/lectures/L4.pdf · EE 382C - S11 - Lecture 4 1 EE382C Lecture 4 ... g 288 648 1352 2048 group 2139 2625 2601

High-Radix Interconnection Networks

Flattened Butterfly

R0'

R1'

R2'

R3'

R4'

R5'

R6'

R7'

I0

I1

I2

I3

I4

I5

I6

I7

I8

I9

I10

I11

I12

I13

I14

I15

O0

O1

O2

O3

O4

O5

O6

O7

O8

O9

O10

O11

O12

O13

O14

O15

{ {{{

dimension 1

{

dimension 2

{

dimension 3

dimension 1

dimension 2

I0

I1

I2

I3

I4

I5

I6

I7

R0_3R0_2O0

O1

O2

O3

O4

O5

O5

O7

I8

I9

I10

I11

I12

I13

I14

I15

O8

O9

O10

O11

O12

O13

O14

O15

R0_1R0_0

Page 15: EE382C Lecture 4 - Stanford Universitycva.stanford.edu/classes/ee382c/lectures/L4.pdf · EE 382C - S11 - Lecture 4 1 EE382C Lecture 4 ... g 288 648 1352 2048 group 2139 2625 2601

Dragonfly Topology

Dragonfly Topology

R2 Rn

-1

R1 R0 Rn

-2

Interconnection Network

intra-group interconnection

network

R0 R1 Ra

-1

G0 G1 Gg-1Gg-1

Inter-group Interconnection Network

Page 16: EE382C Lecture 4 - Stanford Universitycva.stanford.edu/classes/ee382c/lectures/L4.pdf · EE 382C - S11 - Lecture 4 1 EE382C Lecture 4 ... g 288 648 1352 2048 group 2139 2625 2601

Dragonfly Topology

Dragonfly Topology Example

P0 P1

R0

P2 P3

R1

P4 P5

R2

P6 P7

R3

P24 P25 P26 P27 P28 P29 P30 P31

P8 P9

R4

P10 P11

R5

P12 P13

R6

P14 P15

R7

R1

2

R1

3

R1

4

R1

5

G2

G4 G5 G6 G7 G8

G3

G0 G1

Page 17: EE382C Lecture 4 - Stanford Universitycva.stanford.edu/classes/ee382c/lectures/L4.pdf · EE 382C - S11 - Lecture 4 1 EE382C Lecture 4 ... g 288 648 1352 2048 group 2139 2625 2601

Dragonfly by the numbers

• Consider k=128

• Divide into g, l, p

– p processors per router

– l connections to other routers in same group

– g global connections – to other groups

• Design method

– Pick g

– Let l = 2g-1

– Let p = 128-2g

– Compute concentration factor c = p/g

EE 382C - S11 - Lecture 4 17

Page 18: EE382C Lecture 4 - Stanford Universitycva.stanford.edu/classes/ee382c/lectures/L4.pdf · EE 382C - S11 - Lecture 4 1 EE382C Lecture 4 ... g 288 648 1352 2048 group 2139 2625 2601

An Example

d 128 128 128 128

p 93 75 51 33

l 23 35 51 63

g 12 18 26 32

p/g 7.75 4.17 1.96 1.03

(l+1)g 288 648 1352 2048

group 2139 2625 2601 2079

N 618,171 1,703,625 3,519,153 4,259,871

EE 382C - S11 - Lecture 4 18

Page 19: EE382C Lecture 4 - Stanford Universitycva.stanford.edu/classes/ee382c/lectures/L4.pdf · EE 382C - S11 - Lecture 4 1 EE382C Lecture 4 ... g 288 648 1352 2048 group 2139 2625 2601

EE 382C - S11 - Lecture 4 19

Non-Blocking Networks

Page 20: EE382C Lecture 4 - Stanford Universitycva.stanford.edu/classes/ee382c/lectures/L4.pdf · EE 382C - S11 - Lecture 4 1 EE382C Lecture 4 ... g 288 648 1352 2048 group 2139 2625 2601

EE 382C - S11 - Lecture 4 20

Non-blocking networks

• Non-blocking: able to connect any unconnected input to

any unconnected output.

• Circuit switching vs. packet switching

• Strictly vs. rearrangeably non-blocking

• Non-interfering networks

Page 21: EE382C Lecture 4 - Stanford Universitycva.stanford.edu/classes/ee382c/lectures/L4.pdf · EE 382C - S11 - Lecture 4 1 EE382C Lecture 4 ... g 288 648 1352 2048 group 2139 2625 2601

EE 382C - S11 - Lecture 4 21

Crossbar Switch

in0

in1

in2

in3

out0

out1

out2

out3

out4

crosspointinput line

output line

Page 22: EE382C Lecture 4 - Stanford Universitycva.stanford.edu/classes/ee382c/lectures/L4.pdf · EE 382C - S11 - Lecture 4 1 EE382C Lecture 4 ... g 288 648 1352 2048 group 2139 2625 2601

Old Mechanical Crossbar

EE 382C - S11 - Lecture 4 22

Page 23: EE382C Lecture 4 - Stanford Universitycva.stanford.edu/classes/ee382c/lectures/L4.pdf · EE 382C - S11 - Lecture 4 1 EE382C Lecture 4 ... g 288 648 1352 2048 group 2139 2625 2601

EE 382C - S11 - Lecture 4 23

Implemented with Multiplexers

in0

in1

in2

in3

out0

out1

out2

out3

out4

1 0 3 1 2

Page 24: EE382C Lecture 4 - Stanford Universitycva.stanford.edu/classes/ee382c/lectures/L4.pdf · EE 382C - S11 - Lecture 4 1 EE382C Lecture 4 ... g 288 648 1352 2048 group 2139 2625 2601

EE 382C - S11 - Lecture 4 24

Crossbar Expansion

in0

in(n-1)

inn

in(2n-1)

out0

ou

t(n-1

)

ou

tn

ou

t(2n

-1)

Page 25: EE382C Lecture 4 - Stanford Universitycva.stanford.edu/classes/ee382c/lectures/L4.pdf · EE 382C - S11 - Lecture 4 1 EE382C Lecture 4 ... g 288 648 1352 2048 group 2139 2625 2601

EE 382C - S11 - Lecture 4 25

Clos Networks

Page 26: EE382C Lecture 4 - Stanford Universitycva.stanford.edu/classes/ee382c/lectures/L4.pdf · EE 382C - S11 - Lecture 4 1 EE382C Lecture 4 ... g 288 648 1352 2048 group 2139 2625 2601

EE 382C - S11 - Lecture 4 26

Basic Clos Structure (3,3,4)

n=3 ports

per switch

middle

switch 1

4x4

m=3 r x r

middle switches

middle

switch 2

4x4

middle

switch 3

4x4

input

switch 1

3x3

1.1

1.2

1.3

r=4 n x m

input switches

input

switch 2

3x3

2.1

2.2

2.3

input

switch 3

3x3

3.1

3.2

3.3

input

switch 4

3x3

4.1

4.2

4.3

output

switch 1

3x3

1.1

1.2

1.3

r=4 m x n

output switches

output

switch 2

3x3

2.1

2.2

2.3

output

switch 3

3x3

3.1

3.2

3.3

output

switch 4

3x3

4.1

4.2

4.3

Page 27: EE382C Lecture 4 - Stanford Universitycva.stanford.edu/classes/ee382c/lectures/L4.pdf · EE 382C - S11 - Lecture 4 1 EE382C Lecture 4 ... g 288 648 1352 2048 group 2139 2625 2601

EE 382C - S11 - Lecture 4 27

Simple Clos (2,2,2)

a1

a2

b1

b2

1.1

1.2

1.1

1.2

2.1

2.2

2.1

2.2

• Route

– 1.1 to 1.1

– 2.2 to 2.2

– 1.2 to 2.1

– 2.1 to 1.2

Page 28: EE382C Lecture 4 - Stanford Universitycva.stanford.edu/classes/ee382c/lectures/L4.pdf · EE 382C - S11 - Lecture 4 1 EE382C Lecture 4 ... g 288 648 1352 2048 group 2139 2625 2601

EE 382C - S11 - Lecture 4 28

Routing example

• Route

– 1.1 to 1.1

– 2.2 to 2.2

– 1.2 to 2.1

– 2.1 to 1.2

a1

a2

b1

b2

1.1

1.2

1.1

1.2

2.1

2.2

2.1

2.2

Page 29: EE382C Lecture 4 - Stanford Universitycva.stanford.edu/classes/ee382c/lectures/L4.pdf · EE 382C - S11 - Lecture 4 1 EE382C Lecture 4 ... g 288 648 1352 2048 group 2139 2625 2601

EE 382C - S11 - Lecture 4 29

Edge Coloring

a1

a2

b1

b2

Page 30: EE382C Lecture 4 - Stanford Universitycva.stanford.edu/classes/ee382c/lectures/L4.pdf · EE 382C - S11 - Lecture 4 1 EE382C Lecture 4 ... g 288 648 1352 2048 group 2139 2625 2601

EE 382C - S11 - Lecture 4 30

Edge Coloring

• Route

– 1.1 to 1.1

– 2.2 to 2.2

– 1.2 to 2.1

– 2.1 to 1.2

a1

a2

b1

b2

Page 31: EE382C Lecture 4 - Stanford Universitycva.stanford.edu/classes/ee382c/lectures/L4.pdf · EE 382C - S11 - Lecture 4 1 EE382C Lecture 4 ... g 288 648 1352 2048 group 2139 2625 2601

EE 382C - S11 - Lecture 4 31

Rearrangement

• Route

– 1.1 to 1.1

– 2.2 to 2.2

– 1.2 to 2.1

– 2.1 to 1.2

• Route next call violating rule

a1

a2

b1

b2

Page 32: EE382C Lecture 4 - Stanford Universitycva.stanford.edu/classes/ee382c/lectures/L4.pdf · EE 382C - S11 - Lecture 4 1 EE382C Lecture 4 ... g 288 648 1352 2048 group 2139 2625 2601

EE 382C - S11 - Lecture 4 32

Rearrangement

• Route

– 1.1 to 1.1

– 2.2 to 2.2

– 1.2 to 2.1

– 2.1 to 1.2

• Route next call violating rule

• Fix color conflict at b2 by flipping

other edge

a1

a2

b1

b2

Page 33: EE382C Lecture 4 - Stanford Universitycva.stanford.edu/classes/ee382c/lectures/L4.pdf · EE 382C - S11 - Lecture 4 1 EE382C Lecture 4 ... g 288 648 1352 2048 group 2139 2625 2601

EE 382C - S11 - Lecture 4 33

Rearrangement

• Route

– 1.1 to 1.1

– 2.2 to 2.2

– 1.2 to 2.1

– 2.1 to 1.2

• Route next call violating rule

• Fix color conflict at b2 by flipping

other edge

• Final circuit no longer conflicts

a1

a2

b1

b2

Page 34: EE382C Lecture 4 - Stanford Universitycva.stanford.edu/classes/ee382c/lectures/L4.pdf · EE 382C - S11 - Lecture 4 1 EE382C Lecture 4 ... g 288 648 1352 2048 group 2139 2625 2601

EE 382C - S11 - Lecture 4 34

Rearrangement

a1

a2

b1

b2

a1

a2

b1

b2

1.1

1.2

1.1

1.2

2.1

2.2

2.1

2.2

Page 35: EE382C Lecture 4 - Stanford Universitycva.stanford.edu/classes/ee382c/lectures/L4.pdf · EE 382C - S11 - Lecture 4 1 EE382C Lecture 4 ... g 288 648 1352 2048 group 2139 2625 2601

EE 382C - S11 - Lecture 4 35

a1

a2

b1

b2

Strictly non-blocking

a1

a2

b1

b2

1.1

1.2

1.1

1.2

2.1

2.2

2.1

2.2

Page 36: EE382C Lecture 4 - Stanford Universitycva.stanford.edu/classes/ee382c/lectures/L4.pdf · EE 382C - S11 - Lecture 4 1 EE382C Lecture 4 ... g 288 648 1352 2048 group 2139 2625 2601

EE 382C - S11 - Lecture 4 36

Strictly non-blocking Routing

• Number of stages

• Vector of free middle stages at every input and output

switch

• E.g. with 15 middle stages:

– Input i : 0 0 0 1 0 0 0 1 0 1 1 1 1 1 1

– Output j : 0 1 1 0 1 1 1 0 1 0 0 0 0 1 1

---------------------------------------

Page 37: EE382C Lecture 4 - Stanford Universitycva.stanford.edu/classes/ee382c/lectures/L4.pdf · EE 382C - S11 - Lecture 4 1 EE382C Lecture 4 ... g 288 648 1352 2048 group 2139 2625 2601

EE 382C - S11 - Lecture 4 37

3D View of a Clos Network

r - n x m

input switches

m - r x r

middle switches

r - m x n

output switches

Page 38: EE382C Lecture 4 - Stanford Universitycva.stanford.edu/classes/ee382c/lectures/L4.pdf · EE 382C - S11 - Lecture 4 1 EE382C Lecture 4 ... g 288 648 1352 2048 group 2139 2625 2601

EE 382C - S11 - Lecture 4 38

Rearrangement Example

Page 39: EE382C Lecture 4 - Stanford Universitycva.stanford.edu/classes/ee382c/lectures/L4.pdf · EE 382C - S11 - Lecture 4 1 EE382C Lecture 4 ... g 288 648 1352 2048 group 2139 2625 2601

EE 382C - S11 - Lecture 4 39

(3,3,4) Routing Problem

I1

I2

I3

I4

O1

O2

O3

O4

Page 40: EE382C Lecture 4 - Stanford Universitycva.stanford.edu/classes/ee382c/lectures/L4.pdf · EE 382C - S11 - Lecture 4 1 EE382C Lecture 4 ... g 288 648 1352 2048 group 2139 2625 2601

EE 382C - S11 - Lecture 4 40

Before Setting Up (2,4)

1

2

3I1

I2

I3

I4

O1

O2

O3

O4

Page 41: EE382C Lecture 4 - Stanford Universitycva.stanford.edu/classes/ee382c/lectures/L4.pdf · EE 382C - S11 - Lecture 4 1 EE382C Lecture 4 ... g 288 648 1352 2048 group 2139 2625 2601

EE 382C - S11 - Lecture 4 41

Connect (2,4) using RED

I1

I2

I3

I4

O1

O2

O3

O4

Page 42: EE382C Lecture 4 - Stanford Universitycva.stanford.edu/classes/ee382c/lectures/L4.pdf · EE 382C - S11 - Lecture 4 1 EE382C Lecture 4 ... g 288 648 1352 2048 group 2139 2625 2601

EE 382C - S11 - Lecture 4 42

Switch (1,4) to BLUE

I1

I2

I3

I4

O1

O2

O3

O4

The red link being

moved cannot lead

back to i2 since i2’s

red link is already

accounted for.

Page 43: EE382C Lecture 4 - Stanford Universitycva.stanford.edu/classes/ee382c/lectures/L4.pdf · EE 382C - S11 - Lecture 4 1 EE382C Lecture 4 ... g 288 648 1352 2048 group 2139 2625 2601

EE 382C - S11 - Lecture 4 43

Connect (4,3) using BLUE

I1

I2

I3

I4

O1

O2

O3

O4

Page 44: EE382C Lecture 4 - Stanford Universitycva.stanford.edu/classes/ee382c/lectures/L4.pdf · EE 382C - S11 - Lecture 4 1 EE382C Lecture 4 ... g 288 648 1352 2048 group 2139 2625 2601

EE 382C - S11 - Lecture 4 44

Chain of calls to be switched

BLUE/RED

I1

I2

I3

I4

O1

O2

O3

O4

Chain is guaranteed

to be acyclic since

input blue and output

red links of previous

nodes are already

accounted for.

Page 45: EE382C Lecture 4 - Stanford Universitycva.stanford.edu/classes/ee382c/lectures/L4.pdf · EE 382C - S11 - Lecture 4 1 EE382C Lecture 4 ... g 288 648 1352 2048 group 2139 2625 2601

EE 382C - S11 - Lecture 4 45

After Switching 5 calls

I1

I2

I3

I4

O1

O2

O3

O4

Page 46: EE382C Lecture 4 - Stanford Universitycva.stanford.edu/classes/ee382c/lectures/L4.pdf · EE 382C - S11 - Lecture 4 1 EE382C Lecture 4 ... g 288 648 1352 2048 group 2139 2625 2601

EE 382C - S11 - Lecture 4 46

Complete Colored Graph

I1

I2

I3

I4

O1

O2

O3

O4

Page 47: EE382C Lecture 4 - Stanford Universitycva.stanford.edu/classes/ee382c/lectures/L4.pdf · EE 382C - S11 - Lecture 4 1 EE382C Lecture 4 ... g 288 648 1352 2048 group 2139 2625 2601

EE 382C - S11 - Lecture 4 47

Question of the day

• What topology has an average hop count, Havg for load-

balanced traffic that is close to the logd/2N bound and is

able to route arbitrary traffic with an Hmax of twice this

amount?

Page 48: EE382C Lecture 4 - Stanford Universitycva.stanford.edu/classes/ee382c/lectures/L4.pdf · EE 382C - S11 - Lecture 4 1 EE382C Lecture 4 ... g 288 648 1352 2048 group 2139 2625 2601

EE 382C - S11 - Lecture 4 48

Flattened Butterfly Topology

Page 49: EE382C Lecture 4 - Stanford Universitycva.stanford.edu/classes/ee382c/lectures/L4.pdf · EE 382C - S11 - Lecture 4 1 EE382C Lecture 4 ... g 288 648 1352 2048 group 2139 2625 2601

EE 382C - S11 - Lecture 4 49

Non-Blocking Summary

• Non blocking

– Can connect any unconnected input to any unconnected output

• Strictly non-blocking: without moving any other connections

• Rearrangeably non-blocking: may require moving other connections

– Applies to circuit switching

• For packet switching you usually want a non-interfering network

• Crossbar

– Trivially non-blocking

– Also has stiff backpressure

• Clos network

– 3 stages of crossbars – each switch of stage i connects to all switches of stage i+1

– (m,n,r)

– Strictly non blocking if m 2n-1, rearrangeable if m n

– Schedule with the looping algorithm – augmented paths

– Multicast

• Conflict vector representation

• Can fanout in input or middle stage (split calls)