A Wormhole Router Design - GitHub Pages

29
2014/5/13 Advanced Processor Technology Group The School of Computer Science A Wormhole Router Design progress report Wei Song 30/07/2009

Transcript of A Wormhole Router Design - GitHub Pages

Page 1: A Wormhole Router Design - GitHub Pages

2014/5/13 Advanced Processor Technology Group

The School of Computer Science

A Wormhole Router Design

progress report

Wei Song

30/07/2009

Page 2: A Wormhole Router Design - GitHub Pages

2014/5/13 Advanced Processor Technology Group

The School of Computer Science

Content

• Motive and Plans

• Wormhole router

– Channel Slicing, motivation

– Lookahead, critical cycle

– Implementation

• XY/Stochastic routing scheme

Page 3: A Wormhole Router Design - GitHub Pages

2014/5/13 Advanced Processor Technology Group

The School of Computer Science

Why a wormhole router now

• Easy

• Smallest cycle period

• Early performance

estimation

• Proof of channel

slicing

wormhole

SDM

Larger

crossbar /

switch network

QoS Fault-tolerance

Larger

route

scheduler

Larger

Input/Output

buffer controller

Page 4: A Wormhole Router Design - GitHub Pages

2014/5/13 Advanced Processor Technology Group

The School of Computer Science

Plan for Router Design (1)

• Wormhole router

– Speed estimation, basic design flow

– Channel slicing, lookahead pipeline

• Spatial Design Multiplex (SDM) router

– Utilizing channel slicing (provide virtual circuit)

– M sub-channels on a port, crossbar*M

– Benes, Clos switch network (ATM)

– Route scheduling in the multi-stage switch

Page 5: A Wormhole Router Design - GitHub Pages

2014/5/13 Advanced Processor Technology Group

The School of Computer Science

Plan for Router Design (2)

• Dynamic Link Allocation

– Allocate idle sub-channels to active virtual

circuits to reduce frame latency

– Arbitration planning, crossbar reconfiguration

and buffer planning

• Fault-tolerance

– Error detection, deadlock recovery, route

scheduling algorithm

Page 6: A Wormhole Router Design - GitHub Pages

2014/5/13 Advanced Processor Technology Group

The School of Computer Science

Plan for Router Design (3)

• QoS

– Virtual circuit is latency and bandwidth

guaranteed (weak if dynamic link allocation is

used)

– Best Effort is a problem

– Priorities for virtual circuit setup (reduce circuit

setup time for high priority services)

Page 7: A Wormhole Router Design - GitHub Pages

2014/5/13 Advanced Processor Technology Group

The School of Computer Science

Content

• Motive and Plans

• Wormhole router

– Channel Slicing, motivation

– Lookahead, critical cycle

– Implementation

• XY/Stochastic routing scheme

Page 8: A Wormhole Router Design - GitHub Pages

2014/5/13 Advanced Processor Technology Group

The School of Computer Science

ChSlice: motive

C C

C C

2-bit

2-bit

CDCD

16d_i d_o

ack_oack_i

8

4

ack

16-bit ack of sub-channels

Advantages: data on all sub-channels are synchronized, ease the time

division multiple access (TDMA) techniques, such as virtual channel and

TDMA

Drawbacks: low speed (66% on CD)

Page 9: A Wormhole Router Design - GitHub Pages

2014/5/13 Advanced Processor Technology Group

The School of Computer Science

ChSlice: implementation

H

H

H

H

H

H

D D

D

D

D

D

D

D D D D T

D D D D D T

D D D DD T

D D D DD T

D D D D D T

D D DD D T

su

b-c

ha

nn

els

time

head routing data

C C2-bit

16

d_i0

ack_i0

C C2-bit

d_i15

ack_i15

d_o0

ack_o0

d_o15

ack_o15

Page 10: A Wormhole Router Design - GitHub Pages

2014/5/13 Advanced Processor Technology Group

The School of Computer Science

ChSlice: conclusion

• Advantage

– fast

• Overhead

– extra controllers

– larger wire-count

• No TDMA techniques but SDM is easy.

Page 11: A Wormhole Router Design - GitHub Pages

2014/5/13 Advanced Processor Technology Group

The School of Computer Science

Content

• Motive and Plans

• Wormhole router

– Channel Slicing, motivation

– Lookahead, critical cycle

– Implementation

• XY/Stochastic routing scheme

Page 12: A Wormhole Router Design - GitHub Pages

2014/5/13 Advanced Processor Technology Group

The School of Computer Science

Lookahead: pipeline style

N

CDN

+1 CD

N+

2 CD

Di

DN

AN

DN+1

AN+1

Do

Ao

Ai

data data

data data

data data

data data

N

CD

N+

1 CD

N+

2 CD

Di

Ai

DN DN+1

Do

ANAN+1

Ao

Di

DN

AN

DN+1

AN+1

Do

Ao

Ai

data data

data data

datadata

data data

Page 13: A Wormhole Router Design - GitHub Pages

2014/5/13 Advanced Processor Technology Group

The School of Computer Science

Lookahead: conclusion

• Advantage

– fast

• Disadvantage

– not QDI

– a small area overhead N

CD

N+

1 CD

N+

2 CD

Di

DN

AN

DN+1

AN+1

Do

Ao

Ai

data data

data data

data data

data data

Page 14: A Wormhole Router Design - GitHub Pages

2014/5/13 Advanced Processor Technology Group

The School of Computer Science

Lookahead: critical cycle

pip

elin

e

pip

elin

e

pip

elin

e

pip

elin

e

pip

elin

e

arbiter

ack from

other outputs

data from

other inputs

crossbar

input buffer output buffer

pip

elin

e

pip

elin

e

pip

elin

e

pip

elin

e

pip

elin

e

arbiter

ack from

other outputs

data from

other inputs

crossbar

input buffer output buffer

lon

g lin

e

be

twe

en

rou

ters

d_i

ack_i

d_o

ack_o

Page 15: A Wormhole Router Design - GitHub Pages

2014/5/13 Advanced Processor Technology Group

The School of Computer Science

Lookahead: implementation

N

CD

N+

1 CDN

+2 CD

N+

1 CD

cro

ssb

ar

Page 16: A Wormhole Router Design - GitHub Pages

2014/5/13 Advanced Processor Technology Group

The School of Computer Science

Content

• Motive and Plans

• Wormhole router

– Channel Slicing, motivation

– Lookahead, critical cycle

– Implementation

• XY/Stochastic routing scheme

Page 17: A Wormhole Router Design - GitHub Pages

2014/5/13 Advanced Processor Technology Group

The School of Computer Science

Router: structure

arbiter

arbiter

5 input

ports

5 output

ports

ctl

ctl

80

16

80

16

80

16

80

16

d_i_0

ack_i_0

d_i_4

ack_i_4

d_o_0

ack_o_0

d_o_4

ack_o_4

Page 18: A Wormhole Router Design - GitHub Pages

2014/5/13 Advanced Processor Technology Group

The School of Computer Science

Router: data path input buffer crossbar output buffer

ip_d

ib_d ic_d

ib_pa

ib_a

ip_a

rt_err

acken

gnt

oc_a

ic_a

op_a

op_d

ob_doc_d

ob_pa

ob_a

eof

3

2

1

0

eof

3

2

1

0

eof

3

2

1

0

ic_da

eof

acki

ip_d+ ib_d+ ic_d+

ip_a+

ip_d-

ib_a+

ib_d-

ip_a-

ic_d-

oc_d+ ob_d+

ob_pa+

op_d+

ob_a+

oc_a+

ic_a+

ib_pa+

ib_a-

oc_d- ob_d-

ob_pa-

oc_a-

ic_a-

ib_pa-

op_d-

op_a+

op_a-ob_a-

ic_da+

ic_da-

Page 19: A Wormhole Router Design - GitHub Pages

2014/5/13 Advanced Processor Technology Group

The School of Computer Science

Router: layout

• Faraday 130nm Technology

• 32-bit, 5 ports, XY routing algorithm

• 0.3x0.3mm (14.3K gates, 0.057mm2)

• Typical corner (25 oC 1.2V)

• Cycle period 1.7 ns (2.35GByte/s per port)

Page 20: A Wormhole Router Design - GitHub Pages

2014/5/13 Advanced Processor Technology Group

The School of Computer Science

ChSlice and Lookahead

Speed: ChSlice 24.1% LH 17.2% Area: ChSlice 23.0% LH 5.3%

Page 21: A Wormhole Router Design - GitHub Pages

2014/5/13 Advanced Processor Technology Group

The School of Computer Science

Compare to other routers

• MANGO: 1.26ns; 0.12um; bundled data

• ANoC: 4ns; 0.13um; 1-of-4

• QNoC: 4.8ns; 0.18um; bundled data

• ASPIN: 0.88ns; 90nm; dual-rail & bundled data

• Our: 1.7ns; 0.13um; 1-of-4 & lookahead

Page 22: A Wormhole Router Design - GitHub Pages

2014/5/13 Advanced Processor Technology Group

The School of Computer Science

Speed vs. data width

QNoC

Wormhole

Page 23: A Wormhole Router Design - GitHub Pages

2014/5/13 Advanced Processor Technology Group

The School of Computer Science

Content

• Motive and Plans

• Wormhole router

– Channel Slicing, motivation

– Lookahead, critical cycle

– Implementation

• XY/Stochastic routing scheme

Page 24: A Wormhole Router Design - GitHub Pages

2014/5/13 Advanced Processor Technology Group

The School of Computer Science

XY/Stochastic

• Motive

– Two routing algorithm is complicated

– The deadlock problem

– The involvement of network interfaces

– Keep router simple

• Solution

– Router: XY

– Network interface: generate, consume, or forward (random)

Page 25: A Wormhole Router Design - GitHub Pages

2014/5/13 Advanced Processor Technology Group

The School of Computer Science

XY/Stochastic (request path)

M M

XY/Stochastic Router Only

Page 26: A Wormhole Router Design - GitHub Pages

2014/5/13 Advanced Processor Technology Group

The School of Computer Science

XY/Stochastic (ack path)

M M

Same Path Single Jump

Page 27: A Wormhole Router Design - GitHub Pages

2014/5/13 Advanced Processor Technology Group

The School of Computer Science

Compare

• Rely on routers – A larger router (Special router design)

– Longer routing overhead

– Deadlocks

– Shorter search time

• XY/Stochastic routing – Smaller router (normal router design)

– Shorter routing time

– Only deadlocks caused by errors

– Longer search time (higher priority by QoS)

Page 28: A Wormhole Router Design - GitHub Pages

2014/5/13 Advanced Processor Technology Group

The School of Computer Science

Conclusion

• Router design plan

– Wormhole router is the first step

• Wormhole router

– Channel slicing

– SDM is better than TDMA for asynchronous

routers

• XY/Stochastic routing

Page 29: A Wormhole Router Design - GitHub Pages

2014/5/13 Advanced Processor Technology Group

The School of Computer Science

Question?