2014/5/13 Advanced Processor Technology Group
The School of Computer Science
A Wormhole Router Design
progress report
Wei Song
30/07/2009
Content
• Motive and Plans
• Wormhole router
– Channel Slicing, motivation
– Lookahead, critical cycle
– Implementation
• XY/Stochastic routing scheme
Why a wormhole router now
• Easy
• Smallest cycle period
• Early performance estimation
• Proof of channel slicing
[Figure: design roadmap. Labels: wormhole, SDM, QoS, fault-tolerance; larger crossbar / switch network; larger route scheduler; larger input/output buffer controller]
Plan for Router Design (1)
• Wormhole router
– Speed estimation, basic design flow
– Channel slicing, lookahead pipeline
• Spatial Division Multiplexing (SDM) router
– Utilizing channel slicing (provide virtual circuit)
– M sub-channels on a port, crossbar*M
– Benes, Clos switch network (ATM)
– Route scheduling in the multi-stage switch
Plan for Router Design (2)
• Dynamic Link Allocation
– Allocate idle sub-channels to active virtual circuits to reduce frame latency
– Arbitration planning, crossbar reconfiguration and buffer planning
• Fault-tolerance
– Error detection, deadlock recovery, route scheduling algorithm
Plan for Router Design (3)
• QoS
– Virtual circuits have guaranteed latency and bandwidth (the guarantee is weaker if dynamic link allocation is used)
– Best-effort traffic is a problem
– Priorities for virtual circuit setup (reduce circuit setup time for high-priority services)
ChSlice: motive
[Figure: synchronized channel with one shared ack. Labels: C, 2-bit, CD, 16, 8, 4, d_i, d_o, ack_i, ack_o, ack, "16-bit ack of sub-channels"]
Advantages: data on all sub-channels is synchronized, which eases time-division multiple access (TDMA) techniques such as virtual channels
Drawbacks: low speed (66% on completion detection, CD)
ChSlice: implementation
[Figure: channel slicing. 16 independent 2-bit sub-channels built from C-elements, each with its own ack (d_i0/ack_i0 to d_i15/ack_i15 in, d_o0/ack_o0 to d_o15/ack_o15 out). Timing diagram of sub-channels vs. time with symbols H, D and T; phases labelled head, routing, data]
ChSlice: conclusion
• Advantage
– fast
• Overhead
– extra controllers
– larger wire-count
• No TDMA techniques but SDM is easy.
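As a minimal sketch of the slicing idea, assuming each 2-bit sub-channel carries its symbol in a 1-of-4 (one-hot) code, a 32-bit word can be split into 16 sub-channel symbols that are transferred and acknowledged independently. The function names and the least-significant-first slice order are assumptions made for illustration, not taken from the slides.

```python
def slice_word(word, width=32):
    """Split a word into 2-bit slices, least significant slice first (assumed order)."""
    if word < 0 or word >> width:
        raise ValueError("word does not fit in the given width")
    return [(word >> shift) & 0b11 for shift in range(0, width, 2)]


def encode_1of4(symbol):
    """One-hot code: exactly one of four wires is high for a 2-bit symbol."""
    return tuple(1 if wire == symbol else 0 for wire in range(4))


def decode_1of4(wires):
    """Recover the 2-bit symbol; reject anything that is not a valid code word."""
    if len(wires) != 4 or sum(wires) != 1:
        raise ValueError("not a valid 1-of-4 code word")
    return wires.index(1)


def send(word, width=32):
    """Encode one word as a list of independent sub-channel code words."""
    return [encode_1of4(s) for s in slice_word(word, width)]


def receive(sub_channels):
    """Reassemble the word from the sub-channel code words."""
    word = 0
    for index, wires in enumerate(sub_channels):
        word |= decode_1of4(wires) << (2 * index)
    return word


channels = send(0xDEADBEEF)
print(len(channels))            # number of sub-channels for a 32-bit word
print(hex(receive(channels)))   # the original word, reassembled
```

In the sliced channel every entry of `channels` would have its own ack, so no sub-channel waits for completion detection across the whole 32-bit word.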
Lookahead: pipeline style
[Figure: pipeline styles, shown as stage diagrams and data timing diagrams. Stages N, N+1, N+2, each with CD; signals Di, Ai, DN, AN, DN+1, AN+1, Do, Ao]
Lookahead: conclusion
• Advantage
– fast
• Disadvantage
– not QDI
– a small area overhead
[Figure: lookahead pipeline. Stages N, N+1, N+2, each with CD; signals Di, Ai, DN, AN, DN+1, AN+1, Do, Ao; data timing diagram]
Lookahead: critical cycle
[Figure: critical cycle of the router, drawn twice. Blocks: input buffer (pipeline stages), crossbar, arbiter, output buffer (pipeline stages), long line between routers; signals d_i, ack_i, d_o, ack_o, data from other inputs, ack from other outputs]
Lookahead: implementation
[Figure: lookahead implementation around the crossbar. Labels: stages N, N+1, N+2, each with CD; crossbar]
Router: structure
[Figure: router structure. 5 input ports (d_i_0/ack_i_0 to d_i_4/ack_i_4) and 5 output ports (d_o_0/ack_o_0 to d_o_4/ack_o_4), arbiters and ctl blocks; bus widths 80 and 16 on each port]
Router: data path
[Figure: data path through input buffer, crossbar and output buffer, with its signal transition graph. Signals: ip_d, ip_a, ib_d, ib_a, ib_pa, ic_d, ic_a, ic_da, oc_d, oc_a, ob_d, ob_a, ob_pa, op_d, op_a, rt_err, acken, acki, gnt, eof; lanes 0 to 3 plus eof]
Router: layout
• Faraday 130nm Technology
• 32-bit, 5 ports, XY routing algorithm
• 0.3 x 0.3 mm (14.3K gates, 0.057 mm²)
• Typical corner (25 °C, 1.2 V)
• Cycle period 1.7 ns (2.35 GByte/s per port)
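The per-port figure follows from the data width and the cycle period, assuming one full 32-bit word is transferred per cycle. A short check of the arithmetic (the function name is illustrative only):

```python
def port_throughput_gbyte_per_s(data_width_bits, cycle_period_ns):
    """Throughput of one port, assuming one word is transferred per cycle."""
    bytes_per_cycle = data_width_bits / 8
    # Bytes per nanosecond is numerically equal to GByte/s (10^9 bytes per second).
    return bytes_per_cycle / cycle_period_ns


# 32-bit port with a 1.7 ns cycle period, as reported for the layout.
print(round(port_throughput_gbyte_per_s(32, 1.7), 2))  # 2.35
```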
ChSlice and Lookahead
• Speed: ChSlice 24.1%, Lookahead (LH) 17.2%
• Area: ChSlice 23.0%, Lookahead (LH) 5.3%
Compare to other routers
• MANGO: 1.26ns; 0.12um; bundled data
• ANoC: 4ns; 0.13um; 1-of-4
• QNoC: 4.8ns; 0.18um; bundled data
• ASPIN: 0.88ns; 90nm; dual-rail & bundled data
• Ours: 1.7ns; 0.13um; 1-of-4 & lookahead
Speed vs. data width
[Figure: speed vs. data width. Curves: QNoC, Wormhole]
XY/Stochastic
• Motive
– Supporting two routing algorithms in the router is complicated
– The deadlock problem
– The involvement of network interfaces
– Keep router simple
• Solution
– Router: XY
– Network interface: generate, consume, or forward (random)
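A minimal sketch of this split, assuming a 2D mesh with (x, y) coordinates: the router only applies plain XY routing (correct X first, then Y), while the stochastic part lives in the network interface, which either consumes a frame or forwards it to a randomly chosen node. The port names, the direction convention, the `accepts` flag and all function names are hypothetical and not taken from the slides.

```python
import random

# Assumed convention: east/west change x, north/south change y.
MOVES = {"east": (1, 0), "west": (-1, 0), "north": (0, 1), "south": (0, -1)}


def xy_route(current, destination):
    """Router decision: pick the output port using dimension-ordered XY routing."""
    cx, cy = current
    dx, dy = destination
    if dx != cx:
        return "east" if dx > cx else "west"
    if dy != cy:
        return "north" if dy > cy else "south"
    return "local"  # deliver to the attached network interface


def xy_path(source, destination):
    """List every router visited from source to destination."""
    path = [source]
    current = source
    while True:
        port = xy_route(current, destination)
        if port == "local":
            return path
        step_x, step_y = MOVES[port]
        current = (current[0] + step_x, current[1] + step_y)
        path.append(current)


def interface_handle(node, accepts, all_nodes, rng=random):
    """Network interface decision: consume the frame or forward it at random."""
    if accepts:
        return ("consume", node)
    candidates = [n for n in all_nodes if n != node]
    return ("forward", rng.choice(candidates))


mesh = [(x, y) for x in range(3) for y in range(3)]
print(xy_path((0, 0), (2, 1)))
print(interface_handle((2, 1), accepts=False, all_nodes=mesh))
```

Each forward in this sketch is just a new XY-routed frame from the forwarding interface, so the router itself never needs a second routing algorithm.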
XY/Stochastic (request path)
[Figure: request path. Labels: M, M; XY/Stochastic; Router Only]
XY/Stochastic (ack path)
[Figure: ack path. Labels: M, M; Same Path; Single Jump]
Compare
• Rely on routers
– A larger router (special router design)
– Longer routing overhead
– Deadlocks
– Shorter search time
• XY/Stochastic routing
– Smaller router (normal router design)
– Shorter routing time
– Only deadlocks caused by errors
– Longer search time (higher priority by QoS)
Conclusion
• Router design plan
– Wormhole router is the first step
• Wormhole router
– Channel slicing
– SDM is better than TDMA for asynchronous routers
• XY/Stochastic routing
Questions?