The NoX Router
![Page 1: The NoX Router](https://reader036.fdocuments.in/reader036/viewer/2022081502/5681664c550346895dd9c4e7/html5/thumbnails/1.jpg)
Predictive High-Performance Architecture Research Mavens (PHARM), Department of ECE
The NoX Router
Mitchell Hayenga
Mikko Lipasti
![Page 2: The NoX Router](https://reader036.fdocuments.in/reader036/viewer/2022081502/5681664c550346895dd9c4e7/html5/thumbnails/2.jpg)
2/19 The NoX Router, Micro’11
Overview
• New low-latency router technique
  – Don’t arbitrate or speculate! Encode.
• XOR Property: (A^B) ^ B = A
  – Hides arbitration latency
  – Eliminates dead cycles
• The NoX Router
  – Single-cycle/wormhole/mesh implementation
  – Frequency competitive with pure speculative
  – 2.7%-34.4% better ED² on application traces
  – Up to 9.9% better throughput on synthetic traffic
[Figure: router block diagram showing Control, Input Channel, and Switch Fabric]
![Page 3: The NoX Router](https://reader036.fdocuments.in/reader036/viewer/2022081502/5681664c550346895dd9c4e7/html5/thumbnails/3.jpg)
Motivation
• Modern On-Chip Networks
  – Bandwidth Plentiful, Latency Critical
  – Control
    • Complex, Speculative, Critical Path
  – Datapath
    • Fast, Simple, Wire-Dominated
• NoX Tradeoff
  – Marginal increase in datapath complexity
  – Hide control latency
[Figure: Virtual Channel Router Pipeline Evolution. Pipeline diagrams built from the stages BW, RC/NRC, VA, SA, ST, and LT. Additional label: Intel Teraflops Router]
![Page 4: The NoX Router](https://reader036.fdocuments.in/reader036/viewer/2022081502/5681664c550346895dd9c4e7/html5/thumbnails/4.jpg)
Switch Arbitration Techniques
• Non-Speculative
  – Arbitration occurs before switch traversal
• Speculative Switch Traversal [Mullins ISCA 2004]
  – Assume contention doesn’t happen
  – Wasted cycle in the event of contention
    • Arbiter decides what gets sent on the next cycle
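[Figure: switch fabric with control logic, and a timing diagram (clk, port 0, port 1, grant, valid out, data out; cycles 0 to 4) for the No Contention and Contention cases (A Wins, B Wins). Under contention the speculative output is undefined ("???") for one cycle before A and B are sent.]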
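The cost of a failed speculation can be made concrete with a toy cycle count. The sketch below is a simplified model and is not taken from the slides: it assumes single-flit packets that all arrive in the same cycle for one output port, and that after the first collision the arbiter's grants keep the port busy every cycle. The function names are hypothetical.

```python
def non_speculative_cycles(contenders: int) -> int:
    """Arbitration precedes switch traversal, so every cycle carries a granted flit."""
    return contenders


def speculative_cycles(contenders: int) -> int:
    """All inputs drive the switch blindly in the first cycle.

    With a single requester the speculation succeeds. With several, that
    cycle is wasted and the arbiter's grants then serialize the flits,
    one per cycle (assumption of this toy model).
    """
    wasted = 1 if contenders > 1 else 0
    return wasted + contenders


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(n, non_speculative_cycles(n), speculative_cycles(n))
```

The model counts cycles only; it ignores that the non-speculative design pays for its serial arbitration with a longer clock period.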
![Page 5: The NoX Router](https://reader036.fdocuments.in/reader036/viewer/2022081502/5681664c550346895dd9c4e7/html5/thumbnails/5.jpg)
Switch Arbitration Techniques (continued)
• Encoding
  – Blindly transmit, XOR within switch fabric
  – No contention: data sent unmodified
  – Contention: data sent XOR’d
    • Arbiter decides what was sent
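[Figure: switch fabric with control logic, and a timing diagram (clk, port 0, port 1, grant, valid out, data out; cycles 0 to 4) for the No Contention and Contention cases. Under contention with B winning, the output carries B^A followed by A.]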
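The encoding idea can be sketched in a few lines of Python. This is an illustrative model rather than the authors' implementation: flits are plain integers, one output port is modeled, and `transmit` and `switch_output` are hypothetical names. Each cycle every still-pending input drives the fabric, the output is the XOR of those inputs, and the arbiter's grant (assumed to be computed in parallel) retires one input before the next cycle.

```python
from functools import reduce
from operator import xor


def switch_output(pending_flits):
    """Word seen on the output port: XOR of every flit currently driving it."""
    return reduce(xor, pending_flits, 0)


def transmit(flits, grant_order):
    """Return the sequence of words sent on one output port.

    flits:       dict mapping input port -> flit payload (int)
    grant_order: ports in the order the arbiter grants them
    """
    pending = dict(flits)
    sent = []
    for winner in grant_order:
        sent.append(switch_output(pending.values()))
        del pending[winner]  # the winner's flit is considered delivered
    return sent


if __name__ == "__main__":
    A, B = 0xA5A5, 0x0FF0
    print(transmit({0: A}, [0]))             # no contention: A unmodified
    print(transmit({0: A, 1: B}, [1, 0]))    # contention, B wins: A^B, then A
```

With a single requester the word is sent unmodified; with two requesters the two flits still take only two cycles, so no cycle is wasted.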
![Page 6: The NoX Router](https://reader036.fdocuments.in/reader036/viewer/2022081502/5681664c550346895dd9c4e7/html5/thumbnails/6.jpg)
Receive Logic
• Relies on a simple XOR property
  – (A^B^C) ^ (B^C) = A
• Simple Decode
  – Always able to decode by XORing two sequential values
  – Maintains previous router’s arbitration order/fairness
[Figure: coded flit buffer and receive logic; the incoming sequence A^B^C, B^C, C is decoded by XORing adjacent values]
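The decode step can be sketched as follows. This is a minimal model under stated assumptions: flits are integers, the receiver already knows where a coded burst starts and ends (the slides do not spell out how that is signaled), and `decode` is a hypothetical name. Each received word is XORed with the word that follows it; the last word of a burst was sent alone and is taken as-is.

```python
def decode(received):
    """Recover the original flits from one burst of received words.

    received: words in arrival order, e.g. [A^B^C, B^C, C]
    returns:  flits in the previous router's arbitration order, e.g. [A, B, C]
    """
    if not received:
        return []
    # XOR each word with its successor to cancel the flits still pending.
    flits = [cur ^ nxt for cur, nxt in zip(received, received[1:])]
    # The final word of the burst is already unencoded.
    flits.append(received[-1])
    return flits


if __name__ == "__main__":
    A, B, C = 0x1234, 0xBEEF, 0x0F0F
    print(decode([A ^ B ^ C, B ^ C, C]) == [A, B, C])
```

Because the flits come out in the order the upstream arbiter granted them, the decode preserves that router's arbitration order.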
![Page 7: The NoX Router](https://reader036.fdocuments.in/reader036/viewer/2022081502/5681664c550346895dd9c4e7/html5/thumbnails/7.jpg)
Tradeoffs and Scaling
• Arbitration
  – O(log n) delay for most arbiters
• Decode logic
  – Constant with respect to # of ports
• Switch Fabric
  – XOR delay scales slightly worse than a mux/tristate-based solution
  – Maybe not an issue (control latency)
[Figure: router block diagram showing Control, Input Channel, and Switch Fabric]
![Page 8: The NoX Router](https://reader036.fdocuments.in/reader036/viewer/2022081502/5681664c550346895dd9c4e7/html5/thumbnails/8.jpg)
The NoX Router
• Network of XORs
• Implementation Details
  – 8x8 Mesh, 2mm long 64-bit links
  – Single Cycle (Router+Link)
  – Wormhole
  – Dimension ordered routing
  – Minimally buffered
![Page 9: The NoX Router](https://reader036.fdocuments.in/reader036/viewer/2022081502/5681664c550346895dd9c4e7/html5/thumbnails/9.jpg)
Baseline Designs
• Non-Speculative
  – Serial arbitration & switch logic
  – Long cycle time
  – Efficient link utilization
• Speculative Techniques [Mullins ISCA 2004]
  – Hides arbitration latency
  – Potential for wasted link bandwidth
  – Spec-Fast & Spec-Accurate [Mullins ASP-DAC 2006]
![Page 10: The NoX Router](https://reader036.fdocuments.in/reader036/viewer/2022081502/5681664c550346895dd9c4e7/html5/thumbnails/10.jpg)
Frequency Analysis
• Overheads present in all designs
  – 248ps SRAM delay
  – 98ps link latency

| Architecture    | Clock Period | %     |
|-----------------|--------------|-------|
| Non-Speculative | 0.92 ns      | -     |
| Spec-Fast       | 0.69 ns      | 33.3% |
| Spec-Accurate   | 0.72 ns      | 27.7% |
| NoX             | 0.76 ns      | 21.1% |
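The percentage column is not defined on the slide. It appears consistent with the clock-frequency gain of each design over the non-speculative baseline, that is (0.92 ns / period) - 1. That reading is an assumption; the short check below reproduces the listed values to within rounding (Spec-Accurate comes out at about 27.8% against the listed 27.7%). The function name is hypothetical.

```python
# Clock periods in nanoseconds, as listed in the Frequency Analysis table.
PERIODS_NS = {
    "Non-Speculative": 0.92,
    "Spec-Fast": 0.69,
    "Spec-Accurate": 0.72,
    "NoX": 0.76,
}


def frequency_gain_percent(period_ns, baseline_ns=PERIODS_NS["Non-Speculative"]):
    """Frequency improvement over the baseline, in percent (assumed metric)."""
    return (baseline_ns / period_ns - 1.0) * 100.0


if __name__ == "__main__":
    for name, period in PERIODS_NS.items():
        print(f"{name}: {frequency_gain_percent(period):.1f}%")
```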
![Page 11: The NoX Router](https://reader036.fdocuments.in/reader036/viewer/2022081502/5681664c550346895dd9c4e7/html5/thumbnails/11.jpg)
Synthetic Traffic - Latency
[Figure: two latency plots; x-axis: bandwidth (MB/s/node)]
![Page 12: The NoX Router](https://reader036.fdocuments.in/reader036/viewer/2022081502/5681664c550346895dd9c4e7/html5/thumbnails/12.jpg)
Synthetic Traffic – ED²
[Figure: two ED² plots; x-axis: bandwidth (MB/s/node)]
![Page 13: The NoX Router](https://reader036.fdocuments.in/reader036/viewer/2022081502/5681664c550346895dd9c4e7/html5/thumbnails/13.jpg)
Application Traffic - Latency
![Page 14: The NoX Router](https://reader036.fdocuments.in/reader036/viewer/2022081502/5681664c550346895dd9c4e7/html5/thumbnails/14.jpg)
Application Traffic – ED²
![Page 15: The NoX Router](https://reader036.fdocuments.in/reader036/viewer/2022081502/5681664c550346895dd9c4e7/html5/thumbnails/15.jpg)
Power @ Fixed Bandwidth
• Traffic Pattern
  – Uniform Random
  – 2GB/s/node injection rate
• Spec-Fast saturated
• Switch/Link glitching in speculative
• Marginal additional decode power
[Figure annotation: Decode negligible]
![Page 16: The NoX Router](https://reader036.fdocuments.in/reader036/viewer/2022081502/5681664c550346895dd9c4e7/html5/thumbnails/16.jpg)
Area Floorplanning
[Figure: side-by-side floorplans of the Standard Router and the NoX Router. Block labels: Port 0 to Port 4 (each a 64x4 SRAM), Crossbar, XOR Switch, Decoding and Masking. Dimension labels: 140 µm, 70 µm, 101.0 µm, 102.2 µm, 161.2 µm, 28 µm.]
~17% More Area
![Page 17: The NoX Router](https://reader036.fdocuments.in/reader036/viewer/2022081502/5681664c550346895dd9c4e7/html5/thumbnails/17.jpg)
Going Further
• Input Speedup
  – What if we could drive two values from an input buffer in a single cycle?
  – Final decode step has 2 values available
    • Last packet sees no additional delay from contention at the previous router
• Multi-hop encoded forwarding
  – Don’t decode @ every hop, decode when packets diverge
  – Allow new collisions with the “head” flit
  – Requires additional sideband info
[Figure: switch fabric and flit buffer illustrating input speedup; labels: A^B, B, A, B]
![Page 18: The NoX Router](https://reader036.fdocuments.in/reader036/viewer/2022081502/5681664c550346895dd9c4e7/html5/thumbnails/18.jpg)
Conclusion
• New encoding-based low-latency router technique
  – Hides arbitration latency
  – Comparable frequency to speculative switch traversal techniques
  – Eliminates wasted interconnect bandwidth
  – Promising application to multiple router architectures
![Page 19: The NoX Router](https://reader036.fdocuments.in/reader036/viewer/2022081502/5681664c550346895dd9c4e7/html5/thumbnails/19.jpg)
Thanks – Questions?
![Page 20: The NoX Router](https://reader036.fdocuments.in/reader036/viewer/2022081502/5681664c550346895dd9c4e7/html5/thumbnails/20.jpg)
Virtual Channels
• Future Work
• Physical Channels vs. Virtual Channels
  – VC Router Benefits
    • Dynamic bandwidth sharing (performance)
  – VC Router Negatives
    • Increased arbitration delay (performance)
    • Increased buffer energy (power)
    • Large unified crossbar (area, power)
• Possible but tradeoffs need to be re-evaluated
  – Structuring of input buffers/decode logic
  – VC credit accounting
![Page 21: The NoX Router](https://reader036.fdocuments.in/reader036/viewer/2022081502/5681664c550346895dd9c4e7/html5/thumbnails/21.jpg)
Multi-Flit Support
• Current support is conservative
  – Performs similarly to speculative routers if multi-flit packets collide
  – Not all bad though
    • ~70% of packets are single-flit coherence packets
    • Only head-flit collisions matter
    • Requests all single-flit
• Alternatives
  – Fragment multi-flit packets
  – Provide sufficient buffering space