isnoc
-
Upload
anonymous-vass3z0wth -
Category
Documents
-
view
216 -
download
0
Transcript of isnoc
-
8/2/2019 isnoc
1/34
Conference title 1
An Efficient Implementation ofDistributed Routing Algorithms in
NoCs Authors: J. Flich, S. Rodrigo, and J. DuatoParallel Architectures Group
Technical University of Valencia, Spain
-
8/2/2019 isnoc
2/34
An Efficient Implementation of Distributed Routing Algorithms in NoCs NoCs'08 2
Agenda
Introduction
System environment
Description
Evaluation
[Further evaluations]
Conclusions
-
8/2/2019 isnoc
3/34
LBDR: Efficient Routing Implementation in NoCs INA-OCMC'08 3
Introduction
Multi-core arquitectures are becomingmainstream for designing high performanceprocessors
Performance on single-core solutions is limitedby power
The trend is to integrate a large number of coresinside a chip
Need for a high-performance on-chipinterconnect (NoC) to communicate eficientlybetween all chip devices
-
8/2/2019 isnoc
4/34
An Efficient Implementation of Distributed Routing Algorithms in NoCs NoCs'08 4
Introduction (2)
Area, power and delay are the mainconstraints when designing a NoC
Some problems arise:
High integration scale -> communication
reliability issues Fabrication faults
Those problems lead to an irregulartopology still functional
-
8/2/2019 isnoc
5/34
An Efficient Implementation of Distributed Routing Algorithms in NoCs NoCs'08 5
Introduction (3)
Virtualization of the chip is also possible thanks tothe increasing number of cores
Efficient use of resources
Distributing system resources among different tasks
So, the original 2D mesh is partitioned intodifferent irregular topologies.
-
8/2/2019 isnoc
6/34
An Efficient Implementation of Distributed Routing Algorithms in NoCs NoCs'08 6
Introduction (4)
-
8/2/2019 isnoc
7/34
An Efficient Implementation of Distributed Routing Algorithms in NoCs NoCs'08 7
Introduction (5)
To deal with irregular topologies, switches based onforwarding tables are preferred off-chip.
However, on-chip, area, power and delay constraintsare critical as memories do not scale in those terms.
PROPOSAL: LBDR (Logic-Based Distributed Routing) isimplemented to get rid of tables with a minimum logicto allow the use of any distributed routing algorithm.
-
8/2/2019 isnoc
8/34
An Efficient Implementation of Distributed Routing Algorithms in NoCs NoCs'08 8
Agenda
Introduction
System environment
Description
Evaluation
[Further evaluations]
Conclusions
-
8/2/2019 isnoc
9/34
An Efficient Implementation of Distributed Routing Algorithms in NoCs NoCs'08 9
System environment
For LBDR to be applied, some conditions must befulfilled:
Messages routed with X and Y offsets, every switchmust know its own coordinates
Every end node can communicate with other nodethrough a minimal path
LBDR, on the other hand:
There is no restriction to be applied in systems with or
without virtual channel requeriments.
Supports both wormhole and virtual cut-throughswitching
-
8/2/2019 isnoc
10/34
An Efficient Implementation of Distributed Routing Algorithms in NoCs NoCs'08 10
System environment (2)
LBDR is applicable to any routing algorithm thatenforces minimal paths for every source-destination pair:
A deterministic routing algorithm without cyclic
dependencies can be represented by routing restrictions A routing restriction forbids a packet to use two
consecutive channels
-
8/2/2019 isnoc
11/34
An Efficient Implementation of Distributed Routing Algorithms in NoCs NoCs'08 11
System environment (3)
-
8/2/2019 isnoc
12/34
An Efficient Implementation of Distributed Routing Algorithms in NoCs NoCs'08 12
System environment (4)
-
8/2/2019 isnoc
13/34
An Efficient Implementation of Distributed Routing Algorithms in NoCs NoCs'08 13
Agenda
Introduction
System environment
Description
Evaluation
[Further evaluations]
Conclusions
-
8/2/2019 isnoc
14/34
An Efficient Implementation of Distributed Routing Algorithms in NoCs NoCs'08 14
Description
LBDR uses two sets of bits:
Routing bits (Rxy), 2 per each output port
Connectivity bits (Cx), 1 per each output port
The four output ports are labeled as N, E, W and S
-
8/2/2019 isnoc
15/34
An Efficient Implementation of Distributed Routing Algorithms in NoCs NoCs'08 15
Description (2)
-
8/2/2019 isnoc
16/34
An Efficient Implementation of Distributed Routing Algorithms in NoCs NoCs'08 16
Description (3)
-
8/2/2019 isnoc
17/34
An Efficient Implementation of Distributed Routing Algorithms in NoCs NoCs'08 17
Description (4)
1st part of logic:
S'=1, W'=1
N'=0, E'=0
2nd part of logic
S''=0 (Rsw=0)
W''=1 (Rws=1, W'=1, S'=1)
Final
W=1 (Cw=1) -> TOARBITER
-
8/2/2019 isnoc
18/34
An Efficient Implementation of Distributed Routing Algorithms in NoCs NoCs'08 18
Description (5)
1st part of logic:
S'=1
W'=0, N'=0, E'=0
2nd part of logic
S''=1 (S'=1, E'=0, W'=0)
Final
S=1 (Cs=1) -> TO ARBITER
-
8/2/2019 isnoc
19/34
An Efficient Implementation of Distributed Routing Algorithms in NoCs NoCs'08 19
Description (6)
1st part of logic:
S'=1
W'=0, N'=0, E'=0
2nd part of logic
S''=1 (S'=1, E'=0, W'=0)
Final
S=1 (Cs=1) -> TO ARBITER
-
8/2/2019 isnoc
20/34
An Efficient Implementation of Distributed Routing Algorithms in NoCs NoCs'08 20
Description (7)
LBDR has visibility of one hop away -> LBDReexpands visibility to two hops away
LBDRe adds four more bits per ouput port. It is asecond set of routing bits (R2xy), meaning that y
direction can be taken two hops away through thex direction
-
8/2/2019 isnoc
21/34
An Efficient Implementation of Distributed Routing Algorithms in NoCs NoCs'08 21
Description (8)
(*) For further details of the full logic, please refer to the paper
-
8/2/2019 isnoc
22/34
An Efficient Implementation of Distributed Routing Algorithms in NoCs NoCs'08 22
Description (9)
Why LBDRe?
-
8/2/2019 isnoc
23/34
An Efficient Implementation of Distributed Routing Algorithms in NoCs NoCs'08 23
Description (10)
-
8/2/2019 isnoc
24/34
An Efficient Implementation of Distributed Routing Algorithms in NoCs NoCs'08 24
Description (11)
-
8/2/2019 isnoc
25/34
An Efficient Implementation of Distributed Routing Algorithms in NoCs NoCs'08 25
Agenda
Introduction
System environment
Description
Evaluation
[Further evaluations]
Conclusions
-
8/2/2019 isnoc
26/34
An Efficient Implementation of Distributed Routing Algorithms in NoCs NoCs'08 26
Evaluation
NOXIM Simulator Wormhole switching
Input port buffer 4-flit long
Packets 32-flit long
8x8 mesh with different irregular topologies
XY, UD and SRh routing algorithms
-
8/2/2019 isnoc
27/34
An Efficient Implementation of Distributed Routing Algorithms in NoCs NoCs'08 27
Evaluation (2)
Performance achieved for different routingalgorithms on a 2D mesh
-
8/2/2019 isnoc
28/34
An Efficient Implementation of Distributed Routing Algorithms in NoCs NoCs'08 28
Evaluation (3)
Comparison of performance for LBDR and LBDRe
-
8/2/2019 isnoc
29/34
An Efficient Implementation of Distributed Routing Algorithms in NoCs NoCs'08 29
Agenda
Introduction
System environment
Description
Evaluation
[Further evaluations]
Conclusions
-
8/2/2019 isnoc
30/34
An Efficient Implementation of Distributed Routing Algorithms in NoCs NoCs'08 30
Further evaluations
Study on impact on area, power and delayconstraints
Evaluations achieved with much more detailusing Synopsys Design Compiler and 90nm
technology library from TSMC Good expectations. Region-Based Routing(*), with
much more logic implied than LBDR, gets betterresults than implemented tables
(*) Region-Based Routing: An Efficient Routing Mechanism to TackleUnreliable Hardware in Network on Chips, NoCs 2007
-
8/2/2019 isnoc
31/34
An Efficient Implementation of Distributed Routing Algorithms in NoCs NoCs'08 31
Further evaluations (2)
Minimum logic (n x n 2D mesh, d ports): Table-based: n x n x d x d bits
RBR: 4 comparators, 4 registers log2(N)/2 bits, 1 register d+1
bits, 1 register d bits
LBDR: 12 bits per switch (3 per output port), 2 comparators, 2inverters and 5 gates
-
8/2/2019 isnoc
32/34
An Efficient Implementation of Distributed Routing Algorithms in NoCs NoCs'08 32
Agenda
Introduction
System environment
Description
Evaluation
[Further evaluations]
Conclusions
-
8/2/2019 isnoc
33/34
An Efficient Implementation of Distributed Routing Algorithms in NoCs NoCs'08 33
Conclusions
LBDR (and LBDRe) allows for implementing mostof the distributed routing algorithms in suitabletopologies for NoCs.
Future work:
Applicability on system/chip virtualization
Support non-minimal paths
Broadcast
-
8/2/2019 isnoc
34/34
Conference title 34
Thank you.