isnoc

download isnoc

of 34

Transcript of isnoc

  • 8/2/2019 isnoc

    1/34

    Conference title 1

    An Efficient Implementation ofDistributed Routing Algorithms in

    NoCs Authors: J. Flich, S. Rodrigo, and J. DuatoParallel Architectures Group

    Technical University of Valencia, Spain

  • 8/2/2019 isnoc

    2/34

    An Efficient Implementation of Distributed Routing Algorithms in NoCs NoCs'08 2

    Agenda

    Introduction

    System environment

    Description

    Evaluation

    [Further evaluations]

    Conclusions

  • 8/2/2019 isnoc

    3/34

    LBDR: Efficient Routing Implementation in NoCs INA-OCMC'08 3

    Introduction

    Multi-core arquitectures are becomingmainstream for designing high performanceprocessors

    Performance on single-core solutions is limitedby power

    The trend is to integrate a large number of coresinside a chip

    Need for a high-performance on-chipinterconnect (NoC) to communicate eficientlybetween all chip devices

  • 8/2/2019 isnoc

    4/34

    An Efficient Implementation of Distributed Routing Algorithms in NoCs NoCs'08 4

    Introduction (2)

    Area, power and delay are the mainconstraints when designing a NoC

    Some problems arise:

    High integration scale -> communication

    reliability issues Fabrication faults

    Those problems lead to an irregulartopology still functional

  • 8/2/2019 isnoc

    5/34

    An Efficient Implementation of Distributed Routing Algorithms in NoCs NoCs'08 5

    Introduction (3)

    Virtualization of the chip is also possible thanks tothe increasing number of cores

    Efficient use of resources

    Distributing system resources among different tasks

    So, the original 2D mesh is partitioned intodifferent irregular topologies.

  • 8/2/2019 isnoc

    6/34

    An Efficient Implementation of Distributed Routing Algorithms in NoCs NoCs'08 6

    Introduction (4)

  • 8/2/2019 isnoc

    7/34

    An Efficient Implementation of Distributed Routing Algorithms in NoCs NoCs'08 7

    Introduction (5)

    To deal with irregular topologies, switches based onforwarding tables are preferred off-chip.

    However, on-chip, area, power and delay constraintsare critical as memories do not scale in those terms.

    PROPOSAL: LBDR (Logic-Based Distributed Routing) isimplemented to get rid of tables with a minimum logicto allow the use of any distributed routing algorithm.

  • 8/2/2019 isnoc

    8/34

    An Efficient Implementation of Distributed Routing Algorithms in NoCs NoCs'08 8

    Agenda

    Introduction

    System environment

    Description

    Evaluation

    [Further evaluations]

    Conclusions

  • 8/2/2019 isnoc

    9/34

    An Efficient Implementation of Distributed Routing Algorithms in NoCs NoCs'08 9

    System environment

    For LBDR to be applied, some conditions must befulfilled:

    Messages routed with X and Y offsets, every switchmust know its own coordinates

    Every end node can communicate with other nodethrough a minimal path

    LBDR, on the other hand:

    There is no restriction to be applied in systems with or

    without virtual channel requeriments.

    Supports both wormhole and virtual cut-throughswitching

  • 8/2/2019 isnoc

    10/34

    An Efficient Implementation of Distributed Routing Algorithms in NoCs NoCs'08 10

    System environment (2)

    LBDR is applicable to any routing algorithm thatenforces minimal paths for every source-destination pair:

    A deterministic routing algorithm without cyclic

    dependencies can be represented by routing restrictions A routing restriction forbids a packet to use two

    consecutive channels

  • 8/2/2019 isnoc

    11/34

    An Efficient Implementation of Distributed Routing Algorithms in NoCs NoCs'08 11

    System environment (3)

  • 8/2/2019 isnoc

    12/34

    An Efficient Implementation of Distributed Routing Algorithms in NoCs NoCs'08 12

    System environment (4)

  • 8/2/2019 isnoc

    13/34

    An Efficient Implementation of Distributed Routing Algorithms in NoCs NoCs'08 13

    Agenda

    Introduction

    System environment

    Description

    Evaluation

    [Further evaluations]

    Conclusions

  • 8/2/2019 isnoc

    14/34

    An Efficient Implementation of Distributed Routing Algorithms in NoCs NoCs'08 14

    Description

    LBDR uses two sets of bits:

    Routing bits (Rxy), 2 per each output port

    Connectivity bits (Cx), 1 per each output port

    The four output ports are labeled as N, E, W and S

  • 8/2/2019 isnoc

    15/34

    An Efficient Implementation of Distributed Routing Algorithms in NoCs NoCs'08 15

    Description (2)

  • 8/2/2019 isnoc

    16/34

    An Efficient Implementation of Distributed Routing Algorithms in NoCs NoCs'08 16

    Description (3)

  • 8/2/2019 isnoc

    17/34

    An Efficient Implementation of Distributed Routing Algorithms in NoCs NoCs'08 17

    Description (4)

    1st part of logic:

    S'=1, W'=1

    N'=0, E'=0

    2nd part of logic

    S''=0 (Rsw=0)

    W''=1 (Rws=1, W'=1, S'=1)

    Final

    W=1 (Cw=1) -> TOARBITER

  • 8/2/2019 isnoc

    18/34

    An Efficient Implementation of Distributed Routing Algorithms in NoCs NoCs'08 18

    Description (5)

    1st part of logic:

    S'=1

    W'=0, N'=0, E'=0

    2nd part of logic

    S''=1 (S'=1, E'=0, W'=0)

    Final

    S=1 (Cs=1) -> TO ARBITER

  • 8/2/2019 isnoc

    19/34

    An Efficient Implementation of Distributed Routing Algorithms in NoCs NoCs'08 19

    Description (6)

    1st part of logic:

    S'=1

    W'=0, N'=0, E'=0

    2nd part of logic

    S''=1 (S'=1, E'=0, W'=0)

    Final

    S=1 (Cs=1) -> TO ARBITER

  • 8/2/2019 isnoc

    20/34

    An Efficient Implementation of Distributed Routing Algorithms in NoCs NoCs'08 20

    Description (7)

    LBDR has visibility of one hop away -> LBDReexpands visibility to two hops away

    LBDRe adds four more bits per ouput port. It is asecond set of routing bits (R2xy), meaning that y

    direction can be taken two hops away through thex direction

  • 8/2/2019 isnoc

    21/34

    An Efficient Implementation of Distributed Routing Algorithms in NoCs NoCs'08 21

    Description (8)

    (*) For further details of the full logic, please refer to the paper

  • 8/2/2019 isnoc

    22/34

    An Efficient Implementation of Distributed Routing Algorithms in NoCs NoCs'08 22

    Description (9)

    Why LBDRe?

  • 8/2/2019 isnoc

    23/34

    An Efficient Implementation of Distributed Routing Algorithms in NoCs NoCs'08 23

    Description (10)

  • 8/2/2019 isnoc

    24/34

    An Efficient Implementation of Distributed Routing Algorithms in NoCs NoCs'08 24

    Description (11)

  • 8/2/2019 isnoc

    25/34

    An Efficient Implementation of Distributed Routing Algorithms in NoCs NoCs'08 25

    Agenda

    Introduction

    System environment

    Description

    Evaluation

    [Further evaluations]

    Conclusions

  • 8/2/2019 isnoc

    26/34

    An Efficient Implementation of Distributed Routing Algorithms in NoCs NoCs'08 26

    Evaluation

    NOXIM Simulator Wormhole switching

    Input port buffer 4-flit long

    Packets 32-flit long

    8x8 mesh with different irregular topologies

    XY, UD and SRh routing algorithms

  • 8/2/2019 isnoc

    27/34

    An Efficient Implementation of Distributed Routing Algorithms in NoCs NoCs'08 27

    Evaluation (2)

    Performance achieved for different routingalgorithms on a 2D mesh

  • 8/2/2019 isnoc

    28/34

    An Efficient Implementation of Distributed Routing Algorithms in NoCs NoCs'08 28

    Evaluation (3)

    Comparison of performance for LBDR and LBDRe

  • 8/2/2019 isnoc

    29/34

    An Efficient Implementation of Distributed Routing Algorithms in NoCs NoCs'08 29

    Agenda

    Introduction

    System environment

    Description

    Evaluation

    [Further evaluations]

    Conclusions

  • 8/2/2019 isnoc

    30/34

    An Efficient Implementation of Distributed Routing Algorithms in NoCs NoCs'08 30

    Further evaluations

    Study on impact on area, power and delayconstraints

    Evaluations achieved with much more detailusing Synopsys Design Compiler and 90nm

    technology library from TSMC Good expectations. Region-Based Routing(*), with

    much more logic implied than LBDR, gets betterresults than implemented tables

    (*) Region-Based Routing: An Efficient Routing Mechanism to TackleUnreliable Hardware in Network on Chips, NoCs 2007

  • 8/2/2019 isnoc

    31/34

    An Efficient Implementation of Distributed Routing Algorithms in NoCs NoCs'08 31

    Further evaluations (2)

    Minimum logic (n x n 2D mesh, d ports): Table-based: n x n x d x d bits

    RBR: 4 comparators, 4 registers log2(N)/2 bits, 1 register d+1

    bits, 1 register d bits

    LBDR: 12 bits per switch (3 per output port), 2 comparators, 2inverters and 5 gates

  • 8/2/2019 isnoc

    32/34

    An Efficient Implementation of Distributed Routing Algorithms in NoCs NoCs'08 32

    Agenda

    Introduction

    System environment

    Description

    Evaluation

    [Further evaluations]

    Conclusions

  • 8/2/2019 isnoc

    33/34

    An Efficient Implementation of Distributed Routing Algorithms in NoCs NoCs'08 33

    Conclusions

    LBDR (and LBDRe) allows for implementing mostof the distributed routing algorithms in suitabletopologies for NoCs.

    Future work:

    Applicability on system/chip virtualization

    Support non-minimal paths

    Broadcast

  • 8/2/2019 isnoc

    34/34

    Conference title 34

    Thank you.