CoNA : Dynamic Application Mapping for Congestion Reduction in Many-Core Systems 2012 IEEE 30th...

Post on 23-Dec-2015

1

CoNA: Dynamic Application Mapping for Congestion Reduction in Many-Core Systems

2012 IEEE 30th International Conference on Computer Design (ICCD)

M. Fattah, M. Ramirez, M. Daneshtalab, P. Liljeberg, J. Plosila

2

Outline

Introduction

Mapping Problem and Evaluation Metrics

Contiguous Neighborhood Allocation Mapping

Experimental Setup

Results and Analysis

Conclusion

4

Introduction

An efficient algorithm for the run-time application mapping problem

Three novel contributions

First node selection

First task selection

Mapping the remaining tasks onto the nearest neighboring nodes

6

Mapping Problem and Evaluation Metrics

Applications

$A_p = TG(T, E)$, with tasks $t_i \in T$ and edges $e_{i,j} \in E$

Communication platform

$AG(\tilde{N}, L)$

$\tilde{n}_{i,j} = (r_{i,j}, pe_{i,j})$, where $\tilde{n}_{i,j} \in \tilde{N}$, $0 \le i < M$, $0 \le j < N$

Manhattan distance: $MD(\tilde{n}_{i,j}, \tilde{n}_{m,n}) = |i - m| + |j - n|$

Mapping function

$map: T \to \tilde{N}$, s.t. $map(t_i) = \tilde{n}_{m,n}$; $\forall t_i \in T, \exists \tilde{n}_{m,n} \in \tilde{N}$
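The definitions on this slide can be sketched in a few lines of Python. This is only an illustration: the names `Node`, `manhattan_distance` and `is_valid_mapping` are hypothetical, and the "one task per node" check is an assumption that the slide does not state explicitly.

```python
from typing import Dict, Iterable, Tuple

Node = Tuple[int, int]  # (i, j) index of a mesh node, 0 <= i < M, 0 <= j < N


def manhattan_distance(a: Node, b: Node) -> int:
    """MD between two mesh nodes: |i - m| + |j - n|."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def is_valid_mapping(mapping: Dict[str, Node], tasks: Iterable[str],
                     M: int, N: int) -> bool:
    """Check that every task is mapped to some node of the M x N mesh."""
    in_mesh = all(0 <= i < M and 0 <= j < N for i, j in mapping.values())
    all_mapped = set(mapping) == set(tasks)
    # Assumption (not stated on the slide): each node hosts at most one task.
    one_per_node = len(set(mapping.values())) == len(mapping)
    return in_mesh and all_mapped and one_per_node
```

For instance, `manhattan_distance((0, 0), (2, 3))` evaluates to 5, since the two nodes differ by 2 in the first index and by 3 in the second.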

7

Evaluation Metrics

Packet latency

Average Manhattan Distance

Average Weighted Manhattan Distance
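The slide only names the two distance metrics. A minimal sketch, assuming that AMD averages the Manhattan distance over all task-graph edges and that AWMD weights each edge's distance by its communication volume, could look as follows. The function names and the `(src, dst, weight)` edge format are hypothetical.

```python
from typing import Dict, List, Tuple

Node = Tuple[int, int]
Edge = Tuple[str, str, int]  # (source task, destination task, weight in flits)


def manhattan_distance(a: Node, b: Node) -> int:
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def amd(edges: List[Edge], mapping: Dict[str, Node]) -> float:
    """Average Manhattan distance over all edges (assumed definition)."""
    dists = [manhattan_distance(mapping[u], mapping[v]) for u, v, _w in edges]
    return sum(dists) / len(dists)


def awmd(edges: List[Edge], mapping: Dict[str, Node]) -> float:
    """Weight-averaged Manhattan distance over all edges (assumed definition)."""
    total_weight = sum(w for _u, _v, w in edges)
    weighted = sum(w * manhattan_distance(mapping[u], mapping[v])
                   for u, v, w in edges)
    return weighted / total_weight


edges = [("t0", "t1", 4), ("t1", "t2", 16)]
mapping = {"t0": (0, 0), "t1": (0, 1), "t2": (2, 1)}
print(amd(edges, mapping))   # (1 + 2) / 2 = 1.5
print(awmd(edges, mapping))  # (4*1 + 16*2) / 20 = 1.8
```

In this example the heavy edge is also the long one, so AWMD is larger than AMD; a mapping that keeps heavy edges short pushes AWMD below AMD.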

8

Evaluation Metrics (cont.)

Mapped Region Dispersion

Internal Congestion Ratio (ICR)

The number of an application's edges that share a channel with another of its edges, relative to its total number of edges
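Both metrics on this slide can be illustrated with a short sketch. Everything beyond the slide text is an assumption: mapped region dispersion is taken to be the average pairwise Manhattan distance between the nodes of the mapped region, ICR is taken to be the fraction of edges whose route shares at least one directed link with another edge of the same application, and routes follow a dimension-ordered scheme (second index first, then first index). All function names are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Tuple

Node = Tuple[int, int]
Link = Tuple[Node, Node]  # directed channel between two adjacent nodes


def manhattan_distance(a: Node, b: Node) -> int:
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def mapped_region_dispersion(nodes: List[Node]) -> float:
    """Average pairwise MD between the nodes of a region (assumed definition)."""
    pairs = list(combinations(nodes, 2))
    return sum(manhattan_distance(a, b) for a, b in pairs) / len(pairs)


def route(src: Node, dst: Node) -> List[Link]:
    """Dimension-ordered route: walk along j first, then along i (assumption)."""
    links = []
    i, j = src
    while j != dst[1]:
        nj = j + (1 if dst[1] > j else -1)
        links.append(((i, j), (i, nj)))
        j = nj
    while i != dst[0]:
        ni = i + (1 if dst[0] > i else -1)
        links.append(((i, j), (ni, j)))
        i = ni
    return links


def internal_congestion_ratio(edges: List[Tuple[str, str]],
                              mapping: Dict[str, Node]) -> float:
    """Fraction of edges sharing a directed link with another edge."""
    routes = [set(route(mapping[u], mapping[v])) for u, v in edges]
    usage = Counter(link for r in routes for link in r)
    congested = sum(1 for r in routes if any(usage[l] > 1 for l in r))
    return congested / len(routes)
```

With tasks `a`, `b`, `c` placed in a row at `(0, 0)`, `(0, 1)`, `(0, 2)`, the edges `a→c` and `b→c` both use the link from `(0, 1)` to `(0, 2)`, while `c→a` travels in the opposite direction, so two of the three edges are counted as congested.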

10

Contiguous Neighborhood Allocation Mapping (CoNA)

Three steps

First node selection

Choosing the first task of the application

Contiguous neighborhood allocation

11

CoNA (cont.)

12

CoNA (cont.)

First node selection

The nearest node to the central manager among the nodes with the largest number of available neighbors
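The selection rule above can be sketched as a two-stage filter. This is a hedged illustration, not the authors' implementation: it assumes a 4-neighborhood (north, south, east, west), represents the free nodes as a set of `(i, j)` tuples, and breaks remaining ties by the smaller tuple. The name `select_first_node` is hypothetical.

```python
from typing import Set, Tuple

Node = Tuple[int, int]


def manhattan_distance(a: Node, b: Node) -> int:
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def select_first_node(free: Set[Node], cm: Node) -> Node:
    """Pick the free node with the most free neighbors, nearest to the CM."""

    def free_neighbors(node: Node) -> int:
        i, j = node
        # Assumption: only the four directly adjacent mesh nodes count.
        candidates = [(i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)]
        return sum(1 for c in candidates if c in free)

    best = max(free_neighbors(n) for n in free)
    candidates = [n for n in free if free_neighbors(n) == best]
    # Among equally good nodes, prefer the one closest to the central manager;
    # the tuple itself is a hypothetical final tie-breaker.
    return min(candidates, key=lambda n: (manhattan_distance(n, cm), n))


free_nodes = {(i, j) for i in range(4) for j in range(4)}
print(select_first_node(free_nodes, cm=(0, 0)))  # (1, 1)
```

On an empty 4 x 4 mesh the four inner nodes each have four free neighbors, and `(1, 1)` is the one closest to a central manager at `(0, 0)`.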

13

CoNA (cont.)

Choosing the first task of the application

Select the task with the largest number of edges

This is the task with the most intensive communication
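Choosing the first task amounts to finding the highest-degree vertex of the task graph. A minimal sketch follows; the edge list is a hypothetical graph chosen to match the t0 to t5 example used on the later slides, and the alphabetical tie-break is an assumption.

```python
from collections import Counter
from typing import List, Tuple


def select_first_task(edges: List[Tuple[str, str]]) -> str:
    """Return the task that appears in the largest number of edges."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    # Sort by descending degree, then by name (hypothetical tie-breaker).
    return min(degree, key=lambda t: (-degree[t], t))


# Hypothetical edge list for illustration only.
example_edges = [("t4", "t1"), ("t4", "t2"), ("t4", "t5"),
                 ("t1", "t0"), ("t2", "t3")]
print(select_first_task(example_edges))  # t4
```

Here `t4` has three edges while every other task has at most two, so it is selected first.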

14

CoNA (cont.)

Contiguous neighborhood allocation

The task graph is traversed in breadth-first order; the resulting list of tasks paired with their predecessors is: {(t1, t4), (t2, t4), (t5, t4), (t0, t1), (t3, t2)}

Among the candidate nodes, select the one that fits in the smallest square with the first node
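The traversal and placement step can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the adjacency list is a hypothetical graph that reproduces the pair list on the slide, each task is placed on the free node nearest to its predecessor's node, and "smallest square with the first node" is interpreted as the smallest square centered on the first node that contains the candidate. The names `bfs_pairs` and `place_tasks` are hypothetical.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

Node = Tuple[int, int]


def manhattan_distance(a: Node, b: Node) -> int:
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def bfs_pairs(adj: Dict[str, List[str]], first_task: str) -> List[Tuple[str, str]]:
    """Breadth-first traversal yielding (task, predecessor) pairs."""
    visited = {first_task}
    queue = deque([first_task])
    pairs = []
    while queue:
        pred = queue.popleft()
        for task in sorted(adj[pred]):
            if task not in visited:
                visited.add(task)
                pairs.append((task, pred))
                queue.append(task)
    return pairs


def place_tasks(pairs: List[Tuple[str, str]], first_task: str,
                first_node: Node, free: Set[Node]) -> Dict[str, Node]:
    """Place each task near its predecessor; raises ValueError if no node is free."""
    free = set(free) - {first_node}
    mapping = {first_task: first_node}
    for task, pred in pairs:
        anchor = mapping[pred]

        def rank(n: Node):
            # Half side length of the square around the first node (assumption).
            square = max(abs(n[0] - first_node[0]), abs(n[1] - first_node[1]))
            return (manhattan_distance(n, anchor), square, n)

        node = min(free, key=rank)
        free.remove(node)
        mapping[task] = node
    return mapping


# Hypothetical adjacency list matching the slide's example order.
adj = {"t4": ["t1", "t2", "t5"], "t1": ["t4", "t0"], "t2": ["t4", "t3"],
       "t5": ["t4"], "t0": ["t1"], "t3": ["t2"]}
pairs = bfs_pairs(adj, "t4")
mesh = {(i, j) for i in range(3) for j in range(3)}
mapping = place_tasks(pairs, "t4", (1, 1), mesh)
```

On an empty 3 x 3 mesh with `t4` at the center, every task in this example ends up directly adjacent to its predecessor, which is the contiguity the algorithm aims for.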

19

CoNA (cont.)

21

Experimental Setup

NoC platform

Plasma processor

Local memory

DMA controller

Tra-NI interface

Central manager (CM)

The maximum number of applications that can be injected into the system per second is denoted $\lambda_{full}$

22

Experimental Setup (cont.)

Simulation

To extract packet latency

FPGA

To investigate CoNA time complexity

Xilinx ML605

23

Experimental Setup (cont.)

Application set

Task graphs are randomly generated with a task graph generator (set1)

Number of nodes: 4–11

Weight of edges: 4–16 flits

The edge weights of the set1 applications are uniformly multiplied by 16 to form set16
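To make the two application sets concrete, the sketch below builds a random task graph within the stated ranges and scales its weights by 16. It does not reproduce the task graph generator used in the experiments: the tree-shaped graph, the function names and the `(src, dst, weight)` edge format are all assumptions made for illustration.

```python
import random
from typing import List, Tuple

Edge = Tuple[int, int, int]  # (source task, destination task, weight in flits)


def random_task_graph(rng: random.Random) -> Tuple[int, List[Edge]]:
    """Random tree with 4-11 tasks and edge weights of 4-16 flits."""
    n = rng.randint(4, 11)
    edges = []
    for task in range(1, n):
        pred = rng.randrange(task)  # attach each task to an earlier one
        edges.append((pred, task, rng.randint(4, 16)))
    return n, edges


def scale_weights(edges: List[Edge], factor: int = 16) -> List[Edge]:
    """Multiply every edge weight by the same factor (set1 -> set16)."""
    return [(u, v, w * factor) for u, v, w in edges]


rng = random.Random(0)  # fixed seed so the example is repeatable
n_tasks, set1_edges = random_task_graph(rng)
set16_edges = scale_weights(set1_edges)
```

Scaling keeps the graph structure identical, so set16 differs from set1 only in communication volume, with edge weights between 64 and 256 flits.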

25

Results and Analysis

Packet latency evaluation

Time complexity evaluation

26

Packet latency evaluation

27

Packet latency evaluation (cont.)

28

Packet latency evaluation (cont.)

29

Packet latency evaluation (cont.)

30

Time complexity evaluation

31

Time complexity evaluation (cont.)

33

Conclusion

An efficient run-time task allocation algorithm is proposed

It reduces both internal and external congestion

Three novel contributions

34

Thank you!