CoNA : Dynamic Application Mapping for Congestion Reduction in Many-Core Systems 2012 IEEE 30th...
1
CoNA: Dynamic Application Mapping for Congestion Reduction in Many-Core Systems
2012 IEEE 30th International Conference on Computer Design (ICCD)
M. Fattah, M. Ramirez, M. Daneshtalab, P. Liljeberg, J. Plosila
2
Outline
Introduction
Mapping Problem and Evaluation Metrics
Contiguous Neighborhood Allocation Mapping
Experimental Setup
Results and Analysis
Conclusion
4
Introduction
An efficient algorithm for the run-time application mapping problem
Three novel contributions
First node selection
First task selection
Mapping of the remaining tasks onto the nearest neighborhood
6
Mapping Problem and Evaluation Metrics
Applications
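$A_p = TG(T, E)$, with $t_i \in T$ and $e_{i,j} \in E$
Communication platform
$AG(\tilde{N}, L)$
$\tilde{n}_{i,j} = (r_{i,j}, pe_{i,j})$, with $\tilde{n}_{i,j} \in \tilde{N}$, $0 \le i < M$, $0 \le j < N$
Manhattan Distance: $MD(\tilde{n}_{i,j}, \tilde{n}_{m,n}) = |i - m| + |j - n|$
Mapping function
$map: T \to \tilde{N}$, s.t. $map(t_i) = \tilde{n}_{m,n}$; $\forall t_i \in T$, $\exists \tilde{n}_{m,n} \in \tilde{N}$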
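The definitions on this slide can be illustrated with a short Python sketch. It is not the authors' code: the helper names, the 4 x 4 mesh, and the sample mapping are assumptions made for illustration only.

```python
from typing import Dict, Iterable, Tuple

Node = Tuple[int, int]  # (i, j) mesh coordinates of a node


def manhattan_distance(a: Node, b: Node) -> int:
    """MD between two nodes: |i - m| + |j - n|."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def is_valid_mapping(mapping: Dict[str, Node], tasks: Iterable[str],
                     M: int, N: int) -> bool:
    """Check that every task is assigned to a node inside the M x N mesh."""
    return all(
        t in mapping and 0 <= mapping[t][0] < M and 0 <= mapping[t][1] < N
        for t in tasks
    )


# Hypothetical example: three tasks mapped onto a 4 x 4 mesh.
tasks = ["t0", "t1", "t2"]
mapping = {"t0": (0, 0), "t1": (0, 1), "t2": (2, 3)}
print(manhattan_distance(mapping["t0"], mapping["t2"]))  # 5
print(is_valid_mapping(mapping, tasks, M=4, N=4))        # True
```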
7
Evaluation Metrics
Packet latency
Average Manhattan Distance
Average Weighted Manhattan Distance
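The formulas for these metrics did not survive in the transcript. The sketch below assumes the most direct reading: the Average Manhattan Distance is the mean distance between the mapped endpoints of each edge, and the weighted variant scales each distance by the edge's communication volume and normalises by the total volume. Both definitions, and all names used, are assumptions.

```python
from typing import Dict, Tuple

Node = Tuple[int, int]
Edge = Tuple[str, str]


def md(a: Node, b: Node) -> int:
    """Manhattan distance between two mesh nodes."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def average_md(edges: Dict[Edge, int], mapping: Dict[str, Node]) -> float:
    """Assumed AMD: mean Manhattan distance over all mapped edges."""
    return sum(md(mapping[s], mapping[d]) for s, d in edges) / len(edges)


def average_weighted_md(edges: Dict[Edge, int], mapping: Dict[str, Node]) -> float:
    """Assumed AWMD: volume-weighted distances divided by the total volume."""
    total = sum(edges.values())
    return sum(w * md(mapping[s], mapping[d]) for (s, d), w in edges.items()) / total


# Hypothetical edges (weights in flits) and mapping.
edges = {("t0", "t1"): 4, ("t1", "t2"): 12}
mapping = {"t0": (0, 0), "t1": (0, 1), "t2": (2, 1)}
print(average_md(edges, mapping))           # 1.5
print(average_weighted_md(edges, mapping))  # 1.75
```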
8
Evaluation Metrics (cont.)
Mapped Region Dispersion
Internal Congestion Ratio (ICR)
The number of an application's edges that share a channel, relative to the application's total number of edges
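One possible reading of the ICR can be sketched as follows. The slides give neither the routing algorithm nor the exact formula, so this example assumes dimension-order routing and counts an edge as congested when its route shares at least one directed channel with another edge of the same application. Every name here is hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

Node = Tuple[int, int]
Channel = Tuple[Node, Node]  # directed link between two adjacent nodes


def route(src: Node, dst: Node) -> List[Channel]:
    """Dimension-order route (assumed): move along j first, then along i."""
    path, (i, j) = [], src
    while j != dst[1]:
        nj = j + (1 if dst[1] > j else -1)
        path.append(((i, j), (i, nj)))
        j = nj
    while i != dst[0]:
        ni = i + (1 if dst[0] > i else -1)
        path.append(((i, j), (ni, j)))
        i = ni
    return path


def internal_congestion_ratio(edges: List[Tuple[str, str]],
                              mapping: Dict[str, Node]) -> float:
    """Fraction of edges whose route shares a channel with another edge."""
    routes = {e: route(mapping[e[0]], mapping[e[1]]) for e in edges}
    usage = Counter(ch for r in routes.values() for ch in r)
    shared = sum(1 for r in routes.values() if any(usage[ch] > 1 for ch in r))
    return shared / len(edges)


# Hypothetical example: t0 -> t2 and t1 -> t2 both use channel (0,1) -> (0,2).
edges = [("t0", "t2"), ("t1", "t2"), ("t3", "t2")]
mapping = {"t0": (0, 0), "t1": (0, 1), "t2": (0, 2), "t3": (1, 2)}
icr = internal_congestion_ratio(edges, mapping)
print(icr)
```

In this example two of the three edges share a channel, so the ratio is 2/3.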
10
Contiguous Neighborhood Allocation Mapping (CoNA)
Three steps
First node selection
Choosing the first task of the application
Contiguous neighborhood allocation
11
CoNA (cont.)
12
CoNA (cont.)
First node selection
The nearest node to the central manager among the nodes with the largest number of available neighbors
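The selection rule above can be sketched in Python. This is an illustration under assumptions rather than the authors' implementation: neighbors are taken to be the four adjacent mesh nodes, the central manager occupies node (0, 0), and remaining ties are broken by coordinate order.

```python
from typing import Set, Tuple

Node = Tuple[int, int]


def md(a: Node, b: Node) -> int:
    """Manhattan distance between two mesh nodes."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def free_neighbours(node: Node, free: Set[Node]) -> int:
    """Count the free nodes among the four adjacent mesh positions (assumed)."""
    i, j = node
    return sum(n in free for n in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)))


def select_first_node(free: Set[Node], cm: Node) -> Node:
    """Most free neighbours first, then nearest to the central manager."""
    return min(free, key=lambda n: (-free_neighbours(n, free), md(n, cm), n))


# Hypothetical 4 x 4 mesh: the manager and two nearby nodes are occupied.
cm = (0, 0)
occupied = {(0, 0), (0, 1), (1, 0)}
free = {(i, j) for i in range(4) for j in range(4)} - occupied
print(select_first_node(free, cm))  # (1, 2)
```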
13
CoNA (cont.)
Choosing the first task of the application
Selects the task with the largest number of edges
The most intensive communication
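A minimal sketch of this step: count the edges incident to each task and pick the task with the highest count. The edge list reuses the task and predecessor pairs from the deck's traversal example and treats them as the whole task graph, which is an assumption. The function name and the alphabetical tie-break are also hypothetical.

```python
from collections import Counter
from typing import List, Tuple


def select_first_task(edges: List[Tuple[str, str]]) -> str:
    """Return the task with the largest number of incident edges."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    # Iterating in sorted order makes ties resolve to the smallest task name.
    return max(sorted(degree), key=lambda t: degree[t])


edges = [("t1", "t4"), ("t2", "t4"), ("t5", "t4"), ("t0", "t1"), ("t3", "t2")]
print(select_first_task(edges))  # t4
```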
14
CoNA (cont.)
Contiguous neighborhood allocation
The task graph is traversed in breadth-first order; the tasks, paired with their predecessors, are: {(t1, t4), (t2, t4), (t5, t4), (t0, t1), (t3, t2)}
Select the node that fits in the smallest square with the first node
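The traversal and placement can be sketched as below. The breadth-first part reproduces the pair list from the slide. The placement part is an interpretation: it assumes each task goes to the free node nearest its predecessor, and that "smallest square with the first node" can be measured by the Chebyshev distance to the first node. The function names and the 3 x 3 mesh are hypothetical.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

Node = Tuple[int, int]


def bfs_pairs(edges: List[Tuple[str, str]], first_task: str) -> List[Tuple[str, str]]:
    """Breadth-first traversal returning (task, predecessor) pairs."""
    adj: Dict[str, Set[str]] = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    seen, queue, pairs = {first_task}, deque([first_task]), []
    while queue:
        pred = queue.popleft()
        for task in sorted(adj.get(pred, ())):  # sorted for a reproducible order
            if task not in seen:
                seen.add(task)
                pairs.append((task, pred))
                queue.append(task)
    return pairs


def md(a: Node, b: Node) -> int:
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def square(a: Node, b: Node) -> int:
    """Chebyshev distance: grows with the smallest square holding both nodes."""
    return max(abs(a[0] - b[0]), abs(a[1] - b[1]))


def allocate(edges, first_task: str, first_node: Node, free: Set[Node]) -> Dict[str, Node]:
    """Place each task near its predecessor, preferring a compact region."""
    free = set(free) - {first_node}
    mapping = {first_task: first_node}
    for task, pred in bfs_pairs(edges, first_task):
        node = min(free, key=lambda n: (md(n, mapping[pred]), square(n, first_node), n))
        mapping[task] = node
        free.remove(node)
    return mapping


edges = [("t0", "t1"), ("t1", "t4"), ("t2", "t4"), ("t3", "t2"), ("t4", "t5")]
print(bfs_pairs(edges, "t4"))
# [('t1', 't4'), ('t2', 't4'), ('t5', 't4'), ('t0', 't1'), ('t3', 't2')]
mesh = {(i, j) for i in range(3) for j in range(3)}
mapping = allocate(edges, "t4", (1, 1), mesh)
```

With the first task at the centre of the mesh, its three direct successors land on adjacent nodes and the mapped region stays contiguous.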
19
CoNA (cont.)
21
Experimental Setup
NoC platform
Plasma processor
Local memory
DMA controller
Tra-NI interface
Central manager (CM)
The maximum number of applications that could be injected per second into the system is denoted as λfull
22
Experimental Setup (cont.)
Simulation
To extract packet latency
FPGA
To investigate CoNA time complexity
Xilinx ML605
23
Experimental Setup (cont.)
Application set
Task graphs are randomly generated (set1) using the Task graph generator
Number of nodes : 4 – 11
Weight of edges : 4 – 16 flits
The edge weights of all applications are uniformly multiplied by 16 (set16)
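The construction of the two application sets can be mimicked with a small stand-in generator. This does not reproduce the task graph generator tool named on the slide. It only respects the stated ranges (4 to 11 tasks, edge weights of 4 to 16 flits) and the scaling by 16. The tree-shaped graphs, the function names, and the seed are assumptions.

```python
import random
from typing import Dict, Tuple

Edge = Tuple[str, str]


def random_task_graph(rng: random.Random, min_tasks: int = 4, max_tasks: int = 11,
                      min_w: int = 4, max_w: int = 16) -> Tuple[int, Dict[Edge, int]]:
    """Random connected task graph: each new task attaches to an earlier one."""
    n = rng.randint(min_tasks, max_tasks)
    edges: Dict[Edge, int] = {}
    for t in range(1, n):
        pred = rng.randrange(t)
        edges[(f"t{pred}", f"t{t}")] = rng.randint(min_w, max_w)  # weight in flits
    return n, edges


def scale_weights(edges: Dict[Edge, int], factor: int = 16) -> Dict[Edge, int]:
    """Multiply every edge weight by the same factor."""
    return {e: w * factor for e, w in edges.items()}


rng = random.Random(0)  # fixed seed so the sets are reproducible
set1 = [random_task_graph(rng) for _ in range(5)]
set16 = [(n, scale_weights(e)) for n, e in set1]
```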
25
Results and Analysis
Packet latency evaluation
Time complexity evaluation
26
Packet latency evaluation
27
Packet latency evaluation (cont.)
28
Packet latency evaluation (cont.)
29
Packet latency evaluation (cont.)
30
Time complexity evaluation
31
Time complexity evaluation (cont.)
33
Conclusion
An efficient run-time task allocation algorithm is proposed
Reduces internal and external congestion
Three novel contributions
34
Thank you !