Ant Colony Optimization for Mapping, Scheduling and Placing in Reconfigurable Systems
Torino (Italy) – June 25th, 2013
Fabrizio Ferrandi, PierLuca Lanzi, Christian Pilato, Donatella Sciuto
Politecnico di Milano – Dip. di Elettronica, Informazione e Bioingegneria
Antonino Tumeo
Pacific Northwest National Laboratory – Richland, WA, U.S.A.
NASA/ESA Conference on Adaptive Hardware and Systems (AHS-2013)
Torino, Italy – June 25-27, 2013
Christian Pilato – Politecnico di Milano, Italy
Outline
Motivation
Related Work
Preliminaries and Motivation
Proposed Exploration Methodology
Experimental Results
Conclusions and Future Work
Heterogeneous Systems
Mapping and scheduling of partitioned applications are crucial, in particular for heterogeneous MPSoCs
Different design constraints and overheads must be considered to provide feasible and efficient solutions (e.g., limited area for hardware devices, interconnection topology, …)
Constructive methods are definitely required
Ant Colony Optimization is a promising constructive method to produce very efficient solutions for the combined problem
On FPGAs, the possibility of dynamic reconfiguration introduces several challenges that must be taken into account
Reconfigurable Systems
Partial Dynamic Reconfiguration (PDR) allows changing a portion of the FPGA configuration at run time
Reuse of the device area to accelerate more sections of an application
Additional constraints and overheads are introduced: reconfiguration latencies, number of reconfiguration ports and processing elements to drive the reconfiguration
Accurate placement of the hardware components is critical
Concurrent exploration of the design space for mapping, scheduling and placing of the tasks
Related Work
[Niemann and Marwedel 1997] Exact solutions for the combined problem with an ILP formulation on DAGs.
Different heuristic methods have been proposed to approach the problem
[Pilato et al. 2010] Ant Colony Optimization (ACO) has been demonstrated to produce good solutions, limiting the number of unfeasible ones
[Banerjee et al. 2006] Optimization method based on Kernighan-Lin/Fiduccia-Mattheyses (KLFM) adopts heuristics for the scheduling and the placing of the tasks
Reduced exploration of the design space
Target Architectural Template
Generic architectural template composed of processing and communication elements. A valid test case is the following:
A number of pre-defined blocks where the tasks can be placed
Granularity and occupation for each task have to be defined in advance
[Figure: example target architecture with ARM, DSP and PowerPC processors, three local memories, a shared memory, a shared bus, and blocks labeled C0, C1, …, Cn]
Preliminaries: Problem Definition
Job: generic activity (task or communication) to be completed in order to execute the specification
Implementation point: the mode for the execution of a job. It represents a combination of latency and requirements of resources on the related target component
Mapping: assign each job to an admissible implementation point, respecting the architectural constraints (e.g., the limited resources of the components)
Scheduling: determine the order of execution of all the jobs of the specification in terms of priorities
Placing: determine the physical position of all the tasks that have to be executed in hardware
Objective: minimize the overall execution time of the application on the target architecture
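The definitions above can be illustrated with a small data model. This is only a sketch under stated assumptions: the class name `ImplementationPoint`, the function `mapping_is_feasible`, the component names and the single scalar `area` resource are hypothetical and do not come from the slides. The check also ignores the area reuse that partial dynamic reconfiguration would allow.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ImplementationPoint:
    """One execution mode of a job (hypothetical model, not the authors' code)."""
    component: str  # target component, e.g. "ARM", "DSP" or "FPGA"
    latency: int    # execution time of the job on that component
    area: int = 0   # resource requirement; 0 for software implementations


def mapping_is_feasible(mapping, capacity):
    """Return True if the area mapped on each component fits its capacity.

    mapping:  dict job name -> ImplementationPoint
    capacity: dict component name -> available area (missing = unlimited)
    """
    used = {}
    for point in mapping.values():
        used[point.component] = used.get(point.component, 0) + point.area
    return all(area <= capacity.get(comp, float("inf"))
               for comp, area in used.items())


# Toy mapping: two hardware tasks and one software task.
mapping = {
    "t0": ImplementationPoint("FPGA", latency=5, area=60),
    "t1": ImplementationPoint("FPGA", latency=4, area=50),
    "t2": ImplementationPoint("ARM", latency=20),
}
```

With an FPGA capacity of 100 the toy mapping violates the area constraint, while a capacity of 120 accepts it.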
ACO Principles
The Ant Colony Optimization (ACO) heuristic is a constructive approach that limits as much as possible the generation of unfeasible solutions
Constructive approach, based on a decision tree, that generates each part of the solution based on the decisions taken in the previous parts
Analysis and evaluation of different combinations of mapping, scheduling and placing
Decision is based on a combination of local and global information, through a roulette wheel mechanism
Stochastic principles guarantee the exploration
Heuristic principles and feedback guarantee the exploitation of good parts of the solutions
Design Space Exploration with ACO
Initialize pheromones
Prepare N ants
Compute the set C of candidates
Perform a decision
Update the set C of candidates
Evaluate design solution
Update pheromones
[Flowchart: the steps above are grouped into three nested levels labeled ACO, Colony and Ant]
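The flow listed above can be sketched as three nested loops. This is a minimal illustration, not the authors' implementation: the function `aco`, its parameters (`n_ants`, `n_iters`, the evaporation rate `rho`), the reinforcement rule and the toy latency table are all assumptions, and the assignment of each step to the ACO, colony or ant level is a guess from the flowchart labels. Only the pheromone values drive the choice here; a local heuristic is left out for brevity.

```python
import random


def aco(jobs, options, evaluate, n_ants=10, n_iters=30, rho=0.1, seed=0):
    """Toy ACO loop: pick one option per job, minimizing evaluate(trace)."""
    rng = random.Random(seed)
    # Initialize pheromones: one value per (job, option) decision.
    pher = {(j, o): 1.0 for j in jobs for o in options[j]}
    best, best_cost = None, float("inf")
    for _ in range(n_iters):                       # ACO level
        solutions = []
        for _ in range(n_ants):                    # colony level: N ants
            candidates = set(jobs)                 # compute the set C
            trace = []
            while candidates:                      # ant level
                pairs = [(j, o) for j in sorted(candidates) for o in options[j]]
                weights = [pher[p] for p in pairs]
                job, opt = rng.choices(pairs, weights=weights)[0]  # decision
                trace.append((job, opt))
                candidates.remove(job)             # update the set C
            cost = evaluate(trace)                 # evaluate design solution
            solutions.append((cost, trace))
            if cost < best_cost:
                best, best_cost = trace, cost
        # Update pheromones: evaporate, then reinforce the iteration's best ant.
        for p in pher:
            pher[p] *= 1.0 - rho
        it_cost, it_trace = min(solutions, key=lambda s: s[0])
        for p in it_trace:
            pher[p] += 1.0 / it_cost
    return best, best_cost


# Toy problem (hypothetical numbers): total latency of the chosen options.
jobs = ["t0", "t1", "t2"]
options = {"t0": ["ARM", "FPGA"], "t1": ["ARM", "DSP"], "t2": ["DSP", "FPGA"]}
latency = {("t0", "ARM"): 9, ("t0", "FPGA"): 3, ("t1", "ARM"): 5,
           ("t1", "DSP"): 4, ("t2", "DSP"): 7, ("t2", "FPGA"): 2}


def total_latency(trace):
    return sum(latency[p] for p in trace)


best_trace, best_cost = aco(jobs, options, total_latency)
```

The cost must be strictly positive for the `1.0 / it_cost` reinforcement to be defined; a real evaluation would return the execution time produced by a scheduler rather than a plain sum.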
Stochastic Selection Process
At each decision point d, the probability of assigning a candidate job j to an implementation point i is:
\[ p_{d,i,j} = \frac{[G_{d,i,j}]^{\alpha}\,[L_{d,i,j}]^{\beta}}{\sum_{k,n} [G_{d,k,n}]^{\alpha}\,[L_{d,k,n}]^{\beta}} \]
where G is the global heuristic and L is the local heuristic, with exponents α and β weighting the two terms, as is usual in ACO
Global information G: feedback information
Probability that the decision leads to a good solution
Local heuristic L: problem-specific hint
“Adjusted” by the global heuristic if wrong
Roulette wheel and extraction of a combination (i, j)
The ant does not generate the probability if the decision leads to a constraint violation
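The selection step can be sketched as follows, assuming the usual ACO form in which each feasible candidate is weighted by the product of its global and local terms and then normalized. The function names, the `feasible` callback and the exponents `alpha` and `beta` are assumptions made for illustration, not details taken from the slides.

```python
import random


def selection_probabilities(G, L, feasible, alpha=1.0, beta=1.0):
    """Normalize G^alpha * L^beta over the candidates that violate no constraint.

    G, L: dicts mapping a candidate (job, implementation point) to a weight.
    feasible: callable returning False for candidates to be skipped.
    """
    weights = {c: (G[c] ** alpha) * (L[c] ** beta) for c in G if feasible(c)}
    total = sum(weights.values())
    if total == 0:
        return {}
    return {c: w / total for c, w in weights.items()}


def roulette_wheel(probs, rng):
    """Draw one candidate with probability proportional to its share."""
    if not probs:
        raise ValueError("no feasible candidate")
    r = rng.random()
    acc = 0.0
    chosen = None
    for chosen, p in probs.items():
        acc += p
        if r < acc:
            break
    return chosen  # the last candidate absorbs any rounding error


# Hypothetical values: ("t1", "FPGA") is excluded as a constraint violation.
G = {("t0", "ARM"): 1.0, ("t0", "FPGA"): 3.0, ("t1", "FPGA"): 2.0}
L = {("t0", "ARM"): 1.0, ("t0", "FPGA"): 1.0, ("t1", "FPGA"): 1.0}
probs = selection_probabilities(G, L, lambda c: c != ("t1", "FPGA"))
pick = roulette_wheel(probs, random.Random(0))
```

Filtering infeasible candidates before normalization mirrors the statement that no probability is generated for a decision that would violate a constraint.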
Decision Methods for Combined Problem
Scheduling can include both reconfiguration and execution tasks
Execution tasks are eligible only if their dependences are satisfied
Reconfiguration tasks are always available (implicit hardware assignment)
When a reconfiguration task is selected, a candidate choice is also generated with respect to its position in the FPGA
The latency of a reconfiguration task depends on where the task is assigned (i.e., whether the reconfiguration actually takes place or not)
Scheduling of the reconfiguration also takes into account the availability of the reconfiguration port (ICAP) and of the processor driving the reconfiguration
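One possible reading of these rules, as a sketch: execution candidates appear only when their predecessors are done, while each hardware task that is not yet configured contributes one reconfiguration candidate per FPGA position. The function `compute_candidates`, the tuple layout and the `slots` list are hypothetical; reconfiguration latency and ICAP availability are not modeled here.

```python
def compute_candidates(done, deps, hw_tasks, configured, slots):
    """Build the candidate set for the next decision (illustrative only).

    done:       set of tasks already scheduled for execution
    deps:       dict task -> list of predecessor tasks
    hw_tasks:   tasks that have a hardware implementation
    configured: set of hardware tasks whose reconfiguration is already chosen
    slots:      available FPGA positions (pre-defined blocks)
    """
    candidates = []
    # Execution tasks: eligible only if all dependences are satisfied.
    for task, preds in deps.items():
        if task not in done and all(p in done for p in preds):
            candidates.append(("exec", task, None))
    # Reconfiguration tasks: always available, one candidate per position.
    for task in hw_tasks:
        if task not in done and task not in configured:
            for slot in slots:
                candidates.append(("reconf", task, slot))
    return candidates


deps = {"a": [], "b": ["a"], "c": ["a"]}
first = compute_candidates(set(), deps, ["b"], set(), [0, 1])
later = compute_candidates({"a"}, deps, ["b"], {"b"}, [0, 1])
```

In the toy graph, `b` can be reconfigured into either slot before `a` has run, which is what allows reconfiguration to overlap with earlier computation.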
Solution Evaluation for a Task Graph
The decisions performed by the ant give a trace
Sequence of jobs, each assigned to an implementation point
The position in the trace represents the priority for the scheduling (jobs selected earlier have higher priority)
List-based scheduler based on the trace (i.e., the implementation points and the priority values)
Different decisions performed by the ant correspond to exploring different design solutions (combinations of mapping and scheduling)
Return the overall execution time of the application
Feedback to compare different solutions (reinforcement/penalty of the global heuristic for the corresponding decisions)
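The evaluation of a trace can be sketched with a simple list scheduler. This is an assumption-laden illustration rather than the scheduler used in the work: the function `list_schedule` and the latency table are hypothetical, each component runs one job at a time, and communication and reconfiguration jobs are left out.

```python
def list_schedule(trace, deps, latency):
    """Schedule a trace and return (overall execution time, finish times).

    trace:   list of (job, component); earlier position = higher priority
    deps:    dict job -> list of predecessor jobs
    latency: dict (job, component) -> execution time
    """
    comp_of = dict(trace)
    priority = {job: rank for rank, (job, _) in enumerate(trace)}
    finish, free_at = {}, {}
    pending = set(comp_of)
    while pending:
        # Jobs whose predecessors have all been scheduled.
        ready = [j for j in pending
                 if all(p in finish for p in deps.get(j, ()))]
        if not ready:
            raise ValueError("unsatisfiable dependences")
        job = min(ready, key=priority.__getitem__)  # highest priority first
        comp = comp_of[job]
        start = max([free_at.get(comp, 0)]
                    + [finish[p] for p in deps.get(job, ())])
        finish[job] = start + latency[(job, comp)]
        free_at[comp] = finish[job]                 # component busy until then
        pending.remove(job)
    return max(finish.values(), default=0), finish


deps = {"b": ["a"], "c": ["a"]}
latency = {("a", "ARM"): 2, ("b", "ARM"): 3, ("c", "ARM"): 4, ("c", "DSP"): 4}
parallel, _ = list_schedule([("a", "ARM"), ("b", "ARM"), ("c", "DSP")], deps, latency)
serial, _ = list_schedule([("a", "ARM"), ("b", "ARM"), ("c", "ARM")], deps, latency)
```

The two traces differ only in the mapping of `c`, yet the returned execution times differ (6 against 9 time units), which is the kind of feedback used to reinforce or penalize the corresponding decisions.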
Experimental Setup
Target architecture composed of an ARM processor, a digital signal processor (DSP) and an FPGA that also embeds a PowerPC processor
It allows exploring both hardware and software solutions
Synthetic benchmark to evaluate the scalability of the approach
We compared the ACO solutions with other search methods
[Pilato et al. 2010] ACO where PDR is not supported: tasks can be allocated to the FPGA as long as they fit into the available area
• Advantages of the PDR technique
[Banerjee et al. 2006] KLFM with support for PDR
• Advantages of the ACO method
Experimental Results
[9] corresponds to [Pilato et al. 2010]: ACO without support for PDR
Great advantages in introducing PDR
[10] corresponds to [Banerjee et al. 2006]: KLFM with support for PDR
ACO performs better in terms of quality of the solutions
• Better exploration of the design space
Much more scalable
• KLFM gets stuck when approaching larger benchmarks
Conclusions and Future Work
Ant Colony Optimization is very attractive to generate solutions for designing heterogeneous MPSoCs
Handling of design constraints is very simple and efficient
Constructive approach that limits unfeasible solutions
Support for different architectural templates can be easily provided
Results show that it is able to outperform most of the existing search methods
More robust and scalable
Future work:
Closer integration with estimation methods and/or high-level synthesis for creating the implementations