UIC Thesis Morandi
Runtime Core Allocation Management for 2D Self Partially and Dynamically Reconfigurable Systems
By Massimo Morandi
Thesis committee: John Lillis (Chair), Donatella Sciuto, Mitchell Theys
UIC Thesis Defense: May 9, 2008
Rationale and Innovation
Problem statement: providing runtime management support for 2D self, partial and dynamic reconfiguration, in particular for Core placement decisions
Innovative contributions: a fast and flexible solution
Low complexity, to avoid introducing too much overhead at runtime
Support for different scenarios and placement policies, according to user needs
The possibility to exploit multiple shapes per Core, through integration with the area constraints definition
Aims
Our proposed solution must support different scenarios, placement policies and intervention from the designer
It must be fast when compared to related solutions in the literature
The quality of the placement choices must be high, in terms of placement success rate, overall application completion time or other metrics defined by the user
Outline
Context Definition
Motivations and Goals
  The Complete Polaris Workflow
  Specific Contributions
Area Constraints Definition
  Proposed Solution
Runtime Core Allocation Management
  Features and Structure of an Allocation Manager
  Relevant Works
  Proposed Solution
Results
Conclusions and Future Work
Context Definition
Reconfigurable hardware: has the capability of changing its configuration (functionality) according to user needs
Self reconfiguration: the system must be completely autonomous at runtime
Partial reconfiguration: the changes can also involve only a fraction of the device
Dynamic reconfiguration: while a part of the hardware is being reconfigured, the rest can continue its computation
2D reconfiguration: arbitrary rectangular slots can be dynamically reconfigured, as opposed to arbitrary columns in 1D reconfiguration
Field Programmable Gate Array
Minimum granularity:
Physical: there is a minimum unit that can be configured independently, which depends on the device (Tile)
Practical: since reconfiguration has a cost, it is reasonable to define a multiple of a Tile as the minimum reconfigurable unit (Slot)
A bit of Terminology
Bitstream: binary file defining the configuration of part or all of the reconfigurable device (FPGA)
Core: representation of a functionality, independent of shape and position (example: JPEG)
RFU (Reconfigurable Functional Unit): a Core to which area constraints have been applied (example: JPEG constrained in a 2x3 rectangle)
A partial bitstream defines an RFU implemented in a specific position, identified by its bottom-left corner. The same bitstream can be reused for all positions if bitstream relocation is exploited
Virtual homogeneity
Motivations and goals
The creation and management of a self partially and dynamically reconfigurable system is a complex problem
This is even more critical when exploiting the 2D reconfiguration paradigm:
there are more issues in the definition of area constraints and in the Core allocation decisions
since the system must be autonomous, it also needs runtime management functionalities
There is a need for automation in these processes:
to reduce the workload on the designer
to improve the efficiency of the final reconfigurable system
Creation of an automated workflow to generate a self dynamically reconfigurable architecture that:
has “good” area constraints assigned to Cores
is autonomous in performing 2D runtime Core allocation decisions
exploits relocation to ensure that the system can obtain the configuration bitstreams it needs at runtime
supports intervention from the designer, to guide or constrain the decisions
keeps high flexibility and generality
The Complete Workflow
Workflow to automate the creation and management of self dynamically reconfigurable architectures
Input: user specifications
Final output: complete architecture generation
Specific Contributions
In particular, this thesis deals with the solution identification phase of the flow. This involves:
the definition of area constraints for Cores, when the user does not specify them
the creation of Core Allocation Management solutions, able to efficiently manage runtime Core placement
This last task includes:
offering high versatility, supporting different placement policies and different scenarios
keeping low complexity, to avoid too much overhead in the running time of the system
experimenting with techniques to improve efficiency, for example allowing multiple shapes per Core
Area Constraints Definition
The designer can choose whether or not to specify the area constraints (AC) for each Core in the application
If they are not specified, they are computed automatically
The designer can also choose whether to allow multiple shapes per Core (and how many)
Finally, the last parameter represents the tightness of the constraints that will be defined, which impacts:
the feasibility of the implementation
the performance of the RFU
[Figure: Core → RFU (or set of RFUs)]
The constraints are defined with a simple heuristic
First, a square-like constraint is defined through formulae in which H is the height (in slices), W is the width, S is the number of slices of the Core and m is the tightness
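As a minimal sketch, assuming the square-like constraint takes the form H = W = ⌈√(m·S)⌉ with m ≥ 1 acting as a slack factor (a hypothetical form, not necessarily the formulae used in the thesis), the first step could be computed as:

```python
import math


def square_like_constraint(slices: int, tightness: float) -> tuple[int, int]:
    """Return (H, W) in slices for a Core occupying `slices` slices.

    ASSUMPTION (hypothetical formula): H = W = ceil(sqrt(m * S)), where
    m >= 1 is the tightness used as slack; m = 1 gives the tightest square.
    """
    if slices <= 0 or tightness < 1:
        raise ValueError("expected S > 0 and m >= 1")
    side = math.ceil(math.sqrt(tightness * slices))
    return side, side


print(square_like_constraint(50, 2.0))  # (10, 10)
```

With this form, a larger m leaves more free slices inside the constraint, which eases implementation at the cost of area.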
Then, the constraints are converted from slices to slots through a formula in which Vg is a granularity parameter, Vslices is the number of vertical slices in the device and avgH is the average height of all the RFUs defined with the square-like formula
Finally, the constraints (in slots) are iteratively altered, stretching the Core horizontally or vertically, to obtain multiple RFUs
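A minimal sketch of the last two steps, under two simplifying assumptions: a fixed slot size in slices (the hypothetical parameters slot_h and slot_w, standing in for the Vg/avgH-based conversion), and a hypothetical stretching rule that shrinks one side by one slot at a time and grows the other just enough to keep at least the original area:

```python
import math


def to_slots(h_slices: int, w_slices: int, slot_h: int, slot_w: int) -> tuple[int, int]:
    """Round a constraint in slices up to whole slots (ASSUMED fixed slot size)."""
    return math.ceil(h_slices / slot_h), math.ceil(w_slices / slot_w)


def stretched_shapes(h: int, w: int, n: int) -> list[tuple[int, int]]:
    """Return up to n shapes (H, W) in slots, starting from (h, w).

    Each step alternately produces a wider/lower shape and a taller/narrower
    one, keeping H * W >= h * w so the Core still fits (hypothetical rule).
    """
    area = h * w
    shapes = [(h, w)]
    step = 1
    while len(shapes) < n:
        added = False
        if h - step >= 1:                      # horizontal stretch
            nh = h - step
            shapes.append((nh, (area + nh - 1) // nh))
            added = True
        if len(shapes) < n and w - step >= 1:  # vertical stretch
            nw = w - step
            shapes.append(((area + nw - 1) // nw, nw))
            added = True
        if not added:                          # no side can shrink further
            break
        step += 1
    return shapes


print(stretched_shapes(*to_slots(10, 10, 4, 2), n=3))  # [(3, 5), (2, 8), (4, 4)]
```

A real implementation would also discard shapes that exceed the device dimensions.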
Runtime Core Allocation Management
The problem: choosing where to place new Cores on the reconfigurable area, in an online scenario (self partial and dynamic reconfiguration)
The goal: allowing efficient usage of the FPGA area, which is critical in the 2D reconfiguration case
This requires the creation of a solution for allocation management and of suitable policies
Allocation Manager Desired Features
Low Core Rejection Rate (CRR): percentage of Cores that are not successfully placed in time
Fast application completion time: time from the arrival of the first Core to the completion of the last
Low fragmentation grade: fraction of the area that is unusable because it is too scattered
Small management overhead: we want a lightweight solution that runs inside the system
High routing efficiency: if interacting Cores are clustered, the system is more efficient
A good compromise among these features is needed
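The first three metrics above can be computed as in the following sketch. CRR and completion time follow the definitions given here; for fragmentation, the sketch swaps in one common measure (1 minus the ratio between the largest empty rectangle and the total free area), which is an assumption and not necessarily the metric used in the thesis. The device is modeled as a hypothetical slot grid in which 0 marks a free slot.

```python
def core_rejection_rate(placed: int, rejected: int) -> float:
    """CRR: percentage of Cores not successfully placed in time."""
    total = placed + rejected
    return 100.0 * rejected / total if total else 0.0


def completion_time(arrivals: list[int], finishes: list[int]) -> int:
    """Time from the arrival of the first Core to the completion of the last."""
    return max(finishes) - min(arrivals)


def largest_free_rectangle(grid: list[list[int]]) -> int:
    """Area of the largest all-free rectangle (histogram-and-stack method)."""
    if not grid:
        return 0
    cols = len(grid[0])
    heights = [0] * cols
    best = 0
    for row in grid:
        # number of consecutive free slots ending at this row, per column
        heights = [heights[c] + 1 if row[c] == 0 else 0 for c in range(cols)]
        stack: list[int] = []  # column indices with increasing heights
        for c in range(cols + 1):
            h = heights[c] if c < cols else 0  # sentinel flushes the stack
            while stack and heights[stack[-1]] >= h:
                top = stack.pop()
                left = stack[-1] + 1 if stack else 0
                best = max(best, heights[top] * (c - left))
            stack.append(c)
    return best


def fragmentation(grid: list[list[int]]) -> float:
    """ASSUMED measure: 1 - largest free rectangle / total free area."""
    free = sum(cell == 0 for row in grid for cell in row)
    return 1.0 - largest_free_rectangle(grid) / free if free else 0.0
```

For example, a 3x3 grid with six free slots whose largest free rectangle is 2x2 has a fragmentation of 1 - 4/6, about 0.33.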
Example: 2D fragmentation
The 2D fragmentation problem: the area is generally more fragmented, which can nullify the area optimizations obtained
Example: Core Rejection
Bad choices can lead to performance loss and rejection:
A: Core C is successfully placed at step 2
B: Core C is delayed (and possibly rejected, if its deadline is 2)
Considered Scenarios
Dynamic schedule: Cores can arrive at any time and have an ASAP and an ALAP time (dependencies). Rejection is the failure to respect the ALAP time of a Core. Goal: respect the schedule; CRR is the most important metric and should tend to zero
Blind schedule: Cores can be either available from the start or arrive at different times; no dependencies are assumed. There is no ASAP time, but Cores can optionally have a deadline. If a Core is not placed, placement is retried later. Goal: the application must complete as fast as possible; the main issue is total time, not rejection
Allocation Manager Creation
Choose how to maintain information on empty space:
keep all information (expensive but more accurate)
heuristically prune information (cheaper)
Choose which placement policy to use:
general (First Fit, Best Fit, Worst Fit…)
focused (Fragmentation Aware, Routing Aware…)
Define in which scenario(s) the manager will work
It can also be useful to consider and exploit different shapes of a Core (multiple RFUs per Core scenario)
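A minimal sketch of the three general policies, extended to the multiple-RFUs-per-Core case: each empty rectangle is an (x, y, h, w) tuple in slots and each candidate shape an (h, w) tuple. Measuring the fit as leftover area ("waste") is an assumption; other fit criteria are possible. Each shape costs one linear pass over the empty rectangles.

```python
from typing import Optional, Sequence

Rect = tuple[int, int, int, int]  # empty rectangle: (x, y, h, w), in slots
Shape = tuple[int, int]           # one RFU of the Core: (h, w), in slots


def choose(free: Sequence[Rect], shapes: Sequence[Shape],
           policy: str) -> Optional[tuple[int, int]]:
    """Return (rectangle index, shape index), or None if nothing fits.

    first: for the first shape that fits anywhere, the first rectangle
           (in list order) that holds it
    best:  smallest leftover area; worst: largest leftover area
    """
    if policy not in ("first", "best", "worst"):
        raise ValueError(f"unknown policy: {policy}")
    best_choice, best_key = None, None
    for si, (h, w) in enumerate(shapes):
        for ri, (_, _, rh, rw) in enumerate(free):
            if h > rh or w > rw:
                continue                      # this RFU does not fit here
            if policy == "first":
                return ri, si
            waste = rh * rw - h * w
            key = waste if policy == "best" else -waste
            if best_key is None or key < best_key:
                best_choice, best_key = (ri, si), key
    return best_choice
```

Offering several shapes only widens the search; a Core is rejected or delayed only when none of its RFUs fits in any empty rectangle.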
Relevant Works
Maintain complete information on empty space:
KAMER: K. Bazargan, R. Kastner and M. Sarrafzadeh, “Fast template placement for reconfigurable computing systems”, IEEE Design and Test of Computers, vol. 17, 2000.
Keeps All Maximally Empty Rectangles; applies a general placement policy
CUR: A. Ahmadinia, C. Bobda, S. P. Fekete, J. Teich and J. v.d. Veen, “Optimal Routing-Conscious Dynamic Placement for Reconfigurable Devices”, Field-Programmable Logic and Applications (FPL'04), 2004.
Maintains the Contour of a Union of Rectangles; applies a focused placement policy
Heuristically prune part of the information:
KNER: K. Bazargan, R. Kastner and M. Sarrafzadeh, “Fast template placement for reconfigurable computing systems”, IEEE Design and Test of Computers, vol. 17, 2000.
Keeps Non-overlapping Empty Rectangles; applies a general placement policy
2D-HASHING: H. Walder, C. Steiger and M. Platzner, “Fast Online Task Placement on FPGAs: Free Space Partitioning and 2D-Hashing”, International Parallel and Distributed Processing Symposium (IPDPS'03), 2003.
Keeps non-overlapping empty rectangles in an optimized data structure; applies (exclusively) a general placement policy
Example: Empty Space Information
Evaluation
The solutions with higher placement quality also have higher complexity
The fastest solution (2D-HASHING) cannot exploit focused policies, for example Routing Aware, and adds the overhead of maintaining the 2D hashing structure
CUR does not support all general policies; for example, Best Fit is not allowed
Proposed Approach
The choice is driven by:
the need for a low complexity solution, to introduce low overhead at runtime in the self reconfigurable system
the desire to keep high flexibility, to suit user needs also in terms of placement policies
For these reasons we propose a heuristic (KNER-like) empty space manager:
supporting general and focused placement policies (in particular, First Fit, Best Fit and Routing Aware)
suitable for both dynamic schedule and blind schedule scenarios
exploiting multiple RFUs per Core, to improve results
Data Representation
Core, defined by:
arrival time
set of RFUs, each one with H, W and latency
optional set of communicating Cores (if using the Routing Aware policy)
ASAP and ALAP times (in the dynamic schedule scenario)
Two queues: one for new Cores, one for Cores that were not successfully placed and need reexamination
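The Core description and the two queues could be modeled as below. This is a sketch under assumptions: all names are hypothetical, and a single optional latest_start field stands in for both the ALAP time (dynamic schedule) and the optional deadline (blind schedule), treating both as the last time step at which the Core may still be placed.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class RFU:
    h: int        # height, in slots
    w: int        # width, in slots
    latency: int  # execution time of this implementation


@dataclass
class Core:
    name: str
    arrival: int
    rfus: list[RFU]                                      # one or more shapes
    partners: list[str] = field(default_factory=list)    # communicating Cores (RA only)
    asap: Optional[int] = None                           # dynamic schedule only
    latest_start: Optional[int] = None                   # ALAP or deadline (assumed)


@dataclass
class Queues:
    new: deque = field(default_factory=deque)     # newly arrived Cores
    retry: deque = field(default_factory=deque)   # Cores to reexamine later
    rejected: list[Core] = field(default_factory=list)

    def on_failure(self, core: Core, now: int) -> None:
        """Handle a Core that could not be placed at time `now`."""
        if core.latest_start is not None and now >= core.latest_start:
            self.rejected.append(core)  # ALAP/deadline reached: rejection
        else:
            self.retry.append(core)     # still admissible: try again later
```

In the blind schedule without deadlines, latest_start stays None and a failed Core is always requeued, matching the "retry later" behavior.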
Reconfigurable device, represented as a binary tree structure: each node is a Rectangle, each leaf is an empty Rectangle. Navigation through:
pointers to the left child, right child and next leaf
a function to find the previous leaf (used for bookkeeping after rectangle split and merge operations)
Rectangle, defined by:
coordinates on the device: X, Y
size: H, W
Initially there is a single Rectangle, the root, with:
(X,Y) = (0,0), H = FPGA rows, W = FPGA cols
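A minimal sketch of this structure, covering only placement (merging on Core removal is omitted). The KNER-like cut used here (a full-width strip above the placed RFU plus a strip to its right) is one assumed choice among the possible splits, and all names are hypothetical. A node that becomes fully occupied gets no children and leaves the chain, so the chain lists exactly the empty leaves.

```python
from dataclasses import dataclass, field
from typing import Iterator, Optional


@dataclass(eq=False)
class Rect:
    x: int  # bottom-left corner, in slots
    y: int
    h: int
    w: int
    children: list["Rect"] = field(default_factory=list)
    next_leaf: Optional["Rect"] = None


class Device:
    def __init__(self, rows: int, cols: int) -> None:
        self.root = Rect(0, 0, rows, cols)  # initially one empty leaf
        self.head: Optional[Rect] = self.root

    def leaves(self) -> Iterator[Rect]:
        """Walk the chain of empty leaves."""
        node = self.head
        while node is not None:
            yield node
            node = node.next_leaf

    def previous_leaf(self, leaf: Rect) -> Optional[Rect]:
        """Linear search for the leaf preceding `leaf` (None if it is the head)."""
        prev = None
        for node in self.leaves():
            if node is leaf:
                return prev
            prev = node
        raise ValueError("not an empty leaf")

    def place(self, leaf: Rect, h: int, w: int) -> None:
        """Occupy an h x w RFU at the bottom-left corner of an empty leaf."""
        if h > leaf.h or w > leaf.w:
            raise ValueError("RFU does not fit")
        prev = self.previous_leaf(leaf)
        top = Rect(leaf.x, leaf.y + h, leaf.h - h, leaf.w)  # full-width strip above
        right = Rect(leaf.x + w, leaf.y, h, leaf.w - w)     # strip beside the RFU
        leaf.children = [r for r in (top, right) if r.h > 0 and r.w > 0]
        # splice the new empty leaves into the chain in place of `leaf`
        following = leaf.next_leaf
        for a, b in zip(leaf.children, leaf.children[1:]):
            a.next_leaf = b
        if leaf.children:
            leaf.children[-1].next_leaf = following
        first = leaf.children[0] if leaf.children else following
        if prev is None:
            self.head = first
        else:
            prev.next_leaf = first
        leaf.next_leaf = None
```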
The Online Placement Algorithm
The whole processing of a Core is completed in linear time
Evaluation of the proposed solution
To evaluate the quality of the proposed approach in various scenarios and with different metrics, three kinds of experiments were performed:
1) A comparison against the literature solutions presented above:
in a dynamic schedule scenario
with a Routing Aware placement policy
measuring CRR (and indirectly fragmentation), routing costs and computational overhead
Results published in:
M. Morandi, M. Novati, M. D. Santambrogio, D. Sciuto, “Core allocation and relocation management for a self dynamically reconfigurable architecture”, IEEE Computer Society Annual Symposium on VLSI, 2008
2) A measure of application completion time:
with applications composed of real Cores used as benchmarks
in a blind schedule scenario
directly measuring application completion time, gaining some insight on CRR and fragmentation
3) An evaluation of the multiple shapes per Core approach:
comparing our solution with multiple shapes against KNER (adapted to the blind schedule scenario)
in a mixed scenario (blind schedule with deadlines and variable arrival times)
using both First Fit and Best Fit
measuring CRR and running time
Experiment 1: Routing Aware
A version of our general solution tailored to minimize routing paths, compared with similar solutions from the literature; it is named RALP (Routing Aware Linear Placer) in the table
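The routing-aware idea can be illustrated with a simple cost, assumed here to be the sum of Manhattan distances between the center of a candidate position and the centers of already placed communicating Cores. RALP's actual cost function may differ; the names below are hypothetical.

```python
from typing import Optional, Sequence

Box = tuple[int, int, int, int]  # (x, y, h, w), in slots


def center(box: Box) -> tuple[float, float]:
    x, y, h, w = box
    return x + w / 2, y + h / 2


def routing_cost(candidate: Box, partners: Sequence[Box]) -> float:
    """Sum of Manhattan distances to the placed communicating Cores."""
    cx, cy = center(candidate)
    return sum(abs(cx - px) + abs(cy - py) for px, py in map(center, partners))


def routing_aware_choice(candidates: Sequence[Box],
                         partners: Sequence[Box]) -> Optional[int]:
    """Index of the feasible candidate position with the lowest routing cost."""
    if not candidates:
        return None
    return min(range(len(candidates)),
               key=lambda i: routing_cost(candidates[i], partners))
```

Here the candidates are assumed to be positions where the RFU already fits; the policy only ranks them, so it can be combined with any empty space manager that enumerates feasible positions.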
Benchmark of 100 randomly generated tasks: sizes from 5% to 20% of the FPGA area, randomly interconnected
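A generator for a comparable random benchmark could look like the sketch below. Only the number of tasks and the 5% to 20% size range come from the experiment description; the aspect-ratio range, the edge probability p_edge and the seed are hypothetical parameters.

```python
import math
import random


def random_benchmark(n: int, rows: int, cols: int, min_frac: float = 0.05,
                     max_frac: float = 0.20, p_edge: float = 0.05, seed: int = 0):
    """Return (tasks, edges): tasks are (h, w) in slots, edges are (i, j) pairs."""
    rng = random.Random(seed)
    tasks = []
    for _ in range(n):
        area = rng.uniform(min_frac, max_frac) * rows * cols
        # random aspect ratio around a square, clipped to the device size;
        # rounding and clipping make the final area only approximately in range
        h = max(1, min(rows, round(rng.uniform(0.5, 2.0) * math.sqrt(area))))
        w = max(1, min(cols, math.ceil(area / h)))
        tasks.append((h, w))
    edges = [(i, j) for i in range(n) for j in range(i + 1, n)
             if rng.random() < p_edge]  # random interconnection
    return tasks, edges
```

Fixing the seed makes the benchmark reproducible, so different placers can be compared on exactly the same task set.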
Experiment 2: Application Completion Time
Benchmark applications composed of Cores taken from opencores.org, such as JPEG, AES and 3DES
We measure the time needed to complete the applications with different amounts of resources
The infinite resources case is shown to compare against the lower bound
Experiment 3: Multiple Shapes
Similar benchmark, but Cores have deadlines (to measure CRR); shapes are defined using the heuristic described previously
Running time increases on average by 30% with 3 shapes and by 40% with 5 shapes, with respect to 1 shape
CRR is more than halved, often reduced to one third
Numerical Example
To give an idea of the quality of the obtained results, it is useful to give some numerical values for reconfiguration
Let us consider a JPEG Core, described by a 690 Kb configuration bitstream for a Virtex-4 (V4) device and using about 10% of the total area
Reconfiguration time: 150 ms
Relocation time: 90 ms
Placement time: 0.4 ms
The obtained placement time is low and suitable for actual usage in a real system
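Relating the placement time to the other two figures shows how small the management overhead is in this example (simple ratios computed from the numbers above):

```latex
\frac{t_{\text{place}}}{t_{\text{reconf}}} = \frac{0.4\ \text{ms}}{150\ \text{ms}} \approx 0.27\%,
\qquad
\frac{t_{\text{place}}}{t_{\text{reloc}}} = \frac{0.4\ \text{ms}}{90\ \text{ms}} \approx 0.44\%
```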
Concluding Remarks
The proposed solution offers:
high versatility, supporting different placement policies and scenarios, designer intervention and multiple shapes
low overhead, always processing a Core in linear time and obtaining good results compared with the literature
good CRR, especially when exploiting multiple shapes
fast application completion time, as shown by Experiment 2
effective routing cost reduction, when used in conjunction with a Routing Aware policy (Experiment 1)
The original goals were met
Under review:
S. Corbetta, M. Morandi, M. Novati, M. D. Santambrogio, D. Sciuto, P. Spoletini, “Internal and External Bitstream Relocation for Partial Dynamic Reconfiguration”, IEEE Transactions on VLSI (2nd review)
Future Work
Future work will be in the direction of integration with the rest of the workflow briefly introduced here
The parts described here achieved good results as stand-alone components in the runtime management of the reconfigurable system; it is important to also evaluate them inside the complete workflow
The final goal is to achieve complete automation in the creation process of a self dynamically reconfigurable architecture, from user specification up to bitstream and processor code generation
General Information
Webpage: www.dresd.org/polaris
Mailing list: [email protected]
Contact: for more information regarding Polaris, a complete list of contact details is available at www.dresd.org/contact_polaris
Questions