RCIM 2008 - Allocation Relocation
Core allocation and relocation management for self dynamically reconfigurable architectures
Massimo Morandi: [email protected]
Marco Novati: [email protected]
Reconfigurable Computing Italian Meeting, 19 December 2008
Room S01, Politecnico di Milano - Milan (Italy)
Outline
Aims
Introduction
Basic Concepts
Rationale
Relocation solutions
Core Allocation manager
Concluding Remarks
Aims
Provide support for self partially and dynamically reconfigurable systems:
Relocation support:
1D and 2D solutions
SW and HW solutions
Runtime Core Placement support with:
Low overhead
High versatility
Efficient use of resources
Reconfigurable architecture
A basic reconfigurable architecture consists of:
a Static area: a basic Harvard architecture
a Reconfigurable area: a device area composed of several reconfigurable regions
Basic Definitions
Core: a specific representation of a functionality. A core can be described, for example, in VHDL, in C, or in an intermediate representation (e.g. a DFG)
IP-Core: a core described in a hardware description language (HDL), combined with its communication infrastructure (i.e. the bus interface)
Reconfigurable Functional Unit (RFU): an IP-Core that can be plugged into and/or unplugged from an already working architecture at runtime
Reconfigurable Region (RR): a portion of the device area used to implement a reconfigurable core
Relocation: The Problem
[Figure: the set of available functionalities Fi, each labelled Area/Time (A 2/1, B 1/2, C 2/2, D 1/1, E 1/1, F 2/2), and their RFU implementations on the reconfigurable regions RR1, RR2 and RR3.]
Relocation: Motivation
[Figure: the demanded tasks Ti (A to F, each labelled Area/Time) and their request sequence, drawn as two Area versus Time charts. The charts show A and B, followed by reconfiguration and execution pairs: Rec. C, C, Rec. F, F, Rec. E, E, Rec. C, C, Rec. D, D.]
Relocation: Rationale
Bitstream relocation is used to:
speed up the overall system execution
reduce the amount of memory used to store partial bitstreams
achieve preemptive core execution
assign bitstream placement at runtime
Proposed Relocation Solutions
Architectural support for relocation:
Create an integrated HW/SW system to manage online relocation (1D and 2D) in reconfigurable architectures
Create efficient bitstream relocation solutions suitable for the target system:
1D (BiRF) – 2D (BiRF Square)
HW (BiRF, BiRF Square) – SW (BAnMaT Lite)
Xilinx FPGAs and Configuration Memory
CRC Calculation
Particular CRC value, used by Xilinx tools
Two versions of BiRF and BiRF Square:
By using the "predefined" values
With actual CRC calculation
X^16 + X^15 + X^2 + 1 [1D]
X^32 + X^28 + X^27 + X^26 + X^25 + X^23 + X^22 + X^20 + X^19 + X^18 + X^14 + X^13 + X^11 + X^10 + X^9 + X^8 + X^6 + 1 [2D]
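As a rough illustration of the "actual CRC calculation" variant, the two generator polynomials above can be turned into bit masks and fed to a generic bitwise CRC routine. This is only a sketch: the function names, the MSB-first bit order and the zero initial value are assumptions made for the example. The slides do not say how BiRF or the Xilinx tools order the bits, initialise the register, or which configuration words enter the CRC.

```python
# Exponents of the two generator polynomials listed on the slide.
CRC16_EXPONENTS = [16, 15, 2, 0]
CRC32_EXPONENTS = [32, 28, 27, 26, 25, 23, 22, 20, 19, 18,
                   14, 13, 11, 10, 9, 8, 6, 0]


def poly_from_exponents(exponents, width):
    """Build the polynomial bit mask; the top term x^width stays implicit."""
    mask = 0
    for e in exponents:
        if e < width:
            mask |= 1 << e
    return mask


CRC16_POLY = poly_from_exponents(CRC16_EXPONENTS, 16)
CRC32_POLY = poly_from_exponents(CRC32_EXPONENTS, 32)


def crc_msb_first(data, width, poly, init=0):
    """Generic bitwise CRC, most significant bit first (assumed ordering)."""
    top_bit = 1 << (width - 1)
    mask = (1 << width) - 1
    crc = init
    for byte in data:
        # Bring the next byte into the top of the CRC register.
        crc ^= byte << (width - 8)
        for _ in range(8):
            if crc & top_bit:
                crc = ((crc << 1) ^ poly) & mask
            else:
                crc = (crc << 1) & mask
    return crc


if __name__ == "__main__":
    frame = bytes(range(16))  # hypothetical stand-in for configuration data
    print(hex(CRC16_POLY), hex(crc_msb_first(frame, 16, CRC16_POLY)))
    print(hex(CRC32_POLY), hex(crc_msb_first(frame, 32, CRC32_POLY)))
```

A hardware relocator would normally compute the same remainder with a shift register updated as the bitstream words pass through, rather than byte by byte in software.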
Synthesis Results: Area
FPGA         BiRF                     BiRF Square
             Generic    Optimized     Generic    Optimized
xc2vp7       11.6 %     3.6 %         −          −
xc2vp20      5.8 %      1.8 %         −          −
xc2vp30      4.2 %      1.3 %         −          −
xc4vlx40     −          −             2.2 %      0.9 %
xc4vlx60     −          −             1.5 %      0.6 %
xc4vlx100    −          −             0.8 %      0.3 %
xc5vlx50     −          −             1.1 %      0.8 %
xc5vlx85     −          −             0.6 %      0.4 %
xc5vlx110    −          −             0.5 %      0.3 %
Relocation Solutions Results (1/2)
BiRF, BiRF Square, BAnMaT Lite
Support relocation in a self partially and dynamically reconfigurable 1D or 2D system
The area occupation ratio is relatively small
The operating frequency is more than acceptable
Internal memory requirements are reduced
Throughput:
BiRF: 6 MB/s
BiRF Square: 7.3 MB/s
BAnMaT Lite: 2.6 MB/s
Relocation Solutions Results (2/2)
The total configuration file size is about 1 MB
Consider an architecture with:
1/3 of the area as fixed part
2/3 as reconfigurable part with 6 slots
Under these hypotheses:
The size of a partial bitstream is about 110 KB
The relocation time is about:
18 ms with BiRF
15 ms with BiRF Square
42 ms with BAnMaT Lite
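The figures above follow from simple arithmetic, which the sketch below reproduces. It assumes that bitstream size scales linearly with the area covered and that 1 MB = 1000 KB. Neither assumption is stated on the slides, and the function names are illustrative.

```python
TOTAL_BITSTREAM_KB = 1000.0        # about 1 MB for the full configuration
RECONFIGURABLE_FRACTION = 2 / 3    # share of the device that is reconfigurable
SLOTS = 6                          # reconfigurable slots in that share

THROUGHPUT_MB_S = {
    "BiRF": 6.0,
    "BiRF Square": 7.3,
    "BAnMaT Lite": 2.6,
}


def partial_bitstream_kb(total_kb, fraction, slots):
    """Size of one slot's bitstream, assuming size is proportional to area."""
    return total_kb * fraction / slots


def relocation_time_ms(size_kb, throughput_mb_s):
    """KB divided by MB/s gives milliseconds when 1 MB = 1000 KB."""
    return size_kb / throughput_mb_s


if __name__ == "__main__":
    size = partial_bitstream_kb(TOTAL_BITSTREAM_KB, RECONFIGURABLE_FRACTION, SLOTS)
    print(f"partial bitstream: {size:.0f} KB")
    for name, throughput in THROUGHPUT_MB_S.items():
        # The slides round the size to 110 KB before dividing.
        print(f"{name}: {relocation_time_ms(110.0, throughput):.1f} ms")
```

With the rounded 110 KB size, the computed times round to the 18 ms, 15 ms and 42 ms reported above.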
Runtime Core Allocation Management
Choose where to place Cores to achieve:
Low Core Rejection Rate (CRR)
Fast application completion time
Small management overhead
Other policy-driven goals
Choose how to maintain information on empty space:
Keep all information (expensive but more accurate)
Heuristically prune information (cheaper)
Evaluation and Proposed Approach
Choice driven by:
Need for a low-complexity solution to reduce overhead at runtime
Desire to keep high flexibility, to best suit user needs
We propose a heuristic (KNER-like) empty space manager:
Supports both general and focused policies (in particular FF, BF, RA)
Suitable for dynamic schedule and blind schedule
Exploits multiple RFUs per Core, to improve quality
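To make the FF and BF policies concrete, the sketch below keeps a list of non-overlapping empty rectangles and picks one for an incoming Core. It is an illustration only, not the manager presented in the talk. The `Rect` type, the splitting rule and the function names are assumptions made for the example, FF and BF are read as first fit and best fit, and the RA policy and multiple shapes per Core are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    x: int
    y: int
    w: int
    h: int

    @property
    def area(self):
        return self.w * self.h


def choose(empty, core_w, core_h, policy="FF"):
    """Pick an empty rectangle for the core, or None if the core is rejected."""
    candidates = [r for r in empty if core_w <= r.w and core_h <= r.h]
    if not candidates:
        return None
    if policy == "FF":   # first fit: first rectangle that is large enough
        return candidates[0]
    if policy == "BF":   # best fit: smallest rectangle that is large enough
        return min(candidates, key=lambda r: r.area)
    raise ValueError(f"unknown policy: {policy}")


def place(empty, core_w, core_h, policy="FF"):
    """Place the core in a corner of the chosen rectangle and split the rest."""
    target = choose(empty, core_w, core_h, policy)
    if target is None:
        return None, list(empty)
    placed = Rect(target.x, target.y, core_w, core_h)
    rest = [r for r in empty if r != target]
    # Assumed split: one full-height strip beside the core, one strip above it.
    if target.w > core_w:
        rest.append(Rect(target.x + core_w, target.y, target.w - core_w, target.h))
    if target.h > core_h:
        rest.append(Rect(target.x, target.y + core_h, core_w, target.h - core_h))
    return placed, rest


if __name__ == "__main__":
    free = [Rect(0, 0, 10, 10), Rect(10, 0, 4, 4)]
    print(place(free, 3, 3, "FF"))
    print(place(free, 3, 3, "BF"))
```

Keeping only non-overlapping rectangles keeps the list short and each decision cheap, at the price of sometimes missing a placement that a complete record of the empty space would have found. This is the accuracy versus overhead trade-off described above.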
The Online Placement Algorithm
The whole processing of a Core is completed in linear time
Experiment 1: Routing Aware
Comparison against literature solutions
Dynamic schedule scenario, RA placement policy
Measuring CRR, routing costs and overhead
Benchmark of 100 randomly generated Cores:
Size from 5% to 20% of the FPGA, randomly interconnected
Experiment 2: Application Completion Time
Benchmark applications composed of cores taken from opencores.org, such as JPEG, AES, 3DES …
Blind schedule: measure the time instants needed to complete the applications with different amounts of resources
The infinite-resources case is shown as a lower bound for comparison
Experiment 3: Multiple Shapes
Similar benchmark, but Cores have deadlines (for CRR)
Shapes defined with the heuristics described previously
Runtime increases on average by 30% with 3 shapes and by 40% with 5 shapes, relative to 1 shape
CRR is more than halved, and often reduced to one third
Concluding Remarks
Original goals met:
Create efficient bitstream relocation solution suitable for target systems:
1D - 2D
HW – SW
Create a Core allocation manager with:
Low overhead
High efficiency (CRR, application completion time, routing costs …)
High versatility
Questions