An Architectural Space Exploration Tool for Domain Specific Reconfigurable Computing
Gayatri Mehta, Electrical Engineering, University of North Texas, email: [email protected]
Alex Jones, Electrical & Computer Engineering, University of Pittsburgh, email: [email protected]
April 19, 2010
Outline
Motivation
A domain-specific fabric
Design space exploration case studies
Automation of design space exploration
Results
Conclusions
Motivation
Designing a complex SoC requires evaluating many potential architectural options
Exploring the design space manually would be very time consuming and may not be feasible for complex designs
One approach is to develop design space exploration tools
– Allow application developers to explore architectural tradeoffs efficiently and reach solutions quickly
– Consider the applications of interest that will run on the device
Signal and image processing applications
Core signal processing benchmarks from MediaBench benchmark suite
Edge detection benchmarks from image processing domain
Mapping of a benchmark onto the fabric
temp1 = x * y;
temp2 = z * temp1;
if (temp2 < 255)
    result1 = temp2 << 16;
else
    result1 = temp1;
result1 += (((temp1) >> 16) - 0xffff);
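The snippet above can be read as a small dataflow graph, which is the form a combinational fabric needs. The following is a minimal Python sketch of that idea, not the authors' mapping tool: the function names are hypothetical, the if/else is assumed to become a `mux` operation, and Python's unbounded integers stand in for the fabric's fixed data width.

```python
def kernel_reference(x, y, z):
    """Direct transcription of the slide's snippet (unbounded Python ints)."""
    temp1 = x * y
    temp2 = z * temp1
    if temp2 < 255:
        result1 = temp2 << 16
    else:
        result1 = temp1
    result1 += (temp1 >> 16) - 0xFFFF
    return result1


def kernel_dataflow(x, y, z):
    """Same computation as branch-free, ALU-style operations.

    Assumption: the if/else is realized by a 'mux' node that selects
    between two values that are both computed unconditionally.
    """
    temp1 = x * y                            # '*' node
    temp2 = z * temp1                        # '*' node
    cond = temp2 < 255                       # '<' node
    shifted = temp2 << 16                    # '<<' node
    selected = shifted if cond else temp1    # 'mux' node
    high = temp1 >> 16                       # '>>' node
    offset = high - 0xFFFF                   # '-' node
    return selected + offset                 # '+' node
```

Each commented node corresponds to one operation that would occupy an ALU in the fabric; both functions return the same value for the same inputs.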
Domain specific fabric
Characteristics
– Combinational structure
– ASIC-like power
– Programmable
The fabric is comprised of
– Power-optimized Arithmetic and Logic Units
– Configurable interconnect
Different architectural implementations of functional units
[Figure: three functional unit implementations, labeled Hardware, ALU, and Optimized ALU, each shown with its inputs and output]
Interconnect Options
4:1 cardinality interconnect
5:1 cardinality interconnect
Implementation of the fabric
Implemented the fabric in parameterized VHDL
– same base VHDL description for the fabric model
– by varying different parameters, generate different architectural instances
Global Parameters
– Data width DW={8,16,32}
Fabric Parameters
– Width of the fabric W
– Height of the fabric H
Arithmetic and Logic Unit Parameters
– Number of Operands O={3}
– Number of Operations OP={8,10,16,23}
Interconnect Parameters
– Multiplexer Cardinality C={2,4,8,16,32}
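To give a sense of how quickly these parameters multiply, here is a minimal Python sketch that enumerates every combination of the listed value sets for one fabric size. The parameter values come from the slide; the function name, the dictionary keys, and the 8 x 8 fabric size in the usage line are illustrative assumptions, since the slide gives no values for W and H.

```python
from itertools import product

# Value sets as listed on the slide.
DATA_WIDTHS = (8, 16, 32)                 # DW
NUM_OPERANDS = (3,)                       # O
NUM_OPERATIONS = (8, 10, 16, 23)          # OP
MUX_CARDINALITIES = (2, 4, 8, 16, 32)     # C


def fabric_instances(width, height):
    """Yield one parameter dictionary per architectural instance.

    width and height are supplied by the caller; the slide does not
    list concrete values for W and H.
    """
    for dw, o, op, c in product(
        DATA_WIDTHS, NUM_OPERANDS, NUM_OPERATIONS, MUX_CARDINALITIES
    ):
        yield {"W": width, "H": height, "DW": dw, "O": o, "OP": op, "C": c}


# Hypothetical 8 x 8 fabric: 3 * 1 * 4 * 5 = 60 candidate instances.
instances = list(fabric_instances(width=8, height=8))
print(len(instances))
```

Even for a single fabric size this gives 60 instances before dedicated pass gates or heterogeneous ALUs are considered, which is the scale at which manual exploration becomes impractical.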
Design space exploration case studies
– ALU granularity
– Multiplexer cardinality
– Dedicated pass gates
– Heterogeneous ALUs
Conclusion: 10 ops per ALU (32-bit) with 8:1 interconnect and 33% dedicated pass gates
Design Space Exploration
•Width/height of the fabric
•Number and Type of operators
•Data width
•Dedicated pass gates
•Interconnect cardinality
Automation of generation of Domain-Specific Fabric
[Figure: tool flow. Blocks: Applications, FabricMapper, FabricGenerator, FIM, Synthesis, Simulation, Power/Performance/Area Analysis, Physical Characteristics, Final Domain-Specific Fabric]
Fabric Instance Model (FIM)
<ftudefine name="alu0" noop="10111" useic="false">
  <op code="00001"> + </op>
  <op code="00010"> - </op>
  <op code="00011"> * </op>
  <op code="10011"> == </op>
  <op code="00111"> ^ </op>
  <op code="01110"> > </op>
  <op code="10000"> >= </op>
  <op code="01111"> < </op>
  <op code="10001"> <= </op>
  <op code="10010"> != </op>
  <op code="00100"> & </op>
  <op code="00101"> | </op>
  <op code="01001"> << </op>
  <op code="01011"> >> </op>
  <op code="00000"> pass </op>
  <op code="10100" order="reverse"> pass </op>
  <op code="11111"> mux </op>
  <op code="01000"> ! </op>
</ftudefine>
<rowpattern repeat="forever">
  <row>
    <ftupattern repeat="forever">
      <FTU type="alu0">
        <operand number="0">
          <range left="-3" right="4"/>
        </operand>
        <operand number="1">
          <range left="-3" right="4"/>
        </operand>
        <operand number="2">
          <range left="-3" right="4"/>
        </operand>
      </FTU>
    </ftupattern>
  </row>
</rowpattern>
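One plausible reading of the `range` element is that each operand can select from a window of units in the preceding row. The Python sketch below works through that reading; it is an interpretation, not the documented FIM semantics. It assumes `left` and `right` are inclusive column offsets relative to the unit's own column, and that the window is clipped at the fabric edges. The function name is hypothetical.

```python
def operand_sources(column, fabric_width, left=-3, right=4):
    """Columns of the preceding row that one operand may select from.

    Assumptions (not stated on the slide): left/right are inclusive
    offsets from the unit's own column, and offsets that fall outside
    the fabric are dropped.
    """
    candidates = range(column + left, column + right + 1)
    return [c for c in candidates if 0 <= c < fabric_width]


# Interior unit in a hypothetical 20-column fabric: 8 candidate sources.
print(operand_sources(10, 20))
# Edge unit: the window is clipped on the left.
print(operand_sources(0, 20))
```

Under these assumptions an interior unit with `left="-3"` and `right="4"` sees eight sources per operand, which would correspond to an 8:1 multiplexer.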
Energy Consumption
Two factors that affect the energy consumption of the device
– Increase in total path length of the mapped application onto the device
– Number of ALUs used as pass gates
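The second factor can be illustrated with a toy count of pass-through ALUs for a given placement. This Python sketch is a simplified model, not the authors' mapper: it assumes each row reads only from the row directly above it, so a value that crosses k rows occupies k - 1 ALUs configured as `pass`, and it does not share pass-through units between edges from the same producer. The function name and the example placement are hypothetical.

```python
def pass_gates_needed(rows, edges):
    """Count ALUs used purely to forward values between rows.

    rows:  maps each operation name to the row it is placed in.
    edges: (producer, consumer) pairs from the dataflow graph.

    Assumption: a consumer must sit at least one row below its
    producer, and every skipped row costs one pass-through ALU.
    """
    total = 0
    for producer, consumer in edges:
        span = rows[consumer] - rows[producer]
        if span < 1:
            raise ValueError(f"{consumer} must be placed below {producer}")
        total += span - 1
    return total


# Hypothetical placement: 'c' is two rows below 'b' and three below 'a'.
rows = {"a": 0, "b": 1, "c": 3}
edges = [("a", "b"), ("a", "c"), ("b", "c")]
print(pass_gates_needed(rows, edges))
```

In this model, dedicated pass gates would replace some of the counted ALUs with cheaper forwarding elements, which is why their proportion appears as a design parameter.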
Design Space Exploration
Observation: 11 ops per ALU with 8:1 interconnect & 33% dedicated pass gates is the best candidate
[Figure: bar chart of average path length increase (y-axis, 0 to 10) by architecture (x-axis). Architectures: 33:1, 32:1, 17:1, 16:1, 9:1, 8:1, and 5:1 interconnect with no DPs and homogeneous ALUs; 8:1 with 25%, 33%, and 50% DPs and homogeneous ALUs; 8:1 with 33% DPs and 14, 13, 12, 11, and 10 ops]
Design Space Exploration
Comparison of manual and tool DSE results
[Figure: bar chart of energy (pJ, y-axis, 0 to 450) by benchmark (x-axis): adpcm encoder, adpcm decoder, idct row, idct col, gsm, sobel, laplace, dwt, average. Series: 10ops-8:1-33%DP (M) and 11ops-8:1-33%DP (T)]
Design Space Exploration
Observation: 11 ops per ALU with 8:1 interconnect & 50% dedicated pass gates is the best candidate
[Figure: bar chart of average path length increase (y-axis, 0 to 10) by architecture (x-axis). Architectures: 33:1, 32:1, 17:1, 16:1, 9:1, 8:1, and 5:1 interconnect with no DPs and homogeneous ALUs; 8:1 with 25%, 33%, 50%, and 66% DPs and homogeneous ALUs; 8:1 with 50% DPs and 14, 13, 12, 11, and 10 ops]
Design Space Exploration
Comparison of manual and tool DSE results
[Figure: bar chart of energy (pJ, y-axis, 0 to 450) by benchmark (x-axis): adpcm encoder, adpcm decoder, idct row, idct col, gsm, sobel, laplace, dwt, average. Series: 10ops-8:1-33%DP (M) and 11ops-8:1-50%DP (T)]
Reconfigurable Fabric Comparison
Observation: the optimized fabric consumes 3X the energy of the ASIC
[Figure: bar chart of energy (pJ, logarithmic y-axis, 1 to 1,000,000) by benchmark (x-axis): adpcmenc, adpcmdec, idct row, idct col, gsm, sobel, laplace, average. Series: Xscale (0.18um), Virtex-2P (0.13um), Fabric (initial) (0.16um), Fabric (optimized) (0.16um), Fabric (optimized) (0.13um), ASIC (0.16um), ASIC (0.13um)]
Thank You
Final thoughts and discussion