DWARV 2.0: A CoSy-based C-to-VHDL Hardware Compiler
Razvan Nane, Vlad-Mihai Sima, Bryan Olivier, Roel Meeuws, Yana Yankova, Koen Bertels
Context

Delft WorkBench:
Semi-automatic tool platform for hardware/software co-design in the context of Custom Computing Machines (CCM).
Targets the MOLEN polymorphic machine organization [1].
Supports the MOLEN programming paradigm.

MOLEN Polymorphic Processor:
Couples General Purpose Processor (GPP) with Reconfigurable Processor (RP) with one or more Custom Computing Units (CCU).

HDL Generation:
Manual: implementation or IP library instantiation for extremely critical kernels.
Automated: for fast prototyping and fast performance estimation during design space exploration, as well as for other kernels.

[Figure: Delft WorkBench tool flow. Recoverable labels: C file, Profiling and Cost Estimation, Quantitative Model, User Directives, Architecture Description, Restructuring and Optimization, HDL Generation (Manual Design, IP Library, Automated: DWARV), MOLEN Compiler, VHDL file, elf, vhd, actual execution on a real hardware prototype platform.]

[Figure: MOLEN Organization. Recoverable labels: GPP, RP, CCU, Memory, XREG.]

Design Goals

No limitation on the application domain.

Toolset Input:
Pragma-annotated C: identifies the functions to be translated into VHDL.
Configuration File: C-data sizes; memory and XREG latency and bandwidth.

Toolset Output:
FSM-based VHDL Design.
MOLEN-CCU Interface [5].

DWARV Engines Flow

[Figure: DWARV engines flow from the .c file to the VHDL file. Recoverable stage labels: CFront, CDFG Building, CDFG Scheduling, VHDL Generation. Recoverable engine labels: cse, ssa, match, dump, psrequiv, fplib, hwconfig, setlatency, sched, emit. Legend: Modifies IR; Engine Flow; Input/Output.]

CDFG Builder:
Cfront creates and initializes the IR from the source code.
cse performs common sub-expression elimination.
ssa performs static single assignment (SSA) conversion.
psrequiv performs custom FPGA register allocation.

CDFG Scheduler:
fplib loads FP and IP core information from an external library file.
hwconfig loads platform-dependent parameters, e.g. memory latency.
setlatency adjusts the length of DFGraph's edges with true latencies.
sched schedules the graph and cycle-annotates IR nodes.

VHDL Generation:
emit prints VHDL code of the RULE-annotated IR nodes.
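The setlatency and sched steps can be pictured with a small as-soon-as-possible (ASAP) scheduler. This is only a sketch: the poster does not say which scheduling algorithm sched uses, so ASAP is a stand-in, and the names `Edge` and `asap_schedule` as well as the node numbering are hypothetical.

```c
#include <assert.h>

/* One data-flow edge: dst may start only after src has finished. */
typedef struct {
    int src;
    int dst;
} Edge;

/*
 * ASAP schedule for a data-flow graph whose nodes are numbered in
 * topological order (every edge goes from a lower to a higher index).
 * latency[n] is the number of cycles node n occupies (the "true latency"
 * that a setlatency-like step would have filled in).
 * start[n] receives the first cycle in which node n can execute.
 * Returns the total schedule length in cycles.
 */
int asap_schedule(int n_nodes, const int latency[],
                  const Edge edges[], int n_edges, int start[])
{
    int length = 0;
    for (int n = 0; n < n_nodes; n++) {
        start[n] = 0;
        for (int e = 0; e < n_edges; e++) {
            if (edges[e].dst != n)
                continue;
            /* Precondition: topological numbering, so start[src] is final. */
            assert(edges[e].src < n);
            int ready = start[edges[e].src] + latency[edges[e].src];
            if (ready > start[n])
                start[n] = ready;
        }
        if (start[n] + latency[n] > length)
            length = start[n] + latency[n];
    }
    return length;
}
```

With an assumed 3-cycle load feeding a 1-cycle compare, a second independent 1-cycle compare, and a 1-cycle AND joining both, the load starts in cycle 0, the dependent compare in cycle 3, the AND in cycle 4, and the schedule is 5 cycles long.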
C-to-VHDL Example

Input (*.c):

    #pragma to_dfg
    int quan(short an, short power[]) {
        int i;
        for (i = 0; i < 15 && an >= power[i]; i++);
        return i;
    }

Rules (*.cgdl):

    RULE [add_int_s] o:mirPlus(rs1:reg, rs2:reg) -> rd:reg;
    CONDITION { IS_SINT(o.Type) };
    EXPAND emit {
        fprintf(OUTFILE, "\t%s <= std_logic_vector(signed(%s) + signed(%s));\n",
                REGNAME(rd), REGNAME(rs1), REGNAME(rs2));
    };
    END

    RULE [ld_32] o:mirContent(rs:reg) -> rd:reg;
    CONDITION { … /* TRUE only for int values on 32 bits */ … };
    WRITE REGISTER <DATA_ADDR>;
    EXPAND {
        gcg_create_data_addr(…, rs, …);
        (-> fprintf(OUTFILE, "\tDATA_ADDR <= %s;\n", REGNAME(rs));)
        gcg_create_read_data(…, rd);
        (-> fprintf(OUTFILE, "\t %s <= READ_DATA;\n", REGNAME(rd));)
    };
    END

Data flow representation of IR (*.ngc):

[Figure: data flow graph of quan. Recoverable node labels: an, power, i, 0, 1, 15, LD, +, <, >=, &&, split, loop0, lcond, iret, START_OP, END_OP. Legend: IN/OUT register; local register; control flow dependency edge; data flow dependency edge.]

Output (*.vhd):

    ENTITY CCU IS
    PORT(
        -- Declare Molen interface ports
        -- e.g. data_addr, read_data, start, done
    );
    END ENTITY CCU;

    ARCHITECTURE ARCH_example OF CCU IS
        -- declare the FP components
        -- e.g. input ports, output ports, name, op selection
    BEGIN
        -- FP component instantiation & signal declarations

        FSM_STATES: PROCESS (sl_STATE, START_OP)
            -- Sequential logic = FSM
            -- Example: when "00101" =>
            --     sl_state <= "00110";
        END PROCESS;

        EXECUTION: PROCESS (CLK, RST)
            -- Combinational logic = data path
            -- Example: when "00101" =>
            --     DATA_ADDR <= var32;
            --     var33 <= var30 + var31;
        END PROCESS;
    END ARCHITECTURE ARCH_example;
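To make the add_int_s rule concrete, the sketch below formats the same VHDL assignment in plain C. The format string is the one shown in the rule; the function name `emit_add_int_s` is hypothetical, and the register names are passed in as strings here, whereas in the rule they come from REGNAME().

```c
#include <assert.h>
#include <stdio.h>

/* Format string copied from the add_int_s rule of the example. */
#define ADD_INT_S_FMT "\t%s <= std_logic_vector(signed(%s) + signed(%s));\n"

/*
 * Hypothetical stand-in for the rule's EXPAND body: writes one signed
 * integer addition as a VHDL signal assignment into buf.
 * rd is the destination register name, rs1 and rs2 are the operands.
 * Returns what snprintf returns (the length of the full line).
 */
int emit_add_int_s(char *buf, size_t size,
                   const char *rd, const char *rs1, const char *rs2)
{
    assert(buf != NULL && rd != NULL && rs1 != NULL && rs2 != NULL);
    return snprintf(buf, size, ADD_INT_S_FMT, rd, rs1, rs2);
}
```

Called with the names var33, var30 and var31, it produces the line `var33 <= std_logic_vector(signed(var30) + signed(var31));` preceded by a tab. This is the typed form of the `var33 <= var30 + var31;` assignment sketched in the data path comment of the VHDL skeleton.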
DWARV 2.0 New Features

    Data Types          Statements              IP Library Fields
    Integer 64 bit      div, mod                IP name
    Floating Point      case, label, switch     Input port names
    Multi-dim arrays    function calls          Output port names
    Struct              while, do-while         Operation type & size
    Union               return, break           Latency & frequency

References

[1] The MOLEN Polymorphic Processor. S. Vassiliadis, S. Wong, G.N. Gaydadjiev, K. Bertels, G. Kuzmanov, E.M. Panainte. IEEE Transactions on Computers, 2004, pp. 1363-1375.
[2] LegUp: high-level synthesis for FPGA-based processor/accelerator systems. A. Canis, J. Choi, M. Aldham, V. Zhang, A. Kammoona, J. Anderson, S. Brown and T. Czajkowski. Proceedings of the 19th ACM/SIGDA International Symposium on FPGA '11: 33–36.
[3] CoSy compiler platform. Associated Compiler Experts ACE. [Online]. Available: www.ace.nl

Experimental Results

Methodology:
Target Platform: Xilinx Virtex5 ML510 board.
LegUp [2] cycle information was obtained by running the LegUp simulation scripts.
DWARV cycle information was obtained by running the generated CCU in ModelSim.
To integrate LegUp in our design, we built a simple wrapper around the generated Verilog module.
Both the DWARV-generated CCU and the LegUp module were synthesized at decreasing target frequencies, starting from 350 MHz, to obtain Max. Freq. (the highest frequency for which the design was successfully routed).
The cycle information and Max. Freq. obtained were used to compute the speed-ups.

[Figure: DWARV 2.0 Speed-ups vs. LegUp times.]
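The speed-up computation can be sketched as follows. The poster states only that cycle counts and Max. Freq. were used, so the formula here (execution time = cycles / frequency, speed-up = LegUp time / DWARV time) is an assumption, and the function names are hypothetical.

```c
#include <assert.h>

/* Execution time in microseconds of a kernel that needs `cycles`
 * clock cycles at a clock frequency of `freq_mhz` MHz. */
double exec_time_us(unsigned long cycles, double freq_mhz)
{
    assert(freq_mhz > 0.0);
    return (double)cycles / freq_mhz;
}

/* Assumed speed-up definition: LegUp time divided by DWARV time.
 * Values above 1.0 mean the DWARV-generated CCU is faster. */
double speedup_vs_legup(unsigned long legup_cycles, double legup_mhz,
                        unsigned long dwarv_cycles, double dwarv_mhz)
{
    assert(dwarv_cycles > 0);
    return exec_time_us(legup_cycles, legup_mhz)
         / exec_time_us(dwarv_cycles, dwarv_mhz);
}
```

For example, with made-up numbers, a kernel taking 2000 cycles at 100 MHz under LegUp (20 us) and 1000 cycles at 200 MHz under DWARV (5 us) gives a speed-up of 4.0. Under this definition a design can win on cycle count and still lose overall if its Max. Freq. is much lower.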
Conclusion

Up to 4.41x kernel speedup vs. LegUp HW compiler [2].
Highly extensible due to the CoSy framework [3].
Support for a wide range of C-language constructs.
Support for FP operations and custom IP block calls.

Future Work

Study and identify hardware compiler optimisations.
Hardware reuse based on scheduling templates.
Tool-chain integration and support for AOP.

Acknowledgement

This research is partially supported by:
Artemisia iFEST project (grant 100203)
Artemisia SMECY project (grant 100230)
FP7 Reflect project (grant 248976)
Computer Engineering
Faculty of Electrical Engineering, Mathematics and Computer Science
Delft University of Technology
Mekelweg 4 (16th floor)
2628 CD Delft, The Netherlands

Contact Person: Razvan Nane
Email: [email protected]
http://ce.et.tudelft.nl