DWARV 2.0: A CoSy-based C-to-VHDL Hardware Compiler

Razvan Nane, Vlad-Mihai Sima, Bryan Olivier, Roel Meeuws, Yana Yankova, Koen Bertels

DWARV

Design Goals
• No limitation on the application domain*
• Actual execution on a real hardware prototype platform

Toolset Input
• Pragma-annotated C: identifies the functions to be translated into VHDL
• Configuration File: C-data sizes; Memory and XREG latency and bandwidth

[Figure: DWARV Engines Flow. Stages shown: .c file, CFront, CDFG Building, CDFG Scheduling, VHDL Generation, VHDL file. Engine labels shown: cse, ssa, match, dump, psrequiv, sched, fplib, setlatency, hwconfig, emit. Legend: Modifies IR; Engine Flow; Input/Output.]

Context

[Figure: Delft WorkBench tool flow. Labels shown: C file, Quantitative Profiling and Model, Cost Estimation, User Directives, Restructuring and Optimization, Architecture Description, MOLEN Compiler, HDL Generation (Manual Design, IP Library, Automated: DWARV), elf, vhd.]

[Figure: MOLEN Organization. Labels shown: GPP, RP, CCU.]

Delft WorkBench:
• Semi-automatic tool platform for hardware/software co-design in the context of Custom Computing Machines (CCM).
• Targets the MOLEN polymorphic machine organization [1].
• Supports the MOLEN programming paradigm.

MOLEN Polymorphic Processor:
• Couples General Purpose Processor (GPP) with Reconfigurable Processor (RP) with one or more Custom Computing Units (CCU).

DWARV Toolset Output
• FSM-based VHDL Design
• MOLEN CCU Interface [5]

DWARV Engines

CDFG Builder:
• Cfront creates and initializes the IR from the source code.
• cse performs common sub-expression elimination.
• ssa performs static single assignment.
• psrequiv performs custom FPGA register allocation.

CDFG Scheduler:
• fplib loads FP and IP core information from an external library file.
• hwconfig loads platform-dependent parameters, e.g. memory latency.
• setlatency adjusts the length of DFGraph's edges with true latencies.
• sched schedules the graph and cycle annotates IR nodes.
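The setlatency and sched steps can be pictured with a small sketch. This is not DWARV's scheduler: it assumes a plain as-soon-as-possible (ASAP) policy over a hand-written data-flow graph, and the names `dfg_edge` and `asap_schedule`, as well as all latency values, are hypothetical.

```c
#include <assert.h>

/* Hypothetical DFG edge: node 'to' may start 'latency' cycles after 'from'. */
struct dfg_edge {
    int from;
    int to;
    int latency;
};

/*
 * ASAP scheduling sketch: each node starts at the earliest cycle allowed by
 * its predecessors. Assumes the edge list is topologically ordered, i.e.
 * every edge into a node appears before any edge leaving that node.
 */
void asap_schedule(const struct dfg_edge *edges, int n_edges,
                   int *cycle, int n_nodes)
{
    int i, e, ready;

    assert(edges != 0 && cycle != 0);

    for (i = 0; i < n_nodes; i++)
        cycle[i] = 0;                    /* nodes without predecessors start at cycle 0 */

    for (e = 0; e < n_edges; e++) {
        assert(edges[e].from >= 0 && edges[e].from < n_nodes);
        assert(edges[e].to >= 0 && edges[e].to < n_nodes);
        ready = cycle[edges[e].from] + edges[e].latency;
        if (ready > cycle[edges[e].to])
            cycle[edges[e].to] = ready;  /* wait for the latest-arriving operand */
    }
}
```

For example, with the edges {0,1,1}, {0,3,1}, {1,2,3}, {2,3,1}, where the latency of 3 stands in for an assumed memory latency, the four nodes are annotated with cycles 0, 1, 4 and 5.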

HDL Generation:
• Manual implementation or IP library instantiation for extremely critical kernels.
• Automated: for fast prototyping and fast performance estimation during design space exploration as well as for other kernels.

VHDL Generation:
• emit prints vhdl code of the RULE annotated IR nodes.
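What emit produces for one IR node can be sketched in plain C. The format string below is the one printed by the poster's add_int_s rule; everything else is an assumption made for illustration: the function name `emit_add_int_s` is hypothetical, and register names are passed as ordinary strings rather than through CoSy's REGNAME macro.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical stand-in for the EXPAND body of the add_int_s rule.
 * Writes one VHDL signed-addition assignment into 'out' and returns the
 * number of characters snprintf would have written (as snprintf does).
 */
int emit_add_int_s(char *out, size_t size,
                   const char *rd, const char *rs1, const char *rs2)
{
    assert(out != NULL && size > 0);
    return snprintf(out, size,
                    "\t%s <= std_logic_vector(signed(%s) + signed(%s));\n",
                    rd, rs1, rs2);
}
```

Called with the register names var33, var30 and var31, the buffer holds a tab followed by `var33 <= std_logic_vector(signed(var30) + signed(var31));` and a newline.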

C-to-VHDL Example

Input (*.c):

    #pragma to_dfg
    int quan(short an, short power[]) {
        int i;
        for (i = 0; i < 15 && an >= power[i]; i++);
        return i;
    }

[Figure: Data flow representation of IR (*.ngc). Node labels shown: an, power, i, 0, 1, 15, 56, 57, 58, RST, START_OP, END_OP, split, LD, +, >=, <, &&, lcond, loop0, iret. Legend: IN/OUT register; local register; Control flow dependency edge; Data flow dependency edge.]

Code generator rules (*.cgd):

    RULE [add_int_s] o:mirPlus(rs1:reg, rs2:reg) -> rd:reg;
    CONDITION { IS_SINT(o.Type) };
    EXPAND emit {
        fprintf(OUTFILE, "\t%s <= std_logic_vector(signed(%s) + signed(%s));\n",
                REGNAME(rd), REGNAME(rs1), REGNAME(rs2));
    };
    END

    RULE [ld_32] o:mirContent(rs:reg) -> rd:reg;
    CONDITION { … /* TRUE only for int values on 32 bits */ … };
    WRITE REGISTER <DATA_ADDR>;
    EXPAND {
        gcg_create_data_addr( … , rs, …);
        (-> fprintf(OUTFILE, "\tDATA_ADDR <= %s;\n", REGNAME(rs));)
        gcg_create_read_data(…, rd);
        (-> fprintf(OUTFILE, "\t %s <= READ_DATA;\n", REGNAME(rd));)
    };
    END

Output (*.vhd):

    ENTITY CCU IS
    PORT(
        -- Declare Molen interface ports
        -- e.g. data_addr, read_data, start, done
    );
    END ENTITY CCU;

    ARCHITECTURE ARCH_example OF CCU IS
        -- declare the FP components
        -- e.g. input ports, output ports, name, op selection
    BEGIN
        -- FP component instantiation & signal declarations

        STATES: PROCESS (sl_STATE, START_OP)
            -- Sequential logic = FSM
            -- Example: when "00101" =>
            --     sl_state <= "00110";
        END PROCESS;

        EXECUTION: PROCESS (CLK, RST)
            -- Combinational logic = data path
            -- Example: when "00101" =>
            --     DATA_ADDR <= var32;
            --     var33 <= var30 + var31;
        END PROCESS;
    END ARCHITECTURE ARCH_example;
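To check what the example kernel computes before it is translated, it can be run as ordinary C. The sketch below repeats quan from the poster; the helper `fill_power_table` and its powers-of-two contents are assumptions added for illustration, and the DWARV pragma is kept as a comment so the file compiles without unknown-pragma warnings.

```c
#include <assert.h>

/* #pragma to_dfg   (DWARV annotation from the poster, commented out here) */
int quan(short an, short power[])
{
    int i;
    /* Empty loop body: stop at the first table entry larger than 'an'. */
    for (i = 0; i < 15 && an >= power[i]; i++)
        ;
    return i;
}

/* Hypothetical helper: fills a 15-entry table with 1, 2, 4, ..., 16384. */
void fill_power_table(short power[], int n)
{
    int i;
    assert(n >= 0 && n <= 15);   /* 1 << 14 is the largest value that fits in a short */
    for (i = 0; i < n; i++)
        power[i] = (short)(1 << i);
}
```

With this assumed table, quan returns the number of entries that do not exceed `an`: 0 for an = 0, 7 for an = 100, and 15 for an = 32767.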

DWARV 2.0 New Features

    Data Types         Statements            IP Library Fields
    Integer 64 bit     div, mod              IP name
    Floating Point     case, label, switch   Input port names
    Multi-dim arrays   function calls        Output port names
    Struct             while, do-while       Operation type & size
    Union              return, break         Latency & frequency

References

[1] The MOLEN Polymorphic Processor. S. Vassiliadis, S. Wong, G.N. Gaydadjiev, K. Bertels, G. Kuzmanov, E.M. Panainte. IEEE Transactions on Computers, 2004, pp. 1363-1375.
[2] LegUp: high-level synthesis for FPGA-based processor/accelerator systems. A. Canis, J. Choi, M. Aldham, V. Zhang, A. Kammoona, J. Anderson, S. Brown and T. Czajkowski. Proceedings of the 19th ACM/SIGDA international symposium on FPGA '11: 33–36.
[3] CoSy compiler platform. Associated Compiler Experts ACE. [Online]. Available: www.ace.nl

Experimental Results

Methodology:
• Target Platform: Xilinx Virtex5 ML510 board.
• LegUp [2] cycle information obtained by running the LegUp simulation scripts.
• DWARV cycle information obtained by running the generated CCU in Modelsim.
• To integrate LegUp in our design, we built a simple wrapper around the generated verilog module.
• Both the DWARV generated CCU and the LegUp module were synthesized decreasingly from 350 MHz to obtain Max. Freq. (the highest frequency for which the design was successfully routed).
• The cycle information and Max. Freq. obtained were used to compute the speed-ups.

[Figure: DWARV 2.0 Speed-ups vs. LegUp times.]
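The poster does not spell out the speed-up formula. A minimal sketch, assuming the usual definition (execution time = cycles / maximum frequency, speed-up = LegUp time / DWARV time), is given below; the function names are hypothetical.

```c
#include <assert.h>

/* Execution time in microseconds: cycles divided by frequency in MHz. */
double exec_time_us(unsigned long cycles, double fmax_mhz)
{
    assert(fmax_mhz > 0.0);
    return (double)cycles / fmax_mhz;
}

/* Assumed definition: how many times faster the DWARV kernel runs. */
double speedup_vs_legup(unsigned long legup_cycles, double legup_fmax_mhz,
                        unsigned long dwarv_cycles, double dwarv_fmax_mhz)
{
    assert(dwarv_cycles > 0);
    return exec_time_us(legup_cycles, legup_fmax_mhz) /
           exec_time_us(dwarv_cycles, dwarv_fmax_mhz);
}
```

With made-up inputs (not measurements from the poster) of 2000 cycles at 100 MHz against 1000 cycles at 200 MHz, the times are 20 us and 5 us, giving a speed-up of 4.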

Conclusion
• Up to 4.41x kernel speedup vs. LegUp HW compiler [2].
• Highly extensible due to the CoSy framework [3].
• Support for a wide range of C-language constructs.
• Support for FP operations and custom IP block calls.

Future Work
• Study and identify hardware compiler optimisations.
• Hardware reuse based on scheduling templates.
• Tool-chain integration and support for AOP.

Acknowledgement
This research is partially supported by:
• Artemisia iFEST project (grant 100203)
• Artemisia SMECY project (grant 100230)
• FP7 Reflect project (grant 248976)

Computer Engineering
Faculty of Electrical Engineering, Mathematics and Computer Science
Delft University of Technology
Mekelweg 4 (16th floor)
2628 CD Delft, The Netherlands
http://ce.et.tudelft.nl

Contact Person: Razvan Nane
Email: [email protected]