High-level synthesis

ECE 565 High-Level Synthesis—An Introduction Shantanu Dutt ECE Dept., UIC

description

High-level synthesis of VLSI

Transcript of High-level synthesis

  • ECE 565: High-Level Synthesis, An Introduction. Shantanu Dutt, ECE Dept., UIC

  • HLS Flow: Code/Algorithm → Architecture (interconnected functional units (FUs) and memory units (MUs), connected via muxes, demuxes, tristate buffers, buses, and dedicated interconnects). Classically, these 3 stages were performed sequentially, but they are now performed together, which leads to better optimization.

  • HLS Flow (contd)

  • HLS Flow (contd): Allocation: simple counting of FUs after the above 2 stages (Binding).

  • Simple HLS Examples

  • Simple HLS Examples (contd): 2) Mapping to h/w w/ constraints: use only 1 multiplier (X) and 1 adder (+), w/ X delay of 2 ccs and + delay of 1 cc.
    Note: A register is loaded at the +ve/-ve edge (in a +ve/-ve edge-triggered system) of the cc after the one in which its load signal is asserted.
    (a) Scheduling
    (b) Arch. Synthesis [Figure labels: O0, O1.]
    (c) Controller FSM Synthesis [Figure: controller FSM with a Reset arc; cc labels: cc 3i, cc 3(i+2).]
    Controller FSM states:
      [z ← x + y] (c3): lda=1, ldb=1, ldc=1, ldd=1, mux1=1, mux2=1, demux=1, ldz=1
      [y ← c + d] (c2): mux1=0, mux2=0, demux=0, ldy=1
      [x ← a × b] (c1): ldx=1
    Note: Unspecified control signals have either an inactive value or, if no such concept exists for the control signal (cs), the don't-care value.
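The non-overlapped schedule above can be sketched as a small cycle-by-cycle simulation. This is only an illustration under stated assumptions: the function name `run_nonoverlapped`, the register dictionary, and the exact cc in which y ← c + d is placed are hypothetical choices, and registers are modeled as updating at the end of the cc shown rather than with the slide's load-signal timing.

```python
def run_nonoverlapped(samples):
    """Simulate z = a*b + (c + d) with one multiplier (2 ccs) and one adder (1 cc).

    samples: list of (a, b, c, d) tuples, one per iteration.
    Returns (list of z values, total clock cycles used).
    """
    regs = {}
    outputs = []
    ccs = 0
    for a, b, c, d in samples:
        # Input registers hold this iteration's operands.
        regs.update(a=a, b=b, c=c, d=d)
        # cc 1: the multiplier starts a*b; its result is not available yet.
        ccs += 1
        # cc 2: the multiplier finishes (x <- a*b); the adder computes y <- c+d
        # (assumed placement; the adder is idle in cc 1 and could be used there).
        regs["x"] = regs["a"] * regs["b"]
        regs["y"] = regs["c"] + regs["d"]
        ccs += 1
        # cc 3: the single adder is reused for z <- x+y.
        regs["z"] = regs["x"] + regs["y"]
        ccs += 1
        outputs.append(regs["z"])
    return outputs, ccs
```

Each iteration occupies 3 ccs, so n iterations take 3n ccs, which is the non-overlapped figure used in the throughput comparison.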

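  • Simple HLS Examples (contd): 2) Mapping to h/w w/ constraints: use only 1 multiplier (X) and 1 adder (+) (contd). ii) Overlapped pipelined scheduling.
    (a) Scheduling [Figure: chart of X and + usage over ccs; operation labels: c1(1), c1(2), c2(1), c3(1), c2(2), c3(2).]
    (b) Arch. Synthesis
    (c) Controller FSM Synthesis [Figure: controller FSM with a Reset arc; cc labels: cc 3i, cc 3(i+1).]
    Controller FSM states:
      [y ← c + d, x ← a × b] (c1, c2): lda=1, ldb=1, mux1=0, mux2=0, demux=0, ldy=1, ldx=1
      [z ← x + y] (c3): ldc=1, ldd=1, mux1=1, mux2=1, demux=1, ldz=1
    For 4 iterations, the overlapped schedule takes 9 ccs versus 12 ccs for the non-overlapped schedule.
    Overlapped schedule: time for n iterations = 2n + 1 ccs; throughput = n/(2n + 1) ≈ 0.5 outputs/cc.
    Non-overlapped schedule: time for n iterations = 3n ccs; throughput = n/(3n) ≈ 0.33 outputs/cc.
    Throughput therefore rises from ≈ 0.33 to ≈ 0.5 outputs/cc with the overlapped schedule: ~34% when the gain (0.5 - 0.33) is measured relative to the overlapped throughput, or ~50% relative to the non-overlapped throughput.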
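The cycle counts quoted above (2n + 1 versus 3n) can be checked with a short sketch that builds both schedules and verifies that neither FU is double-booked. The helper names (`build_schedule`, `fu_conflict_free`, `total_ccs`) and the exact cc assigned to each operation within an iteration are assumptions made for illustration; the slide's schedule chart did not survive in the transcript.

```python
def build_schedule(n, overlapped):
    """Return {op: (fu, first_cc, last_cc)} for n iterations; ccs are 1-based."""
    period = 2 if overlapped else 3  # ccs between the starts of successive iterations
    table = {}
    for i in range(n):
        base = i * period
        table[f"c1({i + 1})"] = ("X", base + 1, base + 2)  # x <- a*b, 2 ccs
        table[f"c2({i + 1})"] = ("+", base + 2, base + 2)  # y <- c+d, 1 cc
        table[f"c3({i + 1})"] = ("+", base + 3, base + 3)  # z <- x+y, 1 cc
    return table


def fu_conflict_free(table):
    """True if no FU is used by two operations in the same cc."""
    busy = set()
    for fu, first, last in table.values():
        for cc in range(first, last + 1):
            if (fu, cc) in busy:
                return False
            busy.add((fu, cc))
    return True


def total_ccs(table):
    """Last cc in which any operation is still executing."""
    return max(last for _, _, last in table.values())


for n in (1, 4, 100):
    over = build_schedule(n, overlapped=True)
    non = build_schedule(n, overlapped=False)
    print(n, total_ccs(over), total_ccs(non), n / total_ccs(over), n / total_ccs(non))
```

In the overlapped case, c3 of one iteration shares a cc with the first multiplier cycle of the next iteration, which is why each additional iteration costs only 2 ccs.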

  • Simple HLS Examples (contd): Conditional code: If (a > b) then c ← a - b; else c ← b - a. Possible DFGs corresponding to the above conditional code:

  • Simple HLS Examples (contd): (a) Scheduling (using only 1 adder/sub) [Figure labels: c1, c2.] (b) Arch. Synthesis
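The DFG figures for the conditional code are not recoverable from the transcript, so the following is only a sketch of three plausible ways such code can be mapped to data flow. The function names are hypothetical, and the third variant (muxes on the operands feeding a single subtractor) is one possible reading of "using only 1 adder/sub", not necessarily the slide's architecture.

```python
def abs_diff_branch(a, b):
    """Control-flow form: the comparison decides which subtraction executes."""
    if a > b:
        return a - b
    return b - a


def abs_diff_two_subs(a, b):
    """Data-flow form with 2 subtractors: compute both results, then select."""
    sel = a > b          # comparator output drives the mux select
    d_then = a - b       # subtractor 1
    d_else = b - a       # subtractor 2
    return d_then if sel else d_else  # output mux


def abs_diff_one_sub(a, b):
    """Data-flow form with 1 subtractor: muxes steer the operands instead."""
    sel = a > b
    left = a if sel else b    # mux on the minuend
    right = b if sel else a   # mux on the subtrahend
    return left - right       # single shared subtractor
```

All three compute the same value of c; they differ in how many subtractors and muxes the resulting architecture needs.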

  • Delay Nodes in DFGs: A delay node is generally implemented as a register; a delay node thus becomes a state variable.

  • Delay Nodes in DFGs (contd): [Figure labels: register; transformation in the DFG; mapping to the architecture.]
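To illustrate how a delay node behaves as a register holding a state variable, here is a minimal sketch. The class `DelayNode` and the example computation y[n] = x[n] + x[n-1] are hypothetical and are not taken from the slides.

```python
class DelayNode:
    """A one-sample delay, modeled as a register holding one state variable."""

    def __init__(self, initial=0):
        self.state = initial  # register contents (the state variable)

    def clock(self, value):
        """On a clock edge: output the stored value and load the new one."""
        out = self.state
        self.state = value
        return out


def pairwise_sum(xs):
    """Compute y[n] = x[n] + x[n-1], with x[-1] taken as 0."""
    delay = DelayNode(initial=0)
    ys = []
    for x in xs:
        previous = delay.clock(x)  # read x[n-1], store x[n] for the next step
        ys.append(x + previous)
    return ys
```

The value stored between calls to `clock` is exactly what the architecture must keep in a register from one iteration to the next.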

  • Detailed HLS Example

  • Detailed HLS Example (contd):
    Note: It is not clear how register allocation has been done. It is sub-optimal (4 non-primary i/p regs. needed).
    (a) Scheduling w/ one X (2 ccs) & one + (1 cc); goal: min. latency. [Figure label: different paths (i/p → o/p) in the DFG.]
    (b) Reg. alloc. for o/p of operations
    (c) Arch. synthesis [Figure label: for WAR (write-after-read) constraint.]
    Scheduling heuristic: Among available opers, schedule those on available FUs whose delay to o/p is the highest, breaking ties in favor of those opers u whose sibling o/ps (o/ps to the same children) that are avail. or will be available at u's earliest finish will have the largest lifetime at that point.
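The primary rule of the scheduling heuristic (prefer the available operation with the highest delay to an output) can be sketched as a resource-constrained list scheduler. This is an illustrative sketch, not the slide's exact procedure: the function names and data layout are assumptions, ties are broken alphabetically rather than by the sibling-lifetime rule, and the DFG is assumed to be acyclic with at least one FU of every type used.

```python
def delay_to_output(ops, preds, delay):
    """Longest delay from the start of each op to a primary output."""
    succs = {u: [] for u in ops}
    for v, ps in preds.items():
        for p in ps:
            succs[p].append(v)
    memo = {}

    def longest(u):
        if u not in memo:
            memo[u] = delay[ops[u]] + max((longest(v) for v in succs[u]), default=0)
        return memo[u]

    return {u: longest(u) for u in ops}


def list_schedule(ops, preds, delay, num_fus):
    """ops: {op: fu_type}; preds: {op: [ops it depends on]};
    delay: {fu_type: ccs}; num_fus: {fu_type: count}.
    Returns {op: start_cc} with 1-based ccs."""
    priority = delay_to_output(ops, preds, delay)
    start, finish = {}, {}
    free_at = {t: [1] * k for t, k in num_fus.items()}  # next free cc per FU instance
    cc = 1
    while len(start) < len(ops):
        ready = [u for u in ops if u not in start
                 and all(p in finish and finish[p] < cc for p in preds.get(u, []))]
        ready.sort(key=lambda u: (-priority[u], u))  # highest delay to o/p first
        for u in ready:
            fus = free_at[ops[u]]
            for k, t_free in enumerate(fus):
                if t_free <= cc:  # this FU instance is idle in the current cc
                    start[u] = cc
                    finish[u] = cc + delay[ops[u]] - 1
                    fus[k] = finish[u] + 1
                    break
        cc += 1
    return start


ops = {"c1": "X", "c2": "+", "c3": "+"}   # x <- a*b, y <- c+d, z <- x+y
preds = {"c3": ["c1", "c2"]}
print(list_schedule(ops, preds, {"X": 2, "+": 1}, {"X": 1, "+": 1}))
```

For the small example used earlier in the slides, the multiplication has the highest delay to the output (3 ccs) and is started first, and z ← x + y starts in cc 3.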

  • Detailed HLS Example (contd)

  • Detailed HLS Example: Register Allocation

  • Detailed HLS Example: Register Allocation (contd):
    In the conflict graph (one per FU), there is an edge between 2 var. nodes if their lifetimes overlap (indicating that different registers need to be allocated to them).
    Graph coloring (using the min. # of colors to color the nodes s.t. connected node pairs have different colors) is NP-hard in general.
    The above type of conflict graph is called an interval graph (derived from the 1-dimensional intervals of the lifetimes).
    Min. graph coloring can be solved optimally in linear time for interval graphs (using the left-edge algorithm that we will see later for channel routing).
    Scheduling heuristic: Among available opers, schedule those on avail. FUs whose delay to o/p is the highest, breaking ties in favor of those opers u whose sibling o/ps (o/ps to the same children) that are avail. or will be avail. at u's earliest finish will have the largest lifetime at that point.
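Register allocation on lifetime intervals can be sketched with a left-edge-style greedy pass: sort variables by the left edge (start) of their lifetime and place each one in the first register that is already free. The function names, the half-open [start, end) lifetime convention, and the sample lifetimes are assumptions for illustration. This simple version sorts and scans the registers, so it is not itself linear time.

```python
def left_edge_allocate(lifetimes):
    """lifetimes: {var: (start, end)} as half-open intervals [start, end).
    Returns {var: register_index}, using as few registers as possible."""
    order = sorted(lifetimes, key=lambda v: lifetimes[v][0])  # by left edge
    reg_free_at = []   # for each register, the end of the last lifetime placed in it
    assignment = {}
    for var in order:
        start, end = lifetimes[var]
        for reg, free_at in enumerate(reg_free_at):
            if free_at <= start:          # no overlap: reuse this register
                reg_free_at[reg] = end
                assignment[var] = reg
                break
        else:                             # every register is still live: add one
            reg_free_at.append(end)
            assignment[var] = len(reg_free_at) - 1
    return assignment


def max_overlap(lifetimes):
    """Largest number of simultaneously live variables (a lower bound on registers)."""
    events = []
    for start, end in lifetimes.values():
        events.append((start, 1))
        events.append((end, -1))
    events.sort()  # at equal times, ends (-1) sort before starts (+1)
    live = best = 0
    for _, delta in events:
        live += delta
        best = max(best, live)
    return best


lifetimes = {"x": (1, 3), "y": (2, 4), "z": (3, 5), "w": (4, 6)}  # hypothetical
print(left_edge_allocate(lifetimes), max_overlap(lifetimes))
```

A new register is opened only when every existing register holds a variable that is live at the new variable's start, so the register count equals the maximum number of overlapping lifetimes, which is the minimum coloring of the interval graph.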

  • Detailed HLS Example: Register Allocation (contd): 3 non-primary i/p regs. needed.
    Scheduling heuristic: Among available opers, schedule those on available FUs whose delay to o/p is the highest, breaking ties arbitrarily: B's lifetime increases, but D's (dep. of B) decreases similarly; hence the heuristic should be based on more global information.