Post on 30-Dec-2015
4/23/2013
VLSI Programming: Lecture 2
Course 2IN35
Course: Kees van Berkel c.h.v.berkel@tue.nl
Rudolf Mak r.h.mak@tue.nl
Lab: Kees van Berkel
Rudolf Mak
Alok Lele, Hrishikesh Salunkhe
www: http://www.win.tue.nl/~cberkel/2IN35/
Lecture 2 pipelining, retiming, J-slow, parallel
VLSI Programming: time table 2013
Columns: date, in, hours 5-6, out, hours 7-8, out
April 23: hours 5-6: introduction, DSP representations, bounds; hours 7-8: pipelining, retiming, transposition, J-slow, unfolding; out: T1 + T2
May 7: in: T1 + T2; hours 5-6: unfolding (cntd), look-ahead, strength reduction; out: T3 + T4; hours 7-8: (have FPGA tools installed) FPGA + Verilog intros; out: L1
May 14: in: T3 + T4; hours 5-6: systolic computation; out: T5; hours 7-8: FPGA lab/L1: audio filter simulation
May 21: in: T5; hours 5-6: folding; hours 7-8: FPGA lab/L2: audio filter on XUP board; out: L2
May 28: hours 5-6: DSP processors; hours 7-8: FPGA lab/L3: sequential FIR, strength-reduced FIR; out: L3
May 30: FPGA lab/L3: sequential FIR, strength-reduced FIR (cntd)
June 4: in: L3; FPGA lab/L4: audio sample rate convertor; deadline report L3; out: L4
June 6: FPGA lab/L4: audio sample rate convertor (cntd)
June 11: in: L4; FPGA lab/L5: audio sample rate convertor "1024x"; deadline report L4; out: L5
June 13: FPGA lab/L5: audio sample rate convertor "1024x" (cntd)
June 18: in: L5; deadline report L5
FPGA IC on a Xilinx XUP Board (Atlys)
Xilinx Spartan 6 FPGA
Atlys board, based on Xilinx Spartan 6
Preparation for Lab work
• Prepare your notebook for lab work
• See preparation link on 2IN35 web-site
• Install the required tools and test them
• 2 weeks from now (May 7): Hrishikesh and Alok will be around for Q&A
• First Lab exercises: Tue May 7
• Find a partner (team size is max 2)
Note on course literature
Lectures VLSI programming are loosely based on:
• Keshab K. Parhi. VLSI Digital Signal Processing Systems, Design and Implementation. Wiley Inter-Science 1999.
• This book is recommended, but not mandatory
Accompanying slides can be found on:
• http://www.ece.umn.edu/users/parhi/slides.html
• http://www.win.tue.nl/~cberkel/2IN35/
Mandatory reading:
• Edward A. Lee and David G. Messerschmitt. Synchronous Data Flow. Proc. of the IEEE, Vol. 75, No. 9, Sept 1987, pp 1235-1245.
• Keshab K. Parhi. High-Level Algorithm and Architecture Transformations for DSP Synthesis. Journal of VLSI Signal Processing, 9, 121-143 (1995), Kluwer Academic Publishers.
Outline Lecture 2
Transformations of DFGs and SFGs:
• (commuting of an SFG) lecture 1
• pipelining of a DFG Parhi3.pdf
• transposition of an SFG Parhi3.pdf
• retiming of a DFG Parhi4.pdf
• K-slow transformation of a DFG Parhi4.pdf
• unfolding of a DFG Parhi3.pdf Parhi5.pdf
• assignments
• car assembly line; Henry Ford [1908]
• 1914: Ts = 3 min; latency = 93 min
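The assembly-line numbers illustrate the difference between throughput and latency in a pipeline. A minimal sketch of the arithmetic, reading Ts as the period between two finished cars and assuming a steady flow (the variable names are illustrative):

```python
# Pipelining analogy: a new car is finished every Ts minutes, although each
# individual car spends the full latency on the line.
sample_period_min = 3   # Ts from the slide
latency_min = 93        # latency from the slide

# Cars simultaneously on the line in steady state (latency / period).
cars_in_progress = latency_min // sample_period_min

# Throughput is set by the period alone, not by the latency.
throughput_per_hour = 60 // sample_period_min

print(cars_in_progress, throughput_per_hour)
```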
Retiming and pipelining
• Review slides Parhi3.pdf
• Parhi follows a graph-theoretic approach to compute optimal pipelining/retiming
• For our purposes “moving delays around” is sufficient:
• Node retiming (Parhi4.pdf, slide 2)
• Introduction of a delay at all inputs (or all outputs)
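"Moving delays around" can be sketched in code. The following is a minimal sketch, assuming a DFG stored as a list of edges (u, v, delays) plus a table of node computation times. The function names and the three-node example loop are illustrative and not taken from the slides. Retiming a node by r removes r delays from each of its outgoing edges and adds r delays to each of its incoming edges.

```python
from functools import lru_cache

def retime(edges, r):
    """Apply retiming values r[node]: an edge u->v with w delays gets
    w + r[v] - r[u] delays. Nodes missing from r are not retimed."""
    new_edges = []
    for u, v, w in edges:
        wr = w + r.get(v, 0) - r.get(u, 0)
        if wr < 0:
            raise ValueError(f"illegal retiming: edge {u}->{v} gets {wr} delays")
        new_edges.append((u, v, wr))
    return new_edges

def critical_path(edges, time):
    """Longest total computation time over all paths without delays.
    Assumes the DFG has no delay-free cycle."""
    succ = {n: [] for n in time}
    for u, v, w in edges:
        if w == 0:
            succ[u].append(v)

    @lru_cache(maxsize=None)
    def longest_from(n):
        return time[n] + max((longest_from(m) for m in succ[n]), default=0)

    return max(longest_from(n) for n in time)

# Hypothetical example: a loop A -> B -> C -> A with two delays on C -> A.
time = {"A": 1, "B": 2, "C": 2}
edges = [("A", "B", 0), ("B", "C", 0), ("C", "A", 2)]
retimed = retime(edges, {"C": 1})   # move one delay from C's output to C's input

print(critical_path(edges, time), critical_path(retimed, time))
```

In this example the number of delays in the loop stays at two, while the critical path drops from 5 to 3 time units.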
Parhi ’95, Fig 3a
Critical path is 10 time units long
(transposed version: 8 time units)
Parhi ’95, Fig 3a / retiming step 1
Critical path is 10 time units long
Parhi ’95, Fig 3a / retiming step 2
Critical path is 10 time units long
Parhi ’95, Fig 3a / retiming step 3
Critical path is 7 time units long
Parhi ’95, Fig 3a / retiming step 4
Critical path is 7 time units long
Parhi ’95, Fig 3a / retiming step 5
Critical path is 4 time units long
Parhi ’95, Fig 3a / retiming step 6
Critical path is 4 time units long
Parhi ’95, Fig 3a / retiming step 7
Critical path is 3 time units long
Parhi ’95, Fig 3a / retiming step 8
Critical path is 3 time units long
Parhi ’95, Fig 3a / retiming step 9
Critical path is 2 time units long
Parhi ’95, Fig 3a after retiming = Fig 3b
Critical path is 2 time units long
Unfolding, L=2
•Parhi’s paper, Fig 1/2, paper p123/124
•y(n) = ax(n) + bx(n-1) + cx(n-2)
•y(2k) = ax(2k) + bx(2k-1) + cx(2k-2)
•y(2k+1) = ax(2k+1) + bx(2k) + cx(2k-1)
•Rewrite all indices in equations to the form (L(k - i) + j), with 0 ≤ j < L
•y(2k) = ax(2k) + bx(2(k-1)+1) + cx(2(k-1))
•y(2k+1) = ax(2k+1) + bx(2k) + cx(2(k-1)+1) = Fig 2
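The rewritten equations map directly onto a loop that handles two samples per iteration. A minimal sketch, assuming x(n) = 0 for n < 0 and an even number of input samples (the function names are illustrative), which checks that the 2-unfolded form gives the same outputs as the original FIR:

```python
def fir(x, a, b, c):
    """Reference FIR: y(n) = a*x(n) + b*x(n-1) + c*x(n-2), x(n) = 0 for n < 0."""
    def get(n):
        return x[n] if n >= 0 else 0
    return [a * get(n) + b * get(n - 1) + c * get(n - 2) for n in range(len(x))]

def fir_unfolded2(x, a, b, c):
    """2-unfolded FIR: iteration k consumes x(2k), x(2k+1) and
    produces y(2k), y(2k+1)."""
    assert len(x) % 2 == 0
    y = []
    x_even_prev = 0  # x(2(k-1)):   one delay on the even input stream
    x_odd_prev = 0   # x(2(k-1)+1): one delay on the odd input stream
    for k in range(len(x) // 2):
        x_even, x_odd = x[2 * k], x[2 * k + 1]
        y.append(a * x_even + b * x_odd_prev + c * x_even_prev)  # y(2k)
        y.append(a * x_odd + b * x_even + c * x_odd_prev)        # y(2k+1)
        x_even_prev, x_odd_prev = x_even, x_odd
    return y

samples = [1, 2, 3, 4, 5, 6]
print(fir(samples, 2, 3, 5) == fir_unfolded2(samples, 2, 3, 5))
```

Each term of the form x(2(k-1) + j) becomes a single delay on stream j, so the unfolded filter needs only two delay elements in total.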
Unfolding, L=3
•Same FIR
•y(3k) = ax(3k ) + bx(3k-1) + cx(3k-2)
•y(3k+1) = ax(3k+1) + bx(3k ) + cx(3k-1)
•y(3k+2) = ax(3k+2) + bx(3k+1) + cx(3k )
•Rewrite all indices in equations to the form (L(k - i) + j), with 0 ≤ j < L
•y(3k) = ax(3k ) + bx(3(k-1)+2)+ cx(3(k-1)+1)
•y(3k+1) = ax(3k+1) + bx(3k ) + cx(3(k-1)+2)
•y(3k+2) = ax(3k+2) + bx(3k+1) + cx(3k )
Parhi 5, slide 2
•Original program: y(n) = a x(n) + b y(n-2)
•2-unfolded version:
• y(2k) = a x(2k) + b y(2k-2)
• y(2k+1) = a x(2k+1) + b y(2k-1)
•Rewrite all indices in equations to the form (L(k - i) + j), with 0 ≤ j < L
•2-unfolded version, indices rewritten:
• y(2k) = a x(2k) + b y(2(k-1))
• y(2k+1) = a x(2k+1) + b y(2(k-1)+1)
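After rewriting, the even and odd outputs each depend only on their own previous value, so the 2-unfolded recursion consists of two independent loops with one delay each. A minimal sketch, assuming y(n) = 0 for n < 0 and an even number of input samples (the function names are illustrative):

```python
def iir(x, a, b):
    """Reference recursion: y(n) = a*x(n) + b*y(n-2), y(n) = 0 for n < 0."""
    y = []
    for n in range(len(x)):
        y_prev2 = y[n - 2] if n >= 2 else 0
        y.append(a * x[n] + b * y_prev2)
    return y

def iir_unfolded2(x, a, b):
    """2-unfolded recursion: two independent loops, one delay each."""
    assert len(x) % 2 == 0
    y = []
    y_even_prev = 0  # y(2(k-1))
    y_odd_prev = 0   # y(2(k-1)+1)
    for k in range(len(x) // 2):
        y_even = a * x[2 * k] + b * y_even_prev      # y(2k)
        y_odd = a * x[2 * k + 1] + b * y_odd_prev    # y(2k+1)
        y += [y_even, y_odd]
        y_even_prev, y_odd_prev = y_even, y_odd
    return y

samples = [1, 2, 3, 4, 5, 6]
print(iir(samples, 2, 3) == iir_unfolded2(samples, 2, 3))
```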
Parhi 5, slide 3 (Fig 5.3, pp 123)
•Original program: v(n) = u(n-37)
•4-unfolded version:
• v(4k) = u(4k-37)
• v(4k+1) = u(4k-36)
• v(4k+2) = u(4k-35)
• v(4k+3) = u(4k-34)
•4-unfolded version, indices rewritten:
• v(4k) = u(4(k-10)+3)
• v(4k+1) = u(4(k-9))
• v(4k+2) = u(4(k-9)+1)
• v(4k+3) = u(4(k-9)+2)
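The index rewriting used here is a division with remainder. A minimal sketch, with a hypothetical helper `rewrite_index` that rewrites L*k + j - d as L*(k - i) + jp with 0 ≤ jp < L, where d is the original delay and j the position within the unfolded iteration:

```python
def rewrite_index(L, j, d):
    """Return (i, jp) such that L*k + j - d == L*(k - i) + jp and 0 <= jp < L."""
    jp = (j - d) % L          # Python's % yields a value in 0..L-1 for L > 0
    i = (jp - (j - d)) // L   # exact: jp - (j - d) is a multiple of L
    return i, jp

# v(n) = u(n-37), unfolded with L = 4
for j in range(4):
    i, jp = rewrite_index(4, j, 37)
    print(f"v(4k+{j}) = u(4(k-{i})+{jp})")
```

The value i is the number of delays on the corresponding edge of the unfolded graph, and jp selects which of the L copies of u feeds it.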
Parhi5, slide 4 (Fig 5.4, pp 123)
•v(n) = u(n-1) + t(n-6) + v(n-12)
•v(3k) = u(3k-1) + t(3k-6) + v(3k-12)
•v(3k+1) = u(3k) + t(3k-5) + v(3k-11)
•v(3k+2) = u(3k+1) + t(3k-4) + v(3k-10)
•v(3k) = u(3(k-1)+2) + t(3(k-2)) + v(3(k-4))
•v(3k+1) = u(3k) + t(3(k-2)+1) + v(3(k-4)+1)
•v(3k+2) = u(3k+1) + t(3(k-2)+2) + v(3(k-4)+2)
•= Fig 5.4b
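The same rewriting can be applied edge by edge to a whole DFG: an edge U → V with w delays becomes J edges, from copy U_i to copy V_((i+w) mod J) with floor((i+w)/J) delays. A minimal sketch, assuming the DFG is given as a list of (u, v, w) edges meaning "v(n) uses u(n-w)"; the function name and data layout are illustrative:

```python
def unfold(edges, J):
    """Unfold a DFG by factor J.

    edges: list of (u, v, w), meaning v(n) depends on u(n - w).
    Returns edges ((u, i), (v, j), delays) between the J copies of each node.
    """
    unfolded = []
    for u, v, w in edges:
        for i in range(J):
            unfolded.append(((u, i), (v, (i + w) % J), (i + w) // J))
    return unfolded

# v(n) = u(n-1) + t(n-6) + v(n-12), unfolded with J = 3
dfg = [("u", "v", 1), ("t", "v", 6), ("v", "v", 12)]
for src, dst, delays in unfold(dfg, 3):
    print(src, "->", dst, "delays:", delays)
```

For example, the edge from copy 2 of u to copy 0 of v carries one delay, which corresponds to the term u(3(k-1)+2) in the equation for v(3k). The delays on the J copies of an edge always sum to the original w.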
Parhi5, slide 6 (Fig 5.6, pp 129)
•u(n) = p(n) + (s*u(n-3) + t*u(n-2))
•u(2k) = p(2k) + (s*u(2k-3) + t*u(2k-2))
•u(2k+1) = p(2k+1) + (s*u(2k-2) + t*u(2k-1))
•u(2k) = p(2k) + (s*u(2(k-2)+1) + t*u(2(k-1)))
•u(2k+1) = p(2k+1) + (s*u(2(k-1)) + t*u(2(k-1)+1))
FIR assignment
• Consider FIR: y(n) = a*x(n) + b*x(n-1) + c*x(n-3)
• Assume add and multiply times: 2 and 5 nsec resp.
1. Draw DFG of FIR, calculate throughput.
2. Pipeline and retime FIR for maximal throughput.
3. Unfold FIR J=2; draw the unfolded DFG. Throughput?
4. Pipeline and retime unfolded FIR; draw DFG. Throughput?
5. Same for J=3 (draw DFG), and J=16 (no need to draw DFGs). Throughput?
• Return deadline: Tuesday May 7, 13:45
IIR assignment
• Consider IIR: y(n) = x(n) + a*y(n-2)
• Assume add and multiply time: 2 and 5 nsec resp.
1. Draw DFG of IIR, calculate throughput.
2. Pipeline and retime IIR for maximal throughput.
3. Unfold IIR J=2; draw the unfolded DFG. Throughput?
4. Pipeline and retime unfolded IIR; draw DFG. Throughput?
5. Same for J=3 (draw DFG), and J=16 (no need to draw DFGs). Throughput?
• Return deadline: Tuesday May 7, 13:45
VLSI Programming: Feb 28
• Parhi,
• More unfolding, parallelism
• Strength reduction
THANK YOU