pipelining, parallel, retiming


4/23/2013

VLSI Programming: Lecture 2

Course 2IN35

Course: Kees van Berkel

Rudolf Mak

Lab: Kees van Berkel

Rudolf Mak

Alok Lele, Hrishikesh Salunkhe

www: http://www.win.tue.nl/~cberkel/2IN35/

Lecture 2 pipelining, retiming, J-slow, parallel


VLSI Programming: time table 2013

date | in | hour 5 | hour 6 | out | hour 7 | hour 8 | out

April 23: introduction, DSP representations, bounds; pipelining, retiming, transposition, J-slow, unfolding; out T1 + T2

May 7: in T1 + T2; unfolding (cntd), look-ahead, strength reduction; out T3 + T4; (have FPGA tools installed) FPGA + Verilog intros; out L1

May 14: in T3 + T4; systolic computation; out T5; FPGA lab/L1: audio filter simulation

May 21: in T5; folding; FPGA lab/L2: audio filter on XUP board; out L2

May 28: DSP processors; FPGA lab/L3: sequential FIR, strength-reduced FIR; out L3

May 30: FPGA lab/L3: sequential FIR, strength-reduced FIR (cntd)

June 4: in L3; FPGA lab/L4: audio sample rate convertor; deadline report L3; out L4

June 6: FPGA lab/L4: audio sample rate convertor (cntd)

June 11: in L4; FPGA lab/L5: audio sample rate convertor "1024x"; deadline report L4; out L5

June 13: FPGA lab/L5: audio sample rate convertor "1024x" (cntd)

June 18: in L5; deadline report L5


FPGA IC on a Xilinx XUP Board (Atlys)

Xilinx Spartan 6 FPGA


Atlys board, based on Xilinx Spartan 6

Xilinx Spartan 6 FPGA


Preparation for Lab work

• Prepare your notebook for lab work

• See preparation link on 2IN35 web-site

• Install the required tools and test them

• 2 weeks from now (May 7): Hrishikesh and Alok will be around for Q&A

• First Lab exercises: Tue May 7

• Find a partner (team size is max 2)


Note on course literature

The VLSI Programming lectures are loosely based on:

• Keshab K. Parhi. VLSI Digital Signal Processing Systems: Design and Implementation. Wiley-Interscience, 1999.

• This book is recommended, but not mandatory

Accompanying slides can be found on:

• http://www.ece.umn.edu/users/parhi/slides.html

• http://www.win.tue.nl/~cberkel/2IN35/

Mandatory reading:

• Edward A. Lee and David G. Messerschmitt. Synchronous Data Flow. Proc. of the IEEE, Vol. 75, No. 9, Sept 1987, pp 1235-1245.

• Keshab K. Parhi. High-Level Algorithm and Architecture Transformations for DSP Synthesis. Journal of VLSI Signal Processing, 9, 121-143 (1995), Kluwer Academic Publishers.


Outline Lecture 2

Transformations of DFGs and SFGs:

• (commuting of an SFG) lecture 1

• pipelining of a DFG Parhi3.pdf

• transposition of an SFG Parhi3.pdf

• retiming of a DFG Parhi4.pdf

• K-slow transformation of a DFG Parhi4.pdf

• unfolding of a DFG Parhi3.pdf Parhi5.pdf

• assignments


• car assembly line; Henry Ford [1908]

• 1914: Ts = 3 min; latency = 93 min
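The two figures on this slide can be related with a little arithmetic. The sketch below assumes an idealised pipeline in which one car is completed every Ts minutes and every car spends the full latency on the line; the variable names are illustrative and not taken from the slides.

```python
# Assumption: ideal pipeline, one car completed every Ts minutes,
# each car spends `latency_min` minutes on the line.
Ts_min = 3         # sample period from the slide
latency_min = 93   # latency from the slide

throughput_per_hour = 60 / Ts_min         # cars completed per hour
cars_in_flight = latency_min / Ts_min     # cars on the line at the same time
unpipelined_per_hour = 60 / latency_min   # rate if cars were built one at a time

print(throughput_per_hour, cars_in_flight, round(unpipelined_per_hour, 2))
```

Under these assumptions pipelining leaves the latency unchanged but raises the throughput by the number of cars in flight.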


Retiming and pipelining

• Review slides Parhi3.pdf

• Parhi follows a graph-theoretic approach to compute optimal pipelining/retiming

• For our purposes “moving delays around” is sufficient:

• Node retiming (Parhi4.pdf, slide 2)

• Introduction of a delay at all inputs (or all outputs)
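"Moving delays around" can be sketched in a few lines of Python. The code uses the standard retiming rule, where an edge U -> V with w delays gets w + r(V) - r(U) delays, and computes the critical path as the longest chain of nodes connected by zero-delay edges. The three-node loop, its computation times and the retiming values are hypothetical and are not the graph from Parhi's slides.

```python
from functools import lru_cache

def retime(edges, r):
    """Apply w_r(U->V) = w(U->V) + r(V) - r(U) to every edge."""
    out = {(u, v): w + r[v] - r[u] for (u, v), w in edges.items()}
    if any(w < 0 for w in out.values()):
        raise ValueError("infeasible retiming: negative delay count")
    return out

def critical_path(times, edges):
    """Longest total computation time over paths without delays.

    Assumes the graph has no zero-delay cycle.
    """
    succ = {u: [] for u in times}
    for (u, v), w in edges.items():
        if w == 0:
            succ[u].append(v)

    @lru_cache(maxsize=None)
    def longest_from(u):
        return times[u] + max((longest_from(v) for v in succ[u]), default=0)

    return max(longest_from(u) for u in times)

# Hypothetical loop A -> B -> C -> A with two delays on the edge C -> A.
times = {"A": 1, "B": 2, "C": 2}
edges = {("A", "B"): 0, ("B", "C"): 0, ("C", "A"): 2}
retimed = retime(edges, {"A": 0, "B": 0, "C": 1})

print(critical_path(times, edges), critical_path(times, retimed))
```

In this example the retiming moves one delay from C -> A to B -> C; the number of delays in the loop stays at two while the critical path gets shorter.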


Parhi ’95, Fig 3a

Critical path is 10 time units long

(transposed version: 8 time units)


Parhi ’95, Fig 3a / retiming step 1

Critical path is 10 time units long


Parhi ’95, Fig 3a / retiming step 2

Critical path is 10 time units long


Parhi ’95, Fig 3a / retiming step 3

Critical path is 7 time units long


Parhi ’95, Fig 3a / retiming step 4

Critical path is 7 time units long


Parhi ’95, Fig 3a / retiming step 5

Critical path is 4 time units long


Parhi ’95, Fig 3a / retiming step 6

Critical path is 4 time units long


Parhi ’95, Fig 3a / retiming step 7

Critical path is 3 time units long


Parhi ’95, Fig 3a / retiming step 8

Critical path is 3 time units long


Parhi ’95, Fig 3a / retiming step 9

Critical path is 2 time units long


Parhi ’95, Fig 3a after retiming = Fig 3b

Critical path is 2 time units long


Unfolding, L=2

• Parhi’s paper, Fig 1/2, pp. 123-124

• y(n) = ax(n) + bx(n-1) + cx(n-2)

• y(2k) = ax(2k) + bx(2k-1) + cx(2k-2)

• y(2k+1) = ax(2k+1) + bx(2k) + cx(2k-1)

• Rewrite all indices in equations to the form (L(k - i) + j), with 0 ≤ j < L

• y(2k) = ax(2k) + bx(2(k-1)+1) + cx(2(k-1))

• y(2k+1) = ax(2k+1) + bx(2k) + cx(2(k-1)+1)

• = Fig 2
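The index rewriting step is mechanical and can be sketched with Python's floor division. The helper name `rewrite_index` and its parameters are illustrative: `r` is the phase of the output y(Lk + r) and `d` is the delay of the term being rewritten.

```python
def rewrite_index(L, r, d):
    """Return (i, j) with L*k + r - d == L*(k - i) + j and 0 <= j < L."""
    q, j = divmod(r - d, L)   # divmod floors, so 0 <= j < L also for negative r - d
    return -q, j

# Reproduce the rewritten indices of the 2-unfolded FIR (delays 0, 1, 2).
for r in range(2):
    for d in range(3):
        i, j = rewrite_index(2, r, d)
        print(f"x(2k+{r}-{d}) = x(2(k-{i})+{j})")
```

Here `i` is the number of delays on the corresponding edge of the unfolded graph and `j` selects which of the L input streams the term reads.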


Unfolding, L=3

• Same FIR

• y(3k) = ax(3k) + bx(3k-1) + cx(3k-2)

• y(3k+1) = ax(3k+1) + bx(3k) + cx(3k-1)

• y(3k+2) = ax(3k+2) + bx(3k+1) + cx(3k)

• Rewrite all indices in equations to the form (L(k - i) + j), with 0 ≤ j < L

• y(3k) = ax(3k) + bx(3(k-1)+2) + cx(3(k-1)+1)

• y(3k+1) = ax(3k+1) + bx(3k) + cx(3(k-1)+2)

• y(3k+2) = ax(3k+2) + bx(3k+1) + cx(3k)
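A quick numerical check of the rewritten equations is to run the original FIR and a block version that produces three outputs per iteration on the same input. This is a software sketch only; the function names, the zero initial state and the test coefficients are assumptions made for illustration.

```python
def fir(x, a, b, c):
    """Reference: y(n) = a x(n) + b x(n-1) + c x(n-2), with x(n) = 0 for n < 0."""
    def get(n):
        return x[n] if n >= 0 else 0
    return [a * get(n) + b * get(n - 1) + c * get(n - 2) for n in range(len(x))]

def fir_unfolded3(x, a, b, c):
    """Iteration k produces y(3k), y(3k+1) and y(3k+2)."""
    if len(x) % 3 != 0:
        raise ValueError("input length must be a multiple of 3")
    x0, x1, x2 = x[0::3], x[1::3], x[2::3]   # streams x(3k), x(3k+1), x(3k+2)
    d1 = d2 = 0   # registers holding x(3(k-1)+1) and x(3(k-1)+2)
    y = []
    for k in range(len(x0)):
        y.append(a * x0[k] + b * d2 + c * d1)        # y(3k)
        y.append(a * x1[k] + b * x0[k] + c * d2)     # y(3k+1)
        y.append(a * x2[k] + b * x1[k] + c * x0[k])  # y(3k+2)
        d1, d2 = x1[k], x2[k]
    return y

x = list(range(1, 10))
print(fir(x, 2, 3, 5) == fir_unfolded3(x, 2, 3, 5))
```

The block version needs only two registers, the same number of delays as the original filter, which is what unfolding is expected to preserve.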


Parhi 5, slide 2

• Original program: y(n) = a x(n) + b y(n-2)

• 2-unfolded version:

• y(2k) = a x(2k) + b y(2k-2)

• y(2k+1) = a x(2k+1) + b y(2k-1)

• Rewrite all indices in equations to the form (L(k - i) + j), with 0 ≤ j < L

• 2-unfolded version:

• y(2k) = a x(2k) + b y(2(k-1))

• y(2k+1) = a x(2k+1) + b y(2(k-1)+1)
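The rewritten recurrence can be checked the same way for this recursive example. In the sketch below the 2-unfolded version keeps one state variable per phase; the function names, zero initial conditions and integer test coefficients are assumptions for illustration.

```python
def iir(x, a, b):
    """Reference: y(n) = a x(n) + b y(n-2), with y(n) = 0 for n < 0."""
    y = []
    for n in range(len(x)):
        y.append(a * x[n] + b * (y[n - 2] if n >= 2 else 0))
    return y

def iir_unfolded2(x, a, b):
    """Iteration k produces y(2k) and y(2k+1)."""
    if len(x) % 2 != 0:
        raise ValueError("input length must be even")
    y_even_prev = y_odd_prev = 0   # y(2(k-1)) and y(2(k-1)+1)
    y = []
    for k in range(len(x) // 2):
        y_even = a * x[2 * k] + b * y_even_prev
        y_odd = a * x[2 * k + 1] + b * y_odd_prev
        y += [y_even, y_odd]
        y_even_prev, y_odd_prev = y_even, y_odd
    return y

x = [1, 2, 3, 4, 5, 6]
print(iir(x, 2, 3) == iir_unfolded2(x, 2, 3))
```

Note that the even and odd outputs never read each other's state: the single loop with two delays becomes two independent loops with one delay each.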


Parhi 5, slide 3 (Fig 5.3, pp 123)

• Original program: v(n) = u(n-37)

• 4-unfolded version:

• v(4k) = u(4k-37)

• v(4k+1) = u(4k-36)

• v(4k+2) = u(4k-35)

• v(4k+3) = u(4k-34)

• 4-unfolded, indices rewritten:

• v(4k) = u(4(k-10)+3)

• v(4k+1) = u(4(k-9))

• v(4k+2) = u(4(k-9)+1)

• v(4k+3) = u(4(k-9)+2)
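The same result follows from the usual edge rule for J-unfolding: an edge U -> V with w delays becomes J edges from U_i to V_((i+w) mod J), each with floor((i+w)/J) delays. A minimal sketch, with `unfold_edge` as an illustrative helper name:

```python
def unfold_edge(w, J):
    """Unfold an edge U -> V carrying w delays by a factor J.

    Returns (i, target, delays) triples: U_i -> V_target with `delays` delays.
    """
    return [(i, (i + w) % J, (i + w) // J) for i in range(J)]

for i, target, delays in unfold_edge(37, 4):
    print(f"U_{i} -> V_{target} with {delays} delays")

# The unfolded edges together carry the same number of delays as the original.
print(sum(delays for _, _, delays in unfold_edge(37, 4)))
```

For w = 37 and J = 4 this gives three edges with 9 delays and one edge (U_3 -> V_0) with 10 delays, matching the rewritten equations on this slide.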


Parhi5, slide 4 (Fig 5.4, pp 123)

• v(n) = u(n-1) + t(n-6) + v(n-12)

• v(3k) = u(3k-1) + t(3k-6) + v(3k-12)

• v(3k+1) = u(3k) + t(3k-5) + v(3k-11)

• v(3k+2) = u(3k+1) + t(3k-4) + v(3k-10)

• v(3k) = u(3(k-1)+2) + t(3(k-2)) + v(3(k-4))

• v(3k+1) = u(3k) + t(3(k-2)+1) + v(3(k-4)+1)

• v(3k+2) = u(3k+1) + t(3(k-2)+2) + v(3(k-4)+2)

• = Fig 5.4b


Parhi5, slide 6 (Fig 5.6, pp 129)

• u(n) = p(n) + (s*u(n-3) + t*u(n-2))

• u(2k) = p(2k) + (s*u(2k-3) + t*u(2k-2))

• u(2k+1) = p(2k+1) + (s*u(2k-2) + t*u(2k-1))

• u(2k) = p(2k) + (s*u(2(k-2)+1) + t*u(2(k-1)))

• u(2k+1) = p(2k+1) + (s*u(2(k-1)) + t*u(2(k-1)+1))


FIR assignment

• Consider FIR: y(n) = a*x(n) + b*x(n-1) + c*x(n-3)

• Assume add and multiply times of 2 and 5 ns, respectively.

1. Draw DFG of FIR, calculate throughput.

2. Pipeline and retime FIR for maximal throughput.

3. Unfold FIR J=2; draw the unfolded DFG. Throughput?

4. Pipeline and retime unfolded FIR; draw DFG. Throughput?

5. Same for J=3 (draw DFG), and J=16 (no need to draw DFGs). Throughput?

• Return deadline: Tuesday May 7, 13:45


IIR assignment

• Consider IIR: y(n) = x(n) + a*y(n-2)

• Assume add and multiply times of 2 and 5 ns, respectively.

1. Draw DFG of IIR, calculate throughput.

2. Pipeline and retime IIR for maximal throughput.

3. Unfold IIR J=2; draw the unfolded DFG. Throughput?

4. Pipeline and retime unfolded IIR; draw DFG. Throughput?

5. Same for J=3 (draw DFG), and J=16 (no need to draw DFGs). Throughput?

• Return deadline: Tuesday May 7, 13:45


VLSI Programming: Feb 28

• Parhi,

• More unfolding, parallelism

• Strength reduction


THANK YOU