Dr. Elwin Chandra Monie Department of ECE, RMK Engineering College

VLSI Signal Processing


APPLICATIONS


SYLLABUS

Anna University syllabus for VL9253 VLSI Signal Processing

Text: Keshab K. Parhi, 'VLSI Digital Signal Processing Systems: Design and Implementation', Wiley India Pvt. Ltd., 2009

Need for VLSI DSP System

Processors for DSP systems
• General-purpose microprocessors/microcontrollers
• General-purpose DSPs
• Custom processors in VLSI: FPGA, ASIC

Real-time throughput
• Sampling rates from 20 kHz to 500 MHz
• The present sample must be processed before the next sample arrives, unless the samples are buffered
• Processing rates of up to 100 GOPS are required

Need for VLSI DSP System (continued)

Data-driven property
• Systems are synchronized by data and not by a clock
• Asynchronous operation is possible

Reduced size
• For portable and mobile applications
• High-density circuits are available: 90 million transistors/cm²
• Density increases according to Moore's Law
• Submicron fabrication technology is feasible: 0.07 µm

Typical DSP Algorithms

Filtering
• FIR and IIR filters
• $y(n) = \sum_k a_k\, y(n-k) + \sum_k b_k\, x(n-k)$
• With feedback (recursive) and without feedback
• Convolution and correlation
• $y(n) = \sum_k x(k)\, h(n-k)$
• $y(n) = \sum_k a(k)\, x(n+k)$, for $n = 1$ to $\infty$
• Non-terminating programs: the same code executes repetitively
• Adaptive filters: LMS algorithm

Typical DSP Algorithms (continued)

Transforms
• FFT, DCT, DWT
• FFT: $X(k) = \sum_n x(n)\, e^{-j 2\pi k n / N}$, with real and imaginary components

Decomposition
• SVD, LU matrix factorization, QR decomposition

Operations involved
• Arithmetic: multiplication, addition
• MAC (multiply-accumulate) operation
• Logic: shifting, barrel shifting
• Delay
• Dot product / matrix-vector operations

Data Flow Graph

A DSP program is often represented using a Data Flow Graph (DFG), which is a directed graph that describes the program.

Consider the following IIR filter

y[n] = x[n] + a y[n − 1]
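The recursion can be sketched in a few lines of Python. This is a minimal illustration only: the function name `iir_first_order` and the zero initial state y[-1] = 0 are assumptions, not part of the slides.

```python
def iir_first_order(x, a):
    """Run y[n] = x[n] + a*y[n-1], assuming the initial state y[-1] = 0."""
    y = []
    prev = 0.0  # contents of the single delay element D
    for sample in x:
        out = sample + a * prev  # one multiplication node, one addition node
        y.append(out)
        prev = out  # the delay edge carries y[n] into the next iteration
    return y


# Impulse response for a = 0.5: each output is half of the previous one.
impulse_response = iir_first_order([1.0, 0.0, 0.0, 0.0], a=0.5)
```

The variable `prev` plays the role of the delay on the feedback edge, which is why this filter has a loop in its DFG.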

Data Flow Graph ….

In the DFG, nodes represent the tasks or computations (Multiplication/Addition)

Each task is associated with its corresponding execution time

The edges represent the communications between the nodes A → B

Associated with each edge is a non-negative number representing the delay

An iteration of the node is the execution of the node, exactly once

Each edge describes a precedence constraint between two nodes

The precedence constraint is an intra-iteration constraint if the edge has zero delays (i.e. the computations at the nodes joined by the edge occur in the same clock cycle)

The precedence constraint is an inter-iteration constraint if the edge has one or more delays (i.e. the computations at the nodes joined by the edge occur in different clock cycles)

A1 → B1 => A2 → B2 => A3 → …

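Critical Path

The critical path is the path with the longest computation time among all paths that contain zero delays

In the example figure, the critical path length is 26 units

The critical path sets the lower bound on the clock period

To achieve high speed, the length of the critical path should be reduced

[Figure: example DFG with input x(n), four delay elements D, and nodes labelled with computation time 10]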
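The definition amounts to a longest-path search over the zero-delay edges of the DFG. The sketch below uses a hypothetical graph, since the slide figure is not recoverable: a 5-tap direct-form FIR with multiplier time 10 and adder time 4, values assumed only because they reproduce the 26 units quoted above.

```python
from functools import lru_cache

# Hypothetical DFG: node -> computation time (assumed: multipliers 10, adders 4).
times = {"m0": 10, "m1": 10, "m2": 10, "m3": 10, "m4": 10,
         "a1": 4, "a2": 4, "a3": 4, "a4": 4}

# Zero-delay edges only; edges that carry delays are left out on purpose.
zero_delay_edges = {
    "m0": ["a1"], "m1": ["a1"], "a1": ["a2"], "m2": ["a2"],
    "a2": ["a3"], "m3": ["a3"], "a3": ["a4"], "m4": ["a4"],
}


@lru_cache(maxsize=None)
def longest_from(node):
    """Longest computation time of any zero-delay path starting at node."""
    successors = zero_delay_edges.get(node, [])
    return times[node] + max((longest_from(s) for s in successors), default=0)


critical_path = max(longest_from(n) for n in times)
```

The recursion terminates because the zero-delay part of a valid DFG contains no cycles; every loop must hold at least one delay.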

Loop Bound

A recursive DFG has one or more loops

The loop bound for the l-th loop is defined as $t_l / w_l$, where $t_l$ is the loop computation time and $w_l$ is the number of delays in the loop

Iteration Bound $T_\infty$

The iteration bound is the maximum loop bound of all loops in the DFG

The loop that gives the iteration bound is called the critical loop

The iteration bound determines the minimum critical path of a recursive system represented by that DFG structure

In other words, no matter how you pipeline or retime the DFG, you cannot get a circuit with a critical path shorter than the iteration bound

Example of Iteration Bound

Loops
• Loop 1: A → D → B → A, loop bound = 4/2
• Loop 2: A → E → C → B → A, loop bound = 5/3
• Loop 3: A → F → C → B → A, loop bound = 5/4

Critical loop: Loop 1

Iteration bound: $T_\infty = \max\{4/2,\ 5/3,\ 5/4\} = 4/2 = 2$ units of time

This is the minimum clock period (maximum frequency) at which the circuit can operate after pipelining and retiming.
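The same calculation as a short Python sketch, using the loop computation times and delay counts from the example. Exact fractions are used so that the comparison does not depend on floating-point rounding; the variable names are illustrative.

```python
from fractions import Fraction

# (loop name, loop computation time t, number of delays w), from the example.
loops = [("Loop 1", 4, 2), ("Loop 2", 5, 3), ("Loop 3", 5, 4)]

loop_bounds = {name: Fraction(t, w) for name, t, w in loops}
critical_loop = max(loop_bounds, key=loop_bounds.get)
iteration_bound = loop_bounds[critical_loop]
```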

Longest path matrix algorithm-1

Let d be the number of delays in the DFG and define $K = \{1, 2, \dots, d\}$. Form the matrix $L^{(1)}$ as follows:

$$
L^{(1)}_{i,j} =
\begin{cases}
\max\, t_{d_i \to d_j} & \text{if at least one zero-delay path from } d_i \text{ to } d_j \text{ exists} \\
-1 & \text{if no such path exists}
\end{cases}
$$

where $\max\, t_{d_i \to d_j}$ is the longest computation time among all zero-delay paths from delay element $d_i$ to delay element $d_j$.

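Longest path matrix algorithm-2

Compute the successive matrices

$$L^{(m+1)}_{i,j} = \max_{k \in S_{i,j}} \left( -1,\ L^{(1)}_{i,k} + L^{(m)}_{k,j} \right)$$

in which $S_{i,j} = \{\, k \in K \mid L^{(1)}_{i,k} \neq -1 \ \text{and}\ L^{(m)}_{k,j} \neq -1 \,\}$.

The iteration bound is computed from

$$T_\infty = \max_{i,\, m \in K} \frac{L^{(m)}_{i,i}}{m}$$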

Longest path matrix algorithm-3

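$$
L^{(1)} =
\begin{bmatrix}
-1 & 0 & -1 & -1 \\
4 & -1 & 0 & -1 \\
5 & -1 & -1 & 0 \\
5 & -1 & -1 & -1
\end{bmatrix}
$$

$$L^{(2)}_{2,1} = \max_{k \in \{1,2,3,4\}} \left( -1,\ L^{(1)}_{2,k} + L^{(1)}_{k,1} \right)$$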


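LONGEST PATH MATRIX ALGORITHM-4

$$
\begin{aligned}
L^{(2)}_{2,1} &= \max_{k \in \{1,2,3,4\}} \left( -1,\ L^{(1)}_{2,k} + L^{(1)}_{k,1} \right) = \max(-1,\ 0+5) = 5 \\
L^{(2)}_{2,2} &= \max_{k \in \{1,2,3,4\}} \left( -1,\ L^{(1)}_{2,k} + L^{(1)}_{k,2} \right) = \max(-1,\ 4+0) = 4 \\
L^{(2)}_{2,3} &= \max_{k \in \{1,2,3,4\}} \left( -1,\ L^{(1)}_{2,k} + L^{(1)}_{k,3} \right) = \max(-1) = -1 \\
L^{(2)}_{2,4} &= \max_{k \in \{1,2,3,4\}} \left( -1,\ L^{(1)}_{2,k} + L^{(1)}_{k,4} \right) = \max(-1,\ 0+0) = 0
\end{aligned}
$$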

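LONGEST PATH MATRIX ALGORITHM-5

$$
L^{(2)} =
\begin{bmatrix}
4 & -1 & 0 & -1 \\
5 & 4 & -1 & 0 \\
5 & 5 & -1 & -1 \\
-1 & 5 & -1 & -1
\end{bmatrix}
\qquad
L^{(3)} =
\begin{bmatrix}
5 & 4 & -1 & 0 \\
8 & 5 & 4 & -1 \\
9 & 5 & 5 & -1 \\
9 & -1 & 5 & -1
\end{bmatrix}
\qquad
L^{(4)} =
\begin{bmatrix}
8 & 5 & 4 & -1 \\
9 & 8 & 5 & 4 \\
10 & 9 & 5 & 5 \\
10 & 9 & -1 & 5
\end{bmatrix}
$$

$$T_\infty = \max\left\{ \tfrac{4}{2}, \tfrac{4}{2}, \tfrac{5}{3}, \tfrac{5}{3}, \tfrac{5}{3}, \tfrac{8}{4}, \tfrac{8}{4}, \tfrac{5}{4}, \tfrac{5}{4} \right\} = 2$$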
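The whole procedure fits in a short Python sketch. The function names are illustrative. The first row of L(1) is taken here as -1, 0, -1, -1, which is an assumption: it is the row that reproduces the L(2) entries worked out on the preceding slides.

```python
from fractions import Fraction

NO_PATH = -1  # marker used on the slides for "no such path"


def next_matrix(l1, lm):
    """Compute L^(m+1) from L^(1) and L^(m)."""
    d = len(l1)
    out = [[NO_PATH] * d for _ in range(d)]
    for i in range(d):
        for j in range(d):
            # Only k with both entries present contribute to the maximum.
            candidates = [l1[i][k] + lm[k][j] for k in range(d)
                          if l1[i][k] != NO_PATH and lm[k][j] != NO_PATH]
            out[i][j] = max(candidates, default=NO_PATH)
    return out


def iteration_bound(l1):
    """Return the maximum of L^(m)[i][i] / m over all i and m = 1..d."""
    d = len(l1)
    best = None
    lm = l1
    for m in range(1, d + 1):
        for i in range(d):
            if lm[i][i] != NO_PATH:
                bound = Fraction(lm[i][i], m)
                best = bound if best is None else max(best, bound)
        lm = next_matrix(l1, lm)
    return best  # None means no loop was found


L1 = [[-1, 0, -1, -1],
      [4, -1, 0, -1],
      [5, -1, -1, 0],
      [5, -1, -1, -1]]
L2 = next_matrix(L1, L1)
T_inf = iteration_bound(L1)
```

Only the diagonal entries matter for the bound: a non-negative entry at position (i, i) of L(m) is the computation time of the longest loop through delay i that contains exactly m delays.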

DATA DEPENDENCE GRAPH

y(n) = b0 x(n) + b1 x(n-1) + b2 x(n-2)

[Figure: graph with inputs x1 to x5, initial values 0, and outputs y0 to y5]

Node function: x' = x, y' = y + bx, b' = b

PIPELINING IN FIR FILTERS

• Reduce the critical path
• Increase the clock speed or sample speed
• Reduce power consumption
• Introduce pipelining latches along the data path

Critical path: $T_M + 2T_A \Rightarrow T_M + T_A$
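A behavioural sketch can show what the latches do to the output. It assumes a 3-tap filter y(n) = a x(n) + b x(n-1) + c x(n-2) and one particular latch placement (one latch after the first adder and one on the x(n-2) path); the function and variable names are illustrative. Each stage then contains at most one multiplication and one addition, and the output is the original output delayed by one sample.

```python
def fir3(x, a, b, c):
    """Reference 3-tap FIR: y(n) = a x(n) + b x(n-1) + c x(n-2)."""
    xp = [0, 0] + list(x)
    return [a * xp[n + 2] + b * xp[n + 1] + c * xp[n] for n in range(len(x))]


def fir3_pipelined(x, a, b, c):
    """Same filter with one pipelining latch on each edge of the assumed cutset."""
    x1 = x2 = 0      # the filter's own delay elements: x(n-1), x(n-2)
    latch_sum = 0    # pipelining latch holding a x(n) + b x(n-1)
    latch_x2 = 0     # pipelining latch holding x(n-2)
    out = []
    for sample in x:
        # Stage 2 uses the values latched during the previous clock cycle.
        out.append(latch_sum + c * latch_x2)
        # Stage 1 computes the partial sum for the current sample.
        latch_sum = a * sample + b * x1
        latch_x2 = x2
        x1, x2 = sample, x1
    return out


reference = fir3([1, 2, 3, 4, 5], 1, 2, 3)
pipelined = fir3_pipelined([1, 2, 3, 4, 5], 1, 2, 3)
```

The extra sample of delay in `pipelined` is the latency cost of pipelining; the values themselves are unchanged.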

GENERAL METHOD OF PIPELINING

Pipelining latches can be placed only across a feed-forward cutset of the graph; placing them there does not affect the functionality of the structure

Cutset: a set of edges of a graph such that if these edges are removed from the graph, the graph becomes disjoint

Feed-forward cutset: a cutset in which the data move in the forward direction on all the edges of the cutset

Limitations of Pipelining

Increase in latency: the difference in the availability of the first output between the pipelined and the sequential system

Increase in the number of latches

GENERAL METHOD OF PIPELINING

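[Figure: example graph with critical path 4; latch placement that is not on a feed-forward cutset, marked "Not correct"; latch placement on a feed-forward cutset, critical path 2]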

TRANSPOSITION THEOREM

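Reversing the direction of all edges in a given signal-flow graph (SFG) and interchanging the input and output ports preserves the functionality of the system.

Critical path: $T_M + 2T_A \Rightarrow T_M + T_A$

[Figure: transposed filter structure with two delay elements z^-1 and output y(n)]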
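The equivalence of the two structures can be checked numerically. The sketch below is a behavioural model under assumed names (`fir_direct`, `fir_transposed`); it compares outputs only and does not model timing.

```python
def fir_direct(x, coeffs):
    """Direct form: delay line on the input, then a chain of adders."""
    taps = [0] * len(coeffs)
    out = []
    for sample in x:
        taps = [sample] + taps[:-1]  # shift the input delay line
        out.append(sum(c * t for c, t in zip(coeffs, taps)))
    return out


def fir_transposed(x, coeffs):
    """Transposed form: the input is broadcast and the delays sit between adders."""
    state = [0] * (len(coeffs) - 1)  # contents of the z^-1 elements
    out = []
    for sample in x:
        products = [c * sample for c in coeffs]  # input broadcast to all multipliers
        padded = state + [0]  # nothing feeds the adder after the last delay
        out.append(products[0] + padded[0])
        # Each delay stores its product plus the content of the next delay.
        state = [products[i + 1] + padded[i + 1] for i in range(len(state))]
    return out


y_direct = fir_direct([1, 2, 3, 4, 5], [1, 2, 3])
y_transposed = fir_transposed([1, 2, 3, 4, 5], [1, 2, 3])
```

In the transposed form every output passes through one multiplier and one adder after the last delay, which is where the shorter critical path comes from.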

FINE-GRAIN PIPELINING

A multiplier with a processing time of 10 is split into two units with processing times 6 and 4.

Critical path: 12 => 6
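The two critical-path values can be checked by hand. The worked version below assumes an adder time of T_A = 2 (the value implied by 12 = T_M + T_A with T_M = 10) and assumes the adder shares a stage with the second multiplier unit; the symbols T_{m1} and T_{m2} for the two units are introduced here for illustration.

```latex
\begin{aligned}
\text{before: } & T_M + T_A = 10 + 2 = 12 \\
\text{after: }  & \max(T_{m1},\ T_{m2} + T_A) = \max(6,\ 4 + 2) = 6
\end{aligned}
```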

PARALLEL PROCESSING FIR FILTERS

Sequential filter: $y(n) = a\,x(n) + b\,x(n-1) + c\,x(n-2)$

3-parallel system:

$$
\begin{aligned}
y(3k) &= a\,x(3k) + b\,x(3k-1) + c\,x(3k-2) \\
y(3k+1) &= a\,x(3k+1) + b\,x(3k) + c\,x(3k-1) \\
y(3k+2) &= a\,x(3k+2) + b\,x(3k+1) + c\,x(3k)
\end{aligned}
$$

The sample speed is increased since multiple samples are processed at the same time. The clock speed remains the same.

A 3-parallel system uses 3 sets of resources.

Iteration time $= \tfrac{1}{3}(T_M + 2T_A)$
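The three block equations can be exercised directly. This is a minimal sketch with illustrative function names; it assumes the input length is a multiple of 3 and zero initial conditions.

```python
def fir3_sequential(x, a, b, c):
    """Reference filter: y(n) = a x(n) + b x(n-1) + c x(n-2)."""
    xp = [0, 0] + list(x)
    return [a * xp[n + 2] + b * xp[n + 1] + c * xp[n] for n in range(len(x))]


def fir3_parallel(x, a, b, c):
    """Produce three outputs per clock cycle; len(x) is assumed to be a multiple of 3."""
    xm1 = xm2 = 0  # x(3k-1) and x(3k-2), carried over from the previous block
    out = []
    for k in range(0, len(x), 3):
        x0, x1, x2 = x[k], x[k + 1], x[k + 2]
        out.append(a * x0 + b * xm1 + c * xm2)  # y(3k)
        out.append(a * x1 + b * x0 + c * xm1)   # y(3k+1)
        out.append(a * x2 + b * x1 + c * x0)    # y(3k+2)
        xm1, xm2 = x2, x1
    return out


y_seq = fir3_sequential([1, 2, 3, 4, 5, 6], 1, 2, 3)
y_par = fir3_parallel([1, 2, 3, 4, 5, 6], 1, 2, 3)
```

Each pass through the loop corresponds to one clock cycle, so the loop runs a third as many times as there are samples.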

PIPELINING FOR LOW POWER

Propagation delay: $T_{pd} = \dfrac{C_{charge}\, V_0}{k (V_0 - V_t)^2}$

Power consumption: $P = C_{total}\, V_0^2\, f$

For M-level pipelining, the capacitance charged per clock cycle is reduced to $C_{charge}/M$. Keeping f the same, the supply voltage is reduced from $V_0$ to $\beta V_0$, where β lies between 0 and 1:

$$P_{pip} = C_{total}\, \beta^2 V_0^2\, f = \beta^2 P_{seq}$$

$$T_{pip} = \frac{(C_{charge}/M)\, \beta V_0}{k (\beta V_0 - V_t)^2}$$

If the clock period is kept the same:

$$\frac{C_{charge}\, V_0}{k (V_0 - V_t)^2} = \frac{(C_{charge}/M)\, \beta V_0}{k (\beta V_0 - V_t)^2}$$

$$M (\beta V_0 - V_t)^2 = \beta (V_0 - V_t)^2$$

Solve for β.

EXAMPLE ON PIPELINING

Consider an original 3-tap FIR filter and its fine-grain pipelined version. Assume TM = 10 u.t., TA = 2 u.t., Vt = 0.6 V, V0 = 5 V, and CM = 5CA. In the fine-grain pipelined filter, the multiplier is broken into 2 parts, m1 and m2, with computation times of 6 u.t. and 4 u.t. respectively, and with capacitances 3 times and 2 times that of an adder, respectively.

(a) What is the supply voltage of the pipelined filter if the clock period remains unchanged?
(b) What is the power consumption of the pipelined filter as a percentage of the original filter?

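SOLUTION

Original: $C_{charge} = C_M + C_A = 6C_A$

Pipelined: $C_{charge} = 3C_A$, so the charging capacitance is halved (M = 2)

$2(5\beta - 0.6)^2 = \beta (5 - 0.6)^2$

β = 0.6033 or 0.0239 (not valid, since βV0 would be below Vt)

(a) Vpip = βV0 = 3.0165 V

(b) Ppip = β² Pseq = 0.364 Pseq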
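The quadratic can also be solved numerically. The sketch below assumes the condition M(βV0 - Vt)² = β(V0 - Vt)² with M = 2, the capacitance ratio 6CA / 3CA that reproduces the quoted β = 0.6033; the function name `solve_beta` and its parameter names are illustrative.

```python
import math


def solve_beta(ratio, v0, vt):
    """Solve ratio*(beta*v0 - vt)**2 = beta*(v0 - vt)**2; return both roots, sorted."""
    qa = ratio * v0 ** 2
    qb = -(2 * ratio * v0 * vt + (v0 - vt) ** 2)
    qc = ratio * vt ** 2
    disc = math.sqrt(qb ** 2 - 4 * qa * qc)
    return sorted(((-qb - disc) / (2 * qa), (-qb + disc) / (2 * qa)))


# The smaller root would put the supply below the threshold voltage, so it is discarded.
rejected, beta = solve_beta(ratio=2, v0=5.0, vt=0.6)
v_pip = beta * 5.0        # supply voltage of the pipelined filter, in volts
power_ratio = beta ** 2   # P_pip / P_seq
```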

PARALLEL SYSTEM FOR LOW POWER

Power consumption for an L-parallel system:

$$P_{par} = (L\, C_{total}) (\beta V_0)^2 \frac{f}{L} = \beta^2 P_{seq}$$

Propagation delay:

$$T_{seq} = \frac{C_{charge}\, V_0}{k (V_0 - V_t)^2}, \qquad T_{par} = \frac{C_{charge}\, \beta V_0}{k (\beta V_0 - V_t)^2}$$

$$L\, T_{seq} = T_{par}$$

$$\beta (V_0 - V_t)^2 = L (\beta V_0 - V_t)^2$$

Solve for β.

EXAMPLE ON PARALLEL SYSTEM

Consider a 4-tap FIR filter shown in Fig. 3.18(a) and its 2-parallel version in Fig. 3.18(b). The two architectures are operated at a sample period of 9 u.t. Assume TM = 8, TA = 1, Vt = 0.45 V, V0 = 3.3 V, and CM = 8CA.

(a) What is the supply voltage of the 2-parallel filter?
(b) What is the power consumption of the 2-parallel filter as a percentage of the original filter?

SOLUTION

Original: $C_{charge} = C_M + C_A = 9C_A$

2-parallel: $C_{charge} = C_M + 2C_A = 10C_A$

$9(3.3\beta - 0.45)^2 = 5\beta (3.3 - 0.45)^2$

β = 0.6589 or 0.0282 (not valid)

(a) Vpar = βV0 = 2.1743 V

(b) Ppar = β² Pseq = 0.4341 Pseq
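The quadratic in this solution follows from requiring the 2-parallel clock period to be twice the sequential one. A sketch of the intermediate steps, assuming the original filter charges C_M + C_A = 9C_A, as implied by C_M = 8C_A:

```latex
\begin{aligned}
T_{par} = 2\,T_{seq}
&\;\Rightarrow\; \frac{10 C_A\, \beta V_0}{k (\beta V_0 - V_t)^2} = \frac{2 \cdot 9 C_A\, V_0}{k (V_0 - V_t)^2} \\
&\;\Rightarrow\; 9 (\beta V_0 - V_t)^2 = 5 \beta (V_0 - V_t)^2 \\
&\;\Rightarrow\; 98.01\,\beta^2 - 67.3425\,\beta + 1.8225 = 0 \quad (V_0 = 3.3,\ V_t = 0.45)
\end{aligned}
```

Of the two roots, only the larger one (roughly 0.66) keeps βV0 above Vt.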

PROBLEMS & ASSIGNMENTS

1) Prob. 2.7.1 (a)
2) Prob. 2.7.4

Assignment

3) Design a low-pass filter with a sample rate of 48 kHz, order 40, and a cut-off frequency of 10 kHz. Write VHDL/Verilog code and simulate it.
Hint: Use MATLAB to find the coefficients, and verify the filter functionality by testing the impulse response.

4) Implement a 4-tap filter in direct form and in transposed form. Introduce pipelining and compare the performance.
