Post on 23-Mar-2016
VLSI Signal Processing
Dr. Elwin Chandra Monie
Department of ECE, RMK Engineering College
[Figure: 256K memory chip]
APPLICATIONS
SYLLABUS
Anna University syllabus for VL9253 VLSI Signal Processing
Text: Keshab K. Parhi, ‘VLSI Digital Signal Processing Systems: Design and Implementation’, Wiley India Pvt. Ltd., 2009
Need for VLSI DSP System
Processors for DSP systems
• General-purpose microprocessors/microcontrollers
• General-purpose DSPs
• Custom processors in VLSI: FPGA, ASIC
Real-time throughput
• Sampling rates from 20 kHz to 500 MHz
• The present sample must be processed before the next sample arrives; otherwise the input must be buffered
• Processing rates of up to 100 GOPS are required
Need for VLSI DSP System ….
Data-driven property
• Systems are synchronized by data and not by a clock
• Asynchronous operation is possible
Reduced size
• For portable and mobile applications
• High-density circuits are available: 90 million transistors/cm²
• Density increases according to Moore’s Law
• Submicron fabrication technology is feasible: 0.07 µm
Typical DSP Algorithms
Filtering
• FIR, IIR filters
• $y(n) = \sum_k a_k\, y(n-k) + \sum_k b_k\, x(n-k)$
• With feedback (recursive) and without feedback
• Convolution and correlation
• Convolution: $y(n) = \sum_k x(k)\, h(n-k)$
• Correlation: $y(n) = \sum_k a(k)\, x(n+k)$, for $n = 1$ to $\infty$
• Non-terminating programs: the same code executes repetitively
• Adaptive filters: LMS algorithm
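The filtering difference equation on this slide can be sketched in a few lines of Python. This is a minimal illustration that assumes zero initial conditions; the function name `difference_equation` and the coefficient values are hypothetical and chosen only for the demonstration.

```python
def difference_equation(x, a, b):
    """y(n) = sum_{k>=1} a[k-1]*y(n-k) + sum_{k>=0} b[k]*x(n-k), zero initial state."""
    y = []
    for n in range(len(x)):
        acc = 0.0
        # Feed-forward part: weighted present and past inputs
        for k, bk in enumerate(b):
            if n - k >= 0:
                acc += bk * x[n - k]
        # Feedback (recursive) part: weighted past outputs
        for k, ak in enumerate(a, start=1):
            if n - k >= 0:
                acc += ak * y[n - k]
        y.append(acc)
    return y


impulse = [1.0, 0.0, 0.0, 0.0]
# No feedback coefficients: FIR filter, the impulse response equals b
fir_out = difference_equation(impulse, a=[], b=[1.0, 2.0, 3.0])
# One feedback coefficient: IIR filter, the impulse response decays geometrically
iir_out = difference_equation(impulse, a=[0.5], b=[1.0])
```

With an empty `a` list the loop reduces to an FIR filter; any non-empty `a` makes the computation recursive.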
Typical DSP Algorithms …
Transforms
• FFT, DCT, DWT
• FFT: $X(k) = \sum_n x(n)\, e^{-j 2\pi k n / N}$, with real and imaginary components
Decomposition
• SVD, LU matrix factorization, QR decomposition
Operations involved
• Arithmetic: multiplication, addition
• MAC operation
• Logic: shifting, barrel shifting; delay
• Dot product/matrix-vector operations
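As a rough illustration of the transform sum and of the multiply-accumulate (MAC) operation it relies on, the sketch below evaluates X(k) directly from the definition. It is a plain DFT, not a fast algorithm; an FFT produces the same values with fewer operations. The function name `dft` is an assumption made for this example.

```python
import cmath


def dft(x):
    """Direct evaluation of X(k) = sum_n x(n) * exp(-j*2*pi*k*n/N)."""
    N = len(x)
    X = []
    for k in range(N):
        acc = 0j
        for n in range(N):
            # One complex multiply-accumulate (MAC) per term of the sum
            acc += x[n] * cmath.exp(-2j * cmath.pi * k * n / N)
        X.append(acc)
    return X


X_impulse = dft([1.0, 0.0, 0.0, 0.0])  # unit impulse: flat spectrum
X_const = dft([1.0, 1.0, 1.0, 1.0])    # constant input: energy only at k = 0
```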
Data Flow Graph
• A DSP program is often represented using a Data Flow Graph (DFG), which is a directed graph that describes the program
• Consider the following IIR filter:
y[n] = x[n] + a y[n − 1]
Data Flow Graph ….
• In the DFG, nodes represent the tasks or computations (multiplication/addition)
• Each task is associated with its corresponding execution time
• The edges represent the communications between the nodes, e.g. A → B
• Associated with each edge is a non-negative number representing its delay
• An iteration of a node is the execution of the node exactly once
Data Flow Graph ….
• Each edge describes a precedence constraint between two nodes
• The precedence constraint is an intra-iteration constraint if the edge has zero delays (i.e. the computations at the nodes connected by the edge occur in the same clock cycle)
• The precedence constraint is an inter-iteration constraint if the edge has one or more delays (i.e. the computations at the nodes connected by the edge occur in different clock cycles)
A1 → B1 => A2 → B2 => A3 …
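One possible way to hold a DFG in software is a list of edges, each carrying its delay count. The sketch below encodes the IIR filter y[n] = x[n] + a y[n − 1] and separates intra-iteration from inter-iteration constraints. The `Edge` class, the node names and the execution times (2 for the adder, 4 for the multiplier) are assumptions for illustration; the slides do not give them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    src: str
    dst: str
    delays: int  # non-negative number of delay elements on the edge


# Assumed execution times; not taken from the slides
exec_time = {"add": 2, "mul": 4}

# y[n] = x[n] + a*y[n-1]: the adder output passes through one delay,
# is scaled by a in the multiplier, and returns to the adder
edges = [
    Edge("add", "mul", delays=1),
    Edge("mul", "add", delays=0),
]

# Zero-delay edges impose intra-iteration precedence constraints
intra = [(e.src, e.dst) for e in edges if e.delays == 0]
# Edges with one or more delays impose inter-iteration constraints
inter = [(e.src, e.dst) for e in edges if e.delays > 0]
```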
Data Flow Graph ….
Critical path
• The path with the longest computation time among all paths that contain zero delays
• The critical path length is 26 units
• The critical path sets the lower bound on the clock period
• To achieve high speed, the length of the critical path should be reduced
[Figure: FIR filter DFG with input x(n), output y(n) and four delays (D); node computation times of 10 and 4 units; zero-delay path lengths 26, 26, 22, 18 and 14]
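The critical path can be found as the longest path over zero-delay edges. The sketch below assumes the figure is a 5-tap direct-form FIR filter: five multipliers of 10 units feeding a chain of four adders of 4 units. That structure is inferred from the node times and path lengths on the slide, not stated there, and the node names are hypothetical.

```python
from functools import lru_cache

# Node computation times (assumed structure: 5 multipliers, 4 adders)
node_time = {f"m{i}": 10 for i in range(5)}
node_time.update({f"a{i}": 4 for i in range(1, 5)})

# Zero-delay edges only; edges through delay elements are left out
succ = {node: [] for node in node_time}
succ["m0"].append("a1")
for i in range(1, 5):
    succ[f"m{i}"].append(f"a{i}")
    if i < 4:
        succ[f"a{i}"].append(f"a{i + 1}")


@lru_cache(maxsize=None)
def longest_from(node):
    """Longest computation time of any zero-delay path starting at node."""
    return node_time[node] + max((longest_from(s) for s in succ[node]), default=0)


critical_path = max(longest_from(n) for n in node_time)
path_lengths = [longest_from(f"m{i}") for i in range(5)]
```

Under these assumptions the longest path starts at the first or second multiplier and runs through all four adders.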
Loop Bound
• A recursive DFG has one or more loops
• The loop bound for the L-th loop is defined as tL / wL, where tL is the loop computation time and wL is the number of delays in the loop
Iteration Bound T∞
• The iteration bound is the maximum loop bound of all loops in the DFG
• The loop that gives the iteration bound is called the critical loop
• The iteration bound determines the minimum critical path of a recursive system represented by that DFG structure
• In other words, no matter how you pipeline or retime the DFG, you cannot get a circuit with a lower critical path than the iteration bound
Example of Iteration Bound
Loops
• Loop 1: ADBA, loop bound = 4/2
• Loop 2: AECBA, loop bound = 5/3
• Loop 3: AFCBA, loop bound = 5/4
Critical loop
• Loop 1
Iteration bound
• T∞ = max{4/2, 5/3, 5/4} = 4/2 = 2 units of time
• That is the minimum clock period (maximum frequency) at which this circuit can operate after pipelining and retiming
[Figure: DFG with nodes A to F; node computation times (1) and (2); edge delays 2D, D and D]
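The arithmetic of this example can be checked with exact fractions. The loop computation times and delay counts below are read off the loop bounds on the slide (4/2, 5/3, 5/4); the variable names are illustrative only.

```python
from fractions import Fraction

# loop name -> (loop computation time t_L, number of delays w_L)
loops = {"ADBA": (4, 2), "AECBA": (5, 3), "AFCBA": (5, 4)}

loop_bounds = {name: Fraction(t, w) for name, (t, w) in loops.items()}

# The critical loop is the one with the largest loop bound
critical_loop = max(loop_bounds, key=loop_bounds.get)
iteration_bound = loop_bounds[critical_loop]
```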
Longest path matrix algorithm-1
Let d be the number of delays in the DFG. Define K = {1, 2, …, d}.
Form the matrix L(1) as follows:
$$L^{(1)}_{i,j} = \begin{cases} \max_q\, t^{q}_{d_i \to d_j} & \text{if at least one path exists} \\ -1 & \text{if no such path exists} \end{cases}$$
where $\max_q t^{q}_{d_i \to d_j}$ is the longest computation time among the paths from delay element $d_i$ to delay element $d_j$ that pass through no other delay.
Longest path matrix algorithm-2
Compute the successive matrices
$$L^{(m+1)}_{i,j} = \max_{k \in S_{i,j}} \left( -1,\; L^{(1)}_{i,k} + L^{(m)}_{k,j} \right)$$
in which $S_{i,j} = \{\, k \in K \mid L^{(1)}_{i,k} \neq -1 \text{ and } L^{(m)}_{k,j} \neq -1 \,\}$.
The iteration bound is computed from
$$T_\infty = \max_{i,\,m \in K} \frac{L^{(m)}_{i,i}}{m}$$
Longest path matrix algorithm-3
$$L^{(1)} = \begin{bmatrix} -1 & 0 & -1 & -1 \\ 4 & -1 & 0 & -1 \\ 5 & -1 & -1 & 0 \\ 5 & -1 & -1 & -1 \end{bmatrix}$$
$$L^{(2)}_{2,1} = \max_{k \in \{1,2,3,4\}} \left( -1,\; L^{(1)}_{2,k} + L^{(1)}_{k,1} \right)$$
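LONGEST PATH MATRIX ALGORITHM-4
$$L^{(2)}_{2,1} = \max_{k \in \{1,2,3,4\}} \left( -1,\; L^{(1)}_{2,k} + L^{(1)}_{k,1} \right) = \max(-1,\, 0 + 5) = 5$$
$$L^{(2)}_{2,2} = \max_{k \in \{1,2,3,4\}} \left( -1,\; L^{(1)}_{2,k} + L^{(1)}_{k,2} \right) = \max(-1,\, 4 + 0) = 4$$
$$L^{(2)}_{2,3} = \max_{k \in \{1,2,3,4\}} \left( -1,\; L^{(1)}_{2,k} + L^{(1)}_{k,3} \right) = \max(-1) = -1$$
$$L^{(2)}_{2,4} = \max_{k \in \{1,2,3,4\}} \left( -1,\; L^{(1)}_{2,k} + L^{(1)}_{k,4} \right) = \max(-1,\, 0 + 0) = 0$$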
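LONGEST PATH MATRIX ALGORITHM-5
$$L^{(2)} = \begin{bmatrix} 4 & -1 & 0 & -1 \\ 5 & 4 & -1 & 0 \\ 5 & 5 & -1 & -1 \\ -1 & 5 & -1 & -1 \end{bmatrix}$$
$$L^{(3)} = \begin{bmatrix} 5 & 4 & -1 & 0 \\ 8 & 5 & 4 & -1 \\ 9 & 5 & 5 & -1 \\ 9 & -1 & 5 & -1 \end{bmatrix}$$
$$L^{(4)} = \begin{bmatrix} 8 & 5 & 4 & -1 \\ 9 & 8 & 5 & 4 \\ 10 & 9 & 5 & 5 \\ 10 & 9 & -1 & 5 \end{bmatrix}$$
$$T_\infty = \max\left\{ \tfrac{4}{2}, \tfrac{4}{2}, \tfrac{5}{3}, \tfrac{5}{3}, \tfrac{5}{3}, \tfrac{8}{4}, \tfrac{8}{4}, \tfrac{5}{4}, \tfrac{5}{4} \right\} = 2$$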
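The longest path matrix recursion can be sketched as follows. The function names are hypothetical. The starting matrix is the 4 x 4 example L(1), taken with first row (-1, 0, -1, -1), which is the row consistent with the L(2) entries computed in the example.

```python
from fractions import Fraction

L1 = [
    [-1, 0, -1, -1],
    [4, -1, 0, -1],
    [5, -1, -1, 0],
    [5, -1, -1, -1],
]


def next_matrix(L1, Lm):
    """L(m+1)[i][j] = max over valid k of L(1)[i][k] + L(m)[k][j], else -1."""
    d = len(L1)
    out = [[-1] * d for _ in range(d)]
    for i in range(d):
        for j in range(d):
            for k in range(d):
                # k is valid only if neither term is -1
                if L1[i][k] != -1 and Lm[k][j] != -1:
                    out[i][j] = max(out[i][j], L1[i][k] + Lm[k][j])
    return out


def iteration_bound(L1):
    """T_inf = max over i, m of L(m)[i][i] / m, skipping -1 diagonal entries."""
    d = len(L1)
    Lm, best = L1, None
    for m in range(1, d + 1):
        for i in range(d):
            if Lm[i][i] != -1:
                cand = Fraction(Lm[i][i], m)
                best = cand if best is None else max(best, cand)
        Lm = next_matrix(L1, Lm)
    return best


L2 = next_matrix(L1, L1)
T_inf = iteration_bound(L1)
```

Exact fractions are used so that ratios such as 5/3 compare without rounding.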
DATA DEPENDENCE GRAPH
y(n) = b0 x(n) + b1 x(n-1) + b2 x(n-2)
[Figure: dependence graph of the 3-tap FIR filter with inputs x0 to x5, weights b0 to b2, outputs y0 to y5 and zero-valued initial inputs. Each node takes x, y and b and produces x’ = x, y’ = y + b x, b’ = b]
PIPELINING IN FIR FILTERS
• Reduce the critical path
• Increase the clock speed or sample speed
• Reduce power consumption
Introduce pipelining latches along the data path
PIPELINING IN FIR FILTERS
Critical path: TM + 2TA => TM + TA
GENERAL METHOD OF PIPELINING
• Pipelining latches can only be placed across a feed-forward cutset of the graph without affecting the functionality of the structure
• Cutset: a set of edges of a graph such that if these edges are removed from the graph, the graph becomes disjoint
• Feed-forward cutset: a cutset in which the data move in the forward direction on all the edges of the cutset
Limitations of pipelining
• Increase in latency: the difference in the availability of the first output between the pipelined and the sequential system
• Increase in the number of latches
GENERAL METHOD OF PIPELINING
[Figure labels: Critical path: 4; Not correct!; Critical path: 2; Feed-forward cutset]
TRANSPOSITION THEOREM
• Reversing the direction of all edges in a given signal flow graph (SFG) and interchanging the input and output ports preserves the functionality of the system
• Critical path: TM + 2TA => TM + TA
[Figure: SFG of a 3-tap FIR filter with input x(n), output y(n), two delays (Z⁻¹) and coefficients a, b, c]
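The transposition theorem can be checked numerically for a 3-tap FIR filter. The sketch below simulates a direct-form and a transposed-form structure sample by sample; both function names and the coefficient values are assumptions for the demonstration, and zero initial state is assumed.

```python
def fir_direct(x, coeffs):
    """Direct form: the delay line holds past inputs."""
    taps = [0.0] * len(coeffs)
    y = []
    for sample in x:
        taps = [sample] + taps[:-1]  # shift the input delay line
        y.append(sum(c * t for c, t in zip(coeffs, taps)))
    return y


def fir_transposed(x, coeffs):
    """Transposed form: the input is broadcast and the delays hold partial sums."""
    state = [0.0] * (len(coeffs) - 1)
    y = []
    for sample in x:
        products = [c * sample for c in coeffs]
        y.append(products[0] + (state[0] if state else 0.0))
        # Each register loads its tap product plus the register after it
        state = [
            products[k + 1] + (state[k + 1] if k + 1 < len(state) else 0.0)
            for k in range(len(state))
        ]
    return y


x = [1.0, 2.0, 3.0, 4.0, 5.0]
coeffs = [1.0, 2.0, 3.0]  # a, b, c (illustrative values)
y_direct = fir_direct(x, coeffs)
y_transposed = fir_transposed(x, coeffs)
```

In the transposed form each output needs one multiplication and one addition after the registers, which matches the shorter critical path TM + TA.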
FINE-GRAIN PIPELINING
• A multiplier with a processing time of 10 is split into two units with processing times of 6 and 4
• Critical path: 12 => 6
PARALLEL PROCESSING FIR FILTERS
y(n) = a x(n) + b x(n-1) + c x(n-2)
y(3k) = a x(3k) + b x(3k-1) + c x(3k-2)
y(3k+1) = a x(3k+1) + b x(3k) + c x(3k-1)
y(3k+2) = a x(3k+2) + b x(3k+1) + c x(3k)
• Sample speed is increased since multiple samples are processed at the same time
• Clock speed remains the same
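The three block equations can be compared against the serial filter in a short sketch. It assumes x(n) = 0 for n < 0 and an input length that is a multiple of 3; the function names and coefficient values are hypothetical.

```python
def sample(x, n):
    """x(n), taken as zero before the start of the sequence."""
    return x[n] if n >= 0 else 0.0


def fir_serial(x, a, b, c):
    """One output per iteration: y(n) = a x(n) + b x(n-1) + c x(n-2)."""
    return [a * sample(x, n) + b * sample(x, n - 1) + c * sample(x, n - 2)
            for n in range(len(x))]


def fir_3_parallel(x, a, b, c):
    """Three outputs per iteration k: y(3k), y(3k+1), y(3k+2)."""
    if len(x) % 3 != 0:
        raise ValueError("input length must be a multiple of 3")
    y = []
    for k in range(len(x) // 3):
        n = 3 * k
        y0 = a * sample(x, n) + b * sample(x, n - 1) + c * sample(x, n - 2)
        y1 = a * sample(x, n + 1) + b * sample(x, n) + c * sample(x, n - 1)
        y2 = a * sample(x, n + 2) + b * sample(x, n + 1) + c * sample(x, n)
        y.extend([y0, y1, y2])
    return y


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y_serial = fir_serial(x, 1.0, 2.0, 3.0)
y_parallel = fir_3_parallel(x, 1.0, 2.0, 3.0)
```

In hardware the three expressions are evaluated by three separate sets of multipliers and adders in the same clock cycle; the loop body here only mirrors that grouping.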
PARALLEL PROCESSING FIR FILTERS
• 3 sets of resources are used for the 3-parallel system
• Iteration time = (1/3)(TM + 2TA)
PIPELINING FOR LOW POWER
$$\text{Propagation delay} = \frac{C_{charge} V_0}{k (V_0 - V_t)^2}$$
$$\text{Power consumption} = C_{total} V_0^2 f$$
For M-level pipelining, $C_{charge}$ is reduced to $C_{charge}/M$.
Keeping f the same, reduce the supply voltage from $V_0$ to $\beta V_0$, where $0 < \beta < 1$:
$$P_{pip} = C_{total}\, \beta^2 V_0^2 f = \beta^2 P_{seq}$$
$$\text{Propagation delay}_{pip} = \frac{(C_{charge}/M)\, \beta V_0}{k (\beta V_0 - V_t)^2}$$
If the clock period is kept the same,
$$\frac{C_{charge} V_0}{k (V_0 - V_t)^2} = \frac{(C_{charge}/M)\, \beta V_0}{k (\beta V_0 - V_t)^2}$$
$$M (\beta V_0 - V_t)^2 = \beta (V_0 - V_t)^2$$
Solve for β.
EXAMPLE ON PIPELINING
Consider an original 3-tap FIR filter and its fine-grain pipelined version. Assume TM = 10 u.t., TA = 2 u.t., Vt = 0.6 V, V0 = 5 V, and CM = 5CA.
In the fine-grain pipelined filter, the multiplier is broken into 2 parts, m1 and m2, with computation times of 6 u.t. and 4 u.t. respectively, and with capacitances 3 times and 2 times that of an adder, respectively.
(a) What is the supply voltage of the pipelined filter if the clock period remains unchanged?
(b) What is the power consumption of the pipelined filter as a percentage of the original filter?
SOLUTION
Original: Ccharge = CM + CA = 6CA
Pipelined: Ccharge = 3CA, so the charging capacitance is halved (M = 2)
2(5β − 0.6)² = β(5 − 0.6)²
β = 0.6033 or 0.0239 (not valid, since βV0 would be below Vt)
Vpip = βV0 = 3.0165 V
Ppip = β² Pseq = 0.364 Pseq
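The quadratic in β can be solved numerically as a check on this solution. The sketch below solves ratio·(βV0 − Vt)² = β(V0 − Vt)², where ratio is the original charging capacitance divided by the pipelined one (6CA / 3CA = 2 here). The function name `solve_beta` and its parameter names are assumptions for this example.

```python
import math


def solve_beta(ratio, v0, vt):
    """Solve ratio*(beta*v0 - vt)**2 = beta*(v0 - vt)**2 for beta.

    Only roots that keep the scaled supply between vt and v0 are returned.
    """
    # Expanded form: qa*beta**2 + qb*beta + qc = 0
    qa = ratio * v0 ** 2
    qb = -(2 * ratio * v0 * vt + (v0 - vt) ** 2)
    qc = ratio * vt ** 2
    disc = math.sqrt(qb * qb - 4 * qa * qc)
    roots = [(-qb + disc) / (2 * qa), (-qb - disc) / (2 * qa)]
    return [b for b in roots if vt < b * v0 <= v0]


# Values from the example: V0 = 5 V, Vt = 0.6 V, capacitance ratio 6CA / 3CA
(beta,) = solve_beta(ratio=2, v0=5.0, vt=0.6)
v_pip = beta * 5.0        # supply voltage of the pipelined filter
power_ratio = beta ** 2   # Ppip / Pseq
```

The smaller root is discarded by the validity filter because it would place the supply voltage below the threshold voltage.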
PARALLEL SYSTEM FOR LOW POWER
Power consumption for an L-parallel system:
$$P_{par} = (L\, C_{total}) (\beta V_0)^2 \frac{f}{L} = \beta^2 P_{seq}$$
Propagation delay:
$$T_{seq} = \frac{C_{charge} V_0}{k (V_0 - V_t)^2}, \qquad T_{par} = \frac{C_{charge}\, \beta V_0}{k (\beta V_0 - V_t)^2}$$
$$L\, T_{seq} = T_{par}$$
$$L (\beta V_0 - V_t)^2 = \beta (V_0 - V_t)^2$$
Solve for β.
EXAMPLE ON PARALLEL SYSTEM
Consider a 4-tap FIR filter shown in Fig. 3.18(a) and its 2-parallel version in Fig. 3.18(b). The two architectures are operated at a sample period of 9 u.t. Assume TM = 8, TA = 1, Vt = 0.45 V, V0 = 3.3 V, and CM = 8CA.
(a) What is the supply voltage of the 2-parallel filter?
(b) What is the power consumption of the 2-parallel filter as a percentage of the original filter?
SOLUTION
Original: Ccharge = CM + CA = 9CA
2-parallel: Ccharge = CM + 2CA = 10CA
9(3.3β − 0.45)² = 5β(3.3 − 0.45)²
β = 0.6589 or 0.0282 (not valid, since βV0 would be below Vt)
Vpar = βV0 = 2.1743 V
Ppar = β² Pseq = 0.4341 Pseq
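The same check can be made for the parallel case. Here the allowed propagation delay grows by the factor L while the charging capacitance changes from 9CA to 10CA, so the equation becomes (L·Cseq/Cpar)(βV0 − Vt)² = β(V0 − Vt)². The function name `solve_beta_parallel` and its parameters are assumptions for this sketch.

```python
import math


def solve_beta_parallel(L, c_seq, c_par, v0, vt):
    """Solve L*c_seq*(beta*v0 - vt)**2 = c_par*beta*(v0 - vt)**2 for beta.

    Only roots that keep the scaled supply between vt and v0 are returned.
    """
    ratio = L * c_seq / c_par
    # Expanded form: qa*beta**2 + qb*beta + qc = 0
    qa = ratio * v0 ** 2
    qb = -(2 * ratio * v0 * vt + (v0 - vt) ** 2)
    qc = ratio * vt ** 2
    disc = math.sqrt(qb * qb - 4 * qa * qc)
    roots = [(-qb + disc) / (2 * qa), (-qb - disc) / (2 * qa)]
    return [b for b in roots if vt < b * v0 <= v0]


# Values from the example: L = 2, capacitances in units of CA
(beta,) = solve_beta_parallel(L=2, c_seq=9, c_par=10, v0=3.3, vt=0.45)
v_par = beta * 3.3        # supply voltage of the 2-parallel filter
power_ratio = beta ** 2   # Ppar / Pseq
```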
PROBLEMS & ASSIGNMENTS
Problems
1) Prob. 2.7.1 (a)
2) Prob. 2.7.4
Assignment
1) Design a low-pass filter of order 40 with a sample rate of 48 kHz and a cut-off frequency of 10 kHz. Write VHDL/Verilog code and simulate it.
Hint: Use Matlab to find the coefficients and test the filter functionality by checking the impulse response.
2) Implement a 4-tap filter in direct form and in transpose form. Introduce pipelining and compare the performance.