7/29/2019 Jisha Ijera Paper
1/5
Jisha M. Vijayan, Arathy Iyer / International Journal of Engineering Research and
Applications (IJERA) ISSN: 2248-9622 www.ijera.com
Vol. 3, Issue 1, January -February 2013, pp.1671-1675
1671 | P a g e
Area Efficient Fast Block LMS Adaptive Filter Using Distributed
Arithmetic
Jisha M. Vijayan, Arathy IyerM.Tech VLSI (IV Sem) S.N.G.C.E., KadayiruppuErnakulam, India
Asst. Professor, Dept. of ECE S.N.G.C.E., KadayiruppuErnakulum, India
AbstractIn FBLMS algorithm, the filter weights are
adapted in the frequency domain by using the FFT
algorithm. The main hardware complexity of thesystem is due to hardware multipliers in
FFT/IFFT block. Introduction of Distributed
Algorithm eliminates the need of that multipliers
by a mechanism that generates partial products
and then sums the products together and resulting
system will have high area efficiency. Using DA,
FFT can be efficiently calculated by jointly
employing the Good-Thomas and Rader
algorithms. In the proposed FBLMS algorithm
using DA, it is required to calculate only half of
the conjugate symmetric coefficients, under thatcondition the hardware requirements for the
proposed system is approximately half of that of
existing one.
KeywordsFBLMS, signed DA, Unsigned DA,Good-Thomas, Rader algorithm.
I. INTRODUCTIONThe term estimator or filter is commonly
used to refer to a system that is designed to extractinformation about a prescribed quantity of interestfrom noisy data. At the core of DSP applications is
the digital filter. Digital filters are generally used for:Separation of signals that have been combined,Restoration of signals that have been distorted,
Transform operations. In fast block LMS algorithmthe filter parameters are adapted in the frequencydomain by using the FFT algorithm.
In this paper, a new DA based FBLMS is proposedto reduce area and power consumption. Distributedarithmetic (DA) is an efficient multiplication-freetechnique for calculating inner products. The
multiplication operation is replaced by a mechanismthat generates partial products and then sums theproducts together. The key difference betweendistributed arithmetic and standard multiplication isin the way the partial products are generated andadded together. In the proposed architecture, usingthe DA based FFT (and IFFT) block FFT can be
efficiently calculated by jointly employing the Good-Thomas and Rader algorithms. DA based FFT blockprovides only half of the conjugate symmetric outputs
without calculating others and in DA based IFFTblock we require to feed only half of the conjugatesymmetric coefficients not all, which is not possible
in existing FBLMS based adaptive filters. Since inthe proposed FBLMS algorithm based adaptive filter,
we require to calculate only half of the conjugatesymmetric coefficients, under that condition thehardware requirements for our proposed system isapproximately half of that of existing one.
II. DISTRIBUTED ARITHMETICDistributed arithmetic (DA) is first
introduced by Croisier, and Zohar and furtherdeveloped by Peled and Liu more than three decadesago. The DA is a direct method for sum of productsoperations, partial products can pre-compute by
difference equation and storing in look-up-table
(LUT) contained in memory, input signals are usedfor addressing.
Consider the following inner product of two N
dimensional vectors c and x, where c is a knownconstant vector, x is the input sample vector, and y is
the result.
= , = 1=0 (1)= 00 + 11+. . .+ 1 1 (2)
An unsigned DA system assumes that the variablex[n] is given by
= 2 1=0 0,1 (3)where xb[n] denotes the b
thbit ofx[n], i.e., the n
th
sample ofx. The inner product y can, therefore, berepresented as:
=
2
1
=0
1
=0(4)
in more compact form
7/29/2019 Jisha Ijera Paper
2/5
Jisha M. Vijayan, Arathy Iyer / International Journal of Engineering Research and
Applications (IJERA) ISSN: 2248-9622 www.ijera.com
Vol. 3, Issue 1, January -February 2013, pp.1671-1675
1672 | P a g e
= 2 , (5)1=0
1=0
=
2
1
=0 (
,
)
1
=0(6)
In case of signed DA, we use modified equation inorder to process a signed twos complement number.We use, therefore, the following (B + 1)-bitrepresentation
= 2 + 2 (7)1=0 the outcomeyis defined by
= 2 , + 2 , 1
=01
=0 (8)
III. PROPOSED SYSTEM
S/Ps(n)
BUFFER
OF
LENGTH
N= 2M
DA-FFT
of length
N
DA-IFFT
of length
N
Save
last N/2
data
y(n)y(k)
Conjugate
Multiply by
DA-FFT of
length N
Append
zero block
Delete last
block
DA-IFFT of
length N
Delay
(k)
(k+1)
DA-FFT of
length N
Insert zero
block
P/S
S/Pd(n)
Desired
response
Old
s
New
s
y
Discarded
e(n)
SH(k)
e0
E(k)
Discarded
0
S (k)
Figure 1. Proposed DA based FBLMS Adaptive filter
FFT and IFFT are the blocks that include largenumber of multipliers. Introduction of DA eliminatesthe need of that multipliers by a mechanism thatgenerates partial products and then sums the productstogether and resulting system will have high areaefficiency.
IV. INTRODUCTION TO FFTUSING DA
Figure 2. FFT Blocks
Using DA, FFT can be efficiently calculated byjointly employing the Good-Thomas and Raderalgorithms.
A. Good-Thomas Index MappingGood-Thomas algorithm re-expresses the
discrete Fourier transform (DFT) of a size N = N1N2
as a two-dimensional N1N2DFT, but only for the casewhere Nland N2are relatively prime.The index mapping suggest by Good and Thomas forn is
= 21 + 12 0 1 1 10 2 2 1 (9)and as index mapping forkresults
= 22111 + 11122 0 1 1 10 2 2 1 (10)A = 12-point DFT can be computed accordingto following steps:1) Index transform of input sequence, according to
equation for n.
2) Computation of 2 DFTs of length 1 usingRader algorithm.
3) Computation of 1 DFTs of length 2 usingRader algorithm.
4) Index transform of input sequence, according toequation for k.
To find the 14 point FFT block, Fig. 3 shows aschematic of a block processing system.
7/29/2019 Jisha Ijera Paper
3/5
Jisha M. Vijayan, Arathy Iyer / International Journal of Engineering Research and
Applications (IJERA) ISSN: 2248-9622 www.ijera.com
Vol. 3, Issue 1, January -February 2013, pp.1671-1675
1673 | P a g e
Figure 3. Mapping of 14 point FFT using Good-
Thomas Algorithm
From Fig. 3, we realize that first stage has 2 DFTs
each having 7-points and second stage has 7 DFTseach having of length 2. Here multiplication withtwiddle factors between the stages is not required.B.Rader Algorithm
It is easily seen from the definition of theDFT that the transform of a length N real sequence
x(n) has conjugate symmetry, i.e. X(N - K) = X*(K).This property facilitates to compute only half of thetransform, as the remaining half is redundant andneed not be calculated. Rader algorithm providesstraightforward way to compute only half of the
conjugate symmetric outputs without calculating theothers. Algorithm presented here first decomposes theone dimensional DFT into a multidimensional DFTusing the index map proposed by Good. Next, amethod which is based on the index permutationproposed by Rader is used to convert the short DFTs
into convolution. This method changes a prime lengthN DFT of real data into two convolutions of length
(N- 1)/2.
Now consider1 = 7 if the data are real we need tocalculate only half of the transform. Also, as Radershowed the zero frequency term must be calculatedseparately.
(0) = 1=0 (11)In matrixform, we write
123 = 1 2 3
2 4 6
3 6 2
4 5 61 3 5
5 1 4
1
2
3456+ 000 (12)
Replacing by(),123 =
1 2 3
2 3 13 1 2
3 2 11 3 22 1 3
12
3
456
+ 0
00 (13)
If real and imaginary parts of W matrix are separated,a simplification is possible. Consider first the real
part using notation in matrix that k stands fors(2/7). The real part becomes12
(3)
= 1 2 32 3 13 1 2
(1) + (6)(2) + (5)
(3) +
(4)
+ 00
(0)
(14)Using the notation of k for n(2/7) gives, for theimaginary part
12(3) = 1 2 32 3 13 1 2
(1) (6)(2) (5)(4) (3)(15)
These are cyclic convolution relation. Since in ourproblem we always convolve with the same
coefficients, arithmetic efficiency can be improved byprecalculating some of the intermediate results. Theseare stored in table in memory and simply addressed
as needed. Using distributed arithmetic this can beimplemented efficiently.
x 0
x 2
x 4
x 5
x(3)
x 1
x(13)
x 11
x(9)
x(7)
x 12
x 10
x(8)
x(6)
X(0)
X(7)
X(8)
X(1)
X(2)
X(9)
X(10)
X(3)
X(4)
X(11)
X(12)
X(5)
X(6)
X(13)
k1=0
k1=1
k1=2
k1=3
k1=4
k1=5
k1=6
n2
=1
n2
=0
7/29/2019 Jisha Ijera Paper
4/5
Jisha M. Vijayan, Arathy Iyer / International Journal of Engineering Research and
Applications (IJERA) ISSN: 2248-9622 www.ijera.com
Vol. 3, Issue 1, January -February 2013, pp.1671-1675
1674 | P a g e
R1+
x(1)
x(6)
R2+
x(2)
x(5)
R3+
x(3)
x(4)
R4ROM 1
R1-
x(1)
x(6)
R2-
x(2)
x(5)
R3-
x(3)
x(4)
R4ROM 2
ADD_SUB
R5
R6
R7
+
+
+
x(0)
Xk(1)
Xk(2)
Xk(3)
ADD_SUB
R5
R6
R7
Xl(1)
Xl(2)
Xl(3)
+ X(0)
R0
x0x1x2x3x4x5x6
Figure 5. Architecture for FFT using DA.
V. EXPERIMENTAL RESULTSVerilog codes are written for both FFT using
DA and shift and add method and synthesized using
Xilinx 13.2 version. Family of device was Spartan 3E
and target device was XC3S500E. Fig. 6 shows thelogic utilization of both the architectures and Table 1shows Delay in ns. From these results, it is clear that
clear that our proposed architecture based adaptivefilter is faster than that used by shift and add methodand Area utilization using DA is very much less than
that by shift and add method. Thus the powerutilization is also less in case of FBLMS using DA
Table 1. Delay Comparison
Design Minimum Period in
ns
1616 DA Multiplier 35.2811616 SA Multiplier 36.296
Figure 6. Comparison of resource utilization in
FPGA
VI. CONCLUSIONIn this paper, we have proposed a new area
efficient FBLMS adaptive filter. FBLMS is the fastestand computationally efficient adaptive algorithm andFFT, the main computational block in FBLMS can be
efficiently calculated by DA. The concept of DAinvolves for implementation of FFT block without anyhardware multiplier using LUT and adders. Due to
reduced hardware complexity the proposed DA basedFBLMS adaptive filter is best suitable forimplementation of higher order filters in FPGA
efficiently with minimum area requirement, lowpower dissipation.
REFERENCES[1] S. Haykin, Adaptive Filter Theory, 4th ed., T.
Kailath, Ed. Pearson Education, 2008.
[2] U Meyer Baese, Digital Signal Processing Withfield Programmable Gate Arrays,3rd ed.
[3] Sudhanshu Baghel , Rafiahamed Shaik, FPGAImplementation of Fast Block LMS AdaptiveFilter Using Distributed Arithmetic for High
Throughput, p443-p447, 2011.
[4] Arman Chahardahcherik, Yousef S. Kavian, Otto
Strobel, and Ridha Rejeb, Implementing FFTAlgorithms on FPGA, IJCSNS InternationalJournal of Computer Science and NetworkSecurity, VOL.11 No.11, pp. 148 156,November 2011.
[5] S. A. White, "Applications of distributedarithmetic to digital signal processing: A tutorial
review,"IEEE ASSP Magazine, July 1989.[6] C. S. Burrus, "Index mappings for
multidimensional formulation of the DFT andconvolution," IEEE Transactions On Acoustics,
Speech, And Signal Processing, vol. 25, pp. 239-242, June 1977.
[7]
D.J. Allred, W. Huang, Y.Krishnan, H. Yoo, D.VAnderson, "An FPGA
4292
2576
7113
3402
2102
5639
0
1000
2000
3000
4000
5000
6000
7000
8000
Slices Slice flip flops LUTs
FFT using shift and add FFT using DA
7/29/2019 Jisha Ijera Paper
5/5
Jisha M. Vijayan, Arathy Iyer / International Journal of Engineering Research and
Applications (IJERA) ISSN: 2248-9622 www.ijera.com
Vol. 3, Issue 1, January -February 2013, pp.1671-1675
1675 | P a g e
implementation for a high throughput adaptivefilter using distributedarithmetic." 12th Annual IEEE Symposium onField-Programmable
Custom Computing Machines, pp. 324 - 325,
2004.
[8] S. K. M. Gregory A. Clark and S. R. Parker,
"Block implementation of adaptive digitalfilters," IEEE Transactions on Circuits andSystems, vol. 28, pp. 584592, 1981.
[9] C. M. Rader, "Discrete fourier transforms whenthe number of data samples is prime," IEEEProceedings, vol. 56, pp. 1107-1108, June 1968.
Top Related