Jisha M. Vijayan, Arathy Iyer / International Journal of Engineering Research and Applications (IJERA), ISSN: 2248-9622, www.ijera.com
Vol. 3, Issue 1, January-February 2013, pp. 1671-1675

Area Efficient Fast Block LMS Adaptive Filter Using Distributed Arithmetic

Jisha M. Vijayan, M.Tech VLSI (IV Sem), S.N.G.C.E., Kadayiruppu, Ernakulam, India

Arathy Iyer, Asst. Professor, Dept. of ECE, S.N.G.C.E., Kadayiruppu, Ernakulam, India

Abstract

In the FBLMS algorithm, the filter weights are adapted in the frequency domain by using the FFT algorithm. The main hardware complexity of the system is due to the hardware multipliers in the FFT/IFFT blocks. Introducing distributed arithmetic (DA) eliminates the need for these multipliers through a mechanism that generates partial products and then sums them, so the resulting system has high area efficiency. Using DA, the FFT can be calculated efficiently by jointly employing the Good-Thomas and Rader algorithms. In the proposed DA-based FBLMS algorithm, only half of the conjugate symmetric coefficients need to be calculated; under that condition, the hardware requirement of the proposed system is approximately half that of the existing one.

Keywords: FBLMS, signed DA, unsigned DA, Good-Thomas, Rader algorithm.

I. INTRODUCTION

The term estimator or filter is commonly used to refer to a system that is designed to extract information about a prescribed quantity of interest from noisy data. At the core of DSP applications is the digital filter. Digital filters are generally used for separation of signals that have been combined, restoration of signals that have been distorted, and transform operations. In the fast block LMS (FBLMS) algorithm, the filter parameters are adapted in the frequency domain by using the FFT algorithm.

In this paper, a new DA-based FBLMS is proposed to reduce area and power consumption. Distributed arithmetic (DA) is an efficient multiplication-free technique for calculating inner products. The multiplication operation is replaced by a mechanism that generates partial products and then sums them. The key difference between distributed arithmetic and standard multiplication is in the way the partial products are generated and added together. In the proposed architecture, the DA-based FFT (and IFFT) blocks are calculated efficiently by jointly employing the Good-Thomas and Rader algorithms. The DA-based FFT block provides only half of the conjugate symmetric outputs without calculating the others, and the DA-based IFFT block needs to be fed only half of the conjugate symmetric coefficients, which is not possible in existing FBLMS-based adaptive filters. Since the proposed FBLMS-based adaptive filter calculates only half of the conjugate symmetric coefficients, its hardware requirement is approximately half that of the existing one.

II. DISTRIBUTED ARITHMETIC

Distributed arithmetic (DA) was first introduced by Croisier and by Zohar, and was further developed by Peled and Liu more than three decades ago. DA is a direct method for sum-of-products operations: the partial products are precomputed from the filter's difference equation and stored in a look-up table (LUT) held in memory, and the bits of the input signals are used to address it.

Consider the following inner product of two N-dimensional vectors c and x, where c is a known constant vector, x is the input sample vector, and y is the result.

$$y = \langle c, x \rangle = \sum_{n=0}^{N-1} c[n]\,x[n] \tag{1}$$

$$y = c[0]\,x[0] + c[1]\,x[1] + \dots + c[N-1]\,x[N-1] \tag{2}$$

An unsigned DA system assumes that the variable x[n] is given by

$$x[n] = \sum_{b=0}^{B-1} x_b[n]\,2^b, \qquad x_b[n] \in \{0, 1\} \tag{3}$$

where x_b[n] denotes the b-th bit of x[n], i.e., of the n-th sample of x. The inner product y can, therefore, be represented as:

$$y = \sum_{n=0}^{N-1} c[n] \sum_{b=0}^{B-1} x_b[n]\,2^b \tag{4}$$

or, in more compact form,



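$$y = \sum_{b=0}^{B-1} 2^b \sum_{n=0}^{N-1} c[n]\,x_b[n] \tag{5}$$

$$y = \sum_{b=0}^{B-1} 2^b \sum_{n=0}^{N-1} f\big(c[n], x_b[n]\big) \tag{6}$$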
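Equation (6) can be illustrated with a minimal Python sketch. It assumes integer coefficients and unsigned B-bit samples; the function names `build_lut` and `da_inner_product` and the example values are hypothetical and not taken from the paper. The inner sum over n is read from a precomputed table addressed by one bit of every sample, and the outer sum over b becomes a shift-accumulate.

```python
def build_lut(c):
    """Precompute the inner sum of (6) for every possible bit pattern.

    Entry `addr` holds the sum of c[n] over all n whose bit is set in addr.
    """
    n_taps = len(c)
    return [sum(c[n] for n in range(n_taps) if (addr >> n) & 1)
            for addr in range(1 << n_taps)]


def da_inner_product(c, x, bits):
    """Unsigned DA inner product: y = sum_b 2^b * LUT[bit slice b of x]."""
    lut = build_lut(c)
    y = 0
    for b in range(bits):
        # Build the LUT address from bit b of every input sample x[n].
        addr = 0
        for n, sample in enumerate(x):
            addr |= ((sample >> b) & 1) << n
        y += lut[addr] << b  # shift-accumulate replaces the multiplier
    return y


c = [3, 5, 7, 2]        # hypothetical constant coefficients
x = [9, 4, 15, 1]       # hypothetical 4-bit unsigned samples
y_da = da_inner_product(c, x, bits=4)
y_ref = sum(ci * xi for ci, xi in zip(c, x))  # direct inner product (1)
```

The table has 2^N entries for N coefficients, which is why DA is usually applied to short inner products such as the small DFTs used later in the paper.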

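In the case of signed DA, a modified equation is used in order to process signed two's complement numbers. We therefore use the following (B + 1)-bit representation:

$$x[n] = -2^B x_B[n] + \sum_{b=0}^{B-1} x_b[n]\,2^b \tag{7}$$

The outcome y is then defined by

$$y = -2^B \sum_{n=0}^{N-1} f\big(c[n], x_B[n]\big) + \sum_{b=0}^{B-1} 2^b \sum_{n=0}^{N-1} f\big(c[n], x_b[n]\big) \tag{8}$$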
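The signed case differs from the unsigned one only in the handling of the sign bit, whose table output is subtracted instead of added. The following Python sketch illustrates this under the assumption of integer coefficients and (B + 1)-bit two's complement samples; the function name and the test values are hypothetical.

```python
def signed_da_inner_product(c, x, B):
    """Signed DA inner product for (B + 1)-bit two's complement inputs."""
    n_taps = len(c)
    # Same look-up table as in the unsigned case.
    lut = [sum(c[n] for n in range(n_taps) if (addr >> n) & 1)
           for addr in range(1 << n_taps)]
    mask = (1 << (B + 1)) - 1
    words = [xi & mask for xi in x]  # two's complement bit patterns

    def address(b):
        # LUT address built from bit b of every sample.
        return sum(((w >> b) & 1) << n for n, w in enumerate(words))

    y = sum(lut[address(b)] << b for b in range(B))  # bits 0 .. B-1 are added
    y -= lut[address(B)] << B                        # sign bit is subtracted
    return y


c = [3, -5, 7, 2]       # hypothetical constant coefficients
x = [-8, 4, -1, 7]      # hypothetical samples that fit in B + 1 = 4 bits
y_signed = signed_da_inner_product(c, x, B=3)
y_direct = sum(ci * xi for ci, xi in zip(c, x))
```

In hardware this corresponds to switching the accumulator from add to subtract during the sign-bit cycle.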

III. PROPOSED SYSTEM

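Figure 1. Proposed DA based FBLMS Adaptive filter

[Block diagram. Blocks shown: serial-to-parallel (S/P) converters for the input s(n) and for the desired response d(n); a buffer of length N = 2M holding the old and new input blocks; three DA-FFT blocks of length N; two DA-IFFT blocks of length N; a conjugate block; multiplier stages; "insert zero block", "append zero block" and "delete last block" stages; a delay element between the values at block k+1 and block k; a "save last N/2 data" stage; and a parallel-to-serial (P/S) converter producing y(n). Labeled signals include e(n), E(k), S(k) and S^H(k).]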

The FFT and IFFT are the blocks that include a large number of multipliers. Introducing DA eliminates the need for these multipliers through a mechanism that generates partial products and then sums them, so the resulting system has high area efficiency.
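The data flow of Fig. 1 can be sketched in Python as a standard overlap-save FBLMS loop. This is only an illustrative sketch: a naive O(N^2) DFT stands in for the DA-FFT and DA-IFFT blocks, and the function names, the step size `mu`, the test system `h` and the random input are all assumptions that do not come from the paper.

```python
import cmath
import random


def dft(x, inverse=False):
    """Naive O(N^2) DFT; a stand-in for the DA-FFT / DA-IFFT blocks."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[m] * cmath.exp(sign * 2j * cmath.pi * k * m / n)
               for m in range(n)) for k in range(n)]
    return [v / n for v in out] if inverse else out


def fblms(s, d, M, mu):
    """Overlap-save FBLMS with block length M and transform length N = 2M."""
    N = 2 * M
    W = [0j] * N          # frequency-domain weights
    old = [0.0] * M       # previous input block held in the buffer
    for start in range(0, len(s) - M + 1, M):
        new = s[start:start + M]
        U = dft(old + new)                                   # FFT of [old, new]
        y = dft([u * w for u, w in zip(U, W)], inverse=True)[M:]  # save last N/2
        e = [d[start + i] - y[i].real for i in range(M)]     # block error
        E = dft([0.0] * M + e)                               # insert zero block
        corr = dft([u.conjugate() * v for u, v in zip(U, E)], inverse=True)
        grad = [g.real for g in corr[:M]]                    # delete last block
        G = dft(grad + [0.0] * M)                            # append zero block
        W = [w + mu * g for w, g in zip(W, G)]               # weight update
        old = new
    return [v.real for v in dft(W, inverse=True)[:M]]        # time-domain taps


random.seed(0)
h = [0.5, -0.3, 0.2, 0.1]    # hypothetical unknown system to identify
s = [random.uniform(-1, 1) for _ in range(1200)]
d = [sum(h[j] * s[n - j] for j in range(len(h)) if n - j >= 0)
     for n in range(len(s))]
w = fblms(s, d, M=4, mu=0.05)
```

Each iteration uses five transforms of length N, which is why the multipliers inside the FFT and IFFT blocks dominate the hardware cost.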

IV. INTRODUCTION TO FFT USING DA

Figure 2. FFT Blocks

Using DA, the FFT can be efficiently calculated by jointly employing the Good-Thomas and Rader algorithms.

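A. Good-Thomas Index Mapping

The Good-Thomas algorithm re-expresses the discrete Fourier transform (DFT) of size N = N1N2 as a two-dimensional N1 × N2 DFT, but only for the case where N1 and N2 are relatively prime. The index mapping suggested by Good and Thomas for n is

$$n = \left( N_2 n_1 + N_1 n_2 \right) \bmod N, \qquad 0 \le n_1 \le N_1 - 1, \quad 0 \le n_2 \le N_2 - 1 \tag{9}$$

and the resulting index mapping for k is

$$k = \left( N_2 \langle N_2^{-1} \rangle_{N_1} k_1 + N_1 \langle N_1^{-1} \rangle_{N_2} k_2 \right) \bmod N, \qquad 0 \le k_1 \le N_1 - 1, \quad 0 \le k_2 \le N_2 - 1 \tag{10}$$

where ⟨N2^{-1}⟩_{N1} denotes the multiplicative inverse of N2 modulo N1, and similarly for ⟨N1^{-1}⟩_{N2}.

An N = N1N2-point DFT can be computed according to the following steps:

1) Index transform of the input sequence, according to equation (9) for n.

2) Computation of N2 DFTs of length N1 using the Rader algorithm.

3) Computation of N1 DFTs of length N2 using the Rader algorithm.

4) Index transform of the output sequence, according to equation (10) for k.

For the 14-point FFT block, Fig. 3 shows a schematic of the block processing system.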
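The four steps above can be sketched in Python for the 14-point case (N1 = 7, N2 = 2). This is a minimal sketch: a naive DFT replaces the Rader-based short DFTs of steps 2 and 3, and the function names and test data are hypothetical.

```python
import cmath


def dft(x):
    """Naive DFT used for the short transforms and as a reference."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * cmath.pi * k * m / n)
                for m in range(n)) for k in range(n)]


def good_thomas_dft(x, N1, N2):
    """DFT of length N = N1 * N2 (N1, N2 coprime) without twiddle factors."""
    N = N1 * N2
    inv_N2 = pow(N2, -1, N1)   # inverse of N2 modulo N1
    inv_N1 = pow(N1, -1, N2)   # inverse of N1 modulo N2
    # Step 1: input index map n = (N2*n1 + N1*n2) mod N.
    grid = [[x[(N2 * n1 + N1 * n2) % N] for n1 in range(N1)]
            for n2 in range(N2)]
    # Step 2: N2 DFTs of length N1.
    rows = [dft(row) for row in grid]
    # Step 3: N1 DFTs of length N2, with no twiddle multiplication in between.
    cols = [dft([rows[n2][k1] for n2 in range(N2)]) for k1 in range(N1)]
    # Step 4: output index map for k.
    X = [0j] * N
    for k1 in range(N1):
        for k2 in range(N2):
            X[(N2 * inv_N2 * k1 + N1 * inv_N1 * k2) % N] = cols[k1][k2]
    return X


x = [float((3 * n * n + n) % 11) for n in range(14)]   # arbitrary test data
X_gt = good_thomas_dft(x, N1=7, N2=2)
X_ref = dft(x)
# Order in which the outputs leave the second stage (k1 outer, k2 inner).
order = [(2 * pow(2, -1, 7) * k1 + 7 * pow(7, -1, 2) * k2) % 14
         for k1 in range(7) for k2 in range(2)]
```

The three-argument `pow` with a negative exponent requires Python 3.8 or later. For N = 14 the output map reduces to k = (8 k1 + 7 k2) mod 14.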



Figure 3. Mapping of 14-point FFT using the Good-Thomas algorithm

From Fig. 3, we see that the first stage has 2 DFTs of 7 points each and the second stage has 7 DFTs of length 2 each. Multiplication by twiddle factors between the stages is not required.

B. Rader Algorithm

It is easily seen from the definition of the DFT that the transform of a length-N real sequence x(n) has conjugate symmetry, i.e., X(N - k) = X*(k). This property makes it possible to compute only half of the transform, as the remaining half is redundant and need not be calculated. The Rader algorithm provides a straightforward way to compute only half of the conjugate symmetric outputs without calculating the others. The algorithm presented here first decomposes the one-dimensional DFT into a multidimensional DFT using the index map proposed by Good. Next, a method based on the index permutation proposed by Rader is used to convert the short DFTs into convolutions. This method changes a prime-length-N DFT of real data into two convolutions of length (N - 1)/2.
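Rader's index permutation can be illustrated with a short Python sketch of the general complex-data case, in which a prime-length DFT becomes one cyclic convolution of length N - 1; the split into two half-length convolutions for real data is not shown here. The function names, the primitive root g = 3 for N = 7 and the test data are assumptions made for the example.

```python
import cmath


def rader_dft(x, g):
    """DFT of prime length N via one cyclic convolution of length N - 1.

    g must be a primitive root modulo N.
    """
    N = len(x)
    L = N - 1
    W = cmath.exp(-2j * cmath.pi / N)
    g_inv = pow(g, -1, N)
    a = [x[pow(g, p, N)] for p in range(L)]         # inputs permuted by g^p
    b = [W ** pow(g_inv, p, N) for p in range(L)]   # twiddles permuted by g^-p
    X = [0j] * N
    X[0] = sum(x)                                   # zero-frequency term
    for q in range(L):
        # Cyclic convolution of a and b evaluated at lag q.
        conv = sum(a[p] * b[(q - p) % L] for p in range(L))
        X[pow(g_inv, q, N)] = x[0] + conv
    return X


def direct_dft(x):
    """Reference DFT straight from the definition."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * cmath.pi * k * m / n)
                for m in range(n)) for k in range(n)]


x7 = [0.5, -1.0, 2.0, 0.25, 1.5, -0.75, 3.0]   # arbitrary real test data
X_rader = rader_dft(x7, g=3)                    # 3 is a primitive root mod 7
X_direct = direct_dft(x7)
```

Because the convolution kernel `b` depends only on N, its partial products are fixed constants, which is what makes the structure suitable for a DA look-up table.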

Now consider N1 = 7. If the data are real, we need to calculate only half of the transform. Also, as Rader showed, the zero-frequency term must be calculated separately.

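$$X(0) = \sum_{n=0}^{N-1} x(n) \tag{11}$$

In matrix form, with W = e^{-j2\pi/7}, we write

$$\begin{bmatrix} X(1) \\ X(2) \\ X(3) \end{bmatrix} =
\begin{bmatrix}
W^1 & W^2 & W^3 & W^4 & W^5 & W^6 \\
W^2 & W^4 & W^6 & W^1 & W^3 & W^5 \\
W^3 & W^6 & W^2 & W^5 & W^1 & W^4
\end{bmatrix}
\begin{bmatrix} x(1) \\ x(2) \\ x(3) \\ x(4) \\ x(5) \\ x(6) \end{bmatrix} +
\begin{bmatrix} x(0) \\ x(0) \\ x(0) \end{bmatrix} \tag{12}$$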

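Replacing W^{7-k} by W^{-k},

$$\begin{bmatrix} X(1) \\ X(2) \\ X(3) \end{bmatrix} =
\begin{bmatrix}
W^1 & W^2 & W^3 & W^{-3} & W^{-2} & W^{-1} \\
W^2 & W^{-3} & W^{-1} & W^1 & W^3 & W^{-2} \\
W^3 & W^{-1} & W^2 & W^{-2} & W^1 & W^{-3}
\end{bmatrix}
\begin{bmatrix} x(1) \\ x(2) \\ x(3) \\ x(4) \\ x(5) \\ x(6) \end{bmatrix} +
\begin{bmatrix} x(0) \\ x(0) \\ x(0) \end{bmatrix} \tag{13}$$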

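If the real and imaginary parts of the W matrix are separated, a simplification is possible. Consider first the real part, using the notation that the matrix entry k stands for cos(2πk/7). The real part becomes

$$\begin{bmatrix} \operatorname{Re} X(1) \\ \operatorname{Re} X(2) \\ \operatorname{Re} X(3) \end{bmatrix} =
\begin{bmatrix} 1 & 2 & 3 \\ 2 & 3 & 1 \\ 3 & 1 & 2 \end{bmatrix}
\begin{bmatrix} x(1) + x(6) \\ x(2) + x(5) \\ x(3) + x(4) \end{bmatrix} +
\begin{bmatrix} x(0) \\ x(0) \\ x(0) \end{bmatrix} \tag{14}$$

Using the notation that the entry k stands for sin(2πk/7), so that -k stands for -sin(2πk/7), and taking W = e^{-j2π/7}, gives for the imaginary part

$$\begin{bmatrix} \operatorname{Im} X(1) \\ \operatorname{Im} X(2) \\ \operatorname{Im} X(3) \end{bmatrix} = -
\begin{bmatrix} 1 & 2 & -3 \\ 2 & -3 & 1 \\ 3 & -1 & -2 \end{bmatrix}
\begin{bmatrix} x(1) - x(6) \\ x(2) - x(5) \\ x(4) - x(3) \end{bmatrix} \tag{15}$$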

These are cyclic convolution relations. Since we always convolve with the same coefficients, arithmetic efficiency can be improved by precalculating some of the intermediate results. These are stored in a table in memory and simply addressed as needed. Using distributed arithmetic, this can be implemented efficiently.
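The real and imaginary relations for N1 = 7 can be checked numerically with a short Python sketch. It assumes the convention W = e^{-j2π/7}; the signs in the sine matrix below were worked out under that assumption, since the minus signs are not recoverable from the source layout. The variable names and test data are hypothetical.

```python
import cmath
import math

x = [0.5, -1.0, 2.0, 0.25, 1.5, -0.75, 3.0]   # arbitrary real test data
c1, c2, c3 = (math.cos(2 * math.pi * k / 7) for k in (1, 2, 3))
s1, s2, s3 = (math.sin(2 * math.pi * k / 7) for k in (1, 2, 3))

# Pre-additions and pre-subtractions of the symmetric input pairs.
sums = [x[1] + x[6], x[2] + x[5], x[3] + x[4]]
diffs = [x[1] - x[6], x[2] - x[5], x[4] - x[3]]

C = [[c1, c2, c3], [c2, c3, c1], [c3, c1, c2]]        # cosine matrix
S = [[s1, s2, -s3], [s2, -s3, s1], [s3, -s1, -s2]]    # sine matrix with signs


def matvec(A, v):
    """Plain matrix-vector product."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]


real_part = [x[0] + r for r in matvec(C, sums)]
imag_part = [-i for i in matvec(S, diffs)]
half = [complex(r, i) for r, i in zip(real_part, imag_part)]   # X(1), X(2), X(3)

# Reference: full 7-point DFT from the definition.
direct = [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / 7) for n in range(7))
          for k in range(7)]
```

Only X(0) to X(3) need to be computed; X(4), X(5) and X(6) follow from conjugate symmetry.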

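[Diagram for Figure 3: the inputs x(0) to x(13), reordered by the index map for n, feed two 7-point DFT blocks (n2 = 0 and n2 = 1), followed by seven 2-point DFTs (k1 = 0 to 6). The outputs appear in the order X(0), X(7), X(8), X(1), X(2), X(9), X(10), X(3), X(4), X(11), X(12), X(5), X(6), X(13).]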



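Figure 5. Architecture for FFT using DA.

[Block diagram. Elements shown: input samples x0 to x6; registers R1 to R3 holding the pairs x(1) and x(6), x(2) and x(5), x(3) and x(4) combined by addition and feeding ROM 1 through R4; a second set of registers R1 to R3 holding the same pairs combined by subtraction and feeding ROM 2 through R4; ADD_SUB units with registers R5 to R7 on each path; adders that combine the ROM 1 path with x(0); a register R0 with an adder producing X(0); and outputs Xk(1) to Xk(3) and Xl(1) to Xl(3).]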

V. EXPERIMENTAL RESULTS

Verilog code was written for the FFT using both DA and the shift-and-add method and synthesized using Xilinx 13.2. The device family was Spartan-3E and the target device was XC3S500E. Fig. 6 shows the logic utilization of both architectures and Table 1 shows the delay in ns. From these results, it is clear that the proposed DA-based architecture is faster than the shift-and-add architecture (minimum period 35.281 ns versus 36.296 ns) and that its area utilization is lower. Power consumption is therefore also expected to be lower for the FBLMS using DA.

Table 1. Delay Comparison

Design                   Minimum period (ns)
16×16 DA multiplier      35.281
16×16 SA multiplier      36.296

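Figure 6. Comparison of resource utilization in FPGA

Resource            FFT using shift and add    FFT using DA
Slices              4292                       3402
Slice flip-flops    2576                       2102
LUTs                7113                       5639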
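As a rough worked check of the savings, the relative reductions can be computed from the bar values of Fig. 6 and the periods of Table 1. This assumes that the first group of bar labels recovered from the figure (4292, 2576, 7113) belongs to the shift-and-add design and the second group (3402, 2102, 5639) to the DA design, which is consistent with the text but is not stated explicitly in the source.

```latex
\begin{align*}
\text{Slices:} \quad & \frac{4292 - 3402}{4292} \approx 20.7\% \\
\text{Slice flip-flops:} \quad & \frac{2576 - 2102}{2576} \approx 18.4\% \\
\text{LUTs:} \quad & \frac{7113 - 5639}{7113} \approx 20.7\% \\
\text{Minimum period:} \quad & \frac{36.296 - 35.281}{36.296} \approx 2.8\%
\end{align*}
```

Under that reading, the DA design saves roughly a fifth of the logic resources, while the speed gain is comparatively small.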

VI. CONCLUSION

In this paper, we have proposed a new area-efficient FBLMS adaptive filter. FBLMS is a fast and computationally efficient adaptive algorithm, and the FFT, its main computational block, can be calculated efficiently using DA. DA allows the FFT block to be implemented without any hardware multiplier, using only LUTs and adders. Owing to its reduced hardware complexity, the proposed DA-based FBLMS adaptive filter is well suited to the efficient implementation of higher-order filters in FPGAs with minimum area requirement and low power dissipation.

REFERENCES

[1] S. Haykin, Adaptive Filter Theory, 4th ed., T. Kailath, Ed. Pearson Education, 2008.

[2] U. Meyer-Baese, Digital Signal Processing with Field Programmable Gate Arrays, 3rd ed.

[3] Sudhanshu Baghel and Rafiahamed Shaik, "FPGA Implementation of Fast Block LMS Adaptive Filter Using Distributed Arithmetic for High Throughput," pp. 443-447, 2011.

[4] Arman Chahardahcherik, Yousef S. Kavian, Otto Strobel, and Ridha Rejeb, "Implementing FFT Algorithms on FPGA," IJCSNS International Journal of Computer Science and Network Security, vol. 11, no. 11, pp. 148-156, November 2011.

[5] S. A. White, "Applications of distributed arithmetic to digital signal processing: A tutorial review," IEEE ASSP Magazine, July 1989.

[6] C. S. Burrus, "Index mappings for multidimensional formulation of the DFT and convolution," IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 25, pp. 239-242, June 1977.

[7] D. J. Allred, W. Huang, Y. Krishnan, H. Yoo, and D. V. Anderson, "An FPGA implementation for a high throughput adaptive filter using distributed arithmetic," 12th Annual IEEE Symposium on Field-Programmable Custom Computing Machines, pp. 324-325, 2004.

[8] G. A. Clark, S. K. Mitra, and S. R. Parker, "Block implementation of adaptive digital filters," IEEE Transactions on Circuits and Systems, vol. 28, pp. 584-592, 1981.

[9] C. M. Rader, "Discrete Fourier transforms when the number of data samples is prime," Proceedings of the IEEE, vol. 56, pp. 1107-1108, June 1968.