
International Journal of Research in Computer and Communication Technology, Vol 3, Issue 9, September 2014
ISSN (Online) 2278-5841, ISSN (Print) 2320-5156
www.ijrcct.org

Survey on DLMS Adaptive Filter With Low Delay
T.J. MILNA (M.E.) #1, S.K. MYTHILI (Ph.D) #2
ECE Department, SVS COLLEGE OF ENGINEERING, COIMBATORE
1 [email protected]
2 [email protected]

Abstract-In practical applications of the LMS adaptive transversal filtering algorithm, a delay arises in the coefficient update. This paper discusses the behavior of the delayed LMS algorithm and presents an efficient architecture for the implementation of a delayed least mean square adaptive filter. In order to achieve a lower adaptation delay, a novel partial product generator is used. The convergence and steady-state behaviors of the adaptive filter are compared and analyzed.

Keywords-Area Delay Product (ADP), Energy Delay Product (EDP), systolic architecture, adaptation delay, steady-state behavior, convergence.

    I. INTRODUCTION

Adaptive digital filters have been applied to a variety of important problems in recent years. Perhaps one of the most well-known adaptive algorithms is the least mean squares (LMS) algorithm, which updates the weights of a transversal filter using an approximate technique of steepest descent. Due to its simplicity, the LMS algorithm has received a great deal of attention and has been successfully applied in a number of areas, including channel equalization, noise and echo cancellation, and many others.

Least mean squares (LMS) algorithms are a class of adaptive filter used to mimic a desired filter by finding the filter coefficients that produce the least mean square of the error signal (the difference between the desired signal and the actual signal). It is a stochastic gradient descent method in which the filter is adapted based on the error at the current time.

The basic idea behind the LMS filter is to update the filter weights so that they converge to the optimum filter weights. The algorithm starts by assuming small weights (zero in most cases), and at each step the gradient of the mean square error is found and the weights are updated. If the MSE gradient is positive, the error would keep increasing if the same weights were used for further iterations, which means the weights need to be reduced; if the gradient is negative, the weights need to be increased. Hence, the basic weight update equation during the nth iteration is

W(n+1) = W(n) - μ∇ε(n)    (1)

where ε(n) represents the mean-square error, μ is the step size, and W(n) is the weight vector. The negative sign indicates that the weights are changed in a direction opposite to that of the gradient slope. The mean-square error, as a function of the filter weights, is a quadratic function with only one extremum; the weight vector that minimizes the mean-square error is the optimal weight. The LMS algorithm thus approaches this optimal weight by descending the mean-square-error versus filter-weight curve.
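To make the update rule above concrete, the following is a minimal NumPy sketch of an LMS transversal filter. The function name, signal names, and the default step size are illustrative choices for this survey, not taken from any of the cited designs.

import numpy as np

def lms_filter(x, d, num_taps=8, mu=0.01):
    """Minimal LMS adaptive transversal filter (illustrative sketch).

    x  : input signal (array-like)
    d  : desired signal (array-like)
    mu : step size controlling the speed of convergence
    """
    x = np.asarray(x, dtype=float)
    d = np.asarray(d, dtype=float)
    n_samples = len(x)
    w = np.zeros(num_taps)            # start from small (zero) weights
    y = np.zeros(n_samples)           # filter output
    e = np.zeros(n_samples)           # error signal
    for n in range(num_taps, n_samples):
        x_n = x[n - num_taps + 1:n + 1][::-1]   # current tap-delay-line contents
        y[n] = np.dot(w, x_n)                   # filter output (inner product)
        e[n] = d[n] - y[n]                      # error = desired - actual
        w = w + mu * e[n] * x_n                 # steepest-descent weight update
    return y, e, w

Each iteration computes the inner product for the output, forms the error against the desired response, and nudges every weight along the negative estimated gradient e(n)·x(n), exactly as in equation (1).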

II. OVERVIEW OF LITERATURE SURVEY

The least mean square (LMS) adaptive filter is the most popular and most widely used adaptive filter, not only because of its simplicity but also because of its satisfactory convergence performance [1], [2]. The direct-form LMS adaptive filter involves a long critical path due to an inner-product computation to obtain the filter output. The critical path is required to be reduced by pipelined implementation when it exceeds the desired sample period. Since the conventional LMS algorithm does not support pipelined implementation because of its recursive nature, it is modified to a form called the delayed least mean square (DLMS) algorithm [3]-[5], which allows pipelined implementation of the filter.
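As a rough behavioral illustration of the DLMS modification (not of any particular hardware architecture from the cited works), the sketch below reuses the LMS loop but updates the weights with an error, and the matching input vector, that are a few cycles old; the function and parameter names are hypothetical.

import numpy as np

def dlms_filter(x, d, num_taps=8, mu=0.01, delay=4):
    """Behavioral sketch of the delayed LMS (DLMS) update.

    Identical to LMS except that the weights are updated with an error
    (and the corresponding input vector) that is 'delay' cycles old,
    which is what permits a pipelined hardware implementation.
    """
    x = np.asarray(x, dtype=float)
    d = np.asarray(d, dtype=float)
    n_samples = len(x)
    w = np.zeros(num_taps)
    y = np.zeros(n_samples)
    e = np.zeros(n_samples)
    for n in range(num_taps, n_samples):
        x_n = x[n - num_taps + 1:n + 1][::-1]
        y[n] = np.dot(w, x_n)
        e[n] = d[n] - y[n]
        m = n - delay                            # index of the delayed error
        if m >= num_taps:
            x_m = x[m - num_taps + 1:m + 1][::-1]
            w = w + mu * e[m] * x_m              # update with delayed error/input
    return y, e, w

For a sufficiently small step size the filter converges toward the same solution as the conventional LMS filter, but the weight update no longer depends on the error of the current cycle, which is what allows the hardware loop to be pipelined.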

A lot of work has been done to implement the DLMS algorithm in systolic architectures to increase the maximum usable frequency [3], [6], [7], but they involve an adaptation delay of about N cycles for filter length N, which is quite high for large-order filters. Since the convergence performance degrades considerably for a large adaptation delay, Visvanathan et al. [8] have proposed a modified systolic architecture to reduce the adaptation delay. A transpose-form LMS adaptive filter is suggested in [9], where the filter output at any instant depends on the delayed versions of the weights and the number of delays in the weights varies from 1 to N. Van and Feng [10] have proposed a systolic architecture where they used relatively large processing elements (PEs) to achieve a lower adaptation delay with a critical path of one MAC operation. Ting et al. [11] have proposed a fine-grained pipelined design to limit the critical path to the maximum of one addition time, which supports a high sampling frequency but involves a lot of area overhead for pipelining and higher power consumption than [10], due to its large number of pipeline latches. A further effort has been made by Meher and Maheshwari [12] to reduce the number of adaptation delays. Meher and Park have proposed a 2-bit multiplication cell and used it with an efficient adder tree for pipelined inner-product computation to minimize the critical path and silicon area without increasing the number of adaptation delays [13], [14].

The existing work on the DLMS adaptive filter does not discuss the fixed-point implementation issues, e.g., the location of the radix point, the choice of word length, and quantization at various stages of computation, even though these directly affect the convergence performance, particularly due to the recursive behavior of the LMS algorithm. The authors of [18] present an optimization of their previously reported design [13], [14] to reduce the number of pipeline delays along with the area, sampling period, and energy consumption. That design is found to be more efficient in terms of the power-delay product (PDP) and energy-delay product (EDP) compared with the existing structures.

III. DLMS ADAPTIVE FILTER MODULAR PIPELINED IMPLEMENTATION

A modular pipelined filter architecture [3] based on a time-shifted version of the DLMS algorithm is discussed here. This pipelined architecture displays the most desirable features of both lattice and transversal form adaptive filters. As in an adaptive lattice filter, the computations are structured to be order recursive, resulting in a highly pipelined implementation. Also, the weights are updated locally within each stage. However, the equations being implemented actually correspond to a true transversal adaptive filter, and hence the desirable properties of this structure are preserved. The modular pipeline consists of a linear array of identical processing elements (PEs) which are linked together using both local and feedback connections. Each PE performs all the computations associated with a single coefficient of the filter.

A significant advantage of the modular structure of the pipelined DLMS filter is that, unlike conventional transversal filters, the order of the filter can be increased by simply adding more PE modules to the end of the pipeline. The performance of the system is measured using the speed-up over a single-processor system. The other advantages of this structure are its high throughput, its usefulness for real-time applications, and its easy expandability.
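The modular organization can be pictured with the following behavioral sketch, in which each processing element owns one coefficient and one tap of the delay line. This is only an abstraction of the structure described above: the class and function names are invented, and the cycle-level pipeline timing and the exact alignment of the fed-back error with the stored samples are not modeled.

class TapPE:
    """One processing element: holds a single weight and its tap sample."""
    def __init__(self, mu=0.01):
        self.w = 0.0   # local filter coefficient
        self.x = 0.0   # local tap-delay-line sample
        self.mu = mu

    def step(self, x_in, partial_sum, err):
        # Shift the incoming sample into this tap, hand the old sample on
        # to the next PE, add this tap's product into the running output
        # sum, and update the local weight with the fed-back (delayed) error.
        x_out = self.x
        self.x = x_in
        partial_sum += self.w * self.x
        self.w += self.mu * err * self.x
        return x_out, partial_sum


def pe_chain(pes, new_sample, fed_back_error):
    """Push one input sample through the linear PE array; return the output."""
    sample, acc = new_sample, 0.0
    for pe in pes:
        sample, acc = pe.step(sample, acc, fed_back_error)
    return acc


# Increasing the filter order is just a matter of appending more PEs:
pes = [TapPE(mu=0.01) for _ in range(8)]   # an 8-tap filter

Extending the filter from 8 to 16 taps only requires appending eight more TapPE instances; no other part of the structure changes, which is the expandability property mentioned above.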

IV. VIRTEX FPGA IMPLEMENTATION OF PIPELINED ADAPTIVE LMS PREDICTOR

FPGAs provide a good combination of high-speed implementation features with the flexibility of a COTS platform [11]. FPGAs have grown over the past decade to the point where there is now an assortment of adaptive algorithms which can be implemented on a single FPGA device. However, the direct implementation of an adaptive filter on an FPGA often proves to be slow due to the error feedback signal in the recursive structure. Typically, the system throughput rate of many DSP algorithms can be improved by exploiting concurrency in the form of parallelism and pipelining. Since a Virtex device is used, the increase in area is very limited, and the pipelined architecture shows better interconnect delays.

V. SYSTOLIC ARCHITECTURE OF DLMS ADAPTIVE FILTER AND APPLICATION


Fig 1. Overall architecture of the systolic array

The LMS adaptive algorithm approximately minimizes the mean-square error by recursively altering the weight vector at each sampling instant:

y(n) = W^T(n) X(n)    (2)
e(n) = d(n) - y(n)    (3)
W(n+1) = W(n) + μ e(n) X(n)    (4)

where d(n) is the desired signal, y(n) is the output signal, X(n) is the input vector, μ is the step size used for adaptation of the weight vector W(n), and e(n) is the feedback error. The work is focused on reducing the delay and the critical path while at the same time satisfying the requirements of a systolic array. It is known that the tree method enhances the performance of adaptive FIR digital systems. However, the tree structure lacks driving consideration, modularity, and local connection [10]. As the number of tree levels increases, the critical period is sacrificed since the pipeline is not sufficiently full. Here, the tree concept is applied to devise a new generalized tree-systolic processing element. The design parameters involve the desired critical period, operating voltage, aspect ratio, and logic style. The maximum number of tap connections of the feedback error signal is chosen to be just larger than or equal to the value required to achieve a high degree of reliability and convenient processing. The processing element operates at high throughput and with local connections, unlike the earlier structures. Of the two convergence parameters, the delay (D) and the required number of different kinds of PEs (Np), the delay affects the system more than the number of kinds of PEs. Hence optimum values of the delay and Np have to be chosen. Here the main aim is to reduce the delay, so the value of p is chosen such that it gives the minimum delay. If there is more than one value of p that gives the same delay, then Np is taken into account and the value of p with the lower Np is chosen. The proposed system is applied in two different applications, namely system identification and adaptive equalization.

Fig 2. Block diagram of system identification

Fig 3. Block diagram of adaptive equalization
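As an illustration of the system-identification setup of Fig 2, the short experiment below excites a hypothetical unknown FIR plant with white noise and lets the behavioral dlms_filter sketch given earlier adapt to it. The plant coefficients, noise level, step size, and delay are arbitrary example values, not results from the surveyed works.

import numpy as np

# System-identification experiment (illustrative). An unknown FIR "plant"
# is excited with white noise; the adaptive filter sees the same input and
# tries to match the plant output, so after convergence its weights
# approximate the plant coefficients.
rng = np.random.default_rng(0)
h_plant = np.array([0.6, -0.3, 0.2, 0.1])        # hypothetical unknown system
x = rng.standard_normal(5000)                     # white-noise excitation
d = np.convolve(x, h_plant)[:len(x)]              # plant output = desired signal
d += 0.01 * rng.standard_normal(len(x))           # small measurement noise

# dlms_filter is the behavioral sketch given earlier in this survey
y, e, w = dlms_filter(x, d, num_taps=len(h_plant), mu=0.01, delay=4)
print(np.round(w, 2))   # should be close to h_plant after convergence

After convergence the learned weight vector should be close to h_plant, which is exactly the situation depicted in the block diagram: the adaptive filter and the unknown system share the same input, and the error drives the weights toward the unknown response.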

The advantage of this method is that it reduces the delay and the critical path with finite driving, local connections, and satisfactory convergence at no extra area cost.

VI. LOW ADAPTATION DELAY LMS ADAPTIVE FILTER

To implement the LMS algorithm, during each sampling period of the training phase one has to compute a filter output and an error value which equals the difference between the current filter output and the desired response. The estimated error is used to update the filter weights in every cycle. In the case of pipelined designs, the feedback error e(n) corresponding to the nth iteration is not available for updating the filter weights in the same iteration. It becomes available only after a certain number of cycles, called the adaptation delay. The DLMS algorithm therefore uses the delayed error. In this method the critical path is reduced by using a pipelined register structure.

    Fig 4. Modified multiplier unit

A new multiplier block is utilized in the error-computation multiplication, like a conventional MAC unit. It consists of L/2 AND/OR cells (AOCs), where L represents the length of the input bits, together with 2-to-3 decoders. This modified structure is shown in Fig 4.
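The arithmetic idea behind such a partial product generator can be modeled in software as follows. This is only a behavioral, unsigned-number sketch of 2-bit partial-product grouping; the actual AOC/decoder hardware of [13], [18] additionally handles signed operands and shares common subexpressions, and the function name and bit width here are illustrative.

def ppg_multiply(a, w, n_bits=8):
    """Multiply a (unsigned, n_bits wide) by w using 2-bit partial-product groups.

    a is split into n_bits/2 two-bit digits, each digit selects one of
    {0, w, 2w, 3w}, and the selected terms are shifted and summed. This
    halves the number of partial products compared with bit-by-bit generation.
    """
    assert 0 <= a < (1 << n_bits)
    precomputed = {0: 0, 1: w, 2: 2 * w, 3: 3 * w}   # terms a 2-to-3 decoder would select
    result = 0
    for g in range(n_bits // 2):
        digit = (a >> (2 * g)) & 0b11                # the g-th 2-bit group of a
        result += precomputed[digit] << (2 * g)      # shifted partial product
    return result

# sanity check against ordinary multiplication
assert ppg_multiply(203, 57) == 203 * 57

Grouping the multiplier into 2-bit digits halves the number of partial products to be added, which reduces the depth of the adder tree used in the inner-product computation.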

The area-delay product and energy-delay product of the proposed multiplier cell are considerably less than those of other architectures.

VII. SUMMARY OF EXPERIMENTAL RESULTS

Table I shows the synthesis results of the proposed and existing designs in terms of area, leakage power, energy per sample (EPS), and ADP obtained for filter lengths N = 8, 16, and 32. The design explained in [18] could reduce the area by using a PPG based on common subexpression sharing compared with the remaining designs. As shown in the table, the reduction in area is more significant in the case of N = 32, since more sharing can be obtained in the case of a large-order filter. The design of [18] could achieve less area and more power reduction compared with [11] by removing redundant pipeline latches, which are not required to maintain a critical path of one addition time. It is found that the proposed design involves 17% less ADP and 14% less EDP than the best previous work of [10], on average, for filter lengths N = 8, 16, and 32. The proposed design was also implemented on the field-programmable gate array (FPGA) platform of Xilinx devices.

TABLE I. PERFORMANCE OF DLMS ADAPTIVE FILTER

VIII. CONCLUSION

This paper presents a survey of the existing adaptive filter implementations with low adaptation delay. The survey briefly describes the principles behind the adaptive filter in order to better understand the different implementation styles and their structures. From the comparison of these techniques it is concluded that the partial product generator (PPG) architecture is the best for the implementation of a low-power adaptive filter. This architecture uses a novel PPG for efficient implementation of general multiplications and inner-product computation by common subexpression sharing. Besides, this implementation uses an efficient addition scheme for inner-product computation to reduce the adaptation delay, thereby achieving fast convergence performance. Further, it reduces the critical path to support high input-sampling rates.


REFERENCES

[1] B. Widrow and S. D. Stearns, Adaptive Signal Processing. Englewood Cliffs, NJ, USA: Prentice-Hall, 1985.
[2] S. Haykin and B. Widrow, Least-Mean-Square Adaptive Filters. Hoboken, NJ, USA: Wiley, 2003.
[3] M. D. Meyer and D. P. Agrawal, "A modular pipelined implementation of a delayed LMS transversal adaptive filter," in Proc. IEEE Int. Symp. Circuits Syst., May 1990, pp. 1943-1946.
[4] G. Long, F. Ling, and J. G. Proakis, "The LMS algorithm with delayed coefficient adaptation," IEEE Trans. Acoust., Speech, Signal Process., vol. 37, no. 9, pp. 1397-1405, Sep. 1989.
[5] G. Long, F. Ling, and J. G. Proakis, "Corrections to 'The LMS algorithm with delayed coefficient adaptation'," IEEE Trans. Signal Process., vol. 40, no. 1, pp. 230-232, Jan. 1992.
[6] H. Herzberg and R. Haimi-Cohen, "A systolic array realization of an LMS adaptive filter and the effects of delayed adaptation," IEEE Trans. Signal Process., vol. 40, no. 11, pp. 2799-2803, Nov. 1992.
[7] M. D. Meyer and D. P. Agrawal, "A high sampling rate delayed LMS filter architecture," IEEE Trans. Circuits Syst. II, Analog Digital Signal Process., vol. 40, no. 11, pp. 727-729, Nov. 1993.
[8] S. Ramanathan and V. Visvanathan, "A systolic architecture for LMS adaptive filtering with minimal adaptation delay," in Proc. Int. Conf. Very Large Scale Integr. (VLSI) Design, Jan. 1996, pp. 286-289.
[9] Y. Yi, R. Woods, L.-K. Ting, and C. F. N. Cowan, "High speed FPGA-based implementations of delayed-LMS filters," J. Very Large Scale Integr. (VLSI) Signal Process., vol. 39, nos. 1-2, pp. 113-131, Jan. 2005.
[10] L. D. Van and W. S. Feng, "An efficient systolic architecture for the DLMS adaptive filter and its applications," IEEE Trans. Circuits Syst. II, Analog Digital Signal Process., vol. 48, no. 4, pp. 359-366, Apr. 2001.
[11] L.-K. Ting, R. Woods, and C. F. N. Cowan, "Virtex FPGA implementation of a pipelined adaptive LMS predictor for electronic support measures receivers," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 13, no. 1, pp. 86-99, Jan. 2005.
[12] P. K. Meher and M. Maheshwari, "A high-speed FIR adaptive filter architecture using a modified delayed LMS algorithm," in Proc. IEEE Int. Symp. Circuits Syst., May 2011, pp. 121-124.
[13] P. K. Meher and S. Y. Park, "Low adaptation-delay LMS adaptive filter part-I: Introducing a novel multiplication cell," in Proc. IEEE Int. Midwest Symp. Circuits Syst., Aug. 2011, pp. 1-4.
[14] P. K. Meher and S. Y. Park, "Low adaptation-delay LMS adaptive filter part-II: An optimized architecture," in Proc. IEEE Int. Midwest Symp. Circuits Syst., Aug. 2011, pp. 1-4.
[15] K. K. Parhi, VLSI Digital Signal Processing Systems: Design and Implementation. New York, NY, USA: Wiley, 1999.
[16] C. Caraiscos and B. Liu, "A roundoff error analysis of the LMS adaptive algorithm," IEEE Trans. Acoust., Speech, Signal Process., vol. 32, no. 1, pp. 34-41, Feb. 1984.
[17] R. Rocher, D. Menard, O. Sentieys, and P. Scalart, "Accuracy evaluation of fixed-point LMS algorithm," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., May 2004, pp. 237-240.
[18] P. K. Meher and S. Y. Park, "Area-delay-power efficient fixed-point LMS adaptive filter with low adaptation delay," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 22, no. 2, Feb. 2014.