Frequency Estimation

Improvisation of Two Signal Processing Algorithms with respect to Accuracy and Throughput

B. Tech Project Presentation

Anit Kumar Sahu, 08EC3401

Guide: Prof. Mrityunjoy Chakraborty, Indian Institute of Technology, Kharagpur

4th May, 2012

Abstract

Frequency estimation of a complex exponential is a problem relevant to a large number of fields. In this work a computationally efficient and accurate frequency estimator is presented. For large N it approaches Jacobsen's estimator and Candan's estimator, multiplied by an extra correction term that accounts for the stabilization of the sliding DFT. Simulation results show that the proposed estimator performs better than Jacobsen's estimator and Candan's estimator.

Distributed Arithmetic is a widely used technique for multiplierless architectures. The basic challenge lies in reducing the number of operations for this LUT-based architecture. For adaptive filters the update term needs a lot of operations, which calls for efficient techniques that reduce the number of operations needed to update the LUT. For an LMS Volterra filter the input vector consists of the individual samples as well as the quadratic products of the samples. A technique that splits the LUTs to keep the update simple has been used in this work.


Running Fast Fourier Transform

The running FFT, or sliding FFT, computes the DFT of a running sequence iteratively. Instead of processing a block of data in parallel, as in the conventional offline approach to computing the FFT, it updates the previously computed value (using a feedback mechanism) with only one new input sample. The order-N complexity of the running FFT is best explained by referring to the FFT structure shown in the figure.


Recursive Sliding DFT

An efficient technique for computing sparse DFT results is a sliding DFT process [6] whose spectral bin output rate is equal to the input data rate, on a sample-by-sample basis. In applications where a new DFT output spectrum is desired every sample, or every few samples, the sliding DFT is computationally simpler than the traditional radix-2 FFT.


Work Done till Last Semester (Contd.)

The principle used for the SDFT is known as the DFT shifting theorem or the circular shift property. It states that if the DFT of a windowed (finite-length) time-domain sequence is $X(k)$, then the DFT of that sequence, circularly shifted by one sample, is $X(k)e^{j2\pi k/N}$. Thus the spectral components of a shifted time sequence are the original (unshifted) spectral components multiplied by $e^{j2\pi k/N}$, where $k$ is the DFT bin of interest. This process is expressed as follows:

$$S_k(n) = S_k(n-1)\,e^{j2\pi k/N} - x(n-N) + x(n) \tag{1}$$

where $S_k(n)$ is the new spectral component, $S_k(n-1)$ is the previous spectral component, $x(n)$ is the incoming sample and $x(n-N)$ is the outgoing sample. The single-bin SDFT algorithm is implemented as an IIR filter with a comb filter followed by a complex resonator. The output is not valid, that is, not equivalent to $X(k)$, the k-th DFT bin, until N input samples have been processed.
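The recurrence (1) can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in the project: the function name `sliding_dft_bin` and the zero initial state are assumptions. Iterating (1) exactly as written gives the bin magnitude of the current window; in this sketch its phase differs from the FFT of the same window by the constant factor $e^{-j2\pi k/N}$, which the usage note compensates for.

```python
import numpy as np

def sliding_dft_bin(x, N, k):
    """Track bin k of an N-point sliding DFT, one input sample at a time."""
    twiddle = np.exp(2j * np.pi * k / N)
    # Samples before the start of x are taken as zero, so x(n-N) = 0 for n < N.
    padded = np.concatenate([np.zeros(N), np.asarray(x, dtype=complex)])
    S = 0j
    out = np.empty(len(x), dtype=complex)
    for n in range(len(x)):
        # S_k(n) = S_k(n-1) e^{j 2 pi k / N} - x(n-N) + x(n)
        S = S * twiddle - padded[n] + padded[n + N]
        out[n] = S
    return out
```

For `n >= N - 1`, `out[n] * np.exp(2j * np.pi * k / N)` agrees with `np.fft.fft(x[n - N + 1:n + 1])[k]` up to rounding, at a cost of one complex multiply and two additions per sample.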


SDFT stability and gs-SDFT

The z-domain transfer function for the k-th bin of the sliding DFT filter is

$$H_{SDFT}(z) = \frac{1 - z^{-N}}{1 - e^{j2\pi k/N} z^{-1}} \tag{2}$$

This complex filter has N zeros equally spaced around the z-domain's unit circle, due to the N-delay comb filter, as well as a single pole canceling the zero at $z = e^{j2\pi k/N}$. Filter instability can be a problem, however, if numerical coefficient rounding causes the filter's pole to move outside the unit circle. We can use a damping factor r to force the pole to be at a radius of r inside the unit circle and guarantee stability using a transfer function of

$$H_{SDFT,gs}(z) = \frac{1 - r^N z^{-N}}{1 - r e^{j2\pi k/N} z^{-1}} \tag{3}$$

with the subscript gs meaning guaranteed-stable.


Guaranteed Stable Sliding DFT

Using a damping factor guarantees stability, but the $S_{k,gs}(n)$ output, defined by

$$S(k) = \sum_{n=0}^{N-1} x[n]\, r^n e^{-j2\pi kn/N} \tag{4}$$

is no longer exactly equal to the k-th bin of the actual N-point DFT.
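The effect of the damping factor on a single bin can be seen by evaluating the damped sum (4) directly. The sketch below is only an illustration under the assumption that the DFT kernel carries the usual negative exponent; the name `damped_dft_bin` is hypothetical.

```python
import numpy as np

def damped_dft_bin(x, k, r):
    """Evaluate sum_n x[n] r^n exp(-j 2 pi k n / N) over one N-sample window."""
    x = np.asarray(x, dtype=complex)
    N = len(x)
    n = np.arange(N)
    return np.sum(x * r ** n * np.exp(-2j * np.pi * k * n / N))
```

With `r = 1.0` the value coincides with `np.fft.fft(x)[k]`; with `r = 0.9` it generally does not, which is the deviation the estimator derivation has to account for.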


Last Semester Work: Proposed Estimator

When estimating the frequency of a tone, the idea is to estimate the location of the spectral peak $k_p + \delta$ based on three DFT samples: $k_p$, $k_p - 1$ and $k_p + 1$.

A single complex sinusoid in white Gaussian noise can be represented in the form

$$r[n] = A e^{j\omega n} + w[n] \tag{5}$$

where $A$ and $\omega$ are unknown variables which represent the amplitude and frequency of the complex sinusoid respectively, $\omega = \frac{2\pi (k_p + \delta)}{N}$, and $k_p$ is the index of the peak of the sliding DFT. $\delta$ is to be estimated from the three samples around the peak of the sliding DFT, where $|\delta| < 1/2$.


Last Semester Work: Proposed Estimator Contd.

Let the index of the peak be $k_p$ and those of its immediate neighbours be $k_p - 1$ and $k_p + 1$ respectively. The sliding DFT bin where the peak occurs and its immediate neighbours can be represented as follows:

$$R[k_p] = A \sum_{n=0}^{N-1} r^n e^{j\frac{2\pi}{N}\delta n} + w[k_p] = A f(\delta) + w[k_p] \tag{6}$$

$$R[k_p - 1] = A \sum_{n=0}^{N-1} r^n e^{j\frac{2\pi}{N}(\delta + 1) n} + w[k_p - 1] = A f(\delta + 1) + w[k_p - 1] \tag{7}$$

$$R[k_p + 1] = A \sum_{n=0}^{N-1} r^n e^{j\frac{2\pi}{N}(\delta - 1) n} + w[k_p + 1] = A f(\delta - 1) + w[k_p + 1] \tag{8}$$

where $w[k]$ is the DFT of $w[n]$, which is also white, and $f(\delta) = \sum_{n=0}^{N-1} r^n e^{j\frac{2\pi}{N}\delta n}$.

The aim is to estimate the value of $\delta$ from these three samples $R[k_p]$, $R[k_p - 1]$ and $R[k_p + 1]$, so that $\omega = \frac{2\pi}{N}(k_p + \delta)$ becomes the fine frequency estimate. The two-stage process consists of finding $k_p$ in the first stage and $\delta$ in the second stage.


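Last Semester Work: Proposed Estimator Contd. (2)

To determine $\delta$ from the set of three equations 6, 7 and 8, the geometric progression sum of each DFT bin is considered and solved for $\delta$, using the approximation that the second and higher powers of $\delta$ are negligible compared to $\delta$. With

$$f(\delta) = \sum_{n=0}^{N-1} r^n e^{j\frac{2\pi}{N}\delta n} = \frac{1 - r^N}{1 - r e^{j2\pi\delta/N}}$$

the first difference $f(\delta + 1) - f(\delta - 1)$ can be written as

$$f(\delta + 1) - f(\delta - 1) = \frac{2j\, r \sin(2\pi/N)\, e^{j2\pi\delta/N}\,(1 - r^N)}{1 + r^2 e^{j4\pi\delta/N} - 2 r e^{j2\pi\delta/N}\cos(2\pi/N)} \tag{9}$$

and the second difference $f(\delta + 1) - 2f(\delta) + f(\delta - 1)$ can be written as

$$f(\delta + 1) - 2f(\delta) + f(\delta - 1) = \frac{(1 - r^N)\left(r e^{j\frac{2\pi\delta}{N}}\left(e^{j\frac{\pi}{N}} - e^{-j\frac{\pi}{N}}\right)^2 + r^2 e^{j\frac{4\pi\delta}{N}}\left(e^{j\frac{\pi}{N}} - e^{-j\frac{\pi}{N}}\right)^2\right)}{\left(1 + r^2 e^{j4\pi\delta/N} - 2 r e^{j2\pi\delta/N}\cos(2\pi/N)\right)\left(1 - r e^{j2\pi\delta/N}\right)} \tag{10}$$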
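The closed forms for the first and second differences can be checked numerically against direct evaluation of $f$. The sketch below assumes $f(\delta) = (1 - r^N)/(1 - r e^{j2\pi\delta/N})$ as stated on the slide; the function names are illustrative only.

```python
import numpy as np

def f(delta, r, N):
    """Closed form of the damped geometric sum used on the slide."""
    return (1 - r ** N) / (1 - r * np.exp(2j * np.pi * delta / N))

def first_difference(delta, r, N):
    """Right-hand side of (9): f(delta + 1) - f(delta - 1)."""
    a = np.exp(2j * np.pi * delta / N)
    den = 1 + r ** 2 * a ** 2 - 2 * r * a * np.cos(2 * np.pi / N)
    return 2j * r * np.sin(2 * np.pi / N) * a * (1 - r ** N) / den

def second_difference(delta, r, N):
    """Right-hand side of (10): f(delta + 1) - 2 f(delta) + f(delta - 1)."""
    a = np.exp(2j * np.pi * delta / N)
    s = (np.exp(1j * np.pi / N) - np.exp(-1j * np.pi / N)) ** 2
    den = (1 + r ** 2 * a ** 2 - 2 * r * a * np.cos(2 * np.pi / N)) * (1 - r * a)
    return (1 - r ** N) * (r * a * s + r ** 2 * a ** 2 * s) / den
```

Comparing `first_difference(0.3, 0.9, 128)` with `f(1.3, 0.9, 128) - f(-0.7, 0.9, 128)` is a quick way to confirm the signs in the denominators.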


Last Semester Work:Fine Estimate of delta from the proposed estimator

For large N and 2

Last Semester Work: Adaptation of Quinn's Estimator

Hence the final adaptation comes out to

$$\delta_1 = 1 + \operatorname{Re}\left[\frac{1}{\frac{R_{k+1}}{R_k} - 1}\right] \tag{14}$$

$$\delta_2 = -1 - \operatorname{Re}\left[\frac{1}{\frac{R_{k-1}}{R_k} - 1}\right] \tag{15}$$

If $\delta_1$ and $\delta_2$ are both greater than zero then $\delta = \delta_2$, or else $\delta = \delta_1$. The results in equations 14 and 15 are independent of the factor r, which is striking. It also suggests that the approximations under which the results are independent of r will not hold beyond a certain value of r, as the frequency spectrum values get more and more corrupted. Hence the suggested value for r is greater than 0.85.
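The two-stage procedure (coarse peak search, then the fine offset from the three bins around the peak) can be sketched as follows. This is a hedged illustration: it uses an ordinary undamped FFT (`np.fft.fft`, i.e. r = 1) rather than the damped sliding DFT of the slides, and the names `fine_offset` and `estimate_frequency` are assumptions.

```python
import numpy as np

def fine_offset(R_prev, R_peak, R_next):
    """Fine offset delta from the bins at k_p - 1, k_p and k_p + 1."""
    d1 = 1 + np.real(1 / (R_next / R_peak - 1))    # equation (14)
    d2 = -1 - np.real(1 / (R_prev / R_peak - 1))   # equation (15)
    return d2 if (d1 > 0 and d2 > 0) else d1

def estimate_frequency(x, fs):
    """Stage 1: locate the peak bin k_p. Stage 2: refine with fine_offset."""
    N = len(x)
    R = np.fft.fft(x)                  # assumption: plain DFT, no damping
    kp = int(np.argmax(np.abs(R)))
    delta = fine_offset(R[(kp - 1) % N], R[kp], R[(kp + 1) % N])
    return (kp + delta) * fs / N
```

For a noiseless 30.3 MHz complex tone sampled at 128 MHz with N = 128, the coarse stage gives bin 30 and the fine stage moves the estimate to within a small fraction of a bin of the true frequency.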


Last Semester Work: Multi-tone Frequency Detection

In the literature, all the frequency estimators based on interpolation of the DFT coefficients cater to a single tone. For a given frequency sample resolution, the frequencies of multiple tones can be estimated using the proposed estimator to a certain degree of accuracy, provided that the frequencies are separated enough for the distinct peaks to be visible. To estimate frequencies separated by less than the sample resolution, a DFT of greater resolution is required.

The accuracy of the estimator in the multi-tone case is limited because each individual peak contains residual components from the other frequencies. Hence the correction factor for each frequency is a function of all the other frequencies, which cannot be realised as a closed-form expression. Some techniques such as the zooming window DFT exist, but the computational complexity in that case increases considerably.


Last Semester Work: Simulation and Results-I

The simulation is done in MATLAB. A sinusoidal signal is taken whose frequency is varied from 30.1 MHz to 30.9 MHz in steps of 0.1 MHz. 128 samples of the signal are taken, with a sampling frequency of 128 MHz. A 128-point sliding DFT is taken for the proposed estimator, while a 128-point traditional DFT is taken for the other estimators. The noise is white Gaussian noise. In this section a numerical comparison is presented between the proposed estimator and the other estimators, namely Candan's estimator [5] and Jacobsen's estimator [4].


Last Semester Work:Simulation and Results-II


Last Semester Work: Simulation and Results-III


Last Semester Work:Simulation and Results-IV


Last Semester Work: Simulation and Results for multi-tone frequency estimation

For a signal having a set of frequencies separated by 10 MHz.

For a signal having a set of frequencies separated by 5 MHz.


Analysis of Results

The SNR values taken for the above simulations range from 2 dB to 3 dB. At higher SNR values the estimator bias and variance decrease further and all three estimators behave nearly the same, with the proposed estimator giving even lower bias and variance. It should be noted that as the damping factor r approaches 1 for higher values of N, the proposed estimator has nearly the same performance. The value of the damping factor has been kept around 0.9 for guaranteed stability and better performance. It is intuitively satisfying that the bias correction factor in Candan's estimator is also a part of the derived expression for the proposed estimator. The smaller number of operations involved in the proposed estimator compared to the other estimators, together with its lower bias and variance, makes it very useful for radar signal processing.


Distributed Arithmetic FIR filter

A discrete-time linear finite impulse response filter generates the output y[n] as:

$$y[n] = \sum_{i=0}^{K-1} w_i\, x[n-i] \tag{16}$$

A typical digital implementation will require K MAC operations. A digital signal processor with a single processing unit will complete this operation in O(K) clock cycles. Thus, the system clock has to operate at least K times faster than the rate at which the signal is sampled.


Distributed Arithmetic FIR filter contd.

MAC operations in a filter may be replaced by a series of look-up table (LUT) accesses and summations. This may be done by implementing the filtering operation in a bit-serial fashion. The signal samples can be represented in B-bit two's complement form as:

$$x[n-i] = -b_{i0} + \sum_{l=1}^{B-1} b_{il}\, 2^{-l} \tag{17}$$

where $b_{il}$ is the l-th bit in the two's complement representation of $x[n-i]$. Substituting 17 in equation 16 we get:

$$y[n] = -\sum_{i=0}^{K-1} b_{i0}\, w_i + \sum_{l=1}^{B-1}\left[\sum_{i=0}^{K-1} b_{il}\, w_i\right] 2^{-l} \tag{18}$$

For a given set of $w_i$ the term in the square bracket can take only one of $2^K$ values, which may be stored in a LUT called DA-F-LUT. The entry in the DA-F-LUT addressed as r can be written as:

$$\text{DA-F-LUT}(r) = \sum_{i=0}^{N-1} c_i^{(r)} w_i \tag{19}$$

where $c_i^{(r)}$ is the i-th bit of the address r and N here denotes the number of filter taps (written K above).
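The bit-serial evaluation of (18) with a precomputed table as in (19) can be sketched as follows. This is an illustrative software model, not the hardware design: samples are assumed to be stored as B-bit two's complement integers whose value is `x_int / 2**(B-1)`, and the function names are hypothetical.

```python
import numpy as np

def build_da_f_lut(w):
    """Entry r holds sum_i c_i^(r) w_i, where c_i^(r) is bit i of address r."""
    K = len(w)
    return np.array([sum(w[i] for i in range(K) if (r >> i) & 1)
                     for r in range(2 ** K)], dtype=float)

def da_fir_output(x_int, lut, B):
    """x_int[i] is x[n-i] as a B-bit two's complement integer."""
    K = len(x_int)
    unsigned = [int(v) % (1 << B) for v in x_int]   # raw two's complement bit patterns
    y = 0.0
    for l in range(B):                              # l = 0 addresses with the sign bits
        shift = B - 1 - l
        addr = sum(((unsigned[i] >> shift) & 1) << i for i in range(K))
        term = lut[addr] * 2.0 ** (-l)
        y += -term if l == 0 else term              # the sign-bit term is subtracted
    return y
```

Each output needs B table look-ups and B additions regardless of the number of taps, at the price of a table with $2^K$ entries.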


DA FIR Block Diagram


DA Adaptive Filter

A widely used adaptive algorithm is the LMS adaptive filter where, instead of the expectation of the error function, an instantaneous estimate $e[n]^2$ is considered. For each sample $x[n-i]$ the weights $w_i$ are updated as:

$$w_i(n+1) = w_i(n) + \mu\, e[n]\, x[n-i] \tag{20}$$

where $w_i(n)$ is the value of the filter weight $w_i$ at the n-th iteration and $\mu$ is the step size.

A typical implementation of the LMS adaptive filter on hardware with a single Multiply and Accumulate unit will require K operations for filtering and a further K operations for the weight update. Thus each filtering and adaptation step is completed in O(K) clock cycles. For real-time systems the system clock needs to be much faster than the digital sampling rate.
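A direct (multiplier-based) LMS loop makes the 2K operations per sample explicit: K for the filter output and K for the weight update. The sketch below is a plain reference model with assumed names (`lms_identify`, `mu`), with the error taken as the desired sample minus the filter output.

```python
import numpy as np

def lms_identify(x, d, K, mu):
    """Adapt K weights so that the filter output tracks the desired signal d."""
    w = np.zeros(K)
    errors = []
    for n in range(K - 1, len(x)):
        x_vec = x[n - K + 1:n + 1][::-1]   # [x[n], x[n-1], ..., x[n-K+1]]
        y = w @ x_vec                      # K multiply-accumulates for filtering
        e = d[n] - y
        w = w + mu * e * x_vec             # K more for the weight update (20)
        errors.append(e)
    return w, np.array(errors)
```

Driving it with white noise passed through a known 3-tap filter recovers those taps, which is a convenient sanity check before moving to a LUT-based version.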


DA LMS Filter

For implementing the Distributed Arithmetic based LMS adaptive filter, the DA-F-LUT needs to be recalculated after every iteration, which is highly time-consuming and resource-consuming. A novel architecture proposed by Anderson et al. makes use of the redundancies in the LUT entries and thus reduces the number of operations considerably. The block diagram for this architecture is as follows:


DA LMS Filter: Contd.

The DA based adaptive filter proposed in [8] directly applies the weight adaptation to the contents of the DA-F-LUT on a sample-by-sample basis. The r-th entry in the DA-F-LUT is given by equation 19. If each term is updated according to the LMS algorithm, the update is as follows:

$$\sum_{i=0}^{N-1} c_i^{(r)} w_i[n+1] = \sum_{i=0}^{N-1} c_i^{(r)} w_i[n] + \mu\, e(n) \sum_{i=0}^{N-1} c_i^{(r)} x[n-i] \tag{21}$$

which can be written as

$$\text{DA-F-LUT}^{(r)}(n+1) = \text{DA-F-LUT}^{(r)}(n) + \mu\, e(n)\, \text{DA-A-LUT}^{(r)}(n) \tag{22}$$


DA LMS Filter: DA-A-LUT Update

It may also be observed that the contents of the odd-addressed locations (locations whose addresses have a 1 in the LSB) of the DA-A-LUT[n] can be obtained from the even-addressed locations of the DA-A-LUT[n] according to

$$\text{DA-A-LUT}^{(2l+1)}[n] = \text{DA-A-LUT}^{(2l)}[n] + x[n] \tag{23}$$


DA LMS Filter: DA-F-LUT Update

$$\text{DA-F-LUT}^{(r)}(n+1) = \text{DA-F-LUT}^{(r)}(n) + \mu\, e(n)\, \text{DA-A-LUT}^{(r)}(n) \tag{24}$$

Once the update of the DA-A-LUT[n] as well as the filtering operation are done, the update of the DA-F-LUT[n+1] is performed. The DA-F-LUT[n+1] is updated by reading the same memory location in both the DA-F-LUT[n] and the DA-A-LUT[n], multiplying the output of the DA-A-LUT[n] by $\mu e[n]$, adding this quantity to the output of the DA-F-LUT[n], and finally storing the result back in the same memory location of the DA-F-LUT.
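The two LUT updates can be modelled in software as below. The odd-address rule is the one stated in (23) and the DA-F-LUT rule is (24). How the even-addressed entries are refreshed is not spelled out on the slide, so the sketch assumes they are carried over from the previous table by an address shift, which follows from the LUT definition (entry r is the sum of the samples selected by the bits of r). All function names are hypothetical.

```python
import numpy as np

def lut_from_vector(v):
    """Entry r is the sum of v[i] over the bits i that are set in address r."""
    n_bits = len(v)
    return np.array([sum(v[i] for i in range(n_bits) if (r >> i) & 1)
                     for r in range(2 ** n_bits)], dtype=float)

def update_da_a_lut(old_lut, x_new):
    """Move the DA-A-LUT from time n-1 to time n when sample x_new arrives."""
    half = len(old_lut) // 2
    new_lut = np.empty_like(old_lut)
    new_lut[0::2] = old_lut[:half]          # assumed: even address 2l reuses old entry l
    new_lut[1::2] = new_lut[0::2] + x_new   # odd address 2l+1, equation (23)
    return new_lut

def update_da_f_lut(f_lut, a_lut, mu, e):
    """Equation (24), applied to every address at once."""
    return f_lut + mu * e * a_lut
```

Because (24) is linear, updating the table entry by entry gives the same table as rebuilding it from the LMS-updated weights, without ever forming the weights explicitly.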


LMS Volterra Filter

The filtering operation for a p-th order LMS Volterra filter can be defined as

$$y(n) = w_0 + \sum_{m_1=0}^{N-1} w_1(m_1)\,x(n-m_1) + \sum_{m_1=0}^{N-1}\sum_{m_2=0}^{N-1} w_2(m_1,m_2)\,x(n-m_1)\,x(n-m_2) + \dots + \sum_{m_1=0}^{N-1}\sum_{m_2=0}^{N-1}\cdots\sum_{m_p=0}^{N-1} w_p(m_1,m_2,\dots,m_p)\,x(n-m_1)\,x(n-m_2)\cdots x(n-m_p) \tag{25}$$

Assuming $w_0 = 0$ and p = 2, the weight vector for the adaptive filter at the n-th index is given by

$$\vec{W}(n) = \{w_1(0;n), w_1(1;n), \dots, w_1(N-1;n), w_2(0,0;n), w_2(0,1;n), \dots, w_2(0,N-1;n), w_2(1,1;n), \dots, w_2(N-1,N-1;n)\}^T \tag{26}$$

LMS Volterra Filter Contd.

Similarly, the input vector at the n-th index is given as

$$\vec{X}(n) = \{x(n), x(n-1), \dots, x(n-N+1), x^2(n), x(n)x(n-1), \dots, x(n)x(n-N+1), x^2(n-1), \dots, x^2(n-N+1)\}^T \tag{27}$$

The linear and quadratic coefficients are updated separately according to the following equations:

$$w_1(m_1; n+1) = w_1(m_1; n) + \mu\, e(n)\, x(n-m_1) \tag{28}$$

and

$$w_2(m_1, m_2; n+1) = w_2(m_1, m_2; n) + \mu\, e(n)\, x(n-m_1)\, x(n-m_2) \tag{29}$$

where $\mu$ is the so-called step size, used to control the speed of convergence and ensure stability of the filter.
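For p = 2 the Volterra filter is linear in its weights, so the updates (28) and (29) amount to one LMS step on the extended input vector (27). A minimal sketch follows, assuming the products are ordered as in (27) with $m_1 \le m_2$; the function names are illustrative.

```python
import numpy as np

def volterra_input_vector(window):
    """window = [x(n), x(n-1), ..., x(n-N+1)]; returns linear then quadratic terms."""
    window = np.asarray(window, dtype=float)
    N = len(window)
    quad = [window[m1] * window[m2] for m1 in range(N) for m2 in range(m1, N)]
    return np.concatenate([window, quad])

def volterra_lms_step(W, window, d, mu):
    """One adaptation step: equations (28) and (29) applied to all weights together."""
    X = volterra_input_vector(window)
    e = d - W @ X
    return W + mu * e * X, e
```

For N = 3 the extended vector has 3 linear and 6 quadratic entries, so the weight vector has 9 elements.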


Input Vector Generator

As the incoming data samples are not in the form needed by the LMS Volterra filter, the input vector has to be generated using an array-multiplier-like structure in which the incoming data sample x[n] is multiplied by itself and by each of the existing samples.


Input Vector Generator contd.

In order to avoid multipliers in the input vector generator, a DA based scheme is used.


DA LMS Volterra Block Diagram

There are basically two kinds of LUTs, DA-F-LUT and DA-A-LUT: the DA-F-LUT is responsible for the filtering operation and the DA-A-LUT is responsible for the update operation. The architecture has N + 1 DA-F-LUTs and N + 1 DA-A-LUTs, one DA-A-LUT corresponding to each DA-F-LUT.


DA LMS Volterra DA-F-LUT and DA-A-LUT

The content of the first DA-F-LUT at the n-th time index is as follows:

$$\text{DA-F-LUT}^{(r)} = \sum_{i=0}^{N-1} c_i^{(r)} w_1(i; n) \tag{30}$$

and the content of the first DA-A-LUT at the n-th time index is as follows:

$$\text{DA-A-LUT}^{(r)} = \sum_{i=0}^{N-1} c_i^{(r)} x(n-i) \tag{31}$$

The content of the k-th DA-F-LUT at the n-th time index is as follows:

$$\text{DA-F-LUT}_k^{(r)} = \sum_{i=0}^{N-k-1} c_i^{(r)} w_2(i, i+k; n) \tag{32}$$

The content of the k-th DA-A-LUT at the n-th time index is as follows:

$$\text{DA-A-LUT}_k^{(r)} = \sum_{i=0}^{N-k-1} c_i^{(r)} x(n-i)\, x(n-k-i) \tag{33}$$

where $r = 0, \dots, 2^p - 1$ and $k = 0, \dots, N-1$, and $c_i^{(r)}$ is the i-th bit in the p-bit representation of the address r, where for the k-th LUTs p has a value of N - k:

$$r = \sum_{i=0}^{N-1} c_i^{(r)}\, 2^i \tag{34}$$

DA-F-LUT Block Diagram


Filtering

The first DA-F-LUT needs B clock cycles to do the filtering. For the other N DA-F-LUTs, each component of the input vector is 2B - 1 bits long. Hence the other N DA-F-LUTs need 2B - 1 clock cycles to do the filtering.


Filtering Contd

As the output from each DA-F-LUT contributes to y(n), the outputs are added up at the end of 2B - 1 clock cycles to give y(n) and hence e(n), which is equal to

$$e(n) = d(n) - y(n) \tag{35}$$

To avoid another multiplication, e(n) is quantized to a power of 2, i.e. $2^p$, and hence when it is multiplied with the components of the input vector each component is just shifted by p bits.
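Replacing the multiplication by a shift can be sketched as below. The slide does not say how the error is rounded to a power of two, so rounding the exponent to the nearest integer is an assumption here, as are the function names and the use of a separate sign.

```python
import math

def quantize_to_power_of_two(e):
    """Return (sign, p) such that e is approximated by sign * 2**p."""
    if e == 0:
        return 0, 0
    sign = 1 if e > 0 else -1
    p = round(math.log2(abs(e)))      # assumed rounding rule: nearest exponent
    return sign, p

def scale_by_error(value_int, sign, p):
    """Multiply an integer by sign * 2**p using only a shift."""
    shifted = value_int << p if p >= 0 else value_int >> -p
    return sign * shifted
```

For example, an error of 0.3 is quantized to $2^{-2}$, so scaling the integer 64 by it reduces to a right shift by two bits and gives 16.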


DA-A-LUT Update

$$\text{DA-A-LUT}_k^{(2l+1)}(n) = \text{DA-A-LUT}_k^{(2l)}(n) + x(n)\, x(n-k) \tag{36}$$


Flowchart


Throughput

The number of clock cycles taken by the input vector generator for generating the new terms of the vector is B, where B is the number of bits in each sample. The DA-A-LUT bank update and the filtering in the DA-F-LUT bank together take $\max(2^N, 2B - 1)$ cycles. The updated DA-A-LUT bank is then used to update the DA-F-LUT bank, which takes $2^{N+1}$ cycles.

$$\text{Throughput} = \frac{\text{clock rate}}{B + \max(2^N, 2B - 1) + 2^{N+1}} \tag{37}$$
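The cycle budget can be tabulated with a few lines of Python. The sketch reads the flattened terms on the slide as $2^N$ and $2^{N+1}$, which is an interpretation of the extracted text rather than a confirmed figure, and the 100 MHz clock in the usage note is a hypothetical value, not a measured result.

```python
def cycles_per_sample(N, B):
    """Cycles per output: input vector generation + LUT bank update/filtering + DA-F-LUT update."""
    return B + max(2 ** N, 2 * B - 1) + 2 ** (N + 1)

def throughput(clock_rate_hz, N, B):
    """Output samples per second for a given clock rate, as in equation (37)."""
    return clock_rate_hz / cycles_per_sample(N, B)
```

With N = 3 and B = 8 the budget is 8 + 15 + 16 = 39 cycles per sample, so a hypothetical 100 MHz clock would give roughly 2.56 million samples per second.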


Number of Logic Elements

For simulating the DA LMS Volterra filter, a filter was taken which works on three input samples x(n-2), x(n-1) and x(n) to produce the input vector.

Number of slices: 57 out of 6144
Number of slice flip-flops: 74 out of 12288
Number of 4 input LUTs: 93 out of 12288
Number of bonded IOBs: 52 out of 240
Number of IOs: 52

Table: Resource Utilization by the Input Vector Generator


Table: Resource Utilization by the basic units of DA-A-LUT bank and DA-F-LUT bank


Number of Logic Elements Contd.


Table: Resource Utilization by the entire DA LMS Volterra Filter


Power Summary Report

On Chip    Power (W)   Used/Available   Utilization (Percent)
Clocks     0.006       1/-              -
Logic      0.000       91/12288         1
Signals    0.000       122/-            -
IOs        0.000       52/240           22
Leakage    0.161       -/-              -

Supply Power (W)   Total   Dynamic   Quiescent
Supply Power       0.168   0.006     0.161

Table: Detailed Power summary of the block Input Vector Generator

On Chip    Power (mW)   Used/Available   Utilization (Percent)
Clocks     7.06         1/-              -
Logic      0.000        119/12288        1
Signals    0.000        257/-            -
IOs        0.000        49/240           20
Leakage    161.48       -/-              -

Supply Power (mW)   Total    Dynamic   Quiescent
Supply Power        168.53   7.06      161.48

Table: Detailed Power summary of basic unit of DA-A-LUT bank and DA-F-LUT bank


Power Summary Report Contd.

On Chip    Power (mW)   Used/Available   Utilization (Percent)
Clocks     22.37        1/-              -
Logic      0.000        520/12288        4
Signals    0.000        840/-            -
IOs        0.000        106/240          44
Leakage    161.77       -/-              -

Supply Power (mW)   Total    Dynamic   Quiescent
Supply Power        184.14   22.37     161.77

Table: Detailed Power summary of the DA LMS Volterra Filter


Timing Summary Report

Minimum period: 2.930 ns (Maximum Frequency: 341.314 MHz)
Minimum input arrival time before clock: 2.634 ns
Maximum output required time after clock: 3.806 ns
Maximum combinational path delay: No path found

Table: Detailed timing summary of the block Input Vector Generator


Table: Detailed timing summary of the basic unit of DA-A-LUT bank and DA-F-LUT bank


Table: Detailed Timing Summary of the entire DA LMS Volterra Filter


Conclusion

A new estimator is proposed which requires very few operations per output sample. The estimator has a correction term for bias. The good performance of the estimator is justified in this work. The proposed estimator has low bias and variance, which makes it a valuable tool in the field of radar signal processing. Simulation results showing the superiority of the proposed estimator are provided. A new Distributed Arithmetic based LMS Volterra filter is proposed which reduces the number of operations considerably by exploiting the redundancies in the LUTs during subsequent iterations. The throughput of the proposed DA LMS Volterra filter is found to be high from the results of simulating the filter in Xilinx ISE 12.4.


Future Work

The present work revolves around accurately estimating a single-tone frequency, or multi-tone frequencies where the tones have large separations. A potential future work is the extension of the present work to accurately estimating multi-tone frequencies which are closely separated, without increasing the number of computations.

For the LMS Volterra filter, offset binary coding has not been considered in the current implementation. With offset binary coding the size of each LUT reduces by half, but the update operation becomes more complex, as an intermediate sum needs to be calculated instead of the simple update employed here. The latest work of Anderson et al. employs a technique called Sliding Block Distributed Arithmetic, which reduces the number of additions for the filter and can also be employed here. The filter is output-centric: the correct output is obtained, but the optimal filter coefficients cannot be retrieved. A potential future work could be to employ a mechanism to extract the optimal filter coefficients as well.


References

[1] A. Oppenheim, R. Schafer, and J. Buck, Discrete-Time Signal Processing - Principles and Algorithms, 3rd ed. Upper Saddle River, NJ: Prentice Hall, 1996, pp. 480-481.

[2] B. Farhang-Boroujeny and Y. C. Lim, "A Comment on the Computational Complexity of Sliding FFT," IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing, vol. 39, no. 12, pp. 875-876, December 1992.

[3] B. G. Quinn, "Estimating Frequency by Interpolation Using Fourier Coefficients," IEEE Trans. Signal Process., vol. 42, no. 5, pp. 1264-1268, May 1994.

[4] E. Jacobsen and P. Kootsookos, "Fast Accurate Frequency Estimators," IEEE Signal Processing Magazine, vol. 24, pp. 123-125, May 2007.

[5] C. Candan, "A Method for Fine Resolution Frequency Estimation from Three DFT Samples," IEEE Signal Processing Letters, vol. 18, pp. 351-354, April 2011.

[6] E. Jacobsen and R. Lyons, "The Sliding DFT," IEEE Signal Processing Magazine, vol. 20, no. 2, pp. 74-80, March 2003.

[7] B. G. Quinn, "Estimation of Frequency, Amplitude and Phase from the DFT of a Time Series," IEEE Trans. Signal Process., vol. 45, no. 3, pp. 814-817, March 1997.


References contd.

[8] D. J. Allred, H. Yoo, V. Krishnan, W. Huang, and D. V. Anderson, "LMS adaptive filters using distributed arithmetic for high throughput," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 52, no. 7, pp. 1327-1337, Jul. 2005.

[9] D. J. Allred, H. Yoo, V. Krishnan, W. Huang, and D. V. Anderson, "A novel high performance distributed arithmetic adaptive filter implementation on an FPGA," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., 2004, vol. 5, pp. V-161 to V-164.

[10] S. A. White, "Applications of distributed arithmetic to digital signal processing: A tutorial review," IEEE ASSP Mag., vol. 6, no. 3, pp. 4-19, Jul. 1989.

[11] C. F. N. Cowan, S. G. Smith, and J. H. Elliott, "A digital adaptive filter using a memory-accumulator architecture: Theory and realization," IEEE Trans. Acoust., Speech, Signal Process., vol. ASSP-31, no. 3, pp. 541-549, Jun. 1983.

[12] W. Huang and D. V. Anderson, "Modified sliding-block distributed arithmetic with offset binary coding for adaptive filters," J. Signal Process. Syst., vol. 63, no. 1, pp. 153-163, Apr. 13, 2010.

[13] V. J. Mathews, "Adaptive polynomial filters," IEEE Signal Processing Mag., vol. 8, pp. 10-26, July 1991.

[14] G. L. Sicuranza and G. Ramponi, "Adaptive nonlinear digital filters using distributed arithmetics," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-34, no. 3, pp. 518-526, June 1986.

[15] Rui Guo and L. S. DeBrunner, "Two High-Performance Adaptive Filter Implementation Schemes Using Distributed Arithmetic," IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 58, no. 9, pp. 600-604, Sept. 2011.

