Enhanced Adaptive Volterra Filtering by Automatic Attenuation of Memory Regions and Its Application...


IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 61, NO. 11, JUNE 1, 2013 2745

Enhanced Adaptive Volterra Filtering by Automatic Attenuation of Memory Regions and Its Application to Acoustic Echo Cancellation

Luis A. Azpicueta-Ruiz, Member, IEEE, Marcus Zeller, Aníbal R. Figueiras-Vidal, Fellow, IEEE, Walter Kellermann, Fellow, IEEE, and Jerónimo Arenas-García, Senior Member, IEEE

Abstract: This paper presents a novel scheme for nonlinear acoustic echo cancellation based on adaptive Volterra filters with linear and quadratic kernels, which automatically prefers those diagonals contributing most to the output of the quadratic kernel with the goal of minimizing the overall mean-square error. In typical echo cancellation scenarios, not all coefficients will be equally relevant for the modeling of the nonlinear echo, but coefficients close to the main diagonal of the second-order kernel will describe most of the nonlinear echo distortions, such that not all diagonals need to be implemented. However, it is difficult to decide the most appropriate number of diagonals a priori, since there are many factors that influence this decision, such as the energy of the nonlinear echo, the shape of the room impulse response, or the step size used for the adaptation of kernel coefficients. Our proposed scheme incorporates adaptive scaling factors that control the influence of each group of adjacent diagonals contributing to the quadratic kernel output. An appropriate selection of these factors serves to emphasize or neglect diagonals of the model as required by the present situation. We provide adaptation rules for these factors based on previous works on combination of adaptive filters, and comprehensive simulations showing the reduced gradient noise reached by the new echo canceller.

Index Terms: Combination of filters, combination of kernels, nonlinear acoustic echo cancellation, Volterra filters.

I. INTRODUCTION

Acoustic echo cancellers (AECs) have become a crucial component of many of today's communication scenarios, including teleconference applications and video telephony, among others. Both hands-free systems and mobile devices need to incorporate an AEC to prevent the echo component from being fed back to the far-end speaker [1]. Although the sound propagation from a loudspeaker to a microphone can safely be assumed to be linear, and can therefore be described by a room impulse response (RIR), the widespread use of hands-free telephony and mobile phones suggests incorporating nonlinear AECs (NAECs), since traditional linear schemes achieve insufficient cancellation performance in these scenarios. This is mainly because such devices include low-quality loudspeakers and amplifiers driven at high power levels, which cause non-negligible nonlinear distortions and thus call for nonlinear models with memory to compensate for these distortions [2], [3].

Recently, Volterra filters (VFs) have been used by numerous researchers as the basis of NAECs, e.g., [3]–[7], since they represent a broad class of nonlinear filters with memory. Considering VFs with a twofold truncation of the Volterra series, i.e., a limitation to a maximum kernel order $P$ and a maximum length of $N_p$ taps per dimension for the $p$-th order kernel, the output of a VF, $y(n)$, can be expressed as the superposition of kernel outputs $y_p(n)$ as follows [4]:

$$y(n) = \sum_{p=1}^{P} y_p(n) = \sum_{p=1}^{P} \sum_{n_1=0}^{N_p-1} \sum_{n_2=n_1}^{N_p-1} \cdots \sum_{n_p=n_{p-1}}^{N_p-1} h_{p;n_1,\ldots,n_p}(n) \prod_{i=1}^{p} x(n-n_i) \qquad (1)$$

where $x(n)$ is the input signal at time instant $n$, and $h_{p;n_1,\ldots,n_p}(n)$ stands for the VF coefficients.

A problem that appears when VFs are adapted following LMS-type algorithms is the amount of gradient noise introduced by the coefficient updates [3]. This is particularly significant for the higher-order kernels, which require an exponentially growing number of coefficients. As a consequence, in some situations a VF can even be outperformed by a purely linear filter, e.g., when mainly linear distortions are present, or when the amount of gradient noise introduced by higher-order kernels exceeds the power of the nonlinear distortions they seek to cancel. To alleviate this drawback, combination approaches [8] have been explored to adjust filter parameters [9] and kernel sizes [10]. In [9], we proposed an adaptive combination of kernels, replacing each kernel's output by a combination of two kernels of the same order:

$$y_p(n) = \lambda_p(n)\, y_{p,1}(n) + [1 - \lambda_p(n)]\, y_{p,2}(n) \qquad (2)$$

Here, $y_p(n)$ refers to the adaptive combination of the outputs of two kernels of order $p$ with complementary capabilities, $y_{p,1}(n)$ and $y_{p,2}(n)$, respectively, combined by means of a mixing parameter $\lambda_p(n)$. The scheme was also extended to account for all-zero kernels, whose outputs are identically zero, to provide the flexibility to add or remove kernel contributions as necessary, increasing the robustness of VFs with respect to the degree of the nonlinearity. This is especially useful for NAEC in scenarios where the linear-to-nonlinear distortion power ratio (LNLR), defined as the ratio of the power of the linear distortions caused by a system to the power of its nonlinear distortions, is a priori unknown or even time varying [9].

In this paper, we propose a novel nonlinear echo cancellation scheme based on combination strategies which offers further improvements to adaptive VFs, considering only second-order structures for simplicity¹. The main idea relies on the fact that, for NLMS updates, the gradient noise introduced by the quadratic kernel adaptation is usually evenly distributed among all of its coefficients, whereas not all of them contribute equally to the nonlinear distortions in the output [3], [11]. Our proposal exploits this asymmetry by removing the noise introduced by the adaptation of the so-called inactive coefficients (those with energy close to 0). To this end, partial kernel outputs referring to groups of adjacent diagonals will be scaled by factors from the unit interval, such that their gradient noise contribution is automatically optimized.

The rest of the paper is organized as follows: Section II describes our proposed scheme for echo cancellation, whose performance will then be empirically assessed in Section III. In Section IV we outline the main conclusions of this work.

Manuscript received August 20, 2012; revised December 13, 2012 and February 16, 2013; accepted February 19, 2013. Date of publication March 06, 2013; date of current version April 30, 2013. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Jonathon A. Chambers. This work was supported in part by MICINN projects TIN2011-24533, TEC2011-22480 and PRI-PIBIN-2011-1266 and in part by the Deutsche Forschungsgemeinschaft (DFG) under project number KE 890/5-1.

L. A. Azpicueta-Ruiz, A. R. Figueiras-Vidal, and J. Arenas-García are with the Department of Signal Theory and Communications, Universidad Carlos III de Madrid, 28911 Leganés, Spain.

M. Zeller and W. Kellermann are with the Chair of Multimedia Communications and Signal Processing, University of Erlangen-Nuremberg, 91058 Erlangen, Germany.

This paper has supplementary downloadable multimedia material available at http://ieeexplore.ieee.org provided by the authors. This includes a document which shows the derivations of the theoretical analysis of the steady-state performance of the scheme proposed in this paper. This material is 142 KB in size.

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TSP.2013.2251340

¹ Although our scheme could be easily generalized, higher-order kernels excessively increase the computational cost, making them much less suitable for practical applications.

1053-587X/$31.00 © 2013 IEEE

II. SCHEME FOR AUTOMATIC ATTENUATION OF MEMORY REGIONS

It is well known that the output of a VF based on the usual representation in Cartesian coordinates (1) can also be reinterpreted by means of diagonal coordinates [4]. For the second-order case, such a transform is given by $n_2 = n_1 + r$, where $r$ denotes the relative delay between the input product samples and uniquely specifies each diagonal. While the linear output is given by $y_1(n) = \sum_{n_1=0}^{N_1-1} h_{1;n_1}(n)\, x(n-n_1)$, the output of the quadratic kernel $y_2(n)$ can be calculated as the sum of the partial outputs $y_{2,r}(n)$ corresponding to each diagonal, i.e.,

$$y_2(n) = \sum_{r=0}^{N_2-1} y_{2,r}(n) = \sum_{r=0}^{N_2-1} \sum_{n_1=0}^{N_2-1-r} h_{2;n_1,n_1+r}(n)\, x(n-n_1)\, x(n-n_1-r) \qquad (3)$$
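The equivalence between the Cartesian and diagonal representations of the quadratic kernel can be checked numerically. The following is a minimal sketch, assuming a triangular kernel stored as an upper-triangular NumPy array; the function names `quadratic_output_cartesian` and `diagonal_outputs`, and the convention that `x_buf[k]` holds $x(n-k)$, are illustrative choices and do not come from the paper.

```python
import numpy as np

def quadratic_output_cartesian(h2, x_buf):
    """Triangular Cartesian form: sum over n1 <= n2 of h2[n1, n2] x(n-n1) x(n-n2)."""
    N2 = len(x_buf)
    y = 0.0
    for n1 in range(N2):
        for n2 in range(n1, N2):
            y += h2[n1, n2] * x_buf[n1] * x_buf[n2]
    return y

def diagonal_outputs(h2, x_buf):
    """Partial outputs, one per diagonal r = n2 - n1 = 0, ..., N2-1."""
    N2 = len(x_buf)
    y_r = np.zeros(N2)
    for r in range(N2):
        coeffs = np.diagonal(h2, offset=r)      # h2[n1, n1 + r], n1 = 0..N2-1-r
        products = x_buf[:N2 - r] * x_buf[r:]   # x(n-n1) * x(n-n1-r)
        y_r[r] = np.dot(coeffs, products)
    return y_r

# Hypothetical toy data: a random upper-triangular kernel and input buffer.
rng = np.random.default_rng(0)
N2 = 8
h2 = np.triu(rng.standard_normal((N2, N2)))
x_buf = rng.standard_normal(N2)                 # x_buf[k] holds x(n - k)

y_cart = quadratic_output_cartesian(h2, x_buf)
y_diag = diagonal_outputs(h2, x_buf)            # summing these gives y_cart
```

Keeping one partial output per diagonal is what later allows each diagonal, or group of diagonals, to be weighted separately.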

The above representation is especially beneficial in the context of cascaded systems, where the coefficients close to the main diagonal concentrate most of the energy, due to the final linear block, representing the RIR [12]. This fact has been exploited in efficient VFs that implement and update only the main and some of the adjacent diagonals [13]. However, the number of diagonals with non-negligible energy is a priori unknown, which motivates the introduction of schemes that are based on pruning strategies [12] or on the estimation of the number of active diagonals of a quadratic kernel [10].

The above problem becomes even more involved when considering adaptive kernels which update coefficients using adaptive algorithms, since this results in the introduction of gradient noise, to which all estimated coefficients usually contribute evenly. In this case, the decision of whether a diagonal should be fully considered or attenuated depends on whether the amount of gradient noise associated with its estimation exceeds the energy of the echo signal that can be removed. However, there are several factors which affect this tradeoff, such as the shape of the RIR, the power of background noise, the LNLR and the step size used for kernel adaptation (since a larger step size results in an increased amount of gradient noise). Typically, this information is not known a priori, and can even be time-varying. In the next subsection, we present a scheme that provides an automatic weighting of the influence of each kernel diagonal, reducing the amount of gradient noise in the estimation of the echo channel.

A. Automatic Attenuation of Memory Regions

Fig. 1 represents the proposed echo cancellation scheme, to which we will refer as D-NLAEC-AZK (Nonlinear Acoustic Echo Canceller with All Zero Kernels in the Diagonals), and whose output is obtained as the superposition of modified linear and quadratic kernels as follows:

$$y(n) = y_1(n) + y_2(n) = \lambda_1(n)\, y_{1,1}(n) + [1 - \lambda_1(n)]\, y_{1,2}(n) + \sum_{d=1}^{D} \lambda_{2,d}(n)\, y_{2,\mathcal{R}_d}(n) \qquad (4)$$

Fig. 1. D-NLAEC-AZK using groups. (a) Block diagram of the scheme representing groups of diagonals as a whole quadratic kernel where only the white coefficients are adapted; for the sake of simplicity adaptation loops have been omitted, where the error signals are calculated as described in [9]. (b) Location of active (shaded and delimited by the dashed line) and inactive (no fill) coefficients of the true and unknown echo channel considering twelve active diagonals. The groups of diagonals used by D-NLAEC-AZK in the experiments are delimited by the wider grey lines.

Here, we have defined $y_{2,\mathcal{R}_d}(n) = \sum_{r \in \mathcal{R}_d} y_{2,r}(n)$, with $y_{2,r}(n)$ being the partial output of the $r$th diagonal of the quadratic kernel. In other words, $y_{2,\mathcal{R}_d}(n)$ refers to the overall output of a group of diagonals with indices in the set $\mathcal{R}_d$, where the total number of groups with disjoint diagonals is $D$.

For the linear component $y_1(n)$, we use a combination of two kernels with different step sizes and combination parameter $\lambda_1(n)$, to improve the convergence speed vs. precision tradeoff, enhancing the robustness with respect to time-varying room impulse responses [9].

However, the most prominent feature of the proposed algorithm is the automatic attenuation of memory regions of the quadratic kernel. Considering $y_2(n)$ in (4), we notice that the output of each block of diagonals is multiplied by a scaling factor $\lambda_{2,d}(n) \in [0, 1]$. By appropriately selecting such factors, the contributions of all diagonals are easily added to or removed from the output of the echo canceller $y(n)$. It is interesting to note that optimal values of $\lambda_{2,d}(n)$ can also take intermediate values, which can be explained from a bias vs. variance tradeoff in the estimation of kernel coefficients [14]. Although the proposed modified quadratic kernel could be seen as a two-level hierarchical adaptive kernel [15], one important difference, among others, is that we update all coefficients employing the same overall error signal, $e(n)$, as justified in [9].

An important design parameter is the number of groups, $D$. We could use a different scaling factor for each diagonal in the quadratic kernel ($D = N_2$). However, grouping diagonals offers the advantage of reducing the number of required factors (and, therefore, the computational cost of their estimation), without significantly degrading filter performance, since adjacent diagonals typically exhibit a similar behavior. The other extreme case, $D = 1$, yields the NLAEC-AZK (Nonlinear Acoustic Echo Canceller with All Zero Kernel) scheme [9]. So, our proposal can be seen as a generalization of [9], and therefore, apart from the automatic attenuation of memory regions, it also shares its main benefits:

• Enhanced robustness with respect to time-varying RIRs due to the use of a combination of two linear kernels with different step sizes for obtaining $y_1(n)$ in (4), as was done in [9].

• Robustness to the presence or absence of quadratic distortions. If the echo path only presents linear distortions, the second-order kernel can be removed by selecting $\lambda_{2,d}(n) = 0$, for $d = 1, \ldots, D$.

Although different grouping strategies could be adopted, the grouping illustrated in Fig. 1(b) (for $D = 4$) has been designed so that each block of diagonals includes approximately the same number of coefficients. In the represented case, using $\lambda_{2,3}(n) = \lambda_{2,4}(n) = 0$ will remove the influence of the diagonals associated with inactive coefficients, while the optimal values for $\lambda_{2,1}(n)$ and $\lambda_{2,2}(n)$ depend on several factors, as we analyze in the next subsection.
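One possible way to build groups of adjacent diagonals with a similar number of coefficients is sketched below. This is an illustrative heuristic, not the grouping rule used by the authors: `group_diagonals` and `scaled_quadratic_output` are hypothetical helper names, and the sketch assumes a triangular $N_2 \times N_2$ kernel in which diagonal $r$ holds $N_2 - r$ coefficients.

```python
def group_diagonals(N2, D):
    """Split diagonals r = 0..N2-1 into D contiguous groups of similar size.

    Size is measured in coefficients: diagonal r contributes N2 - r of them.
    """
    if not 1 <= D <= N2:
        raise ValueError("D must satisfy 1 <= D <= N2")
    total = N2 * (N2 + 1) // 2
    groups, current, count = [], [], 0
    for r in range(N2):
        current.append(r)
        count += N2 - r                       # cumulative coefficient count
        groups_left = D - len(groups) - 1     # groups still to open after this one
        diags_left = N2 - 1 - r               # diagonals not yet assigned
        target = total * (len(groups) + 1) / D
        # Close the group once the cumulative target is reached, or when every
        # remaining diagonal is needed so that no later group stays empty.
        if groups_left > 0 and (count >= target or diags_left == groups_left):
            groups.append(current)
            current = []
    groups.append(current)
    return groups

def scaled_quadratic_output(y_r, groups, lam):
    """Weighted sum of group outputs: one scaling factor per group of diagonals."""
    return sum(l * sum(y_r[r] for r in g) for l, g in zip(lam, groups))

# Hypothetical usage with the kernel size and number of groups of the experiments.
groups = group_diagonals(N2=64, D=4)
sizes = [sum(64 - r for r in g) for g in groups]  # coefficients per group
```

Setting a group's factor to 0 in `scaled_quadratic_output` removes that block of diagonals from the output, while a factor of 1 keeps it unchanged.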

B. Steady-State EMSE of D-NLAEC-AZK With Optimum Scaling Factors

In this section, we summarize some of the most relevant results regarding the steady-state performance of the proposed D-NLAEC-AZK scheme when optimum scaling factors are selected. The details of the analysis can be found in the supplementary material that accompanies the paper at http://ieeexplore.ieee.org. For the analysis, we assume the following: 1) Cancellation of the linear part of the echo path is perfect. This allows assuming the following stationary model for the desired signal: $d(n) = \mathbf{h}_{o}^{T} \mathbf{x}_2(n) + e_0(n)$, where $\mathbf{h}_{o}$ represents the quadratic kernel of the echo path, $\mathbf{x}_2(n)$ is the input regressor, and $e_0(n)$ is output zero-mean noise with variance $\sigma_0^2$; 2) the input signal is white noise with mean zero and variance $\sigma_x^2$; and 3) adaptation of the VF coefficients follows the average values of the input signal.

Under these assumptions, it can be shown that the excess mean-square error (EMSE), $J_{\mathrm{ex}}(n) = E\{[e(n) - e_0(n)]^2\}$, can be approximated in steady state as:

(5)

where we defined coefficient errors $\boldsymbol{\varepsilon}_d(n)$, for $d = 1, \ldots, D$, $\lambda_{2,d}$ are constant scaling parameters, and $E\{\cdot\}$ is the expectation operator. The EMSE of a standard VF could be calculated by setting $\lambda_{2,d} = 1$ in the above expression, i.e., as the sum of the gradient noise associated with the identification of all coefficients in the quadratic kernel. However, (5) shows that the scaling factors allow removing some of that gradient noise if $\lambda_{2,d} < 1$, at the cost of introducing a systematic error. This is related to the well-known compromise between bias and variance, as was already noted in [14].

Setting the derivatives of (5) with respect to $\lambda_{2,d}$ to zero, we can find the optimum scaling factors as

(6)

which is a constant between 0 and 1. Replacing this expression back in (5), and after some manipulations, the optimum EMSE of the D-NLAEC-AZK scheme would be given by

(7)

Obviously, this error is smaller than or equal to the EMSE of the VF, with the equality holding if and only if the optimum scaling factors are all equal to 1. When compared to the NLAEC-AZK scheme from [9] (i.e., $D = 1$), we see that D-NLAEC-AZK provides additional flexibility, leading to potentially larger EMSE reductions, as we will check in the experiments section.
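The bias versus variance compromise described in this subsection can be illustrated with a simplified derivation. It is a sketch under extra assumptions and should not be read as a reproduction of (5)–(7): it assumes that all entries of the quadratic regressor are mutually uncorrelated with a common power $\sigma_x^4$, and that the steady-state coefficient errors are zero-mean and independent of the regressor. The symbols $\mathbf{h}_{o,d}$ (true coefficients in group $d$) and $\boldsymbol{\varepsilon}_d$ (steady-state coefficient error in group $d$) are notation introduced here for the illustration.

```latex
% Simplified illustration only; assumptions are stated in the lead-in.
\begin{align}
J_{\mathrm{ex}} &\approx \sigma_x^4 \sum_{d=1}^{D}
  \left[ (1-\lambda_{2,d})^2 \,\|\mathbf{h}_{o,d}\|^2
       + \lambda_{2,d}^2 \, E\{\|\boldsymbol{\varepsilon}_d\|^2\} \right], \\
\frac{\partial J_{\mathrm{ex}}}{\partial \lambda_{2,d}} = 0
  \;&\Rightarrow\;
  \lambda_{2,d}^{o} = \frac{\|\mathbf{h}_{o,d}\|^2}
       {\|\mathbf{h}_{o,d}\|^2 + E\{\|\boldsymbol{\varepsilon}_d\|^2\}} \in [0,1], \\
J_{\mathrm{ex}}^{o} &\approx \sigma_x^4 \sum_{d=1}^{D}
  \frac{\|\mathbf{h}_{o,d}\|^2 \, E\{\|\boldsymbol{\varepsilon}_d\|^2\}}
       {\|\mathbf{h}_{o,d}\|^2 + E\{\|\boldsymbol{\varepsilon}_d\|^2\}}
  \;\le\; \sigma_x^4 \sum_{d=1}^{D} E\{\|\boldsymbol{\varepsilon}_d\|^2\}.
\end{align}
```

In this simplified form, the first term of the first line is the systematic error introduced by attenuating a group and the second term is its scaled gradient noise. A group with no energy in the true kernel gets an optimum factor of 0, and the bound on the last line is the value obtained with all factors equal to 1.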

C. Adaptation of $\lambda_1(n)$ and $\lambda_{2,d}(n)$

In practical situations, the optimum scaling factors cannot be known in advance, which makes it advisable to implement adaptive strategies that can learn and adapt them depending on the filtering scenario. We follow [14] to derive rules for the adaptation of $\lambda_1(n)$ and $\lambda_{2,d}(n)$, with $d = 1, \ldots, D$. Rather than directly updating them, we first define auxiliary parameters $a_1(n)$ and $a_{2,d}(n)$, $d = 1, \ldots, D$, by means of the sigmoid function, $\mathrm{sgm}(a) = (1 + e^{-a})^{-1}$. Adaptation of the auxiliary parameters can be done using power-normalized gradient descent rules for the minimization of the overall error $e(n)$.

(8)

(9)

(10)

where and are used for truncation of $a_1(n)$ and $a_{2,d}(n)$ at every iteration, and are the step sizes, and and are low-pass filtered versions of the signals used for normalization. Selection of initial values $a_1(0)$ and $a_{2,d}(0)$, and parameters and is not critical, and we simply set them to , and (for more details, please refer to [14]).


Fig. 2. Steady-state performance of D-NLAEC-AZK when using Laplacian noise as the filter input. (a) Steady-state EMSE reduction of the NLAEC-AZK and D-NLAEC-AZK schemes with respect to a conventional VF. (b) From top to bottom: steady-state values of NLAEC-AZK and D-NLAEC-AZK mixing parameters.

An additional and important benefit of the activation function (8) is that it reduces the adaptation speed when the parameters $\lambda_1(n)$ or $\lambda_{2,d}(n)$ get close to the limiting values of 0 or 1 via the derivatives in (9) and (10), and thus prevents the addition of gradient noise by these rules in these important situations. Otherwise, even for a case where all diagonals are necessary (and, therefore, all scaling factors should equal 1), the adaptation of these coefficients would introduce some additional noise, thus degrading the performance of the new scheme with respect to the use of the original quadratic kernel [14].

The number of additional products required by the group-based scaling of kernel diagonals is with respect to the NLAEC-AZK scheme. Thus, we can see that $D$ is an important parameter that imposes a compromise between computational cost and performance. In any case, this number of products is typically small when compared to the computational requirements for the adaptation of the quadratic kernel.
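The kind of sigmoid-based, power-normalized update described in this subsection can be sketched as follows. This is an illustration in the spirit of the rules from [14], not the authors' exact equations (8)–(10): the update form is the plain gradient of $e^2(n)/2$ with respect to each auxiliary parameter, and the names `update_scaling`, `mu_a`, `beta`, `a_max` as well as their values are assumptions made for the example.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def update_scaling(a, p, e, y_grp, mu_a=0.5, beta=0.9, a_max=4.0, eps=1e-8):
    """One power-normalized gradient step on the auxiliary parameters.

    a, p and y_grp hold one entry per group of diagonals; e is the overall
    error. All parameter values are illustrative assumptions.
    """
    lam = sigmoid(a)
    p = beta * p + (1.0 - beta) * y_grp ** 2              # low-pass filtered power
    # Gradient descent on e^2 / 2; the factor lam * (1 - lam) comes from the
    # sigmoid and slows adaptation when lam approaches 0 or 1.
    a = a + (mu_a / (p + eps)) * e * y_grp * lam * (1.0 - lam)
    return np.clip(a, -a_max, a_max), p                   # truncation of a

# Toy scenario (hypothetical): group 0 carries the echo, group 1 only adds noise.
rng = np.random.default_rng(1)
a = np.zeros(2)                                           # scaling factors start at 0.5
p = np.ones(2)
for _ in range(20000):
    u = rng.standard_normal(2)                            # outputs of the two groups
    d = u[0] + 0.1 * rng.standard_normal()                # desired signal
    e = d - np.dot(sigmoid(a), u)                         # overall error
    a, p = update_scaling(a, p, e, u)
lam = sigmoid(a)                                          # final scaling factors
```

In this toy run the factor of the useful group moves towards 1 and the factor of the noise-only group towards 0, which is the qualitative behavior expected from the scheme.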

III. EXPERIMENTS

We have carried out a number of experiments in an acoustic echocancellation scenario similar to that of [9]. Measured echo paths forthe linear and non-linear parts were represented in [9, Fig. 7]. For asampling frequency of 8 kHz, the sizes for the linear and quadratic

kernels are 320 and 64 64 taps, respectively. The reference signalfollows the model:

TABLE ICOMPUTATIONAL COST REQUIRED BY DIFFERENT SCHEMES, BEINGTHE NUMBER OF GROUPS OF DIAGONALS, AND AND THE

NUMBER OF MULTIPLICATIONS REQUIRED BY THE LINEAR ANDQUADRATIC KERNELS, RESPECTIVELY

, where allows us to control the LNLR, and is anadditive Gaussian i.i.d. noise, whose variance is set to simulate an SNR= 20 dB when , i.e., .According to (4), the D-NLAEC-AZK is constituted by two linear

and one quadratic kernels, whose dimensions have been fixed to matchthe real echo path (i.e., and ). Unlessotherwise stated, we use groups containing similar numberof taps [ , , ,and , see Fig. 1(b)]. Kernel coefficients are adaptedusing an NLMS-type rule with kernel-dependent normalization [12],with step sizes and for the linear, andfor the quadratic kernels. Parameters for the adaptation of and

are kept constant throughout all simulations: ,and , as recommended in [9].In a first set of experiments we evaluate the steady-state performance

of our proposal as a function of different parameters. As a figure ofmerit we use the EMSE, estimated averaging over 25.000 iterationsafter convergence. A VF with and is used as areference2, such that we can study the EMSE reduction of our schemewith respect to this baseline. For the considered SNR, this baseline VFachieves an EMSE of for all LNLR values represented in thefigures.Fig. 2(a) displays the steady-state EMSE improve-

ments of both NLAEC-AZK (i.e., ) and D-NLAEC-AZKwith as a function of the LNLR, when the input signal isLaplacian colored noise. In order to better illustrate the ability of thenew canceller to identify active diagonals, we have set all coefficientsof the quadratic kernel outside the first twelve diagonals of our testsystem to zero. In this way, the active coefficients of the true echo pathexpand only through the first and second block of diagonals (whichincludes inactive coefficients as well).As it can be seen, the quadratic kernel is not needed for high LNLRs,

and both echo cancellers replace it by the AZK reaching similar andhigh EMSE gains with respect to the conventional VF. However, whennon-negligible nonlinear distortions are present (LNLR ), theD-NLAEC-AZK consistently outperforms the NLAEC-AZK by ap-proximately 2 dB. This performance gain is due to the fact that theD-NLAEC-AZK is still using the AZK to model the diagonals withinactive coefficients, whereas the NLAEC-AZK is always forced to usethe full quadratic kernel, introducing gradient noise in the modeling ofinactive coefficients of . Note that the mixing parameters associ-ated to the third and fourth blocks of diagonals ( and ,respectively) remain close to 0 for all levels of LNLR [Fig. 2(b)].The computational cost of different schemes (linear, nonlinear,

NLAEC-AZK and our proposal D-NLAEC-AZK) is analyzed in termsof multiplications per iteration in Table I, both for a general case andfor the specific configuration of this experiment. As it can be seen, the

2Note that, after the algorithms have converged, the VF with lowest, i.e., with best performance, that can be formed using the kernels

of the D-NLAEC-AZK scheme is that with the smallest step size in the linearkernel ( ).

Page 5: Enhanced Adaptive Volterra Filtering by Automatic Attenuation of Memory Regions and Its Application to Acoustic Echo Cancellation

IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 61, NO. 11, JUNE 1, 2013 2749

Fig. 3. Steady-state performance of D-NLAEC-AZK when using Laplaciannoise as the filter input. (a) Steady-state EMSE reduction of the D-NLAEC-AZK

scheme w.r.t. a conventional VF as a function of the number of activediagonals of . (b) Steady-state values of EMSE of the proposed scheme as afunction of , considering twelve active diagonals of .

performance gain of our scheme is obtained at the expense of a verymodest computational increment with respect to NLAEC-AZK, andeven with respect to a standard VF, since the most critical element isthe quadratic kernel.To better analyze D-NLAEC-AZK ability to identify an appropriate

number of diagonals to include in the model, we consider next second-order plants where the number of active diagonals is modified by dele-tion of a varying number of the diagonals lying furthest from the mainone. Fig. 3(a) shows the EMSE gain of our scheme as a function ofthe number of active diagonals in . The abrupt transitions corre-spond to the block-limits implemented by our scheme. As we wouldexpect, the gain increases with a decreasing number of active coeffi-cients, since in this case D-NLAEC-AZK can cancel the gradient noisederived from larger blocks of inactive coefficients. We can also see thatthe EMSE reduction is more important for larger LNLRs, since in suchcases the energy of the gradient noise introduced by coefficient adap-tation is more likely to exceed the energy of the nonlinear echo whichcan be cancelled.Fig. 3(b) analyzes the influence of the number of blocks, illustrating

EMSE gain with respect to a conventional VF when using 1, 2, 4, 8,and 16 blocks in the D-NLAEC-AZK scheme considering a case withtwelve active diagonals in . We can conclude that using a largernumber of blocks can provide improved performance, at the price ofan slightly increased computational cost.For completeness, we have also carried out experiments using real

speech as input signal and employing the Echo Return Loss Enhance-ment (ERLE), ,

Fig. 4. D-NLAEC-AZK performance when using speech as the input signaland . From top to bottom: input signal; ERLE evolution forNLAEC-AZK and D-NLAEC-AZK; time evolution of D-NLAEC-AZKmixingparameters .

Fig. 5. D-NLAEC-AZK performance when using speech as the input signaland . From top to bottom: input signal; ERLE evolution forNLAEC-AZK and D-NLAEC-AZK; time evolution of D-NLAEC-AZKmixingparameters .

as criterion. The ERLE has been estimated by averaging 100 runs with independent output noise signals. Figs. 4–6 show the logarithmic ERLEs achieved by both NLAEC-AZK and D-NLAEC-AZK for three different LNLR values: , 20 and 0 dB, respectively. The behavior of both schemes can be described in very similar terms to those used for the previous examples. For , both cancellers obtain very similar ERLEs, whereas D-NLAEC-AZK achieves some additional gain for lower LNLRs, which becomes especially clear for (panel (a) of Fig. 6). In the latter case, with only an increment of 0.32% in terms of computational burden, ERLE improvements of around 3 dB are observed during several intervals of the simulation, as can be seen in the zoom plot shown in panel (b) of Fig. 6.

Regarding the evolution of combination parameters, the coefficients that are associated with active regions in the quadratic kernel, and, become closer to 1 as the LNLR decreases. However, during silence periods all combination parameters take values around 0.5, even

Fig. 6. D-NLAEC-AZK performance when using speech as the input signal and . (a) From top to bottom: input signal; ERLE evolution for NLAEC-AZK and D-NLAEC-AZK; time evolution of D-NLAEC-AZK mixing parameters . (b) Zoom from five to eight seconds.

for . This can be explained by a noisy evolution of the scaling parameters in this case, and by the fact that we are showing averages of 100 runs with independent output noise. Note, however, that in practical echo cancellers, it is usual to include activity detectors that halt filter adaptation during silence periods.
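As a rough illustration of the ERLE criterion used in the speech experiments, the following sketch estimates a per-sample ERLE curve in dB. It assumes the common definition of ERLE as the ratio of echo power to residual error power, with the expectation replaced by an average over independent runs; the exact expression used for the figures is not reproduced here. The function name `erle_db`, the list-of-runs layout and the regularization constant `eps` are assumptions made for this example only.

```python
import math


def erle_db(echo_runs, residual_runs, eps=1e-12):
    """Per-sample ERLE in dB, with powers averaged across independent runs.

    echo_runs[r][n]     : echo signal d(n) observed in run r (assumed layout)
    residual_runs[r][n] : residual error e(n) after cancellation in run r
    eps                 : small constant that avoids log(0) during silence
    """
    n_runs = len(echo_runs)
    n_samples = len(echo_runs[0])
    curve = []
    for n in range(n_samples):
        # The ensemble average over runs stands in for the expectation E{.}
        p_echo = sum(run[n] ** 2 for run in echo_runs) / n_runs
        p_res = sum(run[n] ** 2 for run in residual_runs) / n_runs
        curve.append(10.0 * math.log10((p_echo + eps) / (p_res + eps)))
    return curve
```

With this definition, a residual whose amplitude is one tenth of the echo amplitude corresponds to an ERLE of about 20 dB, and samples where both signals vanish yield 0 dB because of `eps`.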

IV. CONCLUSIONS

We have presented a novel NAEC that provides an automatic scaling of memory regions of a second-order Volterra kernel, to appropriately minimize the overall MSE. By grouping and scaling the partial outputs of adjacent diagonals by a common adaptive attenuation factor, we are able to emphasize or reduce their respective influence on the nonlinear output. In terms of the well-known bias vs. variance trade-off, this scheme yields a lower residual error of the resulting Volterra model.
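The grouping and scaling of partial outputs summarized above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the experiments: the quadratic kernel is stored diagonal by diagonal, `block_of` is a hypothetical mapping from each diagonal to its block, and the attenuation factors in `scales` are held fixed, so the adaptation rules for these factors are not shown.

```python
def quadratic_output(x_buf, diagonals, block_of, scales):
    """Quadratic kernel output with one attenuation factor per block.

    x_buf[k]        : input sample x(n - k), most recent sample first
    diagonals[d][k] : coefficient weighting x(n - k) * x(n - k - d)
    block_of[d]     : index of the block containing diagonal d (assumed)
    scales[b]       : attenuation factor applied to block b
    """
    partial = [0.0] * len(scales)
    for d, coeffs in enumerate(diagonals):
        # Partial output contributed by diagonal d alone
        y_d = sum(c * x_buf[k] * x_buf[k + d] for k, c in enumerate(coeffs))
        partial[block_of[d]] += y_d
    # Each block output is scaled by its common factor before summation
    return sum(s * p for s, p in zip(scales, partial))
```

In this sketch a factor of 0 removes the contribution of a whole block of diagonals, whereas a factor of 1 leaves it unchanged, which is the mechanism by which blocks of inactive coefficients can be attenuated.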

The effectiveness of this approach has been experimentally demonstrated using noise and speech input signals, comparing to both a plain second-order VF and another variant using a single quadratic kernel with output scaling.

Future work will focus on migrating these techniques to DFT-domain implementations (of Volterra filters) and on a refined interpretation of the mixing parameters for further pruning strategies. In addition, another research line will cope with novel NAECs based on combination of bilinear filters.

REFERENCES

[1] C. Breining et al., “Acoustic echo control—an application of very-high-order adaptive filters,” IEEE Signal Process. Mag., vol. 16, pp. 42–69, 1999.

[2] B. S. Nollett and D. L. Jones, “Nonlinear echo cancellation for hands-free speakerphones,” presented at the IEEE Workshop Nonlinear Signal Image Process. (NSIP), Mackinac Island, MI, USA, 1997.

[3] A. Guérin, G. Faucon, and R. L. Bouquin-Jeannès, “Nonlinear acoustic echo cancellation based on Volterra filters,” IEEE Trans. Speech Audio Process., vol. 11, no. 6, pp. 672–683, 2003.

[4] V. J. Mathews and G. L. Sicuranza, Polynomial Signal Processing. New York, NY, USA: Wiley, 2000.

[5] A. Fermo, A. Carini, and G. L. Sicuranza, “Analysis of different low complexity nonlinear filters for acoustic echo cancellation,” in Proc. IEEE 1st Int. Workshop Image Signal Process. Anal. (IWISPA), Croatia, 2000, pp. 261–266.

[6] G. A. Glentis, P. Koukoulas, and N. Kalouptsidis, “Efficient algorithms for Volterra system identification,” IEEE Trans. Signal Process., vol. 47, no. 11, pp. 3042–3057, 1999.

[7] F. Küch and W. Kellermann, “Partitioned block frequency-domain adaptive second-order Volterra filter,” IEEE Trans. Signal Process., vol. 53, no. 2, pp. 564–575, 2005.

[8] J. Arenas-García, A. R. Figueiras-Vidal, and A. H. Sayed, “Mean-square performance of a convex combination of two adaptive filters,” IEEE Trans. Signal Process., vol. 54, no. 3, pp. 1078–1090, 2006.

[9] L. A. Azpicueta-Ruiz, M. Zeller, A. R. Figueiras-Vidal, J. Arenas-García, and W. Kellermann, “Adaptive combination of Volterra kernels and its application to nonlinear acoustic echo cancellation,” IEEE Trans. Audio, Speech and Language Process., vol. 19, no. 1, pp. 97–110, 2011.

[10] M. Zeller, L. A. Azpicueta-Ruiz, J. Arenas-García, and W. Kellermann, “Adaptive Volterra filters with evolutionary quadratic kernels using a combination scheme for memory control,” IEEE Trans. Signal Process., vol. 59, no. 4, pp. 1449–1464, 2011.

[11] A. H. Sayed, Fundamentals of Adaptive Filtering. New York, NY, USA: Wiley, 2003.

[12] M. Zeller and W. Kellermann, “Coefficient pruning for higher-order diagonals of Volterra filters representing Wiener-Hammerstein models,” presented at the Int. Workshop Acoust. Echo Noise Control (IWAENC), Seattle, WA, USA, 2008.

[13] A. Fermo, A. Carini, and G. L. Sicuranza, “Low-complexity nonlinear adaptive filters for acoustic echo cancellation in GSM handset receivers,” Eur. Trans. Telecommun., vol. 14, pp. 161–169, 2003.

[14] M. Lázaro-Gredilla, L. A. Azpicueta-Ruiz, A. R. Figueiras-Vidal, and J. Arenas-García, “Adaptively biasing the weights of adaptive filters,” IEEE Trans. Signal Process., vol. 58, no. 7, pp. 3890–3895, 2010.

[15] T. K. Woo, “Fast hierarchical least mean square algorithm,” IEEE Signal Process. Lett., vol. 8, no. 11, pp. 289–291, 2001.