
Alternating fixed-point algorithm for stereophonic acoustic echo cancellation

N.T. Forsyth, J.A. Chambers and P.A. Naylor

Abstract: A novel algorithm for use in stereophonic acoustic echo cancellation is introduced. The alternating fixed-point structure of the algorithm descends each of the channels alternately and avoids nonuniqueness in the solutions by employing inter- and intra-channel orthogonal projections. For a noisy environment, the algorithm is shown to provide improved misalignment performance and echo cancellation over leading competitors at a low computational cost. The inherent noise robustness is uncovered using H∞ estimation techniques and noise-error bounds are derived.

1 Introduction

Demand is ever increasing for spatially realistic sounding teleconferencing systems as communication heads towards a networked future. Desktop conferencing has become a reality and users can enjoy full duplex sound and image communication without the need for lengthy journeys across the globe. This is possible for the rather limited mono case due to the development of echo cancellation methods that prevent disconcerting echoes from propagating. These systems can be extended to the stereo case to help listeners distinguish who is talking at the other end by means of spatial information. Unfortunately, the extra channel introduces a problem since the required stereo echo cancellation is not so easily carried out.

Fig. 1 shows the configuration of a stereo echo canceller and illustrates how the problems arise. Speech from a single source in the far-end transmission room is picked up by two microphones and the stereo signals are then played out through two loudspeakers in the near-end receiving room. These will then be picked up by the two microphones in the receiving room and, in the absence of any echo cancellation, passed back through the channel and played out through the loudspeakers in the transmission room. This coupling results in the source speech echoing around the system and, in the presence of instability, can cause howling, making the system unusable. To avoid this scenario, an echo canceller is employed to model adaptively the acoustic paths from loudspeaker to microphone in the receiving room. The error is calculated from the present acoustic path estimates and a desired response

© IEE, 2002. IEE Proceedings online no. 20020148. DOI: 10.1049/ip-vis:20020148. Paper first received 15th December 2000 and in revised form 17th October 2001. N.T. Forsyth and P.A. Naylor are with the Communications and Signal Processing Group, Department of Electrical and Electronic Engineering, Imperial College of Science, Technology and Medicine, London SW7 2BT, UK. J.A. Chambers is with the Centre for Digital Signal Processing Research, King's College London, London WC2R 2LS, UK.

created from the present input data. For the single channel case, the problem has been successfully studied [1, 2] and the echo path can be identified uniquely. When the number of channels is increased above one, which is necessary for a realistic sounding environment [3], there are extra complications due to the inherent nonuniqueness of the solution produced by multiple cross-couplings between the microphones and loudspeakers. This nonuniqueness is caused by the high cross-correlation between the two channels of a stereo signal from a common source and gives rise to poor convergence of the adaptive filters. The nonuniqueness problem can be alleviated by decorrelating the input stereo signal, but any such changes must be made without altering the speech quality or stereo image that the introduction of the extra channel provides. The unique problems of stereophonic acoustic echo cancellation (SAEC) were originally discussed in [4] and several unsatisfactory potential solutions outlined. The first major breakthrough was made by Benesty et al. in [5, 6], where a nonlinearity is added to each of the channels to tackle the nonuniqueness issue. It is important to note here that the nonlinearity method can easily be combined with the solution proposed herein to further increase performance [7] in an integrated system.

Another challenge that must be tackled by the designer is keeping the computational load to a minimum. In typical applications, over 1000 taps may be required to reasonably identify the echo paths. Two paths are needed for each channel in a stereo echo canceller, making a total of four long adaptive filters that must be updated every sample. Consequently, it is desirable that the adaptive algorithm employed be of low complexity and converge quickly, even when the input signal is coloured and nonstationary.

2 Overview of the AFP-NLMS2 algorithm

The problem of SAEC can be considered as the solution of a system of very ill-conditioned and underdetermined linear equations. The search space contains many local minima which may impede the progress of the algorithm towards the global error minimum. One way of creating a constrained search space is to apply fixed-point iterations (FPIs).



Fig. 1 Schematic diagram for stereophonic acoustic echo cancellation

A fixed point (FP) can be implemented by freezing one input to the system and continuing convergence by reusing relevant data for a certain number of iterations, and then freezing the other channel and doing likewise. This naturally lends itself to the use of orthogonal projections, whereby different components of a signal can be identified according to their orthogonality to another signal. An FPI can be used to isolate, momentarily, one channel of the system from the other by fixing one channel and adapting with respect to the other. An orthogonal projection can then be used to separate the data relevant to that channel alone, thus helping independent convergence. This brings in a constraint on the otherwise coupled solutions and provides the inspiration for the so-called inter-channel or cross-channel orthogonalisations. It is well documented [2, 8] that gradient-descent-type algorithms such as LMS converge fastest if successive data vectors are orthogonal, i.e. statistically uncorrelated white noise, and this is the basis for the use of so-called intra-channel or through-channel orthogonalisations in the structure of this algorithm.
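For two data vectors this reduces to an elementary Gram-Schmidt step, stated here for reference in generic notation (the vector names are illustrative): the component of a data vector y that is orthogonal to another vector x is

$$\mathbf{y}_{\perp} \;=\; \mathbf{y} \;-\; \frac{\mathbf{x}^{T}\mathbf{y}}{\mathbf{x}^{T}\mathbf{x}}\,\mathbf{x}$$

so an update taken along y⊥ ignores whatever part of y is predictable from x.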

Fig. 2 Flow chart showing the operation of the AFP-NLMS2 algorithm: after a standard ε-NLMS2 update, if n mod 2 == 0 channel 2 is frozen and the channel 1 filter is updated using cross-channel data orthogonal to x2 and then through-channel data orthogonal to x1 at delays 1 and 2, each update being fed into the error; if n mod 2 == 1 the corresponding updates are applied to the channel 2 filter with channel 1 frozen

The system developed here uses the epsilon-normalised least mean square (ε-NLMS2) algorithm [2] as a main loop iteration and then fixes one channel and adapts the other in an FP manner. For the FP adaptations, the input data into one channel is frozen and data is reused after certain projections have been performed on it. These projections take the form of Gram-Schmidt orthogonalisation procedures [9]. The first FPI maps one channel onto a subspace that is orthogonal to the other channel at that time. The second FPI maps the present channel onto a subspace that is orthogonal to the same channel at the previous main-loop iteration. The third then maps to a subspace that is orthogonal to the same channel two main-loop iterations ago, and so on. The second and third FPIs are through-channel projections, as they remove data that duplicates the previous data in that channel. The number of iterations can be extended beyond three, but it was found useful to use only the first as a cross-channel iteration and the next two as through-channel iterations, corresponding to a look-back of two units in time by the third iteration. It is important to note that the most recent estimate of the filter coefficients is fed through to each FP stage.
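The following Python sketch illustrates one reading of this structure. It is an interpretive outline only: the function and variable names, the error-feedback arrangement and the step-size normalisations are illustrative assumptions and differ in detail from the update formalised in Table 1.

```python
import numpy as np

def gs_orth(y, x, eps=1e-8):
    """Gram-Schmidt step: the component of y orthogonal to x."""
    return y - ((x @ y) / (x @ x + eps)) * x

def afp_nlms2_step(n, x1_hist, x2_hist, d, w1, w2, mu=0.3, eps=1e-8):
    """One main-loop iteration of an AFP-NLMS2-style update (sketch).

    x1_hist, x2_hist : three most recent L-length input data vectors per
                       channel (index 0 = time n, 1 = n-1, 2 = n-2)
    d                : desired (microphone) sample at time n
    w1, w2           : current L-length filter estimates (updated in place)
    """
    x1, x2 = x1_hist[0], x2_hist[0]

    # Main loop: standard two-channel epsilon-NLMS update of both filters.
    e = d - (w1 @ x1 + w2 @ x2)
    g = mu / (eps + x1 @ x1 + x2 @ x2)
    w1 += g * e * x1
    w2 += g * e * x2

    # Alternate which channel the fixed-point iterations descend.
    if n % 2 == 0:
        wa, xa_hist, w_frz, x_frz = w1, x1_hist, w2, x2   # adapt ch. 1, freeze ch. 2
    else:
        wa, xa_hist, w_frz, x_frz = w2, x2_hist, w1, x1   # adapt ch. 2, freeze ch. 1

    # FPI 1: cross-channel projection (orthogonal to the other channel's data);
    # FPIs 2 and 3: through-channel projections (orthogonal to own data at
    # delays 1 and 2). Each FPI reuses the same desired sample and the most
    # recent filter estimates.
    for direction in (x_frz, xa_hist[1], xa_hist[2]):
        u = gs_orth(xa_hist[0], direction)
        e = d - (wa @ xa_hist[0] + w_frz @ x_frz)
        wa += (mu / (eps + u @ u)) * e * u
    return w1, w2
```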

3 Presentation of AFP-NLMS2

The operation of the proposed algorithm is shown visually in Fig. 2. The alternating fixed-point two-channel normalised least mean square (AFP-NLMS2) filter coefficient update is formalised in Table 1 for k FPIs; in the table notation, the projection operator applied to y denotes the Gram-Schmidt orthogonalisation of y with respect to x.

The operation of AFP-NLMS2 has been proven in [10] to be a contraction mapping for a certain adaptation gain range.

4 Testing of AFP-NLMS2

The algorithm is first tested with an input signal of white noise corrupted with additive white Gaussian noise (AWGN) at an SNR of 30 dB. All adaptation gains are set to 0.3 for comparative and stability reasons. The filter length of each channel is set to 100 taps and the unknown system is of the same length. Comparison is made between the standard ε-NLMS2 algorithm, which can be viewed as AFP-NLMS2 with M = 0, and the AFP-NLMS2 algorithm with M = 1, 2 and 3. The average weight error vector norm (WEVN), or misalignment, over both channels is shown in Fig. 3.
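The defining equation (1) is taken here, as an assumption, to be the usual normalised misalignment in dB for channel j, averaged over the two channels:

$$\mathrm{WEVN}_j(n) \;=\; 10\log_{10}\frac{\big\|\mathbf{h}_j - \hat{\mathbf{h}}_j(n)\big\|^{2}}{\big\|\mathbf{h}_j\big\|^{2}}\ \text{dB}, \qquad j = 1,\,2$$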

It can be seen that the WEVN reduces as the number of FPIs is increased, indicating that there is improved echo-path identification. The improvement was observed to continue above five FPIs, but with ever decreasing rewards for the extra complexity.

The AFP-NLMS2 algorithm is next tested with a highly coloured input. United States of America Standards Institute (USASI) noise data is chosen, which has a long-term spectrum approximately equal to that of speech but is statistically stationary. The simulation is carried out as before and a similar trend is again seen in the convergence performance in Fig. 4, although the improvement between successive FPIs is irregular. The convergence rate with AFP-NLMS2 at M = 3 is seen to be over twice as fast as that of the standard ε-NLMS2 algorithm.

Table 1: AFP-NLMS2 filter coefficient update

Fixed-point initialisations:
If mod2(n) == 0, c = 1, c̄ = 2; else c = 2, c̄ = 1
w(c)[1] = w(n+1, c)
e[1] = e(n)

(a) w(n, j) denotes the weight coefficient vector for the nth main-loop iteration number and jth channel number; μ = μ′/(ε + ‖x(n,1)‖² + ‖x(n,2)‖²), with 0 ≤ μ′ ≤ 1 and ε a small positive constant
(b) c denotes the left or right channel to be updated and the main index denotes the fixed-point iteration number
(c) μ = μ′/(ε + ‖x(n+2−k, c)‖²)
(d) only one channel is updated by FPIs for each main-loop iteration


Fig. 3 Comparison of performance of ε-NLMS2 and AFP-NLMS2 with M = 1, 2, 3 FPI loops, plotted against sample number. The input data is WGN with added measurement noise at an SNR of 30 dB

In all the above simulations the data in the two channels is almost completely uncorrelated. Of course this is not true in a practical SAEC environment, where the two channels will be highly correlated since they originate from the same speaker source. To address this issue, the algorithm is tested with real speech data of a few seconds' duration recorded in a realistic teleconferencing environment. The data is taken from a database of various male and female speakers uttering the phonetically balanced sentence: 'Present zoos are rarely reached by efficient transportation'. This data is the ultimate test for the algorithm as it is highly coloured and correlated and also nonstationary.

4.1 Complexity issues

The complexity of the AFP-NLMS2 algorithm is (3L + 3)M + 3L + 2, where L is the filter length and M is the number of FPIs. From this, it seems fair to compare the performance of AFP-NLMS2 with a low-order affine projection algorithm (APA). The multichannel version of this (MCAPA), developed by Benesty et al. [11], contains an extra cross-channel projection in an attempt to avoid the effects of strong cross-channel correlation and seems to be the best competitor in the highly specialised field of SAEC.

Fig. 4 Comparison of performance of ε-NLMS2 and AFP-NLMS2 with M = 1, 2, 3 FPI loops, plotted against sample number. The input data is USASI noise with added measurement noise at an SNR of 30 dB

It is chosen to use three FPIs in AFP-NLMS2, O(12L), and a third-order MCAPA, O(14L), to strike a balance between performance and complexity. The ε-NLMS2 algorithm is kept as a benchmark and MCAPA is also run at a second projection order to show the effect of increasing the projection order. If the projection order for MCAPA is set to one then, as with the AFP-NLMS2 algorithm, the performance is equal to that of ε-NLMS2.
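As a check on the stated orders, substituting M = 3 into the complexity expression quoted above gives

$$(3L+3)M + 3L + 2 \;=\; 9L + 9 + 3L + 2 \;=\; 12L + 11 \;\approx\; \mathcal{O}(12L)$$

operations per sample, consistent with the O(12L) figure used for the comparison.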

As can be seen from Fig. 5, the curves for MCAPA show highly unstable behaviour and can actually be seen to diverge. This is at a modest SNR of 30 dB, and this unsatisfactory behaviour is even more prominent at SNRs of 20 dB or lower. The instability worsens as the projection order is increased. On the other hand, AFP-NLMS2 with M = 3 is relatively unaffected by the measurement noise. It is concluded, then, that the AFP-NLMS2 algorithm with M = 3 is superior in terms of initial transient convergence speed to MCAPA of a low order with a noise-corrupted desired response and input speech signal.

4.2 Regularisation

The observed noise sensitivity of the MCAPA algorithm, as it stands, may make it unsuitable for direct use in practical SAEC systems. Attempts have been made, however, [12-14], to reduce this sensitivity by introducing an extra regularisation parameter, δI, into the inverse of the M × M rank-deficient estimate of the covariance matrix. This extra parameter helps prevent the measurement noise being amplified [7] during the projection stage by adding δ to each of the elements on the leading diagonal of the covariance matrix estimate, i.e. adding δ to each of its eigenvalues. Of course, this parameter will greatly slow the convergence speed in a classic gain/bias trade-off scenario.

The tap update for the standard single-channel APA [15], written here in its standard regularised form (consistent with the definitions below), is

$$\hat{\mathbf{w}}_n \;=\; \hat{\mathbf{w}}_{n-1} \;+\; r\,\mathbf{X}_n\left(\mathbf{X}_n^{T}\mathbf{X}_n + \delta I\right)^{-1}\mathbf{e}_n \qquad (2)$$

where
ŵ_n is the L-length filter coefficient vector at time n
X_n is the L × M data matrix consisting of the last M input data vectors (M << L)
e_n is the M × 1 vector containing the last M a priori errors
δ is the small positive regularisation constant
r is the gain constant used to relax the affine projection (unity for no relaxation).
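A compact sketch of this regularised update, using the quantities defined above with hypothetical variable names, is given below; it illustrates the general form of (2) rather than any particular implementation.

```python
import numpy as np

def apa_update(w, X, e, delta=1e-3, r=1.0):
    """Regularised affine projection update (single channel, illustrative).

    w     : (L,)   current filter coefficient vector
    X     : (L, M) data matrix whose columns are the last M input vectors
    e     : (M,)   last M a priori errors
    delta : small positive regularisation constant
    r     : relaxation (gain) constant, unity for no relaxation
    """
    M = X.shape[1]
    # (X^T X + delta*I) is only M x M, so the inversion stays cheap for M << L.
    gain = X @ np.linalg.solve(X.T @ X + delta * np.eye(M), e)
    return w + r * gain
```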

Fig. 5 Misalignment performance comparison of ε-NLMS2, AFP-NLMS2 (M = 3) and MCAPA (M = 2, 3) algorithms for preprocessed real speech input, plotted against sample number (×10^4)


The noise sensitivity can be reduced either by relaxing the affine projection itself with a lower-than-unity r value or by keeping r the same and increasing δ. The higher the projection order, the more regularisation, or higher δ value, is required. It was found useful in this work to include the regularisation parameter not only in the through-channel projections but also when projecting across the channels. Neither of these methods was able to raise the performance of this modified version of the MCAPA algorithm above that of AFP-NLMS2, as the misalignment convergence curves of the former, although greatly stabilised, settled above those of the latter [7].

5 H" Comparison of AFP-NLMSS and MCAPA

Using the H" estimation framework developed by Hassibi et al. [ 161, it is possible to obtain noise-error bounds for the action of a particular adaptive algorithm. The bounds for both AFP-NLMS2 and MCAPA can be derived and compared to determine which is more robust to disturbance variation. The bounds come in the form of a normalised cost between disturbances and output predicted errors, which for real data and a finite memory sliding data window of size M is given by

The numerator term corresponds to the output prediction error sum. The first term in the denominator is the initial disturbance cost and the second is the measurement noise cost sum. The cost can be seen as a worst-case energy gain or transfer function between disturbances and output error. The supremum bound over all possible disturbances is given by γ², a lower value indicating more noise robustness. Obtaining the H∞ solution involves rearranging (3) into an indefinite quadratic form cost function and testing its positivity. The solution is only quoted in this work; full derivations and justification are given in [16]. For this comparison, the adaptive filtering problem is considered as a special case of the general state-space model given below, with x_0 the initial state,

$$x_{i+1} = F_i x_i + G_i u_i, \qquad y_i = H_i x_i + v_i \qquad (4)$$

which reduces to

$$w_{i+1} = w_i, \qquad y_i = r_i^{T} w_i + v_i \qquad (5)$$

for the case of adaptive filtering, i.e. F_i = I, G_i = 0, so that the u_i can be ignored. The state, w_i, is the filter weight coefficient vector, with estimate ŵ_i, and does not change with time for this analysis. The input data vector to the filter is now denoted by r_i. It is now useful to quote the H∞ estimation solution from [16] for the a priori case, before deriving the γ-bound. A full derivation and accompanying proof for the a posteriori filter is given in [7].

5.1 A priori H∞ filter

For a given γ_p, if the matrix

[F_j  G_j]

has full rank, then an estimator that achieves ‖T_i(F_p)‖_∞ ≤ γ_p exists if

$$P_j^{-1} - \gamma^{-2} L_j^{T} L_j > 0, \qquad j = 0, \ldots, i \qquad (6)$$

where P_0 = Π_0 and P_j satisfies the Riccati recursion

$$P_{j+1} \;=\; F_j P_j F_j^{T} + G_j G_j^{T} - F_j P_j \begin{bmatrix} H_j^{T} & L_j^{T} \end{bmatrix} R_{e,j}^{-1} \begin{bmatrix} H_j \\ L_j \end{bmatrix} P_j F_j^{T} \qquad (7)$$

with

$$R_{e,j} \;=\; \begin{bmatrix} I & 0 \\ 0 & -\gamma^{2} I \end{bmatrix} + \begin{bmatrix} H_j \\ L_j \end{bmatrix} P_j \begin{bmatrix} H_j^{T} & L_j^{T} \end{bmatrix} \qquad (8)$$

If this is the case, then one possible γ-level H∞ filter is given by

$$\hat{z}_j = L_j \hat{x}_j \qquad (9)$$

where x̂_j is computed recursively as

$$\hat{x}_{j+1} = F_j \hat{x}_j + K_{a,j}\,(y_j - H_j \hat{x}_j), \qquad \hat{x}_0 = \text{initial guess} \qquad (10)$$

and

$$K_{a,j} = F_j P_j H_j^{T}\left(I + H_j P_j H_j^{T}\right)^{-1}$$

5.2 The APA form

To apply this H∞ theory to APA, the algorithm must be approximated into a simplified Kalman filter form, i.e.

(new state) = (old state) + (Kalman gain)(innovation vector)    (11)

which involves making certain assumptions, described in [13]. This is to allow recursion of the sample covariance matrix, i.e. the Riccati recursion, and the full operation shown in (2) reduces approximately to

$$\hat{\mathbf{w}}_n = \hat{\mathbf{w}}_{n-1} + \left[\mathbf{X}_n\mathbf{X}_n^{T} + \delta I\right]^{-1}\underline{x}(n)\,e(n) \qquad (12)$$

The data matrix term to the right of the inverse has now been replaced by the most recent data vector and the error vector has been replaced by the most recent scalar error. It is now possible to derive a Riccati recursion to update the inverse sample covariance matrix, from

$$\left[\mathbf{X}\mathbf{X}^{T} + \delta I\right]^{-1} = P(n-1)$$

to

$$\left[\mathbf{X}\mathbf{X}^{T} + \delta I + \underline{x}(n)\underline{x}^{T}(n) - \underline{x}(n-M)\underline{x}^{T}(n-M)\right]^{-1} = P(n)$$

The new data entering the sliding window is x(n)x^T(n) and the data being removed is x(n−M)x^T(n−M). Using the matrix inversion lemma [2], the Riccati recursion is

$$P(n) \;=\; P(n-1) \;-\; \frac{P(n-1)\,\underline{x}(n)\underline{x}^{T}(n)\,P(n-1)}{1 + \underline{x}^{T}(n)P(n-1)\underline{x}(n)} \;+\; \frac{P(n-1)\,\underline{x}(n-M)\underline{x}^{T}(n-M)\,P(n-1)}{1 - \underline{x}^{T}(n-M)P(n-1)\underline{x}(n-M)}$$
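The following sketch implements this sliding-window recursion for P(n) directly, with both rank-one corrections taken about P(n−1) as in the recursion above; names and the calling convention are illustrative.

```python
import numpy as np

def riccati_window_update(P, x_new, x_old):
    """Sliding-window update of P = (X X^T + delta*I)^(-1) when x_new enters
    the window and x_old leaves it (matrix inversion lemma applied twice)."""
    Pxn = P @ x_new                                   # update correction
    Pxo = P @ x_old                                   # downdate correction
    return (P
            - np.outer(Pxn, Pxn) / (1.0 + x_new @ Pxn)
            + np.outer(Pxo, Pxo) / (1.0 - x_old @ Pxo))
```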

With these solutions, γ-levels for the APA are derived in the next section for the a priori case only.

5.3 Update

To derive the bound for the update case, the Riccati recursion in (7) must be solved, where H_j = L_j = x^T(j). This gives

$$P_{j+1}^{-1} = P_j^{-1} + (1 - \gamma^{-2})\,\underline{x}(j)\underline{x}^{T}(j) \qquad \text{(Riccati)} \quad (13)$$


where it is assumed that P_0^{-1} is regularised by δI. Checking the existence condition requires P̃_j^{-1} > 0, where

$$\tilde{P}_j^{-1} = P_j^{-1} - \gamma^{-2} L_j^{T} L_j = P_j^{-1} - \gamma^{-2}\,\underline{x}(j)\underline{x}^{T}(j) \qquad \text{(existence)} \quad (14)$$

so

$$\tilde{P}_{j+1}^{-1} = P_{j+1}^{-1} - \gamma^{-2}\,\underline{x}(j+1)\underline{x}^{T}(j+1) = P_j^{-1} + (1 - \gamma^{-2})\,\underline{x}(j)\underline{x}^{T}(j) - \gamma^{-2}\,\underline{x}(j+1)\underline{x}^{T}(j+1) \qquad (15)$$

and from the initial condition P_0^{-1} = μ^{-1}I, the Riccati recursion for P̃_j^{-1} can be solved, giving

$$\tilde{P}_{j+1}^{-1} = \mu^{-1}I + (1 - \gamma^{-2})\sum_{i=0}^{j}\underline{x}(i)\underline{x}^{T}(i) - \gamma^{-2}\,\underline{x}(j+1)\underline{x}^{T}(j+1) \qquad (16)$$

For APA, the covariance matrix is regularised with δI and this recursion is only over a window of length M, so that

$$\tilde{P}_{j+1}^{-1} = \mu^{-1}I + (1 - \gamma^{-2})\left[\delta I + \sum_{i=j-M+1}^{j}\underline{x}(i)\underline{x}^{T}(i)\right] - \gamma^{-2}\left[\delta I + \underline{x}(j+1)\underline{x}^{T}(j+1)\right] = \mu^{-1}I + (1 - \gamma^{-2})\,\Phi(j) - \gamma^{-2}\left[\delta I + \underline{x}(j+1)\underline{x}^{T}(j+1)\right] \qquad (17)$$

which for P̃_{j+1}^{-1} > 0 gives

$$\mu^{-1}I + (1 - \gamma^{-2})\,\Phi(j) > \gamma^{-2}\left[\delta I + \underline{x}(j+1)\underline{x}^{T}(j+1)\right] \qquad (18)$$

A sufficient condition for this is

$$\mu^{-1} + (1 - \gamma^{-2})\,\underline{\sigma}\left[\Phi(j)\right] > \gamma^{-2}\,\bar{\sigma}\left[\delta I + \underline{x}(j+1)\underline{x}^{T}(j+1)\right] \qquad (19)$$

where σ̲[·] and σ̄[·] denote minimum and maximum singular values respectively. Now, the small regularisation constant, δ, is the lowest eigenvalue of the sample covariance matrix Φ(j), i.e. its minimum singular value is δ. If x̄_i² = sup_{j≤i} ‖x(j)‖², then (19) gives the update bound (20) on γ.

5.4 Downdate

A similar procedure is followed for the downdate, with the same existence check but a different Riccati equation with H_j = L_j = x^T(j − M) and the δI regularisation as before:

$$P_{j+1}^{-1} = P_j^{-1} - (1 - \gamma^{-2})\,\underline{x}(j-M)\underline{x}^{T}(j-M) \qquad \text{(Riccati)} \quad (21)$$

Letting the maximum singular value of Φ(j − M) be λ̄, the bound (22) for γ is then calculated.

5.5 Discussion of results

In their present form, all the limits on γ deviate from unity, thus indicating the nonoptimality, in the H∞ sense, of the APA. This agrees with the performance of the multichannel version of the APA reported in [10], where a moderate noise level of 30 dB could cause the algorithm to diverge. To make more sense of the limits it was found useful to approximate the bounds by assuming that the effect of the regularisation term, δ, is much larger than the effect of the initial disturbance term 1/μ. This is not strictly true, as the disturbance will never completely disappear, but in practice the slightly tighter bounds that can be derived are good approximations to the exact bounds given above. This seems a reasonable assumption, as the windowing method in the APA cuts off access to old data, or information, abruptly at each iteration. With this assumption the bounds can be analysed more freely.

The a priori update bound then becomes

$$\gamma^{2} > 1 + \frac{\bar{x}_i^{2}}{\delta} \qquad (23)$$

The limit can be seen to deviate from unity by x̄_i²/δ. The numerator of this second term can be thought of as the maximum input energy up to that time, along similar lines to the inverse tap-input power condition necessary to keep the LMS algorithm stable [2]. (The limit for the LMS algorithm is unity and it is optimum in the H∞ sense for predicted errors [17].) The denominator depends on the amount of regularisation, so the more regularisation, the lower the second term becomes overall. However, forcing the algorithm to remain stable means that the convergence will slow, as the solution is drawn away from the true solution by the δI term in the inverse covariance matrix. With no regularisation, the γ limit would tend to infinity, indicating the observed sensitivity to noise, although in practice the initial guess term would prevent this.
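As a purely illustrative numerical reading of this bound, take a peak input energy of x̄_i² = 1:

$$\delta = 0.01 \;\Rightarrow\; \gamma^{2} > 101, \qquad \delta = 1 \;\Rightarrow\; \gamma^{2} > 2$$

so heavier regularisation lowers the worst-case energy gain, at the price of the slower convergence noted above.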

The a priori downdate bound becomes

$$\gamma^{2} > 1 - \frac{\bar{x}_{i-M+1}^{2}}{\bar{\lambda}} \qquad (24)$$

The limit again has a term deviating from unity, but in the opposite direction. This is due to the fact that in the downdate data is being removed, which makes for a more robust situation, i.e. less information means less uncertainty with respect to an unknown noise signal. As before, the numerator is an input energy term. The denominator is the maximum singular value of the inverse covariance matrix, λ̄. As the minimum singular value is set by the regularisation constant, δ, this λ̄ can now be thought of as the condition number [2] of the inverse covariance matrix. The higher the condition number of the matrix, i.e. the ratio of the maximum singular value to the minimum singular value, the harder it is to invert. So, for APA, if the matrix is reasonably regular, then λ̄ is low and the γ-level deviates from unity greatly in a negative direction, indicating strong stability. If the inverse covariance matrix is strongly ill-conditioned, then λ̄ will be high and the γ limit will only move negatively away from unity by a small amount, indicating only a slight improvement in stability. This move below the optimum value of unity for the γ-level is possible when data is lost, as a negative Gramian with loss of information is a stabilisation in terms of noise sensitivity.

The convergence performance of APA increases with an increasing order of projection, but the Bordering theorem [18] states that the maximum eigenvalue will not decrease and the minimum will not increase when a matrix grows in size. This means that the higher the order of projection, the more ill-conditioned the inverse covariance matrix will become, with a higher condition number, and hence the more sensitive to noise by the preceding arguments. This behaviour has been observed in practice, where more regularisation is needed to stabilise APA at a particular noise level.


5.6 H" bound for MCAPA of order 3 For a projection order of three, (M=3), MCAPA performs orthogonal projections in several directions. There is an orthogonal projection down each channel and an orthogonal projection across channels. All of these involve the inverse of a reduced rank 3 x 3 matrix which necessitates regularisation to allow inversion. These four projections are e(flpi(n~, e ( n > ~ 2 ( n > , e+i),x,(fl - 2pi(n) and P&fl-l),x2cfl - 2,x2(n). The former. two mappings are the respective cross orthogonalisations for channels 1 and 2, and the latter two are the through orthogonalisations for each channel. Each projection is associated with a simul- taneous update and downdate as the data window slides forward one step, each having a deviation of

from the suboptimal bound of unity. The subscripts corre- spond to the size of the inverse covariance matrix or projection order. As this inverse covariance matrix is always very ill-conditioned, its eigenvalue spread will be high, so that

Using this, successive terms can be added to the normalised cost by increasing the index, and so the total γ-limit, (27), is obtained by adding these per-projection update and downdate deviations to unity.

Now, using the fact that x̄_i ≈ x̄_{i−M+1} and (26), the third term on the right-hand side of (27) disappears and the final limit for MCAPA with M = 3 becomes

$$\gamma^{2}_{\mathrm{MCAPA}_{M=3}} \;\geq\; 1 + \frac{4\,\bar{x}_i^{2}}{\delta_3} \qquad (28)$$

Without a great deal of extension, the preceding methods can be employed to derive similar limits for AFP-NLMS2 in the next section.

5.7 H" bound for AFP-NLMS2 with three FPls It is only necessary to derive an H" bound for one side of the operation of AFP-NLMS2 as the other limit will naturally be identical i.e. for one main loop and three FPIs. As for MCAPAM=3, the deviations from unity or robustness penalties for each iteration can be added to the cost by increasing the index to provide a y-bound for the descent of one side of the algorithm. The main loop is a standard iteration of the NLMS algorithm which has been shown to be H" optimal for a posteriori (filtered) errors in [ 171. However, we are dealing with the a priori case and if the NLMS loop is thought of as an equivalent APA iteration with projection order M = 1 then it will have the following a priori bound

Now, for M = 1, the numerators in the second two terms are identical, i.e. x̄_i = x̄_{i−M+1}. The inverse covariance matrix here, of unity dimension, takes the form of division by a vector norm and thus its condition number is unity, i.e. δ_1 = λ̄_1; the maximum and minimum eigenvalues are the same. The second two terms therefore cancel and the bound collapses to unity, with no robustness penalty from this main loop. The FPIs can be considered in a similar manner to MCAPA, only here all the projections are of lower order. In fact, each FPI corresponds to a projection of order one, as the relevant data vector is orthogonalised with respect to only one other vector or direction. The inverse matrix that was present within the solution for MCAPA of orders higher than one is replaced by division by a scalar inner product or vector norm. In a similar vein, the regularisation matrix of the former case is replaced by a regularisation constant to prevent division by a zero vector norm at low power levels, as in the ε-NLMS algorithm.

This projection can be either a cross orthogonalisation for the first FPI or a through orthogonalisation for the next two. This means that the γ-level contribution, or robustness penalty from unity, for each FPI is the same, given by x̄_i²/δ_1. There is only a contribution for the update, as the window is simply growing by one dimension each iteration and no downdate is necessary. The final bound for AFP-NLMS2 with M = 3 is then

$$\gamma^{2}_{\mathrm{AFP\text{-}NLMS2}_{M=3}} \;\geq\; 1 + \frac{3\,\bar{x}_i^{2}}{\delta_1} \qquad (30)$$

5.8 Comparison of the bounds

The aim of this section is to show that the bound for MCAPA of order three is greater than that of AFP-NLMS2 with three FPIs, i.e. it is desired to show that

$$\gamma^{2}_{\mathrm{MCAPA}_{M=3}} \;>\; \gamma^{2}_{\mathrm{AFP\text{-}NLMS2}_{M=3}} \qquad (31)$$

This will mean that the observed superior noise robustness of the latter algorithm has been proven analytically. It should be noted here that the comparison is carried out using the approximate bounds that do not include the initial disturbance term, so the results should be viewed accordingly. From the Bordering theorem it is known that, when another data vector is added to a row and a column of a matrix, the lowest singular value does not increase and the highest singular value does not decrease. This means that, when applied to APA or similar, the eigenvalues are ordered as

$$\delta_M \leq \cdots \leq \delta_3 \leq \delta_2 \leq \delta_1 \qquad (32)$$

where the subscript denotes the dimension of the inverse covariance matrix. Now the limits for MCAPA with M = 3 and AFP-NLMS2 with M = 3 are listed again:

$$\gamma^{2}_{\mathrm{MCAPA}_{M=3}} \;\geq\; 1 + \frac{4\,\bar{x}_i^{2}}{\delta_3}, \qquad \gamma^{2}_{\mathrm{AFP\text{-}NLMS2}_{M=3}} \;\geq\; 1 + \frac{3\,\bar{x}_i^{2}}{\delta_1}$$

Then, from (32)

$$\frac{4\,\bar{x}_i^{2}}{\delta_3} \;\geq\; \frac{3\,\bar{x}_i^{2}}{\delta_1} \qquad (33)$$

and clearly (31) stands.

5.9 Graphical comparison of H∞ bounds

In this section, the preceding bounds are backed up by several simulation results. The normalised cost between errors and disturbances for the simulation is plotted against the sample number, again with the initial disturbance term neglected. This seems correct in a practical situation, as the initial guess at the room response(s) to be identified cannot be deemed to be anything other than zero unless there is a priori knowledge about the form of the response. This a priori knowledge could possibly be that the envelope of the response is known to have a decreasing exponential form, but this may not necessarily be any closer to the true response than the simpler zero guess. In other words, zero is as good a guess as any and there is no way of telling how good a guess it is until the solution is estimated by operation of the relevant algorithm. By guessing zero for all simulations, then, an unknown constant appears in the denominator and can justifiably be omitted.

The first simulation was carried out using synthetic data, as regularisation is then not needed and there is one less variable to examine. The input data is white and the covariance matrix is not ill-conditioned, so there is no instability in the solution for MCAPA. The convergence rates of MCAPA (M = 3) and AFP-NLMS2 (M = 3) were matched (approximately 10 dB in 200 iterations) and the resulting H∞ norms are plotted together in Fig. 6 over an ensemble average of 30 runs. The first few iterations, where the error is large, were omitted to allow for a more detailed view. It can be seen that the curve for AFP-NLMS2 is always lower than that for MCAPA, supporting the relative positions of the bounds derived for the two algorithms in the preceding sections.
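A curve of this kind can be reproduced in outline with the following sketch, in which a plain NLMS update stands in for the adaptive algorithm and all names, lengths and parameter values are illustrative rather than those used to generate the published figures.

```python
import numpy as np

rng = np.random.default_rng(0)
L, N, snr_db, mu, eps = 100, 1000, 30.0, 0.3, 1e-6

h = rng.standard_normal(L) / np.sqrt(L)          # unknown "room" response
x = rng.standard_normal(N)                       # white input signal
xp = np.concatenate([np.zeros(L - 1), x])        # zero-padded for windowing

clean = np.array([xp[n:n + L][::-1] @ h for n in range(N)])
noise_power = np.mean(clean ** 2) / 10 ** (snr_db / 10)
v = np.sqrt(noise_power) * rng.standard_normal(N)
d = clean + v                                    # noisy desired response (30 dB SNR)

w = np.zeros(L)
num = den = 0.0
cost = np.empty(N)
for n in range(N):
    xv = xp[n:n + L][::-1]                       # most recent L input samples
    e_pred = clean[n] - w @ xv                   # output prediction error (known in simulation)
    e = d[n] - w @ xv                            # measured a priori error
    w += mu * e * xv / (eps + xv @ xv)           # NLMS stand-in for the adaptive algorithm
    num += e_pred ** 2
    den += v[n] ** 2
    cost[n] = num / den                          # empirical normalised cost, initial term neglected
```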

The scenario for real speech data is more relevant, but also more complicated, as differing values of the regularisation constant must be studied. This case agrees with the ill-conditioned covariance matrix assumption in the foregoing proofs. As for the synthetic data case, the convergence curves were matched to provide a fair comparison, although the matching is not as exact due to the nonstationary nature of the data signal. The regularisation (normalisation) constant, δ, for AFP-NLMS2 has little effect on the convergence curve even up to a value of 100, indicating the inherent robustness of the algorithm (i.e. there is no bias/variance trade-off for AFP-NLMS2, whereas for MCAPA the δ parameter must be finely tuned to get the best balance between a fast initial transient convergence (bias) and a non-oscillatory (variance) solution). In fact, for any setting of δ, the H∞ performance for AFP-NLMS2 was always below that for MCAPA.

To match the initial convergence of this with MCAPA, a δ value of unity was needed, although some convergence oscillation in the latter stages of the simulation did occur. The H∞ norm is plotted in Fig. 7 and, as before with the synthetic data, the curve for MCAPA is always above that for AFP-NLMS2. This was repeated for several other real speech datasets and equivalent results were obtained.

Fig. 6 Comparison of H∞ norms for synthetic data, plotted against sample number

Fig. 7 Comparison of H∞ norms for real speech data, for δ = 1, plotted against sample number (×10^4)

6 Conclusion

In this paper, a new fast-converging, low-complexity algorithm for SAEC has been introduced and evaluated by simulation. The fixed-point operation of the algorithm helps to isolate the channels from one another and helps steer the adaptation away from nonuniqueness in the solution. To accelerate the convergence during the fixed-point iterations, the highly correlated stereo input signal is decorrelated by inter- and intra-channel orthogonal projections, which are kept stable by a main ε-NLMS2 loop. The AFP-NLMS2 algorithm was shown to provide increased performance over MCAPA with real speech data in a noisy environment. The true structure of AFP-NLMS2 was seen to be a succession of nested low-order affine projections, which gives rise to its stability in the presence of noise. This inherent noise robustness was revealed using H∞ techniques, and noise-error bounds were derived and confirmed by simulation.

7 Acknowledgments

The authors are extremely grateful to Dr. Imad Jaimoukha for helpful discussions on H∞ estimation theory. The guidance from Dr. Babak Hassibi through his H∞ framework and his finite memory proof was also much appreciated.

8 References

1 HANSLER, E.: 'The hands-free telephone problem - an annotated bibliography', Signal Process., 1992, 27, pp. 259-271
2 HAYKIN, S.: 'Adaptive filter theory' (Prentice Hall, Englewood Cliffs, NJ, 3rd edn., 1996)
3 AMAND, F., BENESTY, J., GILLOIRE, A., and GRENIER, Y.: 'Multichannel acoustic echo cancellation'. Proceedings of IWAENC'95, Roros, Norway, June 1995, pp. 57-60
4 SONDHI, M.M., MORGAN, D.R., and HALL, J.L.: 'Stereophonic acoustic echo cancellation - an overview of the fundamental problem', IEEE Signal Process. Lett., 1995, 2, pp. 148-151
5 BENESTY, J., MORGAN, D., and SONDHI, M.: 'A better understanding and an improved solution to the problems of stereophonic acoustic echo cancellation'. Proceedings of ICASSP'97, Munich, Germany, April 1997, pp. 303-306
6 BENESTY, J., MORGAN, D., and SONDHI, M.M.: 'A better understanding and an improved solution to the specific problems of stereophonic acoustic echo cancellation', IEEE Trans. Speech Audio Process., 1998, 6, pp. 156-165
7 FORSYTH, N.: 'A subband and noise robust approach to stereophonic acoustic echo cancellation'. PhD thesis, Communications and Signal Processing Group, Imperial College of Science, Technology and Medicine, London, October 2000
8 BELLANGER, M.: 'Adaptive digital filters and signal analysis' (Marcel Dekker Inc., New York, 1987)
9 MORRIS, A.: 'Linear algebra - an introduction' (Van Nostrand Reinhold (UK) Co. Ltd., 1982, 2nd edn.)
10 FORSYTH, N., CHAMBERS, J., and NAYLOR, P.: 'Noise robust alternating fixed-point algorithm for stereophonic acoustic echo cancellation', Electron. Lett., 1999, 35, pp. 1812-1813
11 BENESTY, J., DUHAMEL, P., and GRENIER, Y.: 'A multichannel affine projection algorithm with applications to multichannel acoustic echo cancellation', IEEE Signal Process. Lett., 1996, 3, pp. 35-37
12 GAY, S., and TAVATHIA, S.: 'The fast affine projection algorithm'. Proceedings of ICASSP'95, Detroit, USA, May 1995, Vol. 5, pp. 3023-3026
13 GAY, S.: 'A fast converging, low complexity adaptive filtering algorithm'. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 1993, pp. 4-7
14 MAKINO, S., STRAUSS, K., SHIMAUCHI, S., HANEDA, Y., and NAKAGAWA, A.: 'Subband stereo echo canceller using the projection algorithm with fast convergence to the true echo path'. Proceedings of ICASSP'97, Munich, Germany, April 1997, pp. 299-302
15 OZEKI, K., and UMEDA, T.: 'An adaptive filtering algorithm using an orthogonal projection to an affine subspace and its properties', Electron. Commun. Japan, 1984, 67-A, (5), pp. 19-27
16 HASSIBI, B., SAYED, A., and KAILATH, T.: 'Indefinite-quadratic estimation and control - a unified approach to H2 and H∞ theories' (SIAM Studies in Applied and Numerical Mathematics, Vol. 16, SIAM, Philadelphia, PA, USA, 1999)
17 HASSIBI, B., SAYED, A., and KAILATH, T.: 'H∞ optimality of the LMS algorithm', IEEE Trans. Signal Process., 1996, 44, pp. 267-279
18 HAYES, M.: 'Statistical digital signal processing and modelling' (John Wiley and Sons Inc., New York, USA, 1996)
