Application of adaptive digital signal processing to speech

Veterans Administration

Journal of Rehabilitation Research and Development Vol. 24 No. 4 Pages 65–74

Application of adaptive digital signal processing to speech enhancement for the hearing impaired

DOUGLAS M. CHABRIES, RICHARD W. CHRISTIANSEN, ROBERT H. BREY, MARTIN S. ROBINETTE, and RICHARD W. HARRIS*

Brigham Young University, Provo, Utah and Mayo Clinic Audiology Department, Rochester, Minnesota

Abstract--A major complaint of individuals with normal hearing and hearing impairments is a reduced ability to understand speech in a noisy environment. This paper describes the concept of adaptive noise cancelling for removing noise from corrupted speech signals. Application of adaptive digital signal processing has long been known and is described from a historical as well as technical perspective. The Widrow-Hoff LMS (least mean square) algorithm developed in 1959 forms the introduction to modern adaptive signal processing. This method uses a "primary" input which consists of the desired speech signal corrupted with noise and a second "reference" signal which is used to estimate the primary noise signal. By subtracting the adaptively filtered estimate of the noise, the desired speech signal is obtained. Recent developments in the field as they relate to noise cancellation are described. These developments include more computationally efficient algorithms as well as algorithms that exhibit improved learning performance.

A second method for removing noise from speech, for use when no independent reference for the noise exists, is referred to as single channel noise suppression. Both adaptive and spectral subtraction techniques have been applied to this problem, often with the result of decreased speech intelligibility. Current techniques applied to this problem are described, including signal processing techniques that offer promise in the noise suppression application.

* Douglas M. Chabries and Richard W. Christiansen are with the Electrical Engineering Department, Brigham Young University, Provo, Utah 84602.

Robert H. Brey and Richard W. Harris are with the Department of Educational Psychology (Audiology), Brigham Young University, Provo, Utah 84602.

Martin S. Robinette is with the Mayo Clinic Audiology Department, Rochester, Minnesota 55905.

INTRODUCTION

Hearing impairment is not only the most prevalent communicative disorder, it is also the number one chronic disability affecting people in the United States. A major complaint of those with hearing impairments is a reduced ability to understand speech in everyday communication in a noisy environment. Even in the absence of hearing impairment, the addition of background noise can significantly reduce the intelligibility of speech. In 1956, Widrow proposed an adaptive filter as shown in Figure 1 which can be used to reduce interference when a second sample of the noise is available. This technique was developed at Stanford University in 1959 and applied to a pattern-recognition scheme known as Adaline. In 1965 the first adaptive noise cancelling system was built by two students at Stanford University. In 1972, the first all-digital adaptive filter was built by McCool and Widrow at the Naval Undersea Center in Pasadena, California. In 1975, several applications of the LMS algorithm were presented which included adaptive noise cancelling and noise suppression (32). The LMS algorithm was simple, both in the number of calculations required for its update and in its derivation, and robust in a number of applications. An adaptive feedback constant, µ,


Section II. Noise Reduction: Chabries et al.

Figure 2. An adaptive LMS filter (primary input labeled "signal + interference").

tion from the desired optimal filter results. By reducing the size of the feedback coefficient in Equation [3], the misadjustment can be made arbitrarily small. However, since the adaptation time is inversely proportional to µ, a large µ for fast learning is required in applications where the statistics of the input signals vary with time. This selection, however, results in increased misadjustment or residual error. One may optimize the choice of µ for such nonstationary inputs by selecting a feedback constant such that the error due to tracking the nonstationarity in the signal just equals the misadjustment, that is, the residual error caused by the updates that continue after the filter coefficients, $w_p(k)$, have converged to their desired values (31).
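The trade-off between adaptation speed and misadjustment can be tried out numerically. The sketch below is illustrative only: Equations [1] through [3] are not reproduced in this copy, so it assumes the standard Widrow-Hoff form, $y(k) = \sum_p w_p(k)\,n_{ref}(k-p)$, $e(k) = d(k) - y(k)$, $w_p(k+1) = w_p(k) + 2\mu\,e(k)\,n_{ref}(k-p)$. The function names, the three-tap path `h`, and the sinusoid standing in for speech are hypothetical choices, not taken from the paper.

```python
import numpy as np


def lms_noise_canceller(primary, reference, n_taps, mu):
    """Two-channel LMS noise canceller (assumed standard Widrow-Hoff form).

    primary   : d(k), desired speech plus noise
    reference : n_ref(k), noise correlated with the noise in the primary
    Returns (e, w): the error signal e(k), which serves as the speech
    estimate, and the final weight vector.
    """
    w = np.zeros(n_taps)
    buf = np.zeros(n_taps)          # buf[p] holds n_ref(k - p)
    e = np.zeros(len(primary))
    for k in range(len(primary)):
        buf = np.roll(buf, 1)
        buf[0] = reference[k]
        y = w @ buf                 # estimate of the primary noise
        e[k] = primary[k] - y       # speech estimate
        w = w + 2.0 * mu * e[k] * buf
    return e, w


def demo(mu, n=20000, seed=0):
    """Return final weights and residual noise power for a given mu."""
    rng = np.random.default_rng(seed)
    n_ref = rng.standard_normal(n)                  # unit-variance reference
    h = np.array([0.8, -0.4, 0.2])                  # hypothetical noise path
    n_pri = np.convolve(n_ref, h)[:n]               # noise in the primary
    speech = 0.1 * np.sin(2 * np.pi * 0.01 * np.arange(n))  # stand-in signal
    e, w = lms_noise_canceller(speech + n_pri, n_ref, n_taps=8, mu=mu)
    residual = e - speech                           # noise left after cancelling
    return w, np.mean(residual[-5000:] ** 2)


if __name__ == "__main__":
    for mu in (0.001, 0.02):
        w, res = demo(mu)
        print(f"mu={mu}: residual noise power {res:.2e}")
```

Under these assumptions, the smaller µ converges more slowly but leaves less residual noise after convergence, while the larger µ adapts faster and leaves more.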

Since the filter parameters adapt in such a way as to provide an estimate of $n_{pri}(k)$ from the reference signal $n_{ref}(k)$, the task in applying the algorithm to noise cancellation becomes one of providing sufficient degrees of freedom that an acceptable solution may be obtained. The following design or selection criteria must be followed:

1. The number of digital filter stages (N in Equation [1]) should be selected so that N times T, the sample period of the digital system, is larger than the impulse response or reverberation time of the acoustic environment. For small rooms, typical filter lengths are on the order of 1,000 to 2,000 stages for a sample rate of 10 kHz. It is not uncommon to require 8,000 to 16,000 stages of adaptive filtering for moderate size rooms at a sample rate of 10 kHz.

2. The selection of the feedback constant µ is made according to the desired adaptation rate. The choice of µ for a given adaptation rate with common broadband noise interference is given as

[4] (equation not legible in this copy)

where $\sigma^2$ represents the variance of the reference noise process, $n_{ref}(k)$, and $\tau$ is the desired adaptation time in samples. In no case should µ exceed its upper limit (printed illegibly in this copy as "oV2") or instability will result.

3. A time delay must be inserted into either the primary or reference channel as necessary to ensure that the desired filter is causal. That is, the reference noise signal should enter the reference input to the adaptive filter during the same processing period in which the correlated primary interference arrives at the summing junction in Figure 1.

4. In cases where there is leakage, care must be taken to minimize leakage of the desired speech signal into the reference input. In such cases, the interference cancellation is limited to the ratio of the interference signal to speech signal in the reference.

Following these four procedures in the use of an adaptive filter has resulted in interference reductions corresponding to speech enhancement of up to 60 dB, but more typically 30 dB.
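The first criterion is simple arithmetic: a filter of N stages at sample period T spans N·T seconds. At 10 kHz, the quoted 1,000 to 2,000 stages span 0.1 to 0.2 s, and 8,000 to 16,000 stages span 0.8 to 1.6 s. A minimal sketch of that calculation follows; the function names are hypothetical, and the reverberation time is assumed to be known from a separate measurement.

```python
import math


def min_filter_stages(reverb_time_s, sample_rate_hz):
    """Smallest N with N * T >= reverb_time_s, where T = 1 / sample_rate_hz."""
    return math.ceil(reverb_time_s * sample_rate_hz)


def filter_span_s(n_stages, sample_rate_hz):
    """Time span in seconds covered by n_stages filter stages."""
    return n_stages / sample_rate_hz


if __name__ == "__main__":
    print(min_filter_stages(0.5, 10_000))    # stages needed for a 0.5 s response
    print(filter_span_s(2_000, 10_000))      # span of a 2,000-stage filter
```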

A variant of the LMS algorithm that provides the ability to adapt the feedback constant µ was developed by Harris (17). In this approach, known as the VS adaptive algorithm, a separate $\mu_p(k)$ is calculated for each stage of the filter. Results of the VS algorithm applied to noise cancellation show a speed-up in adaptation time of up to a factor of 50 without increasing the residual error while maintaining both speech quality and intelligibility.

Frequency Domain Adaptive Filters

One of the drawbacks of the LMS adaptive filter in processing speech signals is the error criterion, which is selected for the minimization of mean square error and which results in an adaptation process that treats frequency regions of higher energy content before adapting to regions of low energy. As a result, the lower frequency regions of the speech spectrum receive an inordinate amount of attention at the cost of the high-frequency speech regions, which contain much of the intelligibility. As a second consideration, users of adaptive filters are always anxious to find more efficient computational techniques to perform the adaptive filtering tasks. As a result, several papers describing frequency-domain implementations for adaptive filtering have been presented (7,9,10,12,26,30). One of


corrupted by interference when no independent or second reference is available is referred to as noise suppression.

In 1978 Sambur (27) proposed to apply an LMS version of the digital adaptive filter described in the foregoing but using reference delays equal to one or two voice pitch periods. Sambur reasoned that in the speech component of the corrupted signal there would be strong correlation between the primary and delayed reference inputs. He reported improved SNR and speech quality but did not claim improved intelligibility. In fact, the time domain LMS adaptive filter concentrates its computational power first on those frequencies in the signal with the highest energy to minimize mean square error (i.e., pitch and pitch harmonics, where little intelligibility is carried) and finally on frequencies with the least energy (i.e., high frequency sounds, where most of the information in speech is carried). This results in an output that sounds like muffled speech depleted of high-frequency information.

The muffling effect appears to be present in most speech enhancement systems, prompting the statements by Lim (21) and Schafer (29) that the various speech enhancement systems appear to improve the subjective speech quality but not speech intelligibility, and that successful approaches must exploit more knowledge about the information-bearing elements of speech.

Time Domain Filters for Noise Suppression

Figure 3 describes the application of the LMS Widrow-Hoff time domain adaptive algorithm to the problem of noise suppression. This filter is attractive because of its relative simplicity. Sambur (27) proposed the time domain implementation of the adaptive filter shown in Figure 3, which employs the LMS Widrow-Hoff algorithm in Eqs. 1-3 for the filter coefficient updates. Implementation of this LMS adaptive filter results in three deficiencies:

1. The speech spectrum is distorted, with the low-frequency region enhanced due to the high energy content at the low frequencies. This is a result of the mean square error criterion.

2. Unvoiced speech sounds are eliminated in the signal processing by large delays, which are typically a pitch period. This results in confusion between such words as net, nets, next, etc., leading to reduced intelligibility.

3. Reverberation is introduced because the LMS algorithm in Eqs. 1-3 responds to minimize mean square error and will leave large values for $w_p(k)$ when $n_{ref}(k-p)$ becomes small or is zero, as is the case during the silent portions of speech. The sound introduced is reminiscent of listening to a sea-shell and hearing that reverberant background.

Figure 3. Time domain adaptive LMS noise suppressor.

The general behavior of the adaptive noise suppressor with the inherent problems just described may be seen in Figure 4, which shows the spectrum of noise-free speech before and after processing with the LMS algorithm as proposed by Sambur. Here it is clear that the high-frequency information above about 1.2 kHz, which is evident in Figure 4a, is gone in Figure 4b.

Figure 4. Noiseless speech before and after processing by an adaptive LMS filter: (a) before processing; (b) after processing. Both panels plot level in dB against frequency, with axis ticks at 1.25 and 2.50 kHz.

Journal of Rehabilitation Research and Development Vol. 24 No. 4 Fall 1987

There are several approaches to reducing the deleterious effects introduced by time domain adaptive processing. By forcing increased processor attention to high frequencies, the spectral distortion may be made acceptable. A second problem is observed during the change from speech to silence between words. The time domain adaptive processor stops updating the filter weights when the input signal power decreases significantly (during speech silence), so the filter "remembers" the weights from the prior speech. As the next word begins, an annoying echo or synthetic reverberation effect is produced.

One solution for overcoming these deleterious effects has been presented by Andersen (6). His treatment included the following:

1. Pre-whitening. The general behavior of the adaptive LMS filter in processing both high frequency-low energy and low frequency-high energy portions of the speech spectrum is illustrated in Figure 4. In Figure 4b it is clear that the high frequency information bearing elements of the speech sample have been removed. By applying a pre-whitening filter with the transfer function given in Eq. 5

$$H(z) = 1 + a z^{-1} \qquad [5]$$

where the value for $a$ places the corner frequency at 100 Hz.

2. Preservation of unvoiced speech. If the speech is delayed more than 0.5 to 1 ms, the high-frequency portions become decorrelated and hence are removed along with the decorrelated noise. Therefore the amount of delay, Δ, allowed for decorrelation in speech is reduced. Typical values reported for the delay in Figure 4 were on the order of 3 to 5 samples with a sample rate of 14 kHz. This will work well as long as the noise interference is not still correlated for this short delay.

3. Lossy LMS Algorithm. A modified weight update equation as proposed by Gitlin (15) and given in Equation [6] was employed.

$$w_p(k+1) = w_p(k)(1-\rho) + 2\mu\, e(k)\, n_{ref}(k-p) \qquad [6]$$

The purpose of this modification is to introduce a "leak" factor so that the filter coefficient values are not "remembered" during silent portions of speech. Instead the weights are "forgotten" with a time constant set by the leak factor $(1-\rho)$. The leak rate is proportional to $\rho$ and may be adjusted for listener preference.

The effect of these techniques in preserving the high frequency information is shown in Figure 5c. Other techniques to accomplish similar effects have been proposed by Graupe (16). Application of another class of adaptive algorithms known as the least-squares algorithms (previously discussed) also promises to solve the problems, introduced by the LMS adaptive filter, that arise primarily due to the nonuniformity of the signal spectrum and the slow adaptation time.

Figure 5. Pre-whitened noiseless speech before and after processing by an adaptive LMS filter: (a) noiseless speech; (b) pre-whitened noiseless speech; (c) pre-whitened noiseless speech after processing. All panels plot level in dB against frequency, with axis ticks at 1.25 and 2.50 kHz.
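The three measures listed above (pre-whitening, a short reference delay, and the leaky update of Equation [6]) can be combined in a few lines. The following is a rough sketch, not the authors' implementation. It assumes the suppressor output is the filter output $y(k)$, i.e. the part of the input that is predictable from its own delayed copy. The function names and all numeric values are hypothetical, and the value of $a$ that gives a 100 Hz corner is not stated in the text, so `a` is left as a free parameter.

```python
import numpy as np


def prewhiten(x, a):
    """First-order filter H(z) = 1 + a z^-1 (Eq. 5): y(k) = x(k) + a x(k-1)."""
    x = np.asarray(x, dtype=float)
    y = x.copy()
    y[1:] += a * x[:-1]
    return y


def leaky_lms_suppressor(d, n_taps, delay, mu, rho):
    """Single-channel suppressor with a delayed copy of the input as reference.

    Weight update per Eq. 6:
        w_p(k+1) = (1 - rho) * w_p(k) + 2 * mu * e(k) * n_ref(k - p)
    Returns (y, w): the filter output y(k) and the final weights.
    """
    d = np.asarray(d, dtype=float)
    ref = np.concatenate([np.zeros(delay), d])[:len(d)]  # n_ref(k) = d(k - delay)
    w = np.zeros(n_taps)
    buf = np.zeros(n_taps)                               # buf[p] = n_ref(k - p)
    y = np.zeros(len(d))
    for k in range(len(d)):
        buf = np.roll(buf, 1)
        buf[0] = ref[k]
        y[k] = w @ buf
        e = d[k] - y[k]
        w = (1.0 - rho) * w + 2.0 * mu * e * buf
    return y, w


if __name__ == "__main__":
    # A tone followed by silence, to show the effect of the leak term.
    k = np.arange(2000)
    d = np.concatenate([np.sin(2 * np.pi * 0.05 * k), np.zeros(2000)])
    for rho in (0.0, 0.01):
        _, w = leaky_lms_suppressor(d, n_taps=8, delay=3, mu=0.01, rho=rho)
        print(f"rho={rho}: weight norm after silence {np.linalg.norm(w):.2e}")
```

With `rho = 0` the weights learned during the tone are still present after the silent interval, which is the "remembering" behavior described in the text. With a nonzero leak they decay toward zero.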

Spectral Subtraction

Spectral subtraction is a technique that exploits the idea that the human hearing system is insensitive to phase information in monaural hearing applications. Boll (3) and Lim et al. (23) proposed that the short-time spectral magnitude be used to estimate the speech spectrum. While a number of methods exist to extract this estimate of the noise spectrum and subtract it from the contaminated speech signal, the basic technique uses an estimate derived from Equation [7],

$$|S(\omega)| = |D(\omega)| - |N_{pri}(\omega)| \qquad [7]$$

where the capital letters refer to the Fourier transforms of $s(k)$, $d(k)$, and $n_{pri}(k)$, respectively. In Equation [7], the value of $S(\omega)$ is set to zero if the estimate of the noise, $N_{pri}(\omega)$, is larger than $D(\omega)$.
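A single-frame version of this idea can be sketched as follows. The sketch assumes that Equation [7] is a plain magnitude subtraction, that the noisy phase is reused unchanged (in line with the phase-insensitivity argument above), and that a noise magnitude estimate is already available, for instance averaged over speech-free frames. The function name is hypothetical, and framing, windowing and overlap-add are omitted.

```python
import numpy as np


def spectral_subtract_frame(d_frame, noise_mag):
    """Magnitude spectral subtraction on one frame.

    d_frame   : samples of the noisy signal d(k)
    noise_mag : estimate of |N_pri(w)| per rfft bin (length len(d_frame)//2 + 1)
    Returns the time-domain estimate of s(k) for this frame.
    """
    D = np.fft.rfft(d_frame)
    mag = np.abs(D) - noise_mag
    mag = np.maximum(mag, 0.0)             # zero where the noise estimate exceeds |D|
    S = mag * np.exp(1j * np.angle(D))     # keep the phase of the noisy input
    return np.fft.irfft(S, n=len(d_frame))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frame = rng.standard_normal(256)
    out = spectral_subtract_frame(frame, np.full(129, 5.0))
    print(np.sum(frame ** 2), np.sum(out ** 2))   # energy before and after
```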

Error Criterion

Common to all of the above techniques is the underlying notion that an increase in signal-to-noise ratio will result in improved intelligibility. Indeed, the minimization of mean square error focuses upon portions of the speech spectrum that have the most energy, and tends to only lightly address the higher frequency portions of the speech spectrum that carry significant portions of the information required for speech intelligibility. With this question is the companion inquiry, "Does an increase in signal-to-noise ratio, which does little to aid normal-hearing listeners, offer relief for hearing-impaired listeners?" Can these algorithms, which provide their best performance at higher signal-to-noise ratios (e.g., +6 dB), provide hope for hearing-impaired listeners to function with processing in such an environment, where they might typically require perhaps an additional 12 dB of signal to function?

To answer this question, considerable effort is planned with normal and hearing-impaired populations. However, in the midst of these efforts one is tempted to revisit the basic notions of speech intelligibility and the human hearing system. Current efforts in noise suppression are focusing upon models of the hearing system. Examination of the Fletcher-Munson curves, which describe the contours of equal loudness, suggests the use of homomorphic digital signal processing techniques where one minimizes the error, not in the incoming acoustic pressure field, but rather in a transformed space that represents the response of the ear. Further, especially in the context of the hearing-impaired listener, that nonlinear response of the ear to a linear increase in acoustic intensity known as recruitment must be accommodated in such a model.

While several algorithms may be proposed to facilitate this modified error criterion, the frequency-domain techniques appear to come to the forefront in noise suppression. The frequency domain algorithms discussed earlier may be adapted to a modified linear error criterion by the selection of an appropriate vector µ; however, care must be taken to avoid circular convolution effects. Implementation of recruitment in such algorithms is difficult. The frequency domain technique that seems to offer the greatest flexibility is one proposed by Ferrara in 1985 (13). This implementation utilizes an FFT-based implementation with a bank of band-pass filters and a basebanded output to accomplish adaptive LMS filtering. It would appear that such algorithms offer some promise of noise suppression with intelligibility gains, but then only at positive signal-to-noise ratios.

CONCLUSION

Several implementations of techniques for accomplishing adaptive noise cancelling and noise suppression have been discussed. Application of two-channel noise cancellation has yielded significant gains in speech intelligibility, in some cases by 40 percent, while simultaneously improving signal-to-noise ratio for speech corrupted with a variety of types of corrupting noise, including speech babble and broadband noise. Reports in the literature indicate increases in signal-to-noise ratio of greater than 18 dB and of up to 60 dB in specific applications. The effective application of single-channel adaptive noise suppression has shown progress with uniform reports of an increase in "quality." No data indicating intelligibility improvement for normal-hearing populations is presently available in the literature. Tests with hearing-impaired populations to determine the efficacy of noise suppression should be available to the community shortly; likewise, new algorithms which modify the error criteria in the context of the human hearing system would seem to provide the greatest hope for improving the processing of acoustic signals in the future.


REFERENCES

1. Bode H and Shannon C: A simplified derivation of linear least squares smoothing and prediction theory. Proc. IRE, 38:417–425, 1950.

2. Boll SF: Adaptive noise cancelling speech using the short time transform. Proceedings ICASSP, 3:692–695, 1980.

3. Boll SF: A spectral subtraction algorithm for suppression of acoustic noise in speech. Proceedings of IEEE ICASSP, 200–203, 1979.

4. Brey R, Chabries DM, Christiansen RW, Robinette M, and Jolley: Improved intelligibility in noise using adaptive filtering I: Normal subjects. Proceedings of ASHA Convention, San Francisco, 1984.

5. Chabries DM, Christiansen RW, Brey R, and Robinette M: Application of the LMS adaptive filter to improve speech communication in the presence of noise. Proceedings of IEEE ICASSP, 148–151, 1982.

6. Christiansen RW, Chabries DM, and Anderson DB: Noise reduction in speech using a modified LMS adaptive predictive filter. Proceedings IEEE/EMBS, 1100–1102, Chicago, IL, 1985.

7. Christiansen RW, Chabries DM, and Lynn D: Noise reduction in speech using adaptive filtering I: Signal-processing algorithms. 103rd ASA Conference, Chicago, IL, 1982.

8. Cioffi JM and Kailath T: Fast, recursive-least-squares transversal filters for adaptive filtering. IEEE Trans. ASSP, ASSP-32(2):304–337, 1984.

9. Clark GA, Parker SR, and Mitra SK: A unified approach to time- and frequency-domain realization of FIR adaptive digital filters. IEEE Trans. ASSP, ASSP-31(5):1073–1083, 1983.

10. Dentino M, McCool J, and Widrow B: Adaptive filtering in the frequency domain. Proceedings IEEE, 66:1658–1659, 1978.

11. Falconer D and Ljung L: Application of fast Kalman estimation to adaptive equalization. IEEE Trans. on Comm., COM-26(10):1439–1446, 1978.

12. Ferrara ER Jr.: Fast implementation of LMS adaptive filters. IEEE Trans. ASSP, ASSP-28(4):474–5, 1980.

13. Ferrara ER Jr.: Frequency-domain implementations of periodically time-varying filters. IEEE Trans. ASSP, ASSP-33(4):883–892, 1985.

14. Ling F, Manolakis D, and Proakis JG: Numerically robust least-squares lattice-ladder algorithms with direct updating of the reflection coefficients. IEEE Trans. ASSP, ASSP-34(4):837–845, 1986.

15. Gitlin RD, Mazo JE, and Taylor MG: On the design of gradient algorithms for digitally implemented adaptive filters. IEEE Transactions on Circuit Theory, CT-20(2), 1973.

16. Graupe D: Applications of estimation and identification theory to hearing aids. In Auditory and Hearing Prosthetic Research, ed. Stile SW, 169–181. New York: Grune and Stratton, 1979.

17. Harris RW, Chabries DM, and Bishop FA: A variable step (VS) adaptive filter algorithm. IEEE Trans. ASSP, ASSP-34(2):309–316, 1986.

18. Howells P: Intermediate Frequency Side-Lobe Canceller, Aug 24, 1965. US Patent 3,202,990.

19. Kalman R: On the general theory of control. Proc. 1st IFAC Congress. London: Butterworth, 1960.

20. Lee DTL, Morf M, and Friedlander B: Recursive least squares ladder estimation algorithms. IEEE Trans. ASSP, ASSP-29(3):627–641, 1981.

21. Lim J: Signal processing for speech enhancement. In The Vanderbilt Hearing-Aid Report, ed. Fred H. Bess, 124–129. Upper Darby, PA: Monographs in Contemporary Audiology, 1981.

22. Lim JS: In Speech Enhancement, Prentice-Hall, 1983.

23. Lim JS and Oppenheim AV: Enhancement and bandwidth compression of noisy speech. Proc. IEEE 67(23), 1977.

24. Lucky R: Automatic equalization for digital communication. Bell System Technical Journal 44:547–588, 1965.

25. Lucky R, Salz J, and Weldon EJ: In Principles of Data Communication, New York: McGraw-Hill, 1968.

26. Mansour D and Gray AH Jr.: Unconstrained frequency-domain adaptive filter. IEEE Trans. ASSP, ASSP-30(5):726–734, 1982.

27. Sambur MR: Adaptive noise cancelling for speech signals. IEEE Trans. ASSP, ASSP-26(5):419–423, 1978.

28. Satorius EH and Pack JD: A least square adaptive lattice equalizer algorithm. TR 575, Naval Ocean System Center, 1980.

29. Schafer R: Speech processing for the hearing impaired. In The Vanderbilt Hearing-Aid Report, ed. Fred H. Bess, 130–132. Upper Darby, PA: Monographs in Contemporary Audiology, 1981.

30. Narayan SS, Peterson AM, and Narasimha MJ: Transform domain LMS algorithm. IEEE Trans. ASSP, ASSP-31(3):609–615, 1983.

31. Widrow B and McCool JM: Stationary and non-stationary learning characteristics of the LMS filter. Proceedings of the IEEE, 64(8):1151–1162, 1976.

32. Widrow B, Glover JR, McCool JM, Kaunitz J, Williams CS, Hearn RH, Zeidler JR, Dong E, and Goodlin RC: Adaptive noise cancelling: Principles and applications. Proceedings of the IEEE, 63(12):1692–1716, 1975.

33. Wiener N: Extrapolation, Interpolation and Smoothing of Stationary Time Series, with Engineering Applications. New York: Wiley, 1949.


APPENDIX

The Frequency Domain Algorithm

The classical adaptive noise-cancelling problem is formulated in Figure A1. Defining the primary input as $d(n)$ and the reference inputs as $x_m(n)$, with $n$ the sample index, the desired and reference inputs may be divided into blocks with index $k$ and represented by the vectors $d_L(k)$ and $x_{L_m,m}(k)$ as follows

$$d_L^T(k) = [\,d(kL)\;\; d(kL+1)\;\; \cdots\;\; d(kL+L-1)\,] \qquad [A1]$$

$$x_{L_m,m}^T(k) = [\,x_m(kL_m)\;\; x_m(kL_m+1)\;\; \cdots\;\; x_m(kL_m+L_m-1)\,] \qquad [A2]$$

where

$m = 0, 1, 2, \ldots, M$ = reference channel number

$L = 2^{\alpha}$, $L_m = 2^{\alpha_m}$

and

$\alpha, \alpha_m$ = integers specifying the block lengths.

Figure A1. Time-domain representation of a digital adaptive filter with M references of length $L_m$.

Transforms may be obtained using the matrix $\mathrm{FFT}_L$ as

[A3] (equation not legible in this copy)

Further let $X_{2L_m,m}(k)$ represent the FFT of the $(k-1)$st and $k$th consecutive blocks of the $m$th reference given as

$$X_{2L_m,m}(k) = \mathrm{FFT}_{2L_m}\begin{bmatrix} x_{L_m,m}(k-1) \\ x_{L_m,m}(k) \end{bmatrix} \qquad [A4]$$

and the output of the $m$th filter

$$y_{L_m,m}(k) = \text{last } L_m \text{ terms of } \mathrm{FFT}^{-1}_{2L_m}\left[\,W_{2L_m,m}(k) \otimes X_{2L_m,m}(k)\,\right] \qquad [A5]$$

where the notation $A \otimes B$ denotes the element by element multiplication of the two vectors A and B which results in a vector.

[A6] (equation not legible in this copy)

The sum of the outputs from all filters of various lengths, $L_m$, blocked to $L$ output samples is

$$y_L(k) = \sum_{m} y_{L,m}(k) \qquad [A7]$$

Similarly, the error blocked to $L$ samples becomes

$$e_L(k) = d_L(k) - y_L(k) \qquad [A8]$$

Padding with zeroes and transforming,

$$E_{2L}(k) = \mathrm{FFT}_{2L}\begin{bmatrix} 0_L \\ e_L(k) \end{bmatrix} \qquad [A9]$$

where the definition

$$0_L^T = [\,0\;\; 0\;\; 0\;\; \cdots\;\; 0\,]_L$$

will be used. The weight update equation using the method of steepest descents becomes

$$\tilde{W}_{2L_m,m}(k+1) = (1-\rho)\,W_{2L_m,m}(k) + 2\boldsymbol{\mu}\left[\,E_{2L_m}(k) \otimes X^{*}_{2L_m,m}(k)\,\right] \qquad [A10]$$

where the symbol * denotes conjugation, $\rho$ specifies the rate of leakage, and the quantity $\boldsymbol{\mu}$ is

$$\boldsymbol{\mu} = \begin{bmatrix} \mu_1 & 0 & \cdots & 0 \\ 0 & \mu_2 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & \mu_{2L_m} \end{bmatrix} \qquad [A11]$$

The fact that the weights have been obtained by circular convolution is denoted by ~. To force the resultant output $y_{L_m,m}(k)$ to correspond to a linear convolution, the frequency-domain weight vector is obtained as

$$W_{2L_m,m}(k) = \mathrm{FFT}_{2L_m}\left[\begin{bmatrix} I_{L_m} & 0 \\ 0 & 0 \end{bmatrix} \mathrm{FFT}^{-1}_{2L_m}\,\tilde{W}_{2L_m,m}(k)\right] \qquad [A12]$$

where $I_{L_m}$ is the $L_m \times L_m$ identity matrix. The truncation of the weight vector in [A12] ensures that the last half of a time-domain representation of the weights is identically zero. Mansour and Gray showed that this truncation was unnecessary with proper constraints on the input and that the weight vector converged to the Wiener solution; however, in the general case, Equation [A12] is required.

The weight vector corresponding to the $m$th reference, $W_{2L_m,m}(k)$, is updated once each $L_m$ samples and the output vector $y_{L_m,m}(k)$ is obtained from Equation [A5].

Figure A2 is the block diagram for the frequency-domain algorithm that is embodied in Equations [A1] through [A12].

Figure A2. Frequency-domain algorithm for the mth reference adaptive filter.
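The block processing in the appendix can be illustrated for the simplest case. The sketch below is not the authors' code. It assumes a single reference, a block length equal to the filter length ($L = L_m$), a scalar feedback constant in place of the matrix µ, and the usual overlap-save reading of Equations [A4], [A5], [A8], [A9], [A10] and [A12]. The function name `fblms`, the `constrain` flag, and the test signal are hypothetical.

```python
import numpy as np


def fblms(d, x, L, mu, rho=0.0, constrain=True):
    """Frequency-domain block LMS for one reference (overlap-save form).

    d, x : primary and reference signals
    L    : block length, also used as the filter length
    Returns (e, W): the error signal and the final length-2L
    frequency-domain weight vector.
    """
    d = np.asarray(d, dtype=float)
    x = np.asarray(x, dtype=float)
    n_blocks = len(d) // L
    W = np.zeros(2 * L, dtype=complex)
    e_out = np.zeros(n_blocks * L)
    x_prev = np.zeros(L)
    for k in range(n_blocks):
        x_cur = x[k * L:(k + 1) * L]
        X = np.fft.fft(np.concatenate([x_prev, x_cur]))    # [A4]
        y = np.fft.ifft(W * X).real[L:]                    # [A5]: last L terms
        e = d[k * L:(k + 1) * L] - y                       # [A8]
        E = np.fft.fft(np.concatenate([np.zeros(L), e]))   # [A9]: zero padding
        W_tilde = (1.0 - rho) * W + 2.0 * mu * E * np.conj(X)   # [A10]
        if constrain:                                      # [A12]
            w_time = np.fft.ifft(W_tilde)
            w_time[L:] = 0.0                               # zero the last half
            W = np.fft.fft(w_time)
        else:
            W = W_tilde                                    # unconstrained variant
        x_prev = x_cur
        e_out[k * L:(k + 1) * L] = e
    return e_out, W


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    x = rng.standard_normal(16000)
    h = np.array([0.5, -0.3, 0.1])          # hypothetical path to be identified
    d = np.convolve(x, h)[:len(x)]
    e, W = fblms(d, x, L=16, mu=0.002)
    print(np.fft.ifft(W).real[:4])          # leading time-domain weights
```

Setting `constrain=False` skips the truncation of [A12], which corresponds to the unconstrained variant attributed above to Mansour and Gray. A per-bin feedback constant could be tried by passing an array of length 2L as `mu`.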
