Joint Dereverberation and Noise Reduction Using Beamforming and a Single-Channel Speech Enhancement Scheme

B. Cauchi, I. Kodrasi, R. Rehr, S. Gerlach, T. Gerkmann, S. Doclo, S. Goetze

Fraunhofer IDMT, Project Group Hearing, Speech and Audio Technology
Oldenburg University, Signal Processing Group

Florence, May 10th 2014

[email protected]
phone 0441 2172-450

© Fraunhofer IDMT


Introduction

• Overview of the proposed system

• Design of the MVDR beamformer
  ◦ DOA estimated using MUSIC
  ◦ Estimated noise covariance

• Single-channel enhancement scheme
  ◦ Combination and optimization of published estimators

• Results
  ◦ Objective measures
  ◦ MUSHRA scores
  ◦ WER using a baseline recognizer


1. Proposed System

Overview

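[Figure: block diagram of the proposed system. Blocks: MVDR beamformer, single-channel enhancement, VAD, DOA estimation, coherence estimation. Signals: source s(n), impulse responses h1(n), h2(n), ..., hM(n), microphone signals y1(n), y2(n), ..., yM(n), DOA θ, coherence Γ(k), beamformer output x(n), output speech signal s(n).]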

• Beamformer: towards estimated direction of arrival (DOA)

• Single-channel enhancement: based on statistical estimators
  ◦ Late reverberant spectral variance (LRSV)
  ◦ Noise spectral variance (NSV)
  ◦ Speech spectral variance (SSV)


2. MVDR Beamformer

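With $Y_m(k,\ell)$ the STFT of the input signal in the $m$-th microphone, we define

\[ \mathbf{Y}(k,\ell) = [Y_1(k,\ell)\; Y_2(k,\ell)\; \dots\; Y_M(k,\ell)]^T \]

The output $X(k,\ell)$ of the beamformer is obtained as

\[ X(k,\ell) = \mathbf{W}_\theta^H(k)\,\mathbf{Y}(k,\ell) \]

where

\[ \mathbf{W}_\theta(k) = \frac{\boldsymbol{\Gamma}^{-1}(k)\,\mathbf{d}_\theta(k)}{\mathbf{d}_\theta^H(k)\,\boldsymbol{\Gamma}^{-1}(k)\,\mathbf{d}_\theta(k)} \]

• Noise coherence matrix: $\boldsymbol{\Gamma}(k)$, estimated using a VAD.

• Steering vector: $\mathbf{d}_\theta(k)$, computed from $\theta$ using a far-field assumption.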
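
The MVDR weights and the far-field steering vector can be sketched for a single frequency bin as follows. This is a minimal illustration, not the authors' implementation: the function names, the 4-microphone line array, the speed of sound and the identity coherence matrix are all assumptions made for the example.

```python
import numpy as np

def steering_vector(mic_xy, theta, freq_hz, c=343.0):
    """Far-field (plane-wave) steering vector for one frequency.

    mic_xy: (M, 2) microphone coordinates in metres (hypothetical layout).
    theta:  direction of arrival in radians.
    """
    direction = np.array([np.cos(theta), np.sin(theta)])
    # Microphones lying further towards the source receive the wavefront earlier.
    delays = -(mic_xy @ direction) / c
    return np.exp(-2j * np.pi * freq_hz * delays)

def mvdr_weights(gamma, d):
    """W = Gamma^{-1} d / (d^H Gamma^{-1} d) for one frequency bin."""
    gamma_inv_d = np.linalg.solve(gamma, d)
    return gamma_inv_d / (d.conj() @ gamma_inv_d)

# Toy setup: 4 microphones on a line, 5 cm apart (assumed, not from the slides).
M = 4
mic_xy = np.stack([0.05 * np.arange(M), np.zeros(M)], axis=1)
d = steering_vector(mic_xy, theta=np.pi / 3, freq_hz=1000.0)

# With spatially white noise (Gamma = I) the MVDR reduces to delay-and-sum.
w = mvdr_weights(np.eye(M), d)

# Beamformer output for one STFT bin: X = W^H Y.
Y = 0.5 * d          # a plane wave from theta with amplitude 0.5
X = w.conj() @ Y     # the distortionless constraint passes it unchanged
```

Solving the linear system instead of inverting the coherence matrix explicitly is a design choice for numerical robustness; the result is the same weight vector.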


2. MVDR Beamformer

Estimation of noisefield coherence

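• Noise periods identified with a VAD
  ◦ Comparison between the long-term spectral envelope and the average noise spectrum

• $\boldsymbol{\Gamma}(k)$ is estimated using detected noise-only frames

• Alternatively, a theoretically diffuse noise field is used:

\[ \mathbf{W}_\theta(k) = \frac{\left(\boldsymbol{\Gamma}_{\mathrm{diff}}(k) + \varrho(k)\,\mathbf{I}_M\right)^{-1}\mathbf{d}_\theta(k)}{\mathbf{d}_\theta^H(k)\left(\boldsymbol{\Gamma}_{\mathrm{diff}}(k) + \varrho(k)\,\mathbf{I}_M\right)^{-1}\mathbf{d}_\theta(k)} \]

with $\varrho(k)$ (placeholder symbol; the original glyph is missing from the extracted slide) a constraint such that

\[ \mathbf{W}_\theta^H(k)\,\mathbf{W}_\theta(k) \le \mathrm{WNG}_{\max} = 10\ \mathrm{dB} \]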

Ramirez, J., Segura, J.C., Benitez, C., de la Torre, A., and Rubio, A., Efficient voice activity detection algorithms using long-term speech information, 2003.
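
The diffuse-field alternative can be sketched as below, assuming a spherically isotropic noise field (sinc-shaped coherence) and a simple search that grows the diagonal loading until the constraint on W^H W is met. The search strategy, the function names, the array layout and the reading of 10 dB as a factor of 10 on W^H W are assumptions for illustration; the slides do not state how the loading term is computed.

```python
import numpy as np

def diffuse_coherence(mic_xy, freq_hz, c=343.0):
    """Coherence of a spherically isotropic field: sin(x)/x with x = 2*pi*f*dist/c."""
    dist = np.linalg.norm(mic_xy[:, None, :] - mic_xy[None, :, :], axis=-1)
    # np.sinc(t) = sin(pi*t)/(pi*t), hence the argument 2*f*dist/c.
    return np.sinc(2.0 * freq_hz * dist / c)

def loaded_mvdr(gamma_diff, d, wng_max_db=10.0, loading=1e-6, growth=2.0, max_iter=100):
    """Grow the diagonal loading until W^H W <= WNG_max (hypothetical search)."""
    limit = 10.0 ** (wng_max_db / 10.0)
    eye = np.eye(len(d))
    for _ in range(max_iter):
        a = np.linalg.solve(gamma_diff + loading * eye, d)
        w = a / (d.conj() @ a)
        if np.real(w.conj() @ w) <= limit:
            return w, loading
        loading *= growth
    raise RuntimeError("constraint not met; increase max_iter")

# Toy setup (assumed): 4 microphones on a line, 5 cm apart, 300 Hz, endfire source.
M, freq_hz, c = 4, 300.0, 343.0
mic_xy = np.stack([0.05 * np.arange(M), np.zeros(M)], axis=1)
direction = np.array([1.0, 0.0])
d = np.exp(2j * np.pi * freq_hz * (mic_xy @ direction) / c)

w, loading = loaded_mvdr(diffuse_coherence(mic_xy, freq_hz), d)
```

For very large loading the weights approach delay-and-sum, for which W^H W = 1/M, so the search always terminates for a 10 dB limit.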


2. MVDR Beamformer

DOA Estimation

[Figure: 2-D plot with both axes spanning -3 to 3, showing a set of points and the angle θ.]

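\[ \hat{\theta} = \arg\max_{\theta}\; \frac{1}{K} \sum_{k=k_{\mathrm{low}}}^{k_{\mathrm{high}}} U_\theta(k,\ell), \]

where $U_\theta(k,\ell)$ is the MUSIC pseudo-spectrum:

\[ U_\theta(k,\ell) = \frac{1}{\mathbf{d}_\theta^H(k)\,\mathbf{E}(k,\ell)\,\mathbf{E}^H(k,\ell)\,\mathbf{d}_\theta(k)} \]

\[ \mathbf{E}(k,\ell) = [\mathbf{e}_{Q+1}(k,\ell)\; \dots\; \mathbf{e}_M(k,\ell)] \]

with $\mathbf{e}_m$ denoting the eigenvectors of the covariance matrix of $\mathbf{Y}(k,\ell)$.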

Schmidt, R., Multiple emitter location and signal parameter estimation, 1986.
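
A broadband MUSIC search of this kind can be sketched as follows. The sketch assumes a single source, a hypothetical 8-microphone circular array of 10 cm radius, a 1-degree angle grid, and one covariance matrix per bin averaged over all frames (the slides index the subspace by frame); none of these choices are taken from the slides.

```python
import numpy as np

def steering_matrix(mic_xy, angles, freq_hz, c=343.0):
    """Far-field steering vectors, one column per candidate angle: shape (M, A)."""
    directions = np.stack([np.cos(angles), np.sin(angles)], axis=0)   # (2, A)
    delays = -(mic_xy @ directions) / c                                # (M, A)
    return np.exp(-2j * np.pi * freq_hz * delays)

def music_doa(Y, freqs_hz, mic_xy, n_sources=1, n_angles=360):
    """Y: (K, M, L) STFT coefficients for K bins, M microphones, L frames."""
    K, M, L = Y.shape
    angles = np.linspace(-np.pi, np.pi, n_angles, endpoint=False)
    score = np.zeros(n_angles)
    for k in range(K):
        R = Y[k] @ Y[k].conj().T / L                  # sample covariance (M, M)
        _, vecs = np.linalg.eigh(R)                   # eigenvalues in ascending order
        E = vecs[:, : M - n_sources]                  # noise subspace
        D = steering_matrix(mic_xy, angles, freqs_hz[k])
        denom = np.sum(np.abs(E.conj().T @ D) ** 2, axis=0)   # d^H E E^H d per angle
        score += 1.0 / np.maximum(denom, 1e-12)       # pseudo-spectrum U_theta
    return angles[np.argmax(score / K)]

# Synthetic check: one plane wave from 0.7 rad plus weak sensor noise.
rng = np.random.default_rng(0)
M, L = 8, 50
phi = 2 * np.pi * np.arange(M) / M
mic_xy = 0.1 * np.stack([np.cos(phi), np.sin(phi)], axis=1)
freqs_hz = np.linspace(500.0, 1500.0, 9)
true_theta = 0.7
Y = np.empty((len(freqs_hz), M, L), dtype=complex)
for k, f in enumerate(freqs_hz):
    d = steering_matrix(mic_xy, np.array([true_theta]), f)            # (M, 1)
    s = rng.standard_normal((1, L)) + 1j * rng.standard_normal((1, L))
    noise = 0.05 * (rng.standard_normal((M, L)) + 1j * rng.standard_normal((M, L)))
    Y[k] = d @ s + noise

theta_hat = music_doa(Y, freqs_hz, mic_xy)
```

Because `eigh` sorts eigenvalues in ascending order, the noise subspace is taken from the first M - Q columns, which corresponds to the eigenvectors indexed Q+1 to M when they are sorted by decreasing eigenvalue.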


3. Single-channel Enhancement

Overview

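[Figure: block diagram of the single-channel enhancement. Blocks: noise power estimation, speech power estimation, reverb power estimation, T60 estimator, speech power re-estimation, gain function. Signals: input X(k), σ²v(k), σ²z(k), σ²r(k), σ²r(k) + σ²v(k), σ²s(k), T60, gain G(k), output S(k).]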

• $\sigma_v^2(k,\ell)$ estimated using Minimum Statistics

• $\sigma_s^2(k,\ell)$ estimated using Cepstral Smoothing

• $\sigma_r^2(k,\ell)$ estimated using Lebart's approach

Martin, R., Noise Power Spectral Density Estimation Based on Optimal Smoothing and Minimum Statistics, 2001.
Breithaupt, C., Gerkmann, T. and Martin, R., A Novel A Priori SNR Estimation Approach Based on Selective Cepstro-Temporal Smoothing, 2008.
Eaton, J., Gaubitch, N.D. and Naylor, P.A., Noise-robust reverberation time estimation using spectral decay distributions with reduced computational cost, 2012.
Lebart, K., Boucher, J.M. and Denbigh, P., A new method based on spectral subtraction for speech dereverberation, 2001.


3. Single-channel Enhancement

LRSV estimation

• RIR modeled as Gaussian noise with decay $\Delta = \dfrac{3 \ln 10}{T_{60} f_s}$

[Figure: RIR amplitude over time (0 to 0.5 s) together with its decay.]

• Representing the variance of the reverberant speech as:
  ◦ $\sigma_z^2(k,\ell) = \sigma_r^2(k,\ell) + \sigma_s^2(k,\ell)$

• Leads to the estimator
  ◦ $\sigma_r^2(k,\ell) = e^{-2\Delta T_d f_s}\,\sigma_z^2(k,\ell - T_d/T_s)$

Lebart, K., Boucher J.M. and Denbigh, P., A new method based on spectral subtraction for speech dereverberation, 2001.
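
The estimator above amounts to delaying the reverberant variance and attenuating it by a factor fixed by T60. A minimal sketch follows, assuming that T_d is the delay in seconds after which reverberation counts as late and T_s is the STFT frame shift in seconds (the slides do not define either), and rounding T_d/T_s to whole frames. The function name and the example values are hypothetical.

```python
import numpy as np

def lrsv_estimate(sigma_z2, t60, fs, t_d, t_s):
    """Late reverberant spectral variance from the reverberant speech variance.

    sigma_z2: (K, L) array, variance per frequency bin and frame.
    t60:      reverberation time in seconds.
    fs:       sampling rate in Hz.
    t_d:      assumed early/late boundary in seconds.
    t_s:      assumed frame shift in seconds.
    """
    delta = 3.0 * np.log(10.0) / (t60 * fs)            # decay rate per sample
    attenuation = np.exp(-2.0 * delta * t_d * fs)      # equals 10 ** (-6 * t_d / t60)
    delay_frames = int(round(t_d / t_s))
    n_frames = sigma_z2.shape[1]
    sigma_r2 = np.zeros_like(sigma_z2, dtype=float)    # no estimate before the delay
    if delay_frames < n_frames:
        sigma_r2[:, delay_frames:] = attenuation * sigma_z2[:, : n_frames - delay_frames]
    return sigma_r2

# Example values (assumed): T60 = 0.5 s, fs = 16 kHz, T_d = 50 ms, 16 ms frame shift.
sigma_z2 = np.ones((2, 6))
sigma_r2 = lrsv_estimate(sigma_z2, t60=0.5, fs=16000, t_d=0.05, t_s=0.016)
```

With these values the exponent is -6 ln(10) T_d / T60 = -0.6 ln(10), so the delayed variance is scaled by 10^(-0.6), roughly 0.25, and the delay is 3 frames.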


3. Single-channel Enhancement

Gain function

• The output $X(k,\ell)$ of the beamformer contains the anechoic speech, remaining noise and spatially filtered reverberation
  ◦ $X(k,\ell) = S(k,\ell) + V(k,\ell) + R(k,\ell)$

• We aim to compute a real gain such that:
  ◦ $\hat{S}(k,\ell) = G(k,\ell)\,X(k,\ell)$

• Computation of $G(k,\ell)$ using an MMSE estimation of the speech amplitude based on a super-Gaussian speech model.

Breithaupt, C., Krawczyk, M., and Martin, R., Parameterized MMSE spectral magnitude estimation for the enhancement of noisy speech, 2008.
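
To illustrate how the three variance estimates feed a real-valued gain, the sketch below uses a plain Wiener gain with a lower limit. This is a deliberately simpler stand-in, not the parameterized super-Gaussian MMSE estimator cited above; the function name and the -10 dB gain floor are assumptions.

```python
import numpy as np

def wiener_like_gain(sigma_s2, sigma_r2, sigma_v2, g_min_db=-10.0):
    """Real gain from speech, late-reverberation and noise variances."""
    interference = sigma_r2 + sigma_v2                 # reverberation and noise together
    xi = sigma_s2 / np.maximum(interference, 1e-12)    # a priori SNR
    gain = xi / (1.0 + xi)                             # Wiener gain (stand-in estimator)
    return np.maximum(gain, 10.0 ** (g_min_db / 20.0)) # hypothetical gain floor

# Two time-frequency points: one speech-dominated, one without speech.
sigma_s2 = np.array([4.0, 0.0])
sigma_r2 = np.array([1.0, 1.0])
sigma_v2 = np.array([1.0, 1.0])
G = wiener_like_gain(sigma_s2, sigma_r2, sigma_v2)

X = np.array([1.0 + 1.0j, 2.0 + 0.0j])   # beamformer output at those points
S_hat = G * X                            # real gain applied to the complex spectrum
```

The gain is real, so it changes only the magnitude of X and leaves its phase untouched; the floor limits the attenuation where no speech is estimated.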


4. Objective Measures

SRMR

[Figure: SRMR [dB] (axis 1 to 9) for unprocessed, 1-channel and 8-channel signals. Simulated panel: 250 ms, 500 ms and 700 ms, each near and far, plus mean. Real panel: 700 ms, near and far, plus mean.]

• Illustrates dereverberation performance in all conditions

• Better dereverberation achieved by the multichannel system, except for T60 = 500 ms


4. Objective Measures

FWSSNR

[Figure: FWSSNR [dB] (axis 2 to 14) for unprocessed, 1-channel and 8-channel signals. Simulated data: 250 ms, 500 ms and 700 ms, each near and far, plus mean.]

• Illustrates noise reduction in all conditions

• Beamforming step is advantageous for the noise reduction


4. Objective Measures

PESQ

[Figure: PESQ (axis 1.5 to 3.5) for unprocessed, 1-channel and 8-channel signals. Simulated data: 250 ms, 500 ms and 700 ms, each near and far, plus mean.]

• The improvement of the PESQ score in all conditions illustrates the overall improvement in speech quality


5. Subjective Tests

MUSHRA test

Intermediate results of the subjective test run by the organizers:

• Tests carried out separately for 1 and 8 channels

[Figure: MUSHRA score [%] (axis 0 to 80) for near and far conditions on simulated (Sim) and real data, in two panels: single-channel and multichannel. Legend: Unprocessed, Overall Quality, Reverberation.]

• Improvement for all tested conditions

• Larger improvement in overall quality


6. Preprocessing for ASR

Word Error Rate

• Baseline recognizer provided by the organizers

• Using pre-trained models on clean data

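[Figure: WER [%] (axis 10 to 100) per condition for the baseline, 1 channel and 8 channels, on simulated and real data. The values annotated in the figure are listed below; they appear to be WER changes relative to the baseline, and the two series are assigned to 1 and 8 channels in legend order.]

Simulated (T60 in ms, distance)

Condition    1 Channel    8 Channels
250, near       2.33         1.65
250, far        3.27         2.83
500, near     −11.94       −24.91
500, far      −20.31       −42.53
700, near     −15.78       −21.63
700, far      −18.69       −30.48
Mean          −10.18       −19.17

Real (T60 in ms, distance)

Condition    1 Channel    8 Channels
700, near     −12.74       −25.93
700, far      −11.48       −25.76
Mean          −12.11       −25.85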


7. Conclusion

• System based on combination of MVDR beamformer and spectral enhancement

• All parameters are blindly estimated

• Speech enhancement achieved in all conditions in terms of:
  ◦ Objective measures
  ◦ Subjective tests
  ◦ Word error rate


Thank you very much for your attention

House of Hearing, Oldenburg

Questions ?
