Advances in WP1

Advances in WP1 Nancy Meeting – 6-7 July 2006 www.loquendo.com


Page 1: Advances in WP1

Advances in WP1

Nancy Meeting – 6-7 July 2006

www.loquendo.com

Page 2: Advances in WP1

WP1: Environment & Sensor Robustness
T1.2 Noise Independence

Noise Reduction:
– Spectral Subtraction (YEAR 1) and Spectral Attenuation (YEAR 2)
  “Automatic Speech Recognition With a Modified Ephraim-Malah Rule”, Roberto Gemello, Franco Mana and Renato De Mori, IEEE Signal Processing Letters, Vol. 13, No. 1, January 2006
– Evaluation of HEQ for feature normalization (HEQ study + Revision 2)

Page 3: Advances in WP1

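Denoising Techniques for Y2 evaluations (1)

Spectral Attenuation (or spectral weighting) is a form of audio signal enhancement in which noise suppression is viewed as the application of a suppression rule, a non-negative real-valued gain G_k, to each bin k of the observed signal magnitude spectrum, in order to form an estimate of the original signal magnitude spectrum:

\[
\hat{X}_k = G_k \, Y_k
\]

Ephraim–Malah MMSE log estimator rule:

\[
G_k = \frac{\xi_k}{1+\xi_k} \exp\left( \frac{1}{2} \int_{v_k}^{\infty} \frac{e^{-t}}{t}\, dt \right),
\qquad
v_k = \frac{\xi_k}{1+\xi_k}\, \gamma_k
\]

A priori SNR and its estimate at frame m:

\[
\xi_k = \frac{|X_k|^2}{|D_k|^2},
\qquad
\hat{\xi}_k(m) = \eta\, \frac{|\hat{X}_k(m-1)|^2}{|\hat{D}_k(m-1)|^2} + (1-\eta)\, \max\left[0,\ \hat{\gamma}_k(m) - 1\right],
\quad \eta \in [0,1)
\]

A posteriori SNR and its estimate at frame m:

\[
\gamma_k = \frac{|Y_k|^2}{|D_k|^2},
\qquad
\hat{\gamma}_k(m) = \frac{|Y_k(m)|^2}{|\hat{D}_k(m)|^2}
\]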
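As an illustration of the rule on this slide, here is a minimal Python sketch of the Ephraim–Malah log-MMSE gain together with the decision-directed a priori SNR estimate. It is not the Loquendo front-end code. The function names, the default weight `eta=0.98`, the small floor on `v`, and the hand-written exponential integral `exp1` are all assumptions made so that the example runs with the standard library only.

```python
import math

EULER_GAMMA = 0.5772156649015329


def exp1(x):
    """Exponential integral E1(x) = integral from x to infinity of exp(-t)/t dt, for x > 0."""
    if x <= 0.0:
        raise ValueError("exp1 is only defined here for x > 0")
    if x <= 1.0:
        # Power series: E1(x) = -gamma - ln(x) - sum_{n>=1} (-x)^n / (n * n!)
        total = -EULER_GAMMA - math.log(x)
        term = 1.0
        for n in range(1, 100):
            term *= -x / n  # term == (-x)^n / n!
            total -= term / n
        return total
    # Continued fraction evaluated with the modified Lentz method (x > 1)
    b = x + 1.0
    c = 1e300
    d = 1.0 / b
    h = d
    for i in range(1, 200):
        a = -float(i * i)
        b += 2.0
        d = 1.0 / (a * d + b)
        c = b + a / c
        delta = c * d
        h *= delta
        if abs(delta - 1.0) < 1e-15:
            break
    return h * math.exp(-x)


def decision_directed_xi(x_prev_pow, d_prev_pow, gamma, eta=0.98):
    """A priori SNR estimate from the previous clean-speech estimate and the current a posteriori SNR.

    eta=0.98 is an assumed value; the slides only state that eta lies in [0, 1).
    """
    return eta * x_prev_pow / d_prev_pow + (1.0 - eta) * max(0.0, gamma - 1.0)


def em_log_gain(xi, gamma):
    """Log-MMSE gain G = xi/(1+xi) * exp(0.5 * E1(v)), with v = xi/(1+xi) * gamma."""
    ratio = xi / (1.0 + xi)
    v = max(ratio * gamma, 1e-12)  # assumed floor to keep E1 finite when xi is 0
    return ratio * math.exp(0.5 * exp1(v))


# One frequency bin, one frame: previous |X|^2 = 4, noise power = 1, current |Y|^2 = 3
gamma = 3.0 / 1.0
xi = decision_directed_xi(4.0, 1.0, gamma)
gain = em_log_gain(xi, gamma)
x_hat = gain * math.sqrt(3.0)  # estimated clean magnitude for this bin
```

The gain is applied bin by bin to the observed magnitude, so in a real front end the same two functions would be evaluated for every bin k of every frame m.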

Page 4: Advances in WP1

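Denoising Techniques for Y2 evaluations (2)

Modified Ephraim–Malah MMSE log estimator rule:

\[
\tilde{G}_k\left(\xi_k(m), \gamma_k(m)\right) = G_k\left(\tilde{\xi}_k(m), \tilde{\gamma}_k(m)\right),
\qquad
\xi_k = \frac{|X_k|^2}{|D_k|^2},
\quad
\gamma_k = \frac{|Y_k|^2}{|D_k|^2}
\]

We propose to make the estimation of the a priori and the a posteriori SNR dependent on the noise overestimation factor α(m) and the spectral floor β(m) as follows:

\[
\tilde{\xi}_k(m) = \max\left[ \eta\, \frac{|\hat{X}_k(m-1)|^2}{\alpha(m)\, |\hat{D}_k(m-1)|^2} + (1-\eta)\left( \tilde{\gamma}_k(m) - 1 \right),\ \beta(m) \right]
\]

\[
\tilde{\gamma}_k(m) = \max\left[ \frac{|Y_k(m)|^2}{\alpha(m)\, |\hat{D}_k(m)|^2},\ \beta(m) + 1 \right]
\]

[Figure: two plots showing α(m) and β(m) as functions of SNR(m) in dB. Recoverable axis values: 1.5 and 0.001 with SNR ticks at 0, 10 and 20 dB on the first plot; 1.0 and 0.01 with SNR ticks at 0, 15 and 20 dB on the second plot.]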
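The modified estimates can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the overestimation factor and the spectral floor (called `alpha` and `beta` here) are passed in as plain numbers because the SNR-dependent curves from the figure are not recoverable from the transcript, `eta=0.98` is an assumed smoothing weight, and the function name is hypothetical.

```python
def modified_snr(y_pow, d_pow, x_prev_pow, d_prev_pow, alpha, beta, eta=0.98):
    """Return (xi_mod, gamma_mod) for one bin and one frame.

    y_pow       current noisy power |Y_k(m)|^2
    d_pow       current noise power estimate |D_k(m)|^2
    x_prev_pow  previous clean-speech power estimate |X_k(m-1)|^2
    d_prev_pow  previous noise power estimate |D_k(m-1)|^2
    alpha       noise overestimation factor for this frame (assumed given)
    beta        spectral floor for this frame (assumed given)
    """
    # A posteriori SNR with overestimated noise, floored at beta + 1
    gamma_mod = max(y_pow / (alpha * d_pow), beta + 1.0)
    # Decision-directed a priori SNR with overestimated noise, floored at beta
    xi_mod = max(
        eta * x_prev_pow / (alpha * d_prev_pow) + (1.0 - eta) * (gamma_mod - 1.0),
        beta,
    )
    return xi_mod, gamma_mod


# Speech-dominated bin: neither floor is active
xi_hi, gamma_hi = modified_snr(9.0, 1.0, 4.0, 1.0, alpha=1.5, beta=0.01)

# Noise-dominated bin: both estimates fall back to their floors
xi_lo, gamma_lo = modified_snr(0.5, 1.0, 0.0, 1.0, alpha=1.5, beta=0.01)
```

Flooring the a posteriori SNR at beta + 1 and the a priori SNR at beta keeps the two floors consistent with each other, since the a priori estimate is built from the a posteriori SNR minus one.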

Page 5: Advances in WP1

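Denoising Techniques for Y2 evaluations (3)

The noise spectrum amplitude is obtained by a first-order recursion in conjunction with an energy-based Voice Activity Detector (VAD) as follows:

\[
|\hat{D}_k(m)| =
\begin{cases}
\lambda\, |\hat{D}_k(m-1)| + (1-\lambda)\, |Y_k(m)| & \text{if } |Y_k(m)| - |\hat{D}_k(m-1)| < \delta\, \sigma(m) \text{ and VAD = false} \\
|\hat{D}_k(m-1)| & \text{otherwise}
\end{cases}
\]

where λ controls the update speed of the recursion (0.9), δ controls the allowed dynamics of noise (4.0), and the noise standard deviation σ(m) is estimated as:

\[
\sigma^2(m) = \lambda\, \sigma^2(m-1) + (1-\lambda) \left( |Y_k(m)| - |\hat{D}_k(m)| \right)^2
\]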
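The noise tracking described here can be sketched as follows. The values 0.9 and 4.0 come from the slide. Everything else is an assumption made for illustration: the parameter names `lam` and `delta`, the use of the previous noise estimate and the previous deviation in the update test, and the convention that `vad_speech=True` means the VAD detected speech.

```python
def update_noise(d_prev, y_mag, sigma, vad_speech, lam=0.9, delta=4.0):
    """One step of the first-order noise amplitude recursion for a single bin.

    The estimate is updated only when the VAD reports no speech and the
    observed magnitude stays within delta * sigma of the previous estimate.
    """
    if (not vad_speech) and (y_mag - d_prev) < delta * sigma:
        return lam * d_prev + (1.0 - lam) * y_mag
    return d_prev


def update_sigma2(sigma2_prev, y_mag, d_new, lam=0.9):
    """Recursive estimate of the noise variance around the tracked amplitude."""
    return lam * sigma2_prev + (1.0 - lam) * (y_mag - d_new) ** 2


# Non-speech frame with a small deviation: the estimate moves toward y_mag
d1 = update_noise(d_prev=1.0, y_mag=1.2, sigma=0.5, vad_speech=False)
s1 = update_sigma2(sigma2_prev=0.25, y_mag=1.2, d_new=d1)

# Large jump (possible speech onset missed by the VAD): the estimate is frozen
d2 = update_noise(d_prev=1.0, y_mag=5.0, sigma=0.5, vad_speech=False)
```

The deviation test acts as a second safeguard behind the VAD: a sudden large magnitude is not absorbed into the noise estimate even when the frame was labelled as non-speech.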

Page 6: Advances in WP1

Baseline evaluations of Loquendo ASR on Aurora2 speech databases

Page 7: Advances in WP1

Year 1+2 Performance evaluations

        Test A                    Test B                    Test C                    A-B-C Avg
Models  Clean        Multi        Clean        Multi        Clean        Multi        Clean        Multi
ND      24.4         6.5          22.5         8.9          24.7         9.8          23.7         8.1
WM      16.0 (34.4)  6.1 (6.1)    15.6 (30.7)  7.9 (11.2)   16.7 (32.4)  9.5 (3.0)    16.0 (32.5)  7.5 (7.4)
EMM     14.7 (39.7)  6.0 (7.7)    15.8 (29.8)  8.0 (10.1)   15.2 (38.5)  8.9 (9.2)    15.2 (35.9)  7.4 (8.6)

The testing conditions used in the experiments are the following:
1) No Denoising (ND): Rasta PLP features (RPLP) are used without any preliminary noise reduction.
2) Wiener modified (WM): RPLP with Wiener filtering dependent on global SNR.
3) Ephraim-Malah modified (EMM): RPLP with noise reduction based on the modified Ephraim-Malah spectral attenuation rule.
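The figures in parentheses appear to be relative error reductions, in percent, with respect to the ND row. This reading is an inference rather than something the slides state, but it can be checked on the Test A clean-model column (ND 24.4, WM 16.0, EMM 14.7) with a short sketch; the function name is hypothetical.

```python
def relative_reduction(baseline, value):
    """Relative reduction of an error rate with respect to a baseline, in percent."""
    return 100.0 * (baseline - value) / baseline


wm_gain = relative_reduction(24.4, 16.0)   # slide reports 34.4
emm_gain = relative_reduction(24.4, 14.7)  # slide reports 39.7
```

The recomputed values agree with the slide to within about 0.1, which would be consistent with the slide figures having been computed from unrounded error rates. A negative result means the technique made recognition worse than the baseline.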

Page 8: Advances in WP1

Baseline evaluations of Loquendo ASR on Aurora3 speech databases

Page 9: Advances in WP1

Year 1+2 Performance evaluations

     Ita WM       Ita HM       Spa WM       Spa HM
ND   1.8          53.4         2.7          25.4
WM   1.7 (5.5)    22.5 (57.9)  2.4 (11.1)   10.1 (60.2)
EMM  1.6 (11.1)   17.8 (66.7)  2.3 (14.8)   11.5 (54.7)

Page 10: Advances in WP1

Baseline evaluations of Loquendo ASR on Aurora4 speech databases

Page 11: Advances in WP1

Year 1+2 Performance evaluations

CLEAN Models

     CLEAN        Car          Babble       Restaurant   Street       Airport      Train Station  Noise avg.
ND   14.8         45.7         76.9         70.6         66.0         70.7         67.7           66.3
WM   14.8 (0.0)   33.0 (27.8)  63.4 (17.5)  69.3 (1.8)   56.9 (13.8)  68.1 (3.7)   51.2 (24.4)    57.0 (14.0)
EMM  14.5 (2.02)  29.6 (35.2)  62.9 (18.2)  68.4 (3.1)   54.2 (17.8)  68.4 (3.2)   46.3 (31.6)    55.0 (17.0)

Page 12: Advances in WP1

Year 1+2 Performance evaluations

MULTI Models

     CLEAN        Car          Babble       Restaurant   Street       Airport      Train Station  Noise avg.
ND   15.7         24.8         40.1         41.8         41.9         39.1         42.3           38.3
WM   16.6 (-5.7)  24.1 (2.8)   39.7 (1.0)   43.2 (-3.3)  39.6 (5.5)   39.5 (-1.0)  37.1 (12.3)    37.2 (2.9)
EMM  15.5 (1.3)   24.7 (0.4)   40.4 (-0.7)  44.2 (-5.7)  39.5 (5.7)   40.4 (-3.3)  38.2 (9.7)     37.9 (1.0)

Page 13: Advances in WP1

HEQ + Denoising techniques

Page 14: Advances in WP1

HEQ Evaluation: Revision 1 (1) (Loquendo & UGR)

HEQ (121)
E+12CEP
DE+12DEP
DDE+12DDEP
(39 coefficients)

Problems:
(1) Context dependency (CDF estimation over the whole utterance works best)
(2) High variability in background noise segments
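To make the role of the temporal window concrete, here is a minimal sketch of histogram equalization (HEQ) applied to the trajectory of one feature coefficient. It is not the UGR implementation, which the slides do not describe. The Gaussian reference distribution, the rank-based CDF estimate, the centred sliding window truncated at the utterance edges, and the function name are all assumptions.

```python
from statistics import NormalDist


def heq(values, window=121):
    """Equalize one coefficient trajectory to a standard normal reference.

    For each frame, the empirical CDF of the centre value is estimated from
    its rank inside a window of up to `window` frames, then mapped through
    the inverse CDF of the reference distribution.
    """
    ref = NormalDist()  # assumed reference: zero mean, unit variance
    half = window // 2
    out = []
    for i, center in enumerate(values):
        segment = values[max(0, i - half): i + half + 1]
        below = sum(1 for v in segment if v < center)
        ties = sum(1 for v in segment if v == center)
        # Mid-rank estimate keeps the CDF strictly between 0 and 1
        cdf = (below + 0.5 * ties) / len(segment)
        out.append(ref.inv_cdf(cdf))
    return out


# Toy trajectory of five frames with a five-frame window
equalized = heq([1.0, 2.0, 3.0, 4.0, 5.0], window=5)
```

With a short window the CDF is estimated from few frames, so the mapping depends strongly on the local context; a longer window, or the whole utterance, gives a more stable estimate, which is the trade-off the slide points at.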

Page 15: Advances in WP1

HEQ Integration: Revision 1 (2) (Loquendo & UGR)

Loquendo FE: Denoise (Power Spectrum level)
UGR HEQ: Feature Normalization (Frame -39coeff- level)
Loquendo ASR: Phoneme-based Models

AURORA3 ITA - HM

           SA      WA      WI      WD      WS
Loquendo   46.6%   77.5%   4.8%    7.2%    10.4%
+HEQ121    38.2%   69.6%   4.3%    12.6%   13.5%
HEQ121     37.9%   69.1%   3.5%    13.8%   13.5%
+HEQ1001   46.5%   77.7%   4.0%    7.3%    11.0%

Page 16: Advances in WP1

HEQ Evaluation: Revision 2 (3) (Loquendo & UGR)

HEQ (1573): E+12CEP
HEQ (1573): DE+12DEP
HEQ (1573): DDE+12DDEP
(39 coefficients)

Benefits:
(1) Relations in magnitude and dynamics among coefficients are preserved
(2) More stable CDF estimation, similar to extending the HEQ temporal window

Page 17: Advances in WP1

HEQ Evaluation: Revision 2 (4) (Loquendo & UGR)

AURORA3 ITA - HM

            SA      WA      WI      WD      WS
WM          46.6%   77.5%   4.8%    7.2%    10.4%
HEQ121      47.9%   77.7%   5.1%    6.7%    10.5%
HEQ241      49.7%   79.7%   4.3%    6.6%    9.3%
WM+HEQ121   49.0%   79.2%   5.1%    5.7%    10.0%
WM+HEQ241   50.8%   79.8%   4.6%    6.1%    9.4%

Page 18: Advances in WP1

HEQ for denoising (5) (Loquendo & UGR)

Comparing RPLP / HEQrev1 / HEQrev2 using the same clean and noisy signal

Page 19: Advances in WP1

HEQ for signal level equalization (6) (Loquendo & UGR)

Comparing RPLP / HEQrev1 / HEQrev2 using the same clean signal at normal gain level and at low gain level

Page 20: Advances in WP1

WP1: Workplan

• Selection of suitable benchmark databases (m6)

• Completion of LASR baseline experimentation of Spectral Subtraction (Wiener SNR dependent) (m12)

• Discriminative VAD (training + AURORA3 testing) (m16)

• Experimentation of Spectral Attenuation rule (Ephraim-Malah SNR dependent) (m21)

• Preliminary results on spectral subtraction and HEQ techniques (m24)

• Integration of denoising and normalization techniques (m33)

• Noise estimation and reduction for non-stationary noises (m33)