Advances in WP1

Turin Meeting – 9-10 March 2006

www.loquendo.com

WP1: Environment & Sensor Robustness

T1.2 Noise Independence

• Voice Activity Detection:
– Model-based approach using a neural network (NN)
  “Non-linear estimation of voice activity to improve automatic recognition of noisy speech”,
  Roberto Gemello, Franco Mana and Renato De Mori,
  Eurospeech 2005, Lisboa, September 2005

• Noise Reduction:
– Spectral Subtraction (standard, Wiener and SNR-dependent) and Spectral Attenuation (Ephraim-Malah SA, standard and SNR-dependent)
  “Automatic Speech Recognition With a Modified Ephraim-Malah Rule”,
  Roberto Gemello, Franco Mana and Renato De Mori,
  IEEE Signal Processing Letters, vol. 13, no. 1, January 2006

– Evaluation of HEQ (histogram equalization) for feature normalization

– New techniques for non-stationary noises
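The noise-reduction rules listed above all scale each bin of the noisy power spectrum by a gain that depends on the estimated noise level. The slides do not give the exact formulas, so the following is only a minimal sketch under stated assumptions: textbook power spectral subtraction and a Wiener-style gain, plus a hypothetical linear ramp (`snr_dependent_alpha`, with made-up limits) that subtracts more aggressively at low SNR. It is not the Loquendo implementation, and the Ephraim-Malah rule is not covered.

```python
import math


def snr_dependent_alpha(snr_db, alpha_min=1.0, alpha_max=4.0, lo_db=-5.0, hi_db=20.0):
    """Hypothetical over-subtraction factor: large at low SNR, small at high SNR."""
    if snr_db <= lo_db:
        return alpha_max
    if snr_db >= hi_db:
        return alpha_min
    frac = (snr_db - lo_db) / (hi_db - lo_db)
    return alpha_max - frac * (alpha_max - alpha_min)


def subtraction_gain(noisy_power, noise_power, alpha=1.0, floor=0.01):
    """Amplitude gain of power spectral subtraction for one bin (inputs > 0)."""
    # The floor limits how far the power ratio may drop, to avoid negative power.
    return math.sqrt(max(1.0 - alpha * noise_power / noisy_power, floor))


def wiener_gain(noisy_power, noise_power, alpha=1.0, floor=0.01):
    """Wiener-style amplitude gain for one bin (inputs > 0)."""
    return max(1.0 - alpha * noise_power / noisy_power, floor)


def denoise_frame(noisy_spectrum, noise_spectrum, gain_fn=wiener_gain):
    """Apply an SNR-dependent gain to every bin of one power-spectrum frame."""
    cleaned = []
    for y, n in zip(noisy_spectrum, noise_spectrum):
        snr_db = 10.0 * math.log10(y / n)  # a posteriori SNR of this bin
        g = gain_fn(y, n, alpha=snr_dependent_alpha(snr_db))
        cleaned.append(g * g * y)  # the gain acts on amplitude, so it is squared on power
    return cleaned
```

With `alpha=1`, `subtraction_gain(4.0, 1.0) ** 2 * 4.0` gives back `4.0 - 1.0`, the plain subtraction of the noise power, while the Wiener-style gain attenuates the same bin more strongly.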

Baseline evaluations of Loquendo ASR on Aurora2 speech databases

Baseline Performance evaluations

Performance in terms of Word Accuracy (%), with the relative Error Reduction (%) over the RPLP baseline in parentheses.

CLEAN Models         Test A        Test B        Test C        A-B-C
RPLP                 75.6          77.5          75.3          76.3
+ Wiener SNR Dep.    84.0 (34.4)   84.4 (30.7)   83.3 (32.4)   84.0 (32.5)
+ EphMal SNR Dep.    85.3 (39.7)   84.2 (29.5)   84.8 (34.5)   84.8 (35.9)

MULTI Models         Test A        Test B        Test C        A-B-C         Avg.
RPLP                 93.5          91.1          90.2          91.9          84.1
+ Wiener SNR Dep.    93.9 (6.1)    92.1 (11.2)   90.5 (3.1)    92.5 (7.4)    88.2 (25.8)
+ EphMal SNR Dep.    94.0 (7.7)    92.0 (10.1)   91.1 (9.2)    92.6 (8.6)    88.7 (28.9)

LASR Models          Test A        Test B        Test C        A-B-C
RPLP                 80.9          83.3          77.6          81.2
+ Wiener SNR Dep.    88.1 (37.7)   88.3 (29.9)   86.2 (38.4)   87.8 (35.1)
+ EphMal SNR Dep.    89.0 (42.4)   88.6 (31.7)   87.0 (41.9)   88.4 (38.3)
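The parenthesised figures can be reproduced as the relative reduction of the word error rate with respect to the RPLP row. The sketch below shows that arithmetic. The 2:2:1 weighting in `abc_average` is an assumption, chosen because it reproduces the A-B-C column from the three test-set values; both function names are illustrative.

```python
def error_reduction(baseline_acc, new_acc):
    """Relative word-error-rate reduction in percent, from two word accuracies."""
    baseline_err = 100.0 - baseline_acc
    new_err = 100.0 - new_acc
    return 100.0 * (baseline_err - new_err) / baseline_err


def abc_average(test_a, test_b, test_c):
    """A-B-C average, assuming Test A and Test B weigh twice as much as Test C."""
    return (2.0 * test_a + 2.0 * test_b + test_c) / 5.0


# CLEAN models, Test A: RPLP 75.6 -> + Wiener SNR Dep. 84.0
print(round(error_reduction(75.6, 84.0), 1))    # 34.4
# CLEAN models, RPLP row
print(round(abc_average(75.6, 77.5, 75.3), 1))  # 76.3
```

The Avg. column of the MULTI table is consistent with the plain mean of the CLEAN and MULTI A-B-C values, for example (76.3 + 91.9) / 2 = 84.1 for RPLP.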


Baseline evaluations of Loquendo ASR on Aurora3 speech databases

Baseline Performance evaluations

Performance in terms of Word Accuracy (%), with the relative Error Reduction (%) over the RPLP baseline in parentheses.

Aurora3 Models       Ita WM        Ita HM        Spa WM        Spa HM
RPLP                 98.2          46.6          97.3          74.6
+ Wiener SNR Dep.    98.3 (5.5)    77.5 (59.4)   97.6 (11.1)   89.9 (60.2)
+ EphMal SNR Dep.    98.4 (11.1)   82.2 (66.7)   97.7 (14.8)   88.5 (54.7)

LASR Models          Ita WM        Ita HM        Spa WM        Spa HM
RPLP                 -             56.4          -             79.4
+ Wiener SNR Dep.    -             74.6 (41.7)   -             84.9 (26.6)
+ EphMal SNR Dep.    -             75.5 (43.8)   -             86.2 (33.0)


speech databases

(to be done)

HEQ Evaluation (1) (Loquendo & UGR)

The HEQ algorithm amplifies the coefficient (the energy, in this case) in background-noise audio segments.
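One plausible way to see this effect is with a rank-based histogram equalization that maps a coefficient track onto a fixed reference distribution. The sketch below is not the UGR implementation: the reference distribution (standard normal), the function name `heq` and the log-energy values are all assumptions made for illustration. When the equalized segment contains only background noise, its tiny fluctuations are stretched to the full spread of the reference.

```python
from statistics import NormalDist, pstdev


def heq(values):
    """Map each value to the standard-normal quantile of its rank in the segment."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ref = NormalDist()  # assumed reference distribution: N(0, 1)
    out = [0.0] * n
    for rank, i in enumerate(order):
        out[i] = ref.inv_cdf((rank + 0.5) / n)
    return out


# Hypothetical log-energy track of a noise-only segment: almost flat.
noise_only = [-5.00, -5.02, -4.99, -5.01, -4.98, -5.03, -5.04, -4.97]
equalized = heq(noise_only)

# The spread after equalization is far larger than before.
print(pstdev(noise_only), pstdev(equalized))
```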

HEQ Evaluation (2) (Loquendo & UGR)

The HEQ algorithm introduces a context-dependent normalization.

This could be a drawback for open-vocabulary recognizers, where phoneme-based acoustic models are used.
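The context dependence can be illustrated with a sliding-window variant of histogram equalization, in which each frame is ranked against its neighbours. This is a minimal sketch under assumptions: the window handling, the standard-normal reference and the name `heq_windowed` are illustrative and are not taken from the UGR algorithm. The same coefficient value is mapped to opposite ends of the reference distribution depending on the surrounding frames.

```python
from statistics import NormalDist


def heq_windowed(values, half_window):
    """Equalize each frame against the frames in a window centred on it."""
    ref = NormalDist()  # assumed reference distribution: N(0, 1)
    out = []
    for t, x in enumerate(values):
        lo = max(0, t - half_window)
        hi = min(len(values), t + half_window + 1)
        window = values[lo:hi]
        rank = sum(1 for v in window if v < x)  # frames in the window below x
        out.append(ref.inv_cdf((rank + 0.5) / len(window)))
    return out


# The same coefficient value (2.0) in two hypothetical contexts.
quiet_context = [0.0, 0.5, 2.0, 0.2, 0.1]
loud_context = [4.0, 3.5, 2.0, 5.0, 4.5]

in_quiet = heq_windowed(quiet_context, 2)[2]
in_loud = heq_windowed(loud_context, 2)[2]
print(in_quiet, in_loud)  # positive in the quiet context, negative in the loud one
```

A phoneme-based model sees the normalized value, so the same acoustic event can land in different regions of the feature space depending on what surrounds it.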

HEQ Integration (3) (Loquendo & UGR)

Processing chain:

Loquendo FE: Denoise (power spectrum level)
-> UGR HEQ: Feature Normalization (frame level, 39 coefficients)
-> Loquendo ASR: Phoneme-based Models

AURORA3 ITA - HM     SA        WA        WI       WD        WS
Loquendo             46.6%     77.5%     4.8%     7.2%      10.4%
+HEQ121              38.2%     69.6%     4.3%     12.6%     13.5%
+HEQ1001             46.5%     77.7%     4.0%     7.3%      11.0%
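The columns of this table are not expanded on the slide. Reading WI, WD and WS as word insertion, deletion and substitution rates (an assumption, as is reading SA as sentence accuracy), the WA column follows from the other three, as this short check suggests. The function name `word_accuracy` is illustrative.

```python
def word_accuracy(insertions, deletions, substitutions):
    """Word accuracy in percent from the three error rates, each in percent."""
    return 100.0 - insertions - deletions - substitutions


# (WI, WD, WS) rows copied from the table above.
rows = {
    "Loquendo": (4.8, 7.2, 10.4),
    "+HEQ121": (4.3, 12.6, 13.5),
    "+HEQ1001": (4.0, 7.3, 11.0),
}

for name, (wi, wd, ws) in rows.items():
    print(name, round(word_accuracy(wi, wd, ws), 1))
```

The two HEQ rows match the WA column exactly. The Loquendo row comes out at 77.6 against the reported 77.5, a gap small enough to come from rounding of the individual rates.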

WP1: Workplan

• Selection of suitable benchmark databases (m6)

• Completion of LASR baseline experimentation of Spectral Subtraction (Wiener SNR-dependent) (m12)

• Discriminative VAD (training + AURORA3 testing) (m16)

• Experimentation of Spectral Attenuation rule (Ephraim-Malah SNR-dependent) (m21)

• Integration of denoising and normalization techniques (m33)

• Noise estimation and reduction for non-stationary noises (m33)
