Advances in WP1
Turin Meeting – 9-10 March 2006
www.loquendo.com
WP1: Environment & Sensor Robustness
T1.2 Noise Independence
• Voice Activity Detection:
– Model-based approach using NN
“Non-linear estimation of voice activity to improve
automatic recognition of noisy speech”,
Roberto Gemello, Franco Mana and Renato De Mori
Eurospeech 2005, Lisboa, September 2005
• Noise Reduction:
– Spectral Subtraction (standard, Wiener and SNR dependent) and Spectral Attenuation (Ephraim-Malah SA standard and SNR dependent)
“Automatic Speech Recognition
With a Modified Ephraim-Malah Rule”,
Roberto Gemello, Franco Mana and Renato De Mori
IEEE Signal Processing Letters, VOL 13, NO 1, January 2006
– Evaluation of HEQ for feature normalization
– New techniques for non-stationary noises
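As a rough illustration of the spectral subtraction family listed above, the following is a minimal sketch operating on one frame of power-spectrum bins. It is not the Loquendo front-end: the linear oversubtraction schedule, its parameters (alpha_min, alpha_max, snr_lo, snr_hi) and the spectral floor are hypothetical choices made for the example, and the Ephraim-Malah attenuation rule is not covered.

```python
import math

def wiener_gain(noisy_power, noise_power, floor=0.01):
    """Per-bin Wiener-style gain xi / (1 + xi), with xi estimated as Y/N - 1.

    This simplifies to 1 - N/Y; the result is clipped at `floor` (assumed value).
    """
    gains = []
    for y, n in zip(noisy_power, noise_power):
        g = 1.0 - n / y if y > 0.0 else 0.0
        gains.append(max(g, floor))
    return gains

def oversubtraction_factor(snr_db, alpha_min=1.0, alpha_max=4.0,
                           snr_lo=-5.0, snr_hi=20.0):
    """Hypothetical SNR-dependent factor: subtract more noise at low SNR."""
    snr_db = min(max(snr_db, snr_lo), snr_hi)   # clamp to [snr_lo, snr_hi]
    t = (snr_db - snr_lo) / (snr_hi - snr_lo)   # 0 at snr_lo, 1 at snr_hi
    return alpha_max - t * (alpha_max - alpha_min)

def spectral_subtraction(noisy_power, noise_power, floor=0.01):
    """SNR-dependent power spectral subtraction with a spectral floor."""
    cleaned = []
    for y, n in zip(noisy_power, noise_power):
        if y <= 0.0:
            cleaned.append(0.0)
            continue
        snr_db = 10.0 * math.log10(y / n) if n > 0.0 else float("inf")
        alpha = oversubtraction_factor(snr_db)
        # Never let a bin fall below a fraction of its noisy power.
        cleaned.append(max(y - alpha * n, floor * y))
    return cleaned
```

In this sketch a bin at 20 dB SNR or above has exactly the noise estimate subtracted, while a bin at 0 dB is oversubtracted and ends up on the floor.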
Baseline evaluations of Loquendo ASR on Aurora2
speech databases
Baseline Performance evaluations
Performance in terms of Word Accuracy, with relative Error Reduction in parentheses
CLEAN Models        Test A        Test B        Test C        A-B-C
RPLP                75.6          77.5          75.3          76.3
+ Wiener SNR Dep.   84.0 (34.4)   84.4 (30.7)   83.3 (32.4)   84.0 (32.5)
+ EphMal SNR Dep.   85.3 (39.7)   84.2 (29.5)   84.8 (34.5)   84.8 (35.9)

MULTI Models        Test A        Test B        Test C        A-B-C         Avg.
RPLP                93.5          91.1          90.2          91.9          84.1
+ Wiener SNR Dep.   93.9 (6.1)    92.1 (11.2)   90.5 (3.1)    92.5 (7.4)    88.2 (25.8)
+ EphMal SNR Dep.   94.0 (7.7)    92.0 (10.1)   91.1 (9.2)    92.6 (8.6)    88.7 (28.9)

LASR Models         Test A        Test B        Test C        A-B-C
RPLP                80.9          83.3          77.6          81.2
+ Wiener SNR Dep.   88.1 (37.7)   88.3 (29.9)   86.2 (38.4)   87.8 (35.1)
+ EphMal SNR Dep.   89.0 (42.4)   88.6 (31.7)   87.0 (41.9)   88.4 (38.3)
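The figures in parentheses are consistent with a relative reduction of the word error (100 minus word accuracy) with respect to the RPLP baseline. A minimal sketch of that calculation follows; the function name is hypothetical, and because the slides presumably computed the reductions from unrounded accuracies, some entries differ from this recomputation in the last digit.

```python
def error_reduction(wa_baseline, wa_system):
    """Relative word-error reduction (%) from two word accuracies (%)."""
    baseline_error = 100.0 - wa_baseline
    if baseline_error <= 0.0:
        raise ValueError("baseline has no errors to reduce")
    return 100.0 * (wa_system - wa_baseline) / baseline_error

# CLEAN models, Test A: RPLP 75.6 -> + Wiener SNR Dep. 84.0
print(round(error_reduction(75.6, 84.0), 1))  # 34.4
```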
Baseline evaluations of Loquendo ASR on Aurora3
speech databases
Baseline Performance evaluations
Performance in terms of Word Accuracy, with relative Error Reduction in parentheses
Aurora3 Models      Ita WM        Ita HM        Spa WM        Spa HM
RPLP                98.2          46.6          97.3          74.6
+ Wiener SNR dep.   98.3 (5.5)    77.5 (59.4)   97.6 (11.1)   89.9 (60.2)
+ EphMal SNR dep.   98.4 (11.1)   82.2 (66.7)   97.7 (14.8)   88.5 (54.7)

LASR Models         Ita WM        Ita HM        Spa WM        Spa HM
RPLP                -             56.4          -             79.4
+ Wiener SNR dep.   -             74.6 (41.7)   -             84.9 (26.6)
+ EphMal SNR dep.   -             75.5 (43.8)   -             86.2 (33.0)
Baseline evaluations of Loquendo ASR on Aurora4
speech databases
(to be done)
HEQ Evaluation (1) (Loquendo & UGR)
The HEQ algorithm amplifies the coefficient (energy in this case) in background-noise audio segments.
HEQ Evaluation (2) (Loquendo & UGR)
The HEQ algorithm introduces a context-dependent normalization.
This could be a drawback for open-vocabulary recognizers, where phoneme-based acoustic models are used.
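Both observations can be illustrated with a generic histogram equalization sketch. This is not the UGR implementation: it assumes a rank-based mapping of one coefficient onto a standard Gaussian reference, and the function names and the window parameter are hypothetical.

```python
from statistics import NormalDist

REFERENCE = NormalDist()  # assumed reference: zero mean, unit variance

def heq(values):
    """Map each value to the Gaussian quantile of its rank within `values`."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    out = [0.0] * n
    for rank, i in enumerate(order):
        out[i] = REFERENCE.inv_cdf((rank + 0.5) / n)
    return out

def heq_windowed(values, window):
    """Equalize each frame using only the frames in a window centred on it."""
    half = window // 2
    out = []
    for t, v in enumerate(values):
        segment = values[max(0, t - half):t + half + 1]
        rank = sum(1 for s in segment if s < v)  # frames strictly below v
        out.append(REFERENCE.inv_cdf((rank + 0.5) / len(segment)))
    return out
```

Because only ranks matter, heq([0.01, 0.02, 0.03]) and heq([10.0, 20.0, 30.0]) give the same output: small fluctuations in a noise-only segment are stretched to the full reference range. In the windowed variant, the same input value is mapped differently depending on the surrounding frames, which is the context dependence noted above.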
HEQ Integration (3) (Loquendo & UGR)
Loquendo FE: Denoise (Power Spectrum level)
-> UGR HEQ: Feature Normalization (Frame level, 39 coefficients)
-> Loquendo ASR: Phoneme-based Models
AURORA3 ITA - HM
            SA      WA      WI      WD      WS
Loquendo    46.6%   77.5%   4.8%    7.2%    10.4%
+HEQ121     38.2%   69.6%   4.3%    12.6%   13.5%
+HEQ1001    46.5%   77.7%   4.0%    7.3%    11.0%
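The columns of this table appear to be related by the usual word accuracy definition. The sketch below assumes, without confirmation from the slide, that WA is word accuracy and that WI, WD and WS are insertion, deletion and substitution rates in percent of the reference words; the function name is hypothetical.

```python
def word_accuracy(wi, wd, ws):
    """Word accuracy (%) from insertion, deletion and substitution rates (%)."""
    return 100.0 - (wi + wd + ws)

# Rows of the table above, as (WI, WD, WS)
rows = {
    "Loquendo": (4.8, 7.2, 10.4),
    "+HEQ121": (4.3, 12.6, 13.5),
    "+HEQ1001": (4.0, 7.3, 11.0),
}
for name, rates in rows.items():
    print(name, round(word_accuracy(*rates), 1))
```

Under these assumptions the recomputed values agree with the WA column to within rounding of the individual rates.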
WP1: Workplan
• Selection of suitable benchmark databases (m6)
• Completion of LASR baseline experimentation with Spectral Subtraction
(Wiener SNR dependent) (m12)
• Discriminative VAD (training + AURORA3 testing) (m16)
• Experimentation with Spectral Attenuation rule
(Ephraim-Malah SNR dependent) (m21)
• Integration of denoising and normalization techniques (m33)
• Noise estimation and reduction for non-stationary noises (m33)