HIWIRE MEETING Athens, November 3-4, 2005 José C. Segura, Ángel de la Torre.


2 HIWIRE Meeting – Athens, 3 - 4 November, 2005

Schedule

HIWIRE database evaluations

Non-linear feature normalization
    ECDF segmental implementation
    Parametric equalization

Robust VAD
    Bispectrum-based VAD

Model-based feature compensation
    VTS results on AURORA4
    Including uncertainty caused by noise

HIWIRE database evaluations

PARAMETERS: MFCC_0_D_A_Z (39 components)

MODELS:
    TIMIT: 46 phone models / 3 states / 128 Gaussians per state (17,664 Gaussians in total)
    WSJ16k: 16,825 triphones / 3,608 tied states / 6 Gaussians per state (21,648 Gaussians in total)
    WSJ16kFon: 40 phone models / 3 states / 128 Gaussians per state (15,360 Gaussians in total)

ADAPTATION: MLLR: 32 regression classes / 50 adaptation utterances

GRAMMAR: LORIA & Word-Loop

MODIFICATIONS: Some transcriptions have been modified to match the grammar definition

Transcription modifications

    # awk script: rewrites each transcription line to match the grammar vocabulary
    BEGIN { lista = LISTA; nfrase = 0 }

    {
        linea = $0
        gsub("-", "_", linea)                          # hyphens become underscores
        gsub("Due_to_", "Due_to ", linea)              # keep "Due_to" as one token
        gsub("Mayday_Mayday", "Mayday Mayday", linea)  # split repeated call words
        gsub("Pan_Pan", "Pan Pan", linea)
        gsub("three hundred twenty", "three_hundred_twenty", linea)
        gsub("one hundred sixty", "one_hundred_sixty", linea)
        printf("%s\n", tolower(linea))                 # output in lower case
        nfrase = nfrase + 1                            # sentence counter
    }
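For readers who do not use awk, the same substitutions can be sketched in Python. This is only an illustrative port under the assumption that every pattern is a literal string; the names `RULES` and `normalize` are hypothetical and do not come from the slides.

```python
# Ordered (old, new) pairs mirroring the gsub() calls of the awk script.
RULES = [
    ("-", "_"),
    ("Due_to_", "Due_to "),
    ("Mayday_Mayday", "Mayday Mayday"),
    ("Pan_Pan", "Pan Pan"),
    ("three hundred twenty", "three_hundred_twenty"),
    ("one hundred sixty", "one_hundred_sixty"),
]


def normalize(line):
    """Apply the substitutions in order, then lower-case the line."""
    for old, new in RULES:
        line = line.replace(old, new)
    return line.lower()


if __name__ == "__main__":
    print(normalize("Mayday-Mayday Due-to-engine fire"))
```

The order of the rules matters: the hyphen replacement runs first, so the later rules see underscores.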

HIWIRE database results

RESULTS WITHOUT ADAPTATION (WER, %)

MODELS            French   Greek Italian Spanish   World     Avg
TIMIT               7.30    9.93   11.87    9.27    6.26    8.93
WSJ16k             14.70   25.11   20.66   18.01   14.32   18.56
WSJ16kFon          10.43   19.51   16.52   15.33    8.72   14.10
TIMIT_WL           26.79   33.77   35.61   30.88   22.53   29.92

RESULTS WITH MLLR (WER, %)

MODELS            French   Greek Italian Spanish   World     Avg
TIMIT+MLLR          3.13    2.51    3.80    2.99    3.16    3.12
WSJ16k+MLLR         3.85    4.48    5.94    4.53    4.00    4.56
WSJ16kFon+MLLR      3.50    2.98    7.00    5.55    3.94    4.59
TIMIT_WL+MLLR      11.12    9.43   14.61   13.14   12.20   12.10
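To make the effect of MLLR easier to read off, the relative WER reduction can be computed from the Avg columns of the two tables. The snippet below is a small sketch; the dictionary and function names are illustrative and the values are copied from the tables.

```python
# Average WER (%) per model set, copied from the Avg columns of the tables.
WER_BASE = {"TIMIT": 8.93, "WSJ16k": 18.56, "WSJ16kFon": 14.10, "TIMIT_WL": 29.92}
WER_MLLR = {"TIMIT": 3.12, "WSJ16k": 4.56, "WSJ16kFon": 4.59, "TIMIT_WL": 12.10}


def relative_reduction(before, after):
    """Relative WER reduction in percent."""
    return 100.0 * (before - after) / before


if __name__ == "__main__":
    for model, base in WER_BASE.items():
        gain = relative_reduction(base, WER_MLLR[model])
        print(f"{model}: {gain:.1f} % relative WER reduction with MLLR")
```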


ECDF segmental implementation

Provided LOQUENDO with a reference C implementation of the segmental Gaussian transformation, to be tested within the LOQUENDO recognizer

Current work: nonlinear feature transformation with a clean reference, to avoid the problem of system retraining
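As a rough illustration of what a segmental ECDF-based Gaussian transformation does, the sketch below maps each feature value to a standard-normal quantile using the empirical CDF of a sliding window around it. It is not the reference C implementation mentioned above; the window length, the mid-rank CDF estimate, and the function name are assumptions made for the example.

```python
from statistics import NormalDist


def segmental_gaussianize(x, half_window=50):
    """Gaussianize one feature track with a sliding-window empirical CDF.

    x is a list of values of a single coefficient over time. The window
    is truncated at the utterance boundaries (an assumption of this sketch).
    """
    std_normal = NormalDist()
    n = len(x)
    out = []
    for t in range(n):
        lo = max(0, t - half_window)
        hi = min(n, t + half_window + 1)
        window = x[lo:hi]
        # Rank of the current value inside the window (1-based).
        rank = sum(1 for v in window if v < x[t]) + 1
        # Mid-rank estimate keeps the probability strictly inside (0, 1).
        p = (rank - 0.5) / len(window)
        out.append(std_normal.inv_cdf(p))
    return out


if __name__ == "__main__":
    print(segmental_gaussianize([1.0, 2.0, 3.0, 4.0, 5.0], half_window=10))
```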

Parametric Equalization (1)

HEQ limitations: influence of the relative amount of silence in utterances

With a parametric model, a more robust equalization can be obtained

"Parametric nonlinear feature equalization for robust speech recognition" (submitted to ICASSP'06)

Parametric Equalization (2)

CLASS-DEPENDENT LINEAR EQUALIZATION

SOFT DECISION VAD (two-class Gaussian classifier on C0)

NONLINEAR INTERPOLATION
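The three ingredients named on this slide can be combined as in the scalar sketch below: a two-class Gaussian classifier on C0 gives a soft speech/silence posterior, each class has its own linear (mean and variance) equalization towards a clean reference, and the two outputs are interpolated with the posterior. The slide's own formulas are not available in this transcript, so the exact form, the equal priors, and all function and parameter names here are assumptions.

```python
import math


def gauss_pdf(v, mean, std):
    """Density of a univariate Gaussian."""
    z = (v - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def speech_posterior(c0, sil, sp, prior_sp=0.5):
    """Soft VAD: posterior of the speech class given C0.

    sil and sp are (mean, std) pairs of the silence and speech classes.
    """
    p_sp = prior_sp * gauss_pdf(c0, *sp)
    p_sil = (1.0 - prior_sp) * gauss_pdf(c0, *sil)
    return p_sp / (p_sp + p_sil)


def linear_eq(y, src, ref):
    """Linear map that moves (mean, std) from src to ref."""
    return ref[0] + (y - src[0]) * ref[1] / src[1]


def peq(y, post_sp, sil_src, sil_ref, sp_src, sp_ref):
    """Posterior-weighted interpolation of the two class-dependent maps."""
    return ((1.0 - post_sp) * linear_eq(y, sil_src, sil_ref)
            + post_sp * linear_eq(y, sp_src, sp_ref))


if __name__ == "__main__":
    post = speech_posterior(0.0, sil=(-1.0, 1.0), sp=(1.0, 1.0))
    print(post, peq(2.0, post, (0.0, 1.0), (0.5, 1.0), (5.0, 2.0), (6.0, 1.5)))
```

Because the interpolation weight depends on the input, the overall mapping is nonlinear even though each class-dependent map is linear.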


Parametric Equalization (3)


Parametric Equalization (4)

In comparison with HEQ, PEQ transformations are smoother

For C0 a monotonic transformation is obtained

For other coefficients, the interpolated transformation is not monotonic

Parametric Equalization (5)

BASE: MFCC_0_D_A_Z (39 components)

HEQ:
    Quantile-based CDF transformation
    Clean reference
    Implemented over MFCC_0
    CMS and regressions computed after HEQ

AFE: standard implementation

PEQ:
    Clean reference
    Implemented over MFCC_0
    CMS and regressions computed after PEQ
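A quantile-based CDF transformation with a clean reference, as listed for HEQ, can be sketched as follows: quantiles of the utterance are matched to precomputed quantiles of clean data, with linear interpolation in between. The number of quantiles and the helper names are assumptions for illustration, not the configuration used in the experiments.

```python
from bisect import bisect_right


def quantiles(values, n_q):
    """n_q evenly spaced quantiles (minimum to maximum), n_q >= 2."""
    s = sorted(values)
    out = []
    for i in range(n_q):
        pos = i * (len(s) - 1) / (n_q - 1)
        lo = int(pos)
        hi = min(lo + 1, len(s) - 1)
        out.append(s[lo] + (pos - lo) * (s[hi] - s[lo]))
    return out


def heq(utterance, ref_quantiles):
    """Map each value so that utterance quantiles land on the reference ones."""
    n_q = len(ref_quantiles)
    src = quantiles(utterance, n_q)
    out = []
    for y in utterance:
        # Index of the source quantile interval that contains y.
        k = min(max(bisect_right(src, y) - 1, 0), n_q - 2)
        span = src[k + 1] - src[k]
        w = (y - src[k]) / span if span > 0 else 0.0
        out.append(ref_quantiles[k] + w * (ref_quantiles[k + 1] - ref_quantiles[k]))
    return out


if __name__ == "__main__":
    clean = [float(v) for v in range(9)]
    distorted = [2.0 * v + 3.0 for v in clean]
    print(heq(distorted, quantiles(clean, 5)))
```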


Parametric Equalization (6)

Current work

Development of an on-line version

Relax the diagonal covariance assumption

Investigate the normalization of dynamic features

Use a more detailed model of speech frames (i.e. more than one Gaussian)


Bispectrum-based VAD (1)

Motivations:
    Ability of higher-order statistics to detect signals in noise
    Polyspectra methods rely on a priori knowledge of the input processes

Issues to be addressed:
    Computationally expensive
    Variance of the bispectrum estimators is much higher than that of power spectral estimators for identical data record size

Solution: integrated bispectrum
    J. K. Tugnait, "Detection of non-Gaussian signals using integrated polyspectrum," IEEE Trans. on Signal Processing, vol. 42, no. 11, pp. 3137–3149, 1994.
    Computationally efficient and reduced-variance statistical test based on the integrated polyspectra
    Detection of an unknown random, stationary, non-Gaussian signal in Gaussian noise

Bispectrum-based VAD (2)

Integrated bispectrum:
    Defined as a cross spectrum between the signal and its square; therefore, it is a function of a single frequency variable

Benefits:
    Its computation as a cross spectrum leads to significant computational savings
    The variance of the estimator is of the same order as that of the power spectrum estimator

Properties:
    The bispectrum of a Gaussian process is identically zero, so its integrated bispectrum is zero as well
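The definition as a cross spectrum suggests a simple block-averaged estimator, sketched below with a plain DFT so that it has no dependencies. The mean removal of the squared signal, the normalization by the block length, and the choice of which factor is conjugated are assumptions of this sketch; `n_b` and `k_b` stand for the block size and number of blocks and are illustrative parameter names.

```python
import cmath


def dft(frame):
    """Plain O(N^2) discrete Fourier transform of a real or complex list."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def integrated_bispectrum(x, n_b, k_b):
    """Block-averaged cross spectrum between x(t) and x(t)^2 - mean(x^2).

    Uses k_b consecutive blocks of n_b samples; len(x) must be >= n_b * k_b.
    """
    acc = [0j] * n_b
    for b in range(k_b):
        seg = x[b * n_b:(b + 1) * n_b]
        mean_sq = sum(v * v for v in seg) / n_b
        sq = [v * v - mean_sq for v in seg]
        spec_x = dft(seg)
        spec_sq = dft(sq)
        for k in range(n_b):
            acc[k] += spec_x[k] * spec_sq[k].conjugate() / n_b
    return [a / k_b for a in acc]


if __name__ == "__main__":
    skewed = [2.0, -1.0, -1.0, 2.0, -1.0, -1.0]
    print([round(abs(v), 6) for v in integrated_bispectrum(skewed, 6, 1)])
```

The result has one value per frequency bin, which is what makes it cheaper to handle than the full two-dimensional bispectrum.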

Bispectrum-based VAD (3)

Two alternatives explored for formulating the decision rule:

Estimation by block averaging:

$$ L(\hat{y}_l) = \frac{p_{y_l|H_1}(\hat{y}_l \mid H_1)}{p_{y_l|H_0}(\hat{y}_l \mid H_0)} \; \underset{H_0}{\overset{H_1}{\gtrless}} \; \frac{P(H_0)}{P(H_1)} $$

MO-LRT: given a set of N = 2m+1 consecutive observations:

$$ L_{l,N}(\hat{y}_{l-m}, \ldots, \hat{y}_l, \ldots, \hat{y}_{l+m}) = \prod_{k=l-m}^{l+m} \frac{p_{y_k|H_1}(\hat{y}_k \mid H_1)}{p_{y_k|H_0}(\hat{y}_k \mid H_0)} $$
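In the log domain, the multiple-observation test over N = 2m+1 consecutive frames becomes a sliding sum of per-frame log-likelihood ratios compared with a threshold. The sketch below assumes the per-frame log ratios are already available and truncates the window at the signal boundaries; both choices, and the function name, are assumptions for illustration.

```python
def mo_lrt(log_lr, m, log_threshold=0.0):
    """Multiple-observation LRT decisions.

    log_lr: per-frame log-likelihood ratios log p(y|H1) - log p(y|H0).
    m: half window, so up to 2m+1 frames contribute to each decision.
    Returns a list of booleans (True means speech, i.e. H1 accepted).
    """
    n = len(log_lr)
    decisions = []
    for l in range(n):
        lo = max(0, l - m)
        hi = min(n, l + m + 1)
        decisions.append(sum(log_lr[lo:hi]) > log_threshold)
    return decisions


if __name__ == "__main__":
    print(mo_lrt([-1.0, -1.0, 5.0, -1.0, -1.0], m=1))
```

With m = 0 the rule reduces to the single-observation test; larger m trades temporal resolution for more reliable decisions.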


Bispectrum-based VAD (4)

Likelihoods

Variances


Bispectrum-based VAD results (1)

[Figure: pause hit rate (HR0, axis 0 to 100) versus false alarm rate (FAR0, axis 0 to 60) for G.729, AMR1, AMR2, AFE (noise estimation), AFE (frame-dropping), Li, Marzinzik, Sohn, Woo, and BA-IBI (KB = 1, 3 and 5; NB = 256)]

Bispectrum-based VAD results (2)

[Figure: pause hit rate (HR0, axis 0 to 100) versus false alarm rate (FAR0, axis 0 to 60) for G.729, AMR1, AMR2, AFE (noise estimation), AFE (frame-dropping), Li, Marzinzik, Sohn, Woo, and MO-LRT IBI (KB = 1, NB = 256; m = 2, 5 and 7)]

Bispectrum-based VAD results (3)


Schedule

Model-based feature compensation

VTS: results on AURORA4
    VTS formulation
    VTS vs non-linear feature normalization procedures
    VTS results on AURORA 4

Including uncertainty caused by noise
    Including uncertainty in noise compensation
    Wiener filtering + uncertainty: results on Aurora 2
    Wiener filtering + uncertainty: results on Aurora 4
    VTS + uncertainty: formulation
    Numerical integration of probabilities: formulation

VTS formulation

VTS: Vector Taylor Series approach to remove additive (and channel) noise

References:
    P. J. Moreno, "Speech recognition in noisy environments," Ph.D. Thesis, Carnegie Mellon University, Pittsburgh, Pennsylvania, Apr. 1996.
    A. de la Torre, "Técnicas de mejora de la representación en los sistemas de reconocimiento automático del habla," Ph.D. Thesis, University of Granada, Spain, Apr. 1999.


VTS formulation

VTS provides an estimation of the clean speech in a statistical framework:

Log-FBO domain, assumed additive noise:

Effect of noise described using the “correction function” g():
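The equations of this slide are not preserved in the transcript. For orientation, one common way to write additive noise in the log filter-bank (log-FBO) domain is given below; the argument and sign convention chosen for g() are an assumption and may differ from the slide.

```latex
y = \log\left(e^{x} + e^{n}\right) = x + g(n - x),
\qquad
g(d) = \log\left(1 + e^{d}\right)
```

Here x, n and y denote the clean-speech, noise and noisy-speech log filter-bank outputs of one channel.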

VTS formulation

Auxiliary functions f() and h(): 1st and 2nd derivatives:

VTS provides an estimation of the noisy-speech Gaussian given the clean-speech and noise Gaussians:

Noisy-speech Gaussian obtained with the expected values:
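As an illustration of how a noisy-speech Gaussian can be derived from the clean-speech and noise Gaussians, the scalar sketch below uses a first-order Taylor expansion around the means. It assumes the convention y = x + g(n - x) with g(d) = log(1 + e^d), and it ignores the second-derivative term h() mentioned on the slide, so it is a simplification rather than the slide's formulas.

```python
import math


def g(d):
    """Correction term log(1 + e^d), with d = n - x (assumed convention)."""
    return math.log1p(math.exp(d))


def f(d):
    """First derivative of g: the logistic function."""
    return 1.0 / (1.0 + math.exp(-d))


def vts_noisy_gaussian(mu_x, var_x, mu_n, var_n):
    """First-order approximation of the noisy-speech mean and variance."""
    d = mu_n - mu_x
    w = f(d)                      # dy/dn at the expansion point; dy/dx = 1 - w
    mu_y = mu_x + g(d)
    var_y = (1.0 - w) ** 2 * var_x + w ** 2 * var_n
    return mu_y, var_y


if __name__ == "__main__":
    print(vts_noisy_gaussian(10.0, 4.0, 5.0, 1.0))
```

When the noise is far below the speech level the result tends to the clean-speech Gaussian, and when it is far above it tends to the noise Gaussian.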


VTS formulation

Noisy-speech Gaussian: formulas:

Models for noise and clean speech:


VTS formulation

Model for clean speech provides the model for noisy speech, and also P(k|y) (posterior probability of each Gaussian):

Estimation of clean speech:
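The chain from clean-speech mixture to noisy-speech mixture, posteriors P(k|y) and clean-speech estimate can be sketched for one scalar channel as below. The estimate used here subtracts the posterior-weighted correction g() evaluated at the Gaussian means, which is a common zeroth-order approximation; the slide's exact estimator is not preserved, so this form, the first-order noisy Gaussians, and all names are assumptions.

```python
import math


def g(d):
    """Correction term log(1 + e^d), with d = n - x (assumed convention)."""
    return math.log1p(math.exp(d))


def f(d):
    """First derivative of g (logistic function)."""
    return 1.0 / (1.0 + math.exp(-d))


def gauss_pdf(v, mean, var):
    return math.exp(-0.5 * (v - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def vts_estimate(y, weights, means_x, vars_x, mu_n, var_n):
    """Return (x_hat, posteriors) for one noisy observation y."""
    likes, corrections = [], []
    for w_k, mu_k, var_k in zip(weights, means_x, vars_x):
        d = mu_n - mu_k
        a = f(d)
        mu_y = mu_k + g(d)                          # noisy-speech mean
        var_y = (1.0 - a) ** 2 * var_k + a ** 2 * var_n
        likes.append(w_k * gauss_pdf(y, mu_y, var_y))
        corrections.append(g(d))
    total = sum(likes)
    post = [like / total for like in likes]         # P(k | y)
    x_hat = y - sum(p * c for p, c in zip(post, corrections))
    return x_hat, post


if __name__ == "__main__":
    print(vts_estimate(10.0, [0.5, 0.5], [0.0, 10.0], [1.0, 1.0], 5.0, 1.0))
```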

VTS vs non-linear feature normalization

VTS:
    Statistical framework
    Model for noise in log-FBO domain: 1 Gaussian PDF
    Model for clean speech in log-FBO domain: Gaussian mixture
    Noise assumed to be additive in FBO domain
    Accurate description of noise process
    ACCURATE COMPENSATION

Non-linear feature normalization:
    No a priori assumption
    Component-by-component
    MORE FLEXIBLE, LESS ACCURATE

VTS results on AURORA 4

Experiment                Train mode  Test size  WER exp. 01-07  WER exp. 08-14  WER exp. 01-14
Baseline                  Clean       166        40.53 %         50.60 %         45.57 %
HEQ                       Clean       166        32.19 %         42.74 %         37.47 %
Parametric non-linear EQ  Clean       166        28.78 %         34.27 %         31.53 %
VTS                       Clean       166        29.46 %         37.22 %         33.34 %
VTS (noise known)         Clean       166        26.97 %         32.25 %         26.97 %
AFE                       Clean       166        27.57 %         34.99 %         31.28 %
Baseline                  Multi       166        24.58 %         29.88 %         27.23 %
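In most rows the "WER exp. 01-14" figure equals the mean of the 01-07 and 08-14 figures. The sketch below checks this for every row of the table; it is only a consistency check on the transcribed numbers, and the interpretation of the last column as a plain mean is inferred from the data rather than stated on the slide.

```python
# (experiment, WER 01-07, WER 08-14, WER 01-14), copied from the table above.
ROWS = [
    ("Baseline (clean)", 40.53, 50.60, 45.57),
    ("HEQ", 32.19, 42.74, 37.47),
    ("Parametric non-linear EQ", 28.78, 34.27, 31.53),
    ("VTS", 29.46, 37.22, 33.34),
    ("VTS (noise known)", 26.97, 32.25, 26.97),
    ("AFE", 27.57, 34.99, 31.28),
    ("Baseline (multi)", 24.58, 29.88, 27.23),
]


def inconsistent_rows(rows, tol=0.006):
    """Names of rows whose last column is not the mean of the first two."""
    return [name for name, a, b, avg in rows if abs((a + b) / 2.0 - avg) > tol]


if __name__ == "__main__":
    print(inconsistent_rows(ROWS))
```

With the values as transcribed, the only row flagged is "VTS (noise known)", whose 01-14 entry repeats the 01-07 value, so that figure should be read with caution.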


Including uncertainty in noise compensation

Noise is a random process: we do not know n, only p(n)

Then, from an observation y we cannot find x, only p(x|y,x,n)

Usually, compensation procedures provide E[x|y,x,n]

What about the uncertainty of x?

Mean and variance of x:
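The slide's expressions are not preserved in the transcript. In generic form, and dropping the model conditioning for brevity, the mean and variance of the clean feature given the observation are the first two posterior moments; the notation below is an assumption.

```latex
\hat{x} = E[x \mid y] = \int x \, p(x \mid y) \, dx,
\qquad
\sigma_{\hat{x}}^{2} = E[x^{2} \mid y] - \left( E[x \mid y] \right)^{2}
```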


Including uncertainty in noise compensation

An approach for the estimation of the variance:

Evaluation of HMM Gaussians:
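A widely used way to take the uncertainty of the estimate into account when evaluating an HMM Gaussian is to add the estimate's variance to the model variance. The slide's own expression is not preserved, so the sketch below shows this common scheme under that assumption, for a single scalar feature and with illustrative function names.

```python
import math


def log_gauss(x, mean, var):
    """Log-density of a univariate Gaussian."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def log_gauss_with_uncertainty(x_hat, var_hat, mean, var):
    """Evaluate the Gaussian with its variance widened by the estimate's variance."""
    return log_gauss(x_hat, mean, var + var_hat)


if __name__ == "__main__":
    # A poorly matched estimate is penalized less when it is flagged as uncertain.
    print(log_gauss_with_uncertainty(3.0, 0.0, 0.0, 1.0))
    print(log_gauss_with_uncertainty(3.0, 4.0, 0.0, 1.0))
```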

Wiener filt. + uncertainty: results on AURORA 2

Preliminary results on Aurora 2 with Wiener filtering + uncertainty:

Experiment        Train mode  WER Set A  WER Set B  WER Set C  Aver. WER
Wiener            Clean       15.75 %    15.87 %    17.62 %    16.17 %
Wiener + Uncert.  Clean       12.13 %    12.90 %    13.28 %    12.67 %
Wiener            Multi       8.91 %     10.44 %    10.95 %    9.93 %
Wiener + Uncert.  Multi       8.87 %     10.34 %    10.69 %    9.82 %
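The "Aver. WER" column is not a plain mean of the three sets. The figures are consistent with weighting Sets A and B twice as much as Set C, which matches the fact that Aurora 2 Sets A and B each contain four noise types and Set C two. This weighting is inferred from the numbers, not stated on the slide; the sketch below reproduces the column under that assumption.

```python
# (experiment, train mode, Set A, Set B, Set C, listed average), WER in %.
ROWS = [
    ("Wiener", "Clean", 15.75, 15.87, 17.62, 16.17),
    ("Wiener + Uncert.", "Clean", 12.13, 12.90, 13.28, 12.67),
    ("Wiener", "Multi", 8.91, 10.44, 10.95, 9.93),
    ("Wiener + Uncert.", "Multi", 8.87, 10.34, 10.69, 9.82),
]


def aurora2_average(set_a, set_b, set_c):
    """Average with Sets A and B counted twice (assumed weighting)."""
    return (2.0 * set_a + 2.0 * set_b + set_c) / 5.0


if __name__ == "__main__":
    for name, mode, a, b, c, listed in ROWS:
        print(f"{name} ({mode}): computed {aurora2_average(a, b, c):.2f} %, listed {listed:.2f} %")
```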

Wiener filter + uncertainty: results on AURORA 4

Experiment                Train mode  Test size  WER exp. 01-07  WER exp. 08-14  WER exp. 01-14
Baseline                  Clean       166        40.53 %         50.60 %         45.57 %
HEQ                       Clean       166        32.19 %         42.74 %         37.47 %
Parametric non-linear EQ  Clean       166        28.78 %         34.27 %         31.53 %
VTS                       Clean       166        29.46 %         37.22 %         33.34 %
Wiener + Uncertainty      Clean       166        27.68 %         33.79 %         30.74 %
AFE                       Clean       166        27.57 %         34.99 %         31.28 %
Baseline                  Multi       166        24.58 %         29.88 %         27.23 %


VTS + uncertainty: formulation

VTS based estimation of clean speech:

VTS based estimation of variance:
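The slide's formulas are not preserved in the transcript. If the clean-speech posterior is a mixture over the Gaussians k, the overall mean and variance follow from the per-Gaussian moments by the law of total variance, as written below. The symbols for the per-Gaussian mean and variance are assumptions, and the slide may use a different approximation.

```latex
\hat{x} = \sum_{k} P(k \mid y) \, \hat{x}_{k},
\qquad
\sigma_{\hat{x}}^{2} = \sum_{k} P(k \mid y) \left( \sigma_{k}^{2} + \hat{x}_{k}^{2} \right) - \hat{x}^{2}
```

Here \hat{x}_{k} = E[x \mid y, k] and \sigma_{k}^{2} = \mathrm{Var}[x \mid y, k].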


Numerical integration of probabilities: formulation

Computation of expected values:

Numerical integration of expected values:
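One way to obtain the expected values without a Taylor approximation is to integrate numerically over the noise value. The scalar sketch below assumes y = log(e^x + e^n) with Gaussian priors on x and n, changes variables from (x, n) to (y, n), and applies a midpoint rule over n. The derivation, the integration range and the function names are assumptions of this example, not the slide's formulation.

```python
import math


def gauss_pdf(v, mean, std):
    z = (v - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def posterior_moments(y, mu_x, sd_x, mu_n, sd_n, steps=2000):
    """Mean and variance of x given y, by midpoint integration over n < y."""
    lo = mu_n - 6.0 * sd_n
    hi = min(mu_n + 6.0 * sd_n, y - 1e-9)   # the noise cannot exceed the observation
    if hi <= lo:
        raise ValueError("observation lies below the assumed noise range")
    dn = (hi - lo) / steps
    z = m1 = m2 = 0.0
    for i in range(steps):
        n = lo + (i + 0.5) * dn
        diff = math.exp(y) - math.exp(n)
        x = math.log(diff)                   # clean value implied by (y, n)
        jac = math.exp(y) / diff             # |dx/dy| for fixed n
        w = gauss_pdf(x, mu_x, sd_x) * gauss_pdf(n, mu_n, sd_n) * jac
        z += w
        m1 += w * x
        m2 += w * x * x
    mean = m1 / z
    var = max(m2 / z - mean * mean, 0.0)
    return mean, var


if __name__ == "__main__":
    print(posterior_moments(5.0, 3.0, 2.0, 5.0, 0.5))
```

The common step width cancels in the ratios, so it is left out of the sums.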
