1 MAXENT 2007 R. F. Astudillo, D. Kolossa and R. Orglmeister.




PROPAGATION OF STATISTICAL INFORMATION THROUGH NON-LINEAR FEATURE EXTRACTIONS FOR ROBUST SPEECH RECOGNITION

Overview:

1. Introduction: Automatic speech recognition.
2. Problem: Imperfect noise suppression.
3. Proposed solution: Uncertainty propagation.
4. Tests & results.
5. Conclusions.

R. F. Astudillo, D. Kolossa and R. Orglmeister - TU-Berlin



Automatic Speech Recognizer (ASR)

• Feature extraction transforms the signal into a domain more suitable for recognition.

• The speech recognizer models abstract speech components such as phonemes or triphones and generates a transcription.

• Most speech recognition applications need noise-suppression preprocessing.



Feature Extraction

• Non-linear transformations that imitate the way humans process speech.

• Robust against inter-speaker and intra-speaker variability.

• Mel-cepstral and RASTA-PLP transformations.



Speech Recognition

• Statistical models are used to model speech.

• Hidden Markov models with mixtures of multivariate Gaussians for the emitting states.

Example: Mel-cepstral features



Noise Suppression

• Most methods obtain an estimate of the short-time spectrum (STFT) of the clean signal.

• MMSE-LSA Bayesian estimation [Ephraim1985] is one of the most widely used.

Problem: Imperfect estimation.

• Leaves residual noise.

• Introduces artifacts in speech.



Solution: Modeling Uncertainty of Estimation

We model each element of the STFT as a complex Gaussian random variable.

• Mean set equal to the estimated clean value.

• A variance parameter controls the uncertainty.



Propagation of Uncertainty

• We propagate the first- and second-order moments of the distributions.

• Correlation between features appears (covariance).

• The resulting uncertainty is combined with the statistical model parameters for robust speech recognition.



Approaches to Uncertainty Propagation

Analytic solutions: imply complex calculations; specific to each transformation.

Pseudo-Monte Carlo: Unscented Transform [Julier1996]. Inefficient for a high number of dimensions (e.g. STFT, 256 dim./frame).

► Piecewise propagation: efficient combination of both methods. Valid for many feature extractions (e.g. MELSPEC, MFCC, RASTA-PLP).



Piecewise Uncertainty Propagation

Exemplified with the Mel-cepstral transformation:

1. Modulus extraction (non-linear).
2. Filterbank (linear).
3. Logarithm (non-linear).
4. Discrete cosine transform (linear).
5. Delta and acceleration coefficients (linear).
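For orientation, the five stages can be sketched as a plain (uncertainty-free) feature extraction. This is only an illustration under assumptions: the names `dct_matrix` and `mel_cepstrum` are hypothetical, the filterbank weights `W` are supplied by the caller, and `np.gradient` stands in for whatever delta regression the actual front end uses.

```python
import numpy as np

def dct_matrix(n_out, n_in):
    """Unnormalised DCT-II matrix mapping n_in log-Mel values to n_out cepstra."""
    k = np.arange(n_out)[:, None]
    n = np.arange(n_in)[None, :]
    return np.cos(np.pi * k * (2 * n + 1) / (2 * n_in))

def mel_cepstrum(stft_frames, W, n_ceps=13):
    """stft_frames: (T, K) complex STFT; W: (M, K) non-negative filter weights."""
    modulus = np.abs(stft_frames)            # 1. modulus extraction (non-linear)
    mel = modulus @ W.T                      # 2. filterbank (linear)
    log_mel = np.log(mel)                    # 3. logarithm (non-linear)
    C = dct_matrix(n_ceps, W.shape[0])
    ceps = log_mel @ C.T                     # 4. discrete cosine transform (linear)
    delta = np.gradient(ceps, axis=0)        # 5. delta: linear in time (stand-in)
    accel = np.gradient(delta, axis=0)       #    acceleration: delta of delta
    return np.hstack([ceps, delta, accel])
```

Only stages 1 and 3 are non-linear, which is why the piecewise scheme needs special treatment for exactly those two steps.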



Propagation through Modulus

• By integrating out the phase of a complex Gaussian distribution we obtain the Rice distribution.

• Mean and variance can be calculated as:

where L is the Laguerre polynomial.
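The formulas from the slide are not preserved in this transcript. As a hedged stand-in, the sketch below uses the textbook Rice moments, where L_{1/2} denotes the Laguerre function of order 1/2. The symbols are assumptions: `mu` is the complex mean of an STFT bin and `lam` its variance E|X - mu|^2, so that E|X| = (sqrt(pi * lam) / 2) * L_{1/2}(-|mu|^2 / lam) and Var|X| = |mu|^2 + lam - E|X|^2. The function names are hypothetical.

```python
import math

def laguerre_half(x, terms=400):
    """Laguerre function L_{1/2}(x) for x <= 0.

    Uses L_{1/2}(x) = 1F1(-1/2; 1; x) = exp(x) * 1F1(3/2; 1; -x)
    (Kummer's transformation), so every series term is non-negative.
    With 400 terms this is accurate for moderate |x| (roughly |x| <= 100).
    """
    if x > 0:
        raise ValueError("only x <= 0 is needed here")
    z = -x
    term = total = 1.0
    for k in range(terms):
        # Ratio of consecutive terms of 1F1(3/2; 1; z)
        term *= (1.5 + k) * z / (k + 1) ** 2
        total += term
    return math.exp(x) * total

def rice_moments(mu, lam):
    """Mean and variance of |X| for X ~ CN(mu, lam), with lam = E|X - mu|^2 > 0."""
    mean = 0.5 * math.sqrt(math.pi * lam) * laguerre_half(-abs(mu) ** 2 / lam)
    var = abs(mu) ** 2 + lam - mean ** 2
    return mean, var
```

For `mu = 0` this reduces to the Rayleigh case; at high signal-to-noise ratio the mean approaches `abs(mu)` and the variance approaches `lam / 2`.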



Propagation through filterbank

• Each filter output m is a weighted sum of frequency moduli.

• It can be expressed as a matrix multiplication.

• Mean and variance can be calculated as:
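The slide's formulas are missing from this transcript; the standard rule for any linear map y = A x is mean_y = A mean_x and cov_y = A cov_x A^T. A minimal sketch, with the hypothetical name `propagate_linear` and a toy two-filter bank:

```python
import numpy as np

def propagate_linear(mean, cov, A):
    """Propagate mean and covariance through the linear map y = A @ x."""
    return A @ mean, A @ cov @ A.T

# Toy example: two overlapping filters over three frequency bins.
W = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0]])
mod_mean = np.array([1.0, 2.0, 3.0])
mod_cov = np.diag([1.0, 2.0, 3.0])     # independent bins: diagonal covariance
mel_mean, mel_cov = propagate_linear(mod_mean, mod_cov, W)
```

Because the two filters share the middle bin, `mel_cov` has non-zero off-diagonal entries even though `mod_cov` is diagonal. The same function covers the DCT and the delta and acceleration stages once they are written as matrices.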



Full Covariance and other linear transformations

• DCT, delta and acceleration can be computed similarly.

• Covariance after filterbank is no longer diagonal.

• Additional computation costs.



Propagation through Logarithm

• Non-linear transformation.

• Distribution after the filterbank is difficult to model.

• Covariance is not diagonal.

• Dimensionality of the Mel features is much smaller than that of the STFT features.

► The unscented transform can be applied efficiently.



Unscented Transform

• Only 2n + 1 points must be propagated.

• Points on the √(n + κ)-th covariance contour and the mean.

• n = feature dimension

• Example for n = 2



Unscented Transform II

• Mean and covariances are calculated by using weighted averages:

• Parameter κ allows higher moments of the distribution to be considered.
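A minimal sketch of the unscented transform in its commonly cited form [Julier1996]: 2n + 1 sigma points (the mean plus symmetric points built from a square root of (n + kappa) times the covariance), with weight kappa / (n + kappa) on the mean and 1 / (2 (n + kappa)) on each of the others. The function name, the Cholesky factor as matrix square root, and the default `kappa` are assumptions made for illustration.

```python
import numpy as np

def unscented_transform(mean, cov, f, kappa=1.0):
    """Approximate mean and covariance of f(x) for x with given mean and cov."""
    n = mean.size
    # Columns of S satisfy S @ S.T == (n + kappa) * cov
    S = np.linalg.cholesky((n + kappa) * cov)
    # Sigma points: the mean, then mean +/- each column of S -> shape (2n+1, n)
    points = np.vstack([mean, mean + S.T, mean - S.T])
    weights = np.full(2 * n + 1, 1.0 / (2.0 * (n + kappa)))
    weights[0] = kappa / (n + kappa)
    Y = np.array([f(p) for p in points])      # propagate every sigma point
    y_mean = weights @ Y                      # weighted average
    D = Y - y_mean
    y_cov = (weights[:, None] * D).T @ D      # weighted outer products
    return y_mean, y_cov

# Usage: propagate Mel-filterbank statistics through the logarithm.
mel_mean = np.array([10.0, 12.0])
mel_cov = np.array([[1.0, 0.3], [0.3, 2.0]])
log_mean, log_cov = unscented_transform(mel_mean, mel_cov, np.log)
```

For the logarithm, every sigma point must stay positive, so this sketch only applies when the mean is large relative to the spread. For a linear `f` the result is exact, which gives a convenient sanity check.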



Use of Uncertainty

• After propagation of uncertainty, missing feature techniques or uncertainty decoding may be applied.

• These techniques combine uncertainty and model information to ignore or reestimate noisy features.
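One common form of uncertainty decoding, shown here as a hedged sketch and not necessarily the variant used in this work, evaluates each Gaussian of the model with the feature covariance added to the model covariance. The function and argument names are hypothetical.

```python
import numpy as np

def uncertainty_decoding_loglik(x_mean, x_cov, mu_q, cov_q):
    """log N(x_mean; mu_q, cov_q + x_cov): Gaussian score with inflated covariance."""
    cov = cov_q + x_cov
    d = x_mean - mu_q
    _, logdet = np.linalg.slogdet(cov)
    maha = d @ np.linalg.solve(cov, d)        # squared Mahalanobis distance
    return -0.5 * (d.size * np.log(2.0 * np.pi) + logdet + maha)
```

With zero feature uncertainty this is the ordinary Gaussian log-likelihood; a large uncertainty flattens the score, so an unreliable feature far from the state mean is penalised less.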

[Figure labels: parameters of state; f1]



Use of Uncertainty II

• Modified imputation [Kolossa2005] showed the best performance.

• It reestimates features for state q by maximizing the probability:

• Assuming multivariate Gaussian distributions for uncertainty and model:
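The slide's equations are not preserved here. Under the stated Gaussian assumption, and assuming the objective is the product of the state density N(x; mu_q, cov_q) and the uncertainty density N(x; x_mean, x_cov), the maximiser has the closed form x_mean + x_cov (x_cov + cov_q)^(-1) (mu_q - x_mean). The sketch below implements that form; all names are hypothetical.

```python
import numpy as np

def modified_imputation(x_mean, x_cov, mu_q, cov_q):
    """Maximiser of N(x; mu_q, cov_q) * N(x; x_mean, x_cov) over x."""
    # Move the observed feature towards the state mean in proportion
    # to how uncertain the feature is relative to the model.
    gain = x_cov @ np.linalg.inv(x_cov + cov_q)
    return x_mean + gain @ (mu_q - x_mean)
```

With zero uncertainty the feature is left unchanged; as the uncertainty grows, the estimate is pulled towards the mean of state q.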



Recognition Tests: TI-DIGITS database

% correctly identified words

Test type               Uncertainty   Wind noise          Street noise
                                      -15 dB    5 dB      -15 dB    5 dB
Clean speech            0             98.76
Noisy                   0             28.44     87.94     22.87     92.43
MMSE-LSA                0             34.78     75.27     36.63     92.43
+ Approx. uncertainty                 46.68     88.72     22.72     94.90
+ Ideal uncertainty                   51.93     94.28     48.53     96.45

• 200 files (20 different speakers).

• Best, second best results.



Conclusions

• Using uncertainty in the Mel-cepstral domain helps compensate for imperfect estimation during noise suppression.

• Piecewise uncertainty propagation is valid for multiple feature extractions.

• Better estimation of uncertainty should improve the results.



Thank You!

Some literature:

[Ephraim1985] Y. Ephraim and D. Malah, "Speech enhancement using a minimum mean-square error log-spectral amplitude estimator," IEEE Transactions on Acoustics, Speech, and Signal Processing 33, 443–445 (1985).

[Julier1996] S. Julier and J. Uhlmann, "A general method for approximating nonlinear transformations of probability distributions," Tech. rep., University of Oxford, UK (1996).

[Kolossa2005] D. Kolossa, A. Klimas, and R. Orglmeister, "Separation and robust recognition of noisy, convolutive speech mixtures using time-frequency masking and missing data techniques," IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2005, pp. 82–85.
