MAXENT 2007
R. F. Astudillo, D. Kolossa and R. Orglmeister
PROPAGATION OF STATISTICAL INFORMATION THROUGH NON-LINEAR FEATURE EXTRACTIONS FOR ROBUST SPEECH RECOGNITION
Overview:
1. Introduction: Automatic speech recognition.
2. Problem: Imperfect noise suppression.
3. Proposed solution: Uncertainty propagation.
4. Tests & results.
5. Conclusions.
R. F. Astudillo, D. Kolossa and R. Orglmeister - TU-Berlin
Automatic Speech Recognizer (ASR)
• Feature extraction transforms the signal into a domain more suitable for recognition.
• The speech recognizer models abstract speech components such as phonemes or triphones and generates the transcription.
• Most speech recognition applications need noise-suppression preprocessing.
Feature Extraction
• Non-linear transformations that imitate the way humans process speech.
• Robust against inter-speaker and intra-speaker variability.
• Mel-cepstral and RASTA-PLP transformations.
Speech Recognition
• Statistical models are used to model speech.
• Hidden Markov models with mixtures of multivariate Gaussians for the emitting states.
Example: Mel-cepstral features
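As a rough illustration of the emitting-state model, the sketch below evaluates the log-likelihood of one feature vector under a Gaussian mixture. Diagonal covariances and the function names are assumptions made for brevity; the slides only state that the Gaussians are multivariate.

```python
import math

def log_gaussian_diag(x, mean, var):
    """Log-density of a Gaussian with diagonal covariance (assumed here)."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def gmm_log_likelihood(x, weights, means, variances):
    """Log emission likelihood log p(x | state) of a Gaussian mixture."""
    logs = [
        math.log(w) + log_gaussian_diag(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    top = max(logs)  # log-sum-exp keeps the sum numerically stable
    return top + math.log(sum(math.exp(value - top) for value in logs))

# Hypothetical two-component mixture over 2-dimensional features.
score = gmm_log_likelihood(
    [0.1, -0.3],
    weights=[0.6, 0.4],
    means=[[0.0, 0.0], [1.0, -1.0]],
    variances=[[1.0, 1.0], [0.5, 0.5]],
)
```

In an HMM decoder this score would be computed for every emitting state and frame.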
Noise Suppression
• MMSE-LSA Bayesian estimation [Ephraim1985] is one of the most widely used methods.
• Leaves residual noise.
• Introduces artifacts in speech.
• Most methods obtain an estimate of the short-time spectrum (STFT) of the clean signal.
Problem: Imperfect estimation.
Solution: Modeling Uncertainty of Estimation
We model each element of the STFT as a complex Gaussian random variable.
• Mean set equal to the estimated clean value.
• A variance parameter controls the uncertainty.
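A minimal sketch of this uncertainty model, assuming a circular complex Gaussian (the variance is split equally between real and imaginary parts). The names `x_hat` and `lam` are illustrative, since the original symbols were lost from the slide.

```python
import random

def sample_stft_bin(estimate, lam, rng):
    """Draw one sample of a circular complex Gaussian with mean `estimate`
    and total variance `lam` (lam / 2 per real and imaginary part)."""
    sigma = (lam / 2.0) ** 0.5
    return estimate + complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))

rng = random.Random(0)            # fixed seed for repeatability
x_hat, lam = complex(1.0, -2.0), 0.5
samples = [sample_stft_bin(x_hat, lam, rng) for _ in range(20000)]

# The sample statistics should approach the model parameters.
sample_mean = sum(samples) / len(samples)
sample_var = sum(abs(s - sample_mean) ** 2 for s in samples) / len(samples)
```

Setting `lam` to zero recovers the usual point estimate delivered by the noise suppression stage.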
Propagation of Uncertainty
• We propagate the first- and second-order moments of the distributions.
• Correlation between features appears (covariance).
• The resulting uncertainty is combined with the statistical model parameters for robust speech recognition.
Approaches to Uncertainty Propagation
• Analytic solutions: imply complex calculations and are specific to each transformation.
• Pseudo-Monte Carlo: Unscented Transform [Julier1996]. Inefficient for a high number of dimensions (e.g. STFT, 256 dim./frame).
► Piecewise propagation: efficient combination of both methods. Valid for many feature extractions (e.g. MELSPEC, MFCC, RASTA-PLP).
Piecewise Uncertainty Propagation
Exemplified with the Mel-cepstral transformation:
1. Modulus extraction (non-linear).
2. Filterbank (linear).
3. Logarithm (non-linear).
4. Discrete cosine transform (linear).
5. Delta and acceleration coefficients (linear).
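The first four steps can be sketched for a single frame as follows. This is a deterministic illustration only: the toy filterbank, the unnormalized DCT-II and the function name are assumptions, and step 5 is omitted because it needs several frames.

```python
import math

def mel_cepstrum(stft_frame, filterbank, num_ceps):
    """Toy Mel-cepstral transformation of one STFT frame."""
    # 1. Modulus extraction (non-linear)
    modulus = [abs(x) for x in stft_frame]
    # 2. Filterbank (linear): each output is a weighted sum of moduli
    mel = [sum(w * m for w, m in zip(row, modulus)) for row in filterbank]
    # 3. Logarithm (non-linear)
    log_mel = [math.log(v) for v in mel]
    # 4. Discrete cosine transform, unnormalized DCT-II (linear)
    n = len(log_mel)
    return [
        sum(v * math.cos(math.pi * k * (j + 0.5) / n)
            for j, v in enumerate(log_mel))
        for k in range(num_ceps)
    ]

# Hypothetical 4-bin frame and 2-filter bank, chosen only for illustration.
frame = [2 + 0j, 0 + 2j, -2 + 0j, 0 - 2j]
toy_filterbank = [[0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5]]
ceps = mel_cepstrum(frame, toy_filterbank, num_ceps=2)
```

Splitting the chain this way is what allows a different propagation method to be chosen for each linear or non-linear piece.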
Propagation through Modulus
• Integrating out the phase of a complex Gaussian distribution yields the Rice distribution for the modulus.
• Mean and variance can be calculated in closed form using the Laguerre polynomial L.
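The slide's equations did not survive extraction, so the sketch below uses the standard Rice moments instead. It assumes a circular complex Gaussian with mean `estimate` and variance `lam`, for which the mean of the modulus is (sqrt(pi * lam) / 2) * L_{1/2}(-|estimate|^2 / lam) and the variance is |estimate|^2 + lam minus the squared mean. Symbol and function names are illustrative.

```python
import math

def bessel_i(order, z, terms=200):
    """Modified Bessel function of the first kind (integer order), via its
    power series. Adequate for moderate z only."""
    term = (z / 2.0) ** order / math.factorial(order)
    total = 0.0
    for k in range(terms):
        total += term
        term *= (z / 2.0) ** 2 / ((k + 1) * (k + 1 + order))
    return total

def laguerre_half(x):
    """Laguerre polynomial L_{1/2}(x) for x <= 0, written with Bessel functions."""
    z = -x / 2.0
    return math.exp(-z) * (
        (1.0 + 2.0 * z) * bessel_i(0, z) + 2.0 * z * bessel_i(1, z)
    )

def modulus_moments(estimate, lam):
    """Mean and variance of |X| for X ~ circular complex Gaussian(estimate, lam)."""
    nu_sq = abs(estimate) ** 2
    mean = 0.5 * math.sqrt(math.pi * lam) * laguerre_half(-nu_sq / lam)
    var = nu_sq + lam - mean ** 2   # second moment minus squared mean
    return mean, var

rayleigh_mean, rayleigh_var = modulus_moments(0j, 1.0)   # zero-mean case
high_snr_mean, high_snr_var = modulus_moments(10 + 0j, 1.0)
```

With a zero estimate the result reduces to the Rayleigh distribution, and for a large estimate the mean approaches the modulus of the estimate, which gives two quick sanity checks.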
Propagation through filterbank
• Each filter output m is a weighted sum of frequency moduli.
• It can be expressed as a matrix multiplication.
• Mean and variance can be calculated as:
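A minimal sketch of the linear step, assuming the STFT moduli are mutually uncorrelated with means μ and variances σ². Writing the filterbank as a matrix W, the output mean is Wμ and the output covariance is W diag(σ²) Wᵀ. The names and the toy weights below are illustrative.

```python
def propagate_filterbank(weights, mod_mean, mod_var):
    """Propagate mean and diagonal variance of the moduli through M = W |X|."""
    # Mean: W @ mu
    mean = [sum(w * m for w, m in zip(row, mod_mean)) for row in weights]
    # Covariance: W @ diag(var) @ W^T, generally a full matrix
    cov = [
        [sum(wi * wj * v for wi, wj, v in zip(row_i, row_j, mod_var))
         for row_j in weights]
        for row_i in weights
    ]
    return mean, cov

# Hypothetical 2-filter bank over 3 frequency bins with overlapping filters.
toy_weights = [[1.0, 0.5, 0.0], [0.0, 0.5, 1.0]]
mel_mean, mel_cov = propagate_filterbank(
    toy_weights, [2.0, 4.0, 6.0], [1.0, 4.0, 9.0]
)
```

The off-diagonal entry of `mel_cov` is non-zero because the two filters share the middle bin.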
Full Covariance and other linear transformations
• DCT, delta and acceleration can be computed similarly.
• Covariance after filterbank is no longer diagonal.
• Additional computation costs.
Propagation through Logarithm
• Non-linear transformation.
• Distribution after filterbank is difficult to model.
• Covariance is not diagonal.
• Dimensionality of the Mel features is much smaller than that of the STFT features.
► Unscented transform can be applied efficiently.
Unscented Transform
• Only 2n + 1 points must be propagated.
• Points on the sqrt(n + κ)-th covariance contour and the mean.
• n = feature dimension
• Example for n = 2
Unscented Transform II
• Mean and covariances are calculated by using weighted averages:
• The parameter κ allows higher-order moments of the distribution to be considered.
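The following sketch shows one common form of the unscented transform from [Julier1996], applied to an element-wise logarithm. It assumes sigma points at the mean plus or minus sqrt(n + κ) times the columns of a Cholesky factor of the covariance, with weight κ/(n + κ) for the mean and 1/(2(n + κ)) for the others. The symbol κ, the Cholesky choice and all names are assumptions, since the slide's formulas were lost.

```python
import math

def cholesky(a):
    """Lower-triangular factor of a symmetric positive-definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low

def unscented_transform(func, mean, cov, kappa=1.0):
    """Propagate mean and covariance through `func` with 2n + 1 sigma points."""
    n = len(mean)
    low = cholesky(cov)
    scale = math.sqrt(n + kappa)
    points = [list(mean)]
    weights = [kappa / (n + kappa)]
    for i in range(n):
        column = [scale * low[r][i] for r in range(n)]
        points.append([m + c for m, c in zip(mean, column)])
        points.append([m - c for m, c in zip(mean, column)])
        weights += [0.5 / (n + kappa)] * 2
    outputs = [func(p) for p in points]
    dim = len(outputs[0])
    # Weighted averages give the propagated mean and covariance.
    out_mean = [sum(w * y[d] for w, y in zip(weights, outputs)) for d in range(dim)]
    out_cov = [
        [sum(w * (y[a] - out_mean[a]) * (y[b] - out_mean[b])
             for w, y in zip(weights, outputs))
         for b in range(dim)]
        for a in range(dim)
    ]
    return out_mean, out_cov

# Toy Mel-domain statistics; all sigma points must stay positive for the log.
mel_mean_in, mel_cov_in = [4.0, 8.0], [[2.0, 1.0], [1.0, 10.0]]
log_mean, log_cov = unscented_transform(
    lambda p: [math.log(v) for v in p], mel_mean_in, mel_cov_in
)
```

For a linear function the transform returns the input mean and covariance exactly, which makes a convenient check of an implementation.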
Use of Uncertainty
• After propagation of uncertainty, missing feature techniques or uncertainty decoding may be applied.
• These techniques combine uncertainty and model information to ignore or reestimate noisy features.
Use of Uncertainty II
• Modified imputation [Kolossa2005] showed the best performance.
• It reestimates features for state q by maximizing the probability:
• Assuming multivariate Gaussian distributions for the uncertainty and the model:
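A hedged sketch of the re-estimation step, assuming the maximized quantity is the product of the state's Gaussian and the Gaussian uncertainty of the feature. Under that assumption the maximum is the precision-weighted combination of both means: with full covariances, (Σ_q⁻¹ + Σ_x⁻¹)⁻¹ (Σ_q⁻¹ μ_q + Σ_x⁻¹ μ_x). The code restricts this to diagonal covariances for brevity; all names are illustrative and not taken from [Kolossa2005].

```python
def modified_imputation(feat_mean, feat_var, state_mean, state_var):
    """Re-estimate each feature as the mode of the product of two Gaussians:
    the uncertain feature (feat_mean, feat_var) and the model state
    (state_mean, state_var). Diagonal covariances are assumed."""
    return [
        (sv * fm + fv * sm) / (fv + sv)
        for fm, fv, sm, sv in zip(feat_mean, feat_var, state_mean, state_var)
    ]

# A certain feature is kept; an uncertain one is pulled toward the state mean.
certain = modified_imputation([1.0], [0.0], [5.0], [2.0])
uncertain = modified_imputation([1.0], [2.0], [5.0], [2.0])
```

The two calls illustrate the limiting behaviour: zero uncertainty leaves the feature unchanged, while equal variances place the estimate halfway between the feature and the state mean.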
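Recognition Tests: TI-DIGITS database
Percentage of correctly identified words. Clean speech (uncertainty 0): 98.76.

| Test type | Uncertainty | Wind noise, -15 dB | Wind noise, 5 dB | Street noise, -15 dB | Street noise, 5 dB |
|---|---|---|---|---|---|
| Noisy | 0 | 28.44 | 87.94 | 22.87 | 92.43 |
| MMSE-LSA | 0 | 34.78 | 75.27 | *36.63* | 92.43 |
| + Approx. uncertainty | approximate | *46.68* | *88.72* | 22.72 | *94.90* |
| + Ideal uncertainty | ideal | **51.93** | **94.28** | **48.53** | **96.45** |

• 200 files (20 different speakers).
• Best results in bold, second-best in italics.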
Conclusions
• The use of uncertainty in the Mel-cepstral domain helps compensate for imperfect estimation during noise suppression.
• Piecewise uncertainty propagation is valid for multiple feature extractions.
• Better estimation of uncertainty should improve the results.
Thank You!
Some literature:
[Ephraim1985] Y. Ephraim and D. Malah, "Speech enhancement using a minimum mean-square error log-spectral amplitude estimator," IEEE Transactions on Acoustics, Speech, and Signal Processing 33, 443–445 (1985).
[Julier1996] S. Julier and J. Uhlmann, "A general method for approximating nonlinear transformations of probability distributions," Tech. rep., University of Oxford, UK (1996).
[Kolossa2005] D. Kolossa, A. Klimas, and R. Orglmeister, "Separation and robust recognition of noisy, convolutive speech mixtures using time-frequency masking and missing data techniques," IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2005, pp. 82-85.