Survey of INTERSPEECH 2013


Transcript of Survey of INTERSPEECH 2013

Page 1: Survey of INTERSPEECH 2013

Survey of INTERSPEECH 2013

Reporter: Yi-Ting Wang, 2013/09/10

Page 2: Survey of INTERSPEECH 2013

Outline

Exemplar-based Individuality-Preserving Voice Conversion for Articulation Disorders in Noisy Environments

Robust Speech Enhancement Techniques for ASR in Non-stationary Noise and Dynamic Environments

NMF-based Temporal Feature Integration for Acoustic Event Classification

Page 3: Survey of INTERSPEECH 2013

Exemplar-based Individuality-Preserving Voice Conversion for Articulation Disorders in Noisy Environments

Ryo AIHARA, Ryoichi TAKASHIMA, Tetsuya TAKIGUCHI, Yasuo ARIKI

Graduate School of System Informatics, Kobe University, Japan

Page 4: Survey of INTERSPEECH 2013

Introduction

We present in this paper a noise-robust voice conversion (VC) method for a person with an articulation disorder resulting from athetoid cerebral palsy. Exemplar-based spectral conversion using NMF is applied to a voice with an articulation disorder in real noisy environments. NMF is a well-known approach for source separation and speech enhancement. Goal: poorly articulated noisy speech -> cleanly articulated speech.

Page 5: Survey of INTERSPEECH 2013

Voice conversion based on NMF
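As a rough illustration of the exemplar-based NMF conversion idea described in the introduction (not the paper's exact formulation), the sketch below assumes parallel source/target exemplar dictionaries D_src and D_tgt (hypothetical names): non-negative activations are estimated on the source spectrogram with the dictionary held fixed, then reused with the target dictionary.

```python
import numpy as np

def nmf_activations(V, D, n_iter=200, eps=1e-12):
    """Estimate non-negative activations H such that V ~= D @ H, with the
    dictionary D held fixed (multiplicative updates, squared-error cost)."""
    K, T = D.shape[1], V.shape[1]
    H = np.random.rand(K, T) + eps
    for _ in range(n_iter):
        H *= (D.T @ V) / (D.T @ D @ H + eps)
    return H

def exemplar_vc(V_src, D_src, D_tgt):
    """Exemplar-based conversion: activations estimated against the source
    dictionary are applied to the parallel target dictionary."""
    H = nmf_activations(V_src, D_src)
    return D_tgt @ H   # converted magnitude spectrogram

# Hypothetical usage: V_src is the magnitude spectrogram of the noisy,
# poorly articulated source speech; columns of D_src / D_tgt are parallel
# exemplars, so a given activation column maps corresponding frames.
```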

Page 6: Survey of INTERSPEECH 2013

Constructing the individuality-preserving dictionary

Page 7: Survey of INTERSPEECH 2013

Experimental Results

ATR Japanese speech database.

Page 8: Survey of INTERSPEECH 2013

Conclusions

We proposed a noise-robust spectral conversion method based on NMF for a voice with an articulation disorder. Our VC method can improve the listening intelligibility of words uttered by a person with an articulation disorder in noisy environments.

Page 9: Survey of INTERSPEECH 2013

Robust Speech Enhancement Techniques for ASR in Non-stationary Noise and Dynamic Environments

Gang Liu, Dimitrios Dimitriadis, Enrico Bocchieri

Center for Robust Speech Systems, University of Texas at Dallas

Page 10: Survey of INTERSPEECH 2013

Introduction

In current ASR systems, the presence of competing speakers greatly degrades recognition performance. Furthermore, speakers are most often not standing still while speaking. We use Time Differences of Arrival (TDOA) estimation, multi-channel Wiener filtering, NMF, multi-condition training, and robust feature extraction.

Page 11: Survey of INTERSPEECH 2013

Proposed cascaded system

The problem of source localization/separation is often addressed by TDOA estimation.
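For reference, a minimal sketch of a standard TDOA estimator, GCC-PHAT (a common choice for this step, though the slide does not specify which estimator the authors use). The signals mic1/mic2 and the sampling rate are hypothetical inputs.

```python
import numpy as np

def gcc_phat_tdoa(x, y, fs, max_tau=None):
    """Estimate the time difference of arrival between two microphone
    signals using the GCC-PHAT weighted cross-correlation."""
    n = 2 * max(len(x), len(y))              # zero-pad to avoid circular wrap
    X = np.fft.rfft(x, n=n)
    Y = np.fft.rfft(y, n=n)
    R = X * np.conj(Y)
    R /= np.abs(R) + 1e-12                   # PHAT weighting: keep phase only
    cc = np.fft.irfft(R, n=n)
    max_shift = n // 2
    if max_tau is not None:
        max_shift = min(int(fs * max_tau), max_shift)
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    lag = np.argmax(np.abs(cc)) - max_shift
    return lag / fs                          # TDOA in seconds

# Hypothetical usage: tau = gcc_phat_tdoa(mic1, mic2, fs=16000, max_tau=0.001)
```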

Page 12: Survey of INTERSPEECH 2013

Experiment and results

Page 13: Survey of INTERSPEECH 2013

Experiment and results

NMF provides the largest boost, due to the suppression of the non-stationary interfering signals.
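To illustrate why NMF can suppress non-stationary interference (the paper's exact enhancement stage is not detailed on the slide), the hedged sketch below assumes pre-trained speech and noise spectral dictionaries W_speech and W_noise (hypothetical names), decomposes the noisy spectrogram on their concatenation, and keeps the speech part through a Wiener-like mask.

```python
import numpy as np

def nmf_enhance(V, W_speech, W_noise, n_iter=200, eps=1e-12):
    """Supervised NMF enhancement: decompose the noisy magnitude spectrogram V
    on fixed speech/noise dictionaries and mask out the noise reconstruction."""
    W = np.hstack([W_speech, W_noise])       # concatenated dictionary
    K_s = W_speech.shape[1]
    H = np.random.rand(W.shape[1], V.shape[1]) + eps
    ones = np.ones_like(V)
    for _ in range(n_iter):                  # KL multiplicative updates, W fixed
        R = W @ H + eps
        H *= (W.T @ (V / R)) / (W.T @ ones + eps)
    S = W_speech @ H[:K_s]                   # speech reconstruction
    N = W_noise @ H[K_s:]                    # noise reconstruction
    mask = S / (S + N + eps)                 # Wiener-like soft mask
    return mask * V                          # enhanced magnitude spectrogram
```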

Page 14: Survey of INTERSPEECH 2013

Conclusion

We propose a cascaded system for speech recognition dealing with non-stationary noise in reverberant environments. The proposed system offers average relative improvements of 50% and 45% for the two scenarios mentioned above.

Page 15: Survey of INTERSPEECH 2013

NMF-based Temporal Feature Integration for Acoustic Event Classification

Jimmy Ludena-Choez, Ascension Gallardo-Antolin

Dep. of Signal Theory and Communications, Universidad Carlos III de Madrid, Avda de la Universidad 30, 28911 Leganes (Madrid), Spain

Page 16: Survey of INTERSPEECH 2013

Introduction

This paper proposes a new front-end for Acoustic Event Classification (AEC) tasks based on the combination of the temporal feature integration technique called Filter Bank Coefficients (FC) and Non-Negative Matrix Factorization (NMF). FC captures the dynamic structure of the short-time features. We present an unsupervised method based on NMF for the design of a filter bank more suitable for AEC.
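A rough sketch of FC-style temporal integration, under the assumption that it summarizes the modulation (trajectory) power spectrum of each short-time feature dimension over a texture window with a filter bank; the bank fb is a hypothetical matrix here, which could be mel-like or NMF-learned as the paper proposes.

```python
import numpy as np

def fc_integration(feats, fb):
    """Temporal feature integration with Filter Bank Coefficients (FC).
    feats: (T, D) short-time features for one texture window (e.g. MFCCs).
    fb:    (B, F) filter bank applied to each trajectory's power spectrum.
    Returns a (D * B,) integrated feature vector for the window."""
    traj = feats - feats.mean(axis=0)            # remove DC per dimension
    P = np.abs(np.fft.rfft(traj, axis=0)) ** 2   # modulation power spectrum, (F, D)
    assert fb.shape[1] == P.shape[0]
    return (fb @ P).T.reshape(-1)                # summarize each trajectory with the bank
```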

Page 17: Survey of INTERSPEECH 2013

Audio feature extraction

Page 18: Survey of INTERSPEECH 2013

Experiments and results

Here, NMF is computed using the KL divergence as its cost function.
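The update rules are not shown on the slide; a minimal sketch of NMF with the generalized KL divergence and the standard multiplicative updates is given below. Learning the basis W from training spectra is one way an NMF-derived filter bank can be obtained (function and variable names are illustrative).

```python
import numpy as np

def nmf_kl(V, K, n_iter=200, eps=1e-12):
    """Factorize V ~= W @ H (all non-negative) by minimizing the generalized
    KL divergence with the standard multiplicative update rules."""
    F, T = V.shape
    rng = np.random.default_rng(0)
    W = rng.random((F, K)) + eps
    H = rng.random((K, T)) + eps
    ones = np.ones_like(V)
    for _ in range(n_iter):
        R = W @ H + eps
        H *= (W.T @ (V / R)) / (W.T @ ones + eps)
        R = W @ H + eps
        W *= ((V / R) @ H.T) / (ones @ H.T + eps)
    return W, H   # columns of W serve as the learned spectral basis / filter bank
```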

Page 19: Survey of INTERSPEECH 2013

Experiments and results

Page 20: Survey of INTERSPEECH 2013

Conclusions

We have presented a new front-end for AEC based on the combination of FC features and NMF. NMF is used for the unsupervised learning of the filter bank, which captures the most relevant temporal behavior in the short-time features. Low modulation frequencies are more important than the high ones for distinguishing between different acoustic events. The experiments have shown that the features obtained with this method achieve significant improvements in the classification performance of an SVM-based AEC system compared with the baseline FC parameters.