INTRODUCTION
1
INTRODUCTION METHODS RESULTS CONCLUSION
Noise Robust
Speech Recognition
Group SB740
2
Standard feature extraction
speech -> Framing -> FFT -> Filter Bank -> Cepstrum Coefficients -> features
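The chain on this slide can be sketched in a few lines of NumPy. This is only an illustration, not the group's implementation: the frame length, frame shift, Hamming window, FFT size, number of coefficients, and all function names are assumptions made for the example, and the filter bank is passed in as a ready-made matrix.

```python
import numpy as np

def frame_signal(signal, frame_len, frame_shift):
    """Cut a 1-D signal into overlapping frames (one frame per row)."""
    n_frames = 1 + (len(signal) - frame_len) // frame_shift
    idx = np.arange(frame_len)[None, :] + frame_shift * np.arange(n_frames)[:, None]
    return signal[idx]

def magnitude_spectrum(frames, n_fft):
    """Window each frame (Hamming, an assumption) and take the FFT magnitude."""
    window = np.hamming(frames.shape[1])
    return np.abs(np.fft.rfft(frames * window, n=n_fft, axis=1))

def cepstrum(filterbank_energies, n_ceps):
    """Log of the filter bank outputs followed by a DCT-II (unnormalized)."""
    log_e = np.log(np.maximum(filterbank_energies, 1e-10))
    n_bands = log_e.shape[1]
    k = np.arange(n_ceps)[:, None]
    m = np.arange(n_bands)[None, :]
    dct = np.cos(np.pi * k * (m + 0.5) / n_bands)
    return log_e @ dct.T

def extract_features(signal, filterbank, frame_len=200, frame_shift=80,
                     n_fft=256, n_ceps=13):
    """speech -> framing -> FFT -> filter bank -> cepstrum coefficients.

    filterbank has shape (n_bands, n_fft // 2 + 1).
    Defaults assume 8 kHz audio with 25 ms frames and a 10 ms shift.
    """
    frames = frame_signal(np.asarray(signal, dtype=float), frame_len, frame_shift)
    spectrum = magnitude_spectrum(frames, n_fft)
    energies = spectrum @ filterbank.T
    return cepstrum(energies, n_ceps)
```

Calling `extract_features(signal, filterbank)` returns one row of cepstrum coefficients per frame; swapping the `filterbank` matrix is the only change needed to try a different filter distribution.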
3
Improved feature extraction
Framed FFT spectrum -> Pre-Processing -> Filter Bank -> Cepstrum Coefficients -> Post-Processing -> features
4
Pre-Processing: Quantile Based Noise Estimation for spectral subtraction (QBNE)
• Assumes that each frequency band contains only noise for a fraction of the time, even during speech
• For each frequency band, the frames are sorted by amplitude
• A fixed q-value is used, equal for all frequency bands
• The intersection between the vertical line at the q-value and each frequency band's sorted-amplitude curve is the noise estimate
• Problem with mis-matched training and test conditions
[Figure: normalized amplitude of the sorted frames versus q-value (0 to 1), one curve per frequency band]
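The QBNE steps above can be sketched as follows. This is a minimal illustration under stated assumptions: the function names, the index rounding, and the spectral floor used in the subtraction step are hypothetical choices, not details taken from the slides.

```python
import numpy as np

def qbne_noise_estimate(spectrum, q=0.5):
    """Quantile based noise estimate per frequency band.

    spectrum: array of shape (n_frames, n_bands) with magnitudes.
    q: fixed q-value in [0, 1], the same for all bands.
    """
    # Sort the frames of each band by amplitude (column-wise sort).
    sorted_amp = np.sort(spectrum, axis=0)
    # Read every band at the same relative position q.
    idx = int(np.floor(q * (spectrum.shape[0] - 1)))
    return sorted_amp[idx]

def spectral_subtraction(spectrum, noise, floor=0.01):
    """Subtract the noise estimate; the floor (an assumption) avoids
    negative or zero magnitudes."""
    cleaned = spectrum - noise
    return np.maximum(cleaned, floor * spectrum)
```

With `q=0.5` the estimate is the per-band median over the whole utterance, which is why this form needs the entire utterance before any frame can be cleaned.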
5
Pre-Processing: Adaptive Quantile Based Noise Estimation for spectral subtraction (AQBNE)
• Goal is to improve the performance when training with low noise and testing with high noise
• Adapt to the utterance and noise levels
• Adjust the q-value for each frequency band
• Result is a q-estimation curve as opposed to a fixed value
• High and low noise situations will converge to similar representations
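The difference from a fixed q-value can be illustrated with a per-band q curve. The slides do not state how the q-value is adapted, so the rule in `hypothetical_q_curve` below (a higher q for bands with a small dynamic range) is purely a placeholder invented for this example; only the idea of reading each band at its own q comes from the slide.

```python
import numpy as np

def hypothetical_q_curve(spectrum, q_min=0.3, q_max=0.7):
    """Placeholder adaptation rule (NOT the authors' rule).

    Bands whose amplitudes vary little over the utterance are treated as
    noise-dominated and get a q close to q_max; bands with a large
    dynamic range get a q close to q_min.
    """
    sorted_amp = np.sort(spectrum, axis=0)
    # Ratio of smallest to largest amplitude per band, in [0, 1].
    ratio = sorted_amp[0] / np.maximum(sorted_amp[-1], 1e-12)
    return q_min + (q_max - q_min) * ratio

def per_band_quantile(spectrum, q_per_band):
    """Noise estimate where each frequency band uses its own q-value."""
    sorted_amp = np.sort(spectrum, axis=0)
    n_frames, n_bands = spectrum.shape
    idx = np.floor(np.asarray(q_per_band) * (n_frames - 1)).astype(int)
    # Pick row idx[b] in column b for every band b.
    return sorted_amp[idx, np.arange(n_bands)]
```

Any other rule that maps utterance and noise level to a q-value per band can replace the placeholder without changing `per_band_quantile`.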
6
Filter Bank: Speech Band Emphasizing Filter Bank (SBE)
[Figure: filter responses (0 to 1) versus Frequency [Hertz] from 0 to 4000 Hz. Top panel: Mel filterbank. Bottom panel: Speech Band Emphasizing filter bank]
• Mel Frequency Cepstrum Coefficient (MFCC)
– Motivated by human perception and critical bands
• Mel Frequency Filter Bank
– Triangular filters
– Highest resolution at low frequencies
– Resulting Importance Function
• Speech Band Emphasizing Filter Bank
– Emphasizes the primary speech band
– Highest resolution at 1500 Hz
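A triangular filter bank of the kind listed above can be sketched as below. The Hz-to-Mel mapping is the commonly used 2595 log10(1 + f/700) formula; the function names and the choice of the number of filters are assumptions for the example. The SBE filter placement was chosen empirically and is not given in the slides, so only the Mel spacing is generated here.

```python
import numpy as np

def hz_to_mel(f):
    """Common Hz-to-Mel mapping."""
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def mel_edges(n_filters, f_low=0.0, f_high=4000.0):
    """Edge frequencies in Hz, equally spaced on the Mel scale.

    n_filters triangles need n_filters + 2 edge points.
    """
    mels = np.linspace(hz_to_mel(f_low), hz_to_mel(f_high), n_filters + 2)
    return mel_to_hz(mels)

def triangular_filterbank(edges_hz, freqs_hz):
    """One triangular filter per consecutive triple of edge frequencies.

    Returns an array of shape (len(edges_hz) - 2, len(freqs_hz)).
    """
    edges = np.asarray(edges_hz, dtype=float)
    freqs = np.asarray(freqs_hz, dtype=float)
    bank = np.zeros((len(edges) - 2, len(freqs)))
    for i in range(len(edges) - 2):
        lo, mid, hi = edges[i], edges[i + 1], edges[i + 2]
        rising = (freqs - lo) / (mid - lo)
        falling = (hi - freqs) / (hi - mid)
        bank[i] = np.clip(np.minimum(rising, falling), 0.0, 1.0)
    return bank
```

Because `triangular_filterbank` only needs a list of edge frequencies, a bank with its densest filters around 1500 Hz could be tried by passing a different, hand-chosen edge list instead of `mel_edges(...)`.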
7
Results
• QBNE with Mel Frequency Filter Bank showed an improvement of 15%
• AQBNE with SBE Filter Bank showed an improvement of 28%
• AQBNE with SBE Filter Bank showed a remarkable result under highly mis-matched conditions: 80% improvement compared to 21% when using QBNE with Mel Frequency Filter Bank
8
Conclusion
• AQBNE avoids describing speech signals during training to a level of detail which is unattainable during testing under noisy conditions
• The suggested SBE Filter Bank, though empirically chosen, indicates that filter distributions other than the standard Mel-scale may attain improved performance in noisy conditions
9
Presentation of Abstract
Agenda:
• Purpose of the abstract
• Structure of the abstract
• Content of the abstract
10
Purpose of abstract
• Announcement for the 17th 7th-semester conference on the 21st of December 2004.
• Appetizer to attract the right audience. The abstract is written with its audience in mind: other 7th-semester students from the Institute of Electronic Systems in Aalborg and Esbjerg.
11
Structure of the abstract
Title:
• Topic: The long title gives a detailed description of the content: "Noise Robust Automatic Speech Recognition with Adaptive Quantile Based Noise Estimation and Speech Band Emphasizing Filter Bank"
• Nature: Noise estimation.
• Scope: Automatic speech recognition.
The text follows the IMRaD structure.
12
Structure of the abstract
• Important keywords are used throughout the text: ASR, Noise Estimation, Feature Extraction.
• Known methods are presented before new methods to create continuity.
• Complexity increases through the abstract.
13
Content of the abstract
Introduction:
• Contains information about the initial problem, the proposals made in the paper, and the field of operation.
• This is the shortest section of the abstract, but it contains many keywords.
14
Content of the abstract
Methods:
• This is the longest section of the abstract. It refers to known methods and introduces new methods and solutions.
• The first sentence of this section is linked to the introduction by the phrase "feature extraction".
• The section ends with an advertisement for the results.
15
Content of the abstract
Results:
• The methods that improved the recognition performance are presented first.
• The best result is stated exactly and compared to known methods.
• The proposed solutions that did not improve the recognition are mentioned last in the section.
16
Content of the abstract
Discussion:
• First, the method that did not improve the recognition performance is explained.
• Second, the methods that improved the recognition performance are described.
• The abstract concludes with recommendations based on the results achieved in this project.
18
Structure of Paper
IMRaD model
• Introduction - Introduction
• Methods - Methods (PP, QBNE, AQBNE, SBE)
• Results - Experimental framework, Experimental results
• Discussion - Conclusion
19
Introduction
Problem definition
• Noise in speech signals has a dramatic effect on ASR.
Analysis
• Analysis of known methods.
• Interesting known methods (PP, QBNE, MFCC).
• Results: Develop new methods and combine different methods.
20
Methods
Known methods
• PP – Short presentation of method and implementation.
• QBNE – Short presentation of method and thorough description of implementation.
New methods
• AQBNE and SBE
– Motivation (Why is this a good method?)
– Implementation (Compared to QBNE and MFCC)
21
Results
Description of measurement instrument (HTK) and SpeechDat-Car database.
Results in tables
23
Conclusion
Contains a summary of the important results, so it can be read and understood right after reading the abstract.
25
Structure and organization
The work sheets are the basis for the paper and the implementation of our system
• Direct information about methods
• Necessary background knowledge
• Give the group members the necessary knowledge to understand a subject
• Written in English
The topic of the project was completely new to us
• Impossible to plan work for a long time period
• Discuss subjects, study, discuss new subjects
Writing procedure:
• The group discusses which subjects need to be investigated
• 1-2 persons work together and write a work sheet
• The group reads it and gives feedback
• 1 person finishes it
26
Brief presentation of work sheets
1. Introduction: States the aim of the project and our initial problem
2. Speech production: Human speech characteristics
3. Hidden Markov Model: Often used in speech recognition systems
4. Unwanted noise and effects: Noise and effects that can affect our system
5. Java execution speed test: Consideration of implementation language
6. Java processor blocks: Documents the implementation of our system
7. Matlab related: How to read sound files from the SpeechDat-Car database
27
Brief presentation of work sheets
8. Frontend Interfaces: Input: SpeechDat-Car audio wave format, Output: HTK format
9. The standard frontend: Transformation of the sampled audio data into feature vectors
10. Post-Processing
11. The Mel filterbank
12. Quantile Based Noise Estimation
13. Spectral subtraction
14. Experimental framework: How we have tested the methods' influence on the speech recognition
15. Experimental results: Describes our baseline and refers to App. A
16. Structure of abstract and paper: Overview of the important elements
App. A: Raw results
28
Causality
Causal:
• Post-Processing
• Speech Band Emphasizing Filter Bank
Non-causal:
• (Adaptive) Quantile Based Noise Estimation
29
Ordinary (non-causal) QBNE
[Figure: amplitude versus time, and sorted amplitude versus q (0 to 1) with the noise estimate N() marked]
• One discrete frequency ()
• Entire utterance is used for noise estimate
30
Causal QBNE
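[Figure: causal QBNE. Input frames x[n], x[n-1], x[n-2] (10 ms ~ 100 Hz) are inserted into a fixed-length quantile buffer Z[0] ... Z[Q] at the position where Z[q1] < x[n] < Z[q2]. One end of the buffer is discarded when n is odd and the other end when n is even. The noise estimate N(n) is read at Z[Q/2]. Axes: amplitude versus q (0 to 1)]
• One discrete frequency ()
• Noise estimate updated for each new frame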
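The buffer scheme in the figure can be sketched for a single frequency as follows. The class name, the warm-up behaviour while the buffer is still filling, and the choice of which end is dropped on odd versus even frames are assumptions made for this example; the slide only indicates that the discarded end alternates with n.

```python
import bisect

class CausalQuantileBuffer:
    """Fixed-length sorted buffer that yields a running noise estimate."""

    def __init__(self, size):
        self.size = size      # maximum number of stored amplitudes
        self.buffer = []      # kept sorted in ascending order
        self.n = 0            # frame counter

    def update(self, x):
        """Insert the amplitude of frame n and return the noise estimate N(n)."""
        # Place x between its sorted neighbours.
        bisect.insort(self.buffer, x)
        if len(self.buffer) > self.size:
            if self.n % 2:
                self.buffer.pop(0)   # odd n: drop the smallest value (assumed)
            else:
                self.buffer.pop()    # even n: drop the largest value (assumed)
        self.n += 1
        # Read the estimate at the middle of the buffer.
        return self.buffer[len(self.buffer) // 2]
```

One such buffer would be kept per frequency band, so the noise estimate is available after every frame instead of only at the end of the utterance.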
31
Causal QBNE
[Figure: three panels (n=0, n=1, n=2) showing a new frame x[n] being placed in the quantile buffer between Z[0] and Z[Q] and the noise estimate N(n) at Z[Q/2] being updated. Axes: amplitude versus q (0 to 1)]
33
Causality
• PP and SBE are inherently causal
• QBNE and AQBNE can be made causal by using a buffer for the quantile
– Additional computational cost
– Reduced storage requirement
36
Future work (2/2)
• Combine AQBNE and SBE with the advanced front-end (WI008)
[Figure: advanced front-end block diagram annotated with AQBNE and SBE Filter-Bank. Source: ETSI ES 202 050 V1.1.3 (2003-11)]
37
Project working process
Project reporting form
• No 3-week final report correction
• Work sheets easier to write than report chapters
Difficult to parallelize tasks
• Few tasks
• Large groups
Information gathering
• State-of-the-art knowledge from scientific papers
• No textbooks with up-to-date information exist