INTRODUCTION

1

INTRODUCTION METHODS RESULTS CONCLUSION

Noise Robust

Speech Recognition

Group SB740


2


Standard feature extraction

speech -> Framing -> FFT -> Filter Bank -> Cepstrum Coefficients -> features
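The standard chain (framing, FFT, filter bank, cepstrum coefficients) can be sketched as below. This is only an illustration: the 25 ms frame length, 10 ms hop, Hamming window, 13 coefficients and the function names are assumptions that do not come from the slides, and the filter bank is passed in as a plain matrix.

```python
import numpy as np

def frame_signal(signal, frame_len, hop):
    """Split a 1-D signal into overlapping frames of shape (n_frames, frame_len)."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return signal[idx]

def cepstral_features(signal, filterbank, frame_len=200, hop=80, n_ceps=13):
    """Framing -> FFT -> filter bank -> log -> DCT (cepstrum coefficients).

    filterbank: array of shape (n_bands, n_fft // 2 + 1).
    frame_len=200 and hop=80 correspond to 25 ms / 10 ms at 8 kHz (assumed).
    """
    n_fft = 2 * (filterbank.shape[1] - 1)
    frames = frame_signal(signal, frame_len, hop) * np.hamming(frame_len)
    spectrum = np.abs(np.fft.rfft(frames, n=n_fft, axis=1))   # magnitude spectrum
    band_energy = spectrum @ filterbank.T                      # filter bank outputs
    log_energy = np.log(np.maximum(band_energy, 1e-10))        # avoid log(0)
    n_bands = filterbank.shape[0]
    k = np.arange(n_ceps)[:, None]
    m = np.arange(n_bands)[None, :]
    dct = np.cos(np.pi * k * (m + 0.5) / n_bands)              # DCT-II basis
    return log_energy @ dct.T                                  # (n_frames, n_ceps)

# One second of synthetic noise and a flat placeholder filter bank.
rng = np.random.default_rng(0)
demo_signal = rng.standard_normal(8000)
demo_bank = np.ones((23, 129)) / 129.0
demo_features = cepstral_features(demo_signal, demo_bank)
```

The flat filter bank is a placeholder so the sketch runs on its own; a Mel or SBE filter bank would take its place in a real front-end.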

3


Improved feature extraction

Framed FFT spectrum -> Pre-Processing -> Filter Bank -> Cepstrum Coefficients -> Post-Processing -> features

4

METHODS

Pre-Processing: Quantile Based Noise Estimation for spectral subtraction (QBNE)

• Assumes that each frequency band contains only noise for a fraction of the time, even during speech

• For each frequency band the frames are sorted by amplitude

• A fixed q-value is used, equal for all frequency bands

• The intersection between the vertical line at the q-value and the sorted-amplitude curve of each frequency band is the noise estimate

• Problem: mismatched training and test conditions

[Figure: normalized amplitude (0 to 1) versus q-value (0 to 1)]
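The QBNE steps above can be sketched as follows. This is a minimal illustration, not the group's implementation: the default q = 0.5, the spectral floor of 0.1 and both function names are assumptions introduced here.

```python
import numpy as np

def qbne_noise_estimate(magnitudes, q=0.5):
    """Per-band noise estimate: the q-quantile of the sorted frame magnitudes.

    magnitudes: array of shape (n_frames, n_bands) for one utterance.
    q: fixed quantile in [0, 1], shared by all frequency bands (assumed 0.5).
    """
    sorted_mag = np.sort(magnitudes, axis=0)            # sort frames by amplitude, per band
    index = int(np.floor(q * (magnitudes.shape[0] - 1)))
    return sorted_mag[index]                             # shape (n_bands,)

def spectral_subtraction(magnitudes, noise, floor=0.1):
    """Subtract the noise estimate, keeping at least floor * original magnitude."""
    return np.maximum(magnitudes - noise, floor * magnitudes)

# Five frames, two frequency bands.
utterance = np.array([[5.0, 10.0],
                      [1.0, 10.0],
                      [3.0, 10.0],
                      [2.0, 10.0],
                      [4.0, 10.0]])
noise = qbne_noise_estimate(utterance, q=0.5)
cleaned = spectral_subtraction(utterance, noise)
```

Because the whole utterance is sorted before the estimate is read off, this form is non-causal.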

5


Pre-Processing: Adaptive Quantile Based Noise Estimation for spectral subtraction (AQBNE)

• Goal is to improve the performance when training with low noise and testing with high noise

• Adapt to the utterance and noise levels

• Adjust the q-value for each frequency band

• Result is a q-estimation curve as opposed to a fixed value

• High and low noise situations will converge to similar representations
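The difference from a fixed q-value can be illustrated by reading a different quantile in each frequency band. The slides do not state how the q-estimation curve is derived from the utterance and noise levels, so `q_curve` below is a hypothetical input and the function name is illustrative only.

```python
import numpy as np

def quantile_per_band(magnitudes, q_curve):
    """Read a separate quantile for each frequency band.

    magnitudes: array of shape (n_frames, n_bands).
    q_curve: one q-value in [0, 1] per band (hypothetical; how it is
             adapted to the utterance is not specified in the slides).
    """
    sorted_mag = np.sort(magnitudes, axis=0)
    n_frames, n_bands = sorted_mag.shape
    q_curve = np.clip(np.asarray(q_curve, dtype=float), 0.0, 1.0)
    rows = np.floor(q_curve * (n_frames - 1)).astype(int)
    return sorted_mag[rows, np.arange(n_bands)]

utterance = np.array([[5.0, 30.0],
                      [1.0, 10.0],
                      [3.0, 50.0],
                      [2.0, 20.0],
                      [4.0, 40.0]])
estimate_extremes = quantile_per_band(utterance, q_curve=[0.0, 1.0])
estimate_mixed = quantile_per_band(utterance, q_curve=[0.5, 0.25])
```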

6


Filter Bank: Speech Band Emphasizing Filter Bank (SBE)

[Figure: two panels, "Mel filterbank" and "Speech Band Emphasizing filter bank", each showing filter weights (0 to 1) versus Frequency [Hertz] (0 to 4000)]

• Mel Frequency Cepstrum Coefficient (MFCC)

– Motivated by human perception and critical bands

• Mel Frequency Filter Bank

– Triangular filters

– Highest resolution at low frequencies

– Resulting Importance Function

• Speech Band Emphasizing Filter Bank

– Emphasizes the primary speech band

– Highest resolution at 1500 Hz
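A triangular Mel filter bank can be sketched as below, using the common mel mapping 2595·log10(1 + f/700). The 23 filters, 256-point FFT and 8 kHz sampling rate are assumptions (the figure only shows a 0 to 4000 Hz axis). The slides do not give the SBE filter spacing, so only the Mel case is shown.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def mel_filterbank(n_filters=23, n_fft=256, sample_rate=8000):
    """Triangular filters with edges equally spaced on the mel scale.

    Returns (bank, centers_hz); bank has shape (n_filters, n_fft // 2 + 1).
    """
    nyquist = sample_rate / 2.0
    edges_hz = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(nyquist), n_filters + 2))
    bin_freqs = np.linspace(0.0, nyquist, n_fft // 2 + 1)
    bank = np.zeros((n_filters, bin_freqs.size))
    for i in range(n_filters):
        lo, mid, hi = edges_hz[i], edges_hz[i + 1], edges_hz[i + 2]
        rising = (bin_freqs - lo) / (mid - lo)     # slope up to the center
        falling = (hi - bin_freqs) / (hi - mid)    # slope down from the center
        bank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank, edges_hz[1:-1]

bank, centers_hz = mel_filterbank()
```

The spacing between neighbouring center frequencies grows with frequency, which is the "highest resolution at low frequencies" property; an SBE-style bank would instead place its densest filters around 1500 Hz.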

7

RESULTS

Results

• QBNE with Mel Frequency Filter Bank showed an improvement of 15%

• AQBNE with SBE Filter Bank showed an improvement of 28%

• Under highly mismatched conditions, AQBNE with SBE Filter Bank showed an 80% improvement, compared to 21% for QBNE with Mel Frequency Filter Bank

8

CONCLUSION

Conclusion

• AQBNE avoids describing speech signals during training to a level of detail which is unattainable during testing under noisy conditions

• The suggested SBE Filter Bank, though empirically chosen, indicates that filter distributions other than the standard Mel-scale may attain improved performance in noisy conditions

9

Presentation of Abstract

Agenda:

• Purpose of the abstract.

• Structure of the abstract.

• Content of the abstract.

10

Purpose of abstract

Announcement for the 17th 7th semester conference on the 21st of December 2004.

Appetizer to attract the right audience. The abstract is written with its audience in mind: other 7th semester students from the Institute of Electronic Systems in Aalborg and Esbjerg.

11

Structure of the abstract

Title:

• Topic: The long title gives a detailed description of the content: "Noise Robust Automatic Speech Recognition with Adaptive Quantile Based Noise Estimation and Speech Band Emphasizing Filter Bank"

• Nature: Noise estimation.

• Scope: Automatic speech recognition.

The text follows the IMRaD structure.

12

Structure of the abstract

Throughout the text important keywords are used: ASR, Noise Estimation, Feature Extraction.

Known methods are presented before new methods to create continuity.

Complexity increases through the abstract.

13

Content of the abstract

Introduction: Contains information on the initial problem, the proposals made in the paper and the field of operation.

This is the shortest section in the abstract, but contains a lot of keywords.

14


Methods: This section is the longest of the abstract. It contains references to known methods and introduces new methods and solutions.

The first sentence in this section is linked to the introduction by the phrase "feature extraction".

This section ends with an advertisement for the results.

15


Results: The methods that have improved the recognition performance are presented first.

The best result is stated with its exact value and compared to known methods.

The proposed solutions that have not improved the recognition are mentioned last in the section.

16


Discussion: First the method that did not improve the recognition performance is explained.

Secondly the methods that have improved the recognition performance are described.

The abstract is concluded by the recommendations based on the results achieved in this project.

18

Structure of Paper

IMRaD model - Paper sections

• Introduction - Introduction

• Methods - Methods (PP, QBNE, AQBNE, SBE)

• Results - Experimental framework, Experimental results

• Discussion - Conclusion

19

Introduction

• Problem definition

– Noise in speech signals has a dramatic effect on ASR.

• Analysis

– Analysis of known methods.

– Interesting known methods (PP, QBNE, MFCC).

– Results: Develop new methods and combine different methods.

20

Methods

• Known methods

– PP: Short presentation of method and implementation.

– QBNE: Short presentation of method and thorough description of implementation.

• New methods

– AQBNE and SBE: Motivation (Why is this a good method?) and Implementation (Compared to QBNE and MFCC)

21

Results

Description of measurement instrument (HTK) and SpeechDat-Car database.

Results in tables

22

Results

Discussion of results in text.

Chosen results in graph.

23

Conclusion

Contains a summary of the important results, so it can be read and understood right after reading the abstract.

24

Worksheets

Agenda:

• Structure and organization

• Brief presentation of worksheets

25

Structure and organization

• The worksheets are the basis for the paper and the implementation of our system

– Direct information about methods

– Necessary background knowledge

• Give the group members the necessary knowledge to understand a subject

– Written in English

– The topic of the project was completely new to us

• Impossible to plan work for a long time period

– Discuss subjects, study, discuss new subjects

• Writing procedure:

– The group discusses which subjects need to be investigated

– 1-2 persons work together and write a worksheet

– The group reads it and gives feedback

– 1 person finishes it

26

Brief presentation of work sheets

1. Introduction: States the aim of the project and our initial problem

2. Speech production: Human speech characteristics

3. Hidden Markov Model: Often used in speech recognition systems

4. Unwanted noise and effects: Noise and effects that can affect our system

5. Java execution speed test: Consideration of implementation language

6. Java processor blocks: Documents the implementation of our system

7. Matlab related: How to read sound files from the SpeechDat-Car database

27

Brief presentation of work sheets

8. Frontend Interfaces: Input: SpeechDat-Car audio wave format, Output: HTK format

9. The standard frontend: Transformation of the sampled audio data into feature vectors

10. Post-Processing

11. The Mel filterbank

12. Quantile Based Noise Estimation

13. Spectral subtraction

14. Experimental framework: How we have tested the methods' influence on the speech recognition

15. Experimental results: Describes our baseline and refers to App. A

16. Structure of abstract and paper: Overview of the important elements

App. A: Raw results

28

Causality

Causal:

• Post-Processing

• Speech Band Emphasizing Filter Bank

Non-causal:

• (Adaptive) Quantile Based Noise Estimation

29

Ordinary (non-causal) QBNE

[Figure: for one discrete frequency, amplitude versus time and sorted amplitude versus q (0 to 1), with the noise estimate N marked]

• One discrete frequency

• Entire utterance is used for noise estimate

30

Causal QBNE

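[Diagram: input frames x[n], x[n-1], x[n-2] (10 ms ~ 100 Hz) feed a fixed length quantile buffer Z[0] ... Z[Q]. A new value x[n] is inserted at the position where Z[q1] < x[n] < Z[q2], and one end value of the buffer is discarded, one end for n odd and the other for n even. A plot of amplitude versus q (0 to 1) marks Z[0], Z[Q/2], Z[Q] and the noise estimate N(n).]

• One discrete frequency

• Noise estimate updated for each new frame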
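One way to read the buffer diagram is sketched below for a single frequency band. Several details are assumptions rather than facts from the slides: the buffer grows until it is full, the smallest value is dropped on odd n and the largest on even n, and the noise estimate is read at q = 0.5 (Z[Q/2]). The class name is illustrative.

```python
import bisect

class CausalQuantileTracker:
    """Fixed length sorted buffer that yields a noise estimate per frame."""

    def __init__(self, buffer_len=9, q=0.5):
        self.buffer_len = buffer_len   # Q + 1 entries, Z[0] .. Z[Q]
        self.q = q                     # quantile read from the buffer (assumed 0.5)
        self.z = []                    # sorted buffer
        self.n = 0                     # frame counter

    def update(self, x):
        """Insert the new magnitude x[n] and return the noise estimate N(n)."""
        bisect.insort(self.z, x)                 # keeps Z sorted: Z[q1] <= x <= Z[q2]
        if len(self.z) > self.buffer_len:
            if self.n % 2 == 1:
                self.z.pop(0)                    # n odd: discard the smallest (assumed)
            else:
                self.z.pop()                     # n even: discard the largest (assumed)
        self.n += 1
        return self.z[int(self.q * (len(self.z) - 1))]

tracker = CausalQuantileTracker(buffer_len=3, q=0.5)
estimates = [tracker.update(x) for x in [5.0, 1.0, 3.0, 4.0, 10.0]]
```

Each update costs one sorted insertion, and only the buffer is stored instead of the whole utterance.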

31

Causal QBNE

[Figure: successive states of the quantile buffer for n=0, n=1 and n=2, each plotted as amplitude versus q (0 to 1) with Z[0], Z[Q/2], Z[Q], the new input x[n] and the noise estimate N(n)]

32

Causal Adaptive QBNE

[Figure: amplitude versus q (0 to 1), with Z[0], Z[Q] and the noise estimate N(n)]

33

Causality

• PP and SBE are inherently causal

• QBNE and AQBNE can be made causal by using a buffer for the quantile

– Additional computational cost

– Reduced storage requirement

34

Closure

Agenda:

• Future work

• Project working process

35

Future work (1/2)

• Implement causal AQBNE

• Find optimal q-estimation curve etc.

36

Future work (2/2)

• Combine AQBNE and SBE with advanced front-end (WI008)

[Diagram: advanced front-end block diagram with AQBNE and SBE Filter-Bank. Source: ETSI ES 202 050 V1.1.3 (2003-11)]

37

Project working process

• Project reporting form

– No 3 weeks final report correction

– Worksheets easier to write than report chapters

• Difficult to parallelize tasks

– Few tasks

– Large groups

• Information gathering

– State of the art knowledge from scientific papers

– No textbooks with up to date information exist
