Research Presentation:

Page 1: Research Presentation:

ISIP: Research Presentation Seungchan Lee Feb.16.2006 Page 1 of 36

Seungchan Lee

Intelligent Electronic Systems

Human and Systems Engineering

Department of Electrical and Computer Engineering

Research Presentation: On a Utility for Speaker Verification

Page 2: Research Presentation:

Set up a standard IES environment

• My first impression of CAVS is good.
• The first thing to do is set up the IES environment.

Create Enlistment: Our production system consists of many classes, and I was surprised at the structure of our software environment. Even though much work has already been done, I need to consolidate our system with the other IFCers.

GroupWise: A good communication and schedule management tool within our group.

After that, I could write a program and compile it on my local machine.

[Diagram: client, CVS repository, server]

Page 3: Research Presentation:

First IFC program, instructions

• The first simple IFC program does the following:

- Reads a 3×3 float matrix from an Sof file.
- Reads a 3×1 float vector from an Sof file.
- Multiplies the matrix and vector using the equation Z = alpha*A*B.
- Writes the result to an Sof file.
- Allows the value of alpha to be set from the command line:

    foo.exe -alpha 2.0 input.sof output.sof
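The arithmetic behind Z = alpha*A*B can be sketched in a few lines. This is only an illustration in plain Python, not the IFC program itself: the function name `scaled_matvec` and the sample values are assumptions, and Sof file I/O is left out.

```python
def scaled_matvec(alpha, matrix, vector):
    """Return alpha * matrix * vector for a list-of-rows matrix and a flat vector."""
    if any(len(row) != len(vector) for row in matrix):
        raise ValueError("matrix columns must match vector length")
    # Each output element is alpha times the dot product of one row with the vector.
    return [alpha * sum(a * b for a, b in zip(row, vector)) for row in matrix]


# Hypothetical sample data standing in for the contents of input.sof.
A = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0],
     [7.0, 8.0, 9.0]]
B = [1.0, 0.0, -1.0]

Z = scaled_matvec(2.0, A, B)  # alpha = 2.0, as in the command line above
print(Z)
```

With these sample values every row of A*B equals -2, so scaling by alpha = 2.0 gives -4 in each position.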

Page 4: Research Presentation:

First IFC program, flow

    foo.exe -alpha 2.0 input_file output_file

[Flow diagram of foo.exe:]

1. Read the input Sof file
2. Read the 3×3 float matrix
3. Read the 3×1 float vector
4. Multiply the matrix and vector
5. Write the result to the output Sof file

Page 5: Research Presentation:

First IFC program

• After completing the first IFC program, I am more familiar with our production system.
• When I have questions about our production system, our prominent group members always help me with them.

It is good to study alone, but sometimes it is better to ask an expert in the programming.

The more I know about our production system, the more questions I have.

Page 6: Research Presentation:

First IFC program

• First Question
- How can we view the contents of a class?

• Answer:

It is possible through the debug method.

The contents of an Sof object are hard to inspect in the debugger. Instead, I used the debug method that is included in the source code. Sometimes this slows down debugging, but it is the best way I know so far. With it, I can see which data is contained in the Sof object. The same applies to all other class variables.

Page 7: Research Presentation:

First IFC program

• Second Question
- Using the debug method, why does the string capacity change?

SysString token = “oscar had a heap of apples”

Using the debug method, we can see each value:

<SysString::token> value_d = (5 >= 5) oscar
<SysString::token> value_d = (5 >= 3) had
<SysString::token> value_d = (5 >= 1) a
<SysString::token> value_d = (5 >= 4) heap
<SysString::token> value_d = (5 >= 2) of
<SysString::token> value_d = (6 >= 6) apples

• Answer

In the expression (n >= m), 'n' is the total capacity of the data structure, and 'm' is the current length. So for the line:

<SysString::token> value_d = (5 >= 3) had

the capacity of the SysString is 5 and the current length is 3, which is obvious from the string 'had'.
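The capacity and length pairs in the debug output are consistent with a buffer that is reused for each token and grows only when a token does not fit. The sketch below reproduces the numbers under that assumption; it is not the SysString implementation, and `trace_capacity` is a hypothetical helper.

```python
def trace_capacity(tokens):
    """Track (capacity, length, token) for a reused buffer that never shrinks."""
    capacity = 0  # assumption: the buffer starts empty
    trace = []
    for token in tokens:
        # Grow only when the new token is longer than the current capacity.
        capacity = max(capacity, len(token))
        trace.append((capacity, len(token), token))
    return trace


for capacity, length, token in trace_capacity("oscar had a heap of apples".split()):
    print(f"({capacity} >= {length}) {token}")
```

The capacity stays at 5 after "oscar" because every following token fits, and it grows to 6 only when "apples" arrives.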

Page 8: Research Presentation:

First IFC program

• Third Question
- What does “L” mean?
- In our production system, all of the classes use the “L” character. For example,

SysString file1;
file1.assign(L"/tmp/foo_bin.sof");

I could not figure out exactly why this “L” is used.

• Answer

The "L" is a prefix (not a macro) that tells the compiler that the following string literal is a wide-character string, the form typically used to hold Unicode text.

Page 9: Research Presentation:

ISIP_VERIFY

• Basic Work Flow

- Decide what to add and remove in the new version

- Analyze old version

- Draw class diagram

- Design new version

- Coding and Compilation

- Testing and fixing bugs

Page 10: Research Presentation:

ISIP_VERIFY

• Decide what to add and remove in the new version
- Currently, isip_verify performs speaker verification, but only with the HMM algorithm. We want the new isip_verify to perform that function using the HMM, SVM, or RVM algorithm. This means the new version of “isip_verify” will be a more general utility than the old version.

• Analyze the old version
- The isip_verify utility uses the SpeakerVerifier, VerifyHMM, and HMM classes, and does both training and testing. Unlike the “GMM” case, the “SVM” statistical model has two separate utilities: “isip_svm_learn” handles training, and “isip_svm_classify” handles testing.

Page 11: Research Presentation:

ISIP_VERIFY

• The problem:
1. isip_verify can only process the “GMM” statistical model.
2. We do not have an “RVM” routine that provides the same functions as the “SVM” utilities.

• Solution:
1. Add SVM and RVM routines to isip_verify.
2. Add the same functionality to the RVM class.
3. Modify the SpeakerVerifier class.

With these changes we can build one utility that performs all the functions mentioned above.

To begin with, I drew class block diagrams of each utility and confirmed the relationships between classes and functions. After that, these utilities were much easier to understand. Next, I drew the flow chart of the new utility.

Page 12: Research Presentation:

ISIP_VERIFY

[Class Block Diagram: ISIP_VERIFY]

- Classes: ISIP_VERIFY (util/speech), SpeakerVerifier (asr), VerifyHMM (pr), HiddenMarkovModel (pr)
- Steps: Parameter check; Set algorithm; Set implementation; run()
- Branches: If algorithm = HMM; If algorithm = VERIFY: Verify(); If algorithm = TRAIN: Train and model creation; else
- Verify(): If implementation = LIKELIHOOD: Verifyl(); LIKELIHOOD RATIO: Verifylr()
- HiddenMarkovModel: algorithm = TRAIN, Implementation = BAUM WELCH; linearDecoder(); Run()

Page 13: Research Presentation:

ISIP_VERIFY ISIP_SVM_LEARN

[Class Block Diagram: isip_svm_learn]

- Classes: isip_svm_learn (util/speech), SupportVectorMachine (pr), StatisticalModel (stat) of SupportVectorModel type, StatisticalModelBase, SupportVectorModel (stat)
- Steps: Parameter check; loadFeature() for the positive and negative examples; train(); writeModel()
- Training: if algorithm = SEQUENTIAL_MINIMAL_OPTIMIZATION: sequentialMinimalOptimization(); determine the support vectors
- Model access: getSupportVectorModel(), getBias(), getKernels(), getAlphas(), getSupportVectors(), write()

Page 14: Research Presentation:

ISIP_VERIFY ISIP_SVM_CLASSIFY

[Class Block Diagram: isip_svm_classify]

- Classes: isip_svm_classify (util/speech), StatisticalModel (stat), AudioDatabase (mmedia), FeatureFile (mmedia)
- Methods shown: read(), getSupportVectorModel(), open(), getRecord(), getBufferData(), getDistance()
- Final step: write the distance to the output file

Page 15: Research Presentation:

ISIP_VERIFY FLOW CHART

[Flow chart: isip_verify (new version)]

    isip_verify -param.sof .... -algo_type [hmm,svm,rvm] -mode [train, test]

1. Check the “algo_type” option (HMM, SVM, RVM). If it is missing: "Since no algo_type was specified, HMM algo_type was chosen".
2. HMM: check that the statistical model = “GMM”. No: error (Model incorrect). Yes: gmmVerify(), which is isip_verify (old version); the verifyHMM class processes the parameter file for isip_verify and can do both training and testing.
3. SVM: check that the statistical model = “SVM”. No: error (Model incorrect). Yes: check the “mode” option. No mode: error (You must specify mode). train: svmTrain(). test: svmTest().
4. RVM: check that the statistical model = “RVM”. No: error (Model incorrect). Yes: check the “mode” option. No mode: error (You must specify mode). train: rvmTrain(). test: rvmTest().
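The branching in this flow chart can be sketched as a small dispatch function. This is an assumed reading of the chart, not the isip_verify source: the function `choose_action` and its argument names are hypothetical, and it only returns the name of the routine that would run.

```python
# Statistical model each algorithm type expects, as read from the flow chart.
EXPECTED_MODEL = {"hmm": "GMM", "svm": "SVM", "rvm": "RVM"}


def choose_action(statistical_model, algo_type=None, mode=None):
    """Return the name of the routine the flow chart selects, or raise on error."""
    if algo_type is None:
        algo_type = "hmm"  # "Since no algo_type was specified, HMM algo_type was chosen"
    if algo_type not in EXPECTED_MODEL:
        raise ValueError(f"unknown algo_type: {algo_type}")
    if statistical_model != EXPECTED_MODEL[algo_type]:
        raise ValueError("Model incorrect")
    if algo_type == "hmm":
        return "gmmVerify"  # the old isip_verify path handles training and testing itself
    if mode not in ("train", "test"):
        raise ValueError("You must specify mode")
    return f"{algo_type}{mode.capitalize()}"  # e.g. svmTrain, rvmTest


print(choose_action("SVM", algo_type="svm", mode="train"))
print(choose_action("GMM"))
```

Keeping the model check ahead of the mode check mirrors the order in which the chart reports errors.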

Page 16: Research Presentation:

ISIP_VERIFY

• Coding and Compilation

1. Add and remove parameters, and check the parameters (Won)

2. Combine the three functionalities
- The new “isip_verify” calls its run() method, and run() calls the support vector machine object or the relevance vector machine object, which then performs training. This lets us implement three models in one utility.

3. SpeakerVerifier class
- include the SVM and RVM classes
- modify the parameter check
- modify the run(sdb) method
- add a run(pos_sdb,neg_sdb) method

4. RVM class
- Add training and testing modules (Sridhar)

Page 17: Research Presentation:

ISIP_VERIFY

• Problems during coding and compilation

- How do we verify the SpeakerVerifier class?
• After modifying an existing class, we need to verify its correctness.
• The diagnose method performs this function in our production system.
• This method is implemented in *_02.cc in every class.
• After compiling the class, we execute “make test”.
• This automatically checks every function in that class.

- How can we resolve a segmentation fault?
Finding the cause is one of the most difficult things to do. Comment out all new modules, then add one module back, compile the class, and test it. Repeat this until every new module has been tested.

Page 18: Research Presentation:

ISIP_VERIFY

• Problems during coding and compilation

- Compilation and debugging time
- When developing a new program, compiling and debugging are among the most time-consuming tasks.
- In our production system, it takes a long time to compile and debug a program, because compiling a program involves many linking steps.
- How can we resolve it? It is faster to work in our local repository.

Page 19: Research Presentation:

ISIP_VERIFY

• Testing and fixing bugs

This part is as important as the previous steps. We can find faults and missing pieces during this step.

Problem: What happens to the sdb object?

Normally, the sdb object contains every command line option (except parameters). However, the sdb object lost its contents when it was passed to the SpeakerVerifier class.

How can we fix that?
• Comment out all code except the control code.
• The cause was that I did not give the list file option.

Page 20: Research Presentation:

Software Release

• What do I need to know for a software release?

Varmint utility: used to track down all problems

Production system:

• In order to better understand our system, I have worked on and will continue to work on the following:
• Data Preparation
• Feature Extraction
• Recognition
• Acoustic modeling
• Language modeling

These are explained in more detail after this topic.

Page 21: Research Presentation:

Software Release

• ProductionRuleTokenType class

It uses many if-else statements in its read/write functions. Instead, we can use the NameMap class. To do that:
• Declare the NameMap class and modify the related modules.

• Problem:

I ran into run-time errors.

• Solution:
- I made a simple program that includes the diagnose method in prtt_02.cc.
- After tracking down the function, I found the cause.
- This was the first class I checked in to our production system.
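The idea of replacing an if-else chain with a name map can be sketched as a two-way lookup table. This is an illustration only: the `NameMap` class below is a stand-in written for this example, not the ISIP NameMap API, and the token type names are hypothetical.

```python
class NameMap:
    """Two-way lookup between integer codes and names (illustrative stand-in)."""

    def __init__(self, names):
        self._names = list(names)
        self._index = {name: i for i, name in enumerate(self._names)}

    def get_name(self, index):
        return self._names[index]

    def get_index(self, name):
        return self._index[name]


# Hypothetical token type names; the real ProductionRuleTokenType values are not shown here.
TYPE_MAP = NameMap(["TERMINAL", "NON_TERMINAL", "EPSILON"])


def write_type(code):
    # One table lookup replaces a chain of "if code == ...: return ..." branches.
    return TYPE_MAP.get_name(code)


def read_type(name):
    return TYPE_MAP.get_index(name)


print(write_type(1), read_type("EPSILON"))
```

Adding a new token type then means adding one entry to the table rather than editing both the read and the write branches.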

Page 22: Research Presentation:

Software Release

• isip_lm_tester

This utility randomly generates sentences based on the language model file and tests the language model.

Problem: Currently, generating state transcriptions does not get past the first symbols at the highest level.

What to do? I need to track down this problem, but that requires an understanding of the language model. I will read and study our production system tutorial thoroughly, and then get involved in fixing the bugs in isip_lm_tester.

Page 23: Research Presentation:

Production System

• In this part, I will go from Data preparation to Feature extraction.

• How can we better understand our production system?

- Data Preparation

- Feature extraction

- Recognition

- Acoustic modeling

- Language modeling

Page 24: Research Presentation:

Production System, Data Preparation

• Data Preparation

Why is it difficult for a beginner?
- In ordinary programming, preparing input data is not hard.
- In our production system, however, it is not easy for a beginner.

It requires knowledge of speech, including speech file formats, file conversion, and sampling.

Speech file
- Header + sampled data
- Sampled data only: raw files

[Diagram: file layouts. WAV, Sof, SPHERE, and AU files consist of a header followed by data; Raw files contain data only.]

Page 25: Research Presentation:

Production System, Data Preparation

• Sof Format

An Sof file holds information about the location of each object stored in the file, together with the corresponding object data.

It supports two basic storage formats:
- text: human-readable files
- binary: sampled data

It is used by all data objects in the ISIP environment to unify and simplify I/O.

Binary format:
- Handles machine architecture differences with automatic byte transformations.
- Used for large quantities of data because of the efficiency gains.
- The objects are stored in a binary tree, and a symbol table is used to hold the object class names.

Page 26: Research Presentation:

Production System, Data Preparation

Text format:

- Used for user input parameter files in the ISIP environment.
- A simple format that consists of object names and tags, followed by the object data.
- Example:

    @ Sof v1.0 @
    @ Float 0 @
    value = 1.3;
    @ VectorFloat 0 @
    value = 3.5,5.7,3.8;
    @ VectorLong 0 @
    value = 2,3;
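To make the layout of the text format concrete, the following sketch reads the example back into a list of objects. It handles only the subset visible in the example (an object line of the form `@ Name tag @` followed by `name = value;` lines) and is an assumption about the format, not the ISIP Sof parser; `parse_sof_text` is a hypothetical name.

```python
import re

# An object line looks like "@ Float 0 @": class name, then an integer tag.
OBJECT_LINE = re.compile(r"@\s*(\w+)\s+(\d+)\s*@")


def parse_sof_text(text):
    """Parse the subset of the Sof text format shown in the example."""
    objects = []
    current = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        match = OBJECT_LINE.fullmatch(line)
        if match:
            current = {"class": match.group(1), "tag": int(match.group(2)), "fields": {}}
            objects.append(current)
        elif current is not None and "=" in line:
            name, _, value = line.rstrip(";").partition("=")
            current["fields"][name.strip()] = value.strip()
        # Anything else (such as the "@ Sof v1.0 @" version line) is skipped.
    return objects


EXAMPLE = """@ Sof v1.0 @
@ Float 0 @
value = 1.3;
@ VectorFloat 0 @
value = 3.5,5.7,3.8;
@ VectorLong 0 @
value = 2,3;
"""

for obj in parse_sof_text(EXAMPLE):
    print(obj["class"], obj["tag"], obj["fields"]["value"])
```

Values are kept as strings here; converting them to floats or integers would depend on the object class.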

Page 27: Research Presentation:

Production System, Data Preparation

• Converting from an external format (e.g., SPHERE, WAV) to raw format: speech.sph → speech.raw

1. Convert the SPHERE file's binary data to 16-bit linear samples using w_decode:

    w_decode -o pcm speech.sph speech-nb.sph

2. Strip the file's header using h_strip:

    h_strip speech-nb.sph speech.raw

3. The result is speech.raw, which is identical to speech-nb.sph except that the first 1024 bytes of header information are missing.

4. One-line command (steps 1 and 2 combined):

    w_decode -o pcm speech.sph - | h_strip - - > speech.raw

[Diagram: the header is stripped from the file, leaving only the data]
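The effect of the header-stripping step can be sketched as dropping a fixed-size prefix from the file contents. This assumes the 1024-byte header size stated on the slide and input that is already decoded to PCM; it is an illustration of what the step does, not a replacement for h_strip, and `strip_header` is a hypothetical name.

```python
HEADER_BYTES = 1024  # header size stated on the slide


def strip_header(data, header_bytes=HEADER_BYTES):
    """Return the sample data that follows a fixed-size header."""
    if len(data) < header_bytes:
        raise ValueError("file is shorter than the header")
    return data[header_bytes:]


# Synthetic stand-in for a decoded file: 1024 header bytes, then 4 bytes of samples.
fake_file = b"H" * HEADER_BYTES + b"\x01\x00\x02\x00"
raw = strip_header(fake_file)
print(len(fake_file) - len(raw))  # size difference between the two files
```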

Page 28: Research Presentation:

Production System, Data Preparation

• Verification of Conversion to Raw

SoX: Audio Playback

    sox -t .sw -r 16000 speech.raw -t .au speech.au
    audioplay speech.au

File Size Comparison: Using "ls -l"

    ls -l speech.*
    -rw-rw-r-- 1 may isip 97486 Sep 10 15:19 speech.raw
    -rw-rw-r-- 1 may isip 98510 Sep 10 15:12 speech.sph

The fifth field is the file size. speech.raw is 1024 bytes smaller than speech.sph.

Octal Dump (od): Listing Values

    od -t d2 speech.raw
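The size check and the sample listing can also be sketched in a few lines. The file sizes come from the listing above; the helper `decode_pcm16` and the little-endian default are assumptions made for this example (od itself uses the machine's native byte order).

```python
RAW_SIZE = 97486   # bytes, from the ls -l listing
SPH_SIZE = 98510   # bytes, from the ls -l listing
SAMPLE_RATE = 16000  # Hz, the rate given to sox


def decode_pcm16(raw_bytes, byteorder="little"):
    """Interpret bytes as signed 16-bit samples, similar in spirit to od -t d2."""
    if len(raw_bytes) % 2:
        raise ValueError("expected an even number of bytes")
    return [int.from_bytes(raw_bytes[i:i + 2], byteorder, signed=True)
            for i in range(0, len(raw_bytes), 2)]


header_bytes = SPH_SIZE - RAW_SIZE   # size of the stripped header
num_samples = RAW_SIZE // 2          # two bytes per 16-bit sample
duration_s = num_samples / SAMPLE_RATE

print(header_bytes, num_samples, round(duration_s, 3))
print(decode_pcm16(b"\x01\x00\xfe\xff"))
```

At 16 kHz the raw file works out to about three seconds of speech.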

Page 29: Research Presentation:

Production System, Data Preparation

• Creating an Sof file: raw file → Sof file

Using isip_make_sof, type the following:

    isip_make_sof speech.raw

This creates a binary file. If you want to create a text version, type the following:

    isip_make_sof -type text -suffix _text speech.raw

[Diagram: isip_make_sof adds a header to the raw data to produce the Sof file]

Page 30: Research Presentation:

Production System, Feature Extraction

• What is feature extraction?

- A speech recognizer does not understand the human voice directly.
- Only certain features of the human voice are useful for recognizer decoding.
- These features must be numerically measured and stored as a feature vector.
- The process of taking these measurements is known as feature extraction.

It includes the following:
- converting the signal to a digital form
- measuring some important characteristics of the signal
- augmenting these measurements

[Diagram: Human voice → Microphone → Digital Signal]

Page 31: Research Presentation:

Production System, Feature Extraction

• Frame
- The typical frame duration in speech recognition is 10 ms.
- Determines how often we produce a feature vector.

• Window
- The typical window duration is 25 ms.
- Surrounds the frame for a smoother representation of the speech data.
- Determines the number of samples used.

• Sampling rate: the number of samples per second taken from a continuous signal to make a discrete signal.

Example) With an 8 kHz sampling rate and a frame duration of 10 ms, measurements would be taken over 80 samples to produce one feature vector.
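The sample counts follow directly from the sampling rate and the durations. A small worked example, with `samples_in` as a hypothetical helper name:

```python
def samples_in(sample_rate_hz, duration_ms):
    """Number of samples covered by a duration at a given sampling rate."""
    return sample_rate_hz * duration_ms // 1000


frame_samples = samples_in(8000, 10)    # 10 ms frame at 8 kHz
window_samples = samples_in(8000, 25)   # 25 ms window at 8 kHz
frames_per_second = 1000 // 10          # one feature vector per 10 ms frame

print(frame_samples, window_samples, frames_per_second)
```

The same helper gives 160 samples per frame at a 16 kHz sampling rate.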

Page 32: Research Presentation:

Production System, Feature Extraction, Signal Flow Graph

• Basic process of extracting a single feature
- Input: speech data stored in digital form on a computer.
- Energy: a computer program or algorithm specifically designed to measure energy values in the speech data.
- Output: a computer file that stores the measurements of the features.

[Diagram: input → Energy → output]

• Including a window
- Determines the number of samples used to calculate the energy measurements.

[Diagram: input → Wind → Energy → output]
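A windowed energy measurement can be sketched as a sum of squared samples taken over a window centered on each frame. This is a minimal illustration assuming a rectangular window and windows clipped at the signal edges; the actual window shape and edge handling used by the ISIP tools are not stated here, and `frame_energies` is a hypothetical name.

```python
def frame_energies(samples, frame_len, window_len):
    """Sum of squared samples in a window centered on each full frame."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        center = start + frame_len // 2
        lo = center - window_len // 2
        hi = lo + window_len
        # Clip the window to the signal boundaries (an assumption of this sketch).
        lo, hi = max(0, lo), min(len(samples), hi)
        energies.append(sum(s * s for s in samples[lo:hi]))
    return energies


# 400 samples of a constant signal: 50 ms at 8 kHz, i.e. five 10 ms frames.
signal = [1.0] * 400
print(frame_energies(signal, frame_len=80, window_len=200))
```

The first and last frames report less energy only because their windows are clipped at the ends of the signal.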

Page 33: Research Presentation:

Production System, Feature Extraction, Signal Flow Graph

• Process of computing the frequency spectrum for a speech signal
- Energy is a time-domain measurement.
- The signal is converted from the time domain to the frequency domain.
- Spec: represents the Fourier transform.

[Diagram: input → Wind → Spec → output]

• Additional methods are needed to fully measure the features needed by a speech recognizer.
- Further analyze the FFT of the speech signal.
- MFCC: uses a mathematical transformation called the cepstrum, which computes the inverse Fourier transform of the log-spectrum of the speech signal.

[Diagram: input → Wind → Spec → Ceps → output]
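The cepstrum definition above (inverse Fourier transform of the log-spectrum) can be sketched with a naive DFT. This shows only the plain real cepstrum on a short signal; a full MFCC front end adds further steps that are not covered here, and the function names and the small floor used to avoid log(0) are assumptions of this example.

```python
import cmath
import math


def dft(x):
    """Naive discrete Fourier transform (O(n^2)), fine for short examples."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * math.pi * m * k / n) for k in range(n))
            for m in range(n)]


def idft(spectrum):
    """Naive inverse discrete Fourier transform."""
    n = len(spectrum)
    return [sum(spectrum[m] * cmath.exp(2j * math.pi * m * k / n) for m in range(n)) / n
            for k in range(n)]


def real_cepstrum(x, floor=1e-12):
    """Inverse DFT of the log magnitude spectrum of x."""
    log_magnitude = [math.log(max(abs(v), floor)) for v in dft(x)]
    return [c.real for c in idft(log_magnitude)]


# A scaled impulse has a flat spectrum, so its cepstrum is concentrated at index 0.
print([round(c, 6) for c in real_cepstrum([2.0, 0.0, 0.0, 0.0])])
```

For the scaled impulse the log-spectrum is the constant log 2, so only the first cepstral coefficient is non-zero.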

Page 34: Research Presentation:

Production System, Feature Extraction, Signal Flow Graph

• Recipe: The information for each component is stored in a single entity.
- format of the speech input
- algorithms for extracting the features
- format of the output
- make the recipe using isip_transform
- Example) simple signal flow graph for extracting energy

[Diagram: Recipe1 (inp → Engy → out), stored in a recipe file]

Page 35: Research Presentation:

Production System, Feature Extraction, Signal Flow Graph

• More complex recipes

A single recipe file is produced for the entire graph.

[Diagram: Recipe2, a signal flow graph with the components inp, Wind, Engy, Ceps, and out, stored in a recipe file]

Page 36: Research Presentation:

Q & A

1. Ordinary data types and functions
- In our production system, every data type is one of our own classes. Instead of using float, why do we use Float?
- This confused me. When I tried to write a command line interface, I used cout and cin as in an ordinary C++ program. However, the situation is different in our system.

Page 37: Research Presentation:

Reference

• Production System Tutorial http://www.cavs.msstate.edu/hse/ies/projects/speech/software/tutorials/production/fundamentals/current/