Research Presentation:

ISIP: Research Presentation, Seungchan Lee, Feb. 16, 2006

Seungchan Lee

Intelligent Electronic Systems, Human and Systems Engineering

Department of Electrical and Computer Engineering

On a Utility for Speaker Verification



Set up a standard IES environment

• My first impression of CAVS was good.

• The first thing to do is set up the IES environment.

- Create an enlistment: our production system consists of many classes, and I was surprised at the structure of our software environment. Even though much work has already been done, I need to consolidate our system with the other IFCers.

- GroupWise: a good communication and schedule management tool within our group.

- After that, I could write a program and compile it on my local machine.

[Figure: a client connected to the CVS repository on the server.]


First IFC program, instruction

• The first simple IFC program does the following:

- Reads a 3×3 float matrix from an Sof file.

- Reads a 3×1 float vector from an Sof file.

- Multiplies the matrix and vector using the equation Z = alpha*A*B.

- Writes the result to an Sof file.

- Allows the value of alpha to be set from the command line:

foo.exe -alpha 2.0 input.sof output.sof
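The computation Z = alpha*A*B can be sketched in a few lines. This is a minimal Python sketch, not the IFC implementation: it uses plain lists instead of Sof files and IFC matrix classes, and the function name `scaled_matvec` and the sample values are made up for illustration.

```python
def scaled_matvec(alpha, a, b):
    """Return Z = alpha * A * B for a matrix A (list of rows) and a column vector B."""
    if any(len(row) != len(b) for row in a):
        raise ValueError("each row of A must have as many entries as B")
    return [alpha * sum(a_ij * b_j for a_ij, b_j in zip(row, b)) for row in a]


# Hypothetical sample data standing in for the contents of input.sof.
A = [[1.0, 0.0, 0.0],
     [0.0, 2.0, 0.0],
     [0.0, 0.0, 3.0]]
B = [1.0, 1.0, 1.0]

Z = scaled_matvec(2.0, A, B)  # alpha = 2.0, as in "-alpha 2.0"
print(Z)
```

With a diagonal A, each entry of Z is simply alpha times the matching diagonal entry, which makes the result easy to check by hand.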


First IFC program, flow

foo.exe -alpha 2.0 input_file output_file

1. Read the input Sof file

2. Read the 3×3 float matrix

3. Read the 3×1 float vector

4. Multiply the vector and matrix

5. Write to the output Sof file


First IFC program

• After completing the first IFC program, I am more familiar with our production system.

• When I have questions about our production system, our prominent group members always help me.

It is good to study alone, but sometimes it is better to ask an expert programmer.

The more I know about our production system, the more questions I have.


First IFC program

• First Question

- How can we view the contents of the class?

• Answer :

It is possible through the debug method.

The contents of an Sof object are hard to inspect while debugging. Instead, I used the debug method that is included in the source code. This can sometimes slow down debugging, but it is the best way I know so far. With it, I can see which data is contained in the Sof object. The same applies to all other class variables.


First IFC program

• Second Question

- When using the debug method, why does the string capacity change?

SysString token = “oscar had a heap of apples”

Using the debug method, we can see each value.

<SysString::token> value_d = (5 >= 5) oscar

<SysString::token> value_d = (5 >= 3) had

<SysString::token> value_d = (5 >= 1) a

<SysString::token> value_d = (5 >= 4) heap

<SysString::token> value_d = (5 >= 2) of

<SysString::token> value_d = (6 >= 6) apples

• Answer

In the expression (n >= m), 'n' is the total capacity of the data structure, and 'm' is the current length. So for the line:

<SysString::token> value_d = (5 >= 3) had

The capacity of the SysString is 5 and the current length is 3, which is obvious from the string 'had'.
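The pattern in the debug output (capacity stays at 5, then becomes 6 for 'apples') is what a buffer that grows on demand and never shrinks would produce. The sketch below is a toy Python model of that behavior, not SysString itself; it assumes that one object is reused for every token and that capacity only grows, neither of which is confirmed by the slides. The class name `GrowOnlyBuffer` is hypothetical.

```python
class GrowOnlyBuffer:
    """Toy string buffer whose capacity grows when needed and never shrinks."""

    def __init__(self):
        self.capacity = 0
        self.value = ""

    def assign(self, text):
        # Allocate more room only if the new text does not fit (assumed policy).
        self.capacity = max(self.capacity, len(text))
        self.value = text

    def debug(self):
        return f"value_d = ({self.capacity} >= {len(self.value)}) {self.value}"


token = GrowOnlyBuffer()
lines = []
for word in "oscar had a heap of apples".split():
    token.assign(word)
    lines.append(token.debug())
print("\n".join(lines))
```

Under these assumptions the model reproduces the (capacity >= length) pairs shown above for all six tokens.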


First IFC program

• Third Question

- What does “L” mean?

- In our production system, all of the classes use the “L” character.

For example,

SysString file1;

file1.assign(L"/tmp/foo_bin.sof");

I could not figure out exactly why this “L” is used.

• Answer

The "L" is a prefix that tells the compiler that the following string literal is a wide-character (Unicode) string.


ISIP_VERIFY

• Basic Work Flow

- Decide what to add and remove in the new version

- Analyze old version

- Draw class diagram

- Design new version

- Coding and Compilation

- Testing and fixing bugs


ISIP_VERIFY

• Decide what to add and remove in the new version

- Currently, isip_verify does speaker verification, but only uses the HMM algorithm. We want the new isip_verify to perform that function using the HMM, SVM, and RVM algorithms. This means the new version of “isip_verify” will be a more general utility than the old version.

• Analyze the old version

- The isip_verify utility uses the SpeakerVerifier, VerifyHMM, and HMM classes, and does both training and testing. Unlike the “GMM” case, the “SVM” statistical model has two separate utilities: “isip_svm_learn” handles training, while “isip_svm_classify” handles testing.


ISIP_VERIFY

• The problem:

1. isip_verify can only process the “GMM” statistical model.

2. We do not have an “RVM” routine that provides the same function as the “SVM” utilities.

• Solution:

1. Add SVM and RVM routines to isip_verify.

2. Add the same functionality to the RVM class.

3. Modify the SpeakerVerifier class.

With these changes, we can make one utility that performs all of the functions mentioned above.

To begin with, I drew class block diagrams of each utility to make sure of the relationships between classes and functions. After that, these utilities were easier to understand. Next, I drew the flow chart of the new utility.


ISIP_VERIFY

Class Block Diagram

[Figure: class block diagram of isip_verify. Recoverable labels:]

- Classes: ISIP_VERIFY (util/speech), SpeakerVerifier (asr), VerifyHMM (pr), HiddenMarkovModel (pr)

- Steps in the utility: parameter check, set algorithm, set implementation, run()

- Conditions: if algorithm = HMM; if algorithm = VERIFY: Verify(); if implementation = LIKELIHOOD: Verifyl(); LIKELIHOOD RATIO: Verifylr(); if algorithm = TRAIN: train and model creation

- HiddenMarkovModel: algorithm = TRAIN, implementation = BAUM WELCH; linearDecoder(); Run(); else


ISIP_VERIFY ISIP_SVM_LEARN

[Figure: class block diagram of isip_svm_learn. Recoverable labels:]

- Classes: isip_svm_learn (util/speech), SupportVectorMachine (pr), StatisticalModel (stat) of SupportVectorModel type, StatisticalModelBase, SupportVectorModel (stat)

- Steps in the utility: parameter check; loadFeature() for positive and negative examples; train(); writeModel()

- Training: if algorithm = SEQUENTIAL_MINIMAL_OPTIMIZATION: sequentialMinimalOptimization(); determine the support vectors

- Model access: getSupportVectorModel(), getBias(), getKernels(), getAlphas(), getSupportVectors(), write()


ISIP_VERIFY ISIP_SVM_CLASSIFY

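[Figure: class block diagram of isip_svm_classify. Recoverable labels:]

- Classes: isip_svm_classify (util/speech), StatisticalModel (stat), AudioDatabase (mmedia), FeatureFile (mmedia)

- Calls: read(), getSupportVectorModel(), open(), getRecord(), getBufferData(), getDistance()

- Final step: write the distance to the output file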


ISIP_VERIFY FLOW CHART

isip_verify (new version)

isip_verify -param.sof .... -algo_type [hmm,svm,rvm] -mode [train, test]

[Figure: flow chart of the new isip_verify. Recoverable steps:]

• Check the “algo_type” option: HMM, SVM, or RVM. If no algo_type was specified, the HMM algo_type is chosen.

• HMM: check that the statistical model = “GMM”. No: error (model incorrect). Yes: gmmVerify(), which is isip_verify (old version). The verifyHMM class processes the parameter file for isip_verify, which can do both training and testing.

• SVM: check that the statistical model = “SVM”. No: error (model incorrect). Yes: check the “mode” option. No mode: error (“You must specify mode”). train: svmTrain(). test: svmTest().

• RVM: check that the statistical model = “RVM”. No: error (model incorrect). Yes: check the “mode” option. No mode: error (“You must specify mode”). train: rvmTrain(). test: rvmTest().
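The checks in the flow chart above can be sketched as a small dispatch function. This is a hedged Python sketch of the decision logic only, not the isip_verify source: the function `choose_routine`, its argument names, and the use of exceptions for the error boxes are assumptions made for illustration. Only the option values, the routine names, and the error messages come from the chart.

```python
# Statistical model each algorithm expects, as read from the flow chart.
EXPECTED_MODEL = {"hmm": "GMM", "svm": "SVM", "rvm": "RVM"}


def choose_routine(statistical_model, algo_type=None, mode=None):
    """Return the name of the routine the flow chart would call."""
    if algo_type is None:
        algo_type = "hmm"  # no algo_type given: HMM is chosen
    if algo_type not in EXPECTED_MODEL:
        raise ValueError(f"unknown algo_type: {algo_type}")
    if statistical_model != EXPECTED_MODEL[algo_type]:
        raise ValueError("Model incorrect")
    if algo_type == "hmm":
        return "gmmVerify"  # old isip_verify path: training and testing
    if mode not in ("train", "test"):
        raise ValueError("You must specify mode")
    return f"{algo_type}{mode.capitalize()}"  # e.g. svmTrain, rvmTest


print(choose_routine("SVM", algo_type="svm", mode="train"))
```

Keeping the expected model in one table means a fourth algorithm would need one new table entry and its two routines, rather than another branch of nested checks.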


ISIP_VERIFY

• Coding and Compilation

1. Add and remove parameters and check the parameters (Won)

2. Combine the three functionalities

- The new “isip_verify” calls the run() method in that utility, and run() calls the support vector machine object or the relevance vector machine object, which then performs training. This enables us to implement three models in one utility.

3. SpeakerVerifier class

- include SVM, RVM class

- modify parameter check

- modify run(sdb) method

- add run(pos_sdb,neg_sdb) method

4. RVM class

- Add training and testing module (Sridhar)


ISIP_VERIFY

• Problems during coding and compilation

- How to verify the SpeakerVerifier class?

• After modifying an existing class, we need to verify its correctness.

• The diagnose method performs this function in our production system.

• This method is implemented in *_02.cc in every class.

• After compiling the class, we execute “make test”.

• This automatically checks every function in that class.

- How can we resolve a segmentation fault?

Finding the cause is one of the most difficult things to do.

Comment out all new modules, then add one module back, compile the class, and test it. Repeat this until every new module has been tested.


ISIP_VERIFY

• Problems during coding and compilation

- Compilation and debugging time

- When developing a new program, compiling and debugging are among the most time-consuming tasks.

- In our production system, it takes a long time to compile and debug a program, because compiling a program involves many linking steps.

- How can we resolve it? It is faster to work in our local repository.


ISIP_VERIFY

• Testing and fixing bugs

This part is as important as the previous steps. We can find faults and missing pieces during this step.

Problem: what happens to the sdb object?

Normally, the sdb object contains every command-line option (except parameters). However, the sdb object loses its contents when it is passed to the SpeakerVerifier class.

How can we fix that?

• Comment out all code except the control code.

• The cause was that I did not give the list file option.


Software Release

• What do we need to know for a software release?

Varmint utility: to track down all problems

Production system:

• In order to better understand our system, I did and will do the following.

• Data Preparation

• Feature Extraction

• Recognition

• Acoustic modeling

• Language modeling

These will be explained in more detail after this topic.


Software Release

• ProductionRuleTokenType class

It uses many if-else statements in its read/write functions.

Instead, we can use the NameMap class. To do that:

• Declare the NameMap class and modify the related modules.

• Problem:

I ran into run-time errors.

• Solution:

- I made a simple program that includes the diagnose method in prtt_02.cc.

- After tracing the function, I could find the cause.

- This class was my first check-in to our production system.
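The idea of replacing a chain of if-else statements with a name map can be sketched as a table lookup. The Python sketch below is illustrative only: the token type names are hypothetical (the real ProductionRuleTokenType values are not listed in these slides), and a tuple plus a dict stand in for the NameMap class, whose actual interface is not shown here.

```python
# Hypothetical token types; the real ProductionRuleTokenType values are not
# listed in these slides.
TOKEN_NAMES = ("TERMINAL", "NON_TERMINAL", "EPSILON")

# Reverse table built once, so reading needs no if-else chain.
NAME_TO_INDEX = {name: index for index, name in enumerate(TOKEN_NAMES)}


def write_type(index):
    """Map an internal index to the name stored in the file."""
    return TOKEN_NAMES[index]


def read_type(name):
    """Map a name read from the file back to the internal index."""
    try:
        return NAME_TO_INDEX[name]
    except KeyError:
        raise ValueError(f"unknown token type: {name}") from None


print(read_type(write_type(1)))
```

Adding a new token type then means adding one entry to the table, and the read and write paths stay in agreement automatically.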


Software Release

• isip_lm_tester

This utility randomly generates sentences based on the language model file and tests the language model.

Problem: Currently, generating state transcriptions won’t generate past the first symbols at the highest level.

What to do? I need to track down this problem, but that requires an understanding of language models.

I will read and study our tutorial on the production system thoroughly, and then get involved in fixing bugs in isip_lm_tester.


Production System

• In this part, I will go from Data preparation to Feature extraction.

• How can we better understand our production system?

- Data Preparation

- Feature extraction

- Recognition

- Acoustic modeling

- Language modeling


Production System, Data Preparation

• Data Preparation: why is it difficult for a beginner?

- In ordinary programming, preparing input data is not hard.

- But in our production system, it is not easy for a beginner.

It requires knowledge of speech, including speech file formats, file conversion, and sampling.

Speech file formats:

- Header + sampled data: WAV, Sof, SPHERE, AU

- Sampled data only (no header): raw files



• Sof Format

An Sof file holds information about the location of each object stored in the file, and the corresponding object data.

It supports two basic storage formats:

- text: human-readable files

- binary: sampled data

It is used by all data objects in the ISIP environment to unify and simplify I/O.

Binary format:

- Handles machine architecture differences with automatic byte transformations.

- Used for large quantities of data for the obvious efficiency gains.

- The objects are stored in a binary tree, and a symbol table is used to hold the object class names.



Text format :

- Used for user input parameter files in the ISIP environment.

- Simple format that consists of object names and tags,

followed by the object data

- Example :

@ Sof v1.0 @
@ Float 0 @
value = 1.3;
@ VectorFloat 0 @
value = 3.5,5.7,3.8;
@ VectorLong 0 @
value = 2,3;
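To make the "object names and tags, followed by the object data" layout concrete, here is a small Python sketch that reads the example above. It is based only on what this one example shows (an "@ Name tag @" line followed by "key = values;" lines) and is not the ISIP Sof parser; the function name `parse_sof_text` and the returned dictionary layout are assumptions.

```python
import re

# The example from the slide, one object header or field per line.
SOF_TEXT = """@ Sof v1.0 @
@ Float 0 @
value = 1.3;
@ VectorFloat 0 @
value = 3.5,5.7,3.8;
@ VectorLong 0 @
value = 2,3;
"""


def parse_sof_text(text):
    """Return a list of {"class", "tag", "fields"} dicts, one per object."""
    objects = []
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        header = re.fullmatch(r"@\s*(\w+)\s+(\S+)\s*@", line)
        if header:
            name, tag = header.groups()
            if name == "Sof":
                current = None  # file header line; its "tag" is the version
                continue
            current = {"class": name, "tag": int(tag), "fields": {}}
            objects.append(current)
            continue
        field = re.fullmatch(r"(\w+)\s*=\s*(.*);", line)
        if field and current is not None:
            key, raw = field.groups()
            # All values are read as floats in this sketch, even for VectorLong.
            current["fields"][key] = [float(v) for v in raw.split(",")]
    return objects


for obj in parse_sof_text(SOF_TEXT):
    print(obj["class"], obj["tag"], obj["fields"])
```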



• Converting from an external format (e.g., SPHERE, WAV) to raw format

speech.sph → speech.raw

1. Convert the SPHERE file's binary data to 16-bit linear samples using w_decode:

w_decode -o pcm speech.sph speech-nb.sph

2. Strip the file's header using h_strip:

h_strip speech-nb.sph speech.raw

3. The result is speech.raw, which is identical to the decoded file except that the first 1024 bytes of header information are missing.

4. One-line command (steps 1 and 2 combined):

w_decode -o pcm speech.sph - | h_strip - - > speech.raw
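The header-stripping step amounts to dropping a fixed number of leading bytes. The Python sketch below illustrates only that step, under the assumption stated on the slide that the header is exactly 1024 bytes; it does not decode compressed SPHERE data (that is what w_decode is for), and the function name `strip_header` and the synthetic input are made up for illustration.

```python
HEADER_SIZE = 1024  # header length stated on the slide


def strip_header(data, header_size=HEADER_SIZE):
    """Return the sample bytes that follow a fixed-size header."""
    if len(data) < header_size:
        raise ValueError("file is shorter than the header")
    return data[header_size:]


# Synthetic stand-in for a decoded file: 1024 header bytes + 4 sample bytes.
fake_file = b"H" * HEADER_SIZE + b"\x01\x00\x02\x00"
raw = strip_header(fake_file)
print(len(fake_file) - len(raw))
```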



• Verification of conversion to raw

SoX: audio playback

sox -t .sw -r 16000 speech.raw -t .au speech.au

audioplay speech.au

File size comparison: using "ls -l"

ls -l speech.*

-rw-rw-r-- 1 may isip 97486 Sep 10 15:19 speech.raw
-rw-rw-r-- 1 may isip 98510 Sep 10 15:12 speech.sph

The fifth field is the file size. speech.raw is 1024 bytes smaller than speech.sph.

Octal dump (od): listing values

od -t d2 speech.raw
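Listing the values of a raw file means reading it as a sequence of signed 16-bit integers. The Python sketch below shows that interpretation on a few synthetic bytes; the function name `list_samples` is hypothetical, and the byte order is an assumption that must match how the file was written.

```python
import struct


def list_samples(raw, byte_order="<"):
    """Decode raw bytes as signed 16-bit integers ("<" little-endian, ">" big-endian)."""
    count = len(raw) // 2
    return list(struct.unpack(f"{byte_order}{count}h", raw[:count * 2]))


# The two sizes reported by "ls -l" on the slide differ by the header size.
size_difference = 98510 - 97486

samples = list_samples(b"\x01\x00\xff\xff\x00\x80")
print(size_difference, samples)
```

If the listed values look like noise with very large magnitudes, a wrong byte order is a likely cause and is worth checking first.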



• Creating an Sof file: raw file → Sof file

Using isip_make_sof, type the following:

isip_make_sof speech.raw

This creates a binary file. If you want to create a text version, type the following:

isip_make_sof -type text -suffix _text speech.raw

[Figure: isip_make_sof turns raw data into a file with a header followed by the data.]


Production System, Feature Extraction

• What is feature extraction?

- A speech recognizer does not understand the human voice directly.

- Only certain features of the human voice are useful for recognizer decoding.

- These must be numerically measured and stored as a feature vector.

- The process of taking these measurements is known as feature extraction.

It includes the following:

- converting the signal to a digital form

- measuring some important characteristics of the signal

- augmenting these measurements

[Figure: human voice → microphone → digital signal]


Production System, Feature Extraction

• Frame

- Typical frame duration in speech recognition is 10 ms

- Determines the number of times we produce a feature vector

• Window

- Typical window duration is 25 ms

- Surrounds the frame for a smoother representation of the speech data

- Determines the number of samples

• Sampling rate: the number of samples per second taken from a continuous signal to make a discrete signal

Example) With an 8 kHz sampling rate and a frame duration of 10 ms, measurements would be taken over 80 samples to produce one feature vector.
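The arithmetic behind this example is just duration times sampling rate. A minimal Python sketch, with the helper name `samples_in` chosen for illustration:

```python
def samples_in(duration_ms, sampling_rate_hz):
    """Number of samples covered by a span of the given duration."""
    return sampling_rate_hz * duration_ms // 1000


frame_samples = samples_in(10, 8000)   # 10 ms frame at 8 kHz -> 80
window_samples = samples_in(25, 8000)  # 25 ms window at 8 kHz -> 200
frames_per_second = 1000 // 10         # one feature vector per 10 ms frame -> 100
print(frame_samples, window_samples, frames_per_second)
```

So with the typical values above, each feature vector is computed from 200 samples, and a new vector is produced every 80 samples.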


Production System, Feature Extraction, Signal Flow Graph

• Basic process of extracting a single feature

Input: speech data stored in digital form on a computer.

Energy: a computer program or algorithm specifically designed to measure energy values in the speech data.

Output: a computer file which stores the measurements of features.

• Including a window

Determines the number of samples used to calculate the energy measurements.

input → Energy → output

input → Wind → Energy → output
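The input → Wind → Energy → output chain can be sketched as follows. This is a simplified Python illustration, not the ISIP algorithm: it assumes energy is the sum of squared samples, uses a rectangular window centered on each frame, and truncates the window at the signal edges. The function name `frame_energies` is hypothetical.

```python
def frame_energies(samples, frame_len, window_len):
    """Return one energy value per frame, measured over a centered window."""
    n = len(samples)
    half = window_len // 2
    energies = []
    for start in range(0, n, frame_len):
        center = start + frame_len // 2
        lo = max(0, center - half)               # clip the window at the start
        hi = min(n, center - half + window_len)  # clip the window at the end
        energies.append(sum(x * x for x in samples[lo:hi]))
    return energies


# 160 samples of constant amplitude 1: two 80-sample frames, 200-sample window.
energies = frame_energies([1] * 160, frame_len=80, window_len=200)
print(energies)
```

The frame length sets how many values are produced, while the window length sets how many samples contribute to each value, which is the distinction drawn on the slide.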


Production System, Feature Extraction, Signal Flow Graph

• Process of computing the frequency spectrum for a speech signal

- Energy is measured in the time domain.

- The signal is converted from the time domain to the frequency domain.

- Spec: represents the Fourier transform.

• Additional methods are needed to fully measure the features needed by a speech recognizer.

• Further analyze the FFT of the speech signal.

• MFCC: uses a mathematical transformation called the cepstrum, which computes the inverse Fourier transform of the log-spectrum of the speech signal.

input → Wind → Spec → output

input → Wind → Spec → Ceps → output
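The cepstrum definition above (inverse Fourier transform of the log-spectrum) can be written out directly. The Python sketch below computes a plain real cepstrum of one frame with a naive DFT; it is not MFCC, which additionally applies a mel-scale filter bank, and it is not the ISIP implementation. The function names and the `floor` guard against log(0) are assumptions for illustration.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform (inverse includes the 1/n factor)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        total = sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n))
        out.append(total / n if inverse else total)
    return out


def real_cepstrum(frame, floor=1e-12):
    """Inverse DFT of the log-magnitude spectrum of one frame."""
    spectrum = dft(frame)
    log_magnitude = [math.log(max(abs(value), floor)) for value in spectrum]
    return [value.real for value in dft(log_magnitude, inverse=True)]


# An impulse of height e has a flat spectrum of magnitude e, so its log-spectrum
# is all ones and its cepstrum is (approximately) a unit impulse.
print(real_cepstrum([math.e, 0.0, 0.0, 0.0]))
```

A real system would use an FFT rather than this quadratic-time transform; the naive version is used here only to keep the definition visible.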



• Recipe : The information for each component is stored in a single entity.

- format of the speech input

- algorithms for extracting the features

- format of the output

- make recipe using isip_transform

- Example) simple signal flow graph for extracting energy

[Figure: signal flow graph inp → Engy → out, saved as recipe file Recipe1.]



• More complex recipes

A single recipe file is produced for the entire graph.

[Figure: signal flow graph with the components inp, Wind, Engy, Ceps, and out, saved as recipe file Recipe2.]


Q & A

1. Ordinary data types and functions

- In our production system, our own classes are used for all data types. Why do we use Float instead of float?

This confused me. When I tried to write a command-line interface, I used cout and cin as in an ordinary C++ class. However, the situation is different in our system.


Reference

• Production System Tutorial http://www.cavs.msstate.edu/hse/ies/projects/speech/software/tutorials/production/fundamentals/current/
