Research Presentation:
ISIP: Research Presentation Seungchan Lee Feb.16.2006 Page 1 of 36
Seungchan Lee
Intelligent Electronic Systems
Human and Systems Engineering
Department of Electrical and Computer Engineering
On a Utility for Speaker Verification
Set up a standard IES environment
• My first impression of CAVS is good.
• The first thing to do is set up the IES environment.
• Create Enlistment: Our production system consists of many classes, and I was surprised at the structure of our software environment. Even though much work has already been done, I need to consolidate our system with the other IFCers.
• GroupWise: a good communication and schedule management tool within our group.
• After that, I could write a program and compile it on my local machine.
[Figure: a client connected to the CVS repository on the server]
First IFC program, instruction
• The first simple IFC program does the following:
Reads a 3×3 float matrix from an Sof file.
Reads a 3×1 float vector from an Sof file.
Multiplies the matrix and vector using the equation Z = alpha*A*B.
Writes the result to an Sof file.
Allows the value of alpha to be set from the command line:
foo.exe -alpha 2.0 input.sof output.sof
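The steps above can be sketched outside of IFC as follows. This is only an illustration in Python, not the IFC program itself: the names `scaled_product` and `parse_alpha` are hypothetical, Sof file I/O is left out, the default of 1.0 for alpha is an assumption, and the matrix and vector are plain Python lists.

```python
import argparse


def scaled_product(alpha, matrix, vector):
    """Return Z = alpha * A * B for a matrix A and a column vector B."""
    return [alpha * sum(a * b for a, b in zip(row, vector)) for row in matrix]


def parse_alpha(argv):
    """Read -alpha from an argument list; the 1.0 default is an assumption."""
    parser = argparse.ArgumentParser(prog="foo")
    parser.add_argument("-alpha", type=float, default=1.0)
    parser.add_argument("input_file")
    parser.add_argument("output_file")
    return parser.parse_args(argv).alpha


# Example data standing in for the contents of the input Sof file.
A = [[1.0, 0.0, 0.0],
     [0.0, 2.0, 0.0],
     [0.0, 0.0, 3.0]]
B = [1.0, 1.0, 1.0]

alpha = parse_alpha(["-alpha", "2.0", "input.sof", "output.sof"])
Z = scaled_product(alpha, A, B)
print(Z)
```

Keeping the arithmetic in its own function makes it easy to check the result independently of the file handling.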
First IFC program, flow
[Flow chart of foo.exe]
foo.exe -alpha 2.0 input_file output_file
1. Read the input Sof file
- Read the 3×3 float matrix
- Read the 3×1 float vector
2. Multiply the matrix and vector
3. Write to the output Sof file
First IFC program
• After completing the first IFC program, I am more familiar with our production system.
• When I have questions about our production system, our experienced group members always help me with them.
It is good to study alone, but sometimes it is better to ask an expert in the programming.
The more I know about our production system, the more questions I have.
First IFC program
• First Question
- How can we view the contents of a class?
• Answer:
It is possible through the debug method.
It is hard to inspect the contents of an Sof object in a debugger. Instead, I used the debug method that is included in the source code. Sometimes this may slow down debugging, but it is the best way I know so far. With it, I can figure out which data is contained in the Sof object. The same applies to all other class variables.
First IFC program
• Second Question
- When using the debug method, why does the string capacity change?
SysString token = “oscar had a heap of apples”
Using the debug method, we can see each value:
<SysString::token> value_d = (5 >= 5) oscar
<SysString::token> value_d = (5 >= 3) had
<SysString::token> value_d = (5 >= 1) a
<SysString::token> value_d = (5 >= 4) heap
<SysString::token> value_d = (5 >= 2) of
<SysString::token> value_d = (6 >= 6) apples
• Answer
In the expression (n >= m), 'n' is the total capacity of the data structure and 'm' is the current length. So for the line:
<SysString::token> value_d = (5 >= 3) had
the capacity of the SysString is 5 and the current length is 3, which matches the length of the string 'had'. In the output above, the capacity changes only when a longer string is stored, which is why it goes from 5 to 6 for 'apples'.
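The debug output can be reproduced with a toy buffer whose capacity grows when needed and never shrinks. This is a sketch of one behavior that is consistent with the output on this slide; the class `GrowOnlyBuffer` is hypothetical, and the real SysString implementation may manage its memory differently.

```python
class GrowOnlyBuffer:
    """Toy string buffer whose capacity grows when needed but never shrinks."""

    def __init__(self):
        self.capacity = 0
        self.value = ""

    def assign(self, text):
        # Enlarge the buffer only if the new text does not fit.
        if len(text) > self.capacity:
            self.capacity = len(text)
        self.value = text

    def debug(self):
        # Mimic the (capacity >= length) layout of the debug output.
        return f"value_d = ({self.capacity} >= {len(self.value)}) {self.value}"


token = GrowOnlyBuffer()
lines = []
for word in "oscar had a heap of apples".split():
    token.assign(word)
    lines.append(token.debug())
print("\n".join(lines))
```

Reusing the existing buffer for shorter strings avoids a reallocation on every assignment, which is a common reason for keeping capacity separate from length.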
First IFC program
• Third Question
- What does “L” mean?
- In our production system, all of the classes use the “L” prefix on string literals.
For example,
SysString file1;
file1.assign(L"/tmp/foo_bin.sof");
I could not figure out exactly why this “L” is used.
• Answer
The "L" is a prefix on the string literal, not a macro. It tells the compiler that the following string is a wide-character (Unicode) string.
ISIP_VERIFY
• Basic Work Flow
- Decide what is added and removed in the new version
- Analyze old version
- Draw class diagram
- Design new version
- Coding and Compilation
- Testing and fixing bugs
ISIP_VERIFY
• Decide what is added and removed in the new version
- Currently, isip_verify does speaker verification, but it only uses the HMM algorithm. We want the new isip_verify to perform that function using the HMM, SVM, and RVM algorithms. This means the new version of “isip_verify” will be a more general utility than the old version.
• Analyze old version
- The isip_verify utility uses the SpeakerVerifier, VerifyHMM, and HMM classes, and does both training and testing. Unlike the “GMM” case, the “SVM” statistical model has two utilities, “isip_svm_learn” and “isip_svm_classify”: “isip_svm_learn” handles training, while “isip_svm_classify” handles testing.
ISIP_VERIFY
• The problem:
1. isip_verify can only process the “GMM” statistical model.
2. We do not have an “RVM” routine that provides the same functions as the “SVM” utilities.
• Solution:
1. Add SVM and RVM routines to isip_verify.
2. Add the same functionality to the RVM class.
3. Modify the SpeakerVerifier class.
With these changes, we can make one utility that performs all the functions mentioned above.
To begin with, I drew class block diagrams of each utility to make sure I understood the relationships between classes and functions. After that, these utilities were easier to understand. Next, I drew the flow chart of the new utility.
ISIP_VERIFY
[Class block diagram]
ISIP_VERIFY (util/speech): parameter check, set algorithm, set implementation
SpeakerVerifier (asr): run(); if algorithm = HMM, use VerifyHMM
VerifyHMM (pr): if algorithm = VERIFY, Verify() calls Verifyl() (implementation = LIKELIHOOD) or Verifylr() (LIKELIHOOD RATIO); if algorithm = TRAIN, train and create the model
HiddenMarkovModel (pr): Run() with algorithm = TRAIN and implementation = BAUM WELCH, else linearDecoder()
ISIP_VERIFY ISIP_SVM_LEARN
[Class block diagram]
isip_svm_learn (util/speech): parameter check, loadFeature() for positive and negative examples, train(), writeModel()
SupportVectorMachine (pr): train(); if algorithm = SEQUENTIAL_MINIMAL_OPTIMIZATION, sequentialMinimalOptimization() determines the support vectors
StatisticalModel (stat), SupportVectorModel type; StatisticalModelBase: getSupportVectorModel(), write()
SupportVectorModel (stat): getBias(), getKernels(), getAlphas(), getSupportVectors()
ISIP_VERIFY ISIP_SVM_CLASSIFY
[Class block diagram]
isip_svm_classify (util/speech): writes the distance to the output file
StatisticalModel (stat): read(), getSupportVectorModel(), getDistance()
AudioDatabase (mmedia): open(), getRecord()
FeatureFile (mmedia): open(), getBufferData()
ISIP_VERIFY FLOW CHART
[Flow chart: isip_verify, old version and new version]
isip_verify -param.sof .... -algo_type [hmm,svm,rvm] -mode [train, test]
1. Check the “algo_type” option. If no algo_type is specified, the HMM algo_type is chosen.
2. HMM: check that statistical_model = “GMM”; if not, report an error (model incorrect). If it is, the verifyHMM class processes the parameter file for isip_verify, which can do both training and testing (gmmVerify()). This is the path of the old version.
3. SVM: check that statistical_model = “SVM”; if not, report an error (model incorrect). Then check the “mode” option: train calls svmTrain() and test calls svmTest(); if no mode is given, report the error “You must specify mode”.
4. RVM: check that statistical_model = “RVM”; if not, report an error (model incorrect). Then check the “mode” option: train calls rvmTrain() and test calls rvmTest(); if no mode is given, report the error “You must specify mode”.
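The option checks in this flow chart can be sketched as a small dispatch function. This is an illustration in Python, not the isip_verify source: the function `dispatch` and its arguments are hypothetical, and it returns the name of the routine from the flow chart instead of calling it.

```python
def dispatch(algo_type=None, mode=None, statistical_model="GMM"):
    """Return the name of the routine that the flow chart would select."""
    if algo_type is None:
        # The flow chart falls back to HMM when no algo_type is given.
        algo_type = "hmm"

    expected_model = {"hmm": "GMM", "svm": "SVM", "rvm": "RVM"}
    if algo_type not in expected_model:
        raise ValueError("unknown algo_type: " + algo_type)
    if statistical_model != expected_model[algo_type]:
        raise ValueError("Model incorrect")

    if algo_type == "hmm":
        # The HMM path does training and testing through the parameter file.
        return "gmmVerify"

    if mode not in ("train", "test"):
        raise ValueError("You must specify mode")
    return algo_type + mode.capitalize()


print(dispatch())
print(dispatch("svm", "train", "SVM"))
print(dispatch("rvm", "test", "RVM"))
```

Checking the statistical model before the mode mirrors the order of the decision boxes, so a wrong model is reported even when the mode is also missing.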
ISIP_VERIFY
• Coding and Compilation
1. Add and remove parameters and check the parameters (Won)
2. Combine the three functionalities
- The new “isip_verify” calls the run() method, and run() calls the support vector machine object or the relevance vector machine object, which then performs training. This enables us to implement three models in one utility.
3. SpeakerVerifier class
- include the SVM and RVM classes
- modify the parameter check
- modify the run(sdb) method
- add the run(pos_sdb,neg_sdb) method
4. RVM class
- Add training and testing modules (Sridhar)
ISIP_VERIFY
• Problems during coding and compilation
- How do we verify the SpeakerVerifier class?
• After modifying an existing class, we need to verify its correctness.
• The diagnose method performs this function in our production system.
• This method is implemented in *_02.cc in every class.
• After compiling the class, we execute “make test”.
• This automatically checks every function in that class.
- How can we resolve a segmentation fault?
Finding the cause is one of the most difficult things to figure out.
Comment out all new modules, then add one module back, compile the class, and test it. Repeat this until every new module has been tested.
ISIP_VERIFY
• Problems during coding and compilation
- Compilation and debugging time
- When developing a new program, compiling and debugging are among the most time-consuming tasks.
- In our production system, it takes a long time to compile and debug a program, because there are many linking steps when compiling.
- How can we resolve it? It is faster to work in our local repository.
ISIP_VERIFY
• Testing and fixing bugs
This part is as important as the previous steps. We can find faults and missing points during this step.
Problem: What happens to the sdb object?
Normally, the sdb object contains every command-line option (except parameters). However, the sdb object loses its contents when it is passed to the SpeakerVerifier class.
How can we fix that?
• Comment out all code except the control code.
• The cause was that I did not give the list file option.
Software Release
• What do we need to know for a software release?
Varmint utility: to track down all problems
Production system:
• In order to better understand our system, I have done and will do the following:
• Data Preparation
• Feature Extraction
• Recognition
• Acoustic modeling
• Language modeling
These will be explained in more detail after this topic.
Software Release
• ProductionRuleTokenType class
It uses many if-else statements in its read/write functions.
Instead of doing this, we can use the NameMap class. In order to do that,
• Declare the NameMap class and modify the related modules.
• Problem:
I ran into run-time errors.
• Solution:
- I made a simple program that includes the diagnose method in prtt_02.cc.
- After tracing the function, I could find the cause.
- This was the first class I checked in to our production system.
Software Release
• isip_lm_tester
This utility randomly generates sentences based on the language model file and tests the language model.
Problem: Currently, generating state transcriptions won’t generate past the first symbols at the highest level.
What to do? I need to track down this problem, but that requires an understanding of the language model.
I will read and study our tutorial on the production system thoroughly, and then I can get involved in fixing bugs in isip_lm_tester.
Production System
• In this part, I will go from Data preparation to Feature extraction.
• How can we better understand our production system?
- Data Preparation
- Feature extraction
- Recognition
- Acoustic modeling
- Language modeling
Production System, Data Preparation
• Data Preparation
Why is it difficult for a beginner?
- In ordinary programming, preparing input data is not hard.
- But in our production system, it is not easy for a beginner.
It requires knowledge of speech, including speech file formats, file conversion, and sampling.
Speech file
- Header + sampled data
- Sampled data only: raw files
[Figure: WAV, Sof, SPHERE, and AU files contain a header followed by data; a raw file contains only data]
Production System, Data Preparation
• Sof Format
Stores information about the location of each object in the file, along with the corresponding object data.
Supports two basic storage formats:
- text: human-readable files
- binary: sampled data
Used by all data objects in the ISIP environment to unify and simplify I/O.
Binary format:
- Handles machine architecture differences with automatic byte transformations.
- Used for large quantities of data for the obvious efficiency gains.
- The objects are stored in a binary tree, and a symbol table is used to hold the object class names.
Production System, Data Preparation
Text format :
- Used for user input parameter files in the ISIP environment.
- Simple format that consists of object names and tags,
followed by the object data
- Example :
@ Sof v1.0 @
@ Float 0 @
value = 1.3;
@ VectorFloat 0 @
value = 3.5,5.7,3.8;
@ VectorLong 0 @
value = 2,3;
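To make the layout of object names, tags, and data concrete, here is a toy reader for the example above. It is a sketch that handles only this simplified layout, with one object header or one `value = ...;` assignment per line; it is not the ISIP Sof parser, and the function name `parse_simple_sof` is hypothetical.

```python
import re

# The example from the slide, one item per line (an assumed layout).
SAMPLE = """@ Sof v1.0 @
@ Float 0 @
value = 1.3;
@ VectorFloat 0 @
value = 3.5,5.7,3.8;
@ VectorLong 0 @
value = 2,3;
"""


def parse_simple_sof(text):
    """Return a list of (class_name, tag, values) tuples."""
    objects = []
    current = None
    for line in text.splitlines():
        line = line.strip()
        # An object header looks like "@ ClassName tag @".
        header = re.fullmatch(r"@\s*(\w+)\s+(\d+)\s*@", line)
        if header:
            current = (header.group(1), int(header.group(2)))
            continue
        assignment = re.fullmatch(r"value\s*=\s*(.*);", line)
        if assignment and current:
            values = [float(v) for v in assignment.group(1).split(",")]
            objects.append((current[0], current[1], values))
    return objects


parsed = parse_simple_sof(SAMPLE)
for name, tag, values in parsed:
    print(name, tag, values)
```

The version line `@ Sof v1.0 @` does not match the header pattern because `v1.0` is not an integer tag, so it is skipped.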
Production System, Data Preparation
• Converting from an external format (e.g., SPHERE, WAV) to raw format.
speech.sph → speech.raw
1. Convert the SPHERE file's binary data to 16-bit linear samples using w_decode:
w_decode -o pcm speech.sph speech-nb.sph
2. Strip the file's header using h_strip:
h_strip speech-nb.sph speech.raw
3. The result is speech.raw, which holds the same data without the first 1024 bytes of header information.
4. One-line command combining steps 1 and 2:
w_decode -o pcm speech.sph - | h_strip - - > speech.raw
[Figure: the header is stripped from the file, leaving only the data]
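The header-stripping step can be pictured as dropping a fixed-size prefix from the file contents. The sketch below assumes the samples are already decoded to PCM and that the header is exactly the 1024 bytes mentioned above; `strip_header` is a hypothetical helper, not a replacement for h_strip.

```python
HEADER_BYTES = 1024  # header size stated on the slide


def strip_header(data, header_bytes=HEADER_BYTES):
    """Return the sample data that follows a fixed-size header."""
    if len(data) < header_bytes:
        raise ValueError("file is shorter than the header")
    return data[header_bytes:]


# Made-up file contents: a padded header followed by four bytes of samples.
fake_file = b"NIST_1A".ljust(HEADER_BYTES, b" ") + b"\x01\x00\x02\x00"
raw = strip_header(fake_file)
print(len(fake_file) - len(raw))
```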
Production System, Data Preparation
• Verification of Conversion to Raw
SoX: Audio Playback
sox -t .sw -r 16000 speech.raw -t .au speech.au
audioplay speech.au
File Size Comparison: Using "ls -l"
ls -l speech.*
-rw-rw-r-- 1 may isip 97486 Sep 10 15:19 speech.raw
-rw-rw-r-- 1 may isip 98510 Sep 10 15:12 speech.sph
The fifth field is the file size: speech.raw is 1024 bytes smaller than speech.sph.
Octal Dump (od): Listing Values
od -t d2 speech.raw
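The size comparison and the 16-bit listing can be reproduced with a few lines of arithmetic. In this sketch the file sizes come from the listing above and the 16000 Hz rate from the sox command; `read_int16` is a hypothetical helper, and the little-endian byte order is an assumption that depends on the machine and the file.

```python
import struct

sph_size, raw_size = 98510, 97486       # sizes from the ls -l listing
header_bytes = sph_size - raw_size      # size of the stripped header
num_samples = raw_size // 2             # two bytes per 16-bit sample
duration_s = num_samples / 16000        # sampling rate from the sox command


def read_int16(data, byteorder="<"):
    """Decode bytes as signed 16-bit samples, similar to od -t d2."""
    count = len(data) // 2
    return list(struct.unpack(f"{byteorder}{count}h", data[:count * 2]))


print(header_bytes, num_samples, round(duration_s, 3))
print(read_int16(b"\x01\x00\xff\xff\x00\x80"))
```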
Production System, Data Preparation
• Creating an Sof file: raw file → Sof file
Using isip_make_sof, type the following:
isip_make_sof speech.raw
This creates a binary file. If you want to create a text version, type the following:
isip_make_sof -type text -suffix _text speech.raw
[Figure: isip_make_sof adds a header to the raw data]
Production System, Feature Extraction
• What is feature extraction?
A speech recognizer does not understand the human voice.
Only certain features of the human voice are useful for recognizer decoding.
These must be numerically measured and stored as feature vectors.
The process of taking these measurements is known as feature extraction.
It includes the following:
- converting the signal to a digital form
- measuring some important characteristics of the signal
- augmenting these measurements
[Figure: human voice → microphone → digital signal]
Production System, Feature Extraction
• Frame
Typical frame duration in speech recognition is 10 ms.
Determines how often we produce a feature vector.
• Window
Typical window duration is 25 ms.
Surrounds the frame for a smoother representation of the speech data.
Determines the number of samples used for each measurement.
• Sampling rate:
The number of samples per second taken from a continuous signal to make a discrete signal.
Example) With an 8 kHz sampling rate and a frame duration of 10 ms, measurements would be taken over 80 samples to produce one feature vector.
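The arithmetic in the example can be checked directly. This is a small sketch; the helper name `samples_in` is hypothetical, and the 25 ms window is the typical value quoted on this slide.

```python
def samples_in(duration_ms, sample_rate_hz):
    """Number of samples covered by a duration at a given sampling rate."""
    return sample_rate_hz * duration_ms // 1000


frame_samples = samples_in(10, 8000)    # one 10 ms frame at 8 kHz
window_samples = samples_in(25, 8000)   # one 25 ms window at 8 kHz
vectors_per_second = 1000 // 10         # one feature vector per frame

print(frame_samples, window_samples, vectors_per_second)
```

Because the window is longer than the frame, consecutive windows overlap, which is what gives the smoother representation.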
Production System, Feature Extraction, Signal Flow Graph
• Basic process of extracting a single feature
Input: Speech data stored in digital form on a computer.
Energy: A computer program or algorithm specifically designed to measure energy values in the speech data.
Output: A computer file which stores the measurements of features.
• Including a window
Determines the number of samples used to calculate the energy measurements.
[Figure: input → Energy → output; with a window: input → Wind → Energy → output]
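A minimal sketch of the windowed energy measurement follows. It assumes a rectangular window centered on each frame and defines energy as the sum of squared samples; both are assumptions made for illustration, the function `frame_energies` is hypothetical, and the production system may use a different window shape or energy definition.

```python
def frame_energies(samples, frame_len, window_len):
    """Sum of squared samples in a window centered on each frame."""
    energies = []
    for start in range(0, len(samples), frame_len):
        center = start + frame_len // 2
        # Clip the window at the edges of the signal.
        lo = max(0, center - window_len // 2)
        hi = min(len(samples), center + window_len // 2)
        energies.append(sum(x * x for x in samples[lo:hi]))
    return energies


signal = [1, 1, 1, 1, 1, 1, 1, 1]
energies = frame_energies(signal, frame_len=2, window_len=4)
print(energies)
```

The frame length sets how many values are produced, while the window length sets how many samples contribute to each value.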
Production System, Feature Extraction, Signal Flow Graph
• Process of computing the frequency spectrum for a speech signal
Energy is measured in the time domain.
Spec converts signals from the time domain to the frequency domain; it represents the Fourier Transform.
• Additional methods are needed to fully measure the features needed by a speech recognizer.
• Further analyze the FFT of the speech signal.
• MFCC: Uses a mathematical transformation called the cepstrum, which computes the inverse Fourier transform of the log-spectrum of the speech signal.
[Figure: input → Wind → Spec → output; extended: input → Wind → Spec → Ceps → output]
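The cepstrum definition above can be written out in a few lines. This sketch computes the plain real cepstrum with a naive DFT from the standard library; it leaves out the mel-scale filter bank that MFCC adds, the names `dft` and `real_cepstrum` are hypothetical, and the small `floor` value is an assumption that avoids taking the logarithm of zero.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform, adequate for a short demo."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        total = sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n)
                    for t in range(n))
        out.append(total / n if inverse else total)
    return out


def real_cepstrum(signal, floor=1e-12):
    """Inverse DFT of the log magnitude spectrum of the signal."""
    spectrum = dft(signal)
    log_mag = [math.log(max(abs(s), floor)) for s in spectrum]
    return [c.real for c in dft(log_mag, inverse=True)]


# A unit impulse has a flat spectrum, so its log spectrum is zero everywhere.
impulse = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
cep = real_cepstrum(impulse)
print(cep)
```

In practice an FFT would replace the naive transform, but the order of operations (spectrum, logarithm, inverse transform) stays the same.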
Production System, Feature Extraction, Signal Flow Graph
• Recipe : The information for each component is stored in a single entity.
- format of the speech input
- algorithms for extracting the features
- format of the output
- make recipe using isip_transform
- Example) simple signal flow graph for extracting energy
[Figure: Recipe1, a simple graph inp → Engy → out, stored in a recipe file]
Production System, Feature Extraction, Signal Flow Graph
• More complex recipes
A single recipe file is produced for the entire graph.
[Figure: Recipe2, a graph connecting inp, Wind, Engy, Ceps, and out, stored in a single recipe file]
Q & A
1. Ordinary data types and functions
- In our production system, our own classes are used for all data types.
Why do we use Float instead of float?
This confused me. When I wrote a command-line interface, I used the cout and cin functions of C++. However, the situation is different in our system.
Reference
• Production System Tutorial http://www.cavs.msstate.edu/hse/ies/projects/speech/software/tutorials/production/fundamentals/current/