2009 Almost-Spring Short Course on Speech Recognition Instructors: Bhiksha Raj and Rita Singh
2009 Almost-Spring Short Course on Speech Recognition
Instructors: Bhiksha Raj and Rita Singh
Welcome
What will the course be about
• We will cover most relevant topics of speech recognition
• The focus will be on the theory and practice
  – We will not discuss code for the most part
  – We will keep maths out of it as far as possible, however
• We will discuss algorithms and implementation details
Instructors
• Bhiksha Raj: Carnegie Mellon University
  – Expert in speech recognition
• Rita Singh: Carnegie Mellon University
  – Expert in speech recognition
• Peter Wolf: Independent Consultant
  – Previously at Dragon Systems Inc.
  – Sphinx4 expert, expert in speech recognition application development
  – Brought in primarily as a resource for helping with Sphinx4 and answering application-related questions
Format of Course
• 3 lectures daily
  – Morning: 8:00 AM, 1.00 – 1.30 hours
  – Late Morning / Early Afternoon: 11:00 AM
  – Afternoon: 2:30 PM
• The schedule is flexible – timings may vary depending on how much is covered
• Lectures are expected to last 1.00 – 1.5 hours each
• Intervening times are expected to be taken up by exercises
Instruction Format
• Lectures will be pictorially oriented
• Although we will cover general topics, the specific implementations described will be based on CMU Sphinx
  – Most other systems are similar
• Exercises will be based on Sphinx
Lecture Outline: Day 1
• Lecture 1: “Speech recognition for dummies”
  – A quick development of speech recognition as string matching
• Lecture 2: “Feature computation”
  – Explaining how features are computed for speech recognition, including all signal processing
• Lecture 3: “Hidden Markov Models”
  – Describing HMMs and all associated problems
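To give a flavour of the "string matching" view named for Lecture 1, here is a minimal sketch that picks the stored template closest to an observed symbol string by edit distance. It is not taken from the course material: the function names `edit_distance` and `recognize` are hypothetical, and the plain Levenshtein distance stands in for whatever matching cost the lecture actually develops.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists of symbols)."""
    # prev[j] is the cost of aligning the symbols of ref seen so far
    # (before the current one) with the first j symbols of hyp.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # aligning i symbols of ref with an empty hyp costs i deletions
        for j, h in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (r != h)  # free if the symbols match
            deletion = prev[j] + 1                 # symbol of ref left unmatched
            insertion = curr[j - 1] + 1            # extra symbol in hyp
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1]


def recognize(observed, templates):
    """Return the template with the smallest edit distance to the observation."""
    return min(templates, key=lambda t: edit_distance(t, observed))
```

For example, `recognize("helo", ["hello", "world"])` selects `"hello"`, since it is one edit away while `"world"` is further.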
Lecture Outline: Day 2
• Lecture 1: “Training From Continuous Speech”
  – How to train models from continuous speech
  – Phonemes, why we need them and how to train them
• Lecture 2: “Context dependent phonemes”
  – What are context dependent phonemes
  – Various types of context dependent phonemes
  – Training CD phonemes
• Lecture 3: “Decision Trees and State Tying”
  – All about decision trees for parameter sharing in ASR systems
Lecture Outline: Day 3
• Lecture 1: “Training context-dependent models with tied states”
  – A (relatively) short lecture explaining the final overall process for training models
• Lecture 2: “Language Modelling”
  – How to model “language” for speech recognition
  – Statistical language modelling
• Lecture 3: “Decoding: Basics”
  – Describing the basic ideas behind the decoding strategies for continuous speech
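As a rough illustration of what "statistical language modelling" in Lecture 2 refers to, the sketch below estimates bigram probabilities by simple counting. It is an assumption-laden toy, not the course's method: the names `train_bigram` and `bigram_prob` are hypothetical, tokenization is plain whitespace splitting, and no smoothing is applied (real toolkits such as the CMU or SRI LM toolkits do considerably more).

```python
from collections import Counter


def train_bigram(sentences):
    """Count histories and bigrams over whitespace-tokenized sentences."""
    histories, bigrams = Counter(), Counter()
    for sentence in sentences:
        # <s> and </s> mark the sentence start and end.
        words = ["<s>"] + sentence.split() + ["</s>"]
        histories.update(words[:-1])                 # every word that has a successor
        bigrams.update(zip(words[:-1], words[1:]))   # adjacent word pairs
    return histories, bigrams


def bigram_prob(histories, bigrams, prev, word):
    """Maximum-likelihood estimate of P(word | prev); 0.0 for an unseen history."""
    if histories[prev] == 0:
        return 0.0
    return bigrams[(prev, word)] / histories[prev]
```

With the three training sentences "go left", "go right" and "turn left", the estimate of P(left | go) is 1/2, because "go" is followed by "left" in one of its two occurrences.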
Lecture Outline: Day 4
• Lecture 1: “Decoding: Advanced”
  – Explaining various more advanced approaches to decoding
    • Arriving at the state of the art
• Lecture 2: “Advanced Topics”
  – Adaptation, Normalization, Discriminative Training etc.
• Session 3: Open.
  – Any spillover
  – Question Answering
Exercises: Day 1
• There will be exercises following most lectures
• Lecture 1: None
• Lecture 2: Exercise on capture and feature computation from speech signals
• Lecture 3: None
Exercises: Day 2
• Lecture 1: “Training From Continuous Speech”
  – Exercise on training phoneme models and recognizing with them
• Lecture 2: “Context dependent phonemes”
  – Exercise on training models for context-dependent phonemes and recognizing with them
• Lecture 3: “Decision Trees and State Tying”
  – Exercise on learning decision trees
Exercises: Day 3
• Lecture 1: “Training context-dependent models with tied states”
  – Exercise on complete training of the ASR system
• Lecture 2: “Language Modelling”
  – Exercises on building JSGF grammars and N-gram LMs for speech recognition
• Lecture 3: “Decoding: Basics”
Exercises: Day 4
• Lecture 1: “Decoding: Advanced”
  – Decoding with various speech recognition system variants:
    • Sphinx3 flat, Sphinx3 tree, Sphinx4
• Lecture 2: “Advanced Topics”
  – No exercises
• Session 3: Open.
  – No exercises
Software to Install
• We will be using CMU Sphinx extensively
  – SphinxTrain
  – Sphinx3 decoder
  – Sphinx4 decoder
  – CMU LM Toolkit or SRI LM Toolkit
• We will need additional software to go with it
  – Java, Ant, and Groovy for Sphinx4
Sphinx Downloads: http://cmusphinx.sourceforge.net
• Sphinxbase:
  – Click on the “sphinxbase” link on the left
  – Click “all releases”
  – Download version 0.4.1
    • http://downloads.sourceforge.net/cmusphinx/sphinxbase-0.4.1.tar.bz2?use_mirror=superb-east
• Sphinx3:
  – Click on the “sphinx3” link on the left
  – Click on “all releases”
  – Download version 3-0.8
    • http://downloads.sourceforge.net/cmusphinx/sphinx3-0.8.zip?use_mirror=internap
• Cepview:
  – Click on the “cepview” link on the left
• lm3g2dmp:
  – Click on the “lm3g2dmp” link on the left
• The above two are visualization / data-structure optimization tools and are not critical
  – But they are small, so you might as well download them
• CMU LM toolkit: You may install the SRI LM toolkit instead
  – It is better maintained; the CMU toolkit is not currently maintained
• Sphinx4:
  – For this workshop, download a copy of Sphinx that is under development at github.com
  – http://github.com/juanzanos/sphinx4/tree/master
    • Click on the download link
  – Caveat: some scripts may not run; if so we will revert to the release version
• Sphinx4 will also need
  – Java JDK 1.6, from http://javasoft.com
  – Apache Ant, from http://ant.apache.org
  – A useful scripting tool (some of our latest scripts are in it): Groovy
  – Groovy can be had from http://groovy.codehaus.org
• Bookmark this link:
  – http://cmusphinx.sourceforge.net/sphinx4/doc/UsingSphinxTrainModels.html
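Before the workshop it may help to confirm that the tools Sphinx4 needs are actually on your PATH. The following is a small sketch, not part of the course scripts: the helper name `find_tools` is hypothetical, and the executable names `java`, `ant` and `groovy` are assumed to be how these packages are launched on your system.

```python
import shutil

# Executable names are an assumption; adjust them if your installation differs.
REQUIRED_TOOLS = ["java", "ant", "groovy"]


def find_tools(names=REQUIRED_TOOLS):
    """Map each tool name to its full path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_tools().items():
        print(f"{name}: {path if path else 'NOT FOUND on PATH'}")
```

This only checks that the executables can be found; it does not verify versions (for example, that the JDK is 1.6).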
Operating Systems
• Sphinxbase and Sphinx3 packages have been tried and tested on Linux
  – We are not Windows people
• Suggestion: Prefer Linux-based machines
  – You may also try to run these programs on Cygwin under Windows
    • Sphinx* should compile under Cygwin
    • Install “tcsh” under Cygwin
    • We will provide tcsh scripts
• Sphinx4 is platform independent
Additional Packages
• It would be useful to have a visualization tool
  – Need to visualize matrices as surfaces
• Matlab would be great
• If you don’t have Matlab, download Octave
  – http://www.gnu.org/software/octave/
Data
• You may use any data you wish to
• For the exercises we will attempt to provide a small amount of data
  – As much as can be dealt with on your computers
Questions
• ?