8/6/2019 T1 Intro Speech Processing
2. Introduction to Speech Processing
The speech processing stack
Speech applications: coding, synthesis, recognition, understanding, speaker verification, language translation, speed-up/slow-down
Speech measurements: energy, zero crossings, autocorrelations
Speech properties: speech-silence, voiced-unvoiced, pitch, formants
Speech representations: temporal, spectral, homomorphic, LPC
Fundamentals: acoustics, linguistics, pragmatics, speech perception
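The measurements layer of the stack (energy, zero crossings, autocorrelation) is computed on short frames of the signal, and it directly supports the properties layer: energy separates speech from silence, zero crossings help separate voiced from unvoiced sounds, and the autocorrelation peak gives the pitch. The following is a minimal sketch, assuming a frame stored as a Python list of floats and a pitch search range of 60 to 400 Hz; both the function names and the range are illustrative assumptions, not taken from the slides.

```python
import math

def short_time_energy(frame):
    """Sum of squared samples: high for voiced speech, low for silence."""
    return sum(x * x for x in frame)

def zero_crossings(frame):
    """Number of sign changes between consecutive samples (high for unvoiced sounds)."""
    return sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))

def autocorrelation(frame, max_lag):
    """r[k] = sum_n x[n] * x[n + k] for k = 0 .. max_lag."""
    n = len(frame)
    return [sum(frame[i] * frame[i + k] for i in range(n - k)) for k in range(max_lag + 1)]

def estimate_pitch(frame, fs, fmin=60.0, fmax=400.0):
    """Pitch in Hz from the highest autocorrelation value inside the assumed lag range."""
    lo, hi = int(fs / fmax), int(fs / fmin)  # shortest and longest plausible period, in samples
    r = autocorrelation(frame, hi)
    best_lag = max(range(lo, hi + 1), key=lambda k: r[k])
    return fs / best_lag
```

For a 50 ms frame of a 100 Hz sine sampled at 8 kHz, `estimate_pitch` finds the peak at a lag of one period (80 samples) and returns a value close to 100 Hz. Real voiced speech is only quasi-periodic, so practical pitch trackers add voicing decisions and smoothing on top of this.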
SPEECH GENERATION AND TRANSMISSION
Tractament Digital de la Parla (Digital Speech Processing)
Speech Chain (Denes and Pinson)
[Figure: the speech chain linking the SPEAKER to the HEARER]
Speech Production/Perception (after Flanagan)
Speech Processing Diagram
Applications of digital speech processing
Speech coding
Speech Synthesis (from text to speech)
Speech recognition
Speaker/language recognition
Many others
SPEECH CODING
Speech Coding
The aim of speech coding is to compress (and then decompress) the speech waveform without a significant loss of listenability or intelligibility.
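To make "compress and then decompress" concrete, here is a minimal sketch of μ-law companding followed by 8-bit quantization, the idea behind the ITU-T G.711 telephony codec. It is a swapped-in illustration rather than a method from these slides, it uses the continuous μ-law formula instead of G.711's segmented tables, and it assumes samples are floats in [-1, 1].

```python
import math

MU = 255  # mu-law parameter used in North American / Japanese G.711 telephony

def mu_law_compress(x, mu=MU):
    """Map a sample in [-1, 1] to a companded value in [-1, 1] (fine steps near zero)."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)

def mu_law_expand(y, mu=MU):
    """Exact inverse of mu_law_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(mu)) / mu, y)

def encode(samples, bits=8):
    """Compress each sample, then quantize it to an integer code in [0, 2**bits - 1]."""
    top = 2 ** bits - 1
    return [round((mu_law_compress(s) + 1) / 2 * top) for s in samples]

def decode(codes, bits=8):
    """Map integer codes back to [-1, 1] and undo the companding."""
    top = 2 ** bits - 1
    return [mu_law_expand(c / top * 2 - 1) for c in codes]
```

Because the logarithmic curve spends more codes on small amplitudes, quiet passages keep roughly the same relative precision as loud ones, which is why 8 bits per sample remain acceptable for telephone speech.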
Various standards exist for speech coding.
The desired bit rate and the associated speech quality are highly application dependent:
Low bit rate: between 75 and 2400 bps (bits per second).
Medium to high bit rate: above 2400 bps.
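The bit-rate arithmetic behind these ranges is simple to reproduce. The sketch below assumes uncompressed telephone-band PCM (8 kHz sampling, 8 bits per sample) as the reference; the function names are illustrative, and the class boundaries follow the two ranges listed above.

```python
def pcm_bit_rate(sample_rate_hz, bits_per_sample):
    """Bit rate of uncompressed PCM in bps: samples per second times bits per sample."""
    return sample_rate_hz * bits_per_sample

def bit_rate_class(bps):
    """Classify a coder with the ranges above: 75-2400 bps is low, above 2400 bps is medium to high."""
    if bps < 75:
        raise ValueError("below the 75 bps lower bound quoted for low-rate coders")
    return "low" if bps <= 2400 else "medium-to-high"

def compression_ratio(reference_bps, coded_bps):
    """How many times fewer bits the coder needs than the reference."""
    return reference_bps / coded_bps

def storage_bytes(bps, seconds):
    """Storage needed for a recording of the given duration."""
    return bps * seconds / 8
```

With these definitions, 8000 Hz × 8 bits gives 64000 bps, which falls in the medium-to-high class; a 2400 bps coder needs about 26.7 times fewer bits and stores one minute of speech in 18000 bytes.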
Applications of Speech Coding
Reduction in bit rate for transmission/storage
Speech enhancement (removal of noise)
Allows the development of applications for
Security
High definition TV
Teleconference
Etc.
SPEECH SYNTHESIS
Speech Synthesis
The aim of speech synthesis is to take a word sequence and produce human-like speech.
Text → Linguistic analysis → Prosody / phone sequence → Speech synthesis → Sound wave
Linguistic analysis stage: maps the input text into a standard form, determines the structure of the input, and finally decides how to pronounce it.
Synthesis stage: converts the symbolic representation of what to say into an actual speech waveform.
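The two-stage structure can be illustrated with a deliberately toy pipeline. Everything below is a hypothetical stand-in: the five-word lexicon, the digit-to-word table, the fixed durations used as "prosody", and a synthesis stage that renders voiced phones as a tone at a fixed pitch and unvoiced phones as weak noise. It shows how the stages hand data to each other, not how a real synthesizer sounds.

```python
import math
import random
import re

# Hypothetical toy lexicon (ARPAbet-style phone symbols). A real linguistic analysis
# stage uses a full pronunciation dictionary plus letter-to-sound rules.
LEXICON = {
    "turn": ["T", "ER", "N"],
    "left": ["L", "EH", "F", "T"],
    "in": ["IH", "N"],
    "two": ["T", "UW"],
    "miles": ["M", "AY", "L", "Z"],
}
NUMBER_WORDS = {"2": "two"}  # hypothetical: just enough text normalization for the demo
VOICED_CONSONANTS = {"M", "N", "L", "Z"}

def is_vowel(phone):
    return phone[0] in "AEIOU"

def linguistic_analysis(text):
    """Text -> standard form -> sequence of (phone, duration in seconds)."""
    tokens = re.findall(r"[a-z]+|\d+", text.lower())  # tokenize
    words = [NUMBER_WORDS.get(t, t) for t in tokens]   # normalize digits to words
    phones = []
    for word in words:
        for phone in LEXICON[word]:  # decide pronunciation (KeyError for unknown words)
            # Crude prosody: vowels last longer than consonants.
            phones.append((phone, 0.12 if is_vowel(phone) else 0.06))
    return phones

def synthesis(phones, fs=16000, f0=120.0, seed=0):
    """Toy waveform stage: voiced phones -> tone at f0, unvoiced phones -> weak noise."""
    rng = random.Random(seed)
    samples = []
    for phone, duration in phones:
        voiced = is_vowel(phone) or phone in VOICED_CONSONANTS
        for _ in range(int(round(duration * fs))):
            t = len(samples) / fs  # global time index keeps the tone phase continuous
            if voiced:
                samples.append(0.3 * math.sin(2 * math.pi * f0 * t))
            else:
                samples.append(rng.uniform(-0.05, 0.05))
    return samples
```

For example, `synthesis(linguistic_analysis("Turn left in 2 miles"))` first normalizes "2" to "two", produces 15 phones, and then renders them as 1.2 s of samples at 16 kHz.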
Applications of Speech Synthesis/Text-to-Speech (TTS)
Games
Telephone-based information
directions, air travel, banking, etc.
Accessing variable information
Machine-human interfaces
Eyes-free (in car)
Reading/speaking for disabled people
Reading of texts/books
Email access
Education (reading tutors)
Alarm systems
...
AUTOMATIC SPEECH RECOGNITION (ASR)
Automatic Speech Recognition
Automatic Speech Recognition (ASR) is the process of converting an unknown speech waveform into the corresponding orthographic transcription.
[Figure: ASR system diagram, including the language model]
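The language model enters recognition through the usual probabilistic formulation of ASR, added here as standard background rather than taken from the slide. With O the sequence of acoustic feature vectors and W a candidate word sequence, the recognizer chooses

```latex
\hat{W} = \arg\max_{W} P(W \mid O)
        = \arg\max_{W} \frac{P(O \mid W)\,P(W)}{P(O)}
        = \arg\max_{W} P(O \mid W)\,P(W)
```

where P(O | W) is supplied by the acoustic model and P(W) by the language model; P(O) does not depend on W, so it can be dropped from the maximization.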
Extraction of feature vectors
Speech signal → acoustic feature vectors: usually one vector every 10 ms, each computed over a 25 ms window.
Typically around 39 acoustic features per vector.
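A minimal sketch of the framing and of one common way to reach about 39 features follows. It assumes a Hamming window and the widespread layout of 13 static coefficients (for example 12 cepstral coefficients plus energy) extended with first and second time differences (13 + 13 + 13 = 39); the slide does not say which features are used, and the computation of the 13 static values themselves is left out.

```python
import math

def frame_signal(x, fs, win_ms=25.0, hop_ms=10.0):
    """Split x into Hamming-windowed frames, win_ms long, starting every hop_ms."""
    win = int(round(fs * win_ms / 1000))
    hop = int(round(fs * hop_ms / 1000))
    if len(x) < win:
        return []
    n_frames = 1 + (len(x) - win) // hop
    window = [0.54 - 0.46 * math.cos(2 * math.pi * n / (win - 1)) for n in range(win)]
    return [[x[i * hop + n] * window[n] for n in range(win)] for i in range(n_frames)]

def deltas(feats, N=2):
    """Regression deltas over +/-N frames; the first and last frames are repeated as padding."""
    T = len(feats)
    denom = 2 * sum(n * n for n in range(1, N + 1))
    out = []
    for t in range(T):
        out.append([
            sum(n * (feats[min(t + n, T - 1)][j] - feats[max(t - n, 0)][j])
                for n in range(1, N + 1)) / denom
            for j in range(len(feats[t]))
        ])
    return out

def add_dynamic_features(static):
    """Append first and second differences: 13 static values -> 39 features per frame."""
    d1 = deltas(static)
    d2 = deltas(d1)
    return [s + a + b for s, a, b in zip(static, d1, d2)]
```

At 16 kHz the 25 ms window is 400 samples and the 10 ms hop is 160 samples, so one second of audio yields 98 frames.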
Current issues in ASR
Word error rates have been steadily reduced over the last 20 years in many domains. Still, more research is needed to:
Increase robustness to new acoustic environments
Increase vocabulary size and achieve topic independence
Improve OOV (out-of-vocabulary) word recognition
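Progress in ASR is usually measured with the word error rate (WER), and the OOV problem with the OOV rate. The sketch below shows the standard WER definition, (substitutions + deletions + insertions) / reference length, computed with a word-level edit distance, plus an OOV rate over running words. It assumes whitespace-tokenized, already normalized transcripts; the function names are illustrative.

```python
def word_error_rate(reference, hypothesis):
    """(substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the first i-1 reference words and the first j hypothesis words
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        cur = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cur[j] = min(prev[j - 1] + (ref[i - 1] != hyp[j - 1]),  # match or substitution
                         prev[j] + 1,                               # deletion
                         cur[j - 1] + 1)                            # insertion
        prev = cur
    return prev[-1] / len(ref)

def oov_rate(text, vocabulary):
    """Fraction of running words that are not in the recognizer's vocabulary."""
    words = text.split()
    return sum(w not in vocabulary for w in words) / len(words)
```

For the reference "turn left at the station" and the hypothesis "turn left the nation", one deletion and one substitution give a WER of 2/5 = 0.4. Because insertions count as errors, WER can exceed 1.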
Applications of Speech Recognition/Understanding (ASR/ASU)
Dictation
Telephone-based information
directions, air travel, banking, etc.
Polls, online shopping
Call routing
Hands-free
in car, computer, home (domotics), controlling tools
Second language (accent reduction)
Audio archive searching
Help for disabled people
SPEAKER/LANGUAGE RECOGNITION
Speaker/Language identification
[Figure: Audio → Feature extraction → Feature vectors → Speaker/language models → Selected speaker/language]
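The identification pipeline (feature vectors scored against one model per speaker or language, highest score selected) can be sketched as follows. As a simplifying assumption, each model here is a single diagonal Gaussian; practical systems use richer models such as Gaussian mixtures or neural embeddings, which the slides do not specify. The function names and the variance floor are illustrative.

```python
import math

def train_gaussian(vectors, var_floor=1e-6):
    """Fit one diagonal Gaussian (per-dimension mean and variance) to training vectors."""
    n, d = len(vectors), len(vectors[0])
    mean = [sum(v[i] for v in vectors) / n for i in range(d)]
    var = [max(sum((v[i] - mean[i]) ** 2 for v in vectors) / n, var_floor) for i in range(d)]
    return mean, var

def avg_log_likelihood(vectors, model):
    """Average per-vector log density of the test vectors under a diagonal Gaussian."""
    mean, var = model
    total = 0.0
    for v in vectors:
        total += -0.5 * sum(math.log(2 * math.pi * s) + (x - m) ** 2 / s
                            for x, m, s in zip(v, mean, var))
    return total / len(vectors)

def identify(vectors, models):
    """Closed-set identification: return the name of the best-scoring model."""
    return max(models, key=lambda name: avg_log_likelihood(vectors, models[name]))
```

The same code serves language identification if the dictionary holds one model per language instead of one per speaker.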
Applications of Speaker/Language Recognition
Language recognition for call routing
Speaker Recognition:
Speaker verification (binary decision): voice password, telephone assistant
Speaker identification (one of N; open set or closed set): criminal investigation
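The difference between verification (a binary accept/reject decision) and open-set identification (one of N, or "none of them") comes down to how a score is compared with a threshold. The sketch below assumes scores are log-likelihoods, with verification using the ratio against a background model, a common choice but an assumption here; it also shows the false acceptance and false rejection rates that a verification threshold trades off. All names and numbers are illustrative.

```python
def verify(claimed_score, background_score, threshold=0.0):
    """Binary decision: accept the claimed identity if the log-likelihood ratio exceeds the threshold."""
    return claimed_score - background_score > threshold

def identify_open_set(scores, threshold):
    """One of N, or None ("unknown speaker") when even the best score is not above the threshold."""
    best = max(scores, key=scores.get)
    return best if scores[best] > threshold else None

def error_rates(target_scores, impostor_scores, threshold):
    """Return (false acceptance rate, false rejection rate) at a given threshold."""
    far = sum(s > threshold for s in impostor_scores) / len(impostor_scores)
    frr = sum(s <= threshold for s in target_scores) / len(target_scores)
    return far, frr
```

Raising the threshold lowers false acceptances (important for a voice password) at the cost of more false rejections; closed-set identification is the special case with no rejection threshold at all.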