Algorithms for NLPdemo.clab.cs.cmu.edu/11711fa18/slides/FA18 11-711 lecture... · 2018. 9. 18. ·...

Automatic Speech Recognition Yulia Tsvetkov – CMU Slides: Preethi Jyothi – IIT Bombay, Dan Klein – UC Berkeley Algorithms for NLP



Automatic Speech Recognition

Yulia Tsvetkov – CMU

Slides: Preethi Jyothi – IIT Bombay, Dan Klein – UC Berkeley

Algorithms for NLP


Skip-gram Prediction


Skip-gram Prediction

▪ Training data: (w_t, w_{t-2}), (w_t, w_{t-1}), (w_t, w_{t+1}), (w_t, w_{t+2}), ...


Skip-gram Prediction


How to compute p(+|t,c)?
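
One common answer, following the skip-gram with negative sampling formulation in Jurafsky & Martin, is to pass the dot product of the target and context embeddings through a sigmoid. A minimal sketch; the function name and the example vectors are made up for illustration.

```python
import math

def p_positive(t, c):
    """Probability that c is a real context of target t: sigmoid(t . c)."""
    dot = sum(ti * ci for ti, ci in zip(t, c))
    return 1.0 / (1.0 + math.exp(-dot))

# Made-up 3-dimensional embeddings, for illustration only.
target = [0.5, -0.2, 0.1]
context = [0.4, 0.1, -0.3]
print(p_positive(target, context))
```

The probability of a negative pair is then 1 - p(+|t,c).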


FastText: Motivation


Subword Representation

skiing = {^skiing$, ^ski, skii, kiin, iing, ing$}
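
The decomposition above can be sketched in a few lines. This assumes the slide's boundary markers (^ and $) and a fixed n-gram length of 4; the function name is hypothetical, and a full FastText setup may use a range of lengths.

```python
def char_ngrams(word, n=4):
    """Return the whole marked word plus all of its character n-grams."""
    marked = "^" + word + "$"
    grams = [marked[i:i + n] for i in range(len(marked) - n + 1)]
    return [marked] + grams

print(char_ngrams("skiing"))
# ['^skiing$', '^ski', 'skii', 'kiin', 'iing', 'ing$']
```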


FastText


ELMo

The Broadway play premiered yesterday .

[Figure: each token of the sentence is passed through a stack of LSTM layers]

ELMo = λ2 ( · ) + λ1 ( · ) + λ0 ( · )

[Each parenthesis holds one layer's representation of the token]
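
The weighted combination in the ELMo figure can be sketched as a per-token weighted sum of layer vectors. The layer outputs and the weights λ0, λ1, λ2 below are made-up values, and the function name is hypothetical.

```python
def combine_layers(layers, lambdas):
    """Weighted sum of per-layer vectors for one token: sum_j lambda_j * h_j."""
    dim = len(layers[0])
    return [sum(lam * h[k] for lam, h in zip(lambdas, layers))
            for k in range(dim)]

# Made-up 2-dimensional representations of one token from three layers,
# and made-up mixing weights (one weight per layer).
h = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
lambdas = [0.2, 0.3, 0.5]
print(combine_layers(h, lambdas))
```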


Announcements

▪ HW1 due Sept 24
▪ HW2 out Oct 2


Automatic Speech Recognition (ASR)

▪ Automatic speech recognition (or speech-to-text) systems transform speech utterances into their corresponding text form, typically in the form of a word sequence

▪ Downstream applications of ASR
  ▪ Speech understanding
  ▪ Audio information retrieval
  ▪ Speech translation
  ▪ Keyword search

Speech signal

She sells sea shells

Speech transcript


What ASR is Not

Slide credit: Preethi Jyothi


ASR is the Front Engine

Slide credit: Preethi Jyothi


Why is ASR a Challenging Problem?

▪ Style:
  ▪ Read speech vs. spontaneous (conversational) speech
  ▪ Command & control vs. continuous natural speech

▪ Speaker characteristics:
  ▪ Rate of speech, accent, prosody (stress, intonation), speaker age, pronunciation variability even when the same speaker speaks the same word

▪ Channel characteristics:
  ▪ Background noise, room acoustics, microphone properties, interfering speakers

▪ Task specifics:
  ▪ Vocabulary size (very large number of words to be recognized), language-specific complexity, resource limitations

Slide credit: Preethi Jyothi


History of ASR

The very first ASR

Slide credit: Preethi Jyothi


History of ASR

Slide credit: Preethi Jyothi


History of ASR

Slide credit: Preethi Jyothi


History of ASR

Slide credit: Preethi Jyothi


Statistical ASR: The Noisy Channel Model

▪ Acoustic model
▪ Language model: distributions over sequences of words (sentences)

~80s
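
The equation on this slide did not survive extraction. The standard noisy channel decomposition is sketched below, writing A for the acoustic observations and W for a word sequence (the symbol A is an assumption; W* is the recognizer's best hypothesis). P(A) is dropped in the last step because it does not depend on W.

```latex
W^* = \arg\max_W P(W \mid A)
    = \arg\max_W \frac{P(A \mid W)\, P(W)}{P(A)}
    = \arg\max_W \underbrace{P(A \mid W)}_{\text{acoustic model}} \; \underbrace{P(W)}_{\text{language model}}
```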


History of ASR

Slide credit: Preethi Jyothi


History of ASR

Slide credit: Preethi Jyothi


Evaluating an ASR system

▪ Word/Phone error rate (ER)
  ▪ Uses the Levenshtein distance measure: what is the minimum number of edits (insertions/deletions/substitutions) required to convert W* to W_ref?
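
A minimal sketch of the word error rate computation: Levenshtein distance over word tokens with unit cost per edit, divided by the reference length. The function names and the hypothesis sentence are made up for illustration.

```python
def edit_distance(hyp, ref):
    """Levenshtein distance between two token lists (unit cost per
    insertion, deletion, or substitution)."""
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i]
        for j, r in enumerate(ref, start=1):
            curr.append(min(prev[j] + 1,               # drop hypothesis token h
                            curr[j - 1] + 1,           # add reference token r
                            prev[j - 1] + (h != r)))   # substitute, or match for free
        prev = curr
    return prev[-1]

def word_error_rate(hyp, ref):
    """Edit distance normalized by the number of reference words."""
    return edit_distance(hyp, ref) / len(ref)

ref = "she sells sea shells".split()
hyp = "she sells the sea shell".split()   # made-up recognizer output
print(word_error_rate(hyp, ref))          # 2 edits / 4 reference words = 0.5
```

Because the edit count is divided by the reference length, the error rate can exceed 100% when the hypothesis contains many extra words.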

From J&M


NIST ASR Benchmark Test History


What’s Next?

Slide credit: Preethi Jyothi


What’s Next?

https://www.youtube.com/watch?v=gNx0huL9qsQ

Link credit: Preethi Jyothi

▪ accented speech
▪ low-resource
▪ speaker separation
▪ short queries
▪ etc.


In our course


Statistical ASR

Slide by Preethi Jyothi


ASR Topics

Slide by Preethi Jyothi


In our course

Slide by Preethi Jyothi


Acoustic Analysis

Slide by Preethi Jyothi


What is speech - physical realisation

▪ Waves of changing air pressure
▪ Realised through excitation from the vocal cords
▪ Modulated by the vocal tract, the articulators (tongue, teeth, lips)
▪ Vowels: open vocal tract
▪ Consonants are constrictions of vocal tract

▪ Representation:
  ▪ acoustics
  ▪ linguistics


Acoustics


Simple Periodic Waves of Sound

▪ Y axis: Amplitude = amount of air pressure at that point in time

▪ X axis: Time

▪ Frequency = number of cycles per second
  ▪ 20 cycles in .02 seconds = 1000 cycles/second = 1000 Hz


Complex Waves: 100 Hz + 1000 Hz

[Figure: waveform of the combined wave; y-axis: amplitude]


Spectrum

[Figure: spectrum of the combined wave; x-axis: frequency in Hz, y-axis: amplitude]

▪ Frequency components (100 and 1000 Hz) on x-axis
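
To make the spectrum concrete, the sketch below builds the 100 Hz + 1000 Hz wave and recovers its two components with a naive discrete Fourier transform. The sampling rate and signal length are assumptions, chosen so that each DFT bin is exactly 10 Hz wide; a real system would use an FFT routine instead of this quadratic-time loop.

```python
import cmath
import math

SAMPLE_RATE = 4000   # samples per second (assumed)
N = 400              # 0.1 s of signal, so each DFT bin is 10 Hz wide

# A complex wave: the sum of a 100 Hz and a 1000 Hz sine.
signal = [math.sin(2 * math.pi * 100 * n / SAMPLE_RATE)
          + math.sin(2 * math.pi * 1000 * n / SAMPLE_RATE)
          for n in range(N)]

def dft_magnitudes(x):
    """Magnitude of each DFT coefficient below the Nyquist frequency."""
    size = len(x)
    return [abs(sum(x[n] * cmath.exp(-2j * math.pi * k * n / size)
                    for n in range(size)))
            for k in range(size // 2)]

mags = dft_magnitudes(signal)
# Frequencies (in Hz) of the two largest peaks in the spectrum.
top_two = sorted(range(len(mags)), key=lambda k: mags[k], reverse=True)[:2]
peaks_hz = sorted(k * SAMPLE_RATE // N for k in top_two)
print(peaks_hz)  # [100, 1000]
```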


“She just had a baby”

▪ What can we learn from a wavefile?

▪ No gaps between words (!)

▪ Vowels are voiced, long, loud

▪ Voicing: regular peaks in amplitude

▪ When stops closed: no peaks, silence

▪ Peaks = voicing: from second .46 to .58 (vowel [iy]), from second .65 to .74 (vowel [ax]), and so on

▪ Silence of stop closure (1.06 to 1.08 for first [b], or 1.26 to 1.28 for second [b])

▪ Fricatives like [sh]: intense irregular pattern; see .33 to .46


Part of [ae] waveform from “had”

▪ Note complex wave repeating nine times in figure

▪ Plus a smaller wave which repeats 4 times for every large pattern

▪ Large wave has frequency of 250 Hz (9 times in .036 seconds)

▪ Small wave roughly 4 times this, or roughly 1000 Hz

▪ Two little tiny waves on top of peak of 1000 Hz waves

[Figure: waveform; x-axis: time, y-axis: amplitude]


Spectrum of an Actual Speech

[Figure: spectrum; y-axis: coefficient]


Spectrograms

[Figure: a time slice of the waveform (amplitude vs. time) is passed through an FFT to give a spectrum (coefficient vs. frequency)]


Spectrograms

[Figure: waveform; x-axis: time, y-axis: amplitude]


Spectrograms

[Figure: spectrogram (frequency vs. time) shown with the waveform (amplitude vs. time)]


Types of Graphs

[Figure: spectrogram (frequency vs. time), waveform (amplitude vs. time), and spectrum (coefficient vs. frequency)]


Speech in a Slide

■ Frequency gives pitch; amplitude gives volume

■ Frequencies at each time slice processed into observation vectors

[Figure: waveform (amplitude) and spectrogram (frequency) of "speech lab", segmented as s p ee ch l a b, with the observation vector sequence ... x12 x13 x12 x14 x14 ...]


Articulation


Articulatory System

[Figure: sagittal section of the vocal tract (Techmer 1880), labeled: nasal cavity, oral cavity, pharynx, vocal folds (in the larynx), trachea, lungs]

Text from Ohala, Sept 2001, from Sharon Rose slide


Space of Phonemes

▪ Standard international phonetic alphabet (IPA) chart of consonants


Place


Places of Articulation

[Figure labels: labial, dental, alveolar, post-alveolar/palatal, velar, uvular, pharyngeal, laryngeal/glottal]

Figure thanks to Jennifer Venditti


Labial place

bilabial

labiodental

Figure thanks to Jennifer Venditti

Bilabial: p, b, m

Labiodental: f, v


Coronal place

dental, alveolar, post-alveolar/palatal

Figure thanks to Jennifer Venditti

Dental: th/dh

Alveolar: t/d/s/z/l/n

Post-alveolar: sh/zh/y


Dorsal Place

velar, uvular

pharyngeal

Figure thanks to Jennifer Venditti

Velar: k/g/ng


Space of Phonemes

▪ Standard international phonetic alphabet (IPA) chart of consonants


Manner


Manner of Articulation

▪ In addition to varying by place, sounds vary by manner

▪ Stop: complete closure of articulators, no air escapes via mouth
  ▪ Oral stop: palate is raised (p, t, k, b, d, g)
  ▪ Nasal stop: oral closure, but palate is lowered (m, n, ng)

▪ Fricatives: substantial closure, turbulent: (f, v, s, z)

▪ Approximants: slight closure, sonorant: (l, r, w)

▪ Vowels: no closure, sonorant: (i, e, a)


Space of Phonemes

▪ Standard international phonetic alphabet (IPA) chart of consonants


Vowels


Vowel Space


Seeing Formants: the Spectrogram


Vowel Space


Spectrograms


Pronunciation is Context Dependent

▪ [bab]: closure of the lips lowers all formants, so there is a rapid increase in all formants at the beginning of “bab”

▪ [dad]: the first formant increases, but F2 and F3 fall slightly

▪ [gag]: F2 and F3 come together: this is a characteristic of velars. Formant transitions take longer in velars than in alveolars or labials

From Ladefoged “A Course in Phonetics”


Dialect Issues

▪ Speech varies from dialect to dialect (examples are American vs. British English)
  ▪ Syntactic (“I could” vs. “I could do”)
  ▪ Lexical (“elevator” vs. “lift”)
  ▪ Phonological
  ▪ Phonetic

▪ Mismatch between training and testing dialects can cause a large increase in error rate

[Figure: “all” and “old”, American vs. British]


Acoustic Analysis

Slide by Preethi Jyothi


Frame Extraction

▪ A frame (25 ms wide) extracted every 10 ms

[Figure: overlapping 25 ms frames a1, a2, a3, each shifted by 10 ms]

Figure: Simon Arnfield

Preview of feature extraction for each frame:

1) DFT (Spectrum)
2) Log (Calibrate)
3) another DFT (!!??)
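
Frame extraction itself can be sketched as slicing the sample array into overlapping windows. The 16 kHz sampling rate and the function name below are assumptions; the 25 ms width and 10 ms shift come from the slide. Windowing (e.g. a Hamming window), which real front ends usually apply to each frame, is omitted.

```python
def extract_frames(samples, sample_rate, width_ms=25, shift_ms=10):
    """Split a signal into overlapping frames: width_ms wide, one every shift_ms."""
    width = sample_rate * width_ms // 1000
    shift = sample_rate * shift_ms // 1000
    return [samples[start:start + width]
            for start in range(0, len(samples) - width + 1, shift)]

# One second of dummy samples at an assumed 16 kHz sampling rate.
samples = list(range(16000))
frames = extract_frames(samples, 16000)
print(len(frames), len(frames[0]))  # 98 400
```

At 16 kHz each frame holds 400 samples and consecutive frames start 160 samples apart, so neighbouring frames share 240 samples.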


Why these Peaks?

▪ Articulation process:
  ▪ The vocal cord vibrations create harmonics
  ▪ The mouth is an amplifier
  ▪ Depending on shape of mouth, some harmonics are amplified more than others


Vowel [i] at increasing pitches

[Figure: spectra of the vowel [i] at pitches F#2, A2, C3, F#3, A3, C4, A4]

Figures from Ratree Wayland


Deconvolution / The Cepstrum


Deconvolution / The Cepstrum

Graphs from Dan Ellis
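
A minimal sketch of the cepstrum as "DFT, log, another DFT", using a naive DFT so that it needs only the standard library. This is the plain real cepstrum, not full MFCC extraction: the mel filterbank and windowing are omitted, and MFCC pipelines typically use a discrete cosine transform for the last step. Function names and the `floor` parameter are hypothetical.

```python
import cmath
import math

def dft(x):
    """Naive discrete Fourier transform of a sequence."""
    size = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / size)
                for n in range(size))
            for k in range(size)]

def real_cepstrum(frame, floor=1e-10):
    """DFT -> log magnitude -> second DFT, scaled by 1/N.

    For a real-valued frame the log-magnitude spectrum is real and
    symmetric, so this second forward DFT divided by N equals the
    inverse DFT. `floor` avoids taking log(0).
    """
    log_mag = [math.log(max(abs(c), floor)) for c in dft(frame)]
    return [c.real / len(frame) for c in dft(log_mag)]

# Made-up 16-sample frame: a scaled impulse, whose spectrum is flat.
frame = [math.e] + [0.0] * 15
print(real_cepstrum(frame)[:3])
```

In the log spectrum, the slowly varying vocal tract envelope and the rapidly varying harmonics add rather than multiply, which is why the second transform can separate them.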


Final Feature Vector

▪ 39 (real) features per 25 ms frame:

▪ 12 MFCC features

▪ 12 delta MFCC features

▪ 12 delta-delta MFCC features

▪ 1 (log) frame energy

▪ 1 delta (log) frame energy

▪ 1 delta-delta (log) frame energy

▪ So each frame is represented by a 39D vector
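
The sketch below assembles such a 39-dimensional vector from 13 static features per frame (12 MFCCs plus log energy). The delta here is a simple central difference with the edge frames repeated, which is an assumption; real front ends often use a regression over a wider window. The static feature values are made up.

```python
def deltas(seq):
    """Central difference over time for each feature; edge frames are repeated."""
    padded = [seq[0]] + seq + [seq[-1]]
    # padded[t] is the previous frame and padded[t + 2] the next frame of seq[t].
    return [[(nxt - prv) / 2 for prv, nxt in zip(padded[t], padded[t + 2])]
            for t in range(len(seq))]

def add_dynamic_features(static):
    """13-dim static frames -> 39-dim frames (static + delta + delta-delta)."""
    d1 = deltas(static)
    d2 = deltas(d1)
    return [s + a + b for s, a, b in zip(static, d1, d2)]

# Made-up static features: 5 frames of 13 values each.
static = [[float(t + k) for k in range(13)] for t in range(5)]
features = add_dynamic_features(static)
print(len(features), len(features[0]))  # 5 39
```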


Acoustic Analysis

Slide by Preethi Jyothi


Phonetic Analysis

Slide by Preethi Jyothi


CMU Pronunciation Dict


Speech Model

[Figure: graphical model with three rows, labeled with the language model and the acoustic model]

Words: w1, w2

Sound types: s1, s2, ..., s7

Acoustic observations: a1, a2, ..., a7


Acoustic Modeling

Slide by Preethi Jyothi


Vector Quantization

▪ Idea: discretization
  ▪ Map MFCC vectors onto discrete symbols
  ▪ Compute probabilities just by counting

▪ This is called vector quantization or VQ

▪ Not used for ASR any more

▪ But: useful to consider as a starting point
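
As a starting point, vector quantization can be sketched as nearest-codeword lookup followed by counting. The codebook and feature vectors below are made-up 2-D values (real MFCC vectors would be 39-D), and the codebook would typically be learned, for example by k-means clustering; that step is not shown.

```python
from collections import Counter

def quantize(vector, codebook):
    """Index of the nearest codebook entry (squared Euclidean distance)."""
    return min(range(len(codebook)),
               key=lambda i: sum((v - c) ** 2
                                 for v, c in zip(vector, codebook[i])))

# Made-up codebook and feature vectors, for illustration only.
codebook = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
vectors = [[0.1, -0.2], [0.9, 1.2], [1.1, 0.8], [4.7, 5.3]]

symbols = [quantize(v, codebook) for v in vectors]
counts = Counter(symbols)
# Relative frequency of each discrete symbol: "probabilities just by counting".
probs = {s: n / len(symbols) for s, n in counts.items()}
print(symbols, probs)
```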


Next class: HMMs for Continuous Observations

▪ Feature vectors are real-valued
▪ Solution 1: discretization
▪ Solution 2: continuous emissions
  ▪ Gaussians
  ▪ Multivariate Gaussians
  ▪ Mixtures of multivariate Gaussians
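
A minimal sketch of what a continuous emission density might look like: the log density of a multivariate Gaussian and of a mixture of such Gaussians. Diagonal covariance is assumed here for simplicity (the slide does not specify it), and all names and parameter values are illustrative.

```python
import math

def log_gaussian_diag(x, mean, var):
    """Log density of a multivariate Gaussian with diagonal covariance."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))

def log_mixture(x, weights, means, variances):
    """Log density of a mixture of diagonal Gaussians (log-sum-exp for stability)."""
    terms = [math.log(w) + log_gaussian_diag(x, m, v)
             for w, m, v in zip(weights, means, variances)]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))

# Made-up 2-D observation and a two-component mixture.
x = [0.5, -1.0]
weights = [0.3, 0.7]
means = [[0.0, 0.0], [1.0, -1.0]]
variances = [[1.0, 1.0], [0.5, 2.0]]
print(log_mixture(x, weights, means, variances))
```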