Speech Processing Simon King University of Edinburgh additional lecture slides for 2018-19

Speech Processing

Simon King, University of Edinburgh

additional lecture slides for 2018-19

Two courses in one

• Undergraduate course code: LASC10061
• Postgraduate course code: LASC11065

• Shared:
  • Course materials, lectures, labs
  • Exam/coursework weightings

• Differences:
  • Undergraduate course: 20 credits
  • Postgraduate course: 10 credits
  • Coursework expectations

Assessment

• Three items of assessment: two practical lab-based assignments and an exam
  1. First assignment: speech synthesis practical write-up (20%)
  2. Second assignment: speech recognition practical write-up (30%)
  3. Multiple choice exam in December (50%)

• Assignment due dates are given in the weekly schedule
• Exam date is set by the University and will be published in due course

Introduction to the course

• learning outcomes
• delivery
• timetable
• background required
• course coverage
• assignments

Learning outcomes

• Overview of the components of speech recognition and speech synthesis systems
• Understand the main concepts and what each component does
• Describe a simple version of each component
• See what the difficult problems are in recognition and synthesis
• Use tools for visualising and manipulating speech waveforms
• Experiment with two speech technology systems (Festival & HTK)
• See knowledge and skills from different areas come together in an interdisciplinary field

Delivery - speech.zone website & Learn

• The website speech.zone contains almost everything you will need
  • Video material, reading lists, forums, calendar, coursework instructions
  • You must have an account on this site, so that you can post on the forums - make sure you can log in, and email me if you have any trouble

• You only need to use Learn to:
  • Sign up for a lab group
  • Access the optional Introduction to Unix material
  • Submit your coursework

Delivery - what I provide

• The material on speech.zone is divided into modules
• the video content provides only the bare bones and there are gaps in coverage
• you need to flesh out the details by taking full advantage of other modes of learning
  • readings
  • lectures, including activities, quizzes, etc.
  • labs
  • forums

Delivery - what you need to do

• Every week you will watch all videos and do essential readings before the lecture
• You can then post questions on the forums
  • feel free to ask questions at any level - I’ll tell you if you go “off topic”
  • try to choose an appropriate forum / topic / thread (I will re-organise as required)
  • often, the most basic questions are the most helpful

• Lectures will help you synthesise together all the different sources of information and different ways of learning. No single source will be enough on its own.

• To do well in this course, you need to be an active learner, not a passive observer
  • don’t worry - this course does not require much “audience participation”

Laptops are the single most reported distracter to both users and fellow students. (Fried 2008, Zhu et al 2011)

Laptop & device use is negatively correlated with understanding of course material and overall course performance. (Hembrooke & Gay 2003, Fried 2008, Patterson & Patterson 2017)

The negative effects of device use are highest among males and low-performing students. (Patterson & Patterson 2017)

References

Carrie B. Fried (2008). In-class laptop use and its effects on student learning. Computers & Education 50(3): 906-914.

Helene Hembrooke and Geri Gay (2003). The Laptop and the Lecture: The Effects of Multitasking in Learning Environments. Journal of Computing in Higher Education 15(1): 46-64.

Richard W. Patterson, Robert M. Patterson (2017). Computers and productivity: Evidence from laptop use in the college classroom. Economics of Education Review 57: 66-79.

Erping Zhu, Matthew Kaplan, R. Charles Dershimer, Inger Bergom (2011). Use of Laptops in the Classroom: Research and Best Practices. University of Michigan CRLT Occasional Paper 30.

Students who use digital devices in class 'perform worse in exams' by Richard Adams, 11 May 2016.

Students are Better Off without a Laptop in the Classroom by Cindi May, 11 July 2017.

Why smart kids shouldn’t use laptops in class by Jeff Guo, 16 May 2016.

Image credits: CC BY Simon Fokt; Washington Post illustration; iStock

Delivery - lecture recording

• The video clips on speech.zone are extracted from previous lectures

• Lectures are also recorded, but…
  • sometimes the technology fails
  • recordings will not make any sense for interactive parts, activities, quizzes, group exercises, etc…

Delivery - keep making it better

• PDF slides on speech.zone, approximately matching the video clips

• Slides used in lectures are improved every year, and will be made available as we proceed
  • these slides are dynamically updated in response to your questions
  • usually, they become available shortly before each lecture

Delivery - keep making it better

• Please give me feedback (email, forum posts, verbally, class reps, PPLS teaching offices, notes slipped under my office door, …) about course structure and delivery mode, throughout the course.

• I also want feedback on speech.zone
  • is it clearly organised?
  • is the website reliable and fast enough?
  • is it obvious what relates to this course, and what does not?
  • does everything work correctly on your device?

tour of the website

Timetable

• Lecture - in two parts
  • Thursday 09:00 – 09:50
    • type of content varies - see the course calendar
    • for foundation material, you decide whether to attend
  • Thursday 10:00 – 10:50 - always attend
  • Each part of the lecture will start and finish at the above times (except today)

• Labs (you need to attend one session every week - sign up for a group on Learn)
  • group 1: Tuesday 14.10 - 16.00
  • group 2: Wednesday 11.10 - 13.00

What background do you need?

• This course involves:
  • Linguistics: phonetics, phonology, intonation, (perhaps syntax)
  • Mathematics: statistics and probability, optimisation
  • Engineering: practical implementations, empirical findings
  • Computer science: algorithms, efficient implementation

• So, there will be something new for everyone

What background do you need?

• No single background is assumed, but…

• If everything on that list is new to you, then you will probably find the course too hard

• You’ll get the most out of this course if you know a little bit about
  • speech (e.g., phonetics)
  • sound in general (including music and audio engineering)
  • engineering
  • computer science

Roadmap

• Modules 1-2: The basics
• Modules 3-5: Speech synthesis
• Modules 6-9: Speech recognition

• What is examinable?
  • everything in the videos
  • all “Essential” readings

Image source: https://www.amazon.co.uk

Image source: https://developer.amazon.com

Image source: https://developer.amazon.com

Automatic Speech Recognition (ASR)

Text-to-Speech (TTS), also called “Speech Synthesis”

Modules 1-2: The basics

• Waveform, spectrum, spectrogram
• Speech production, speech perception
• Acoustic phonetics

A speech waveform

[Figure: waveform plot; horizontal axis: time, vertical axis: amplitude]

[Figure: spectrum plot; horizontal axis: frequency (0 to 8 kHz), vertical axis: magnitude (log scale)]

Key concepts we will understand

• Sound as a pressure wave
• Speech waveforms
• Short-term analysis, converting from the time domain to the frequency domain
• Speech production
  • possible sources of sound
  • different ways in which that sound is shaped into different vowels and consonants
• Speech perception
  • hearing
  • forming categories
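As a rough illustration of short-term analysis, the sketch below takes one frame of a waveform, applies a tapered window, and converts it from the time domain to the frequency domain with a plain discrete Fourier transform. The function names, the Hann window, the 16 kHz sampling rate and the 64-sample frame are assumptions chosen for this example; they are not taken from the course materials.

```python
import cmath
import math


def hann_window(n_samples):
    """Symmetric Hann window, tapering the frame towards zero at both ends."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * n / (n_samples - 1))
            for n in range(n_samples)]


def magnitude_spectrum(frame):
    """Magnitude of the DFT of one windowed frame, for bins 0 .. N/2."""
    n_samples = len(frame)
    window = hann_window(n_samples)
    windowed = [s * w for s, w in zip(frame, window)]
    spectrum = []
    for k in range(n_samples // 2 + 1):
        # Correlate the frame with a complex sinusoid at bin frequency k.
        total = sum(windowed[n] * cmath.exp(-2j * math.pi * k * n / n_samples)
                    for n in range(n_samples))
        spectrum.append(abs(total))
    return spectrum


# Assumed test signal: a 1 kHz sine wave sampled at 16 kHz.
SAMPLE_RATE = 16000
FRAME_LENGTH = 64
frame = [math.sin(2.0 * math.pi * 1000.0 * n / SAMPLE_RATE)
         for n in range(FRAME_LENGTH)]
spectrum = magnitude_spectrum(frame)
peak_bin = max(range(len(spectrum)), key=lambda k: spectrum[k])
peak_hz = peak_bin * SAMPLE_RATE / FRAME_LENGTH
```

Repeating this for successive, overlapping frames and stacking the resulting spectra over time gives a spectrogram.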

Modules 3-5: Speech synthesis

• Text processing - normalisation of non-standard words
• Pronunciation - lexicons and letter-to-sound modelling
• Prosody - phrasing and pitch accents
• Waveform generation - concatenation of pre-stored units
• Prosodic manipulation - time and pitch modification in the time domain

The classic two-stage pipeline of text-to-speech synthesis

[Figure: text (“Author of the…”) → Front end → linguistic specification → Waveform generator → waveform. Recoverable labels in the linguistic specification: “Author of the ...”; “ao th er ah f dh ax”; “syl syl syl syl”; “sil”; “NN of DT”; “1 0 0 0”.]

The linguistic specification

[Figure: the linguistic specification. Recoverable labels: “Author of the ...”; “ao th er ah f dh ax”; “syl syl syl syl”; “sil”; “NN of DT”; “1 0 0 0”.]

Extracting features from text using the front end

[Figure: text (“Author of the…”) → Front end (feature extraction) → linguistic specification. Recoverable labels in the linguistic specification: “Author of the ...”; “ao th er ah f dh ax”; “syl syl syl syl”; “sil”; “NN of DT”; “1 0 0 0”.]

Text processing pipeline

[Figure: text → Front end (tokenize → POS tag → LTS → Phrase breaks → intonation) → linguistic specification. The stages are individually learned from labelled data.]

Tokenize & Normalize

• Step 1: divide input stream into tokens, which are potential words

• For English and many other languages
  • rule based
  • whitespace and punctuation are good features

• For some other languages, especially those that don’t use whitespace
  • may be more difficult
  • other techniques required (out of scope here)
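Step 1 for English can be sketched with a single rule that uses whitespace and punctuation as features. This is only a minimal illustration: the regular expression and the function name are assumptions made for this example, and a real front end (for instance the one in Festival) will use its own, more careful rules.

```python
import re

# Hypothetical token pattern, tried left to right at each position:
#   1. a number, optionally with a leading currency symbol (e.g. £100, 3.5)
#   2. a word, allowing internal apostrophes or hyphens (e.g. don't)
#   3. any single character that is neither a word character nor whitespace
TOKEN_PATTERN = re.compile(r"[£$€]?\d+(?:[.,]\d+)*|\w+(?:['’-]\w+)*|[^\w\s]")


def tokenize(text):
    """Split text into candidate words, numbers and punctuation marks."""
    return TOKEN_PATTERN.findall(text)


tokens = tokenize("In 2011, I spent £100 at IKEA on 100 DVD holders.")
```

Whitespace is discarded and each punctuation mark becomes its own token, so later stages can decide which tokens are ordinary words and which need further processing.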

Tokenize & Normalize

• Step 2: classify every token, finding Non-Standard Words that need further processing

In 2011, I spent £100 at IKEA on 100 DVD holders.

2011 → NYER
£100 → MONEY
IKEA → ASWD
100 → NUM
DVD → LSEQ
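Step 2 can be illustrated with a few hand-written rules that reproduce the labels in the example sentence. The label names come from the slide; everything else (the function name, the year range, and the "contains a vowel" test that separates ASWD from LSEQ) is a crude assumption made for this sketch. A practical classifier would use context and would typically be learned from labelled data.

```python
import re


def classify_token(token):
    """Return a Non-Standard Word label, or None for an ordinary token."""
    if re.fullmatch(r"[£$€]\d+(?:\.\d+)?", token):
        return "MONEY"
    if re.fullmatch(r"\d{4}", token) and 1100 <= int(token) <= 2099:
        # Assumed heuristic: a four-digit number in this range is a year.
        return "NYER"
    if re.fullmatch(r"\d+", token):
        return "NUM"
    if len(token) > 1 and token.isalpha() and token.isupper():
        # Assumed heuristic: all-caps tokens containing a vowel are read
        # as a word (ASWD); otherwise they are spelled out (LSEQ).
        return "ASWD" if re.search(r"[AEIOU]", token) else "LSEQ"
    return None


tokens = ["In", "2011", ",", "I", "spent", "£100", "at", "IKEA",
          "on", "100", "DVD", "holders", "."]
labels = {t: classify_token(t) for t in tokens if classify_token(t)}
```

These rules are easy to break (a four-digit quantity such as "2011 holders" would be mislabelled as a year), which is one reason to prefer a classifier that sees the surrounding words.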

Tokenize & Normalize

• Step 3: a set of specialised modules to process NSWs of each type

2011 → NYER → twenty eleven
£100 → MONEY → one hundred pounds
IKEA → ASWD → apply letter-to-sound
100 → NUM → one hundred
DVD → LSEQ → D. V. D. → dee vee dee
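Step 3 can be sketched as one small expansion function per NSW type, dispatched on the label. The function names, the 0 to 999 range and the British-style "and" are assumptions for this illustration, which only covers enough cases to reproduce the examples from the slide. The final step for letter sequences (D. V. D. → dee vee dee) is left to the pronunciation stage.

```python
ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def number_to_words(n):
    """Spell out an integer in the range 0..999."""
    if not 0 <= n < 1000:
        raise ValueError("this sketch only handles 0..999")
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] if ones == 0 else TENS[tens] + " " + ONES[ones]
    hundreds, rest = divmod(n, 100)
    words = ONES[hundreds] + " hundred"
    return words if rest == 0 else words + " and " + number_to_words(rest)


def year_to_words(year):
    """Read a four-digit year as two two-digit halves."""
    century, rest = divmod(year, 100)
    if rest == 0:
        # Limitation of the sketch: 2000 comes out as "twenty hundred".
        return number_to_words(century) + " hundred"
    if rest < 10:
        return number_to_words(century) + " oh " + number_to_words(rest)
    return number_to_words(century) + " " + number_to_words(rest)


def expand(token, label):
    """Expand one Non-Standard Word according to its label."""
    if label == "NYER":
        return year_to_words(int(token))
    if label == "NUM":
        return number_to_words(int(token))
    if label == "MONEY":
        # Only whole amounts in pounds are handled here.
        amount = int(token.lstrip("£"))
        return number_to_words(amount) + (" pound" if amount == 1 else " pounds")
    if label == "LSEQ":
        return " ".join(letter + "." for letter in token)
    # ASWD: leave unchanged, to be pronounced by letter-to-sound rules.
    return token
```

Keeping one module per type means each can be improved or replaced on its own, for example by adding ordinals or other currencies.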

POS tagging

• Part-of-speech tagger
• Accuracy can be very high
• Trained on annotated text data
• Categories are designed for text, not speech

NN Director
IN of
DT the
NP McCormick
NP Public
NPS Affairs
NP Institute
IN at
NP U-Mass
NP Boston,
NP Doctor
NP Ed
NP Beard,
VBZ says
DT the
NN push
IN for
VBP do
PP it
PP yourself
NN lawmaking

Pronunciation / LTS

• Pronunciation model
  • dictionary look-up, plus
  • letter-to-sound model
• But
  • need deep knowledge of the language to design the phoneme set
  • human expert must write dictionary

ADVOCATING AE1 D V AH0 K EY2 T IH0 NG
ADVOCATION AE2 D V AH0 K EY1 SH AH0 N
ADWEEK AE1 D W IY0 K
ADWELL AH0 D W EH1 L
ADY EY1 D IY0
ADZ AE1 D Z
AE EY1
AEGEAN IH0 JH IY1 AH0 N
AEGIS IY1 JH AH0 S
AEGON EY1 G AA0 N
AELTUS AE1 L T AH0 S
AENEAS AE1 N IY0 AH0 S
AENEID AH0 N IY1 IH0 D
AEQUITRON EY1 K W IH0 T R AA0 N
AER EH1 R
AERIAL EH1 R IY0 AH0 L
AERIALS EH1 R IY0 AH0 L Z
AERIE EH1 R IY0
AERIEN EH1 R IY0 AH0 N
AERIENS EH1 R IY0 AH0 N Z
AERITALIA EH2 R IH0 T AE1 L Y AH0
AERO EH1 R OW0
AEROBATIC EH2 R AH0 B AE1 T IH0 K
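The pronunciation step on this slide, dictionary look-up with a letter-to-sound fallback, can be sketched as follows. The three lexicon entries are copied from the dictionary excerpt on the slide. The one-letter-to-one-sound table is a deliberately naive stand-in invented for this example; it is not the letter-to-sound model used in the course, which would be learned from data.

```python
# A few entries from the dictionary excerpt (digits mark lexical stress).
LEXICON = {
    "AEGIS": ["IY1", "JH", "AH0", "S"],
    "AERIAL": ["EH1", "R", "IY0", "AH0", "L"],
    "AERO": ["EH1", "R", "OW0"],
}

# Hypothetical toy letter-to-sound table: each letter maps to fixed phones,
# ignoring context and stress. Real English spelling is far less regular.
TOY_LETTER_TO_PHONES = {
    "A": ["AE"], "B": ["B"], "C": ["K"], "D": ["D"], "E": ["EH"],
    "F": ["F"], "G": ["G"], "H": ["HH"], "I": ["IH"], "J": ["JH"],
    "K": ["K"], "L": ["L"], "M": ["M"], "N": ["N"], "O": ["AA"],
    "P": ["P"], "Q": ["K"], "R": ["R"], "S": ["S"], "T": ["T"],
    "U": ["AH"], "V": ["V"], "W": ["W"], "X": ["K", "S"], "Y": ["Y"],
    "Z": ["Z"],
}


def pronounce(word):
    """Return (phones, source): lexicon entry if present, else toy rules."""
    key = word.upper()
    if key in LEXICON:
        return LEXICON[key], "lexicon"
    phones = []
    for letter in key:
        # Characters with no entry (digits, hyphens) are skipped.
        phones.extend(TOY_LETTER_TO_PHONES.get(letter, []))
    return phones, "letter-to-sound"


in_lexicon = pronounce("aero")
out_of_lexicon = pronounce("IKEA")
```

The fallback shows why a letter-to-sound model has to generalise to unseen words, and why a table this simple produces poor pronunciations for them.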

Key concepts we will understand

• Breaking a complex problem down into simpler steps
• Combining many components into a single architecture
  • representing information in data structures
• The pros and cons of rules vs. learning from data
• Generalising to previously-unseen words or sentences
• Creating new utterances from fragments of pre-recorded speech
• Manipulating the pitch and duration of speech

Modules 6-9: Speech recognition

• Dynamic Time Warping
• Probability distributions
• Hidden Markov Models
• Bayes’ Rule
• Viterbi algorithm
• Training HMMs
• Simple language models
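To give a flavour of the first topic in this list, here is a minimal Dynamic Time Warping sketch that aligns two sequences of numbers and uses the alignment cost to pick the closest stored template. The one-dimensional features, the absolute-difference local cost, the three allowed moves and the template names are all assumptions for this example; real recognisers compare sequences of feature vectors.

```python
def dtw_distance(a, b):
    """Total cost of the best monotonic alignment between sequences a and b."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best accumulated cost aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Dynamic programming: extend the cheapest of three predecessors.
            cost[i][j] = local + min(cost[i - 1][j],      # a advances
                                     cost[i][j - 1],      # b advances
                                     cost[i - 1][j - 1])  # both advance
    return cost[n][m]


def recognise(unknown, templates):
    """Return the label of the stored template closest to the unknown input."""
    return min(templates, key=lambda label: dtw_distance(unknown, templates[label]))


# Hypothetical stored examples, one per class.
TEMPLATES = {"rising": [1, 2, 3, 4], "falling": [4, 3, 2, 1]}
best = recognise([1, 1, 2, 3, 3, 4], TEMPLATES)
```

The unknown sequence is longer than either template, but the alignment absorbs the repeated values, so timing differences alone do not add to the cost.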

Key concepts we will understand

• Simple pattern recognition by comparing to stored examples
• The concept of generative models
  • and how they can be used to make a classifier
• The importance of choosing the right representation of the speech signal
  • feature extraction
  • feature engineering
• Combining prior knowledge and new evidence in a principled probabilistic framework
  • the Bayesian approach
  • modelling variability and uncertainty using probability distributions
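The idea of turning generative models into a classifier with Bayes’ Rule can be shown in a few lines. In this sketch each class has a one-dimensional Gaussian as its generative model and a prior probability; the posterior is proportional to likelihood times prior. The class names, means, variances and priors are made-up values for illustration only.

```python
import math


def gaussian_pdf(x, mean, variance):
    """Probability density of x under a one-dimensional Gaussian."""
    return (math.exp(-((x - mean) ** 2) / (2.0 * variance))
            / math.sqrt(2.0 * math.pi * variance))


def posteriors(x, models, priors):
    """Bayes' Rule: P(class | x) = p(x | class) P(class) / p(x)."""
    joint = {c: gaussian_pdf(x, *models[c]) * priors[c] for c in models}
    evidence = sum(joint.values())  # p(x), which normalises the posteriors
    return {c: value / evidence for c, value in joint.items()}


def classify(x, models, priors):
    """Pick the class with the highest posterior probability."""
    post = posteriors(x, models, priors)
    return max(post, key=post.get)


# Hypothetical two-class problem: (mean, variance) per class.
MODELS = {"a": (0.0, 1.0), "b": (4.0, 1.0)}
EQUAL_PRIORS = {"a": 0.5, "b": 0.5}
SKEWED_PRIORS = {"a": 0.1, "b": 0.9}

midpoint_equal = posteriors(2.0, MODELS, EQUAL_PRIORS)
midpoint_skewed = posteriors(2.0, MODELS, SKEWED_PRIORS)
```

At x = 2.0 both models assign the same likelihood, so the decision is driven entirely by the prior: new evidence and prior knowledge are combined, and neither is ignored.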

Roadmap

• Modules 1-2: The basics
• Modules 3-5: Speech synthesis
• Modules 6-9: Speech recognition

• along the way, we will see some important techniques, models, and algorithms (including machine learning) that are useful in other applications

Some of the things we will learn

• Basic digital signal processing
• Text processing - rules vs learning from data
• Classification and Regression Trees (CARTs)
• Dynamic programming - Dynamic Time Warping (DTW); the Viterbi algorithm
• Generative models - Hidden Markov Models (HMMs)
• Probability - Bayes’ Rule
• Expectation-Maximisation - the Baum-Welch algorithm
• Finite state networks - N-gram language models
• Feature extraction and engineering - Mel Frequency Cepstral Coefficients (MFCCs)
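As a small preview of how dynamic programming and HMMs fit together, the sketch below runs the Viterbi algorithm on a toy HMM with discrete observations. The two-state left-to-right model, its probabilities and the symbol names are invented for this example. Probabilities are multiplied directly for readability; practical implementations usually work with log probabilities to avoid underflow.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely state sequence and its probability."""
    # best[t][s] = probability of the best path ending in state s at time t.
    best = [{s: start_p[s] * emit_p[s][observations[0]] for s in states}]
    back = [{}]
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            # Choose the best predecessor state for s at time t.
            prev, prob = max(((r, best[t - 1][r] * trans_p[r][s]) for r in states),
                             key=lambda pair: pair[1])
            best[t][s] = prob * emit_p[s][observations[t]]
            back[t][s] = prev
    # Trace back from the most probable final state.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path, best[-1][last]


# Hypothetical left-to-right model: s1 can stay or move to s2; s2 only stays.
STATES = ["s1", "s2"]
START = {"s1": 1.0, "s2": 0.0}
TRANS = {"s1": {"s1": 0.5, "s2": 0.5}, "s2": {"s1": 0.0, "s2": 1.0}}
EMIT = {"s1": {"a": 0.9, "b": 0.1}, "s2": {"a": 0.1, "b": 0.9}}

path, probability = viterbi(["a", "a", "b", "b"], STATES, START, TRANS, EMIT)
```

The table filled in by the loop has the same structure as the DTW cost grid: each cell keeps only the best way of reaching it, which is what makes the search efficient.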

Lab facilities

• All practicals are done on Apple computers, running OS X
• Appleton Tower 4.02: fully-supported computers, file backup, 24/7 access
• We’ll automatically create accounts for everyone. Look out for announcements.
• We have no resources to support use of your own Mac (so, for experts only)
• Unix/Linux/command line
  • Basics: Terminal, mv, cp, cd, mkdir, starting programs - intro sessions available
• Never switch off machines, just log out
• Lab access
  • Via matriculation card at any time (PIN required out of hours)
  • Follow instructions on Learn to get your card activated

assignments are described on the website

end of first half of lecture 01

The basics

• How speech is produced and perceived
• Waveform, spectrum, spectrogram

How speech is produced and perceived

• Speech production
  • sound source, vocal tract, source-filter model

• Speech perception
  • the auditory system

What you should already know

• Vocal tract is a tube of variable shape

• A tube (containing air) acts as a resonator for sound waves

• The frequencies of the resonances depend on the shape of the tube

Orientation

• Our goal is now
  • to build on our basic understanding of the acoustics of the vocal tract
  • to understand speech production
  • to explain what we observe in speech signals

• See other material on
  • digital signals
  • phonetics

Vocal tract anatomy

• Vocal tract is a tube

• Shape can be changed by moving the tongue, jaw and lips

• The nasal branch can be connected by lowering the velum

• The tongue is larger than you might have thought - a complex set of muscles

• The nasal cavity is surprisingly large

Image credit: Department of Mathematics and Systems Analysis, Aalto University, School of Science

The vocal tract is a resonator. A resonator can act as a filter. So, what is a filter?

[Figure: input signal → filter → output signal]

Filtering in the time domain

[Figure: input signal e[t] → filter → output signal y[t]]

$$y[t] = e[t] - \sum_{k=1}^{K} b_k \, y[t-k]$$
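The time-domain filter on this slide computes each output sample from the current input sample minus a weighted sum of the previous K output samples. A direct, unoptimised sketch is given below. The function name and the choice to treat outputs before the start of the signal as zero are assumptions, and the minus sign follows the usual form of this equation, since the operator is garbled in the extracted slide text.

```python
def all_pole_filter(e, b):
    """Apply y[t] = e[t] - sum_{k=1..K} b[k-1] * y[t-k] to the input e.

    b holds the K feedback coefficients b_1 .. b_K. Outputs before the
    start of the signal (t - k < 0) are taken to be zero.
    """
    y = []
    for t, e_t in enumerate(e):
        value = e_t
        for k, b_k in enumerate(b, start=1):
            if t - k >= 0:
                value -= b_k * y[t - k]
        y.append(value)
    return y


# Impulse response of a one-coefficient filter with b_1 = -0.5:
# each output sample is half of the previous one.
impulse = [1.0, 0.0, 0.0, 0.0]
response = all_pole_filter(impulse, [-0.5])
```

Because the filter feeds its own output back in, a single input impulse keeps "ringing" in the output, which is the time-domain view of a resonance.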

Two possible sources: periodic and aperiodic (non-periodic, random)

[Figure: source e[t] → filter → output speech y[t]]

$$y[t] = e[t] - \sum_{k=1}^{K} b_k \, y[t-k]$$
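The two kinds of source can be sketched as an impulse train (periodic) and white noise (aperiodic), each passed through the same filter. Everything numeric here is an assumption for illustration: the 16 kHz sampling rate, the 100 Hz fundamental, and a single two-coefficient resonator at 500 Hz standing in for the vocal tract, which in reality has several resonances.

```python
import math
import random


def impulse_train(n_samples, period):
    """Periodic source: one unit impulse every `period` samples."""
    return [1.0 if t % period == 0 else 0.0 for t in range(n_samples)]


def white_noise(n_samples, seed=0):
    """Aperiodic source: uniform random samples in [-1, 1]."""
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(n_samples)]


def resonator_coefficients(freq_hz, radius, sample_rate):
    """Feedback coefficients b_1, b_2 for one resonance (radius < 1)."""
    theta = 2.0 * math.pi * freq_hz / sample_rate
    return [-2.0 * radius * math.cos(theta), radius ** 2]


def all_pole_filter(e, b):
    """y[t] = e[t] - sum_k b[k-1] * y[t-k], with zero initial conditions."""
    y = []
    for t, e_t in enumerate(e):
        value = e_t
        for k, b_k in enumerate(b, start=1):
            if t - k >= 0:
                value -= b_k * y[t - k]
        y.append(value)
    return y


SAMPLE_RATE = 16000
b = resonator_coefficients(500.0, 0.95, SAMPLE_RATE)
# 160 samples per period at 16 kHz gives a 100 Hz fundamental.
voiced = all_pole_filter(impulse_train(400, 160), b)
unvoiced = all_pole_filter(white_noise(400), b)
```

The filter is identical in both cases; only the source changes, which is the central point of the source-filter model.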

Describing the source-filter model in the frequency domain

[Figure: source → filter → output speech, shown in the frequency domain]
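Assuming the recursive filter from the time-domain slides, the same model can be written in the frequency domain by taking the z-transform of the difference equation. The symbols E, Y and H are introduced here for illustration; the last line evaluates the result at a single frequency f, where filtering becomes addition of log magnitudes.

```latex
\begin{align*}
y[t] + \sum_{k=1}^{K} b_k \, y[t-k] &= e[t] \\
Y(z) \Bigl( 1 + \sum_{k=1}^{K} b_k z^{-k} \Bigr) &= E(z) \\
Y(z) &= H(z) \, E(z), \qquad H(z) = \frac{1}{1 + \sum_{k=1}^{K} b_k z^{-k}} \\
\log |Y(f)| &= \log |E(f)| + \log |H(f)|
\end{align*}
```

On a log magnitude scale, the spectrum of the output speech is therefore the spectrum of the source plus the frequency response of the filter.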

Image credit: Rice University (CC-BY-4.0)

What next?

• What we have now learned about
  • Speech signals
    • speech production
    • the source-filter model
  • Speech perception
    • filterbank
    • Mel scale

Speech synthesis

Automatic speech recognition
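The Mel scale and filterbank mentioned on this slide can be illustrated with one widely used conversion formula, m = 2595 log10(1 + f / 700). The slides do not say which formula the course uses, so the constants, the function names and the helper for filter centre frequencies are assumptions for this sketch.

```python
import math


def hz_to_mel(freq_hz):
    """Convert frequency in Hz to Mel (one common formula)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def filter_centres_hz(n_filters, low_hz, high_hz):
    """Centre frequencies of n_filters spaced equally on the Mel scale."""
    low_mel, high_mel = hz_to_mel(low_hz), hz_to_mel(high_hz)
    step = (high_mel - low_mel) / (n_filters + 1)
    return [mel_to_hz(low_mel + step * (i + 1)) for i in range(n_filters)]


# Five filters between 0 Hz and 8 kHz (assumed range).
centres = filter_centres_hz(5, 0.0, 8000.0)
gaps = [hi - lo for lo, hi in zip(centres, centres[1:])]
```

Filters that are equally spaced in Mel end up closer together at low frequencies and further apart at high frequencies when measured in Hz, giving finer resolution at low frequencies.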