Prosody-Controllable HMM-Based Speech Synthesis Using Speech Input
Module 4 Pronunciation & prosody - speech.zone · Today’s topics - Module 4: pronunciation &...
![Page 1: Module 4 Pronunciation & prosody - speech.zone · Today’s topics - Module 4: pronunciation & prosody Theory Application Speech Signal processing Probabilistic modelling Speech Synthesis](https://reader030.fdocuments.in/reader030/viewer/2022040900/5e6fe82d8978ee4144744f66/html5/thumbnails/1.jpg)
Module 4
Pronunciation & prosody
![Page 2: Module 4 Pronunciation & prosody - speech.zone · Today’s topics - Module 4: pronunciation & prosody Theory Application Speech Signal processing Probabilistic modelling Speech Synthesis](https://reader030.fdocuments.in/reader030/viewer/2022040900/5e6fe82d8978ee4144744f66/html5/thumbnails/2.jpg)
Roadmap
• Modules 1-2: The basics
• Modules 3-5: Speech synthesis
• Modules 6-9: Speech recognition

• Block 1 Week 4
  • Module 3: text processing
• Block 1 Week 5
  • Class trip
  • Module 4: pronunciation & prosody
• Block 1 Week 6
  • Assignment Q&A
  • Module 5: waveform generation
• Block 1 Week 7
  • Submission of first assignment
![Page 3: Module 4 Pronunciation & prosody - speech.zone · Today’s topics - Module 4: pronunciation & prosody Theory Application Speech Signal processing Probabilistic modelling Speech Synthesis](https://reader030.fdocuments.in/reader030/viewer/2022040900/5e6fe82d8978ee4144744f66/html5/thumbnails/3.jpg)
Orientation
• Text-to-speech pipeline architecture
• Normalise text
• Predict pronunciation & prosody
• Generate waveform
Coffee costs £2.
coffee costs two pounds .
SIL K AA F IY K AA S T S T UW P AW N D Z SIL
![Page 4: Module 4 Pronunciation & prosody - speech.zone · Today’s topics - Module 4: pronunciation & prosody Theory Application Speech Signal processing Probabilistic modelling Speech Synthesis](https://reader030.fdocuments.in/reader030/viewer/2022040900/5e6fe82d8978ee4144744f66/html5/thumbnails/4.jpg)
What you should already know
• From the videos & readings
  • Letter to sound (LTS)
  • A worked example of LTS using a classification tree
  • Prosody prediction

• morphology
• POS
• dictionary lookup of word + POS
• syllables & lexical stress
• LTS (rules or model)
• post-lexical rules

• gathering and preparing training data
• choosing the predictors
• growing the tree (learning from data)

• placement of events (classification)
• deciding their types (classification)
• realisation (regression)
![Page 5: Module 4 Pronunciation & prosody - speech.zone · Today’s topics - Module 4: pronunciation & prosody Theory Application Speech Signal processing Probabilistic modelling Speech Synthesis](https://reader030.fdocuments.in/reader030/viewer/2022040900/5e6fe82d8978ee4144744f66/html5/thumbnails/5.jpg)
Today’s topics - Module 4: pronunciation & prosody
[Figure: course topic map. Columns: Theory (Speech, Signal processing, Probabilistic modelling) and Application (Speech Synthesis, Automatic Speech Recognition). Rows: Concepts, Models & data structures, Algorithms & analysis. Topics highlighted for this module: Phoneme, Pronunciation, Prosody, Decision tree, Learning decision trees.]
![Page 6: Module 4 Pronunciation & prosody - speech.zone · Today’s topics - Module 4: pronunciation & prosody Theory Application Speech Signal processing Probabilistic modelling Speech Synthesis](https://reader030.fdocuments.in/reader030/viewer/2022040900/5e6fe82d8978ee4144744f66/html5/thumbnails/6.jpg)
Today’s topics - Module 4: pronunciation & prosody
Phoneme Pronunciation
Prosody
Decision tree
Learning decision trees
![Page 7: Module 4 Pronunciation & prosody - speech.zone · Today’s topics - Module 4: pronunciation & prosody Theory Application Speech Signal processing Probabilistic modelling Speech Synthesis](https://reader030.fdocuments.in/reader030/viewer/2022040900/5e6fe82d8978ee4144744f66/html5/thumbnails/7.jpg)
Speech synthesis - pronunciation & prosody
• Machine Learning
  • Classification And Regression Trees (CARTs)
    • classification: understanding entropy as a measure of predictability
    • regression: measuring the predictability of a continuous variable
    • stopping criteria
![Page 8: Module 4 Pronunciation & prosody - speech.zone · Today’s topics - Module 4: pronunciation & prosody Theory Application Speech Signal processing Probabilistic modelling Speech Synthesis](https://reader030.fdocuments.in/reader030/viewer/2022040900/5e6fe82d8978ee4144744f66/html5/thumbnails/8.jpg)
Step 1 - define the overall task we are going to solve
from the orthographic form: HOGWASH
predict the pronunciation: HH AA G W AA SH
Phoneme Pronunciation
![Page 9: Module 4 Pronunciation & prosody - speech.zone · Today’s topics - Module 4: pronunciation & prosody Theory Application Speech Signal processing Probabilistic modelling Speech Synthesis](https://reader030.fdocuments.in/reader030/viewer/2022040900/5e6fe82d8978ee4144744f66/html5/thumbnails/9.jpg)
Step 2 - break the task down into simple, solvable, sub-tasks
from one letter of the orthographic form: HOGWASH
predict zero, one or two phones of the pronunciation: HH AA G W AA SH
![Page 10: Module 4 Pronunciation & prosody - speech.zone · Today’s topics - Module 4: pronunciation & prosody Theory Application Speech Signal processing Probabilistic modelling Speech Synthesis](https://reader030.fdocuments.in/reader030/viewer/2022040900/5e6fe82d8978ee4144744f66/html5/thumbnails/10.jpg)
Step 3 - obtain the raw training data - for Letter-to-Sound (LTS), this is simply a pre-existing dictionary
here are some words from the CMU dictionary that use the letter “A”

| Word | Pronunciation |
|---|---|
| HOGWASH | HH AA G W AA SH |
| CARWASH | K AA R W AA SH |
| WARRANT | W AO R AH N T |
| WARRANTY | W AO R AH N T IY |
| HARDWARE | HH AA R D W EH R |
| SOFTWARE | S AO F T W EH R |
| WARES | W EH R Z |
![Page 11: Module 4 Pronunciation & prosody - speech.zone · Today’s topics - Module 4: pronunciation & prosody Theory Application Speech Signal processing Probabilistic modelling Speech Synthesis](https://reader030.fdocuments.in/reader030/viewer/2022040900/5e6fe82d8978ee4144744f66/html5/thumbnails/11.jpg)
Step 4 - define the predictee - which phone are we going to predict from this letter?
H O G W A S H
HH AA G W AA SH
![Page 12: Module 4 Pronunciation & prosody - speech.zone · Today’s topics - Module 4: pronunciation & prosody Theory Application Speech Signal processing Probabilistic modelling Speech Synthesis](https://reader030.fdocuments.in/reader030/viewer/2022040900/5e6fe82d8978ee4144744f66/html5/thumbnails/12.jpg)
Step 5 - choose the predictors
H O G W A S H
HH AA G W AA SH
![Page 13: Module 4 Pronunciation & prosody - speech.zone · Today’s topics - Module 4: pronunciation & prosody Theory Application Speech Signal processing Probabilistic modelling Speech Synthesis](https://reader030.fdocuments.in/reader030/viewer/2022040900/5e6fe82d8978ee4144744f66/html5/thumbnails/13.jpg)
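Step 6 - get the training data ready for machine learning

The first seven columns are the predictors (three letters of left context, the current letter, three letters of right context); the last column is the predictee.

| ppp | pp | p | letter | n | nn | nnn | predictee |
|---|---|---|---|---|---|---|---|
| o | g | w | a | s | h | - | aa |
| a | r | w | a | s | h | - | aa |
| - | - | w | a | r | r | a | ao |
| - | - | w | a | r | r | a | ao |
| r | d | w | a | r | e | - | eh |
| f | t | w | a | r | e | - | eh |
| - | - | w | a | r | e | s | eh |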
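A minimal sketch of how the predictor windows in Step 6 could be produced. The function name `letter_window` and the letter indices are illustrative assumptions. The predictee values are copied from the slide, which presumes the letters have already been aligned with the phones; that alignment step is not shown here.

```python
def letter_window(word, index, width=3, pad="-"):
    """Return `width` letters of left context, the letter at `index`,
    and `width` letters of right context, padding with `pad`."""
    padded = pad * width + word.lower() + pad * width
    centre = index + width  # position of the target letter in the padded string
    return list(padded[centre - width : centre + width + 1])

# (word, index of the letter "a" being predicted, phone taken from the slide)
examples = [
    ("hogwash", 4, "aa"),
    ("carwash", 4, "aa"),
    ("warrant", 1, "ao"),
    ("warranty", 1, "ao"),
    ("hardware", 5, "eh"),
    ("software", 5, "eh"),
    ("wares", 1, "eh"),
]

# Each row is one independent training example: (predictors, predictee)
rows = [(letter_window(word, i), phone) for word, i, phone in examples]

for predictors, predictee in rows:
    print(" ".join(predictors), "->", predictee)
```

Each row can then be treated as one data point for the classification tree, independent of the word it came from.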
![Page 14: Module 4 Pronunciation & prosody - speech.zone · Today’s topics - Module 4: pronunciation & prosody Theory Application Speech Signal processing Probabilistic modelling Speech Synthesis](https://reader030.fdocuments.in/reader030/viewer/2022040900/5e6fe82d8978ee4144744f66/html5/thumbnails/14.jpg)
Speech synthesis - pronunciation & prosody
• Machine Learning
  • Classification And Regression Trees (CARTs)
    • classification: understanding entropy as a measure of predictability
    • regression: measuring the predictability of a continuous variable
    • stopping criteria
![Page 15: Module 4 Pronunciation & prosody - speech.zone · Today’s topics - Module 4: pronunciation & prosody Theory Application Speech Signal processing Probabilistic modelling Speech Synthesis](https://reader030.fdocuments.in/reader030/viewer/2022040900/5e6fe82d8978ee4144744f66/html5/thumbnails/15.jpg)
In-class exercise
Building a decision tree for phrase-break prediction
![Page 16: Module 4 Pronunciation & prosody - speech.zone · Today’s topics - Module 4: pronunciation & prosody Theory Application Speech Signal processing Probabilistic modelling Speech Synthesis](https://reader030.fdocuments.in/reader030/viewer/2022040900/5e6fe82d8978ee4144744f66/html5/thumbnails/16.jpg)
Step 1 - define the overall task we are going to solve
For a sentence, predict where the phrase breaks should go.
I like to ride bikes.
break
Prosody
![Page 17: Module 4 Pronunciation & prosody - speech.zone · Today’s topics - Module 4: pronunciation & prosody Theory Application Speech Signal processing Probabilistic modelling Speech Synthesis](https://reader030.fdocuments.in/reader030/viewer/2022040900/5e6fe82d8978ee4144744f66/html5/thumbnails/17.jpg)
Step 2 - break the task down into simple, solvable, sub-tasks
For each word, predict whether there is a phrase break after it.

| I | like | to | ride | bikes. |
|---|---|---|---|---|
| no break | no break | no break | no break | break |
![Page 18: Module 4 Pronunciation & prosody - speech.zone · Today’s topics - Module 4: pronunciation & prosody Theory Application Speech Signal processing Probabilistic modelling Speech Synthesis](https://reader030.fdocuments.in/reader030/viewer/2022040900/5e6fe82d8978ee4144744f66/html5/thumbnails/18.jpg)
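Step 3 - obtain the raw training data
this is going to be expensive because it will involve manually labelling spoken utterances

| Words | I | like | to | ride | bikes | . |
|---|---|---|---|---|---|---|
| Breaks | | | | | BREAK | |

| Words | Food | and | drink | are | nice | . |
|---|---|---|---|---|---|---|
| Breaks | BREAK | | | | BREAK | |

| Words | Apples | but | not | pears | . |
|---|---|---|---|---|---|
| Breaks | BREAK | | | BREAK | |

| Words | He | is | but | she’s | not | ! |
|---|---|---|---|---|---|---|
| Breaks | | BREAK | | | BREAK | |

| Words | One | , | two | , | three | . |
|---|---|---|---|---|---|---|
| Breaks | BREAK | | BREAK | | BREAK | |

| Words | Shaken | yet | not | stirred | . |
|---|---|---|---|---|---|
| Breaks | BREAK | | | BREAK | |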
![Page 19: Module 4 Pronunciation & prosody - speech.zone · Today’s topics - Module 4: pronunciation & prosody Theory Application Speech Signal processing Probabilistic modelling Speech Synthesis](https://reader030.fdocuments.in/reader030/viewer/2022040900/5e6fe82d8978ee4144744f66/html5/thumbnails/19.jpg)
Break !
![Page 20: Module 4 Pronunciation & prosody - speech.zone · Today’s topics - Module 4: pronunciation & prosody Theory Application Speech Signal processing Probabilistic modelling Speech Synthesis](https://reader030.fdocuments.in/reader030/viewer/2022040900/5e6fe82d8978ee4144744f66/html5/thumbnails/20.jpg)
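Step 4 - define the predictee and the possible values it can take: BREAK or No break

| Words | I | like | to | ride | bikes | . |
|---|---|---|---|---|---|---|
| Breaks | No break | No break | No break | No break | BREAK | No break |

| Words | Food | and | drink | are | nice | . |
|---|---|---|---|---|---|---|
| Breaks | BREAK | No break | No break | No break | BREAK | No break |

| Words | Apples | but | not | pears | . |
|---|---|---|---|---|---|
| Breaks | BREAK | No break | No break | BREAK | No break |

| Words | He | is | but | she’s | not | ! |
|---|---|---|---|---|---|---|
| Breaks | No break | BREAK | No break | No break | BREAK | No break |

| Words | One | , | two | , | three | . |
|---|---|---|---|---|---|---|
| Breaks | BREAK | No break | BREAK | No break | BREAK | No break |

| Words | Shaken | yet | not | stirred | . |
|---|---|---|---|---|---|
| Breaks | BREAK | No break | No break | BREAK | No break |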
![Page 21: Module 4 Pronunciation & prosody - speech.zone · Today’s topics - Module 4: pronunciation & prosody Theory Application Speech Signal processing Probabilistic modelling Speech Synthesis](https://reader030.fdocuments.in/reader030/viewer/2022040900/5e6fe82d8978ee4144744f66/html5/thumbnails/21.jpg)
Step 5 - choose the predictors
they can only be things that you will also know for the test data

| Words | I | like | to | ride | bikes | . |
|---|---|---|---|---|---|---|
| POS | N | V | TO | V | N | PUNC |
| Breaks | No break | No break | No break | No break | BREAK | No break |
![Page 22: Module 4 Pronunciation & prosody - speech.zone · Today’s topics - Module 4: pronunciation & prosody Theory Application Speech Signal processing Probabilistic modelling Speech Synthesis](https://reader030.fdocuments.in/reader030/viewer/2022040900/5e6fe82d8978ee4144744f66/html5/thumbnails/22.jpg)
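Step 5 - choose the predictors

| Words | I | like | to | ride | bikes | . |
|---|---|---|---|---|---|---|
| POS | N | V | TO | V | N | PUNC |
| Breaks | No break | No break | No break | No break | BREAK | No break |

| Words | Food | and | drink | are | nice | . |
|---|---|---|---|---|---|---|
| POS | N | CC | N | V | JJ | PUNC |
| Breaks | BREAK | No break | No break | No break | BREAK | No break |

| Words | Apples | but | not | pears | . |
|---|---|---|---|---|---|
| POS | N | CC | RB | N | PUNC |
| Breaks | BREAK | No break | No break | BREAK | No break |

| Words | He | is | but | she’s | not | ! |
|---|---|---|---|---|---|---|
| POS | N | V | CC | N | V | PUNC |
| Breaks | No break | BREAK | No break | No break | BREAK | No break |

| Words | One | , | two | , | three | . |
|---|---|---|---|---|---|---|
| POS | CD | PUNC | CD | PUNC | CD | PUNC |
| Breaks | BREAK | No break | BREAK | No break | BREAK | No break |

| Words | Shaken | yet | not | stirred | . |
|---|---|---|---|---|---|
| POS | JJ | CC | RB | JJ | PUNC |
| Breaks | BREAK | No break | No break | BREAK | No break |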
![Page 23: Module 4 Pronunciation & prosody - speech.zone · Today’s topics - Module 4: pronunciation & prosody Theory Application Speech Signal processing Probabilistic modelling Speech Synthesis](https://reader030.fdocuments.in/reader030/viewer/2022040900/5e6fe82d8978ee4144744f66/html5/thumbnails/23.jpg)
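Step 6 - get the training data ready for machine learning

| Words | I | like | to | ride | bikes | . |
|---|---|---|---|---|---|---|
| Predictor 1 : L POS | | | | | | |
| Predictor 2 : C POS | | | | | | |
| Predictor 3 : R POS | | | | | | |
| Predictee | NO-BREAK | NO-BREAK | NO-BREAK | NO-BREAK | BREAK | NO-BREAK |

| Words | Food | and | drink | are | nice | . |
|---|---|---|---|---|---|---|
| Predictor 1 : L POS | | | | | | |
| Predictor 2 : C POS | | | | | | |
| Predictor 3 : R POS | | | | | | |
| Predictee | BREAK | NO-BREAK | NO-BREAK | NO-BREAK | BREAK | NO-BREAK |

| Words | Apples | but | not | pears | . |
|---|---|---|---|---|---|
| Predictor 1 : L POS | | | | | |
| Predictor 2 : C POS | | | | | |
| Predictor 3 : R POS | | | | | |
| Predictee | BREAK | NO-BREAK | NO-BREAK | BREAK | NO-BREAK |

| Words | He | is | but | she’s | not | ! |
|---|---|---|---|---|---|---|
| Predictor 1 : L POS | | | | | | |
| Predictor 2 : C POS | | | | | | |
| Predictor 3 : R POS | | | | | | |
| Predictee | NO-BREAK | BREAK | NO-BREAK | NO-BREAK | BREAK | NO-BREAK |

| Words | One | , | two | , | three | . |
|---|---|---|---|---|---|---|
| Predictor 1 : L POS | | | | | | |
| Predictor 2 : C POS | | | | | | |
| Predictor 3 : R POS | | | | | | |
| Predictee | BREAK | NO-BREAK | BREAK | NO-BREAK | BREAK | NO-BREAK |

| Words | Shaken | yet | not | stirred | . |
|---|---|---|---|---|---|
| Predictor 1 : L POS | | | | | |
| Predictor 2 : C POS | | | | | |
| Predictor 3 : R POS | | | | | |
| Predictee | BREAK | NO-BREAK | NO-BREAK | BREAK | NO-BREAK |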
![Page 24: Module 4 Pronunciation & prosody - speech.zone · Today’s topics - Module 4: pronunciation & prosody Theory Application Speech Signal processing Probabilistic modelling Speech Synthesis](https://reader030.fdocuments.in/reader030/viewer/2022040900/5e6fe82d8978ee4144744f66/html5/thumbnails/24.jpg)
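Step 6 - get the training data ready for machine learning

| Words | I | like | to | ride | bikes | . |
|---|---|---|---|---|---|---|
| Predictor 1 : L POS | - | N | V | TO | V | N |
| Predictor 2 : C POS | N | V | TO | V | N | PUNC |
| Predictor 3 : R POS | V | TO | V | N | PUNC | - |
| Predictee | NO-BREAK | NO-BREAK | NO-BREAK | NO-BREAK | BREAK | NO-BREAK |

| Words | Food | and | drink | are | nice | . |
|---|---|---|---|---|---|---|
| Predictor 1 : L POS | - | N | CC | N | V | JJ |
| Predictor 2 : C POS | N | CC | N | V | JJ | PUNC |
| Predictor 3 : R POS | CC | N | V | JJ | PUNC | - |
| Predictee | BREAK | NO-BREAK | NO-BREAK | NO-BREAK | BREAK | NO-BREAK |

| Words | Apples | but | not | pears | . |
|---|---|---|---|---|---|
| Predictor 1 : L POS | - | N | CC | RB | N |
| Predictor 2 : C POS | N | CC | RB | N | PUNC |
| Predictor 3 : R POS | CC | RB | N | PUNC | - |
| Predictee | BREAK | NO-BREAK | NO-BREAK | BREAK | NO-BREAK |

| Words | He | is | but | she’s | not | ! |
|---|---|---|---|---|---|---|
| Predictor 1 : L POS | - | N | V | CC | N | V |
| Predictor 2 : C POS | N | V | CC | N | V | PUNC |
| Predictor 3 : R POS | V | CC | N | V | PUNC | - |
| Predictee | NO-BREAK | BREAK | NO-BREAK | NO-BREAK | BREAK | NO-BREAK |

| Words | One | , | two | , | three | . |
|---|---|---|---|---|---|---|
| Predictor 1 : L POS | - | CD | PUNC | CD | PUNC | CD |
| Predictor 2 : C POS | CD | PUNC | CD | PUNC | CD | PUNC |
| Predictor 3 : R POS | PUNC | CD | PUNC | CD | PUNC | - |
| Predictee | BREAK | NO-BREAK | BREAK | NO-BREAK | BREAK | NO-BREAK |

| Words | Shaken | yet | not | stirred | . |
|---|---|---|---|---|---|
| Predictor 1 : L POS | - | JJ | CC | RB | JJ |
| Predictor 2 : C POS | JJ | CC | RB | JJ | PUNC |
| Predictor 3 : R POS | CC | RB | JJ | PUNC | - |
| Predictee | BREAK | NO-BREAK | NO-BREAK | BREAK | NO-BREAK |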
![Page 25: Module 4 Pronunciation & prosody - speech.zone · Today’s topics - Module 4: pronunciation & prosody Theory Application Speech Signal processing Probabilistic modelling Speech Synthesis](https://reader030.fdocuments.in/reader030/viewer/2022040900/5e6fe82d8978ee4144744f66/html5/thumbnails/25.jpg)
Step 6 - get the training data ready for machine learning
convert sequence into a set of independent data points

| L POS | C POS | R POS | Predictee |
|---|---|---|---|
| - | N | V | NO-BREAK |
| N | V | TO | NO-BREAK |
| V | TO | V | NO-BREAK |

. . . etc
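The conversion from a labelled sequence to independent data points can be sketched as follows. The function name `to_data_points` is an illustrative assumption; the padding symbol "-" follows the slides.

```python
def to_data_points(pos_tags, breaks, pad="-"):
    """Turn one sentence into (L POS, C POS, R POS, predictee) tuples,
    one per token, padding the sentence edges with `pad`."""
    padded = [pad] + list(pos_tags) + [pad]
    return [
        (padded[i], padded[i + 1], padded[i + 2], breaks[i])
        for i in range(len(pos_tags))
    ]

# "I like to ride bikes ." with its POS tags and break labels from the slides
pos = ["N", "V", "TO", "V", "N", "PUNC"]
labels = ["NO-BREAK", "NO-BREAK", "NO-BREAK", "NO-BREAK", "BREAK", "NO-BREAK"]

points = to_data_points(pos, labels)
for point in points:
    print(point)
```

After this step the order of the data points no longer matters: each one carries its own context in the three predictors.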
![Page 26: Module 4 Pronunciation & prosody - speech.zone · Today’s topics - Module 4: pronunciation & prosody Theory Application Speech Signal processing Probabilistic modelling Speech Synthesis](https://reader030.fdocuments.in/reader030/viewer/2022040900/5e6fe82d8978ee4144744f66/html5/thumbnails/26.jpg)
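The training data

| L POS | C POS | R POS | Predictee |
|---|---|---|---|
| - | N | V | NO-BREAK |
| N | V | TO | NO-BREAK |
| V | TO | V | NO-BREAK |
| TO | V | N | NO-BREAK |
| V | N | PUNC | BREAK |
| N | PUNC | - | NO-BREAK |
| - | N | CC | BREAK |
| N | CC | N | NO-BREAK |
| CC | N | V | NO-BREAK |
| N | V | JJ | NO-BREAK |
| V | JJ | PUNC | BREAK |
| JJ | PUNC | - | NO-BREAK |
| - | N | CC | BREAK |
| N | CC | RB | NO-BREAK |
| CC | RB | N | NO-BREAK |
| RB | N | PUNC | BREAK |
| N | PUNC | - | NO-BREAK |
| - | N | V | NO-BREAK |
| N | V | CC | BREAK |
| V | CC | N | NO-BREAK |
| CC | N | V | NO-BREAK |
| N | V | PUNC | BREAK |
| V | PUNC | - | NO-BREAK |
| - | CD | PUNC | BREAK |
| CD | PUNC | CD | NO-BREAK |
| PUNC | CD | PUNC | BREAK |
| CD | PUNC | CD | NO-BREAK |
| PUNC | CD | PUNC | BREAK |
| CD | PUNC | - | NO-BREAK |
| - | JJ | CC | BREAK |
| JJ | CC | RB | NO-BREAK |
| CC | RB | JJ | NO-BREAK |
| RB | JJ | PUNC | BREAK |
| JJ | PUNC | - | NO-BREAK |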
![Page 27: Module 4 Pronunciation & prosody - speech.zone · Today’s topics - Module 4: pronunciation & prosody Theory Application Speech Signal processing Probabilistic modelling Speech Synthesis](https://reader030.fdocuments.in/reader030/viewer/2022040900/5e6fe82d8978ee4144744f66/html5/thumbnails/27.jpg)
Decision tree
![Page 28: Module 4 Pronunciation & prosody - speech.zone · Today’s topics - Module 4: pronunciation & prosody Theory Application Speech Signal processing Probabilistic modelling Speech Synthesis](https://reader030.fdocuments.in/reader030/viewer/2022040900/5e6fe82d8978ee4144744f66/html5/thumbnails/28.jpg)
Make a list of all possible questions
L POS = C POS = R POS =
L POS = C POS = R POS =
L POS = C POS = R POS =
L POS = C POS = R POS =
L POS = C POS = R POS =
L POS = C POS = R POS =
L POS = C POS = R POS =
L POS = C POS = R POS =
![Page 29: Module 4 Pronunciation & prosody - speech.zone · Today’s topics - Module 4: pronunciation & prosody Theory Application Speech Signal processing Probabilistic modelling Speech Synthesis](https://reader030.fdocuments.in/reader030/viewer/2022040900/5e6fe82d8978ee4144744f66/html5/thumbnails/29.jpg)
My list of all possible questions
L POS = PUNC C POS = PUNC R POS = PUNC
L POS = N C POS = N R POS = N
L POS = V C POS = V R POS = V
L POS = TO C POS = TO R POS = TO
L POS = CC C POS = CC R POS = CC
L POS = JJ C POS = JJ R POS = JJ
L POS = RB C POS = RB R POS = RB
L POS = CD C POS = CD R POS = CD
![Page 30: Module 4 Pronunciation & prosody - speech.zone · Today’s topics - Module 4: pronunciation & prosody Theory Application Speech Signal processing Probabilistic modelling Speech Synthesis](https://reader030.fdocuments.in/reader030/viewer/2022040900/5e6fe82d8978ee4144744f66/html5/thumbnails/30.jpg)
Try question “C POS = N ?”
C POS = N ?
![Page 31: Module 4 Pronunciation & prosody - speech.zone · Today’s topics - Module 4: pronunciation & prosody Theory Application Speech Signal processing Probabilistic modelling Speech Synthesis](https://reader030.fdocuments.in/reader030/viewer/2022040900/5e6fe82d8978ee4144744f66/html5/thumbnails/31.jpg)
The answer to the question “C POS = N ?” is either “yes” or “no” for each individual training data point
[Figure: the 34 training data points listed on the slide “The training data”]
![Page 33: Module 4 Pronunciation & prosody - speech.zone · Today’s topics - Module 4: pronunciation & prosody Theory Application Speech Signal processing Probabilistic modelling Speech Synthesis](https://reader030.fdocuments.in/reader030/viewer/2022040900/5e6fe82d8978ee4144744f66/html5/thumbnails/33.jpg)
Place all data at the root, then partition with the question “C POS = N ?”
C POS = N ?
• yes: 8 data points (4 BREAK, 4 NO-BREAK)
• no: 26 data points (8 BREAK, 18 NO-BREAK)
![Page 35: Module 4 Pronunciation & prosody - speech.zone · Today’s topics - Module 4: pronunciation & prosody Theory Application Speech Signal processing Probabilistic modelling Speech Synthesis](https://reader030.fdocuments.in/reader030/viewer/2022040900/5e6fe82d8978ee4144744f66/html5/thumbnails/35.jpg)
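Measure goodness of split for the question “C POS = N ?”

| Node | Predictee value | # data points | p | p log2 p |
|---|---|---|---|---|
| Y | BREAK | 4 | 0.50 | -0.50 |
| Y | NO BREAK | 4 | 0.50 | -0.50 |
| N | BREAK | 8 | 0.31 | -0.52 |
| N | NO BREAK | 18 | 0.69 | -0.37 |

Entropy at the Y node: 1.00 bits (8 data points)
Entropy at the N node: 0.89 bits (26 data points)
# data points in total: 34
Total entropy: 0.92 bits (the two node entropies, each weighted by its share of the data points)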
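The figures on this slide can be checked with a short sketch. The function names `entropy` and `split_entropy` are illustrative assumptions; the counts are the ones given for the Y and N nodes.

```python
from math import log2

def entropy(counts):
    """Entropy in bits of a distribution given as class counts."""
    total = sum(counts)
    return sum(-(c / total) * log2(c / total) for c in counts if c > 0)

def split_entropy(yes_counts, no_counts):
    """Entropy after a split: node entropies weighted by node size."""
    n_yes, n_no = sum(yes_counts), sum(no_counts)
    weighted = n_yes * entropy(yes_counts) + n_no * entropy(no_counts)
    return weighted / (n_yes + n_no)

yes_node = [4, 4]    # [BREAK, NO-BREAK] where C POS = N
no_node = [8, 18]    # [BREAK, NO-BREAK] where C POS is not N

print(f"Y node: {entropy(yes_node):.2f} bits")
print(f"N node: {entropy(no_node):.2f} bits")
print(f"total:  {split_entropy(yes_node, no_node):.2f} bits")
```

A lower total means the question leaves the predictee more predictable within each child node.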
![Page 36: Module 4 Pronunciation & prosody - speech.zone · Today’s topics - Module 4: pronunciation & prosody Theory Application Speech Signal processing Probabilistic modelling Speech Synthesis](https://reader030.fdocuments.in/reader030/viewer/2022040900/5e6fe82d8978ee4144744f66/html5/thumbnails/36.jpg)
Try another question: “R POS = CC ?”
[Figure: the 34 training data points listed on the slide “The training data”]
![Page 38: Module 4 Pronunciation & prosody - speech.zone · Today’s topics - Module 4: pronunciation & prosody Theory Application Speech Signal processing Probabilistic modelling Speech Synthesis](https://reader030.fdocuments.in/reader030/viewer/2022040900/5e6fe82d8978ee4144744f66/html5/thumbnails/38.jpg)
Place all data at the root, then partition with the question “R POS = CC ?”
R POS = CC ?
• yes: 4 data points (4 BREAK, 0 NO-BREAK)
• no: 30 data points (8 BREAK, 22 NO-BREAK)
![Page 40: Module 4 Pronunciation & prosody - speech.zone · Today’s topics - Module 4: pronunciation & prosody Theory Application Speech Signal processing Probabilistic modelling Speech Synthesis](https://reader030.fdocuments.in/reader030/viewer/2022040900/5e6fe82d8978ee4144744f66/html5/thumbnails/40.jpg)
Try all possible questions, measuring goodness of split (as entropy, in bits)

| POS value | L POS = | C POS = | R POS = |
|---|---|---|---|
| PUNC | | | 0.47 |
| N | | 0.92 | |
| V | | | |
| TO | | | |
| CC | | | 0.74 |
| JJ | | | |
| RB | | | |
| CD | | | |
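A minimal sketch of trying every question on the full training set and keeping the one with the lowest total entropy. All names (`TRAINING`, `data_points`, `split_entropy`) are illustrative assumptions, and "B" / "-" are shorthand for BREAK / NO-BREAK; the data itself is the six sentences from the exercise.

```python
from collections import Counter
from math import log2

# POS tags and break flags for the six training sentences
TRAINING = [
    ("N V TO V N PUNC",         "- - - - B -"),
    ("N CC N V JJ PUNC",        "B - - - B -"),
    ("N CC RB N PUNC",          "B - - B -"),
    ("N V CC N V PUNC",         "- B - - B -"),
    ("CD PUNC CD PUNC CD PUNC", "B - B - B -"),
    ("JJ CC RB JJ PUNC",        "B - - B -"),
]
POS_TAGS = ["PUNC", "N", "V", "TO", "CC", "JJ", "RB", "CD"]

def data_points():
    """One ({L, C, R} predictors, label) pair per token."""
    points = []
    for pos_str, break_str in TRAINING:
        pos = ["-"] + pos_str.split() + ["-"]
        for i, flag in enumerate(break_str.split()):
            label = "BREAK" if flag == "B" else "NO-BREAK"
            points.append(({"L": pos[i], "C": pos[i + 1], "R": pos[i + 2]}, label))
    return points

def entropy(labels):
    n = len(labels)
    return sum(-(c / n) * log2(c / n) for c in Counter(labels).values())

def split_entropy(points, position, tag):
    """Size-weighted entropy of the two children for one question."""
    yes = [lab for feats, lab in points if feats[position] == tag]
    no = [lab for feats, lab in points if feats[position] != tag]
    return sum(len(side) / len(points) * entropy(side) for side in (yes, no) if side)

points = data_points()
scores = {
    (position, tag): split_entropy(points, position, tag)
    for position in "LCR"
    for tag in POS_TAGS
}
best = min(scores, key=scores.get)
print(f"best question: {best[0]} POS = {best[1]} ? ({scores[best]:.2f} bits)")
```

The three scores shown on the slide (0.92, 0.74 and 0.47 bits) come out of the same computation, which makes the sketch easy to check against the exercise.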
![Page 41: Module 4 Pronunciation & prosody - speech.zone · Today’s topics - Module 4: pronunciation & prosody Theory Application Speech Signal processing Probabilistic modelling Speech Synthesis](https://reader030.fdocuments.in/reader030/viewer/2022040900/5e6fe82d8978ee4144744f66/html5/thumbnails/41.jpg)
Insert best question into tree and permanently partition data
R POS = PUNC ?
• yes: 8 data points (8 BREAK, 0 NO-BREAK)
• no: 26 data points (4 BREAK, 22 NO-BREAK)
![Page 42: Module 4 Pronunciation & prosody - speech.zone · Today’s topics - Module 4: pronunciation & prosody Theory Application Speech Signal processing Probabilistic modelling Speech Synthesis](https://reader030.fdocuments.in/reader030/viewer/2022040900/5e6fe82d8978ee4144744f66/html5/thumbnails/42.jpg)
Recurse
R POS = PUNC ?
yes no
[Figure: the same partition of the training data under “R POS = PUNC ?”; each child node is now treated as a new root and the question search is repeated on its data]
![Page 43: Module 4 Pronunciation & prosody - speech.zone · Today’s topics - Module 4: pronunciation & prosody Theory Application Speech Signal processing Probabilistic modelling Speech Synthesis](https://reader030.fdocuments.in/reader030/viewer/2022040900/5e6fe82d8978ee4144744f66/html5/thumbnails/43.jpg)
My tree
R POS = PUNC ?
• yes: BREAK
• no: R POS = CC ?
  • yes: BREAK
  • no: NO-BREAK
Decision tree
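The whole procedure (try all questions, insert the best, partition, recurse) can be sketched as a short recursive function. This is an illustrative sketch, not the course's reference implementation: the names are assumptions, "B" / "-" are shorthand for BREAK / NO-BREAK, and the only stopping criteria used are "all labels agree" and "no question splits the data".

```python
from collections import Counter
from math import log2

TRAINING = [
    ("N V TO V N PUNC",         "- - - - B -"),
    ("N CC N V JJ PUNC",        "B - - - B -"),
    ("N CC RB N PUNC",          "B - - B -"),
    ("N V CC N V PUNC",         "- B - - B -"),
    ("CD PUNC CD PUNC CD PUNC", "B - B - B -"),
    ("JJ CC RB JJ PUNC",        "B - - B -"),
]
POS_TAGS = ["PUNC", "N", "V", "TO", "CC", "JJ", "RB", "CD"]
POSITION = {"L": 0, "C": 1, "R": 2}

def data_points():
    """(L POS, C POS, R POS, label) tuples, one per token."""
    points = []
    for pos_str, break_str in TRAINING:
        pos = ["-"] + pos_str.split() + ["-"]
        for i, flag in enumerate(break_str.split()):
            label = "BREAK" if flag == "B" else "NO-BREAK"
            points.append((pos[i], pos[i + 1], pos[i + 2], label))
    return points

def entropy(points):
    n = len(points)
    counts = Counter(p[3] for p in points)
    return sum(-(c / n) * log2(c / n) for c in counts.values())

def grow(points):
    """Return a leaf label, or a dict holding a question and two subtrees."""
    labels = [p[3] for p in points]
    if len(set(labels)) == 1:           # pure node: job done
        return labels[0]
    best = None
    for name, idx in POSITION.items():
        for tag in POS_TAGS:
            yes = [p for p in points if p[idx] == tag]
            no = [p for p in points if p[idx] != tag]
            if not yes or not no:       # question does not split the data
                continue
            score = (len(yes) * entropy(yes) + len(no) * entropy(no)) / len(points)
            if best is None or score < best[0]:
                best = (score, name, tag, yes, no)
    if best is None:                    # nothing can split: majority label
        return Counter(labels).most_common(1)[0][0]
    _, name, tag, yes, no = best
    return {"question": f"{name} POS = {tag}", "yes": grow(yes), "no": grow(no)}

tree = grow(data_points())
print(tree)
```

On this small data set the sketch arrives at the same two questions as the tree above.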
![Page 44: Module 4 Pronunciation & prosody - speech.zone · Today’s topics - Module 4: pronunciation & prosody Theory Application Speech Signal processing Probabilistic modelling Speech Synthesis](https://reader030.fdocuments.in/reader030/viewer/2022040900/5e6fe82d8978ee4144744f66/html5/thumbnails/44.jpg)
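Test data - use the tree to predict where the phrase breaks should be placed

| Words | He | likes | to | sail | ships | . |
|---|---|---|---|---|---|---|
| POS | N | V | TO | V | N | PUNC |
| Predictor 1 : L POS | | | | | | |
| Predictor 2 : C POS | | | | | | |
| Predictor 3 : R POS | | | | | | |
| Predictee | | | | | | |

| Words | Horses | and | cows | are | animals | . |
|---|---|---|---|---|---|---|
| POS | N | CC | N | V | N | PUNC |
| Predictor 1 : L POS | | | | | | |
| Predictor 2 : C POS | | | | | | |
| Predictor 3 : R POS | | | | | | |
| Predictee | | | | | | |

| Words | One | or | two | more | things | . |
|---|---|---|---|---|---|---|
| POS | CD | CC | CD | JJ | N | PUNC |
| Predictor 1 : L POS | | | | | | |
| Predictor 2 : C POS | | | | | | |
| Predictor 3 : R POS | | | | | | |
| Predictee | | | | | | |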
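One way to check your answers to this exercise: a minimal sketch that hard-codes the two questions from the “My tree” slide and applies them to each token of a test sentence. The function names are illustrative assumptions.

```python
def predict_break(l_pos, c_pos, r_pos):
    """The tree from the 'My tree' slide; only R POS is ever consulted."""
    if r_pos == "PUNC":
        return "BREAK"
    if r_pos == "CC":
        return "BREAK"
    return "NO-BREAK"

def predict_sentence(pos_tags, pad="-"):
    """Build the L/C/R predictors for every token, then classify each one."""
    padded = [pad] + list(pos_tags) + [pad]
    return [
        predict_break(padded[i], padded[i + 1], padded[i + 2])
        for i in range(len(pos_tags))
    ]

test_sentences = {
    "He likes to sail ships .": "N V TO V N PUNC",
    "Horses and cows are animals .": "N CC N V N PUNC",
    "One or two more things .": "CD CC CD JJ N PUNC",
}

predictions = {
    text: predict_sentence(pos.split()) for text, pos in test_sentences.items()
}
for text, labels in predictions.items():
    print(text)
    print("  ", labels)
```

Note that L POS and C POS are computed but never used, because the learned tree only asks about R POS.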
![Page 45: Module 4 Pronunciation & prosody - speech.zone · Today’s topics - Module 4: pronunciation & prosody Theory Application Speech Signal processing Probabilistic modelling Speech Synthesis](https://reader030.fdocuments.in/reader030/viewer/2022040900/5e6fe82d8978ee4144744f66/html5/thumbnails/45.jpg)
Speech synthesis - pronunciation & prosody
• Machine Learning
  • Classification And Regression Trees (CARTs)
    • classification: understanding entropy as a measure of predictability
    • regression: measuring the predictability of a continuous variable
    • stopping criteria
![Page 46: Module 4 Pronunciation & prosody - speech.zone · Today’s topics - Module 4: pronunciation & prosody Theory Application Speech Signal processing Probabilistic modelling Speech Synthesis](https://reader030.fdocuments.in/reader030/viewer/2022040900/5e6fe82d8978ee4144744f66/html5/thumbnails/46.jpg)
Parent node: 13.4 3.14 2.73 11.3 1.23 4.52 9.42 2.11 10.1 1.87
σ² = 18.7

Child node: 3.14 2.73 1.23 4.52 2.11 1.87
σ² = 1.11

Child node: 13.4 11.3 9.42 10.1
σ² = 2.29
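The three variances can be reproduced with the population variance (dividing by N rather than N - 1), which is the convention that matches the numbers on the slide. The variable names are illustrative; the slide does not say which question produced this split, so the two child nodes are simply listed by hand.

```python
from statistics import pvariance

parent = [13.4, 3.14, 2.73, 11.3, 1.23, 4.52, 9.42, 2.11, 10.1, 1.87]
child_low = [3.14, 2.73, 1.23, 4.52, 2.11, 1.87]
child_high = [13.4, 11.3, 9.42, 10.1]

var_parent = pvariance(parent)
var_low = pvariance(child_low)
var_high = pvariance(child_high)

# Size-weighted variance after the split, analogous to total entropy
var_split = (len(child_low) * var_low + len(child_high) * var_high) / len(parent)

print(f"parent:      {var_parent:.1f}")
print(f"low child:   {var_low:.2f}")
print(f"high child:  {var_high:.2f}")
print(f"after split: {var_split:.2f}")
```

For regression trees, variance plays the role that entropy plays for classification: a good question leaves children whose values are tightly clustered.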
![Page 47: Module 4 Pronunciation & prosody - speech.zone · Today’s topics - Module 4: pronunciation & prosody Theory Application Speech Signal processing Probabilistic modelling Speech Synthesis](https://reader030.fdocuments.in/reader030/viewer/2022040900/5e6fe82d8978ee4144744f66/html5/thumbnails/47.jpg)
Speech synthesis - pronunciation & prosody
• Machine Learning
  • Classification And Regression Trees (CARTs)
    • classification: understanding entropy as a measure of predictability
    • regression: measuring the predictability of a continuous variable
    • stopping criteria
![Page 48: Module 4 Pronunciation & prosody - speech.zone · Today’s topics - Module 4: pronunciation & prosody Theory Application Speech Signal processing Probabilistic modelling Speech Synthesis](https://reader030.fdocuments.in/reader030/viewer/2022040900/5e6fe82d8978ee4144744f66/html5/thumbnails/48.jpg)
Stopping criteria (we may use several)
• Classification or Regression
  • all data points have the same value for the predictee (job done!)
  • all data points have the same values for all predictors
    • equivalently: no available question can split them
  • number of data points in parent node is below a threshold
  • number of data points in a child node would fall below a threshold
• Classification only
  • cannot reduce entropy by more than some pre-specified amount
• Regression only
  • cannot reduce variance by more than some pre-specified amount
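The criteria above could be combined along these lines. This is a sketch only: the function names and every threshold value (`min_parent`, `min_child`, `min_gain`) are hypothetical choices for illustration, not values from the course.

```python
def should_stop(predictor_rows, predictee_values, min_parent=5):
    """Checks made on a node before searching for a question."""
    if len(set(predictee_values)) <= 1:    # all predictee values agree
        return True
    if len(set(predictor_rows)) <= 1:      # no question can split the data
        return True
    if len(predictee_values) < min_parent: # too little data in the parent
        return True
    return False

def split_allowed(n_yes, n_no, impurity_before, impurity_after,
                  min_child=2, min_gain=0.01):
    """Checks made on a candidate split. Impurity is entropy for
    classification or variance for regression."""
    if min(n_yes, n_no) < min_child:       # a child node would be too small
        return False
    return (impurity_before - impurity_after) > min_gain

# A pure node stops immediately, whatever its size
print(should_stop([("N", "V", "TO"), ("V", "TO", "V")], ["NO-BREAK", "NO-BREAK"]))

# A split that isolates one data point is rejected with min_child=2
print(split_allowed(n_yes=1, n_no=9, impurity_before=0.9, impurity_after=0.5))
```

Tighter thresholds give a smaller tree that generalises more cautiously; looser ones let the tree fit the training data more closely.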
![Page 49: Module 4 Pronunciation & prosody - speech.zone · Today’s topics - Module 4: pronunciation & prosody Theory Application Speech Signal processing Probabilistic modelling Speech Synthesis](https://reader030.fdocuments.in/reader030/viewer/2022040900/5e6fe82d8978ee4144744f66/html5/thumbnails/49.jpg)
Today’s topics - what we covered
Phoneme Pronunciation
Prosody
Decision tree
Learning decision trees
![Page 50: Module 4 Pronunciation & prosody - speech.zone · Today’s topics - Module 4: pronunciation & prosody Theory Application Speech Signal processing Probabilistic modelling Speech Synthesis](https://reader030.fdocuments.in/reader030/viewer/2022040900/5e6fe82d8978ee4144744f66/html5/thumbnails/50.jpg)
What next?
In Module 5
• We have
  • normalised the text
  • predicted pronunciation
  • predicted prosody
• That completes the linguistic specification
• Next, from that linguistic specification
  • it’s time to generate a waveform