Modeling the creaky excitationfor parametric speech synthesis.
1Thomas Drugman, 2John Kane, 2Christer Gobl
September 11th, 2012Interspeech
Portland, Oregon, USA
1University of Mons, Belgium2Trinity College Dublin, Ireland
1 / 27
Creaky voice - examples
TTS corpora examples
American Male
Finnish female
Finnish Male
Conversational speech examples
Japanese female
American female
American Male
2 / 27
Creaky voice in speech
Phonetic contrast
e.g., Jalapa Mazatec
Phrase/sentence/turn boundaries
Commonly in American English, Finnish etc.
Interactive speech
Turn-takingHesitationsExpression of affective statesStylistic device
3 / 27
Creaky voice - acoustic characteristics
4 / 27
Creaky voice - acoustic characteristics
5 / 27
Problem statement
Unique acoustic characteristics of creak poorly modelled instandard vocoders
Silen et al. (2009) - improved robustness of f0 and voicingdecision
Our Aim: Provide a method for modelling the creakyexcitation to improve the timbre of creak in parametricsynthesis.
6 / 27
Speech data
American male (BDL) and Finnish male (MV)
100 sentences containing creak
7 / 27
Manual annotation
A rough quality with the sensation of repeating impulses- Ishi et al. (2008)
8 / 27
Glottal closure instants (GCIs)
Newly developed SE-VQ algorithm - Kane & Gobl, In Press
0.56 0.58 0.6 0.62 0.64 0.66 0.68 0.7−0.4−0.2
00.20.40.6
Am
plitu
de
Speech waveform
SEDREAMS − GCI
0.56 0.58 0.6 0.62 0.64 0.66 0.68 0.7−0.5
0
0.5
Am
plitu
de
Resonator output
0.56 0.58 0.6 0.62 0.64 0.66 0.68 0.7
0
0.5
1
Time (seconds)
Am
plitu
de
DEGG
DEGG − GCI
9 / 27
The deterministic plusstochastic model (DSM)
The Deterministic plus Stochastic Modelof the Residual Signal and its Applications
-Drugman & Dutoit (2012), IEEE TASLP
10 / 27
DSM - residual excitation
Univ ersité de Mons
Residual excitation
4
11 / 27
DSM - Residual frames
Univ ersité de Mons
The Deterministic plus Stochastic
Model
7
MGC Analys is
Inverse Filtering
GCI Estimation
PS Window ing
Speech Database
GCI positions
Dataset of PS residual frames
Residual signals
12 / 27
DSM - Deterministic modelling
Univ ersité de Mons
The Deterministic Component
8
Pitch Norm alizatio n
F0* Dataset of
PS residual frames
Energy Norm alizatio n
Dataset for the Deterministic
Modeling
13 / 27
DSM - vocoder
Univ ersité de Mons
The DSM vocoder
5
Deterministic component of the excitation
Stochastic component of the excitation
Filter
14 / 27
Extended DSM for creaky voice
15 / 27
DSM (creak) - Fundamental period/opening phase
100 150 200 250 300 350 4000
50
100
150
200
250
300
350
400
Fundamental period (samples)
Op
enin
g p
erio
d (
sam
ple
s)
200 250 300 350 400 450 500 5500
50
100
150
200
250
300
350
400
450
500
Fundamental period (samples)
Op
enin
g p
erio
d (
sam
ple
s)
16 / 27
DSM (creak) - Excitation modelling
Separate residual datasets for opening phase (secondary peak=> GCI) and closed phase (GCI => secondary peak)
Principal component analysis of each dataset separately,excitation model combining first eigenvectors for deterministiccomponent.
Energy envelope also derived for the two datasets separately.
17 / 27
DSM (creak) - Data-driven excitation signal
0 100 200 300 400−0.1
0
0.1
0.2
0.3
0.4
Time (samples)
Am
plitu
de
0 100 200 300 4000
0.05
0.1
0.15
0.2
0.25
Time (samples)
Am
plitu
de
18 / 27
DSM (creak) - Vocoder
19 / 27
Evaluation
20 / 27
Experimental setup
Subjective evaluation with 22 participants.
Copy-synthesis of short utterances by the American andFinnish speaker using the standard DSM vocoder and theproposed method.
ABX testOriginal utterance (X) and the two copy synthesis versions (A& B). Select most like original
Comparative Mean Opinion Score (CMOS) testCopy synthesis by both vocoders - signal preference on gradual7 point CMOS scale.
21 / 27
Results - ABX
22 / 27
Results - Comparative Mean Opinion Score (CMOS)
23 / 27
Results - Samples
American Male
1 Original standard HTS vocoder DSM vocoder DSM-creak
2 Original standard HTS vocoder DSM vocoder DSM-creak
3 Original standard HTS vocoder DSM vocoder DSM-creak
Finnish Male
1 Original standard HTS vocoder DSM vocoder DSM-creak
2 Original standard HTS vocoder DSM vocoder DSM-creak
3 Original standard HTS vocoder DSM vocoder DSM-creak
24 / 27
Ongoing/future research directions
Automate creak segmentation (see our poster at specialsession - glottal source processing!)
Prediction of creaky regions from contextual features (e.g.,phoneme, word stress, position in sentence, prosodic contextetc.)
Transformation of speakers voice characteristics.
25 / 27
Acknowledgements
This work was supported by the Science Foundation Ireland,Grant 07 / CE / I 1142 (Centre for Next GenerationLocalisation, www.cngl.ie) and Grant 09 / IN.1 / I 2631(FASTNET).
26 / 27
Thank you!
27 / 27
Top Related