Modeling the creaky excitation for parametric speech

27
Modeling the creaky excitation for parametric speech synthesis. 1 Thomas Drugman, 2 John Kane, 2 Christer Gobl September 11th, 2012 Interspeech Portland, Oregon, USA 1 University of Mons, Belgium 2 Trinity College Dublin, Ireland 1 / 27

Transcript of Modeling the creaky excitation for parametric speech

Page 1: Modeling the creaky excitation for parametric speech

Modeling the creaky excitationfor parametric speech synthesis.

1Thomas Drugman, 2John Kane, 2Christer Gobl

September 11th, 2012Interspeech

Portland, Oregon, USA

1University of Mons, Belgium2Trinity College Dublin, Ireland

1 / 27

Page 2: Modeling the creaky excitation for parametric speech

Creaky voice - examples

TTS corpora examples

American Male

Finnish female

Finnish Male

Conversational speech examples

Japanese female

American female

American Male

2 / 27

Page 3: Modeling the creaky excitation for parametric speech

Creaky voice in speech

Phonetic contrast

e.g., Jalapa Mazatec

Phrase/sentence/turn boundaries

Commonly in American English, Finnish etc.

Interactive speech

Turn-takingHesitationsExpression of affective statesStylistic device

3 / 27

Page 4: Modeling the creaky excitation for parametric speech

Creaky voice - acoustic characteristics

4 / 27

Page 5: Modeling the creaky excitation for parametric speech

Creaky voice - acoustic characteristics

5 / 27

Page 6: Modeling the creaky excitation for parametric speech

Problem statement

Unique acoustic characteristics of creak poorly modelled instandard vocoders

Silen et al. (2009) - improved robustness of f0 and voicingdecision

Our Aim: Provide a method for modelling the creakyexcitation to improve the timbre of creak in parametricsynthesis.

6 / 27

Page 7: Modeling the creaky excitation for parametric speech

Speech data

American male (BDL) and Finnish male (MV)

100 sentences containing creak

7 / 27

Page 8: Modeling the creaky excitation for parametric speech

Manual annotation

A rough quality with the sensation of repeating impulses- Ishi et al. (2008)

8 / 27

Page 9: Modeling the creaky excitation for parametric speech

Glottal closure instants (GCIs)

Newly developed SE-VQ algorithm - Kane & Gobl, In Press

0.56 0.58 0.6 0.62 0.64 0.66 0.68 0.7−0.4−0.2

00.20.40.6

Am

plitu

de

Speech waveform

SEDREAMS − GCI

0.56 0.58 0.6 0.62 0.64 0.66 0.68 0.7−0.5

0

0.5

Am

plitu

de

Resonator output

0.56 0.58 0.6 0.62 0.64 0.66 0.68 0.7

0

0.5

1

Time (seconds)

Am

plitu

de

DEGG

DEGG − GCI

9 / 27

Page 10: Modeling the creaky excitation for parametric speech

The deterministic plusstochastic model (DSM)

The Deterministic plus Stochastic Modelof the Residual Signal and its Applications

-Drugman & Dutoit (2012), IEEE TASLP

10 / 27

Page 11: Modeling the creaky excitation for parametric speech

DSM - residual excitation

Univ ersité de Mons

Residual excitation

4

11 / 27

Page 12: Modeling the creaky excitation for parametric speech

DSM - Residual frames

Univ ersité de Mons

The Deterministic plus Stochastic

Model

7

MGC Analys is

Inverse Filtering

GCI Estimation

PS Window ing

Speech Database

GCI positions

Dataset of PS residual frames

Residual signals

12 / 27

Page 13: Modeling the creaky excitation for parametric speech

DSM - Deterministic modelling

Univ ersité de Mons

The Deterministic Component

8

Pitch Norm alizatio n

F0* Dataset of

PS residual frames

Energy Norm alizatio n

Dataset for the Deterministic

Modeling

13 / 27

Page 14: Modeling the creaky excitation for parametric speech

DSM - vocoder

Univ ersité de Mons

The DSM vocoder

5

Deterministic component of the excitation

Stochastic component of the excitation

Filter

14 / 27

Page 15: Modeling the creaky excitation for parametric speech

Extended DSM for creaky voice

15 / 27

Page 16: Modeling the creaky excitation for parametric speech

DSM (creak) - Fundamental period/opening phase

100 150 200 250 300 350 4000

50

100

150

200

250

300

350

400

Fundamental period (samples)

Op

enin

g p

erio

d (

sam

ple

s)

200 250 300 350 400 450 500 5500

50

100

150

200

250

300

350

400

450

500

Fundamental period (samples)

Op

enin

g p

erio

d (

sam

ple

s)

16 / 27

Page 17: Modeling the creaky excitation for parametric speech

DSM (creak) - Excitation modelling

Separate residual datasets for opening phase (secondary peak=> GCI) and closed phase (GCI => secondary peak)

Principal component analysis of each dataset separately,excitation model combining first eigenvectors for deterministiccomponent.

Energy envelope also derived for the two datasets separately.

17 / 27

Page 18: Modeling the creaky excitation for parametric speech

DSM (creak) - Data-driven excitation signal

0 100 200 300 400−0.1

0

0.1

0.2

0.3

0.4

Time (samples)

Am

plitu

de

0 100 200 300 4000

0.05

0.1

0.15

0.2

0.25

Time (samples)

Am

plitu

de

18 / 27

Page 19: Modeling the creaky excitation for parametric speech

DSM (creak) - Vocoder

19 / 27

Page 20: Modeling the creaky excitation for parametric speech

Evaluation

20 / 27

Page 21: Modeling the creaky excitation for parametric speech

Experimental setup

Subjective evaluation with 22 participants.

Copy-synthesis of short utterances by the American andFinnish speaker using the standard DSM vocoder and theproposed method.

ABX testOriginal utterance (X) and the two copy synthesis versions (A& B). Select most like original

Comparative Mean Opinion Score (CMOS) testCopy synthesis by both vocoders - signal preference on gradual7 point CMOS scale.

21 / 27

Page 22: Modeling the creaky excitation for parametric speech

Results - ABX

22 / 27

Page 23: Modeling the creaky excitation for parametric speech

Results - Comparative Mean Opinion Score (CMOS)

23 / 27

Page 24: Modeling the creaky excitation for parametric speech

Results - Samples

American Male

1 Original standard HTS vocoder DSM vocoder DSM-creak

2 Original standard HTS vocoder DSM vocoder DSM-creak

3 Original standard HTS vocoder DSM vocoder DSM-creak

Finnish Male

1 Original standard HTS vocoder DSM vocoder DSM-creak

2 Original standard HTS vocoder DSM vocoder DSM-creak

3 Original standard HTS vocoder DSM vocoder DSM-creak

24 / 27

Page 25: Modeling the creaky excitation for parametric speech

Ongoing/future research directions

Automate creak segmentation (see our poster at specialsession - glottal source processing!)

Prediction of creaky regions from contextual features (e.g.,phoneme, word stress, position in sentence, prosodic contextetc.)

Transformation of speakers voice characteristics.

25 / 27

Page 26: Modeling the creaky excitation for parametric speech

Acknowledgements

This work was supported by the Science Foundation Ireland,Grant 07 / CE / I 1142 (Centre for Next GenerationLocalisation, www.cngl.ie) and Grant 09 / IN.1 / I 2631(FASTNET).

26 / 27

Page 27: Modeling the creaky excitation for parametric speech

Thank you!

27 / 27