Speech Synthesis: A Brief Motivation and Overview

Erhard Rank

Speech Synthesis: A Brief Motivation and Overview – p.1/17


Contents

• Speech synthesis
• Examples
• Human speech production
• Human speech signals
• Synthesis concepts
• Summary

Speech synthesis

Why is it of interest?

Speech communication
• is the most natural form of human communication
• is not bound to a display (can be used while driving a car or bike, working in an adverse environment, etc.)
• many people today carry an output terminal with them (mobile phone)

Speech synthesis

What is it?

Generation of a speech signal by a machine

Fixed (small) number of utterances
→ Replay recorded speech (e.g. alarm system)

Arbitrary utterances ("unrestricted" vocabulary)
→ Synthesize speech
– from rules using a production model
– using recordings of natural speech

Speech synthesis

Flavors
• Text-to-speech (TTS)
• Concept/content-to-speech (CTS)
• Dialogue system

Example: E-mail reading

Sg. Mag. Pospischil,

Das MRT meeting findet am 16. schon um 8h im Sem217 statt und nicht wie angekündigt um 10 :(

MfG Christian Fiala

(English: Dear Mag. Pospischil, the MRT meeting on the 16th will already take place at 8 o'clock in Sem217, not at 10 as announced :( Kind regards, Christian Fiala)
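Reading the e-mail above aloud requires expanding abbreviations and time expressions before any sound is generated. The following is a minimal sketch of such a text normalization step. The abbreviation table, the function name `normalize`, and the reading of "8h" as "8 Uhr" are illustrative assumptions and are not taken from the slides.

```python
import re

# Hypothetical abbreviation table for this example only. A real system
# needs a much larger, context-sensitive lexicon (e.g. "Sg." may also
# stand for "Sehr geehrte").
ABBREVIATIONS = {
    "Sg.": "Sehr geehrter",
    "Mag.": "Magister",
    "MfG": "Mit freundlichen Grüßen",
}


def normalize(text):
    """Expand known abbreviations and clock times such as '8h'."""
    for abbr, expansion in ABBREVIATIONS.items():
        # Match the abbreviation only when it is not part of a longer word.
        pattern = r"(?<!\w)" + re.escape(abbr) + r"(?!\w)"
        text = re.sub(pattern, expansion, text)
    # '8h' -> '8 Uhr' (assumed reading of the time expression).
    text = re.sub(r"\b(\d{1,2})h\b", r"\1 Uhr", text)
    return text


print(normalize("Sg. Mag. Pospischil, das Meeting findet um 8h im Sem217 statt"))
```

Tokens such as "Sem217" are left untouched here; deciding how to pronounce them is exactly the kind of problem that makes unrestricted text-to-speech hard.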

Example: Weather forecast

Wien 17 Grad, Graz 19 Grad, Linz 13 Grad

⇒

"Das Wetter in Österreich beruhigt sich wieder. Am Nachmittag können allerdings nördlich des Alpenhauptkamms lokale Regenschauer ..."

(English: Vienna 17 degrees, Graz 19 degrees, Linz 13 degrees ⇒ "The weather in Austria is calming down again. In the afternoon, however, local rain showers may ... north of the main Alpine ridge")
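In concept/content-to-speech the input is structured data rather than running text, so the wording has to be generated before it can be spoken. A minimal sketch of this step using a fixed sentence template follows. The function name `forecast_sentence` and the English template are assumptions made for illustration; the slides do not describe how the content generation is done.

```python
def forecast_sentence(readings):
    """Turn (city, temperature) pairs into one sentence for the synthesizer."""
    if not readings:
        return "No temperature data available."
    parts = ["{} {} degrees".format(city, temp) for city, temp in readings]
    if len(parts) == 1:
        body = parts[0]
    else:
        # Join all but the last item with commas, the last one with "and".
        body = ", ".join(parts[:-1]) + " and " + parts[-1]
    return "Current temperatures: " + body + "."


print(forecast_sentence([("Wien", 17), ("Graz", 19), ("Linz", 13)]))
```

Because the generator knows the structure of what it says, a content-to-speech system can in principle pass information such as emphasis to the synthesizer instead of guessing it from text.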

Dialogue system

[Figure: dialogue system loop with the blocks User, Speech Recognition, Understanding, Processing, Content Generation, Speech Synthesis]

Speech synthesis

Quality criteria
• Intelligibility
• Naturalness

Examples (TTS)

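• AT&T nextgen
(1) Willkommen beim Sprachsynthesesystem Natural Voices von AT&T (Welcome to the Natural Voices speech synthesis system from AT&T)
(2) Meine Oma fährt im Hühnerstall Motorrad (My grandma rides a motorcycle in the chicken coop)
(3) Der Zug von Grammatneusiedl nach Sopron fährt um viertel vor vier (The train from Grammatneusiedl to Sopron leaves at a quarter to four)

• SVOX
(1) Meine Oma fährt im Hühnerstall Motorrad (My grandma rides a motorcycle in the chicken coop)
(2) Wann geht der nächste Zug nach Venezia Mestre? (When does the next train to Venezia Mestre leave?)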

Examples (TTS)

• MBROLA German
(1) Die Börse: schwacher Wochenstart am deutschen Aktienmarkt (The stock exchange: weak start to the week on the German stock market)
(2) Hallo, hört mich jemand? Schade, keiner da. Na dann geh' ich halt nach Hause. (Hello, can anybody hear me? Pity, nobody there. Well, then I'll just go home.)

• Various (emotional synthesis)
(1a) (1b) (1c) (1d) (1e)
(2a) (2b) (2c) (2d) (2e)
(3a) (3b) (3c) (3d)

Human speech production

Acoustic source
• oscillations of the vocal folds: voiced sounds (vowels or voiced consonants)
• noise due to turbulent air flow: unvoiced sounds
• noise bursts due to sudden release of a closure: plosive sounds

Different positions of articulators (tongue, teeth, lips, ...)
• different spectral shaping of speech sounds

Human speech signals

[Figure: waveform of a speech signal over time; amplitude axis from −14753 to 11570, time axis ticks from 0.1 to 1.7]

Live analysis

Human speech signals

• Non-stationary
• Voiced (almost periodic) – unvoiced (noise-like)
• Realization of phonemes context dependent
• Prosody used for stressing, accentuation, contradiction, ...
• Situation, dialogue partner, emotion, etc.
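The distinction between almost periodic (voiced) and noise-like (unvoiced) segments, together with the non-stationarity of speech, is why signals are usually analysed frame by frame. The sketch below labels short frames using the zero-crossing rate, a common textbook heuristic that is not named on the slides. The sampling rate, frame length, threshold, and the synthetic test signal (a sinusoid followed by white noise) are all assumptions chosen for illustration.

```python
import math
import random


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    changes = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return changes / (len(frame) - 1)


def classify_frames(signal, frame_len, threshold=0.25):
    """Label consecutive, non-overlapping frames by their zero-crossing rate."""
    labels = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[start:start + frame_len]
        rate = zero_crossing_rate(frame)
        # Few sign changes: slowly varying, periodic-looking frame.
        labels.append("voiced" if rate < threshold else "unvoiced")
    return labels


FS = 8000  # assumed sampling rate in Hz
rng = random.Random(0)  # fixed seed so the run is repeatable
# Synthetic stand-ins for speech: a 120 Hz sinusoid, then white noise.
periodic = [math.sin(2 * math.pi * 120 * n / FS) for n in range(800)]
noisy = [rng.uniform(-0.3, 0.3) for _ in range(800)]

print(classify_frames(periodic + noisy, frame_len=200))
```

With this input the four frames covering the sinusoid are labelled voiced and the four frames covering the noise are labelled unvoiced. Real speech is far less clean, so a single threshold on one feature should be read as a demonstration only.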

Synthesis concepts

Categories

• Articulatory synthesis
• Formant synthesis
• Concatenative synthesis

Synthesis concepts

Physical modeling (articulatory synthesis)

• Signal generation by a physical model of the human speech production system
– Vocal folds (oscillatory source)
– Vocal tract (filtering, turbulent noise sources)
• Coupling between source and filter
• Model parameters rely on (inaccurate) measurements
• Control of parameter trajectories difficult
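As a feel for what a physical vocal tract model predicts, the simplest possible case can be computed by hand: a lossless uniform tube, closed at the glottis and open at the lips, resonates at odd multiples of c / (4L). The sketch below evaluates this formula. The tube length of 17 cm and the speed of sound of 340 m/s are assumed round values, and a uniform tube is a strong simplification compared with the articulatory models the slide refers to.

```python
def tube_resonances(length_m, n_resonances=3, speed_of_sound=340.0):
    """Resonance frequencies (Hz) of a lossless uniform tube that is
    closed at one end and open at the other: F_k = (2k - 1) * c / (4 * L)."""
    return [(2 * k - 1) * speed_of_sound / (4.0 * length_m)
            for k in range(1, n_resonances + 1)]


# Assumed vocal tract length of 0.17 m; values are rounded for display.
print([round(f) for f in tube_resonances(0.17)])
```

Changing the tube shape, as the articulators do, moves these resonances, which is the "different spectral shaping" mentioned for human speech production.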

Synthesis concepts

Formant synthesis

• Establish formants by resonance filters
• Excited by a periodic (voiced) or noise (unvoiced) source signal

[Figure: source-filter block diagram; labels: source with voiced signal x_voi(t) and unvoiced signal x_uv(t), a(t), filter |H(f)|, output s(t); axes t and f]
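The source-filter idea of formant synthesis can be sketched in a few lines: a pulse train or a noise signal is passed through a cascade of second-order resonance filters, one per formant. The code below is a minimal sketch under stated assumptions, not the method of any particular synthesizer. The sampling rate, the formant frequencies and bandwidths, the cascade arrangement, and all function names are assumptions chosen for illustration.

```python
import math
import random

FS = 16000  # assumed sampling rate in Hz


def pulse_train(n_samples, f0):
    """Voiced source: one unit pulse per pitch period."""
    period = int(round(FS / f0))
    return [1.0 if n % period == 0 else 0.0 for n in range(n_samples)]


def noise(n_samples, seed=0):
    """Unvoiced source: uniformly distributed white noise."""
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(n_samples)]


def resonator(x, freq, bandwidth):
    """Second-order all-pole resonance filter modelling one formant."""
    r = math.exp(-math.pi * bandwidth / FS)  # pole radius from bandwidth
    a1 = 2.0 * r * math.cos(2.0 * math.pi * freq / FS)
    a2 = -r * r
    gain = 1.0 - a1 - a2  # normalizes the gain at 0 Hz to one
    y, y1, y2 = [], 0.0, 0.0
    for sample in x:
        out = gain * sample + a1 * y1 + a2 * y2
        y.append(out)
        y1, y2 = out, y1  # shift the two-sample filter memory
    return y


def synthesize(formants, voiced=True, f0=120.0, duration=0.05, amplitude=1.0):
    """Excite a cascade of formant resonators with the chosen source."""
    n = int(round(duration * FS))
    signal = pulse_train(n, f0) if voiced else noise(n)
    signal = [amplitude * s for s in signal]
    for freq, bandwidth in formants:
        signal = resonator(signal, freq, bandwidth)
    return signal


# Illustrative (frequency, bandwidth) pairs in Hz for an open vowel.
VOWEL_FORMANTS = [(700.0, 130.0), (1220.0, 70.0), (2600.0, 160.0)]
voiced = synthesize(VOWEL_FORMANTS, voiced=True)
unvoiced = synthesize(VOWEL_FORMANTS, voiced=False)
print(len(voiced), len(unvoiced))
```

Writing the samples to a sound file would make the difference between the two excitations audible. In a rule-based system the formant values and the amplitude would change over time according to rules, which is omitted here.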

Synthesis concepts

Articulatory and formant synthesis: rule-based synthesis

However:
• Often the rules have to be derived from natural recorded signals or measurements (X-ray)

Alternative: database-driven synthesis

Synthesis concepts

Database-driven (concatenative) synthesis

Generate speech by concatenation of recorded elements

• Dedicated inventory
– Phonemes, di-phones, demi-syllables, ...
– Prosodic manipulations
• Large inventory: unit selection
– Selection of optimal signal units
– No/few signal manipulations
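Selecting "optimal" units from a large inventory is commonly formulated as a shortest-path search: each candidate unit gets a target cost (how well it fits what is wanted) and each pair of neighbouring units a join cost (how audible the seam would be). The sketch below solves a toy instance with dynamic programming. The cost functions, the inventory, the unit labels, and the use of pitch values at unit boundaries are hypothetical; the slides do not specify how optimality is defined.

```python
def select_units(targets, inventory, target_cost, join_cost):
    """Choose one candidate per target so that the summed target and
    join costs are minimal (Viterbi-style dynamic programming)."""
    candidates = [inventory[t] for t in targets]
    # best[i][j] = (cheapest cost of a path ending in candidate j, backpointer)
    best = [[(target_cost(targets[0], c), None) for c in candidates[0]]]
    for i in range(1, len(targets)):
        row = []
        for c in candidates[i]:
            cost, back = min(
                (best[i - 1][k][0] + join_cost(p, c), k)
                for k, p in enumerate(candidates[i - 1])
            )
            row.append((cost + target_cost(targets[i], c), back))
        best.append(row)
    # Trace the cheapest path back from the last target.
    j = min(range(len(best[-1])), key=lambda k: best[-1][k][0])
    total = best[-1][j][0]
    path = []
    for i in range(len(targets) - 1, -1, -1):
        path.append(candidates[i][j])
        j = best[i][j][1]
    return list(reversed(path)), total


# Hypothetical inventory: (unit id, pitch at start in Hz, pitch at end in Hz).
INVENTORY = {
    "ha": [("ha#1", 110, 120), ("ha#2", 130, 140)],
    "al": [("al#1", 138, 125), ("al#2", 100, 105)],
    "lo": [("lo#1", 104, 95), ("lo#2", 126, 118)],
}


def pitch_jump(prev_unit, next_unit):
    """Join cost: pitch mismatch at the concatenation point."""
    return abs(prev_unit[2] - next_unit[1])


def no_target_cost(target, unit):
    """Target cost left at zero to keep the toy example small."""
    return 0.0


units, cost = select_units(["ha", "al", "lo"], INVENTORY, no_target_cost, pitch_jump)
print([u[0] for u in units], cost)
```

In this toy inventory the chain ha#2, al#1, lo#2 has pitch jumps of 2 Hz and 1 Hz and is therefore chosen with a total cost of 3. With a dedicated di-phone inventory there is only one candidate per target, so no search is needed and prosody has to be imposed by signal manipulation instead.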

Summary

• Speech synthesis: Why, what, how?
• Quality: Intelligibility vs. naturalness
• Audio examples
• Human speech production/signals
• Synthesis concepts

Page 6: Speech Synthesis: A Brief Motivation and Overview · Summary Speech synthesis: Why, what, how? Quality: Intelligibility vs. naturalness Audio examples Human speech production/signals

Speech synthesis

Flavorsbull Text-to-speech (TTS)bull Conceptcontent-to-speech (CTS)bull Dialogue system

Example E-mail reading

Sg Mag Pospischil

Das MRT meeting findet am 16 schon um 8h im Sem217 statt und nicht wie angekuumlndigt um 10 (

MfG Christian Fiala

Speech Synthesis A Brief Motivation and Overview ndash p517

Speech synthesis

Flavorsbull Text-to-speech (TTS)bull Conceptcontent-to-speech (CTS)bull Dialogue system

Example Weather forecast

Wien 17 GradGraz 19 GradLinz 13 Grad

rArr

ldquoDas Wetter in Oumlsterreich beruhigtsich wieder Am Nachmittagkoumlnnen allerdings noumlrdlich desAlpenhauptkamms lokaleRegenschauer rdquo

Speech Synthesis A Brief Motivation and Overview ndash p517

Speech synthesis

Flavorsbull Text-to-speech (TTS)bull Conceptcontent-to-speech (CTS)bull Dialogue system

User

SpeechRecognition

Under-standing

Pro

cess

ing

ContentGeneration

SpeechSynthesis

Speech Synthesis A Brief Motivation and Overview ndash p517

Speech synthesis

Quality criteriabull Intelligibilitybull Naturalness

Speech Synthesis A Brief Motivation and Overview ndash p617

Examples (TTS)

bull ATampT nextgen(1) Willkommen beim Sprachsynthesesystem Natural

Voices von ATampT

(2) Meine Oma faumlhrt im Huumlhnerstall Motorrad(3) Der Zug von Grammatneusiedl nach Sopron faumlhrt

um viertel vor vier

bull SVOX(1) Meine Oma faumlhrt im Huumlhnerstall Motorrad(2) Wann geht der naumlchste Zug nach Venezia

Mestre

Speech Synthesis A Brief Motivation and Overview ndash p717

Examples (TTS)

bull MBROLA German(1) Die Boumlrse schwacher Wochenstart am deutschen

Aktienmarkt(2) Hallo houmlrt mich jemand Schade keiner da Na

dann gehrsquo ich halt nach Hause

bull Various (emotional synthesis)(1a) (1b) (1c) (1d) (1e)(2a) (2b) (2c) (2d) (2e)(3a) (3b) (3c) (3d)

Speech Synthesis A Brief Motivation and Overview ndash p817

Human speech production

Acoustic source
• oscillations of the vocal folds: voiced sounds (vowels or voiced consonants)
• noise due to turbulent air flow: unvoiced sounds
• noise bursts due to sudden release of a closure: plosive sounds

Different positions of articulators (tongue, teeth, lips, ...)
• different spectral shaping of speech sounds

Human speech signals

[Figure: waveform of a speech signal, amplitude over time]

Live analysis

Human speech signals

• Non-stationary
• Voiced (almost periodic) vs. unvoiced (noise-like)
• Realization of phonemes is context dependent
• Prosody used for stressing, accentuation, contradiction, ...
• Situation, dialogue partner, emotion, etc.
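The distinction between voiced (almost periodic) and unvoiced (noise-like) signals can be illustrated with a small sketch. It uses the peak of the normalized autocorrelation as a rough periodicity measure. The function names, the 60 to 400 Hz pitch search range, the 0.5 threshold, and the two synthetic test frames are illustrative assumptions and are not taken from the slides.

```python
import math
import random


def normalized_autocorr_peak(frame, min_lag, max_lag):
    """Return the largest normalized autocorrelation value for lags in [min_lag, max_lag]."""
    energy = sum(x * x for x in frame)
    if energy == 0.0:
        return 0.0
    best = 0.0
    for lag in range(min_lag, max_lag + 1):
        acc = sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))
        best = max(best, acc / energy)
    return best


def is_voiced(frame, sample_rate=8000, threshold=0.5):
    """Crude voicing decision; pitch range and threshold are assumed values."""
    min_lag = sample_rate // 400  # shortest pitch period considered
    max_lag = sample_rate // 60   # longest pitch period considered
    return normalized_autocorr_peak(frame, min_lag, max_lag) > threshold


SAMPLE_RATE = 8000
FRAME_LEN = 400  # 50 ms at 8 kHz

# Almost periodic frame: 100 Hz fundamental plus two harmonics.
voiced_frame = [
    math.sin(2 * math.pi * 100 * n / SAMPLE_RATE)
    + 0.5 * math.sin(2 * math.pi * 200 * n / SAMPLE_RATE)
    + 0.25 * math.sin(2 * math.pi * 300 * n / SAMPLE_RATE)
    for n in range(FRAME_LEN)
]

# Noise-like frame: white Gaussian noise with a fixed seed for repeatability.
rng = random.Random(0)
unvoiced_frame = [rng.gauss(0.0, 1.0) for _ in range(FRAME_LEN)]

print("periodic frame voiced?", is_voiced(voiced_frame))
print("noise frame voiced?   ", is_voiced(unvoiced_frame))
```

Because speech is non-stationary, a decision like this would be made frame by frame on short segments rather than once for a whole utterance.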

Synthesis concepts

Categories
• Articulatory synthesis
• Formant synthesis
• Concatenative synthesis

Synthesis concepts

Physical Modeling (Artic. Synth.)

• Signal generation by a physical model of the human speech production system
  – Vocal folds (oscillatory source)
  – Vocal tract (filtering, turbulent noise sources)
• Coupling between source and filter
• Model parameters rely on (inaccurate) measurements
• Control of parameter trajectories is difficult

Synthesis concepts

Formant synthesis

• Establish formants by resonance filters
• Excited by a periodic (voiced) or noise (unvoiced) source signal

[Figure: source-filter block diagram with voiced source x_voi(t), unvoiced source x_uv(t), amplitude a(t), filter magnitude response |H(f)| and output signal s(t)]
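The source-filter idea behind formant synthesis can be sketched in a few lines. The example below excites a cascade of second-order digital resonators with either an impulse train (voiced) or white noise (unvoiced). The resonator difference equation is the standard two-pole form normalized to unity gain at 0 Hz. The formant frequencies and bandwidths, the constant amplitude, and all function names are illustrative assumptions, not values from the slides.

```python
import math
import random


def resonator(signal, freq_hz, bandwidth_hz, sample_rate):
    """Second-order digital resonator (one formant) with unity gain at 0 Hz."""
    t = 1.0 / sample_rate
    c = -math.exp(-2.0 * math.pi * bandwidth_hz * t)
    b = 2.0 * math.exp(-math.pi * bandwidth_hz * t) * math.cos(2.0 * math.pi * freq_hz * t)
    a = 1.0 - b - c
    y1 = y2 = 0.0
    out = []
    for x in signal:
        y = a * x + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def voiced_source(num_samples, f0_hz, sample_rate):
    """Periodic excitation x_voi: one unit impulse per pitch period."""
    period = round(sample_rate / f0_hz)
    return [1.0 if n % period == 0 else 0.0 for n in range(num_samples)]


def unvoiced_source(num_samples, seed=0):
    """Noise excitation x_uv: white Gaussian noise with a fixed seed."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(num_samples)]


def synthesize(source, formants, amplitude, sample_rate):
    """Scale the source by a constant amplitude and filter it with cascaded resonators."""
    signal = [amplitude * x for x in source]
    for freq_hz, bandwidth_hz in formants:
        signal = resonator(signal, freq_hz, bandwidth_hz, sample_rate)
    return signal


SAMPLE_RATE = 16000
# Hypothetical (frequency, bandwidth) pairs in Hz for an /a/-like vowel.
FORMANTS_A = [(700.0, 80.0), (1200.0, 90.0), (2600.0, 120.0)]

vowel = synthesize(voiced_source(3200, 100.0, SAMPLE_RATE), FORMANTS_A, 1.0, SAMPLE_RATE)
hiss = synthesize(unvoiced_source(3200), FORMANTS_A, 0.1, SAMPLE_RATE)
print(len(vowel), len(hiss))
```

In a rule-based synthesizer the formant targets, the fundamental frequency and the amplitude would vary over time according to the rules; they are held constant here to keep the sketch short.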

Synthesis concepts

Articulatory and formant synthesis:
Rule-based synthesis

However:
• Often the rules have to be derived from natural recorded signals or measurements (X-ray)

Alternative:
Database-driven synthesis

Synthesis concepts

Database-driven (concatenative) synthesis

Generate speech by concatenation of recorded elements

• Dedicated inventory
  – Phonemes, di-phones, demi-syllables, ...
  – Prosodic manipulations
• Large inventory: unit selection
  – Selection of optimal signal units
  – No/few signal manipulations
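Selecting optimal units from a large inventory is commonly framed as a shortest-path search: each candidate unit has a target cost (how well it matches the desired specification) and a join cost (how well it connects to its neighbour). The sketch below is one way to do this with dynamic programming. The cost functions, the `Unit` fields and the toy database are hypothetical and only serve to make the example runnable; the slides do not specify a particular cost model.

```python
from collections import namedtuple

# Hypothetical unit description: phone label, pitch, position in the recorded corpus.
Unit = namedtuple("Unit", "phone pitch_hz corpus_pos")


def select_units(targets, candidates, target_cost, join_cost):
    """Pick one candidate per position so that total target + join cost is minimal."""
    # best[i][j] = (cheapest cost of a path ending in candidates[i][j], back-pointer)
    best = [[(target_cost(targets[0], u), None) for u in candidates[0]]]
    for i in range(1, len(targets)):
        row = []
        for u in candidates[i]:
            cost, prev = min(
                (best[i - 1][k][0] + join_cost(p, u), k)
                for k, p in enumerate(candidates[i - 1])
            )
            row.append((cost + target_cost(targets[i], u), prev))
        best.append(row)

    # Trace back from the cheapest final candidate.
    j = min(range(len(best[-1])), key=lambda k: best[-1][k][0])
    total = best[-1][j][0]
    path = []
    for i in range(len(targets) - 1, -1, -1):
        path.append(candidates[i][j])
        j = best[i][j][1]
    path.reverse()
    return path, total


def target_cost(target, unit):
    """Assumed target cost: distance between desired and recorded pitch."""
    return abs(target[1] - unit.pitch_hz)


def join_cost(left, right):
    """Assumed join cost: free if the units were adjacent in the recording."""
    if right.corpus_pos == left.corpus_pos + 1:
        return 0.0
    return 10.0 + abs(left.pitch_hz - right.pitch_hz)


DATABASE = [
    Unit("h", 120.0, 0), Unit("a", 118.0, 1), Unit("l", 115.0, 2),
    Unit("a", 100.0, 7), Unit("l", 101.0, 12), Unit("o", 110.0, 13),
    Unit("o", 100.0, 20),
]

targets = [("a", 100.0), ("l", 100.0), ("o", 100.0)]
candidates = [[u for u in DATABASE if u.phone == phone] for phone, _ in targets]
units, total = select_units(targets, candidates, target_cost, join_cost)
print([u.corpus_pos for u in units], total)
```

In this toy setup the search keeps the "l" and "o" that were recorded next to each other, even though another "o" matches the target pitch better, which reflects the aim of needing no or few signal manipulations at the joins.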

Summary

• Speech synthesis: Why, what, how?
• Quality: Intelligibility vs. naturalness
• Audio examples
• Human speech production/signals
• Synthesis concepts
