Speech Synthesis: A Brief Motivation and Overview
Erhard Rank
Speech Synthesis: A Brief Motivation and Overview – p. 1/17
Contents
• Speech synthesis
• Examples
• Human speech production
• Human speech signals
• Synthesis concepts
• Summary
Speech synthesis
Why is it of interest?
Speech communication
• is the most natural form of human communication
• is not bound to a display (can be used while driving a car/bike, working in an adverse environment, etc.)
• many people today have an output terminal on them (mobile phone)
Speech synthesis
What is it?
Generation of a speech signal by a machine
Fixed (small) number of utterances → Replay recorded speech (e.g. alarm system)
Arbitrary utterances ("unrestricted" vocabulary) → Synthesize speech
  - from rules, using a production model
  - using recordings of natural speech
Speech synthesis
Flavors
• Text-to-speech (TTS)
• Concept/content-to-speech (CTS)
• Dialogue system
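Example: E-mail reading
  Sg. Mag. Pospischil
  Das MRT meeting findet am 16. schon um 8h im Sem217 statt und nicht wie angekündigt um 10 :-(
  MfG Christian Fiala
  [English: Dear Mag. Pospischil, the MRT meeting on the 16th already takes place at 8 o'clock in Sem217 and not at 10 as announced :-( Kind regards, Christian Fiala]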
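Example: Weather forecast
  Wien 17 Grad
  Graz 19 Grad
  Linz 13 Grad
  ⇒
  "Das Wetter in Österreich beruhigt sich wieder. Am Nachmittag können allerdings nördlich des Alpenhauptkamms lokale Regenschauer ..."
  [English: Vienna 17 degrees, Graz 19 degrees, Linz 13 degrees ⇒ "The weather in Austria is calming down again. In the afternoon, however, local rain showers may ... north of the main Alpine ridge"]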
[Figure: dialogue system with the blocks User, Speech Recognition, Understanding, Processing, Content Generation, Speech Synthesis]
Speech synthesis
Quality criteria
• Intelligibility
• Naturalness
Examples (TTS)
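• AT&T nextgen
  (1) Willkommen beim Sprachsynthesesystem Natural Voices von AT&T
      [Welcome to the Natural Voices speech synthesis system from AT&T]
  (2) Meine Oma fährt im Hühnerstall Motorrad
      [My grandma rides a motorbike in the henhouse]
  (3) Der Zug von Grammatneusiedl nach Sopron fährt um viertel vor vier
      [The train from Grammatneusiedl to Sopron leaves at a quarter to four]
• SVOX
  (1) Meine Oma fährt im Hühnerstall Motorrad
      [My grandma rides a motorbike in the henhouse]
  (2) Wann geht der nächste Zug nach Venezia Mestre?
      [When does the next train to Venezia Mestre leave?]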
Examples (TTS)
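• MBROLA German
  (1) Die Börse: schwacher Wochenstart am deutschen Aktienmarkt
      [The stock exchange: weak start to the week on the German stock market]
  (2) Hallo, hört mich jemand? Schade, keiner da. Na dann geh' ich halt nach Hause.
      [Hello, can anybody hear me? Pity, nobody here. Well, then I'll just go home.]
• Various (emotional synthesis)
  (1a) (1b) (1c) (1d) (1e)
  (2a) (2b) (2c) (2d) (2e)
  (3a) (3b) (3c) (3d)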
Human speech production
Acoustic source
• oscillations of the vocal folds: voiced sounds (vowels or voiced consonants)
• noise due to turbulent air flow: unvoiced sounds
• noise bursts due to sudden release of a closure: plosive sounds
Different positions of articulators (tongue, teeth, lips, ...)
• different spectral shaping of speech sounds
Human speech signals
[Figure: waveform of a speech signal, amplitude over time]
Live analysis
Human speech signals
• Non-stationary
• Voiced (almost periodic) vs. unvoiced (noise-like)
• Realization of phonemes is context dependent
• Prosody: used for stressing, accentuation, contradiction, ...
• Situation, dialogue partner, emotion, etc.
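The voiced/unvoiced distinction listed above can be illustrated with a small sketch. It is not taken from the slides: it assumes a zero-crossing-rate heuristic with an arbitrary threshold of 0.1, and two synthetic frames (a 100 Hz sine and uniform white noise at an assumed 8 kHz sampling rate) stand in for real speech.

```python
import math
import random


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)


def periodic_frame(n=400, f0=100.0, fs=8000.0):
    """Stand-in for a voiced frame: a sine at the fundamental frequency f0."""
    return [math.sin(2.0 * math.pi * f0 * i / fs) for i in range(n)]


def noise_frame(n=400, seed=0):
    """Stand-in for an unvoiced frame: uniform white noise."""
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(n)]


def is_voiced(frame, threshold=0.1):
    """Heuristic: almost periodic frames cross zero rarely, noise often.

    The threshold is an assumption chosen for these synthetic frames.
    """
    return zero_crossing_rate(frame) < threshold
```

Because real speech is non-stationary, such a decision would be made frame by frame, and a practical detector would likely combine several features rather than rely on this one alone.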
Synthesis concepts
Categories
• Articulatory synthesis
• Formant synthesis
• Concatenative synthesis
Synthesis concepts
Physical Modeling (Artic. Synth.)
• Signal generation by a physical model of the human speech production system
  - Vocal folds (oscillatory source)
  - Vocal tract (filtering, turbulent noise sources)
• Coupling between source and filter
• Model parameters rely on (inaccurate) measurements
• Control of parameter trajectories is difficult
Synthesis concepts
Formant synthesis
• Establish formants by resonance filters
• Excited by a periodic (voiced) or noise (unvoiced) source signal
[Figure: source-filter model with voiced source x_voi(t), unvoiced source x_uv(t), amplitude a(t), filter |H(f)| and output signal s(t)]
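A minimal sketch of the source-filter idea above, under stated assumptions: the resonance filters are two-pole digital resonators connected in cascade, the amplitude is a constant rather than a trajectory a(t), and the formant frequencies and bandwidths are illustrative values for an /a/-like vowel, not values from the slides. The names x_voi, x_uv, a and s follow the figure labels; all function names are hypothetical.

```python
import math
import random


def resonator(x, freq, bandwidth, fs):
    """Two-pole resonance filter with unity gain at 0 Hz.

    y[n] = a0*x[n] + b*y[n-1] + c*y[n-2], with the pole radius set by the
    bandwidth and the pole angle set by the formant frequency.
    """
    r = math.exp(-math.pi * bandwidth / fs)
    b = 2.0 * r * math.cos(2.0 * math.pi * freq / fs)
    c = -r * r
    a0 = 1.0 - b - c
    y1 = y2 = 0.0
    out = []
    for sample in x:
        y = a0 * sample + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def voiced_source(n, f0, fs):
    """Periodic excitation: one unit impulse per pitch period."""
    period = int(round(fs / f0))
    return [1.0 if i % period == 0 else 0.0 for i in range(n)]


def unvoiced_source(n, seed=0):
    """Noise excitation: uniform white noise."""
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(n)]


def synthesize(source, formants, fs, amplitude=1.0):
    """Pass the source through one resonator per formant, then scale."""
    signal = source
    for freq, bandwidth in formants:
        signal = resonator(signal, freq, bandwidth, fs)
    return [amplitude * v for v in signal]


# Assumed sampling rate and illustrative (frequency, bandwidth) pairs in Hz.
FS = 8000
FORMANTS_A = [(700.0, 80.0), (1200.0, 90.0), (2600.0, 120.0)]

x_voi = voiced_source(800, 100.0, FS)   # 0.1 s of voiced excitation
x_uv = unvoiced_source(800)             # 0.1 s of unvoiced excitation
s_voiced = synthesize(x_voi, FORMANTS_A, FS, amplitude=0.5)
s_unvoiced = synthesize(x_uv, FORMANTS_A, FS, amplitude=0.5)
```

Since the filters are linear, applying a constant amplitude after the filter gives the same result as scaling the source first. A rule-based synthesizer would additionally vary the formant frequencies and the amplitude over time.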
Synthesis concepts
Articulatory and formant synthesis: rule-based synthesis
However:
• Often the rules have to be derived from natural recorded signals or measurements (X-ray)
Alternative: database-driven synthesis
Synthesis concepts
Database-driven (concatenative) synthesis
Generate speech by concatenation of recorded elements
• Dedicated inventory
  - Phonemes, di-phones, demi-syllables, ...
  - Prosodic manipulations
• Large inventory: unit selection
  - Selection of optimal signal units
  - No/few signal manipulations
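For the unit-selection case, choosing optimal units is commonly posed as a cheapest-path search through the candidate units. The sketch below assumes a target cost plus a join cost, both based only on pitch, and a dynamic-programming search. The Unit fields, both cost functions and the toy inventory are hypothetical and not taken from the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    """A recorded unit; pitch at its edges is the only feature used here."""
    name: str
    phone: str
    start_pitch: float
    end_pitch: float


def target_cost(unit, desired_pitch):
    """How far the unit's mean pitch is from the requested pitch."""
    return abs((unit.start_pitch + unit.end_pitch) / 2.0 - desired_pitch)


def join_cost(left, right):
    """Pitch discontinuity at the concatenation point."""
    return abs(left.end_pitch - right.start_pitch)


def select_units(targets, inventory):
    """Return the unit sequence with minimal total target + join cost.

    targets is a list of (phone, desired_pitch) pairs.
    """
    if not targets:
        return []
    columns = []
    for phone, _ in targets:
        candidates = [u for u in inventory if u.phone == phone]
        if not candidates:
            raise ValueError(f"no unit for phone {phone!r}")
        columns.append(candidates)

    # costs[k]: cheapest path cost ending in candidate k of the current column
    costs = [target_cost(u, targets[0][1]) for u in columns[0]]
    back = []  # back[i-1][k]: best predecessor index for candidate k of column i
    for i in range(1, len(columns)):
        prev = columns[i - 1]
        new_costs, pointers = [], []
        for u in columns[i]:
            best = min(
                range(len(prev)),
                key=lambda k: costs[k] + join_cost(prev[k], u),
            )
            total = costs[best] + join_cost(prev[best], u)
            new_costs.append(total + target_cost(u, targets[i][1]))
            pointers.append(best)
        costs = new_costs
        back.append(pointers)

    # Trace the cheapest path back from the last column.
    k = min(range(len(costs)), key=costs.__getitem__)
    path = [k]
    for pointers in reversed(back):
        k = pointers[k]
        path.append(k)
    path.reverse()
    return [columns[i][k] for i, k in enumerate(path)]


# Toy inventory with two candidates per phone (made-up pitch values in Hz).
INVENTORY = [
    Unit("h1", "h", 100.0, 110.0),
    Unit("h2", "h", 120.0, 125.0),
    Unit("a1", "a", 112.0, 118.0),
    Unit("a2", "a", 150.0, 140.0),
]
chosen = select_units([("h", 120.0), ("a", 115.0)], INVENTORY)
```

With a large inventory, well-matching units can be found so that few or no signal manipulations are needed. A dedicated inventory with one unit per di-phone has no such choice and relies on prosodic manipulation instead.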
Summary
• Speech synthesis: Why, what, how?
• Quality: Intelligibility vs. naturalness
• Audio examples
• Human speech production/signals
• Synthesis concepts
- Contents
- Speech synthesis
- Speech synthesis
- Speech synthesis
- Speech synthesis
- Examples (TTS)
- Examples (TTS)
- Human speech production
- Human speech signals
- Human speech signals
- Synthesis concepts
- Synthesis concepts
- Synthesis concepts
- Synthesis concepts
- Synthesis concepts
- Summary
-
Contents
bull Speech synthesisbull Examplesbull Human speech productionbull Human speech signals
bull Synthesis conceptsbull Summary
Speech Synthesis A Brief Motivation and Overview ndash p217
Speech synthesis
Why is it of interest
Speech communicationbull is the most natural form of human
communicationbull is not bound to a display (can be used while
driving a carbike working in adverseenvironment etc)
bull many people today have an output terminal onthem (mobile phone)
bull Speech Synthesis A Brief Motivation and Overview ndash p317
Speech synthesis
What is it
Generation of a speech signal by a machine
Fixed (small) number of utterancesrarr Replay recorded speech (eg alarm system)
Arbitrary utterances (ldquounrestrictedrdquo vocabulary)rarr Synthesize speech
ndash from rules using a production modelndash using recordings of natural speech
Speech Synthesis A Brief Motivation and Overview ndash p417
Speech synthesis
Flavorsbull Text-to-speech (TTS)bull Conceptcontent-to-speech (CTS)bull Dialogue system
Speech Synthesis A Brief Motivation and Overview ndash p517
Speech synthesis
Flavorsbull Text-to-speech (TTS)bull Conceptcontent-to-speech (CTS)bull Dialogue system
Example E-mail reading
Sg Mag Pospischil
Das MRT meeting findet am 16 schon um 8h im Sem217 statt und nicht wie angekuumlndigt um 10 (
MfG Christian Fiala
Speech Synthesis A Brief Motivation and Overview ndash p517
Speech synthesis
Flavorsbull Text-to-speech (TTS)bull Conceptcontent-to-speech (CTS)bull Dialogue system
Example Weather forecast
Wien 17 GradGraz 19 GradLinz 13 Grad
rArr
ldquoDas Wetter in Oumlsterreich beruhigtsich wieder Am Nachmittagkoumlnnen allerdings noumlrdlich desAlpenhauptkamms lokaleRegenschauer rdquo
Speech Synthesis A Brief Motivation and Overview ndash p517
Speech synthesis
Flavorsbull Text-to-speech (TTS)bull Conceptcontent-to-speech (CTS)bull Dialogue system
User
SpeechRecognition
Under-standing
Pro
cess
ing
ContentGeneration
SpeechSynthesis
Speech Synthesis A Brief Motivation and Overview ndash p517
Speech synthesis
Quality criteriabull Intelligibilitybull Naturalness
Speech Synthesis A Brief Motivation and Overview ndash p617
Examples (TTS)
bull ATampT nextgen(1) Willkommen beim Sprachsynthesesystem Natural
Voices von ATampT
(2) Meine Oma faumlhrt im Huumlhnerstall Motorrad(3) Der Zug von Grammatneusiedl nach Sopron faumlhrt
um viertel vor vier
bull SVOX(1) Meine Oma faumlhrt im Huumlhnerstall Motorrad(2) Wann geht der naumlchste Zug nach Venezia
Mestre
Speech Synthesis A Brief Motivation and Overview ndash p717
Examples (TTS)
bull MBROLA German(1) Die Boumlrse schwacher Wochenstart am deutschen
Aktienmarkt(2) Hallo houmlrt mich jemand Schade keiner da Na
dann gehrsquo ich halt nach Hause
bull Various (emotional synthesis)(1a) (1b) (1c) (1d) (1e)(2a) (2b) (2c) (2d) (2e)(3a) (3b) (3c) (3d)
Speech Synthesis A Brief Motivation and Overview ndash p817
Human speech production
Acoustic sourcebull oscillations of the vocal folds voiced sounds
(vowels or voiced consonants)bull noise due to turbulent air flow unvoiced
soundsbull noise bursts due to sudden realease of
closure plosive soundsDifferent positions of articulators (tongue teethlips )bull different spectral shaping of speech sounds
Speech Synthesis A Brief Motivation and Overview ndash p917
Human speech signals
File Page 1 of 1 Printed Tue Oct 21 192217
11570
minus14753
01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17time
Live analysis
Speech Synthesis A Brief Motivation and Overview ndash p1017
Human speech signals
bull Non-stationarybull Voiced (almost periodic) ndash unvoiced
(noise-like)bull Realization of phonemes context dependentbull Prosody used for stressing accentuation
contradiction bull Situation dialogue partner emotion etc
Speech Synthesis A Brief Motivation and Overview ndash p1117
Synthesis concepts
Categories
ArticulatorySynthesis
FormantSynthesis
ConcatenativeSynthesis
Speech Synthesis A Brief Motivation and Overview ndash p1217
Synthesis concepts
Physical Modeling (Artic Synth)
bull Signal generation by a physical model ofhuman speech production systemndash Vocal folds (oscillatory source)ndash Vocal tract (filtering turbulent noise
sources)bull Coupling between source and filter
bull Model parameters rely on (inaccurate)measurements
bull Control of parameter trajectories difficultSpeech Synthesis A Brief Motivation and Overview ndash p1317
Synthesis concepts
Formant synthesis
bull Establish formants by resonance filtersbull Excited by a periodic (voiced) or noise
(unvoiced) source signal
t
f
voiced
unvoiced
s(t)|H(f)|
a(t)
t
source filter
xvoi(t)
xuv(t)
Speech Synthesis A Brief Motivation and Overview ndash p1417
Synthesis concepts
Articulatory and formant synthesis
Rule-based synthesis
Howeverbull Often the rules have to be derived from natural
recorded signals or measurements (X-ray)Alternative
Database-driven synthesis
Speech Synthesis A Brief Motivation and Overview ndash p1517
Synthesis concepts
Database-driven (concatenative) synthesis
Generate speech by concatenationof recorded elements
bull Dedicated inventoryndash Phonemes di-phones demi-syllables ndash Prosodic manipulations
bull Large inventory Unit selectionndash Selection optimal signal unitsndash Nofew signal manipulations
Speech Synthesis A Brief Motivation and Overview ndash p1617
Summary
bull Speech synthesis Why what howbull Quality Intelligibility vs naturalnessbull Audio examplesbull Human speech productionsignals
bull Synthesis concepts
Speech Synthesis A Brief Motivation and Overview ndash p1717
- Contents
- Speech synthesis
- Speech synthesis
- Speech synthesis
- Speech synthesis
- Examples (TTS)
- Examples (TTS)
- Human speech production
- Human speech signals
- Human speech signals
- Synthesis concepts
- Synthesis concepts
- Synthesis concepts
- Synthesis concepts
- Synthesis concepts
- Summary
-
Speech synthesis
Why is it of interest
Speech communicationbull is the most natural form of human
communicationbull is not bound to a display (can be used while
driving a carbike working in adverseenvironment etc)
bull many people today have an output terminal onthem (mobile phone)
bull Speech Synthesis A Brief Motivation and Overview ndash p317
Speech synthesis
What is it
Generation of a speech signal by a machine
Fixed (small) number of utterancesrarr Replay recorded speech (eg alarm system)
Arbitrary utterances (ldquounrestrictedrdquo vocabulary)rarr Synthesize speech
ndash from rules using a production modelndash using recordings of natural speech
Speech Synthesis A Brief Motivation and Overview ndash p417
Speech synthesis
Flavorsbull Text-to-speech (TTS)bull Conceptcontent-to-speech (CTS)bull Dialogue system
Speech Synthesis A Brief Motivation and Overview ndash p517
Speech synthesis
Flavorsbull Text-to-speech (TTS)bull Conceptcontent-to-speech (CTS)bull Dialogue system
Example E-mail reading
Sg Mag Pospischil
Das MRT meeting findet am 16 schon um 8h im Sem217 statt und nicht wie angekuumlndigt um 10 (
MfG Christian Fiala
Speech Synthesis A Brief Motivation and Overview ndash p517
Speech synthesis
Flavorsbull Text-to-speech (TTS)bull Conceptcontent-to-speech (CTS)bull Dialogue system
Example Weather forecast
Wien 17 GradGraz 19 GradLinz 13 Grad
rArr
ldquoDas Wetter in Oumlsterreich beruhigtsich wieder Am Nachmittagkoumlnnen allerdings noumlrdlich desAlpenhauptkamms lokaleRegenschauer rdquo
Speech Synthesis A Brief Motivation and Overview ndash p517
Speech synthesis
Flavorsbull Text-to-speech (TTS)bull Conceptcontent-to-speech (CTS)bull Dialogue system
User
SpeechRecognition
Under-standing
Pro
cess
ing
ContentGeneration
SpeechSynthesis
Speech Synthesis A Brief Motivation and Overview ndash p517
Speech synthesis
Quality criteriabull Intelligibilitybull Naturalness
Speech Synthesis A Brief Motivation and Overview ndash p617
Examples (TTS)
bull ATampT nextgen(1) Willkommen beim Sprachsynthesesystem Natural
Voices von ATampT
(2) Meine Oma faumlhrt im Huumlhnerstall Motorrad(3) Der Zug von Grammatneusiedl nach Sopron faumlhrt
um viertel vor vier
bull SVOX(1) Meine Oma faumlhrt im Huumlhnerstall Motorrad(2) Wann geht der naumlchste Zug nach Venezia
Mestre
Speech Synthesis A Brief Motivation and Overview ndash p717
Examples (TTS)
bull MBROLA German(1) Die Boumlrse schwacher Wochenstart am deutschen
Aktienmarkt(2) Hallo houmlrt mich jemand Schade keiner da Na
dann gehrsquo ich halt nach Hause
bull Various (emotional synthesis)(1a) (1b) (1c) (1d) (1e)(2a) (2b) (2c) (2d) (2e)(3a) (3b) (3c) (3d)
Speech Synthesis A Brief Motivation and Overview ndash p817
Human speech production
Acoustic sourcebull oscillations of the vocal folds voiced sounds
(vowels or voiced consonants)bull noise due to turbulent air flow unvoiced
soundsbull noise bursts due to sudden realease of
closure plosive soundsDifferent positions of articulators (tongue teethlips )bull different spectral shaping of speech sounds
Speech Synthesis A Brief Motivation and Overview ndash p917
Human speech signals
File Page 1 of 1 Printed Tue Oct 21 192217
11570
minus14753
01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17time
Live analysis
Speech Synthesis A Brief Motivation and Overview ndash p1017
Human speech signals
bull Non-stationarybull Voiced (almost periodic) ndash unvoiced
(noise-like)bull Realization of phonemes context dependentbull Prosody used for stressing accentuation
contradiction bull Situation dialogue partner emotion etc
Speech Synthesis A Brief Motivation and Overview ndash p1117
Synthesis concepts
Categories
ArticulatorySynthesis
FormantSynthesis
ConcatenativeSynthesis
Speech Synthesis A Brief Motivation and Overview ndash p1217
Synthesis concepts
Physical Modeling (Artic Synth)
bull Signal generation by a physical model ofhuman speech production systemndash Vocal folds (oscillatory source)ndash Vocal tract (filtering turbulent noise
sources)bull Coupling between source and filter
bull Model parameters rely on (inaccurate)measurements
bull Control of parameter trajectories difficultSpeech Synthesis A Brief Motivation and Overview ndash p1317
Synthesis concepts
Formant synthesis
bull Establish formants by resonance filtersbull Excited by a periodic (voiced) or noise
(unvoiced) source signal
t
f
voiced
unvoiced
s(t)|H(f)|
a(t)
t
source filter
xvoi(t)
xuv(t)
Speech Synthesis A Brief Motivation and Overview ndash p1417
Synthesis concepts
Articulatory and formant synthesis
Rule-based synthesis
Howeverbull Often the rules have to be derived from natural
recorded signals or measurements (X-ray)Alternative
Database-driven synthesis
Speech Synthesis A Brief Motivation and Overview ndash p1517
Synthesis concepts
Database-driven (concatenative) synthesis
Generate speech by concatenationof recorded elements
bull Dedicated inventoryndash Phonemes di-phones demi-syllables ndash Prosodic manipulations
bull Large inventory Unit selectionndash Selection optimal signal unitsndash Nofew signal manipulations
Speech Synthesis A Brief Motivation and Overview ndash p1617
Summary
bull Speech synthesis Why what howbull Quality Intelligibility vs naturalnessbull Audio examplesbull Human speech productionsignals
bull Synthesis concepts
Speech Synthesis A Brief Motivation and Overview ndash p1717
- Contents
- Speech synthesis
- Speech synthesis
- Speech synthesis
- Speech synthesis
- Examples (TTS)
- Examples (TTS)
- Human speech production
- Human speech signals
- Human speech signals
- Synthesis concepts
- Synthesis concepts
- Synthesis concepts
- Synthesis concepts
- Synthesis concepts
- Summary
-
Speech synthesis
What is it
Generation of a speech signal by a machine
Fixed (small) number of utterancesrarr Replay recorded speech (eg alarm system)
Arbitrary utterances (ldquounrestrictedrdquo vocabulary)rarr Synthesize speech
ndash from rules using a production modelndash using recordings of natural speech
Speech Synthesis A Brief Motivation and Overview ndash p417
Speech synthesis
Flavorsbull Text-to-speech (TTS)bull Conceptcontent-to-speech (CTS)bull Dialogue system
Speech Synthesis A Brief Motivation and Overview ndash p517
Speech synthesis
Flavorsbull Text-to-speech (TTS)bull Conceptcontent-to-speech (CTS)bull Dialogue system
Example E-mail reading
Sg Mag Pospischil
Das MRT meeting findet am 16 schon um 8h im Sem217 statt und nicht wie angekuumlndigt um 10 (
MfG Christian Fiala
Speech Synthesis A Brief Motivation and Overview ndash p517
Speech synthesis
Flavorsbull Text-to-speech (TTS)bull Conceptcontent-to-speech (CTS)bull Dialogue system
Example Weather forecast
Wien 17 GradGraz 19 GradLinz 13 Grad
rArr
ldquoDas Wetter in Oumlsterreich beruhigtsich wieder Am Nachmittagkoumlnnen allerdings noumlrdlich desAlpenhauptkamms lokaleRegenschauer rdquo
Speech Synthesis A Brief Motivation and Overview ndash p517
Speech synthesis
Flavorsbull Text-to-speech (TTS)bull Conceptcontent-to-speech (CTS)bull Dialogue system
User
SpeechRecognition
Under-standing
Pro
cess
ing
ContentGeneration
SpeechSynthesis
Speech Synthesis A Brief Motivation and Overview ndash p517
Speech synthesis
Quality criteriabull Intelligibilitybull Naturalness
Speech Synthesis A Brief Motivation and Overview ndash p617
Examples (TTS)
bull ATampT nextgen(1) Willkommen beim Sprachsynthesesystem Natural
Voices von ATampT
(2) Meine Oma faumlhrt im Huumlhnerstall Motorrad(3) Der Zug von Grammatneusiedl nach Sopron faumlhrt
um viertel vor vier
bull SVOX(1) Meine Oma faumlhrt im Huumlhnerstall Motorrad(2) Wann geht der naumlchste Zug nach Venezia
Mestre
Speech Synthesis A Brief Motivation and Overview ndash p717
Examples (TTS)
bull MBROLA German(1) Die Boumlrse schwacher Wochenstart am deutschen
Aktienmarkt(2) Hallo houmlrt mich jemand Schade keiner da Na
dann gehrsquo ich halt nach Hause
bull Various (emotional synthesis)(1a) (1b) (1c) (1d) (1e)(2a) (2b) (2c) (2d) (2e)(3a) (3b) (3c) (3d)
Speech Synthesis A Brief Motivation and Overview ndash p817
Human speech production
Acoustic sourcebull oscillations of the vocal folds voiced sounds
(vowels or voiced consonants)bull noise due to turbulent air flow unvoiced
soundsbull noise bursts due to sudden realease of
closure plosive soundsDifferent positions of articulators (tongue teethlips )bull different spectral shaping of speech sounds
Speech Synthesis A Brief Motivation and Overview ndash p917
Human speech signals
File Page 1 of 1 Printed Tue Oct 21 192217
11570
minus14753
01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17time
Live analysis
Speech Synthesis A Brief Motivation and Overview ndash p1017
Human speech signals
bull Non-stationarybull Voiced (almost periodic) ndash unvoiced
(noise-like)bull Realization of phonemes context dependentbull Prosody used for stressing accentuation
contradiction bull Situation dialogue partner emotion etc
Speech Synthesis A Brief Motivation and Overview ndash p1117
Synthesis concepts
Categories
ArticulatorySynthesis
FormantSynthesis
ConcatenativeSynthesis
Speech Synthesis A Brief Motivation and Overview ndash p1217
Synthesis concepts
Physical Modeling (Artic Synth)
bull Signal generation by a physical model ofhuman speech production systemndash Vocal folds (oscillatory source)ndash Vocal tract (filtering turbulent noise
sources)bull Coupling between source and filter
bull Model parameters rely on (inaccurate)measurements
bull Control of parameter trajectories difficultSpeech Synthesis A Brief Motivation and Overview ndash p1317
Synthesis concepts
Formant synthesis
bull Establish formants by resonance filtersbull Excited by a periodic (voiced) or noise
(unvoiced) source signal
t
f
voiced
unvoiced
s(t)|H(f)|
a(t)
t
source filter
xvoi(t)
xuv(t)
Speech Synthesis A Brief Motivation and Overview ndash p1417
Synthesis concepts
Articulatory and formant synthesis
Rule-based synthesis
Howeverbull Often the rules have to be derived from natural
recorded signals or measurements (X-ray)Alternative
Database-driven synthesis
Speech Synthesis A Brief Motivation and Overview ndash p1517
Synthesis concepts
Database-driven (concatenative) synthesis
Generate speech by concatenationof recorded elements
bull Dedicated inventoryndash Phonemes di-phones demi-syllables ndash Prosodic manipulations
bull Large inventory Unit selectionndash Selection optimal signal unitsndash Nofew signal manipulations
Speech Synthesis A Brief Motivation and Overview ndash p1617
Summary
bull Speech synthesis Why what howbull Quality Intelligibility vs naturalnessbull Audio examplesbull Human speech productionsignals
bull Synthesis concepts
Speech Synthesis A Brief Motivation and Overview ndash p1717
- Contents
- Speech synthesis
- Speech synthesis
- Speech synthesis
- Speech synthesis
- Examples (TTS)
- Examples (TTS)
- Human speech production
- Human speech signals
- Human speech signals
- Synthesis concepts
- Synthesis concepts
- Synthesis concepts
- Synthesis concepts
- Synthesis concepts
- Summary
-
Speech synthesis
Flavorsbull Text-to-speech (TTS)bull Conceptcontent-to-speech (CTS)bull Dialogue system
Speech Synthesis A Brief Motivation and Overview ndash p517
Speech synthesis
Flavorsbull Text-to-speech (TTS)bull Conceptcontent-to-speech (CTS)bull Dialogue system
Example E-mail reading
Sg Mag Pospischil
Das MRT meeting findet am 16 schon um 8h im Sem217 statt und nicht wie angekuumlndigt um 10 (
MfG Christian Fiala
Speech Synthesis A Brief Motivation and Overview ndash p517
Speech synthesis
Flavorsbull Text-to-speech (TTS)bull Conceptcontent-to-speech (CTS)bull Dialogue system
Example Weather forecast
Wien 17 GradGraz 19 GradLinz 13 Grad
rArr
ldquoDas Wetter in Oumlsterreich beruhigtsich wieder Am Nachmittagkoumlnnen allerdings noumlrdlich desAlpenhauptkamms lokaleRegenschauer rdquo
Speech Synthesis A Brief Motivation and Overview ndash p517
Speech synthesis
Flavorsbull Text-to-speech (TTS)bull Conceptcontent-to-speech (CTS)bull Dialogue system
User
SpeechRecognition
Under-standing
Pro
cess
ing
ContentGeneration
SpeechSynthesis
Speech Synthesis A Brief Motivation and Overview ndash p517
Speech synthesis
Quality criteriabull Intelligibilitybull Naturalness
Speech Synthesis A Brief Motivation and Overview ndash p617
Examples (TTS)
bull ATampT nextgen(1) Willkommen beim Sprachsynthesesystem Natural
Voices von ATampT
(2) Meine Oma faumlhrt im Huumlhnerstall Motorrad(3) Der Zug von Grammatneusiedl nach Sopron faumlhrt
um viertel vor vier
bull SVOX(1) Meine Oma faumlhrt im Huumlhnerstall Motorrad(2) Wann geht der naumlchste Zug nach Venezia
Mestre
Speech Synthesis A Brief Motivation and Overview ndash p717
Examples (TTS)
bull MBROLA German(1) Die Boumlrse schwacher Wochenstart am deutschen
Aktienmarkt(2) Hallo houmlrt mich jemand Schade keiner da Na
dann gehrsquo ich halt nach Hause
bull Various (emotional synthesis)(1a) (1b) (1c) (1d) (1e)(2a) (2b) (2c) (2d) (2e)(3a) (3b) (3c) (3d)
Speech Synthesis A Brief Motivation and Overview ndash p817
Human speech production
Acoustic sourcebull oscillations of the vocal folds voiced sounds
(vowels or voiced consonants)bull noise due to turbulent air flow unvoiced
soundsbull noise bursts due to sudden realease of
closure plosive soundsDifferent positions of articulators (tongue teethlips )bull different spectral shaping of speech sounds
Speech Synthesis A Brief Motivation and Overview ndash p917
Human speech signals
File Page 1 of 1 Printed Tue Oct 21 192217
11570
minus14753
01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17time
Live analysis
Speech Synthesis A Brief Motivation and Overview ndash p1017
Human speech signals
bull Non-stationarybull Voiced (almost periodic) ndash unvoiced
(noise-like)bull Realization of phonemes context dependentbull Prosody used for stressing accentuation
contradiction bull Situation dialogue partner emotion etc
Speech Synthesis A Brief Motivation and Overview ndash p1117
Synthesis concepts
Categories
ArticulatorySynthesis
FormantSynthesis
ConcatenativeSynthesis
Speech Synthesis A Brief Motivation and Overview ndash p1217
Synthesis concepts
Physical Modeling (Artic Synth)
bull Signal generation by a physical model ofhuman speech production systemndash Vocal folds (oscillatory source)ndash Vocal tract (filtering turbulent noise
sources)bull Coupling between source and filter
bull Model parameters rely on (inaccurate)measurements
bull Control of parameter trajectories difficultSpeech Synthesis A Brief Motivation and Overview ndash p1317
Synthesis concepts
Formant synthesis
bull Establish formants by resonance filtersbull Excited by a periodic (voiced) or noise
(unvoiced) source signal
t
f
voiced
unvoiced
s(t)|H(f)|
a(t)
t
source filter
xvoi(t)
xuv(t)
Speech Synthesis A Brief Motivation and Overview ndash p1417
Synthesis concepts
Articulatory and formant synthesis
Rule-based synthesis
Howeverbull Often the rules have to be derived from natural
recorded signals or measurements (X-ray)Alternative
Database-driven synthesis
Speech Synthesis A Brief Motivation and Overview ndash p1517
Synthesis concepts
Database-driven (concatenative) synthesis
Generate speech by concatenationof recorded elements
bull Dedicated inventoryndash Phonemes di-phones demi-syllables ndash Prosodic manipulations
bull Large inventory Unit selectionndash Selection optimal signal unitsndash Nofew signal manipulations
Speech Synthesis A Brief Motivation and Overview ndash p1617
Summary
bull Speech synthesis Why what howbull Quality Intelligibility vs naturalnessbull Audio examplesbull Human speech productionsignals
bull Synthesis concepts
Speech Synthesis A Brief Motivation and Overview ndash p1717
- Contents
- Speech synthesis
- Speech synthesis
- Speech synthesis
- Speech synthesis
- Examples (TTS)
- Examples (TTS)
- Human speech production
- Human speech signals
- Human speech signals
- Synthesis concepts
- Synthesis concepts
- Synthesis concepts
- Synthesis concepts
- Synthesis concepts
- Summary
-
Speech synthesis
Flavorsbull Text-to-speech (TTS)bull Conceptcontent-to-speech (CTS)bull Dialogue system
Example E-mail reading
Sg Mag Pospischil
Das MRT meeting findet am 16 schon um 8h im Sem217 statt und nicht wie angekuumlndigt um 10 (
MfG Christian Fiala
Speech Synthesis A Brief Motivation and Overview ndash p517
Speech synthesis
Flavorsbull Text-to-speech (TTS)bull Conceptcontent-to-speech (CTS)bull Dialogue system
Example Weather forecast
Wien 17 GradGraz 19 GradLinz 13 Grad
rArr
ldquoDas Wetter in Oumlsterreich beruhigtsich wieder Am Nachmittagkoumlnnen allerdings noumlrdlich desAlpenhauptkamms lokaleRegenschauer rdquo
Speech Synthesis A Brief Motivation and Overview ndash p517
Speech synthesis
Flavorsbull Text-to-speech (TTS)bull Conceptcontent-to-speech (CTS)bull Dialogue system
User
SpeechRecognition
Under-standing
Pro
cess
ing
ContentGeneration
SpeechSynthesis
Speech Synthesis A Brief Motivation and Overview ndash p517
Speech synthesis
Quality criteriabull Intelligibilitybull Naturalness
Speech Synthesis A Brief Motivation and Overview ndash p617
Examples (TTS)
bull ATampT nextgen(1) Willkommen beim Sprachsynthesesystem Natural
Voices von ATampT
(2) Meine Oma faumlhrt im Huumlhnerstall Motorrad(3) Der Zug von Grammatneusiedl nach Sopron faumlhrt
um viertel vor vier
bull SVOX(1) Meine Oma faumlhrt im Huumlhnerstall Motorrad(2) Wann geht der naumlchste Zug nach Venezia
Mestre
Speech Synthesis A Brief Motivation and Overview ndash p717
Examples (TTS)
bull MBROLA German(1) Die Boumlrse schwacher Wochenstart am deutschen
Aktienmarkt(2) Hallo houmlrt mich jemand Schade keiner da Na
dann gehrsquo ich halt nach Hause
bull Various (emotional synthesis)(1a) (1b) (1c) (1d) (1e)(2a) (2b) (2c) (2d) (2e)(3a) (3b) (3c) (3d)
Speech Synthesis A Brief Motivation and Overview ndash p817
Human speech production
Acoustic sourcebull oscillations of the vocal folds voiced sounds
(vowels or voiced consonants)bull noise due to turbulent air flow unvoiced
soundsbull noise bursts due to sudden realease of
closure plosive soundsDifferent positions of articulators (tongue teethlips )bull different spectral shaping of speech sounds
Speech Synthesis A Brief Motivation and Overview ndash p917
Human speech signals
File Page 1 of 1 Printed Tue Oct 21 192217
11570
minus14753
01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17time
Live analysis
Speech Synthesis A Brief Motivation and Overview ndash p1017
Human speech signals
bull Non-stationarybull Voiced (almost periodic) ndash unvoiced
(noise-like)bull Realization of phonemes context dependentbull Prosody used for stressing accentuation
contradiction bull Situation dialogue partner emotion etc
Speech Synthesis A Brief Motivation and Overview ndash p1117
Synthesis concepts
Categories
ArticulatorySynthesis
FormantSynthesis
ConcatenativeSynthesis
Speech Synthesis A Brief Motivation and Overview ndash p1217
Synthesis concepts
Physical Modeling (Artic Synth)
bull Signal generation by a physical model ofhuman speech production systemndash Vocal folds (oscillatory source)ndash Vocal tract (filtering turbulent noise
sources)bull Coupling between source and filter
bull Model parameters rely on (inaccurate)measurements
bull Control of parameter trajectories difficultSpeech Synthesis A Brief Motivation and Overview ndash p1317
Synthesis concepts
Formant synthesis
bull Establish formants by resonance filtersbull Excited by a periodic (voiced) or noise
(unvoiced) source signal
t
f
voiced
unvoiced
s(t)|H(f)|
a(t)
t
source filter
xvoi(t)
xuv(t)
Speech Synthesis A Brief Motivation and Overview ndash p1417
Synthesis concepts
Articulatory and formant synthesis
Rule-based synthesis
Howeverbull Often the rules have to be derived from natural
recorded signals or measurements (X-ray)Alternative
Database-driven synthesis
Speech Synthesis A Brief Motivation and Overview ndash p1517
Synthesis concepts
Database-driven (concatenative) synthesis
Generate speech by concatenationof recorded elements
bull Dedicated inventoryndash Phonemes di-phones demi-syllables ndash Prosodic manipulations
bull Large inventory Unit selectionndash Selection optimal signal unitsndash Nofew signal manipulations
Summary
• Speech synthesis: Why, what, how?
• Quality: Intelligibility vs. naturalness
• Audio examples
• Human speech production/signals
• Synthesis concepts
Speech synthesis
Flavors
• Text-to-speech (TTS)
• Concept/content-to-speech (CTS)
• Dialogue system
[Figure: dialogue system block diagram with the blocks User, Speech recognition, Understanding, Processing, Content generation, and Speech synthesis]
Speech synthesis
Quality criteria
• Intelligibility
• Naturalness
Examples (TTS)
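• AT&T nextgen
  (1) Willkommen beim Sprachsynthesesystem Natural Voices von AT&T. [Welcome to the Natural Voices speech synthesis system from AT&T.]
  (2) Meine Oma fährt im Hühnerstall Motorrad. [My grandma rides a motorbike in the henhouse.]
  (3) Der Zug von Grammatneusiedl nach Sopron fährt um viertel vor vier. [The train from Grammatneusiedl to Sopron leaves at a quarter to four.]
• SVOX
  (1) Meine Oma fährt im Hühnerstall Motorrad. [My grandma rides a motorbike in the henhouse.]
  (2) Wann geht der nächste Zug nach Venezia Mestre? [When does the next train to Venezia Mestre leave?]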
Examples (TTS)
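• MBROLA German
  (1) Die Börse: schwacher Wochenstart am deutschen Aktienmarkt. [The stock exchange: a weak start to the week on the German stock market.]
  (2) Hallo, hört mich jemand? Schade, keiner da. Na dann geh’ ich halt nach Hause. [Hello, can anyone hear me? Pity, nobody there. Well then, I'll just go home.]
• Various (emotional synthesis)
  (1a) (1b) (1c) (1d) (1e)
  (2a) (2b) (2c) (2d) (2e)
  (3a) (3b) (3c) (3d)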
Human speech production
Acoustic source
• Oscillations of the vocal folds: voiced sounds (vowels or voiced consonants)
• Noise due to turbulent air flow: unvoiced sounds
• Noise bursts due to sudden release of a closure: plosive sounds
Different positions of articulators (tongue, teeth, lips, …)
• Different spectral shaping of speech sounds
Synthesis concepts
Articulatory and formant synthesis
Rule-based synthesis
Howeverbull Often the rules have to be derived from natural
recorded signals or measurements (X-ray)Alternative
Database-driven synthesis
Speech Synthesis A Brief Motivation and Overview ndash p1517
Synthesis concepts
Database-driven (concatenative) synthesis
Generate speech by concatenationof recorded elements
bull Dedicated inventoryndash Phonemes di-phones demi-syllables ndash Prosodic manipulations
bull Large inventory Unit selectionndash Selection optimal signal unitsndash Nofew signal manipulations
Speech Synthesis A Brief Motivation and Overview ndash p1617
Summary
bull Speech synthesis Why what howbull Quality Intelligibility vs naturalnessbull Audio examplesbull Human speech productionsignals
bull Synthesis concepts
Speech Synthesis A Brief Motivation and Overview ndash p1717
- Contents
- Speech synthesis
- Speech synthesis
- Speech synthesis
- Speech synthesis
- Examples (TTS)
- Examples (TTS)
- Human speech production
- Human speech signals
- Human speech signals
- Synthesis concepts
- Synthesis concepts
- Synthesis concepts
- Synthesis concepts
- Synthesis concepts
- Summary
-
Synthesis concepts
Database-driven (concatenative) synthesis
Generate speech by concatenationof recorded elements
bull Dedicated inventoryndash Phonemes di-phones demi-syllables ndash Prosodic manipulations
bull Large inventory Unit selectionndash Selection optimal signal unitsndash Nofew signal manipulations
Speech Synthesis A Brief Motivation and Overview ndash p1617
Summary
bull Speech synthesis Why what howbull Quality Intelligibility vs naturalnessbull Audio examplesbull Human speech productionsignals
bull Synthesis concepts
Speech Synthesis A Brief Motivation and Overview ndash p1717
- Contents
- Speech synthesis
- Speech synthesis
- Speech synthesis
- Speech synthesis
- Examples (TTS)
- Examples (TTS)
- Human speech production
- Human speech signals
- Human speech signals
- Synthesis concepts
- Synthesis concepts
- Synthesis concepts
- Synthesis concepts
- Synthesis concepts
- Summary
-
Summary
bull Speech synthesis Why what howbull Quality Intelligibility vs naturalnessbull Audio examplesbull Human speech productionsignals
bull Synthesis concepts
Speech Synthesis A Brief Motivation and Overview ndash p1717
- Contents
- Speech synthesis
- Speech synthesis
- Speech synthesis
- Speech synthesis
- Examples (TTS)
- Examples (TTS)
- Human speech production
- Human speech signals
- Human speech signals
- Synthesis concepts
- Synthesis concepts
- Synthesis concepts
- Synthesis concepts
- Synthesis concepts
- Summary
-