MobAppDev (Fall 2014): Basics of Speech Synthesis

MobAppDev

Basics of Speech Synthesis

Vladimir Kulyukin

www.vkedco.blogspot.com

Outline

● Background on Speech Synthesis

● Text-to-Speech Synthesis (TTS)

● TTS on Android

● Practical Approaches to TTS: Phonetic Spelling & Human Recording

Background on Speech Synthesis

What is Speech Synthesis

● Speech synthesis refers to the artificial production of human speech

● Speech synthesizers take a symbolic linguistic representation and generate audio streams

● The input to speech synthesizers can be a script in some language or a phonetic transcription

Phonetic Transcription

● Phonetic transcription is a formal system that describes how words are pronounced in a specific language (the language's phonology)

● There are many formalisms used for phonetic transcriptions

● The International Phonetic Alphabet (IPA) is one of the better-known systems of phonetic notation

International Phonetic Alphabet

Phones, Allophones, & Phonemes

● A phone is a unit of speech sound

● An allophone is one of several possible spoken phones used to pronounce a single phoneme

● In English, /p/ is a phoneme with two allophones: [pʰ] and [p]

● [pʰ] is an aspirated allophone of /p/ (e.g., paper)

● [p] is an unaspirated allophone of /p/ (e.g., speak)
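
The relation between a phoneme and its allophones can be pictured as a context-dependent choice. The following Java snippet is only a toy illustration: the class AllophoneDemo, the method allophoneOfP, and the spelling-based rule (unaspirated after "s", aspirated otherwise) are assumptions made for this example. Real English aspiration depends on stress and syllable position, not on spelling, so the rule is meant to cover only the two example words above.

```java
// Toy illustration of one phoneme (/p/) realized as two allophones.
// The rule is deliberately simplified and works on spelling (an assumption).
final class AllophoneDemo {
    static final String ASPIRATED = "[p\u02B0]"; // [p] followed by superscript h
    static final String UNASPIRATED = "[p]";

    // Returns the allophone chosen for the letter 'p' at position index in word.
    static String allophoneOfP(String word, int index) {
        if (index < 0 || index >= word.length()
                || Character.toLowerCase(word.charAt(index)) != 'p') {
            throw new IllegalArgumentException("no 'p' at index " + index + " in " + word);
        }
        // Simplified rule: /p/ right after 's' is unaspirated, otherwise aspirated.
        boolean afterS = index > 0
                && Character.toLowerCase(word.charAt(index - 1)) == 's';
        return afterS ? UNASPIRATED : ASPIRATED;
    }
}
```

With this rule, the first "p" of "paper" maps to the aspirated allophone and the "p" of "speak" to the unaspirated one.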

Intonation, Tone, & Prosody

● Intonation is the variation of pitch used to signal the speaker's emotion (joyful, calm, angry, etc.) and the utterance type (question vs. statement)

● Tone is a related use of pitch that distinguishes individual words

● Prosody is the combination of rhythm, stress, and intonation

Text to Speech Synthesis (TTS)

TTS: Text To Speech

● The General Problem: Take a sequence of characters and generate a waveform

● Words are pronounced as a sequence of individual units called phones

● Phonetic alphabets describe how phones are pronounced

● Phonological rules specify how phones combine into speech

TTS Engine Anatomy

● A typical TTS engine consists of three components: a text analyzer, a linguistic analyzer, and a waveform generator

● Text Analysis – parse text (after transliterating it if necessary) and identify words and utterances

● Linguistic Analysis – identify phrases and assign prosody (accents, emphasis, duration, pauses, etc.)

● Waveform Generation – generate a waveform from a fully specified linguistic description
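
The three stages above can be pictured as a pipeline. The following Java sketch is hypothetical: TtsPipelineSketch, its methods, and the fixed durations and sample rate are placeholders chosen for illustration, and the "waveform" is a silent buffer of the right length rather than real audio. It shows only how data flows from text, to words, to words with durations, to an array of samples.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical three-stage pipeline; all constants are assumed values.
final class TtsPipelineSketch {
    static final int SAMPLE_RATE_HZ = 8000; // assumed sample rate
    static final int WORD_MS = 100;         // assumed duration of every word
    static final int PAUSE_MS = 50;         // assumed extra time at the end of an utterance

    // A word annotated with a (toy) prosodic property: its duration.
    static final class Unit {
        final String word;
        final int durationMs;
        Unit(String word, int durationMs) { this.word = word; this.durationMs = durationMs; }
    }

    // Stage 1, text analysis: split raw text into word tokens.
    static List<String> analyzeText(String text) {
        List<String> words = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            if (!token.isEmpty()) words.add(token);
        }
        return words;
    }

    // Stage 2, linguistic analysis: assign durations; utterance-final words get a pause.
    static List<Unit> analyzeLanguage(List<String> words) {
        List<Unit> units = new ArrayList<>();
        for (String w : words) {
            boolean endsUtterance = w.endsWith(".") || w.endsWith("?") || w.endsWith("!");
            units.add(new Unit(w, WORD_MS + (endsUtterance ? PAUSE_MS : 0)));
        }
        return units;
    }

    // Stage 3, waveform generation: placeholder that allocates silent samples.
    static short[] generateWaveform(List<Unit> units) {
        int totalMs = 0;
        for (Unit u : units) totalMs += u.durationMs;
        return new short[totalMs * SAMPLE_RATE_HZ / 1000];
    }

    // Runs the three stages in order.
    static short[] synthesize(String text) {
        return generateWaveform(analyzeLanguage(analyzeText(text)));
    }
}
```

Keeping the stages as separate methods mirrors the slide: each stage consumes only the output of the previous one, so any stage could be replaced without touching the others.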

TTS Approaches

● Full Automation – machine does everything

● Mixed Initiative – human records a set of known texts; machine learning is used to extract the rules

● Human-Based Recording – human records words/sentences/texts; machine plays them as needed

Android TTS

● Android TTS is a multilingual speech synthesis engine

● Android TTS can be used as a black box: text in, speech out

● Android TTS can be parameterized

Starting TTS

● It is best practice to check if TTS is available on the device

● This is done by sending an Intent that checks for TTS data

● If the check is successful, an instance of TTS can be created

● An Activity that uses TTS implements the TextToSpeech.OnInitListener interface

Checking TTS Availability

In Activity.onCreate():

// Ask Android whether TTS voice data is installed; the answer arrives in onActivityResult()
Intent ttsi = new Intent();
ttsi.setAction(TextToSpeech.Engine.ACTION_CHECK_TTS_DATA);
startActivityForResult(ttsi, CHECK_TTS_REQ);

Creating TTS Instance

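In Activity.onActivityResult():

switch ( result_code ) {
    case TextToSpeech.Engine.CHECK_VOICE_DATA_PASS:
        // Voice data is present: create the engine; this Activity is both Context and OnInitListener
        mTTS = new TextToSpeech(this, this);
        break;
    // the remaining cases are shown on the following slides
}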

Handling Missing TTS Data

In Activity.onActivityResult():

// Voice data is missing: ask Android to install it
Intent insti = new Intent();
insti.setAction(TextToSpeech.Engine.ACTION_INSTALL_TTS_DATA);
startActivity(insti);

Handling TTS Unavailability

In Activity.onActivityResult():

switch ( result_code ) {
    case TextToSpeech.Engine.CHECK_VOICE_DATA_FAIL:
        // Let the user know that TTS is not available – Toast, Log Message, Notification, etc.
        break;
}
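
The cases handled in onActivityResult() on the preceding slides amount to one dispatch on the result code. The sketch below models that decision in plain Java so that it can be read and run without a device. Everything in it is a stand-in: the CheckResult and Action enums and the decide method are hypothetical names, not Android API. In a real Activity the switch would be over the TextToSpeech.Engine.CHECK_VOICE_DATA_* constants, and mapping the "install" action to a missing-data result is an assumption based on the slide title "Handling Missing TTS Data".

```java
// Plain-Java model of the onActivityResult() decision; no Android classes are used.
final class TtsCheckSketch {
    // Stand-ins for outcomes of the TTS data check (not the Android constants).
    enum CheckResult { PASS, MISSING_DATA, FAIL }

    // What the Activity should do next.
    enum Action { CREATE_ENGINE, INSTALL_VOICE_DATA, NOTIFY_USER }

    static Action decide(CheckResult result) {
        switch (result) {
            case PASS:
                return Action.CREATE_ENGINE;      // corresponds to new TextToSpeech(this, this)
            case MISSING_DATA:
                return Action.INSTALL_VOICE_DATA; // corresponds to ACTION_INSTALL_TTS_DATA
            case FAIL:
            default:
                return Action.NOTIFY_USER;        // Toast, log message, notification, etc.
        }
    }
}
```

Treating every unrecognized outcome as "notify the user" is a conservative design choice: the app never tries to speak through an engine that was not confirmed to be available.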

What To Do When TTS Is Ready

Implement TextToSpeech.OnInitListener.onInit() in the Activity:

@Override
public void onInit(int status) {
    if ( status == TextToSpeech.SUCCESS ) {
        // do something when you know that TTS is available
    }
}

Overriding onPause() and onDestroy()

● When your Activity is paused (e.g., it loses focus), have TTS stop synthesizing

● When your Activity is destroyed, shut TTS down to notify Android that the resources can be released and given to other activities or applications

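@Override
public void onPause() {
    super.onPause();
    // Stop any speech in progress when the Activity loses focus
    if ( mTTS != null ) mTTS.stop();
}

@Override
public void onDestroy() {
    // Release the TTS engine's resources before the Activity is destroyed
    if ( mTTS != null ) mTTS.shutdown();
    super.onDestroy();
}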

Overcoming TTS Limitations

● Every TTS engine mispronounces some words

● There are two ways of overcoming this limitation:

    ● Phonetic spelling: spell mispronounced words the way they sound, generate waveforms, and associate words with waveforms

    ● Human recording: have a human record mispronounced words and use the files
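
Phonetic spelling can be approximated by rewriting known problem words before the text reaches the synthesizer. The following Java sketch is a minimal illustration under stated assumptions: the class PhoneticSpeller, its methods, and any respellings stored in it are hypothetical, and no TTS engine is called. The approach on the slide goes one step further by generating a waveform for each respelled word and associating it with the original word.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper: replaces known problem words with phonetic respellings.
final class PhoneticSpeller {
    private final Map<String, String> respellings = new HashMap<>();

    // Registers a respelling; matching is case-insensitive.
    void add(String word, String respelling) {
        respellings.put(word.toLowerCase(Locale.ROOT), respelling);
    }

    // Rewrites space-separated tokens; tokens with attached punctuation are left as-is.
    String rewrite(String text) {
        String[] tokens = text.split(" ", -1); // -1 keeps empty tokens, so spacing is preserved
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < tokens.length; i++) {
            if (i > 0) out.append(' ');
            String replacement = respellings.get(tokens[i].toLowerCase(Locale.ROOT));
            out.append(replacement != null ? replacement : tokens[i]);
        }
        return out.toString();
    }
}
```

For example, after add("cache", "cash"), rewrite("Clear the cache now") yields "Clear the cash now", and that string is what would be handed to the synthesizer. Whether a given engine needs this particular respelling is not claimed here; the table has to be built by listening to the engine in question.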

References & Reading Suggestions

● http://developer.android.com/reference/android/speech/tts/TextToSpeech.html
