MobAppDev (Fall 2014): Basics of Speech Synthesis

Post on 14-Dec-2014


MobAppDev

Basics of Speech Synthesis

Vladimir Kulyukin

www.vkedco.blogspot.com

Outline

● Background on Speech Synthesis

● Text-to-Speech Synthesis (TTS)

● TTS on Android

● Practical Approaches to TTS: Phonetic Spelling & Human Recording

Background on

Speech Synthesis

What is Speech Synthesis

● Speech synthesis refers to the artificial production of human speech

● Speech synthesizers take a symbolic linguistic representation and generate audio streams

● The input to speech synthesizers can be a script in some language or a phonetic transcription

Phonetic Transcription

● Phonetic transcription is a formal system that describes how words are pronounced in a specific language (the language's phonology)

● There are many formalisms used for phonetic transcriptions

● International Phonetic Alphabet (IPA) is one of the better known systems of phonetic notation

International Phonetic Alphabet

Phones, Allophones, & Phonemes

● A phone is a unit of speech sound

● An allophone is one of several possible spoken phones used to pronounce a single phoneme

● In English, /p/ is a phoneme with two allophones: [pʰ] and [p]

● [pʰ] is an aspirated allophone of /p/ (e.g., paper)

● [p] is an unaspirated allophone of /p/ (e.g., speak)
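The paper/speak contrast above can be sketched as a small rule. This is a minimal illustration, not a real phonological model: it assumes spelling is a fair stand-in for pronunciation and covers only the two contexts from the examples (word-initial /p/ and /p/ after s). The class and method names are hypothetical.

```java
/** Toy allophone chooser for the English phoneme /p/ (illustrative only). */
class AllophoneSketch {

    enum Allophone { ASPIRATED, UNASPIRATED, UNMODELED }

    /**
     * Returns the allophone assumed for the letter 'p' at index i of a
     * lowercase word. Only two contexts are modeled; everything else is
     * reported as UNMODELED rather than guessed.
     */
    static Allophone realizeP(String word, int i) {
        if (i < 0 || i >= word.length() || word.charAt(i) != 'p') {
            throw new IllegalArgumentException("no 'p' at index " + i);
        }
        if (i > 0 && word.charAt(i - 1) == 's') {
            return Allophone.UNASPIRATED;   // after s, as in "speak"
        }
        if (i == 0) {
            return Allophone.ASPIRATED;     // word-initial, as in "paper"
        }
        return Allophone.UNMODELED;         // other contexts are out of scope here
    }

    public static void main(String[] args) {
        System.out.println("paper: " + realizeP("paper", 0));
        System.out.println("speak: " + realizeP("speak", 1));
    }
}
```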

Intonation, Tone, & Prosody

● Intonation is the variation of pitch used to convey the speaker's emotion (joyful, calm, angry, etc.) and the utterance type (question vs. statement)

● Tone is another element of intonation used to distinguish individual words

● Prosody is the combination of rhythm, stress, and intonation

Text to Speech Synthesis (TTS)

TTS: Text To Speech

● The General Problem: Take a sequence of characters and generate a waveform

● Words are pronounced as a sequence of individual units called phones

● Phonetic alphabets describe how phones are pronounced

● Phonological rules specify how phones combine into speech
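The idea that words are pronounced as sequences of phones can be sketched with a tiny pronunciation lexicon. This is an assumption-laden toy: the two entries, the ARPAbet-style phone symbols, and the class name are chosen for illustration and are not part of the slides. A real engine would use a large lexicon plus letter-to-sound rules.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy pronunciation lexicon: words in, phone sequences out (illustrative only). */
class LexiconSketch {
    // Hypothetical two-entry lexicon using ARPAbet-style phone symbols.
    private static final Map<String, List<String>> LEXICON = new HashMap<>();
    static {
        LEXICON.put("paper", Arrays.asList("P", "EY", "P", "ER"));
        LEXICON.put("speak", Arrays.asList("S", "P", "IY", "K"));
    }

    /** Converts text to one flat phone sequence; unknown words become <unk:word>. */
    static List<String> toPhones(String text) {
        List<String> phones = new ArrayList<>();
        for (String word : text.toLowerCase().trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue;                        // empty input yields no phones
            }
            List<String> entry = LEXICON.get(word);
            if (entry == null) {
                phones.add("<unk:" + word + ">"); // flag the gap instead of guessing
            } else {
                phones.addAll(entry);
            }
        }
        return phones;
    }
}
```

Unknown words are flagged rather than guessed, which keeps the gap in the lexicon visible to the later stages.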

TTS Engine Anatomy

● A typical TTS engine consists of three components: a text analyzer, a linguistic analyzer, and a waveform generator

● Text Analysis – parse text (after transliterating it if necessary) and identify words and utterances

● Linguistic Analysis – identify phrases and assign prosodies (accents, emphasis, duration, pauses, etc.)

● Waveform Generation – generate a waveform from a fully specified linguistic description
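The three stages can be sketched as three small functions chained together. This is a toy under loud assumptions, not how a production engine works: text analysis is whitespace tokenization, linguistic analysis only assigns a pause length from punctuation, and waveform generation emits a fixed 220 Hz tone per word instead of speech. The sample rate, durations, and all names are made up for the sketch.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy three-stage TTS pipeline; every stage is a drastic simplification. */
class TtsPipelineSketch {
    static final int SAMPLE_RATE = 8000;   // samples per second (assumed)
    static final int MS_PER_LETTER = 50;   // crude duration model (assumed)

    /** A word plus the pause, in milliseconds, that should follow it. */
    static final class Unit {
        final String word;
        final int pauseMs;
        Unit(String word, int pauseMs) { this.word = word; this.pauseMs = pauseMs; }
    }

    /** Stage 1, text analysis: split raw text into tokens, punctuation still attached. */
    static List<String> analyzeText(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    /** Stage 2, linguistic analysis: derive a pause for each word from its punctuation. */
    static List<Unit> analyzeLanguage(List<String> tokens) {
        List<Unit> units = new ArrayList<>();
        for (String t : tokens) {
            int pauseMs = t.matches(".*[.?!]") ? 400 : t.endsWith(",") ? 200 : 50;
            String word = t.replaceAll("[^\\p{L}\\p{Nd}']", "");  // drop punctuation
            if (!word.isEmpty()) units.add(new Unit(word, pauseMs));
        }
        return units;
    }

    /** Stage 3, waveform generation: a 220 Hz tone per word, then silence for the pause. */
    static short[] generateWaveform(List<Unit> units) {
        int total = 0;
        for (Unit u : units) {
            total += samples(u.word.length() * MS_PER_LETTER) + samples(u.pauseMs);
        }
        short[] pcm = new short[total];
        int pos = 0;
        for (Unit u : units) {
            int voiced = samples(u.word.length() * MS_PER_LETTER);
            for (int i = 0; i < voiced; i++) {
                double phase = 2 * Math.PI * 220 * i / SAMPLE_RATE;
                pcm[pos + i] = (short) (Short.MAX_VALUE / 4 * Math.sin(phase));
            }
            pos += voiced + samples(u.pauseMs);  // pause samples stay zero (silence)
        }
        return pcm;
    }

    private static int samples(int ms) { return SAMPLE_RATE * ms / 1000; }
}
```

Calling generateWaveform(analyzeLanguage(analyzeText("Hello, world."))) produces two units with pauses of 200 ms and 400 ms and a buffer of 8800 samples. The point of the sketch is the data flow between stages, not the sound it produces.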

TTS Approaches

● Full Automation – machine does everything

● Mixed Initiative – human records a set of known texts; machine learning is used to extract the rules

● Human-Based Recording – human records words/sentences/texts; machine plays them as needed

Android TTS

● Android TTS is a multilingual speech synthesis engine

● Android TTS can be used as a black box: text in, speech out

● Android TTS can be parameterized

Starting TTS

● It is best practice to check if TTS is available on the device

● This is done by sending an Intent that checks for TTS data

● If the check is successful, an instance of TTS can be created

● An Activity that uses TTS implements the TextToSpeech.OnInitListener interface

Checking TTS Availability

In Activity.onCreate():

Intent ttsi = new Intent();

ttsi.setAction(TextToSpeech.Engine.ACTION_CHECK_TTS_DATA);

startActivityForResult(ttsi, CHECK_TTS_REQ);

Creating TTS Instance

In Activity.onActivityResult():

switch ( result_code ) {

case TextToSpeech.Engine.CHECK_VOICE_DATA_PASS:

mTTS = new TextToSpeech(this, this);

break;

}

Handling Missing TTS Data

In Activity.onActivityResult():

Intent insti = new Intent();

insti.setAction(TextToSpeech.Engine.ACTION_INSTALL_TTS_DATA);

startActivity(insti);

Handling TTS Unavailability

In Activity.onActivityResult():

switch ( result_code ) {

case TextToSpeech.Engine.CHECK_VOICE_DATA_FAIL:

// Let the user know that TTS is not available – Toast, Log Message, Notification, etc.

}
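The three onActivityResult() cases above (create the engine, install missing data, report unavailability) can be pulled together into one decision function. The sketch below is plain Java so that it runs off-device: the enums and the request-code value are local stand-ins, not Android constants, and which result an app treats as "data missing" is an assumption left to the app.

```java
/** Plain-Java sketch of the branching an Activity might do in onActivityResult(). */
class TtsCheckSketch {
    // Local stand-ins so the sketch runs off-device; the value is arbitrary.
    static final int CHECK_TTS_REQ = 1;

    enum CheckResult { PASS, MISSING_DATA, FAIL }
    enum Action { CREATE_ENGINE, INSTALL_DATA, REPORT_UNAVAILABLE, IGNORE }

    static Action decide(int requestCode, CheckResult result) {
        if (requestCode != CHECK_TTS_REQ) {
            return Action.IGNORE;                  // result of some other request
        }
        switch (result) {
            case PASS:
                return Action.CREATE_ENGINE;       // e.g. new TextToSpeech(this, this)
            case MISSING_DATA:
                return Action.INSTALL_DATA;        // e.g. launch ACTION_INSTALL_TTS_DATA
            default:
                return Action.REPORT_UNAVAILABLE;  // e.g. Toast or log message
        }
    }
}
```

Checking the request code first matters because an Activity may start several sub-activities, and only the reply to CHECK_TTS_REQ says anything about TTS.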

What To Do When TTS Is Ready

Implement OnInitListener.onInit() in the Activity:

@Override

public void onInit(int status) {

if ( status == TextToSpeech.SUCCESS ) {

// do something when you know that TTS is available

}

}

Overriding onPause() and onDestroy()

● When your Activity is paused (e.g., it loses focus), have TTS stop synthesizing

● When your Activity is destroyed, shut TTS down to notify Android that the resources can be released and given to other activities or applications

Overriding onPause() and onDestroy()

@Override

public void onPause() {

super.onPause();

if ( mTTS != null ) mTTS.stop();

}

Overriding onPause() and onDestroy()

@Override

public void onDestroy() {

super.onDestroy();

if ( mTTS != null ) mTTS.shutdown();

}

Overcoming TTS Limitations

● Every TTS engine mispronounces some words

● There are two ways of overcoming this limitation:

Phonetic spelling: spell mispronounced words the way they sound, generate waveforms, and associate words with waveforms

Human recording: have a human record mispronounced words and use the files
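A simpler variant of phonetic spelling can be sketched as a lookup table applied to the text before it reaches the engine. Note that this differs from the approach described above: it substitutes the respelling at synthesis time rather than pre-generating waveforms and associating them with words. The class name and the example entry are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy respelling table: swaps known problem words for spellings that sound right. */
class RespellingSketch {
    private final Map<String, String> respellings = new HashMap<>();

    /** Registers a phonetic spelling for a word (matched case-insensitively). */
    void add(String word, String phoneticSpelling) {
        respellings.put(word.toLowerCase(), phoneticSpelling);
    }

    /**
     * Replaces each space-separated token that has a registered respelling.
     * Tokens with attached punctuation (e.g. "GUI.") are not matched in this sketch.
     */
    String apply(String text) {
        String[] tokens = text.split(" ");
        for (int i = 0; i < tokens.length; i++) {
            tokens[i] = respellings.getOrDefault(tokens[i].toLowerCase(), tokens[i]);
        }
        return String.join(" ", tokens);
    }
}
```

For example, after add("GUI", "gooey"), apply("Open the GUI") returns "Open the gooey", and that string is what would be handed to the synthesizer.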

References & Reading Suggestions

● http://developer.android.com/reference/android/speech/tts/TextToSpeech.html