Text to Speech in Windows

WGM – 22

Gustavo Rovelo

June 17th, 2011

Index

• Introduction.
  • Challenges.
  • History.
• Speech Synthesis.
  • Types.
• Text To Speech (TTS).
  • What is TTS?
  • Software and Hardware solutions.
  • Software Development Kits.
    • Microsoft Speech API.
    • Festival.
• Future improvements.

Challenges

“Language is the ability to express one’s thoughts by means of a set of signs, whether graphical, gestural, acoustic or even musical. It is a distinctive feature of human beings. Speech is one of its main components.”

Thierry Dutoit. An Introduction to Text-To-Speech Synthesis. Kluwer Academic Publishers. 1997. p. 1

Challenges

Text normalization challenges:

Natural language processing: decide the phonetic representation of each word (its correct pronunciation).

“My latest project is to learn how to better project my voice”

Here the first “project” is a noun, stressed on the first syllable, and the second is a verb, stressed on the second syllable.

We try to imitate the human vocal apparatus.

Thierry Dutoit. An Introduction to Text-To-Speech Synthesis. Kluwer Academic Publishers. 1997. p. 6
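Choosing between the two pronunciations of “project” requires knowing the word's role in the sentence. Real front ends rely on a part-of-speech tagger; the sketch below instead uses a deliberately naive, hypothetical rule (if a determiner or possessive follows, the word is taking an object and is read as a verb), just enough to handle Dutoit's example. The function names and the word list are our own.

```cpp
#include <cctype>
#include <cstddef>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Splits a sentence into lower-case words and drops punctuation.
std::vector<std::string> Tokenize(const std::string& sentence) {
    std::istringstream in(sentence);
    std::vector<std::string> words;
    std::string raw;
    while (in >> raw) {
        std::string word;
        for (unsigned char c : raw) {
            if (std::isalpha(c)) word += static_cast<char>(std::tolower(c));
        }
        if (!word.empty()) words.push_back(word);
    }
    return words;
}

// Hypothetical toy rule for the homograph "project": if the next word starts
// a noun phrase (a determiner or possessive), treat "project" as a verb
// (pro-JECT); otherwise treat it as a noun (PRO-ject).
// Returns one stress-marked form per occurrence, in order.
std::vector<std::string> PronounceProject(const std::string& sentence) {
    static const std::set<std::string> kDeterminers = {
        "a", "an", "the", "my", "your", "his", "her", "its", "our", "their", "this"};
    const std::vector<std::string> words = Tokenize(sentence);
    std::vector<std::string> stresses;
    for (std::size_t i = 0; i < words.size(); ++i) {
        if (words[i] != "project") continue;
        const bool objectFollows =
            i + 1 < words.size() && kDeterminers.count(words[i + 1]) > 0;
        stresses.push_back(objectFollows ? "pro-JECT" : "PRO-ject");
    }
    return stresses;
}
```

On Dutoit's sentence it returns PRO-ject followed by pro-JECT. The rule breaks on sentences such as “they project confidence”, which is why statistical taggers are used in practice.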

History

Early mechanical prototypes tried to imitate the human vocal apparatus.

1950s: The first computer-based speech synthesis systems were created.

1961: Bell Labs used an IBM 704 computer to synthesize speech, recreating the song "Daisy Bell". This demonstration inspired the scene in 2001: A Space Odyssey in which the HAL 9000 computer sings the same song.

1970s: Handheld electronics featuring speech synthesis began to appear.

1980s and 1990s: The first multilingual, language-independent systems appeared, using natural language processing methods.

Speech synthesis

Artificial production of human speech.

A computer system used for this purpose is called a speech synthesizer. The quality of a speech synthesizer is judged by its similarity to the human voice and by its ability to be understood.

Types

Concatenative.

Concatenation of segments of recorded speech. There are three subtypes:

Unit selection synthesis:
• Uses large databases of recorded speech.
• Each utterance is segmented into phones, syllables, morphemes, words and sentences.
• The desired target utterance is created at runtime by choosing the best chain of candidate units from the database.
• Provides the greatest naturalness.
• The problem: storage.
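Choosing “the best” units is usually framed as a search: each candidate unit has a target cost (how far it is from what the utterance needs) and each pair of neighboring units has a join cost (how audible the seam is), and a Viterbi search finds the cheapest chain. The sketch below assumes a hypothetical database where each unit is described only by its average pitch and both costs are absolute pitch differences; real systems combine many spectral, prosodic and linguistic features. It also assumes one non-empty candidate list per target position.

```cpp
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// A recorded unit reduced to one feature (average pitch in Hz).
// This is a hypothetical simplification of a real unit database entry.
struct Unit {
    double pitch;
};

// Viterbi search: picks one candidate per target position so that the sum of
// target costs |unit pitch - target pitch| and join costs |pitch jump between
// consecutive units| is minimal. Returns the chosen index for each position.
std::vector<std::size_t> SelectUnits(const std::vector<double>& targetPitch,
                                     const std::vector<std::vector<Unit>>& candidates) {
    const std::size_t n = targetPitch.size();
    if (n == 0) return {};
    std::vector<std::vector<double>> cost(n);        // best total cost ending at each candidate
    std::vector<std::vector<std::size_t>> back(n);   // best predecessor of each candidate
    for (const Unit& u : candidates[0])
        cost[0].push_back(std::fabs(u.pitch - targetPitch[0]));
    for (std::size_t i = 1; i < n; ++i) {
        for (const Unit& u : candidates[i]) {
            double best = std::numeric_limits<double>::infinity();
            std::size_t bestPrev = 0;
            for (std::size_t k = 0; k < candidates[i - 1].size(); ++k) {
                const double c = cost[i - 1][k] + std::fabs(candidates[i - 1][k].pitch - u.pitch);
                if (c < best) { best = c; bestPrev = k; }
            }
            cost[i].push_back(best + std::fabs(u.pitch - targetPitch[i]));
            back[i].push_back(bestPrev);
        }
    }
    // Start from the cheapest final candidate and follow the back pointers.
    std::size_t j = 0;
    for (std::size_t k = 1; k < cost[n - 1].size(); ++k)
        if (cost[n - 1][k] < cost[n - 1][j]) j = k;
    std::vector<std::size_t> path(n);
    for (std::size_t i = n; i-- > 0;) {
        path[i] = j;
        if (i > 0) j = back[i][j];
    }
    return path;
}
```

For two 100 Hz targets with candidates {100, 150} and then {140, 105}, it picks indices 0 and 1: the 105 Hz unit wins in the second slot because it fits both the target and the previous unit.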

Types

Concatenative.

Diphone synthesis:
• Speech database containing all the sound-to-sound transitions occurring in a language.
• Results are generally worse than those of unit selection systems.
• Small size.
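A diphone runs from the middle of one phone to the middle of the next, so each recording captures one sound-to-sound transition. The sketch below shows only the lookup step: it turns a phone sequence into the list of diphones a synthesizer would fetch from its database. The phone symbols and the “_” silence marker are illustrative assumptions, not a specific phone set.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Converts phones into diphone names such as "h-@". Silence ("_") is added at
// both ends so the first and last phones also get their transitions.
std::vector<std::string> ToDiphones(const std::vector<std::string>& phones) {
    std::vector<std::string> padded;
    padded.push_back("_");
    padded.insert(padded.end(), phones.begin(), phones.end());
    padded.push_back("_");
    std::vector<std::string> diphones;
    for (std::size_t i = 0; i + 1 < padded.size(); ++i)
        diphones.push_back(padded[i] + "-" + padded[i + 1]);
    return diphones;
}
```

For “hello” written as h @ l ou it yields _-h, h-@, @-l, l-ou, ou-_. A language with N phones needs at most N × N such recordings, which is why the database stays small.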

Domain-specific synthesis:
• Concatenates prerecorded words and phrases.
• It must consider every variation of each word (the same word can be pronounced differently depending on its context in the sentence).
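A talking clock is a typical domain-specific system: its vocabulary is tiny and fixed, so every word can be recorded once and played back in order. The sketch below only selects the clips; the clip file names are hypothetical, and hours (0 to 23) and minutes (0 to 59) are assumed to be valid.

```cpp
#include <string>
#include <vector>

// Hypothetical talking clock: returns the prerecorded clips to play, in order.
std::vector<std::string> ClockClips(int hour, int minute) {
    static const char* kNumbers[] = {
        "zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine",
        "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen", "sixteen",
        "seventeen", "eighteen", "nineteen"};
    static const char* kTens[] = {"", "", "twenty", "thirty", "forty", "fifty"};

    std::vector<std::string> clips = {"it_is.wav"};
    const int h12 = hour % 12 == 0 ? 12 : hour % 12;  // 12-hour clock
    clips.push_back(std::string(kNumbers[h12]) + ".wav");
    if (minute == 0) {
        clips.push_back("oclock.wav");
    } else if (minute < 10) {
        clips.push_back("oh.wav");  // "two oh five"
        clips.push_back(std::string(kNumbers[minute]) + ".wav");
    } else if (minute < 20) {
        clips.push_back(std::string(kNumbers[minute]) + ".wav");
    } else {
        clips.push_back(std::string(kTens[minute / 10]) + ".wav");
        if (minute % 10 != 0) clips.push_back(std::string(kNumbers[minute % 10]) + ".wav");
    }
    return clips;
}
```

ClockClips(14, 5) returns it_is.wav, two.wav, oh.wav, five.wav. Covering every variation of each word would likely mean, for instance, recording “five” both as an hour and as a sentence-final minute, since its intonation differs.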

Types

Formant synthesis.

• Does not use human speech samples at runtime.
• Output is created using additive synthesis (creating a sound by explicitly adding sinusoidal overtones together) and an acoustic model.
• Generates artificial, robotic-sounding speech.
• Can be reliably intelligible, avoiding the acoustic glitches of concatenative systems.
• Small size.
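The sketch below illustrates the additive part: it sums sine-wave harmonics of a fundamental frequency f0 and gives each harmonic a weight that peaks near the chosen formant frequencies. It is a simplified illustration under our own assumptions (the resonance curve, the 100 Hz bandwidth and the 1/h roll-off are invented for clarity), not the source-filter resonator design used by real formant synthesizers such as Klatt's. It assumes f0 > 0.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Additive synthesis of a vowel-like sound: sum of harmonics h * f0 below the
// Nyquist frequency, each weighted by how close it lies to a formant.
std::vector<double> SynthesizeVowel(double f0, const std::vector<double>& formants,
                                    double seconds, int sampleRate) {
    const double kPi = 3.14159265358979323846;
    const double kBandwidth = 100.0;  // Hz, assumed width of each formant peak
    const std::size_t count = static_cast<std::size_t>(seconds * sampleRate);
    std::vector<double> samples(count, 0.0);
    for (int h = 1; h * f0 < sampleRate / 2.0; ++h) {  // stay below Nyquist
        const double freq = h * f0;
        double amplitude = 0.0;
        for (double f : formants) {
            const double d = (freq - f) / kBandwidth;
            amplitude += 1.0 / (1.0 + d * d);  // resonance-like peak at each formant
        }
        amplitude /= h;  // higher harmonics are weaker
        for (std::size_t n = 0; n < count; ++n)
            samples[n] += amplitude * std::sin(2.0 * kPi * freq * n / sampleRate);
    }
    return samples;
}
```

SynthesizeVowel(120.0, {730.0, 1090.0}, 0.5, 8000) produces 4000 samples of a buzzy vowel; 730 Hz and 1090 Hz are roughly the first two formants of the vowel in “father” for an adult male voice. The samples are not normalized, so scale them before writing them to a 16-bit WAV file.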

Where can we find speech synthesis systems?

• Allowing people with visual impairments or reading disabilities to listen to any kind of text.
• Clocks, dictionaries, ATMs.
• Helping with proofreading and reducing eyestrain.
• Listening to text when reading could be dangerous.
• GPS devices.
• Reducing costs in telephone customer services.
• Giving a voice to individuals who cannot speak at all, such as Stephen Hawking.

What is Text to Speech?

A speech synthesis application.

It creates a spoken version of the text in a computer document, such as a help file or a Web page. TTS is often used together with voice recognition programs. Current TTS applications include:
• Voice-enabled e-mail.
• Web pages.
• RSS feeds.
• OS screen readers.
• The video game industry.

Text-To-Speech Systems

There are numerous TTS products available:

• Ivona Text-To-Speech
• Microsoft Speech Server
• TextSound 2.0
• Sayvoice
• Festival
• Loquendo
• Acapela

Text-To-Speech Systems

Products involving hardware.

Quick Link Pen from WizCom Technologies:
• Scans and reads words.

Software Development Kits

Microsoft Speech API.

• It can be downloaded from Microsoft.
• Reduces the code required to use speech recognition and text-to-speech.
• Provides a high-level interface between an application and speech engines.
• Implements all the low-level details needed to control and manage the real-time operations of various speech engines.
• The two basic types of SAPI engines are text-to-speech (TTS) systems and speech recognizers.


Microsoft Speech API

API for Text-to-Speech

• The ISpVoice Component Object Model (COM) interface.
• ISpVoice::Speak generates speech output from text data.
• Several methods change voice and synthesis properties:
  • speaking rate (ISpVoice::SetRate),
  • output volume (ISpVoice::SetVolume),
  • the current speaking voice (ISpVoice::SetVoice).

Special controls embedded in the text can also change synthesis properties in real time:
• word emphasis,
• speaking rate.
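In SAPI 5 these special controls are XML tags placed inside the text, which ISpVoice::Speak interprets when it is called with the SPF_IS_XML flag. The helpers below are a hypothetical convenience layer that only builds such strings; they use the emph, rate speed and volume level tags of the SAPI XML markup and skip escaping of special characters such as '<' and '&'.

```cpp
#include <string>

// Wraps words in an emphasis tag.
std::wstring Emphasize(const std::wstring& words) {
    return L"<emph>" + words + L"</emph>";
}

// Relative speaking-rate change; SAPI documents the range -10 to 10.
std::wstring WithRate(int speed, const std::wstring& text) {
    return L"<rate speed=\"" + std::to_wstring(speed) + L"\">" + text + L"</rate>";
}

// Volume as a percentage from 0 to 100.
std::wstring WithVolume(int level, const std::wstring& text) {
    return L"<volume level=\"" + std::to_wstring(level) + L"\">" + text + L"</volume>";
}
```

On Windows, given an ISpVoice pointer named voice, the markup would be spoken with voice->Speak(WithRate(-5, L"Hello world").c_str(), SPF_IS_XML, NULL); without SPF_IS_XML the tags may be read aloud as plain text.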


Microsoft Speech API

Hello world example.

Add the paths to the sapi.h and sapi.lib files:
• Include directory: C:\Program Files\Microsoft SDKs\Windows\v7.1\Include
  Type #include <sapi.h> in your application.
• Library directory: C:\Program Files\Microsoft SDKs\Windows\v7.1\Lib
  Add sapi.lib to the additional dependencies list.
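The slide lists the build setup but not the program itself. Below is a minimal sketch of the usual SAPI 5 pattern: initialize COM, create an SpVoice object, call ISpVoice::Speak, release everything. The SpeakText name and the non-Windows fallback are our own additions, so the sketch also compiles (and simply reports failure) on platforms without SAPI.

```cpp
#include <cstdio>
#ifdef _WIN32
#include <windows.h>
#include <sapi.h>
#endif

// Speaks text with the default SAPI voice and waits until it has finished.
// Returns true on success.
bool SpeakText(const wchar_t* text) {
#ifdef _WIN32
    if (FAILED(::CoInitialize(NULL))) return false;
    ISpVoice* voice = NULL;
    HRESULT hr = ::CoCreateInstance(CLSID_SpVoice, NULL, CLSCTX_ALL, IID_ISpVoice,
                                    reinterpret_cast<void**>(&voice));
    if (SUCCEEDED(hr)) {
        voice->SetVolume(100);  // 0..100
        voice->SetRate(0);      // -10..10, 0 is the normal speaking rate
        hr = voice->Speak(text, SPF_DEFAULT, NULL);  // synchronous by default
        voice->Release();
    }
    ::CoUninitialize();
    return SUCCEEDED(hr);
#else
    (void)text;
    std::fputs("SAPI is only available on Windows\n", stderr);
    return false;
#endif
}
```

Call SpeakText(L"Hello world") from main and link with sapi.lib as described above.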


Festival TTS

• It is multilingual: British English, American English, Spanish.
• It offers full text-to-speech through several interfaces: the shell-level command interpreter, a C++ library, and Java.
• It uses the Edinburgh Speech Tools Library for its low-level architecture.
• It is free software, allowing unrestricted commercial and non-commercial use alike.


Festival TTS

Compiling Festival in Windows:
• Download and install the Cygwin development tools.
• Download Festival:
  http://www.cstr.ed.ac.uk/downloads/festival/2.1/
• Unzip the files to a convenient location (C:\festival) using the tar command (do not use WinZip).
• Set the correct value of FESTIVAL_HOME in the festival/config/config file.
• Follow the next steps:


Festival TTS

Creating the library using the Cygwin shell:

1. Go into the speech_tools directory and type (in the bash shell):
    ./configure
    make VCMakefile
    make depend
    cp config/vc_config_make_rules-dist config/vc_config_make_rules

2. Go into the festival directory and type:
    ./configure
    make VCMakefile
    make depend
    cp config/vc_config_make_rules-dist config/vc_config_make_rules
    make -C src/modules init_modules.cc


Festival TTS

3. Edit:
    festival/config/vc_config_make_rules: SYSTEM_LIB
    festival/config/config: FESTIVAL_HOME

4. Using the Visual Studio command prompt:
    VCVARSALL.bat
    cd c:\festival\speech_tools
    nmake /nologo /FVCMakefile
    cd c:\festival\festival
    nmake /nologo /FVCMakefile


Festival TTS

Hello world demo.

Add the paths to the Festival and Speech Tools files.
Directories:
• C:\festival\speech_tools\include
• C:\festival\festival\src\include
• C:\festival\speech_tools\lib
• C:\festival\festival\src\lib

Use #include "festival.h" in your program.
Set these additional dependencies:
• libFestival.lib
• libestools.lib
• libestbase.lib
• libeststring.lib
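The slide lists the build settings but not the program. Below is a sketch modelled on the C++ API example in the Festival manual (festival_initialize, festival_say_text, festival_text_to_wave, festival_wait_for_spooler). HAVE_FESTIVAL is a macro name we made up: define it when compiling against the libraries above; otherwise the function only reports that Festival is not linked. The output file name hello.wav is also our choice.

```cpp
#include <cstdio>
#ifdef HAVE_FESTIVAL  // hypothetical macro: define it when building against Festival
#include "festival.h"
#endif

// Says "hello world" with Festival's default voice and saves it as a WAV file.
// Returns false when built without Festival.
bool FestivalHelloWorld() {
#ifdef HAVE_FESTIVAL
    const int heap_size = 210000;   // Scheme heap size used in the manual's example
    const int load_init_files = 1;  // load Festival's init files (voices, lexicons)
    festival_initialize(load_init_files, heap_size);

    festival_say_text("hello world");  // play through the audio spooler

    EST_Wave wave;
    festival_text_to_wave("hello world", wave);  // synthesize into memory
    wave.save("hello.wav", "riff");              // "riff" is the WAV format

    festival_wait_for_spooler();  // wait for queued audio before returning
    return true;
#else
    std::fputs("Built without HAVE_FESTIVAL; Festival is not linked\n", stderr);
    return false;
#endif
}
```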


Festival TTS

Hello world demo

If you get strange errors, try:
• Adding these additional dependencies:
  • ws2_32.lib
  • winmm.lib
• Ignoring these libraries:
  • MSVCRTD.lib
  • MSVCPRTD.lib

Future Improvements

• Better (more realistic) synthesized voice engines.
• Creating standards for electronic book files, involving:
  • content publishers,
  • publishers of TTS software,
  • manufacturers of digital book display devices,
  • consumers who read electronic books.

Bibliography

Thierry Dutoit. An Introduction to Text-To-Speech Synthesis. Kluwer Academic Publishers. 1997. pp. 1 and 6.

Speech synthesis. http://en.wikipedia.org/wiki/Text-to-speech#cite_note-1

Text-To-Speech (TTS). http://searchmobilecomputing.techtarget.com/definition/text-to-speech

Wizcom Co. http://www.wizcomtech.com/eng/home/a/01/defaultpromo.asp

Microsoft Speech API 5.3. http://msdn.microsoft.com/en-us/library/ms720163%28v=VS.85%29.aspx

Project Meet. Using Text-To-Speech Technology Resource Guide. http://www.newbedford.k12.ma.us/edtech_toolkit/students/cast/index.htm

Do you have any questions?

Thank you
