Text to Speech in Windows

WGM – 22. Gustavo Rovelo. June 17th, 2011

Transcript of Text to Speech in Windows

Page 1: Text to Speech in Windows

Text to Speech in Windows

WGM – 22

Gustavo Rovelo

June 17th, 2011

Page 2: Text to Speech in Windows

Index

Introduction.
• Challenges.
• History.
Speech Synthesis.
• Types.
Text To Speech (TTS).
• What is TTS?
• Software and Hardware solutions.
Software Development Kits.
• Microsoft Speech API.
• Festival.
Future improvements.

Page 3: Text to Speech in Windows

Challenges

“Language is the ability to express one’s thoughts by means of a set of signs, whether graphical, gestural, acoustic or even musical. It is a distinctive feature of human beings. Speech is one of its main components.”

Thierry Dutoit. An Introduction to Text-To-Speech Synthesis. Kluwer Academic Publishers. 1997. p. 1

Page 4: Text to Speech in Windows

Challenges

Text normalization challenges:
• Natural language processing.
• Decide the phonetic representation of each word (the correct pronunciation).

“My latest project is to learn how to better project my voice”
(“project” is pronounced differently as a noun and as a verb.)

We try to imitate the human vocal apparatus.

Thierry Dutoit. An Introduction to Text-To-Speech Synthesis. Kluwer Academic Publishers. 1997. p. 6
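The pronunciation problem in the example sentence can be sketched in a few lines of C++. This is only an illustration: the part-of-speech tags are assumed to come from some earlier language-processing step, and the names PartOfSpeech, TaggedWord, and pronounce, as well as the stress notation, are hypothetical and not taken from any TTS library.

```cpp
#include <cassert>
#include <string>

// Part-of-speech tag for one word. A real system would obtain this from an
// NLP tagger; here it is supplied by the caller (an assumption of the sketch).
enum class PartOfSpeech { Noun, Verb, Other };

struct TaggedWord {
    std::string text;
    PartOfSpeech pos;
};

// Hypothetical pronunciation lookup. Only the homograph "project" is
// special-cased; an apostrophe marks the stressed syllable.
std::string pronounce(const TaggedWord& word) {
    if (word.text == "project") {
        // Noun: stress on the first syllable. Verb: stress on the second.
        return word.pos == PartOfSpeech::Verb ? "pro-'ject" : "'pro-ject";
    }
    return word.text;  // Fallback: pass the spelling through unchanged.
}
```

With this sketch, the first "project" in the sentence (a noun) maps to 'pro-ject and the second (a verb) to pro-'ject, which is the decision a TTS front end has to make before any sound is produced.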

Page 5: Text to Speech in Windows

History

Mechanical prototypes that tried to imitate the human vocal apparatus.

1950s: The first computer-based speech synthesis systems were created.

1961: Bell Labs used an IBM 704 computer to synthesize speech, recreating the song “Daisy Bell”. This was later used in 2001: A Space Odyssey.

1970s: Handheld electronics featuring speech synthesis began emerging.

1980s and 1990s: The first multilingual, language-independent systems appeared, using Natural Language Processing methods.

Page 6: Text to Speech in Windows

Speech synthesis

• Artificial production of human speech.
• A computer system used for this purpose is called a speech synthesizer.
• The quality of a speech synthesizer is judged by its similarity to the human voice and by its ability to be understood.

Page 7: Text to Speech in Windows

Types

Concatenative.

• Concatenation of segments of recorded speech.
• There are three subtypes:

Unit selection synthesis.
• Large databases of recorded speech.
• Each utterance is segmented into:
  • Phones, syllables, morphemes, words, sentences.
• Desired target utterance is created at run time.
• Provides the greatest naturalness.
• The problem:
  • Storage.

Page 8: Text to Speech in Windows

Types

Concatenative.

Diphone synthesis.
• Speech database containing all the sound-to-sound transitions occurring in a language.
• Results are generally worse than those of unit-selection systems.
• Small size.

Domain-specific synthesis.
• Concatenates prerecorded words and phrases.
• It must consider every variation of each word.
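A minimal sketch of the diphone idea, assuming phones are given as plain strings and that "_" marks silence at the utterance boundaries (both are assumptions of this example, not something the slides specify). The hypothetical function toDiphones lists the sound-to-sound transitions that a diphone database would have to supply for one utterance.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Turn a phone sequence into the diphone units a synthesizer would look up.
// "_" stands for silence before and after the utterance (assumed convention).
std::vector<std::string> toDiphones(const std::vector<std::string>& phones) {
    std::vector<std::string> padded;
    padded.push_back("_");
    padded.insert(padded.end(), phones.begin(), phones.end());
    padded.push_back("_");

    // Each diphone spans the transition from one phone to the next.
    std::vector<std::string> diphones;
    for (std::size_t i = 0; i + 1 < padded.size(); ++i) {
        diphones.push_back(padded[i] + "-" + padded[i + 1]);
    }
    return diphones;
}
```

For the informal phone sequence h, e, l, o this yields five units: _-h, h-e, e-l, l-o, o-_. The recorded segment for each unit would then be concatenated, which is why the database only needs one entry per transition and stays small.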

Page 9: Text to Speech in Windows

Types

Formant synthesis.

• Does not use human speech samples at runtime.
• Output is created using additive synthesis (creating a sound by explicitly adding sinusoidal overtones together) and an acoustic model.
• Generates artificial, robotic-sounding speech.
• Can be reliably intelligible, avoiding the acoustic glitches.
• Small size.
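The additive synthesis step mentioned above can be sketched as a sum of sine waves. This is only an illustration: it leaves out the acoustic model entirely, and the names Partial and additiveSynth are hypothetical, as are any frequencies and amplitudes passed in.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// One sinusoidal overtone: its frequency and its linear amplitude.
struct Partial {
    double frequencyHz;
    double amplitude;
};

// Build a signal by explicitly adding the given sinusoids together.
std::vector<double> additiveSynth(const std::vector<Partial>& partials,
                                  double sampleRateHz,
                                  std::size_t sampleCount) {
    assert(sampleRateHz > 0.0);
    const double twoPi = 2.0 * std::acos(-1.0);
    std::vector<double> samples(sampleCount, 0.0);
    for (std::size_t n = 0; n < sampleCount; ++n) {
        const double t = static_cast<double>(n) / sampleRateHz;  // time in seconds
        for (const Partial& p : partials) {
            samples[n] += p.amplitude * std::sin(twoPi * p.frequencyHz * t);
        }
    }
    return samples;
}
```

A formant synthesizer would choose and vary the partials over time according to its acoustic model; because nothing is recorded, the only data needed are the parameters themselves, which is where the small size comes from.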

Page 10: Text to Speech in Windows

Where can we find speech synthesis systems?

• Allows people with visual impairments or reading disabilities to listen to any kind of text.
  • Clocks.
  • Dictionaries.
  • ATMs.
• Helping with proofreading and reducing eyestrain.
• Listening to some text when reading could be dangerous.
  • GPS.
• Reduce costs in telephone customer services.
• Give a voice to individuals who couldn’t speak at all.
  • Stephen Hawking.

Page 11: Text to Speech in Windows

What is Text to Speech?

• Speech synthesis application.
• Creates a spoken sound version of the text in a computer document, such as a help file or a Web page.
• TTS is often used with voice recognition programs.
• Current TTS applications include:
  • Voice-enabled e-mail.
  • Web pages.
  • RSS feeds.
  • OS screen readers.
  • Video games industry.

Page 12: Text to Speech in Windows

Text-To-Speech Systems

There are numerous TTS products available:

• Ivona Text-To-Speech
• Microsoft Speech Server
• TextSound 2.0
• Sayvoice
• Festival (Web test)
• Loquendo (Web test 1, Web test 2)
• Acapela (Web test)

Page 13: Text to Speech in Windows

Text-To-Speech Systems

Products involving hardware.

• Quick Link Pen from WizCom Technologies:
  • Scan and read words.

Page 14: Text to Speech in Windows

Software Development Kits

Microsoft Speech API.

• You can download it from Microsoft.
• Reduces the code required to use speech recognition and text-to-speech.
• Provides a high-level interface between an application and speech engines.
• Implements all the low-level details needed to control and manage the real-time operations of various speech engines.
• The two basic types of SAPI engines are:
  • Text-to-speech (TTS) systems.
  • Speech recognizers.

Page 15: Text to Speech in Windows

Software Development Kits

Microsoft Speech API

API for Text-to-Speech
• ISpVoice Component Object Model (COM) interface.
• ISpVoice::Speak to generate speech output from some text data.
• Several methods for changing voice and synthesis properties:
  • Speaking rate.
  • Output volume.
  • Change current speaking voice.
• Special controls can also change real-time synthesis properties:
  • Word emphasis.
  • Speaking rate.
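The properties listed above could be exercised roughly as follows. This is a sketch, not the presenter's code: the helper names clampRate, clampVolume, and speakWithSettings are hypothetical, error handling is minimal, and the SAPI part is wrapped in an _WIN32 guard only so that the file still compiles on machines without the Windows SDK. It relies on ISpVoice::SetRate accepting values from -10 to 10, ISpVoice::SetVolume accepting 0 to 100, and the SPF_IS_XML flag enabling inline markup.

```cpp
#include <cassert>

#ifdef _WIN32
#include <sapi.h>
#endif

// Keep a requested rate inside the range SetRate accepts (-10 to 10).
long clampRate(long rate) {
    if (rate < -10) return -10;
    if (rate > 10) return 10;
    return rate;
}

// Keep a requested volume inside the range SetVolume accepts (0 to 100).
unsigned short clampVolume(int volume) {
    if (volume < 0) return 0;
    if (volume > 100) return 100;
    return static_cast<unsigned short>(volume);
}

#ifdef _WIN32
// Speak `text` with the given rate and volume; returns true on success.
bool speakWithSettings(const wchar_t* text, long rate, int volume) {
    if (FAILED(::CoInitialize(nullptr))) return false;
    ISpVoice* voice = nullptr;
    HRESULT hr = ::CoCreateInstance(CLSID_SpVoice, nullptr, CLSCTX_ALL,
                                    IID_ISpVoice,
                                    reinterpret_cast<void**>(&voice));
    if (SUCCEEDED(hr)) {
        voice->SetRate(clampRate(rate));        // speaking rate
        voice->SetVolume(clampVolume(volume));  // output volume
        // SPF_IS_XML lets inline tags such as <emph> change synthesis
        // properties while the text is being spoken.
        hr = voice->Speak(text, SPF_IS_XML, nullptr);
        voice->Release();
    }
    ::CoUninitialize();
    return SUCCEEDED(hr);
}
#endif
```

On Windows, a call such as speakWithSettings(L"This is <emph>important</emph>.", 2, 80) would speak slightly faster than the default, at 80 percent volume, with emphasis on one word.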

Page 16: Text to Speech in Windows

Software Development Kits

Microsoft Speech API

Hello world example.
• Add the paths to the sapi.h and sapi.lib files. Directories:
  • C:\Program Files\Microsoft SDKs\Windows\v7.1\Include
    • Type #include <sapi.h> in your application.
  • C:\Program Files\Microsoft SDKs\Windows\v7.1\Lib
    • Add sapi.lib to the additional dependencies list.
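Once the project is configured as described, a hello world program might look like the sketch below. It follows the usual COM pattern (initialize COM, create the SpVoice object, call ISpVoice::Speak, release). The function name speakText and the empty-text check are additions for illustration, and the _WIN32 guard exists only so the file compiles where the Windows SDK is missing.

```cpp
#include <cassert>
#include <string>

#ifdef _WIN32
#include <sapi.h>
#endif

// Speak `text` with the default SAPI voice; returns true on success.
// Empty text is rejected up front. Outside Windows this always returns false.
bool speakText(const std::wstring& text) {
    if (text.empty()) return false;
#ifdef _WIN32
    if (FAILED(::CoInitialize(nullptr))) return false;
    ISpVoice* voice = nullptr;
    HRESULT hr = ::CoCreateInstance(CLSID_SpVoice, nullptr, CLSCTX_ALL,
                                    IID_ISpVoice,
                                    reinterpret_cast<void**>(&voice));
    if (SUCCEEDED(hr)) {
        // SPF_DEFAULT: plain text, spoken synchronously.
        hr = voice->Speak(text.c_str(), SPF_DEFAULT, nullptr);
        voice->Release();
    }
    ::CoUninitialize();
    return SUCCEEDED(hr);
#else
    return false;
#endif
}
```

Calling speakText(L"Hello world") from main is then enough to hear the default voice on a Windows machine with SAPI installed.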

Page 17: Text to Speech in Windows

Software Development Kits

Festival TTS.

• Is multi-lingual:
  • British English.
  • American English.
  • Spanish.
• It offers full text to speech through:
  • a shell-level command interpreter,
  • a C++ library,
  • Java.
• Uses the Edinburgh Speech Tools Library for low-level architecture.
• Is free software allowing unrestricted commercial and non-commercial use alike.

Page 18: Text to Speech in Windows

Software Development Kits

Festival TTS

Compiling Festival in Windows:
• Download and install the Cygwin development tools.
• Download Festival:
  • http://www.cstr.ed.ac.uk/downloads/festival/2.1/
• Unzip the files to a convenient location (C:\festival) using the tar command (do not use WinZip).
• Put the correct value for 'FESTIVAL_HOME' in the 'festival/config/config' file.
• Follow the next steps:

Page 19: Text to Speech in Windows

Software Development Kits

Festival TTS

Creating the library using the Cygwin shell:

1. Get into the speech_tools directory and type (in a bash shell):
   ./configure
   make VCMakefile
   make depend
   cp config/vc_config_make_rules-dist config/vc_config_make_rules

2. Get into the festival directory and type:
   ./configure
   make VCMakefile
   make depend
   cp config/vc_config_make_rules-dist config/vc_config_make_rules
   make -C src/modules init_modules.cc

Page 20: Text to Speech in Windows

Software Development Kits

Festival TTS

3. Edit:
   festival/config/vc_config_make_rules (SYSTEM_LIB)
   festival/config/config (FESTIVAL_HOME)

4. Using the Visual Studio shell:
   Execute VCVARSALL.bat
   cd c:\festival\speech_tools
   nmake /nologo /FVCMakefile
   cd c:\festival\festival
   nmake /nologo /FVCMakefile

Page 21: Text to Speech in Windows

Software Development Kits

Festival TTS.

Hello world Demo.
• Add the path to the Festival and Speech Tools files. Directories:
  • C:\festival\speech_tools\include
  • C:\festival\festival\src\include
  • C:\festival\speech_tools\lib
  • C:\festival\festival\src\lib
• Use #include “festival.h” in your program.
• Set these additional dependencies:
  • libFestival.lib
  • libestools.lib
  • libestbase.lib
  • libeststring.lib
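With the directories and libraries above in place, a hello world program could be sketched as follows. It uses festival_initialize and festival_say_text from Festival's C++ API. The wrapper name sayWithFestival, the empty-text check, and the HAVE_FESTIVAL guard are additions for illustration (the guard only lets the file compile where festival.h is not on the include path), and the heap size is an assumed value that may need adjusting for a given installation.

```cpp
#include <cassert>
#include <string>

// Include Festival only when its header can be found (illustrative guard).
#if defined(__has_include)
#  if __has_include("festival.h")
#    include "festival.h"
#    define HAVE_FESTIVAL 1
#  endif
#endif

// Speak `text` through Festival; returns true on success.
// Empty text is rejected up front. Without Festival this always returns false.
bool sayWithFestival(const std::string& text) {
    if (text.empty()) return false;
#ifdef HAVE_FESTIVAL
    static bool initialized = false;
    if (!initialized) {
        const int loadInitFiles = 1;  // load Festival's init files
        const int heapSize = 210000;  // Scheme heap size (assumed value)
        festival_initialize(loadInitFiles, heapSize);
        initialized = true;
    }
    return festival_say_text(text.c_str()) != 0;
#else
    return false;
#endif
}
```

Calling sayWithFestival("hello world") from main should then speak the text with the default voice, provided FESTIVAL_HOME points at a working installation.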

Page 22: Text to Speech in Windows

Software Development Kits

Festival TTS

Hello world Demo.

If you get strange errors, try:
• Adding these additional dependencies:
  • ws2_32.lib
  • winmm.lib
• Ignoring these libraries:
  • MSVCRTD.lib
  • MSVCPRTD.lib

Page 23: Text to Speech in Windows

Future Improvements

• Better (or more realistic) synthesized voice engines.
• Creating standards for electronic book files:
  • Content publishers,
  • Publishers of TTS software,
  • Manufacturers of digital book display devices,
  • Consumers who read electronic books.

Page 24: Text to Speech in Windows

Bibliography

Thierry Dutoit. An Introduction to Text-To-Speech Synthesis. Kluwer Academic Publishers. 1997. pp. 1 and 6.

Speech synthesis. http://en.wikipedia.org/wiki/Text-to-speech#cite_note-1

Text-To-Speech (TTS). http://searchmobilecomputing.techtarget.com/definition/text-to-speech

Wizcom Co. http://www.wizcomtech.com/eng/home/a/01/defaultpromo.asp

Microsoft Speech API 5.3. http://msdn.microsoft.com/en-us/library/ms720163%28v=VS.85%29.aspx

Project Meet. Using Text-To-Speech Technology Resource Guide. http://www.newbedford.k12.ma.us/edtech_toolkit/students/cast/index.htm

Page 25: Text to Speech in Windows

Do you have any questions?

Thank you