An Introduction to S3ML

Page 1: An Introduction to S3ML

An Introduction to S3ML

Beijing InfoQuick SinoVoice Speech Technology Corp.

CHEN Ming, LV Shinan, LI Xiulin

Page 2: An Introduction to S3ML

Outline

Background
PinYin Support
<say-as> Definition
Domain Support
Conclusion

Page 3: An Introduction to S3ML

Background

SSML
  Speech Synthesis Markup Language
  http://www.w3.org/TR/speech-synthesis/
  Now a W3C Recommendation

SinoVoice
  Well-known speech technology and service provider
  Leading Chinese TTS technology and products
  Deployed in 1000+ real systems

Page 4: An Introduction to S3ML

Background

S3ML (SinoVoice SSML)
  In use since the launch of jTTS 4.0 in March 2004
  Based on the SSML specification
  Defines some extensions aimed at Chinese TTS
  Defines the details of some elements that SSML does not define precisely
  Provides maximum compatibility with the newest SSML version

Page 5: An Introduction to S3ML

PinYin Support

PinYin
  Phoneme annotation method for Chinese characters

<phoneme> in SSML
  The phoneme element provides a phonemic/phonetic pronunciation for the contained text.
  Two attributes: alphabet and ph

Page 6: An Introduction to S3ML

PinYin Support

alphabet
  The alphabet attribute is an optional attribute that specifies the phonemic/phonetic alphabet.
  Use 'py' as the value of 'alphabet' to specify that PinYin will be used.

ph
  The ph attribute is a required attribute that specifies the phoneme/phone string.
  Use a PinYin string as the value of 'ph'.

Page 7: An Introduction to S3ML

PinYin Support

Example
  <phoneme alphabet="py" ph="zha1">查</phoneme>良镛
  <phoneme alphabet="py" ph="zha1 liang2 yong1">查良镛</phoneme>先生

More about the PinYin string
  Conforms to the "Chinese Mandarin PinYin Specification"
  May hold a series of PinYin syllables for several characters
  Tone information
    1-4: high level, rising, dipping, and falling tones
    0, 5: light (neutral) tone
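The shape of a PinYin string can be illustrated with a small parser. This is only a sketch, not part of S3ML or jTTS: the helper name `parse_pinyin` is hypothetical, and it assumes syllables are separated by whitespace and each ends in a tone digit, as in `zha1 liang2 yong1`. Because 0 and 5 both mark the light tone, the sketch normalizes 0 to 5.

```python
import re
from typing import List, Tuple

# Hypothetical helper (not an S3ML or jTTS API): one syllable is a run of
# letters followed by a single tone digit from 0 to 5.
_SYLLABLE = re.compile(r"^([a-zü:]+)([0-5])$")


def parse_pinyin(ph: str) -> List[Tuple[str, int]]:
    """Split a PinYin string into (syllable, tone) pairs."""
    pairs = []
    for token in ph.split():
        match = _SYLLABLE.match(token.lower())
        if match is None:
            raise ValueError(f"not a PinYin syllable with a tone digit: {token!r}")
        tone = int(match.group(2))
        # Assumption: 0 and 5 both mean the light tone, so store one value.
        pairs.append((match.group(1), 5 if tone == 0 else tone))
    return pairs


print(parse_pinyin("zha1 liang2 yong1"))
# [('zha', 1), ('liang', 2), ('yong', 1)]
```

A real engine would also check each syllable against the PinYin specification; this sketch only checks the surface form.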

Page 8: An Introduction to S3ML

PinYin Support

What if a PinYin string is included in normal text?
  We think <phoneme> is not meant for this purpose; <say-as> is more suitable.
  Next station is <say-as interpret-as="phoneme" format="py">di4 tan2</say-as>

Comparison with CSSML
  We think the <phoneme> extension in S3ML is more compatible with SSML.
  <phoneme lang="zh-cn">zha1</phoneme>良镛
  他姓<phoneme py="zha1">查</phoneme>

Page 9: An Introduction to S3ML

<say-as> Definition

The details of the <say-as> element
  When SinoVoice defined S3ML, the values of this element's attributes were not defined in detail in SSML.
  "SSML 1.0 say-as attribute values" has now been proposed, but it is still in progress.
  http://www.w3.org/TR/2005/NOTE-ssml-sayas-20050526/
  SinoVoice will support this proposal, so I will only talk about some additional values.

Page 10: An Introduction to S3ML

<say-as> Definition

Name and address, especially person names, because of polyphonic Chinese characters
  <say-as interpret-as="name" format="person">张朝阳</say-as>
  <say-as interpret-as="address">朝阳区</say-as>

Math: some mathematical expressions can be confused with other kinds of information
  <say-as interpret-as="math">2005-12-13</say-as>
  <say-as interpret-as="math">+8610-62972997</say-as>

Page 11: An Introduction to S3ML

<say-as> Definition

Net address
  <say-as interpret-as="net" format="email">[email protected]</say-as>
  <say-as interpret-as="net" format="url">http://www.sinovoice.com.cn</say-as>

Phoneme, useful for character/phoneme mixed text
  The pronunciation of 'tomato' is <say-as interpret-as="phoneme" format="ipa">t&#x252;m&#x251;to&#x28A;</say-as>
  Next station is <say-as interpret-as="phoneme" format="py">di4 tan2</say-as>
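One way a processor might separate normal text from embedded phoneme strings is sketched below. It is an illustration only: the function name `split_mixed_text` is hypothetical, and the `<speak>` wrapper is added just so the fragment parses as a single XML document.

```python
import xml.etree.ElementTree as ET

# The <speak> wrapper is an assumption made so the fragment is well-formed XML.
FRAGMENT = (
    "<speak>Next station is "
    '<say-as interpret-as="phoneme" format="py">di4 tan2</say-as>'
    "</speak>"
)


def split_mixed_text(xml_text):
    """Return (kind, text) segments in document order.

    kind is 'text' for normal text, or 'phoneme:<format>' for the content of
    <say-as interpret-as="phoneme" format="...">.
    """
    root = ET.fromstring(xml_text)
    segments = []
    if root.text and root.text.strip():
        segments.append(("text", root.text.strip()))
    for child in root:
        if child.tag == "say-as" and child.get("interpret-as") == "phoneme":
            kind = "phoneme:" + child.get("format", "")
        else:
            kind = "text"
        content = "".join(child.itertext()).strip()
        if content:
            segments.append((kind, content))
        # Text that follows the child element belongs to the parent.
        if child.tail and child.tail.strip():
            segments.append(("text", child.tail.strip()))
    return segments


print(split_mixed_text(FRAGMENT))
# [('text', 'Next station is'), ('phoneme:py', 'di4 tan2')]
```

Segments tagged `phoneme:py` could then go to a PinYin parser, while `text` segments go through normal text analysis.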

Page 12: An Introduction to S3ML

Domain Support

Important for real systems
  Customized TTS is becoming more and more popular
  Better voice quality than the general version

One possibility in SSML
  Use the <voice> element and define special values of the 'name' attribute
  But this is not natural, because it is normal to support several different domains under the same name (voice library)

Page 13: An Introduction to S3ML

Domain Support

<domain> element
  The 'name' attribute is required; it specifies the customized TTS package to use
  The value of the 'name' attribute is a vendor-specific name
  <domain> does not change the voice
  If a voice library does not support this domain, the element is simply ignored
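The "simply ignored" rule can be sketched as follows. Everything here is hypothetical: the `SUPPORTED_DOMAINS` table, the voice library names, and the function `resolve_domains` are made up for illustration, and the sample text is an English placeholder.

```python
import xml.etree.ElementTree as ET

# Hypothetical table: which domain packages each voice library ships with.
SUPPORTED_DOMAINS = {"voice_a": {"weather"}, "voice_b": set()}

# The <speak> wrapper and the English sample text are assumptions.
DOC = '<speak><domain name="weather">Sunny turning cloudy</domain></speak>'


def resolve_domains(xml_text, voice_library):
    """Return (domain_or_None, text) pairs for the children of the root.

    A <domain> whose name the current voice library lacks is ignored: its
    text is still spoken, only without the customized package. The voice
    library itself is never changed here.
    """
    supported = SUPPORTED_DOMAINS.get(voice_library, set())
    plan = []
    for child in ET.fromstring(xml_text):
        text = "".join(child.itertext()).strip()
        name = child.get("name") if child.tag == "domain" else None
        plan.append((name if name in supported else None, text))
    return plan


print(resolve_domains(DOC, "voice_a"))  # [('weather', 'Sunny turning cloudy')]
print(resolve_domains(DOC, "voice_b"))  # [(None, 'Sunny turning cloudy')]
```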

Page 14: An Introduction to S3ML

Domain Support

If we want the TTS system to select the best voice for this domain automatically
  Use the extended 'domain' attribute of <voice>
  'domain' still has the lowest priority

<domain name="weather">今天白天,晴转多云,最高温度 26 度</domain>
<voice domain="weather">今天白天,晴转多云,最高温度 26 度</voice>
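A minimal sketch of what "lowest priority" could mean during voice selection follows. The voice catalog, its attribute names, and the function `select_voice` are all hypothetical. The ranking scheme (a gender match decided first, domain used only as a tie-breaker) is an assumption for illustration, not the documented jTTS algorithm.

```python
# Hypothetical voice catalog; the names and attributes are made up.
VOICES = [
    {"name": "voice_a", "gender": "female", "domains": {"weather"}},
    {"name": "voice_b", "gender": "female", "domains": set()},
    {"name": "voice_c", "gender": "male", "domains": {"weather", "stock"}},
]


def select_voice(voices, gender=None, domain=None):
    """Pick a voice name, treating `domain` as the lowest-priority criterion.

    Each voice is ranked by a tuple compared left to right, so the gender
    match is decided first and domain support only breaks remaining ties.
    """
    def rank(voice):
        gender_ok = gender is None or voice["gender"] == gender
        domain_ok = domain is not None and domain in voice["domains"]
        return (gender_ok, domain_ok)

    # max() returns the first maximal item, so catalog order settles exact ties.
    return max(voices, key=rank)["name"]


print(select_voice(VOICES, gender="female", domain="weather"))  # voice_a
print(select_voice(VOICES, gender="female", domain="stock"))    # voice_a
```

In the second call only the male voice supports the requested domain, yet a female voice is still chosen, because the domain never overrides a higher-priority attribute.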

Page 15: An Introduction to S3ML

Conclusion

Summary of the S3ML extensions
  <phoneme alphabet="py" ph="…">
  <say-as interpret-as="...">
    name / address / math / phoneme / net
  <domain name="…">
  <voice domain="…">

We hope this will be helpful in defining a standard for internationalizing SSML.

Page 16: An Introduction to S3ML

Thank You!