An Introduction to S3ML
Beijing InfoQuick SinoVoice Speech Technology Corp.
CHEN Ming, LV Shinan, LI Xiulin
Outline
Background
PinYin Support
<say-as> Definition
Domain Support
Conclusion
Background
SSML
  Speech Synthesis Markup Language
  http://www.w3.org/TR/speech-synthesis/
  Now a W3C Recommendation
SinoVoice
  A well-known speech technology and service provider
  Leading Chinese TTS technology and products
  Deployed in 1000+ real systems
Background
S3ML (SinoVoice SSML)
  In use since the launch of jTTS 4.0 in March 2004
  Based on the SSML specification
  Defines some extensions aimed at Chinese TTS
  Defines the details of some elements that SSML does not define precisely
  Provides maximum compatibility with the newest SSML version
PinYin Support
PinYin
  Phoneme annotation method for Chinese characters
<phoneme> in SSML
  The phoneme element provides a phonemic/phonetic pronunciation for the contained text.
  Two attributes: alphabet and ph
PinYin Support
alphabet
  The alphabet attribute is an optional attribute that specifies the phonemic/phonetic alphabet.
  Use ‘py’ as the value of ‘alphabet’ to specify that PinYin will be used
ph
  The ph attribute is a required attribute that specifies the phoneme/phone string.
  Use a PinYin string as the value of ‘ph’
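As an illustration of the two attributes, the following is a minimal Python sketch that emits a <phoneme> element with a PinYin annotation. It uses only the standard library; the helper name phoneme is hypothetical and is not part of S3ML or jTTS.

```python
import xml.etree.ElementTree as ET


def phoneme(text, ph, alphabet="py"):
    """Return a <phoneme> element as a string.

    alphabet defaults to "py" (PinYin); ph carries the PinYin string.
    The function name and signature are illustrative assumptions.
    """
    elem = ET.Element("phoneme", {"alphabet": alphabet, "ph": ph})
    elem.text = text
    # encoding="unicode" returns a str and keeps Chinese characters as-is
    return ET.tostring(elem, encoding="unicode")


print(phoneme("查", "zha1"))
```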
PinYin Support
More about the PinYin string
  Conforms to the “Chinese Mandarin PinYin Specification”
  A series of PinYin syllables for several characters
  Tone information
    1 to 4: high flat, rising, dipping and falling tone
    0, 5: light tone
Example
  <phoneme alphabet="py" ph="zha1"> 查 </phoneme> 良镛
  <phoneme alphabet="py" ph="zha1 liang2yong1">查良镛 </phoneme> 先生
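To make the shape of a PinYin string concrete, here is a small Python sketch that splits a ph value into (syllable, tone) pairs. It assumes every syllable ends in a tone digit from 0 to 5 and that whitespace between syllables is optional, as in "zha1 liang2yong1"; the function names and the strict error handling are assumptions of this sketch, not behavior documented for S3ML.

```python
import re

# One syllable: optional leading whitespace, letters, then a tone digit 0-5.
_SYLLABLE = re.compile(r"\s*([a-zü]+)([0-5])")


def parse_pinyin(ph):
    """Split a PinYin string into a list of (syllable, tone) pairs."""
    ph = ph.strip().lower()
    pairs = []
    pos = 0
    while pos < len(ph):
        match = _SYLLABLE.match(ph, pos)
        if match is None:
            raise ValueError(f"cannot parse PinYin at offset {pos}: {ph[pos:]!r}")
        pairs.append((match.group(1), int(match.group(2))))
        pos = match.end()
    return pairs


def is_light_tone(tone):
    """Tones 0 and 5 both denote the light tone."""
    return tone in (0, 5)


print(parse_pinyin("zha1 liang2yong1"))
```

A string with a missing tone digit, such as "zha", raises ValueError in this sketch; a real engine might instead fall back to a default tone.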
PinYin Support
What if a PinYin string is included in normal text?
  We think <phoneme> is not meant for this purpose; <say-as> is more suitable
Comparison with CSSML
  We think the <phoneme> extension in S3ML is more compatible with SSML
Next station is <say-as interpret-as="phoneme" format="py">di4 tan2</say-as>
<phoneme lang="zh-cn">zha1</phoneme> 良镛他姓 <phoneme py="zha1"> 查 </phoneme>
<say-as> Definition
The details of the <say-as> element
  When SinoVoice defined S3ML, SSML did not define the detailed values of this element's attributes.
  Now “SSML 1.0 say-as attribute values” has been proposed, but it is still in progress
  http://www.w3.org/TR/2005/NOTE-ssml-sayas-20050526/
  SinoVoice will support this proposal, so I will only talk about some additional values
<say-as> Definition
Name and address, especially person names, because of polyphonic Chinese characters
  <say-as interpret-as="name" format="person"> 张朝阳 </say-as>
  <say-as interpret-as="address"> 朝阳区 </say-as>
Math: some mathematical expressions can be confused with other information
  <say-as interpret-as="math">2005-12-13</say-as>
  <say-as interpret-as="math">+8610-62972997</say-as>
<say-as> Definition
Net address
  <say-as interpret-as="net" format="email">[email protected]</say-as>
  <say-as interpret-as="net" format="url"> http://www.sinovoice.com.cn</say-as>
Phoneme, useful for mixed character/phoneme text
  The pronunciation of ‘tomato’ is <say-as interpret-as="phoneme" format="ipa">tɒmɑtoʊ</say-as>
  Next station is <say-as interpret-as="phoneme" format="py">di4 tan2</say-as>
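The additional <say-as> values can be collected in a small table. The Python sketch below builds a <say-as> element and checks the interpret-as/format pair against the combinations that appear in these slides. The table is not exhaustive (values from the W3C say-as Note are left out), and rejecting unlisted values is a choice made for this sketch only; the names S3ML_SAY_AS and say_as are hypothetical.

```python
import xml.etree.ElementTree as ET

# interpret-as values and formats taken from the slide examples only.
S3ML_SAY_AS = {
    "name": {"person"},
    "address": set(),
    "math": set(),
    "net": {"email", "url"},
    "phoneme": {"py", "ipa"},
}


def say_as(text, interpret_as, fmt=None):
    """Return a <say-as> element as a string (illustrative helper)."""
    if interpret_as not in S3ML_SAY_AS:
        raise ValueError(f"unlisted interpret-as value: {interpret_as!r}")
    if fmt is not None and fmt not in S3ML_SAY_AS[interpret_as]:
        raise ValueError(f"unlisted format {fmt!r} for {interpret_as!r}")
    attrs = {"interpret-as": interpret_as}
    if fmt is not None:
        attrs["format"] = fmt
    elem = ET.Element("say-as", attrs)
    elem.text = text
    return ET.tostring(elem, encoding="unicode")


print("Next station is " + say_as("di4 tan2", "phoneme", "py"))
```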
Domain Support
Important for real systems
  Customized TTS is becoming more and more popular
  Better voice quality than the general version
One possibility in SSML
  Use the <voice> element and define special values of the ‘name’ attribute
  But this is not natural, because it is normal for the same name (voice library) to support several different domains
Domain Support
<domain> element
  The ‘name’ attribute is required; it specifies the customized TTS package to use
  The value of the ‘name’ attribute will be a vendor-specific name
  <domain> will not change the voice
  If a voice library does not support this domain, this element will simply be ignored.
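The "ignored if unsupported" rule can be sketched in Python as a walk over the document that tags each text run with the domain in effect. This is only an illustration: the un-namespaced <speak> root, the function name segments, and the handling of nested <domain> elements (an unsupported inner domain leaves the enclosing one active) are assumptions, not documented S3ML behavior.

```python
import xml.etree.ElementTree as ET


def segments(elem, supported, active=None):
    """Yield (text, domain) pairs in document order.

    supported is the set of domain names the current voice library
    provides; domain is None where general synthesis applies.
    """
    if elem.tag == "domain":
        name = elem.get("name")
        # An unsupported domain is ignored: the current setting stays active.
        if name in supported:
            active = name
    if elem.text and elem.text.strip():
        yield elem.text.strip(), active
    for child in elem:
        yield from segments(child, supported, active)
        # Text after a child belongs to the parent's domain setting.
        if child.tail and child.tail.strip():
            yield child.tail.strip(), active


doc = ET.fromstring(
    '<speak>Hello <domain name="weather">sunny</domain> bye '
    '<domain name="stock">up</domain></speak>'
)
print(list(segments(doc, {"weather"})))
```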
Domain Support
If we want the TTS system to select the best voice for this domain automatically
  Extended ‘domain’ attribute of <voice>
  ‘domain’ still has the lowest priority
<domain name="weather">今天白天,晴转多云,最高温度 26 度</domain>
<voice domain="weather">今天白天,晴转多云,最高温度 26 度</voice>
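One way to read "lowest priority" is that ‘domain’ only breaks ties among voices that match the other <voice> attributes equally well. The Python sketch below illustrates that reading. The voice records, the attribute set (name and gender only), and the scoring are all assumptions made for this example; they do not describe the actual jTTS selection algorithm.

```python
def pick_voice(voices, request):
    """Pick a voice; 'domain' is compared last, as a tie-breaker only.

    voices: list of dicts with 'name', 'gender' and 'domains' (a set).
    request: dict of requested <voice> attributes, possibly with 'domain'.
    """
    def score(voice):
        # Count matches on the ordinary attributes first.
        other = sum(
            1
            for key in ("name", "gender")
            if key in request and voice.get(key) == request[key]
        )
        # The domain match only matters when 'other' is equal.
        domain = request.get("domain") in voice.get("domains", ())
        return (other, domain)

    return max(voices, key=score)


voices = [
    {"name": "a", "gender": "female", "domains": {"news"}},
    {"name": "b", "gender": "female", "domains": {"weather"}},
    {"name": "c", "gender": "male", "domains": {"weather"}},
]
print(pick_voice(voices, {"gender": "female", "domain": "weather"})["name"])
```

With a request for a male voice in the "news" domain, this sketch still returns voice "c": the gender match outranks the domain match.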
Conclusion
Summary of the S3ML extensions
  <phoneme alphabet="py" ph="…">
  <say-as interpret-as="...">
    name / address / math / phoneme / net
  <domain name="…">
  <voice domain="…">
We hope it will be helpful in defining the standard for internationalizing SSML
Thank You!