VoiceXML Overview

James A. Larson, Intel Corporation

(c) 2007 Larson Technical Services

Outline

•  Motivation for VoiceXML
•  W3C Speech Interface Framework Languages
•  Dialog: VoiceXML 2.0
•  Speech Synthesis: SSML
•  Grammars: SRGS
•  Semantic Interpretation: SI
•  VoiceXML 2.1

VoiceXML in the Marketplace

•  VoiceXML 2.0 is now ratified as a Recommendation (i.e., an official standard) by the W3C

•  Hundreds of millions of VoiceXML calls are answered every day

VoiceXML is the standard for building speech-enabled applications


Motivation for Speech Applications

• Users access Web sites from any telephone, anywhere, any time.

• Speaking and listening are the natural usage modes for phones.


Strength of VoiceXML Applications

•  Traditional system-directed dialogs for novice users

•  Mixed initiative dialogs for experienced users

•  Novice users smoothly become experienced users at their own pace

Limitations of VoiceXML Applications

•  No special analysis of speech input
   –  Not suitable for training speech skills: reading, ESL, singing, etc.

•  VUI conversational bandwidth is lower than GUI conversational bandwidth
   –  Using a VUI is like drinking from Lake Superior with a straw

Exercise 1

•  Name or describe a speech application you could use at work.

•  Name or describe a speech application you or a family member could use at home.

XML

•  XML = eXtensible Markup Language

•  Elements are surrounded by tags

    <prompt> Welcome to the voice system </prompt>

•  Elements may be nested

    <prompt>
      Welcome to Ajax Travel <break/>
      we have the cheapest fares
    </prompt>

•  Elements may have attributes

    <choice next="#boat">
    <grammar type="application/srgs+xml" version="1.0"
             root="by_boat" src="boat.grxml">

•  Because "<", ">", and "&" have special meanings in XML, write

    "&lt;" in place of "<"
    "&gt;" in place of ">"
    "&amp;" in place of "&"
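The escaping rule matters most inside attribute values that hold expressions. The fragment below is a minimal sketch of that case; the variable name age and the prompt wording are invented for illustration and do not come from the slides.

```xml
<!-- Hypothetical fragment: "age" is an assumed variable declared elsewhere. -->
<block>
  <!-- "&amp;" is read by the XML parser as a plain ampersand. -->
  <prompt> Welcome to Smith &amp; Jones Travel. </prompt>
  <!-- A literal "<" is not allowed inside an attribute value, so use "&lt;". -->
  <if cond="age &lt; 18">
    <prompt> You must be 18 or older to book by phone. </prompt>
  </if>
</block>
```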


[Figure: system architecture. Labeled components: Web Server (HTML scripts, VoiceXML scripts, grammars, audio files, multimedia files, documents), Database Server (DB), Web Browser, Voice Browser, and Speech Server/Gateway (voice capture, ASR, DTMF, audio replay, TTS).]

W3C Speech Interface Framework

[Figure: the languages of the framework. Labeled components: VoiceXML 2.0, Speech Synthesis, Grammar, Semantic Interpretation, Call Control, Other.]

Status of W3C Speech Interface Languages

[Figure: chart placing each language at a W3C maturity stage. Languages: VoiceXML 2.0, Grammar (SRGS), Synthesis (SSML), Call Control (CCXML), Semantic Interpretation (SISR), VoiceXML 2.1, V3. Stages, from earliest to latest: Requirements, Working Draft, Last Call Working Draft, Candidate Recommendation, Proposed Recommendation, Recommendation.]


Example of VoiceXML 2.0 Fragment

    <?xml version="1.0"?>
    <vxml version="2.0">
      <form>
        …
        <field name="account">
          <prompt>
            Which account <break/>
            <emphasis> savings </emphasis> or <emphasis> checking </emphasis>
          </prompt>
          <grammar type="application/srgs+xml" root="account_type" mode="voice">
            <rule id="account_type">
              <one-of>
                <item> savings </item>
                <item> checking </item>
                <item> CD </item>
                <item> certificate of deposit <tag>$ = "CD"</tag> </item>
              </one-of>
            </rule>
          </grammar>
        </field>
        …
      </form>
      …
    </vxml>

The fragment combines four languages:

•  Dialog Language (VoiceXML 2.0)
•  Speech Synthesis Markup Language (SSML)
•  Speech Recognition Grammar Specification (SRGS)
•  Semantic Interpretation (SI)


Variant of the same fragment in which the <tag> assigns a named property:

    <item> certificate of deposit <tag>new.account = "CD"</tag> </item>

VoiceXML 2.0 features

•  Menus, forms, sub-dialogs
   –  <menu>, <form>, <subdialog>
•  Inputs
   –  Speech recognition <grammar>
   –  Recording <record>
   –  Keypad <grammar mode="dtmf">
•  Output
   –  Audio files <audio>
   –  Text-to-speech <prompt>
•  Variables
   –  <var> <script> <assign>
•  Events
   –  <nomatch>, <noinput>, <help>, <catch>, <throw>
•  Transition and submission
   –  <goto>, <submit>
•  Telephony
   –  Connection control
   –  <transfer>, <disconnect>
   –  Telephony information
•  Platform
   –  Objects
•  Performance
   –  Fetch

Typical Form Fill-In

    <form>
      <block>
        <prompt> Welcome to the electronic payment system. </prompt>
      </block>
      <field name="card_number">
        <prompt> Please enter your credit card number. </prompt>
        <grammar src="http://www.ajax.com/credit_card_number.grxml"/>
      </field>
      <field name="date">
        <prompt> Please enter your expiration date. </prompt>
        <grammar src="http://www.ajax.com/credit_card_date.grxml"/>
      </field>
    </form>
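To show where such a form sits in a complete document, here is a minimal sketch that wraps it in a <vxml> root and adds a <filled> block that submits the collected values. The form id payment and the submit URL http://www.ajax.com/pay are assumptions made for this example only.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <!-- "payment" is a hypothetical form id. -->
  <form id="payment">
    <block>
      <prompt> Welcome to the electronic payment system. </prompt>
    </block>
    <field name="card_number">
      <prompt> Please enter your credit card number. </prompt>
      <grammar type="application/srgs+xml"
               src="http://www.ajax.com/credit_card_number.grxml"/>
    </field>
    <field name="date">
      <prompt> Please enter your expiration date. </prompt>
      <grammar type="application/srgs+xml"
               src="http://www.ajax.com/credit_card_date.grxml"/>
    </field>
    <!-- Runs once both fields have been filled. -->
    <filled mode="all" namelist="card_number date">
      <!-- Assumed server endpoint; not part of the original slides. -->
      <submit next="http://www.ajax.com/pay" method="post"
              namelist="card_number date"/>
    </filled>
  </form>
</vxml>
```

The interpreter visits the fields in document order, so the caller hears the welcome, then the card number prompt, then the date prompt.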

Exercise 2
Capture "birth date"

    <form>
      <block>
        <prompt> _____________________ </prompt>
      </block>
      <field name="month">
        <prompt> _______________________________ </prompt>
        <grammar src="http://www.ajax.com/month.grxml"/>
      </field>
      <field name="day">
        <prompt> ______________________________ </prompt>
        <grammar src="http://www.ajax.com/day.grxml"/>
      </field>
      <field name="year">
        <prompt> ______________________________ </prompt>
        <grammar src="http://www.ajax.com/year.grxml"/>
      </field>
    </form>

Event Handlers

•  Deal with exceptional or error conditions
•  Control mechanism for dialog turn retries
   –  <catch event="noinput"> … </catch>
   –  <catch event="nomatch"> … </catch>
   –  <catch event="help"> … </catch>
•  Shorthand notation available
   –  <noinput> … </noinput>, etc.
•  Scoped according to where they occur
   –  <form>, <field>, etc.
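As one way to picture retry control, the sketch below uses the shorthand handlers with the count attribute so that the response changes as the caller keeps failing. The prompt wording is invented; the grammar URL is the one used elsewhere in these slides.

```xml
<field name="month">
  <prompt> What month were you born? </prompt>
  <grammar type="application/srgs+xml"
           src="http://www.ajax.com/month.grxml"/>

  <!-- First silence: apologize, then replay the field prompt. -->
  <noinput count="1">
    <prompt> Sorry, I did not hear you. </prompt>
    <reprompt/>
  </noinput>

  <!-- Second and later silences: give more guidance instead. -->
  <noinput count="2">
    <prompt> Please say the name of a month, for example, March. </prompt>
  </noinput>

  <!-- Third and later unrecognized answers: stop retrying. -->
  <nomatch count="3">
    <prompt> I am still having trouble understanding. Goodbye. </prompt>
    <exit/>
  </nomatch>
</field>
```

Because these handlers are written inside the <field>, they apply only to that field. The first two nomatch events are not covered here and fall through to handlers in an enclosing scope or to the platform defaults.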

Adding Event Handlers

    <form>
      <block> <prompt> When were you born? </prompt> </block>
      <field name="month">
        <catch event="noinput"> … </catch>
        <catch event="nomatch"> … </catch>
        <prompt> What month? </prompt>
        <grammar src="http://www.ajax.com/month.grxml"/>
      </field>
      …
    </form>


Default Event Handlers

    <catch event="help">
      <prompt> Sorry, no help is available. </prompt>
    </catch>

    <catch event="nomatch">
      <prompt> I did not understand, please try again </prompt>
    </catch>

    <catch event="noinput">
      <prompt> I did not hear anything, please speak again </prompt>
    </catch>

Exercise 3
Write event handlers for the month field

    <catch event="help">
      <prompt> ____________________ </prompt>
    </catch>

    <catch event="nomatch">
      <prompt> __________________________ </prompt>
    </catch>

    <catch event="noinput">
      <prompt> ___________________________________ </prompt>
    </catch>


Speech Synthesis ML

Processing stages, in order: Structure Analysis, Text Normalization, Text-to-Phoneme Conversion, Prosody Analysis, Waveform Production

•  Structure Analysis
   –  Markup support: p, s
   –  Non-markup behavior: infer structure by automated text analysis

Before and after Structure Analysis

•  Before structure analysis
   –  Dr. Smith lives at 214 Elm Dr. He weighs 214 lb. He plays bass guitar. He also likes to fish; last week he caught a 19 lb. bass.

•  After structure analysis

    <p>
      <s> Dr. Smith lives at 214 Elm Dr. </s>
      <s> He weighs 214 lb. </s>
      <s> He plays bass guitar. </s>
      <s> He also likes to fish; last week he caught a 19 lb. bass. </s>
    </p>

Speech Synthesis ML

•  Text Normalization
   –  Markup support: say-as for dates, times, etc.; sub for aliasing
   –  Non-markup behavior: automatically identify and convert constructs

After Text Normalization

    <p>
      <s> <sub alias="doctor">Dr.</sub> Smith lives at 214 Elm <sub alias="drive">Dr.</sub> </s>
      <s> He weighs 214 <sub alias="pounds">lb.</sub> </s>
      <s> He plays bass guitar. </s>
      <s> He also likes to fish; last week he caught a 19 <sub alias="pound">lb.</sub> bass. </s>
    </p>

Speech Synthesis ML

•  Text-to-Phoneme Conversion
   –  Markup support: phoneme, say-as
   –  Non-markup behavior: look up in pronunciation dictionary

After text-to-phoneme conversion

    <p>
      <s> <sub alias="doctor">Dr.</sub> Smith lives at <say-as interpret-as="address">214</say-as> Elm <sub alias="drive">Dr.</sub> </s>
      <s> He weighs <say-as interpret-as="number">214</say-as> <sub alias="pounds">lb.</sub> </s>
      <s> He plays <phoneme alphabet="IPA" ph="b@s">bass</phoneme> guitar. </s>
      <s> He also likes to fish; last week he caught a <say-as interpret-as="number">19</say-as> <sub alias="pound">lb.</sub> <phoneme alphabet="IPA" ph="bas">bass</phoneme>. </s>
    </p>

Speech Synthesis ML

•  Prosody Analysis
   –  Markup support: emphasis, break, prosody
   –  Non-markup behavior: automatically generate prosody through analysis of document structure and sentence syntax

Prosody Analysis (Initial text)

    <prompt>
      Environmental control menu. Do you want to adjust the lighting or temperature?
    </prompt>

Prosody Analysis

    <prompt>
      Environmental control menu <break/>
      <emphasis level="reduced"> do you want to adjust the </emphasis>
      <emphasis level="strong"> lighting </emphasis> <break/>
      or <emphasis level="strong"> temperature? </emphasis>
    </prompt>
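The prosody element is listed as markup support for this stage but is not shown on the slide. The sketch below is one possible use of it on the same prompt; the 500 ms pause and the slow rate are arbitrary values chosen for illustration.

```xml
<prompt>
  Environmental control menu.
  <!-- An explicit pause length instead of a default break. -->
  <break time="500ms"/>
  <!-- Slow the speaking rate for the question so the options stand out. -->
  <prosody rate="slow">
    Do you want to adjust the
    <emphasis level="strong"> lighting </emphasis>
    or the
    <emphasis level="strong"> temperature? </emphasis>
  </prosody>
</prompt>
```

How strongly a synthesizer honors these hints varies by platform, so the result is worth checking by ear.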

Speech Synthesis ML

•  Waveform Production
   –  Markup support: voice, audio (audio icons, branding, advertising)

Waveform Production

    <prompt>
      <audio src="http://www.example.com/adjust.wav">
        <desc> Environmental control menu. Do you want to adjust the lighting or temperature </desc>
      </audio>
    </prompt>

Exercise 4 (insert SSML commands)

    <prompt>
      Welcome to Ajax Bank do you want to withdraw or deposit funds?
    </prompt>



Grammars

•  Describe what the user may say at a point in the dialog

•  Enable the speech recognition engine to work faster and more accurately

•  Consist of one or more “rules”

Example Grammar

    <grammar type="application/srgs+xml" root="zero_to_ten" mode="voice">
      <rule id="zero_to_ten">
        <one-of>
          <item> zero </item>
          <item> <ruleref uri="#single_digit"/> </item>
          <item> ten </item>
        </one-of>
      </rule>
      <rule id="single_digit">
        <one-of>
          <item> one </item> <item> two </item> <item> three </item>
          <item> four </item> <item> five </item> <item> six </item>
          <item> seven </item> <item> eight </item> <item> nine </item>
        </one-of>
      </rule>
    </grammar>

•  XML form of grammars

•  root = "zero_to_ten": the grammar processor should start with the "zero_to_ten" rule

•  mode = "voice": this is a grammar used by the speech recognizer. (There may also be grammars for DTMF recognizers.)
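For comparison, a DTMF grammar can be written the same way by setting mode to "dtmf" and listing key presses as items. This is a minimal sketch; the rule name single_key is invented for the example.

```xml
<!-- Hypothetical inline grammar accepting one keypad digit. -->
<grammar type="application/srgs+xml" version="1.0"
         root="single_key" mode="dtmf">
  <rule id="single_key">
    <one-of>
      <item> 0 </item> <item> 1 </item> <item> 2 </item>
      <item> 3 </item> <item> 4 </item> <item> 5 </item>
      <item> 6 </item> <item> 7 </item> <item> 8 </item>
      <item> 9 </item>
    </one-of>
  </rule>
</grammar>
```

A field can carry both a voice grammar and a DTMF grammar so that callers may either speak or press a key.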

•  The "single_digit" rule describes the single digits one through nine

•  The "zero_to_ten" rule describes the numbers zero through ten

•  <one-of> describes alternatives

•  The <ruleref> element references another rule

Exercise 5

Write a grammar that recognizes the numbers zero to nineteen

More Grammar Elements

•  Repeat and optional

    <rule id="goodness" scope="public">
      <item repeat="0-3"> very </item>
      good
    </rule>

•  Sequence

    <rule id="twenty_thru_twentynine">
      twenty <ruleref uri="#single_digit"/>
    </rule>

•  Garbage

    <rule id="James_Lewis">
      <item> James <ruleref special="GARBAGE"/> Lewis </item>
    </rule>
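These elements can be combined in a single rule. The sketch below mixes a sequence, a rule reference, and an optional word written as repeat="0-1". The rule name order_drink and the referenced rule drink are assumptions; the sketch presumes a drink rule is defined in the same grammar.

```xml
<!-- Hypothetical rule: matches "I would like a coke" with an optional "please". -->
<rule id="order_drink" scope="public">
  <!-- Sequence: these words must be spoken in this order. -->
  I would like a
  <!-- Assumes a rule with id "drink" exists in the same grammar. -->
  <ruleref uri="#drink"/>
  <!-- Optional: repeat="0-1" allows the word zero or one time. -->
  <item repeat="0-1"> please </item>
</rule>
```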

Reusing existing grammars

    <grammar type="application/srgs+xml"
             root="size" src="http://www.example.com/size.grxml"/>


Semantic Interpretation

•  Semantic Interpretation defines how to extract and modify the results returned by the speech recognition engine

•  Semantic interpretation instructions are contained in the <tag> element

•  Two kinds of syntax for <tag> contents:
   –  Semantic Literals (literal values)
   –  Semantic Scripts (ECMAScript)

Semantic Interpretation

•  Semantic Literals example:

    <rule id="drink">
      <one-of>
        <item> coca cola <tag> coke </tag> </item>
        <item> cola <tag> coke </tag> </item>
        <item> black fizzy stuff <tag> coke </tag> </item>
        <item> coke </item>
      </one-of>
    </rule>

•  Default assignment: the item <item> coke </item> has no <tag>, so its own text is used as the result

No Semantic Interpretation

[Figure: the ASR, using a grammar with semantic interpretation scripts, outputs the text "fourteen". With no semantic script for that item, the Semantic Interpretation Processor passes "fourteen" unchanged to the VoiceXML Interpreter as the ECMAScript object.]

Page 57: Intro to VXML Jim Larson - Brandeiscs136a/CS136a_docs/... · – Dr. Smith lives at 214 Elm Dr. He weights 214 lb. He plays bass guitar. He also likes to fish; last week he caught

(c) 2007 Larson Technical Services 57

Semantic Interpretation

ASR

Grammar with Semantic

Interpretation Scripts

VoiceXML Interpreter

text

fourteen

<item> fourteen <tag>new.quantity=“14”;</tag> </item>

ECMAScript object

Semantic Interpretation

Processor

Page 58: Intro to VXML Jim Larson - Brandeiscs136a/CS136a_docs/... · – Dr. Smith lives at 214 Elm Dr. He weights 214 lb. He plays bass guitar. He also likes to fish; last week he caught

(c) 2007 Larson Technical Services 58

Semantic Interpretation

ASR

Grammar with Semantic

Interpretation Scripts

VoiceXML Interpreter

text

fourteen fourteen

{ quantity: “14” }

<item> fourteen <tag>new.quantity=“14”;</tag> </item>

ECMAScript object

Semantic Interpretation

Processor
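The flow in the diagram (text in, ECMAScript object out) can be sketched in a few lines. This is an illustration under stated assumptions, not a real processor: tagScripts and semanticInterpretation are hypothetical names, and the tag body is modeled as a function that fills an object named out, the variable name used by the deck's script examples.

```javascript
// Hypothetical stand-in for the Semantic Interpretation Processor.
// tagScripts maps a recognized phrase to the body of its <tag> element;
// the single entry mirrors the "fourteen" item on the slide.
const tagScripts = new Map([
  ["fourteen", (out) => { out.quantity = "14"; }],
]);

// Takes the text produced by the ASR and returns what the VoiceXML
// Interpreter would receive.
function semanticInterpretation(text) {
  const script = tagScripts.get(text);
  if (script === undefined) {
    return text; // no semantic script: the text passes through unchanged
  }
  const out = {};
  script(out); // run the <tag> body against a fresh result object
  return out;
}

console.log(JSON.stringify(semanticInterpretation("fourteen"))); // {"quantity":"14"}
console.log(semanticInterpretation("fifteen"));                  // fifteen
```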

Semantic Interpretation

•  Semantic Scripts employ ECMAScript

•  Advantages:
   –  Richer structure (objects)
   –  Ability to perform computations

Semantic Interpretation

•  Example grammar rule with Script Syntax:

<rule id="action">
  <one-of>
    <item> small <tag> out.size = "small"; </tag> </item>
    <item> medium <tag> out.size = "medium"; </tag> </item>
    <item> large <tag> out.size = "large"; </tag> </item>
  </one-of>
  <one-of>
    <item> green <tag> out.color = "green"; </tag> </item>
    <item> blue <tag> out.color = "blue"; </tag> </item>
    <item> white <tag> out.color = "white"; </tag> </item>
  </one-of>
</rule>

•  Utterance: "Large white"

•  ECMAScript structure:

action: { size: "large", color: "white" }
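To see how the two <one-of> groups build one object, the rule can be imitated in plain ECMAScript. This is a minimal sketch, assuming the utterance is exactly one size word followed by one color word; actionRule and interpretAction are hypothetical names, not part of SRGS or of any speech platform.

```javascript
// Each entry mirrors one <one-of> in the "action" rule: the property its
// <tag> scripts assign on `out`, and the words that group accepts.
const actionRule = [
  { property: "size", words: ["small", "medium", "large"] },
  { property: "color", words: ["green", "blue", "white"] },
];

// Hypothetical helper: matches one word per <one-of>, in order, and
// returns the resulting object, or null if the utterance does not match.
function interpretAction(utterance) {
  const spoken = utterance.trim().toLowerCase().split(/\s+/);
  if (spoken.length !== actionRule.length) {
    return null;
  }
  const out = {};
  for (let i = 0; i < actionRule.length; i++) {
    const { property, words } = actionRule[i];
    if (!words.includes(spoken[i])) {
      return null;
    }
    out[property] = spoken[i]; // same effect as e.g. out.size = "large";
  }
  return out;
}

console.log(JSON.stringify(interpretAction("Large white")));
// {"size":"large","color":"white"}
```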

Semantic Interpretation

•  Example grammar rule with Script Syntax:

<rule id="calculator">
  What is
  <ruleref uri="#digit"/>
  <tag> $.total = $digit; </tag>
  <item repeat="1-">
    plus
    <ruleref uri="#digit"/>
    <tag> $.total = $.total + $digit; </tag>
  </item>
</rule>

•  Here $ refers to the calculator rule's own result and $digit to the value returned by the digit rule

•  Utterance: "What is 1 + 2 + 3?"

•  ECMAScript structure:

calculator: { total: 6 }
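The running total in the calculator rule can be traced with a short ECMAScript sketch. The deck does not show the digit rule, so the digitRule table below (spoken words one through nine returning numbers) is an assumption, and interpretCalculator is a hypothetical helper rather than a platform API.

```javascript
// Assumed contents of the #digit rule: spoken word -> numeric value.
const digitRule = new Map([
  ["one", 1], ["two", 2], ["three", 3], ["four", 4], ["five", 5],
  ["six", 6], ["seven", 7], ["eight", 8], ["nine", 9],
]);

// Hypothetical helper that follows the "calculator" rule step by step.
// Returns { total: n }, or null if the utterance does not match.
function interpretCalculator(utterance) {
  const words = utterance.toLowerCase().replace(/[?.]/g, "").trim().split(/\s+/);
  // The rule starts with the words "what is" followed by one digit.
  if (words[0] !== "what" || words[1] !== "is" || !digitRule.has(words[2])) {
    return null;
  }
  const out = { total: digitRule.get(words[2]) }; // first <tag>
  // <item repeat="1-">: one or more "plus <digit>" pairs must follow.
  const rest = words.slice(3);
  if (rest.length === 0 || rest.length % 2 !== 0) {
    return null;
  }
  for (let i = 0; i < rest.length; i += 2) {
    if (rest[i] !== "plus" || !digitRule.has(rest[i + 1])) {
      return null;
    }
    out.total = out.total + digitRule.get(rest[i + 1]); // repeated <tag>
  }
  return out;
}

console.log(JSON.stringify(interpretCalculator("What is one plus two plus three?")));
// {"total":6}
```

Because the digit values are numbers, the + in the repeated tag adds them; if the digit rule returned strings, the same script would concatenate instead.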

Exercise 6

Fill in the contents of <tag>

•  Utterance: "From savings to checking"

•  Grammar rule:

<rule id="transfer">
  from
  <one-of>
    <item> savings <tag> ________________________ </tag> </item>
    <item> checking <tag> ________________________ </tag> </item>
  </one-of>
  to
  <one-of>
    <item> savings <tag> ________________________ </tag> </item>
    <item> checking <tag> ________________________ </tag> </item>
  </one-of>
</rule>

•  ECMAScript structure:

transfer: { source_account: "savings", target_account: "checking" }

VoiceXML 2.1

•  VoiceXML's success and popularity resulted in many implementations early in the standardization process

•  Additional, innovative features were conceived after the content of VoiceXML 2.0 was agreed

•  Goals of VoiceXML 2.1:
   –  Ensure portability by specifying a set of commonly implemented extensions
   –  Remain backwards-compatible with VoiceXML 2.0
   –  Follow a "fast track" to standardization

VoiceXML 2.1

•  Standardized extensions:
   –  Locate barge-in occurrences within prompts
   –  Access recognition utterances for analysis
   –  Increase performance by reducing server round-trips
   –  Extended call transfer types

Summary

•  W3C Speech Interface Framework
   –  Dialog: VoiceXML
   –  Grammar: SRGS
   –  Synthesis: SSML
   –  Semantic Interpretation: SI
   –  Call Control: CCXML

•  Can work together or separately

•  See http://www.w3.org/voice/ for details

Industry Organizations

•  World Wide Web Consortium – http://www.w3.org

•  W3C Voice Browser Working Group – http://www.w3.org/voice/

•  W3C Multi-Modal Working Group – http://www.w3.org/2002/mmi/

•  VoiceXML Forum – http://www.voicexml.org

•  SALT Forum – http://www.saltforum.org

•  Speech Technology Magazine – http://www.amcommexpos.com/

Books

•  James A. Larson, VoiceXML: An Introduction to Developing Speech Applications, 2002, Upper Saddle River, NJ: Prentice Hall.

•  Eve Astrid Andersson et al., Early Adopter VoiceXML, 2001, Birmingham, UK: Wrox.

•  Bruce Balentine & David P. Morgan, How to Build a Speech Recognition Application: A Style Guide for Telephony Dialogues, 1999, San Ramon, CA: Enterprise Integration Group.

•  Rick Beasley et al., Voice Application Development with VoiceXML, 2002, Indianapolis: Sams.

•  Bob Edgar, The VoiceXML Handbook, 2001, New York: CMP.

•  Susan Weinschenk & Dean T. Barker, Designing Effective Speech Interfaces, 2000, New York: John Wiley & Sons.

•  Chetan Sharma & Jeff Kunins, VoiceXML: Strategies and Techniques for Effective Voice Application Development with VoiceXML 2.0, 2002, New York: John Wiley.

•  Michael H. Cohen, James P. Giangola, & Jennifer Balogh, Voice User Interface Design, 2004, Addison Wesley.

Other Resources

•  The VoiceXML Guide – http://www.vxmlguide.com/

Tutorials and Articles

•  VoiceXML Forum – http://www.voicexmlforum.org/

•  VoiceXML Review – http://www.voicexmlreview.org/

•  World of VoiceXML – http://www.kenrehor.com/voicexml/

Online Voice SDKs

Name                             URL
BeVocal Cafe                     http://cafe.bevocal.com
Tellme Studio                    http://studio.tellme.com
VoiceGenie Developer Workshop    http://developer.voicegenie.com
Voxpilot voxbuilder              http://www.voxbuilder.com

Questions?

Thanks for your attention

Answer to Exercise 2

<form>
  <!-- A prompt cannot be a direct child of form, so it sits in a block -->
  <block>
    <prompt> When were you born? </prompt>
  </block>
  <field name="month">
    <prompt> What month? </prompt>
    <grammar src="http://www.ajax.com/month.grxml"/>
  </field>
  <field name="day">
    <prompt> What day of the month? </prompt>
    <grammar src="http://www.ajax.com/day.grxml"/>
  </field>
  <field name="year">
    <prompt> What year? </prompt>
    <grammar src="http://www.ajax.com/year.grxml"/>
  </field>
</form>

Answer to Exercise 3

Write event handlers for the month field

<catch event="help">
  <prompt> In what month were you born? </prompt>
</catch>

<catch event="nomatch">
  <prompt> Which month, for example, January, February, or March? </prompt>
</catch>

<catch event="noinput">
  <prompt> Say the name of the month you were born in. </prompt>
</catch>

Answer to Exercise 4

<prompt>
  Welcome to Ajax Bank
  <break/>
  <emphasis level="reduced"> do you want to </emphasis>
  <emphasis level="strong"> withdraw </emphasis>
  <break/>
  or
  <emphasis level="strong"> deposit </emphasis>
  funds?
</prompt>

Answer to Exercise 5

Write a grammar for zero to nineteen

<grammar type="application/srgs+xml" root="zero_to_19" mode="voice">

  <rule id="zero_to_19">
    <one-of>
      <item> zero </item>
      <!-- each alternative in one-of must be an item, so the ruleref is wrapped -->
      <item> <ruleref uri="#single_digit"/> </item>
      <item> ten </item>
      <item> eleven </item>
      <item> twelve </item>
      <item> thirteen </item>
      <item> fourteen </item>
      <item> fifteen </item>
      <item> sixteen </item>
      <item> seventeen </item>
      <item> eighteen </item>
      <item> nineteen </item>
    </one-of>
  </rule>

  <rule id="single_digit">
    <one-of>
      <item> one </item>
      <item> two </item>
      <item> three </item>
      <item> four </item>
      <item> five </item>
      <item> six </item>
      <item> seven </item>
      <item> eight </item>
      <item> nine </item>
    </one-of>
  </rule>

</grammar>
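The way the zero_to_19 rule reuses single_digit can be checked with a small ECMAScript sketch. The names singleDigit, zeroTo19, and valueOfNumber are hypothetical, and returning a numeric value is an addition for illustration only: the grammar above carries no <tag> elements, so on its own it returns just the recognized word.

```javascript
// Mirrors <rule id="single_digit">.
const singleDigit = [
  "one", "two", "three", "four", "five", "six", "seven", "eight", "nine",
];

// Mirrors <rule id="zero_to_19">: "zero", then the referenced rule,
// then "ten" through "nineteen".
const zeroTo19 = [
  "zero",
  ...singleDigit,
  "ten", "eleven", "twelve", "thirteen", "fourteen",
  "fifteen", "sixteen", "seventeen", "eighteen", "nineteen",
];

// Returns the number a word stands for, or null if the grammar would
// reject it. The list is in counting order, so the index is the value.
function valueOfNumber(utterance) {
  const index = zeroTo19.indexOf(utterance.trim().toLowerCase());
  return index === -1 ? null : index;
}

console.log(valueOfNumber("fourteen")); // 14
console.log(valueOfNumber("seven"));    // 7
console.log(valueOfNumber("twenty"));   // null
```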

Answer to Exercise 6

•  Utterance: "From savings to checking"

•  Grammar rule:

<rule id="transfer">
  from
  <one-of>
    <item> savings <tag> out.source_account = "savings"; </tag> </item>
    <item> checking <tag> out.source_account = "checking"; </tag> </item>
  </one-of>
  to
  <one-of>
    <item> savings <tag> out.target_account = "savings"; </tag> </item>
    <item> checking <tag> out.target_account = "checking"; </tag> </item>
  </one-of>
</rule>

•  ECMAScript structure:

transfer: { source_account: "savings", target_account: "checking" }
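The answer can be sanity-checked by imitating the transfer rule in plain ECMAScript. This is a sketch only: interpretTransfer and the accounts list are hypothetical names, and a regular expression stands in for the recognizer.

```javascript
// The two account names accepted by both <one-of> groups.
const accounts = ["savings", "checking"];

// Hypothetical helper following the "transfer" rule:
// "from" <account> "to" <account>. Returns the object the <tag>
// scripts would build, or null if the utterance does not match.
function interpretTransfer(utterance) {
  const match = /^from\s+(\w+)\s+to\s+(\w+)$/.exec(utterance.trim().toLowerCase());
  if (match === null) {
    return null;
  }
  const [, source, target] = match;
  if (!accounts.includes(source) || !accounts.includes(target)) {
    return null;
  }
  const out = {};
  out.source_account = source; // <tag> in the first <one-of>
  out.target_account = target; // <tag> in the second <one-of>
  return out;
}

console.log(JSON.stringify(interpretTransfer("From savings to checking")));
// {"source_account":"savings","target_account":"checking"}
```

Like the grammar, the sketch accepts "from savings to savings"; rejecting a transfer to the same account would be left to the dialog logic.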