DAWN: Dynamic Aural Web Navigation
Gopal Gupta, S. Sunder Raman, Mike Nichols, H. Reddy, N. Annamalai
Department of Computer Science, University of Texas at Dallas
Introduction
• The Web is intrinsically Visual.
• We need computers to access the Web.
• People with visual disabilities cannot fully interact with the Web.
Aural Web
• What is the Aural Web?
Based on the traditional Web.
Voice/audio for input and output.
Listeners have complete control over navigation.
• Why do we need an Aural Web?
To make the Web accessible to all.
To do away with the need for a computer to access the Web.
• How can we obtain an Aural Web?
Translate HTML to VoiceXML (however, translation alone is not enough).
Enhance VoiceXML to make it dynamically navigable.
VoiceXML
• W3C standard for marking up voice documents.
• VoiceXML documents are ‘played’ on voice browsers.
• A VoiceXML document consists of various forms. Form names are used to control navigation.
• Inputs are restricted to a set of pre-defined words specified via a grammar.
Example VoiceXML Document
<vxml version="2.0">
  <form>
    <field name="rich">
      <!-- GSL grammar: only "yes" or "no" is accepted as input -->
      <grammar type="application/x-gsl" mode="voice">
        <![CDATA[
          [
            [(yes)] {<option "yes">}
            [(no)]  {<option "no">}
          ]
        ]]>
      </grammar>
      <prompt>Would you like to get rich quick?</prompt>
      <filled>
        Gotcha.
        <!-- Navigation targets are hardcoded by the author -->
        <if cond="rich == 'yes'">
          You want to be rich!
          <goto next="rich.vxml" />
        <else />
          You don't want to be rich.
          <goto next="poor.vxml" />
        </if>
      </filled>
    </field>
  </form>
</vxml>
Translating HTML to VoiceXML
• A module that denotationally maps HTML constructs to VoiceXML.
• It is extensible and flexible.
Translating HTML to VoiceXML
Interface Sheet
HTML Tag         Output Text
<blockquote>     Starting of text quoted from elsewhere.
</blockquote>    Ignore
Input Attributes
Input Duration in Seconds for Text Box:
Input Duration in Seconds for Text Area:
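One way to picture how an interface sheet could drive the translation is the rough Python sketch below. It is not the DAWN translator itself: the names `INTERFACE_SHEET`, `SheetTranslator`, and `html_to_vxml` are hypothetical, only the two `<blockquote>` rows come from the slide, and the output is wrapped in a single VoiceXML prompt purely for illustration.

```python
from html.parser import HTMLParser
from xml.sax.saxutils import escape

# Hypothetical interface sheet: maps an HTML tag event to the text spoken
# for it. None means "Ignore". Only the <blockquote> rows are from the slide.
INTERFACE_SHEET = {
    ("blockquote", "open"): "Starting of text quoted from elsewhere.",
    ("blockquote", "close"): None,
}


class SheetTranslator(HTMLParser):
    """Collects spoken fragments for an HTML snippet, driven by the sheet."""

    def __init__(self, sheet):
        super().__init__()
        self.sheet = sheet
        self.fragments = []

    def _announce(self, tag, kind):
        # Tags missing from the sheet, or mapped to None, produce no speech.
        text = self.sheet.get((tag, kind))
        if text:
            self.fragments.append(text)

    def handle_starttag(self, tag, attrs):
        self._announce(tag, "open")

    def handle_endtag(self, tag):
        self._announce(tag, "close")

    def handle_data(self, data):
        # Page text is read out as-is, with whitespace normalized.
        if data.strip():
            self.fragments.append(" ".join(data.split()))


def html_to_vxml(html, sheet=INTERFACE_SHEET):
    """Return a one-prompt VoiceXML document for the given HTML snippet."""
    translator = SheetTranslator(sheet)
    translator.feed(html)
    translator.close()
    body = escape(" ".join(translator.fragments))
    return (
        '<vxml version="2.0"><form><block><prompt>'
        + body
        + "</prompt></block></form></vxml>"
    )


print(html_to_vxml("<blockquote>To be or not to be.</blockquote>"))
```

Because the mapping lives in a table rather than in code, changing what is spoken for a tag only requires editing the sheet, which is one plausible reading of "extendable and flexible".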
HTML to VoiceXML Translator
• It can handle forms. It preserves information about the submit type and target URL.
• The translator imposes certain reasonable restrictions on the input HTML.
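As an illustration of what preserving the submit type and target URL might look like, the sketch below carries an HTML form's `method` and `action` over to a VoiceXML `<submit>` element. It is an assumption-laden sketch, not the authors' translator: `FormTranslator` and `translate_form` are hypothetical names, only text inputs are handled, the prompt wording is invented, and the per-field grammars a real voice browser would need are omitted.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class FormTranslator(HTMLParser):
    """Maps HTML <form>/<input type="text"> to VoiceXML <form>/<field>."""

    def __init__(self):
        super().__init__()
        self.vxml = ET.Element("vxml", version="2.0")
        self.form = None      # VoiceXML form currently being built
        self.action = ""      # target URL of the HTML form
        self.method = "get"   # submit type of the HTML form
        self.names = []       # field names to send on submit

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self.form = ET.SubElement(self.vxml, "form")
            self.action = attrs.get("action") or ""
            self.method = (attrs.get("method") or "get").lower()
            self.names = []
        elif tag == "input" and self.form is not None:
            name = attrs.get("name")
            if attrs.get("type", "text") == "text" and name:
                field = ET.SubElement(self.form, "field", name=name)
                prompt = ET.SubElement(field, "prompt")
                prompt.text = "Please say a value for " + name + "."
                self.names.append(name)

    def handle_endtag(self, tag):
        if tag == "form" and self.form is not None:
            # The submit type and target URL survive the translation here.
            block = ET.SubElement(self.form, "block")
            ET.SubElement(
                block,
                "submit",
                next=self.action,
                method=self.method,
                namelist=" ".join(self.names),
            )
            self.form = None


def translate_form(html):
    """Return the root <vxml> element for an HTML snippet containing forms."""
    translator = FormTranslator()
    translator.feed(html)
    translator.close()
    return translator.vxml


root = translate_form(
    '<form action="/search" method="POST">'
    '<input type="text" name="city"><input type="submit"></form>'
)
print(ET.tostring(root, encoding="unicode"))
```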
DAWN Architecture
• HTML to VoiceXML Translator.
• VoiceXML Enhancer.
WHY…?
Limitations of VoiceXML
• Navigation is controlled by the author; the listener has virtually no control.
• The author has to hardcode every possible navigation path (which is clearly not feasible).
• Poorly authored documents become difficult to browse.
• Speech recognition technology allows an arbitrary user to speak only pre-determined phrases.
Limitations of VoiceXML (cont’d)
• These limitations make VoiceXML useful only for simple applications.
• Thus, simple database lookups (e.g., the American Airlines airline information system) are possible,
• but advanced applications that require complex interaction (e.g., making an airline reservation) are not.
• What is needed is the ability for the listener to move around the VoiceXML document at will.
Solution
• We introduce the concept of Voice Anchors, allowing listeners to dynamically tag and recall any dialog.
• We modify the VoiceXML documents and generate new VoiceXML documents dynamically at run time.
• We support pre-defined keywords (e.g., pause).
Dynamic Voice Anchors
• Analogous to bookmarks or HTML anchors.
• An anchor is a speech label that can be associated with a specific dialog.
• These anchors can then be used to recall the associated dialogs.
• A single anchor name can be used to tag multiple dialogs (cumulative anchor).
• Any word can be chosen as an anchor name. The user spells it out the first time only.
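The bookkeeping behind tagging and recalling dialogs can be sketched as a small registry. This is a minimal Python illustration under stated assumptions: the class `VoiceAnchors`, its method names, and the dialog identifiers are hypothetical, and spelling out or recognizing the anchor word is outside its scope.

```python
from collections import defaultdict


class VoiceAnchors:
    """Maps spoken anchor names to the dialogs they were attached to."""

    def __init__(self):
        # One anchor name may point at several dialogs (cumulative anchor).
        self._dialogs = defaultdict(list)

    def tag(self, anchor, dialog_id):
        """Associate the anchor with a dialog, keeping the order of tagging."""
        key = anchor.strip().lower()
        if dialog_id not in self._dialogs[key]:
            self._dialogs[key].append(dialog_id)

    def recall(self, anchor):
        """Return every dialog tagged with the anchor (empty if unknown)."""
        return list(self._dialogs.get(anchor.strip().lower(), []))

    def vocabulary(self):
        """Anchor names a recognizer grammar would have to accept."""
        return sorted(self._dialogs)


anchors = VoiceAnchors()
anchors.tag("fares", "form_3")
anchors.tag("fares", "form_7")   # same name, second dialog
print(anchors.recall("fares"))
```

Since the set of anchor names changes while the listener browses, something like `vocabulary()` would have to feed the grammar each time a document is regenerated.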
The Enhancer Module
• Enhances the VoiceXML file, readying it to accept Dynamic Voice Anchors.
• Modifies the VoiceXML document to add interfaces to server-side CGIs.
• Adds functionality for specific keywords that have pre-determined semantics (e.g., skip, repeat, pause, resume, back).
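One plausible way to add keyword support is to inject document-level VoiceXML `<link>` elements that route each keyword to a server-side script. The sketch below shows only that step and should be read as a guess at the mechanism rather than the actual Enhancer: the function `enhance`, the `doc` and `cmd` query parameters, and the CGI URL are hypothetical, the inline GSL grammar simply mirrors the style of the example document, and the input is assumed to carry no XML namespace.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Keywords listed on the slide; their semantics would live on the server.
KEYWORDS = ("skip", "repeat", "pause", "resume", "back")


def enhance(vxml_text, cgi_url, doc_id):
    """Return the document with one <link> per keyword added at the top."""
    root = ET.fromstring(vxml_text)
    for position, keyword in enumerate(KEYWORDS):
        # Hypothetical CGI interface: which document, which command.
        query = urlencode({"doc": doc_id, "cmd": keyword})
        link = ET.Element("link", next=cgi_url + "?" + query)
        grammar = ET.SubElement(
            link, "grammar", type="application/x-gsl", mode="voice"
        )
        grammar.text = "[(" + keyword + ")]"
        root.insert(position, link)
    return ET.tostring(root, encoding="unicode")


original = '<vxml version="2.0"><form id="f1"><block>Hello</block></form></vxml>'
print(enhance(original, "http://example.invalid/dawn.cgi", "page1"))
```

Document-level links are active in every dialog of the document, so the listener could say a keyword at any point without the author having planned for it.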
The Big Picture
DEMO
Some Applications
• EPlan: a Web-based integrated contingency handling system. The Aural Web increases its accessibility.
• MathML to VoiceXML: allows complex mathematical formulas to be broken down, tagged, and recalled.
• Searching for directions on the Web.
• Making online airline reservations using the phone.
Current & Future Work
• Design of Voice Scripting Languages (Talk by Mike Nichols tomorrow in Tiberius 2 at 10:30am).
• Intelligent Navigation Strategy for navigating Tables.
• Finally, incorporate all these techniques into a Voice Browser.
Contributions
• An Aural Web based on the traditional Web that allows users to perform complex Web operations using the phone.
• Techniques that give the listener maximum control (via dynamic voice anchors).
?