DAWN: Dynamic Aural Web Navigation

Gopal Gupta, S. Sunder Raman, Mike Nichols, H. Reddy, N. Annamalai

Department of Computer Science, University of Texas at Dallas

Introduction

• The Web is intrinsically Visual.

• We need computers to access the Web.

• People with visual disabilities cannot fully interact with the Web.

• What is the Aural Web?

Based on the traditional Web.

Voice/Audio for Input/Output.

Listeners have complete control over navigation.

Aural Web


• Why do we need an Aural Web?

Make the Web accessible to all.

Do away with the need to have a computer to access the Web.

• How can we obtain an Aural Web?

Translate HTML to VoiceXML (however, translation alone is not enough).

Need to enhance VoiceXML to make it dynamically navigable.

VoiceXML

• A W3C standard for marking up voice documents.

• VoiceXML documents are ‘played’ on voice browsers.

• A VoiceXML document consists of various forms. Form names are used to control navigation.

• Inputs are restricted to a set of pre-defined words specified via a grammar.

Example VoiceXML Document

    <vxml version="2.0">
      <form>
        <field name="rich">
          <grammar type="application/x-gsl" mode="voice">
            <![CDATA[[
              [(yes)] {<option "yes">}
              [(no)]  {<option "no">}
            ]]]>
          </grammar>
          <prompt>Would you like to get rich quick?</prompt>
          <filled>
            Gotcha.
            <if cond="rich=='yes'">
              You want to be rich!
              <goto next="rich.vxml" />
            <else />
              You don't want to be rich.
              <goto next="poor.vxml" />
            </if>
          </filled>
        </field>
      </form>
    </vxml>

Translating HTML to VoiceXML

• A module to denotationally map HTML constructs to VoiceXML.

• It is extendable and flexible.

Interface Sheet

HTML Tags        Output Text
<blockquote>     Starting of text quoted from elsewhere.
</blockquote>    Ignore

Input Attributes

Input Duration in Seconds for Text Box :
Input Duration in Seconds for Text Area :
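The interface sheet can be read as a lookup table from HTML tags to spoken text. The following is a minimal Python sketch of that idea, not the DAWN translator itself: the names INTERFACE_SHEET, SheetTranslator and html_to_vxml are hypothetical, and wrapping all output in a single prompt is an assumption made to keep the example short.

```python
from html.parser import HTMLParser
from xml.sax.saxutils import escape

# Hypothetical interface sheet: spoken text emitted for an opening or
# closing tag. None stands for "Ignore", as in the sheet above.
INTERFACE_SHEET = {
    ("blockquote", "open"): "Starting of text quoted from elsewhere.",
    ("blockquote", "close"): None,
}


class SheetTranslator(HTMLParser):
    """Collects the spoken text for an HTML fragment, guided by the sheet."""

    def __init__(self, sheet):
        super().__init__()
        self.sheet = sheet
        self.spoken = []

    def handle_starttag(self, tag, attrs):
        self._announce(tag, "open")

    def handle_endtag(self, tag):
        self._announce(tag, "close")

    def handle_data(self, data):
        # Collapse whitespace so the prompt reads as one sentence stream.
        text = " ".join(data.split())
        if text:
            self.spoken.append(text)

    def _announce(self, tag, kind):
        # Tags missing from the sheet, or mapped to None, are silent.
        text = self.sheet.get((tag, kind))
        if text:
            self.spoken.append(text)


def html_to_vxml(html, sheet=INTERFACE_SHEET):
    """Return a one-prompt VoiceXML document for the given HTML (assumed layout)."""
    translator = SheetTranslator(sheet)
    translator.feed(html)
    translator.close()
    body = escape(" ".join(translator.spoken))
    return (
        '<vxml version="2.0"><form><block><prompt>'
        + body
        + "</prompt></block></form></vxml>"
    )
```

Changing what is spoken for a tag only requires editing the table, which is one way the translator could be kept extendable.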

HTML to VoiceXML Translator

• It can handle Forms. It preserves information about the submit type and target URL.

• The translator imposes certain reasonable restrictions on the input HTML.
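As an illustration of how the submit type and target URL of an HTML form might be carried over, here is a hedged Python sketch. It assumes the information is kept on a VoiceXML submit element (its next, method and namelist attributes); the helper names FormExtractor and form_to_vxml and the prompt wording are hypothetical, only text inputs are handled, and field grammars are omitted.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class FormExtractor(HTMLParser):
    """Records the action, method and text-input names of an HTML form."""

    def __init__(self):
        super().__init__()
        self.action = ""
        self.method = "get"  # HTML default when no method is given
        self.fields = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self.action = attrs.get("action") or ""
            self.method = (attrs.get("method") or "get").lower()
        elif tag == "input" and attrs.get("type", "text") == "text":
            name = attrs.get("name")
            if name:
                self.fields.append(name)


def form_to_vxml(html):
    """Build a VoiceXML form that submits to the same URL with the same method."""
    extractor = FormExtractor()
    extractor.feed(html)
    extractor.close()

    vxml = ET.Element("vxml", version="2.0")
    form = ET.SubElement(vxml, "form")
    for name in extractor.fields:
        field = ET.SubElement(form, "field", name=name)
        prompt = ET.SubElement(field, "prompt")
        prompt.text = f"Please say the value for {name}."  # assumed wording
    block = ET.SubElement(form, "block")
    ET.SubElement(
        block,
        "submit",
        next=extractor.action,                  # target URL preserved
        method=extractor.method,                # submit type preserved
        namelist=" ".join(extractor.fields),
    )
    return ET.tostring(vxml, encoding="unicode")
```

A real voice browser would also need a grammar or built-in type on each field before it could collect input; that part is left out here.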

DAWN Architecture

• HTML to VoiceXML Translator.

• VoiceXML Enhancer.

WHY…?

Limitations of VoiceXML

• Navigation is controlled by the author; the listener has virtually no control.

• The author has to hardcode every possible navigation path (clearly not feasible).

• Poorly authored documents become difficult to browse.

• Speech recognition technology allows an arbitrary user to speak only pre-determined phrases.

Limitations of VoiceXML (cont’d)

• These limitations make VoiceXML useful only for simple applications.

• Thus, simple database lookups (e.g., American Airlines airline information system) are possible,

• but advanced applications that require complex interaction (e.g., making an air reservation) are not.

• What is needed is the ability for the listener to move around the VXML document at his/her will.

Solution

• We introduce the concept of Voice Anchors, allowing listeners to dynamically tag and recall any dialog.

• We modify the VoiceXML documents and generate new VoiceXML documents dynamically at run time.

• Support pre-defined keywords (e.g., pause).

Dynamic Voice Anchors

• Analogous to bookmarks or HTML anchors.

• An anchor is a speech label that can be associated with a specific dialog.

• These anchors can then be used to recall the associated dialogs.

• A single anchor name can be used to tag multiple dialogs (cumulative anchor).

• Any word can be chosen for an anchor name. The user spells it out the first time only.
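The anchor behaviour listed above can be pictured as a small registry that maps a spoken label to the dialogs tagged with it. This is a minimal Python sketch under stated assumptions: the class name AnchorRegistry, its methods, and the dialog identifiers are hypothetical, and labels are assumed to be matched case-insensitively.

```python
from collections import defaultdict


class AnchorRegistry:
    """Maps an anchor label to the dialogs it tags (hypothetical structure)."""

    def __init__(self):
        self._anchors = defaultdict(list)

    def tag(self, label, dialog_id):
        """Attach a label to a dialog; reusing a label makes it cumulative."""
        key = label.strip().lower()
        if dialog_id not in self._anchors[key]:
            self._anchors[key].append(dialog_id)

    def recall(self, label):
        """Return every dialog tagged with the label, in tagging order."""
        return list(self._anchors.get(label.strip().lower(), []))

    def known_labels(self):
        """Labels that have already been spelled out once by the listener."""
        return sorted(self._anchors)


registry = AnchorRegistry()
registry.tag("flights", "form_departure")
registry.tag("flights", "form_return")  # same label, second dialog
print(registry.recall("flights"))
```

Keeping a list per label is what makes an anchor cumulative: recalling it returns all tagged dialogs rather than only the most recent one.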

The Enhancer Module

• Enhances the VoiceXML file, readying it to accept Dynamic Voice Anchors.

• Modifies the VoiceXML document to add interfaces to server-side CGIs.

• Adds functionality for specific keywords that have pre-determined semantics (e.g., skip, repeat, pause, resume, back).
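One way such an enhancement step could look is sketched below in Python. It assumes each keyword is exposed as a document-level VoiceXML link element whose next attribute points at a server-side CGI. The CGI URL, the command query parameter, the function name enhance and the one-word GSL grammars are all assumptions for illustration, not the actual DAWN enhancer. The input is assumed to be namespace-free, like the example document in these slides.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Keywords named on the slide; their handling is delegated to the CGI.
KEYWORDS = ("skip", "repeat", "pause", "resume", "back")


def enhance(vxml_text, cgi_url="http://example.com/cgi-bin/dawn.cgi"):
    """Return a copy of the document with one <link> per keyword (hypothetical)."""
    root = ET.fromstring(vxml_text)
    for position, word in enumerate(KEYWORDS):
        # Saying the keyword transfers control to the server-side script.
        link = ET.Element("link", next=cgi_url + "?" + urlencode({"command": word}))
        grammar = ET.SubElement(
            link, "grammar", type="application/x-gsl", mode="voice"
        )
        grammar.text = f"[({word})]"
        # Insert ahead of the forms so the links apply to the whole document.
        root.insert(position, link)
    return ET.tostring(root, encoding="unicode")


original = '<vxml version="2.0"><form id="main"><block>Hello</block></form></vxml>'
print(enhance(original))
```

Because the links sit at document level, the listener could say a keyword in any form without the author having to hardcode that path.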

The Big Picture

Some Applications

• EPlan – A Web-based integrated contingency handling system. The Aural Web increases its accessibility.

• MathML to VoiceXML – Allows complex mathematical formulas to be broken down, tagged and recalled.

• Searching for directions on the Web.

• Making online airline reservations using the phone.

Current & Future Work

• Design of Voice Scripting Languages (Talk by Mike Nichols tomorrow in Tiberius 2 at 10:30am).

• Intelligent strategy for navigating tables.

• Finally, incorporate all these techniques into a Voice Browser.

Contributions

• An Aural Web based on the traditional Web that allows users to perform complex Web operations using the phone.

• Developed techniques to give a listener maximum control (via dynamic voice anchors).
