Transcript of Mobile Multimodal Applications. Dr. Roman Englert, Gregor Glass, March 23rd, 2006.

Page 1:

Mobile Multimodal Applications.

Dr. Roman Englert, Gregor Glass March 23rd, 2006

Page 2:

Agenda:

Motivation

Potentials through Multimodality

Use Cases: Map & Sound Logo

Components & Modules for Mobile Multimodal Interaction

User Perspective

Challenges

Page 3:

Multimodality. Motivation.

Moore's Law: Technical capability will double approximately every 18 months (growth of technology).

Buxton's Law: Technology designers promise functionality proportional to Moore's Law (growth of functionality).

God's Law (the complexity barrier): Human capacity is limited and does not increase over time (growth of human capability stays flat).

The challenge is how to deliver more functionality without breaking through the complexity barrier and making the systems so cumbersome as to be completely unusable.

(Bill Buxton)

Page 4:

Multimodality – New User Interfaces. Composite Usage Scenario: Map.

Example:

The user selects a point of interest by clicking with a stylus and speaking in order to focus it.

"Zoom in here."

Page 5:

Multimodality – New User Interfaces. Composite Usage Scenario: Sound Logo.

Example:

The user selects a sound logo by clicking on the title with a stylus and speaking in order to hear it.

Sound Logo = Personalized Call Connect Signal

"Play this sound logo."

Page 6:

Multimodality – New User Interfaces. Components of a multimodal end-to-end connection.

[Architecture diagram: the user interacts with a client through input modalities (voice, stylus, gesture, …) and receives output as voice, text, graphics, video, …. The client talks to a server hosting dialog management, synchronisation management, and media resource management (ASR/TTS), backed by voice and data back-ends and Internet/services content. At the user-interface layer, two types of multimodality are distinguished: sequential and parallel.]
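As a rough illustration of the server-side modules named above, here is a minimal Python sketch. All class and method names (MediaResourceManager, SynchronisationManager, DialogManager, recognize, handle, …) are illustrative assumptions; the slide names the modules but not their interfaces.

class MediaResourceManager:
    # Wraps the media resources: ASR (speech recognition) and TTS (speech synthesis).
    def recognize(self, audio: bytes) -> str:
        raise NotImplementedError  # would delegate to an ASR engine
    def synthesize(self, text: str) -> bytes:
        raise NotImplementedError  # would delegate to a TTS engine

class SynchronisationManager:
    # Keeps the voice channel and the data/GUI channel in step,
    # e.g. updates the screen while a voice prompt is played.
    def align(self, voice_state: str, gui_state: str) -> None:
        pass

class DialogManager:
    # Drives the dialog: takes the interpreted user input, consults the
    # back-end and content, and decides the next output on each channel.
    def __init__(self, media: MediaResourceManager, sync: SynchronisationManager) -> None:
        self.media = media
        self.sync = sync
    def handle(self, user_input: dict) -> dict:
        # e.g. {"action": "zoom_in", "location": (17, 54)} -> next prompt and screen
        return {"prompt": "Zooming in.", "screen": "map_zoomed_in"}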

Page 7:

Multimodality – New User Interfaces. Main modules for parallel interaction.

[Module diagram: for each input mode (speech, ink, etc., system-generated events, mouse/keyboard) a grammar-driven recognition component feeds a mode-specific interpretation component, followed by a semantic interpretation step. All results are passed as EMMA documents to the integration processor, which combines them and forwards the integrated input to the interaction manager. The interaction manager coordinates with the session component, the application functions, and the system and environment.]
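To make the data flow between these modules concrete, here is a minimal Python sketch of the parallel-interaction pipeline, using the map example from the earlier slides. The function names and the dict stand-ins for EMMA documents are illustrative assumptions.

def recognize_speech(audio: bytes) -> str:
    return "zoom in here"   # stand-in for a grammar-driven ASR result

def recognize_ink(strokes: list) -> tuple:
    return (17, 54)         # stand-in for stylus point recognition

def interpret_speech(utterance: str) -> dict:
    # mode-specific interpretation: map the utterance to a semantic frame;
    # the location slot stays open, to be filled from another modality
    return {"mode": "speech", "action": "zoom_in", "location": None}

def interpret_ink(point: tuple) -> dict:
    return {"mode": "ink", "location": point}

def integrate(speech: dict, ink: dict) -> dict:
    # integration processor: fill the open slot of the speech interpretation
    # with the ink result, yielding one multimodal input for the interaction manager
    merged = dict(speech)
    if merged.get("location") is None:
        merged["location"] = ink["location"]
    merged["mode"] = "multimodal"
    return merged

result = integrate(interpret_speech(recognize_speech(b"...")),
                   interpret_ink(recognize_ink([])))
print(result)  # {'mode': 'multimodal', 'action': 'zoom_in', 'location': (17, 54)}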


Page 8:

Multimodality – New User Interfaces. User Perspective:

Feedback in a nutshell from diverse previous innovation projects: "Give us speech control."

Composite interaction with a full prototype implementation for customer self-service: 2 campaigns (SMS & Personalized Call Connect Signal).

Need to actively communicate the possibilities & advantages of the new multimodal interaction paradigm to users.

Real appreciation of speech control & good acceptance of the "push-to-talk" mode.

Expectation: symmetry & consistency between the interaction modes.

BUT: How do users really want to speak to the machine? How to provide feedback? How to correct input errors?

Great for context-dependent service interaction, BUT: Which mode is most suitable for which task? For whom? Under which circumstances?

Page 9:

Multimodality – New User Interfaces. Challenges:

Sequential vs. parallel I/O

Unique interpretation of multimodal hypotheses

Discourse phenomena like anaphora resolution and generation

Input correction loops

Encapsulation of I/O tools to achieve a generic front end

Model Driven Architecture

Page 10:

Thank you for your attention!

Page 11:

Multimodality – New User Interfaces. Sequential and Parallel Input.

Sequential input:

Multimodal applications may allow the user to choose between different input modalities, e.g. to speak or to click on a button.

Only one input channel will be interpreted, i.e. the user may speak or click on a button.

Multiple input channels will be interpreted sequentially, as defined by the application.

Example: Select a field and then speak "My number is …". Then click only on a button. Afterwards, navigate with "Back to main menu".

Parallel input (also known as composite input):

Multimodal applications allow the user to use multiple input modes at nearly the same time, e.g. the user may speak and tap on the screen.

The multimodal application will combine the multiple inputs and interpret them together.

Example: The user navigates in a map and speaks "zoom in here".

Parallel input needs additional platform or application capabilities in order to combine (integrate) and interpret multiple inputs.
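The difference can be sketched in a few lines of Python. The event representation, the grouping by time window, and the window size are illustrative assumptions, not part of the slides.

def interpret(mode, payload):
    return {"mode": mode, "payload": payload}

def integrate(group):
    # combine inputs that belong to one composite act, e.g. speech + stylus
    return {"mode": "multimodal",
            "parts": [interpret(m, p) for m, p, _ in group]}

def handle_sequential(events):
    # sequential: each channel is interpreted on its own, in arrival order
    return [interpret(m, p) for m, p, _ in events]

def handle_parallel(events, window=1.0):
    # parallel (composite): inputs that arrive within a short time window
    # are grouped and interpreted together; assumes at least one event
    events = sorted(events, key=lambda e: e[2])
    groups, current = [], [events[0]]
    for e in events[1:]:
        if e[2] - current[-1][2] <= window:
            current.append(e)
        else:
            groups.append(current)
            current = [e]
    groups.append(current)
    return [integrate(g) for g in groups]

events = [("speech", "zoom in here", 0.0), ("ink", (17, 54), 0.4)]
print(handle_sequential(events))  # two separate interpretations
print(handle_parallel(events))    # one combined multimodal interpretation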

Page 12:

Multimodality – New User Interfaces. Example: Composite input for voice and stylus.

The user speaks and clicks on the screen:

"Zoom in <here>."

[Module diagram: recognition (speech, ink) → interpretation → semantic interpretation → integration processor → interaction manager, linked by EMMA documents.]

Page 13:

Multimodality – New User Interfaces. Example: Composite input for voice and stylus.

Semantic Interpretation:

action = zoom in
location = x, y from stylus

[Module diagram as before: recognition → interpretation → integration via EMMA.]

Interpretation:

<emma:emma version="1.0"
           xmlns:emma="http://www.w3.org/2003/04/emma">
  <emma:interpretation id="int1" emma:mode="speech">
    <action>zoom_in</action>
    <location emma:hook="ink"/>
  </emma:interpretation>
</emma:emma>
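Such an EMMA document can be read with standard XML tooling. The following Python sketch uses only the structure shown above; note how the emma:hook attribute marks the slot that is still to be filled from the ink channel.

import xml.etree.ElementTree as ET

EMMA_NS = "http://www.w3.org/2003/04/emma"

doc = """<emma:emma version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma">
  <emma:interpretation id="int1" emma:mode="speech">
    <action>zoom_in</action>
    <location emma:hook="ink"/>
  </emma:interpretation>
</emma:emma>"""

root = ET.fromstring(doc)
interp = root.find(f"{{{EMMA_NS}}}interpretation")
print(interp.get(f"{{{EMMA_NS}}}mode"))    # speech
print(interp.find("action").text)          # zoom_in
location = interp.find("location")
print(location.get(f"{{{EMMA_NS}}}hook"))  # ink: to be filled by the stylus input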

Page 14:

Multimodality – New User Interfaces. Example: Composite input for voice and stylus.

The user clicks on the map while speaking:

x = 17
y = 54

[Module diagram as before: recognition → interpretation → integration via EMMA.]

Page 15:

Multimodality – New User Interfaces. Example: Composite input for voice and stylus.

Interpretation:

<emma:emma version="1.0"
           xmlns:emma="http://www.w3.org/2003/04/emma">
  <emma:interpretation id="int1" emma:mode="ink">
    <location>
      <type>point</type>
      <x>17</x>
      <y>54</y>
    </location>
  </emma:interpretation>
</emma:emma>

[Module diagram as before: recognition → interpretation → integration via EMMA.]

Page 16:

Multimodality – New User Interfaces. Example: Composite input for voice and stylus.

[Module diagram as before: recognition → interpretation → integration via EMMA.]

Integration:

<emma:emma version="1.0"
           xmlns:emma="http://www.w3.org/2003/04/emma">
  <emma:interpretation id="int1" emma:mode="multimodal">
    <action>zoom_in</action>
    <location>
      <type>point</type>
      <x>17</x>
      <y>54</y>
    </location>
  </emma:interpretation>
</emma:emma>
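The integration step above can be sketched as follows in Python: the emma:hook="ink" slot of the speech interpretation is filled with the point from the ink interpretation. Dicts stand in for the EMMA documents, and the merge rule is an illustrative assumption, not the normative EMMA integration algorithm.

speech = {"mode": "speech", "action": "zoom_in",
          "location": {"hook": "ink"}}   # open slot, see emma:hook above
ink = {"mode": "ink",
       "location": {"type": "point", "x": 17, "y": 54}}

def integrate(speech: dict, ink: dict) -> dict:
    merged = {"mode": "multimodal", "action": speech["action"]}
    # fill the hooked slot with the result of the matching modality
    if speech["location"].get("hook") == ink["mode"]:
        merged["location"] = ink["location"]
    return merged

print(integrate(speech, ink))
# {'mode': 'multimodal', 'action': 'zoom_in',
#  'location': {'type': 'point', 'x': 17, 'y': 54}}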

Page 17:

Multimodality – New User Interfaces. Methods and functionalities: Interaction manager.

Interaction manager (application-specific tasks):

Check of the input data: integrated input? speech only? ink/stylus only?

Check of the suitability of the integration results: is the input data compatible? (e.g. is the actual number of stylus inputs, say 2, the same as the expected value?)

Mapping of recognition results from different modalities, e.g.:

Speech recognition error but stylus correct

Speech recognition OK but stylus incorrect

Confidence OK and stylus OK

Decision on the error-handling output: graphical, audio, prompt, TTS

Handling of redundant information and creation of the related user reaction

Prioritisation of input modalities

[Module diagram: integration processor → interaction manager, linked by EMMA.]
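The mapping and error-handling decisions listed above can be sketched as a simple rule in Python; the confidence threshold and the action names are illustrative assumptions.

def decide(speech_confidence: float, stylus_ok: bool,
           threshold: float = 0.6) -> str:
    # map the per-modality recognition results to an error-handling decision
    if speech_confidence >= threshold and stylus_ok:
        return "accept"            # confidence OK and stylus OK
    if speech_confidence < threshold and stylus_ok:
        return "reprompt_speech"   # speech recognition error but stylus correct
    if speech_confidence >= threshold and not stylus_ok:
        return "reprompt_stylus"   # speech OK but stylus incorrect
    return "reprompt_both"

# The chosen error-handling output could then be rendered graphically,
# as audio, as a prompt, or via TTS, as listed above.
print(decide(0.9, True))   # accept
print(decide(0.3, True))   # reprompt_speech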