Fusion Engines for Input Multimodal Interfaces: a Survey


Description

Fusion engines are fundamental components of multimodal interactive systems: they interpret temporal combinations of deterministic as well as non-deterministic inputs whose meaning can vary according to the context, the user and the task. While various surveys have already been published on multimodal interactive systems, the present paper focuses on the design, specification, construction and evaluation of fusion engines. The article first introduces the adopted terminology and the major challenges that fusion engines are meant to address. A history of the work achieved in the field is then presented according to the main phases of the BRETAM model, followed by a classification of existing approaches to fusion engines. The classification dimensions include the types of applications, the fusion principles and the temporal aspects. Finally, unsolved challenges such as software frameworks, quantitative evaluation, machine learning and adaptation are discussed as directions for future work on fusion engines.

Transcript of Fusion Engines for Input Multimodal Interfaces: a Survey

Page 1: Fusion Engines for Input Multimodal Interfaces: a Survey

Special session on Multimodal Fusion

• A survey: Fusion Engines for Multimodal Input
• 5 papers

D. Lalanne (Switzerland), L. Nigay (France), P. Palanque (France), P. Robinson (UK), J. Vanderdonckt (Belgium)

1

Page 2: Fusion Engines for Input Multimodal Interfaces: a Survey

Multimodal fusion

• Multimodal fusion for
  • Perception
  • Interaction

• Focus on multimodal interaction
  • 4 papers on multimodal interaction
  • 1 paper on multimodal perception (first one)

2

Page 3: Fusion Engines for Input Multimodal Interfaces: a Survey

Input Multimodal Interaction

3

Page 4: Fusion Engines for Input Multimodal Interfaces: a Survey

Input Fusion Engines

• Multimodal fusion
  • Combining and interpreting data from multiple input modalities

• Usage of input modalities:

|             | Sequential | Parallel    |
| Combined    | Alternate  | Synergistic |
| Independent | Exclusive  | Concurrent  |

4
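The four combined-usage categories above follow from two binary axes: combined vs. independent use of modalities, and sequential vs. parallel use. A minimal Python sketch of that mapping, using hypothetical names and assuming the standard reading of the design space:

```python
from enum import Enum

class Usage(Enum):
    ALTERNATE = "alternate"      # combined + sequential
    SYNERGISTIC = "synergistic"  # combined + parallel
    EXCLUSIVE = "exclusive"      # independent + sequential
    CONCURRENT = "concurrent"    # independent + parallel

def classify_usage(combined: bool, parallel: bool) -> Usage:
    """Map the two axes of the taxonomy to its four categories."""
    if combined:
        return Usage.SYNERGISTIC if parallel else Usage.ALTERNATE
    return Usage.CONCURRENT if parallel else Usage.EXCLUSIVE

# Example: speech and a pointing gesture interpreted together, at the same time.
assert classify_usage(combined=True, parallel=True) is Usage.SYNERGISTIC
```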

Page 5: Fusion Engines for Input Multimodal Interfaces: a Survey

Input Fusion Engines

• Combined usage (sequential, parallel): why?
  • Human interaction is naturally multimodal.
  • Combining input modalities increases the bandwidth of human-computer interaction.

5

Page 6: Fusion Engines for Input Multimodal Interfaces: a Survey

Fusion engines

• A very dynamic domain
• ~15 years of contributions: 1993-2008

6

Page 7: Fusion Engines for Input Multimodal Interfaces: a Survey

Input Fusion engines: some key features

• Multiple and temporal combinations
  • Types of data and time synchronization
• Probabilistic inputs
  • Non-deterministic inputs
• Robustness
  • Error handling
  • Adaptation to context
    • Context = (user, environment, platform)

7
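To make these features concrete, here is a minimal, hypothetical sketch (not modelled on any particular surveyed engine) of the kind of data a fusion engine manipulates: timestamped input events carrying a recognition confidence, plus a simple temporal-window test for deciding whether two events are candidates for fusion. The field names and thresholds are illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass
class InputEvent:
    modality: str      # e.g. "speech", "gesture", "mouse"
    value: str         # recognized token, e.g. "delete" or "point(x, y)"
    timestamp: float   # seconds since session start
    confidence: float  # recognizer score in [0, 1]; below 1.0 for non-deterministic inputs

def combinable(a: InputEvent, b: InputEvent,
               window: float = 1.0, min_confidence: float = 0.5) -> bool:
    """Candidate for fusion: close enough in time, and both hypotheses trusted enough."""
    return (abs(a.timestamp - b.timestamp) <= window
            and min(a.confidence, b.confidence) >= min_confidence)

speech = InputEvent("speech", "delete", timestamp=12.3, confidence=0.8)
point = InputEvent("gesture", "point(120, 45)", timestamp=12.9, confidence=0.9)
print(combinable(speech, point))  # True: within the 1 s window, both confident enough
```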

Page 8: Fusion Engines for Input Multimodal Interfaces: a Survey

Classification: Fusion engines

8

1980: R. Bolt, "Put that there"

Page 9: Fusion Engines for Input Multimodal Interfaces: a Survey

Classification: Fusion engines

9

1980: R. Bolt, "Put that there"
1989: Cubricon
1995: CARE
1997: Quickset
2004: ICARE
2004: Petshop
2006: FAME

Page 10: Fusion Engines for Input Multimodal Interfaces: a Survey

Classification: Fusion engines

10

1980: R. Bolt, "Put that there"

Multiple (up to 255) input API in Windows 7; Microsoft MultiPoint SDK

"Zoom in here"

UX beats Usability

A gap

Page 11: Fusion Engines for Input Multimodal Interfaces: a Survey

Theories and Contributions over Time

11

Page 12: Fusion Engines for Input Multimodal Interfaces: a Survey

| BRETAM | Reference | Tool / language / program | Fusion notation | Fusion type | Fusion level | Input devices | Ambiguity resolution | Time (quant.) | Time (qual.) | Application type |
| B | Bolt [4] | "Put that there" system | None | None | Dialog | Speech, gesture | ? | N | ? | Map manipulation |
| R | Wahlster | XTRA | None | Unification | Dialog | Keyboard, mouse | | N | Y | Map manipulation |
| | Neal [26] | Cubricon | Generalized Augmented Transition Network | Procedural | Dialog | Speech, mouse, keyboard | Proximity-based | N | Y | Map manipulation |
| E | Koons [19] | No name | Parse tree | Frame-based | Dialog | Speech, eye gaze, gesture | First solution | Y | Y | 3D world |
| | Nigay [28] | PAC-Amodeus | Melting Pot | Frame-based | Dialog + low level | Speech, keyboard, mouse | Context-based resolution | Y | N | Flight scheduling |
| | Cohen [9] | Quickset | Feature Structure | Unification | Dialog | Pen, voice | S/G & G/S & N-best | Y | N | Simulation system training |
| | Bellik [3] | MEDITOR | None | Frame-based | Dialog + low level | Speech, mouse | History buffer | Y | Y | Text editor |
| | Martin [22] | TYCOON | Set of processes – Guided Propagation Networks | Procedural | Dialog | Speech, keyboard, mouse | Probability-based resolution | Y | Y | Edition of graphical user interfaces |
| | Johnston [18] | FST | Finite State Automata | Procedural | Dialog | Speech, pen | Possible (N-best) | Y | Y | Corporate directory |
| T & A | Krahnstoever [20] | iMap | Stream stamped | Frame-based | Dialog | Speech, gesture | Not given | Y | N | Crisis management |
| | Dumas [12] | HephaisTK | XML typed (SMUIML) | Frame-based | Dialog | Speech, mouse, Phidgets | First one | Y | Y | Meeting assistants |
| | Holzapfel [17] | No name | Typed Feature Structure | Unification | Dialog | Speech, gesture | N-best list | Y | N | Humanoid robot |
| | Pfleger [33] | PATE | XML typed | Unification | Dialog | Speech, pen | N-best list | Y | Y | Bathroom design tool |
| | Milota [25] | No name | Multimodal Parse Tree | Unification | Dialog | Speech, mouse, keyboard, touchscreen | S/G & G/S | Y | N | Graphic design |
| | Melichar [24] | WCI | Multimodal Generic Dialog Node | Unification | Dialog | Speech, mouse, keyboard | First one | ? | ? | Multimedia DB |
| | Sun [37] | PUMPP | Matrix | Unification | Dialog | Speech, gesture | S/G | N | Y | Traffic control |
| | Bourguet [7] | Mengine | Finite State Machine | Procedural | Low level | Speech, mouse | Not given | N | Y | No example |
| | Latoschik [21] | No name | Temporal Augmented Transition Network | Procedural | Dialog | Speech, gesture | Fuzzy constraints | Y | Y | Virtual reality |
| | Bouchet [5][6], Mansoux [23] | ICARE (Input/Output) | Melting Pot | Frame-based | Dialog + low level | Speech, helmet visor, HOTAS, tactile surface, GPS localization, magnetometer, mouse, keyboard | Context-based resolution | Y | N | Aircraft cockpit, authentication, mobile augmented reality systems (game, Post-it), augmented surgery |
| | Navarre [30] | Petshop | Petri nets | Procedural | Dialog + low level | Speech, mouse, keyboard, touchscreen | *** | Y | Y | Aircraft cockpit |
| | Flippo [14] | No name | Semantic tree | Hybrid | Dialog | Speech, mouse, gaze, gesture | Feedback for missing data | Y | N | Collaborative map |
| | Portillo [34] | MIMUS | Feature Value Structure (DTAC) | Hybrid | Dialog | Speech, mouse | Knowledgeable agent | Y | N | |
| | Duarte [11] | FAME | Behavioral Matrix | Hybrid | Dialog | Speech, mouse, keyboard | Not given | ? | ? | Digital talking book |

12
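To illustrate the frame-based ("melting pot") fusion type that appears in several rows above, here is a hypothetical sketch of merging complementary slots contributed by different modalities into a single command frame. The slot names and the conflict rule are illustrative assumptions, not taken from any of the cited systems.

```python
from typing import Dict, Optional

# A command frame with slots; each modality fills the slots it can.
Frame = Dict[str, Optional[str]]

def merge_frames(a: Frame, b: Frame) -> Optional[Frame]:
    """Fuse two partial frames when their filled slots are complementary (no conflict)."""
    merged: Frame = dict(a)
    for slot, value in b.items():
        if value is None:
            continue
        if merged.get(slot) not in (None, value):
            return None  # conflicting slot values: keep the ambiguity for later resolution
        merged[slot] = value
    return merged

# "Put that there": speech carries the action, two pointing gestures carry object and location.
speech = {"action": "put", "object": None, "location": None}
point_1 = {"action": None, "object": "ship_42", "location": None}
point_2 = {"action": None, "object": None, "location": "(120, 45)"}

partial = merge_frames(speech, point_1)
command = merge_frames(partial, point_2) if partial is not None else None
print(command)  # {'action': 'put', 'object': 'ship_42', 'location': '(120, 45)'}
```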

Page 13: Fusion Engines for Input Multimodal Interfaces: a Survey

(This slide repeats the classification table shown on Page 12.)

13

Page 14: Fusion Engines for Input Multimodal Interfaces: a Survey

Special session: Multimodal Fusion

• Content
  • A survey
  • 5 papers

• Schedule
  • 10 min introduction and survey outlook
  • 15 min per paper + 5 min questions
  • 10 min for questions on the session

D. Lalanne (Switzerland), L. Nigay (France), P. Palanque (France), P. Robinson (UK), J. Vanderdonckt (Belgium)

Page 15: Fusion Engines for Input Multimodal Interfaces: a Survey

Special session: Multimodal Fusion

• H. Mendonça: Agent-based fusion
• B. Dumas: An evaluation framework to benchmark fusion engines
• L. Nigay: CARE-based fusion
• J. Ladry & P. Palanque: Petri net based formal description and execution of fusion engines
• M. Sezgin: Fusion of speech and facial expression recognition

Page 16: Fusion Engines for Input Multimodal Interfaces: a Survey

16

QUESTIONS?

Page 17: Fusion Engines for Input Multimodal Interfaces: a Survey

Fusion engines: research agenda

• Performance evaluation
  • Testbeds, metrics
  • Identification of interpretation errors
  • Formal predictive evaluation

• Adaptation to context
  • Dynamic aspect of adaptation
  • Reconfigurations

• Engineering aspects
  • Difficult to develop (toolkits from manufacturers required)
  • Fusion engine tuning (tuning is the key for interaction techniques, e.g. drag & drop)

17
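To illustrate the tuning point above: much like drag-and-drop, a fusion engine stands or falls on a few thresholds. A hypothetical configuration object grouping such parameters might look as follows; the names and default values are illustrative assumptions, not taken from any surveyed engine.

```python
from dataclasses import dataclass

@dataclass
class FusionTuning:
    # Illustrative thresholds of the kind a fusion engine must expose or hard-code.
    time_window_s: float = 1.0          # max delay between events to be fused
    min_confidence: float = 0.5         # discard recognizer hypotheses below this score
    n_best: int = 3                     # how many alternative interpretations to keep
    wait_for_late_input_s: float = 0.3  # grace period before committing a unimodal interpretation

# Example tuning for a touch-screen cockpit application: shorter window, stricter confidence.
cockpit = FusionTuning(time_window_s=0.6, min_confidence=0.8, n_best=1)
print(cockpit)
```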

Page 18: Fusion Engines for Input Multimodal Interfaces: a Survey

Fusion Principles

• Notation: Petri nets based (ICOs)
• Type: Procedural only
• Level: Dialogue and low level
• Input devices: Speech, mice, keyboard, touch screen
• Ambiguity resolution: inside models
• Time representation (Quantitative – Qualitative): Both
• Application type: Safety critical, aeronautics and space

18
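To make the procedural, Petri-net-based principle above concrete, here is a toy place/transition sketch in Python: a fusion transition fires only when both input places are marked. This is a didactic simplification under my own assumptions, not the ICO notation or the Petshop tool themselves.

```python
from collections import Counter

class PetriNet:
    """Minimal place/transition net: a transition consumes one token per input place."""
    def __init__(self, transitions):
        # transitions: name -> (list of input places, list of output places)
        self.marking = Counter()
        self.transitions = transitions

    def put_token(self, place):
        self.marking[place] += 1

    def fire_enabled(self):
        """Fire every enabled transition; return the names of the transitions fired."""
        fired = []
        for name, (inputs, outputs) in self.transitions.items():
            while all(self.marking[p] > 0 for p in inputs):
                for p in inputs:
                    self.marking[p] -= 1
                for p in outputs:
                    self.marking[p] += 1
                fired.append(name)
        return fired

# A single fusion transition: speech command + touch position -> fused zoom command.
net = PetriNet({"fuse_zoom": (["speech_zoom", "touch_position"], ["zoom_command"])})
net.put_token("speech_zoom")     # user says "Zoom in here"
print(net.fire_enabled())        # []  -- still waiting for the pointing input
net.put_token("touch_position")  # user touches the map
print(net.fire_enabled())        # ['fuse_zoom']
```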