USI module U1-5 Multimodal interaction

Jacques Terken, USI module U1, lecture 5

Description

USI module U1-5 Multimodal interaction. Jacques Terken, USI module U1, lecture 5. Contents: demos and video clips; multimodal behaviour; multimodal interaction, architecture and multimodal fusion; design heuristics, guidelines and tools. Demo: http://www.nuance.com/xmode/demo/#

Transcript of USI module U1-5 Multimodal interaction

Page 1: USI module U1-5 Multimodal  interaction

USI module U1-5 Multimodal interaction. Jacques Terken, USI module U1, lecture 5

Page 2: USI module U1-5 Multimodal  interaction

Contents
• Demos and video clips
• Multimodal behaviour
• Multimodal interaction, architecture and multimodal fusion
• Design heuristics, guidelines and tools

Page 4: USI module U1-5 Multimodal  interaction

QuickSet on iPAQ (OGI – CHCC)

Page 5: USI module U1-5 Multimodal  interaction

Multimodal behaviour
• Development of multimodal systems depends on knowledge about the natural integration patterns that characterize the combined use of different modalities
• Dealing with myths about multimodal interaction:
  – Oviatt, S.L., "Ten myths of multimodal interaction", Communications of the ACM 42(11), 1999, pp. 74-81

Page 6: USI module U1-5 Multimodal  interaction

Myth 1: If you build a multimodal system, users will interact multimodally.

Dependent on domain:
• Spatial domain: 95-100% of users have a preference for multimodal interaction
• Other domains: 20% of commands are multimodal
Dependent on type of action:
• High MM: adding, moving, modifying objects, calculating distances between objects
• Low MM: printing, scrolling, etc.

Page 8: USI module U1-5 Multimodal  interaction

• Distinction between general, selective and spatial actions
• General: non-object-directed actions (printing etc.)
• Selective: choosing objects
• Spatial: manipulation of objects (adding etc.)

Page 10: USI module U1-5 Multimodal  interaction

Myth 2: Speech and pointing is the dominant multimodal integration pattern.

• Central in Bolt's speak-and-point interface ("Put That There")
• Speak-and-point accounts for only 14% of spontaneous multimodal actions
• In human communication, pointing accounts for approx. 20% of all gestures
• Other actions: handwriting, hand gestures, facial expressions ("rich" interaction)

Page 11: USI module U1-5 Multimodal  interaction

Myth 3: Multimodal input involves simultaneous signals.

• Information from different modalities is often sequential

• Often gestures precede speech

Page 12: USI module U1-5 Multimodal  interaction

Myth 4: Speech is the primary input mode in any multimodal system that includes it, and gestures, head and body movement, gaze direction and other input are secondary.

• Often speech cannot contain all information (cf. combination of pen + speech)

• Gestures are better for some kinds of information

• Often gestures indicate the context for speech

Page 13: USI module U1-5 Multimodal  interaction

Myth 5: Multimodal language does not differ linguistically from unimodal language.

• Users often avoid complicated commands in multimodal interaction
• Multimodal language is often shorter, syntactically simpler, and more fluent
  – Unimodal: "place a boat dock on the east, no, west end of Reward Lake"
  – Multimodal: [draws rectangle] "add rectangle"
• Multimodal language is easier to process
  – Less anaphora and indirectness

Page 14: USI module U1-5 Multimodal  interaction

Myth 6: Multimodal integration involves redundancy of content between modes.

• Different modalities contribute complementary information:
  – Speech: subject, object, verb (objects, actions/operations)
  – Gesture: location (spatial info)
• Even in the case of corrections, only 1% redundancy

Page 15: USI module U1-5 Multimodal  interaction

Myth 7: Individual error-prone recognition technologies combine multimodally to produce even greater unreliability.

• Combination of inputs enables mutual disambiguation
• Users choose the least error-prone modality ("leveraging from users' natural intelligence about when and how to deploy input modes effectively")
• Combining error-prone modalities in fact yields a more stable system

Page 16: USI module U1-5 Multimodal  interaction

Myth 8: All users' multimodal commands are integrated in a uniform way.

• Differences between people
• Consistent use within individuals
• Advance detection of a user's integration pattern can result in better recognition

Page 18: USI module U1-5 Multimodal  interaction

Myth 9: Different input modes are capable of transmitting comparable content (alt-mode hypothesis).

• Differences between modalities:
  – Type of information
  – Functionality during communication
  – Accuracy of expression
  – Manner of integration with other modalities

Page 19: USI module U1-5 Multimodal  interaction

Myth 10: Enhanced speed and efficiency are the main advantages of multimodal systems.

This applies (to a limited extent) in the spatial domain:
• In multimodal pen/speech interaction, speed increases by approx. 10%

More important advantages in other domains:
• Decrease in errors and non-fluent speech by 35-50%
• Possibility of choosing the input mode:
  – Less chance of fatigue per modality
  – Better opportunities for repair
  – Larger range of users

Page 20: USI module U1-5 Multimodal  interaction

Advantages: Robustness
• Individual signal processing technologies are error-prone
• Integration of complementary modalities yields synergy, capitalizing on the strengths of each modality and overcoming weaknesses in the other:
  – Users select the input mode they consider less error-prone for particular lexical content
  – Users' language is simplified when interacting multimodally
  – Users tend to switch modes after system errors, facilitating error recovery
  – Users report less frustration when interacting multimodally (greater sense of control)
  – Mutual compensation/disambiguation

Page 21: USI module U1-5 Multimodal  interaction

Technologies: Types of multimodality

W3C (see http://www.w3.org/TR/mmi-reqs/), seen from the perspective of the system (how the input is handled):
• Sequential multimodal input
  Modality A for action a, then modality B for action b; each event handled as a separate event
• Simultaneous (uncoordinated) multimodal input
  Each event handled as a separate event; choice between different modalities at each moment in time
• Composite (coordinated simultaneous) multimodal input
  Events integrated into a single event before interpretation ("true" multimodality)
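
To make the three input types concrete, here is a minimal Python sketch; the InputEvent type, its field names and the 1.5-second grouping window are invented for illustration, not taken from the W3C documents:

```python
from dataclasses import dataclass

@dataclass
class InputEvent:
    modality: str      # e.g. "speech" or "pen"
    content: str       # recognized content
    t_start: float     # timestamps let co-occurring events be grouped
    t_end: float

def dispatch(events, mode="composite", window=1.5):
    """Sequential and simultaneous input: every event is interpreted on
    its own. Composite input: events that lie within `window` seconds of
    each other are merged into one joint event before interpretation."""
    if mode in ("sequential", "simultaneous"):
        return [[e] for e in events]          # one event per interpretation
    groups, current = [], []
    for e in sorted(events, key=lambda ev: ev.t_start):
        if current and e.t_start - current[-1].t_end > window:
            groups.append(current)
            current = []
        current.append(e)
    if current:
        groups.append(current)
    return groups                             # each group interpreted jointly

# Speech overlapping a pen gesture becomes one composite event:
events = [InputEvent("speech", "put that there", 0.2, 1.1),
          InputEvent("pen", "point(312, 88)", 0.9, 1.0)]
print(dispatch(events))                       # -> one group with both events
```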

Page 22: USI module U1-5 Multimodal  interaction

                                        Sequential                    Simultaneous
Non-coordinated (W3C: supplementary)    Exclusive (W3C: sequential)   Concurrent (W3C: simultaneous)
Coordinated (W3C: complementary)        Alternate                     Synergistic (W3C: composite)

(Coutaz & Nigay)

Page 24: USI module U1-5 Multimodal  interaction

Mutual disambiguation (MD)
• Speech input: n-best list
  1. Ditch
  2. Ditches
• Gestural input
• Joint interpretation:
  1. Ditches
• Benefit may depend on the situation (e.g. larger for non-native speakers)
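
A minimal sketch of the idea in Python; the scores, labels and compatibility table are invented, where a real system would derive them from its recognizers and its grammar:

```python
# Speech and gesture each deliver an n-best list with posterior-like
# scores; the joint interpretation keeps only combinations that the
# grammar allows, so the gesture can promote a lower-ranked speech
# hypothesis (and vice versa).

speech_nbest = [("ditch", 0.55), ("ditches", 0.45)]
gesture_nbest = [("two_marks", 0.7), ("one_mark", 0.3)]

# Hypothetical compatibility: a plural noun goes with several pen marks
compatible = {("ditch", "one_mark"), ("ditches", "two_marks")}

def joint_interpretation(speech, gesture):
    best, best_score = None, 0.0
    for word, p_s in speech:
        for sign, p_g in gesture:
            if (word, sign) in compatible and p_s * p_g > best_score:
                best, best_score = (word, sign), p_s * p_g
    return best, best_score

print(joint_interpretation(speech_nbest, gesture_nbest))
# -> (('ditches', 'two_marks'), 0.315): the gesture pulls the
#    second-ranked speech hypothesis "ditches" to the top.
```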

Page 25: USI module U1-5 Multimodal  interaction

Early fusion
• Closely coupled and synchronized modalities such as speech and lip movements
• "Feature level" fusion
• Based on multiple Hidden Markov Models or temporal neural networks; the correlation structure between modes can be taken into account automatically via learning
• Problems: modelling complexity, computational intensity, training difficulty
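
A minimal sketch of feature-level fusion in Python/NumPy, assuming two already-synchronized feature streams; the feature dimensions are invented:

```python
import numpy as np

# Two synchronized feature streams, e.g. acoustic and lip features
# sampled at the same frame rate:
rng = np.random.default_rng(0)
acoustic = rng.normal(size=(100, 13))   # 100 frames x 13 MFCC-like features
visual = rng.normal(size=(100, 6))      # 100 frames x 6 lip-shape features

# Feature-level fusion: concatenate per frame, then train ONE joint
# model (HMM or neural network) on the fused frames, so cross-modal
# correlations can be learned automatically.
fused = np.concatenate([acoustic, visual], axis=1)
print(fused.shape)                      # (100, 19)
```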

Page 26: USI module U1-5 Multimodal  interaction

Late fusion
• "Semantic level" fusion
• Individual recognizers
• Sequential integration
• Advantages: scalable – individual recognizers don't need to be retrained
• Early approaches: a multimodal command's posterior probability is the cross-product of the posterior probabilities of the associated constituents; no advantage is taken of the mutual compensation phenomenon
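
The cross-product style of the early approaches can be sketched as follows; the n-best lists and scores are invented for illustration:

```python
from itertools import product

# Each recognizer emits its own n-best list with posteriors; every
# combination is scored by the product of its constituents' posteriors.
speech = [("move", 0.6), ("remove", 0.4)]
gesture = [("object_3", 0.8), ("object_7", 0.2)]

ranked = sorted(((s, g, p_s * p_g)
                 for (s, p_s), (g, p_g) in product(speech, gesture)),
                key=lambda x: -x[2])
for s, g, p in ranked:
    print(f"{s} {g}: {p:.2f}")
# "move object_3" ranks first (0.48). Note that nothing here lets the
# modalities correct each other - no mutual compensation.
```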

Page 27: USI module U1-5 Multimodal  interaction

Architectural requirements for late semantic fusion
• Fine-grained timestamping
• Input may be sequentially integrated or simultaneously delivered
• Common representational format for the different modalities
• Frame-based (multimodal fusion through unification of feature structures) → mutual disambiguation

Page 28: USI module U1-5 Multimodal  interaction

Unification: utterance + gesture (figure)
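
A minimal sketch of such feature-structure unification in Python; the frames and feature names are invented (real systems such as QuickSet use typed feature structures):

```python
# Frames are shown as nested dicts; unification succeeds when the two
# structures assign no conflicting value to any shared feature.

def unify(fs1, fs2):
    """Return the unified feature structure, or None on conflict."""
    result = dict(fs1)
    for key, val in fs2.items():
        if key not in result:
            result[key] = val
        elif isinstance(result[key], dict) and isinstance(val, dict):
            sub = unify(result[key], val)
            if sub is None:
                return None
            result[key] = sub
        elif result[key] != val:
            return None                  # conflicting values: fail
    return result

# Speech supplies the action and object type, the gesture the location:
utterance = {"action": "create", "object": {"type": "ditch"}}
gesture = {"object": {"type": "ditch"}, "location": (312, 88)}
print(unify(utterance, gesture))
# {'action': 'create', 'object': {'type': 'ditch'}, 'location': (312, 88)}
```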

Page 29: USI module U1-5 Multimodal  interaction

Design of multimodal interfaces
1. Task analysis
   What are the actions that need to be performed?
2. Task allocation
   Which party is the most suitable candidate for performing particular actions?
3. Modality allocation
   What modality or combination of modalities is most suited to performing particular actions?

The current presentation focuses on 3.

Page 30: USI module U1-5 Multimodal  interaction

Definition of 'modality'
• Modality as sensory channel
  However, stating that particular numeric information should be presented in the visual modality offers little guidance
• Hence the notion of 'representational modality' has been proposed (Bernsen), which distinguishes e.g. table and graph as two different modalities
• For the time being, we use 'modality' in the more restricted sense of sensory channel, and look for mappings between actions and modalities

Page 31: USI module U1-5 Multimodal  interaction

Relevant dimensions
• Nature of the information
• Interaction paradigm
• Physical and dialogue context
• Platform
• Accessibility
• Multitasking

Page 32: USI module U1-5 Multimodal  interaction

Rules of thumb, heuristics
• Michaelis and Wiggins (1982)
• Cohen and Oviatt (1994)
• Suhm (2000)
• Larsson (2003)
• Reeves, Lai et al. (2004)

• For references, see Terken, J., "Guidelines and Tools for the Design of Multimodal Interfaces", Workshop ASIDE 2005, Aalborg (DK)

Page 33: USI module U1-5 Multimodal  interaction

Michaelis and Wiggins (1982)
• Speech generation is preferable when the
  – message is short;
  – message will not be referred to later;
  – message deals with events in time;
  – message requires an immediate response;
  – visual channels of communication are overloaded;
  – environment is too brightly lit, too poorly lit, subject to severe vibration, or otherwise unsuitable for transmission of visual information;
  – user must be free to move around;
  – user is subjected to high G forces or anoxia.
• Tentative guidelines for when NOT to use speech may be derived from these suggestions through negation (see the sketch below).
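
A minimal Python sketch of that checklist reading; the condition names are invented shorthand for the bullets above:

```python
# Each satisfied condition is one argument for speech output; by
# negation, each unsatisfied one argues against it.
SPEECH_FAVOURABLE = {
    "short_message",
    "not_referred_to_later",
    "deals_with_events_in_time",
    "immediate_response_required",
    "visual_channels_overloaded",
    "environment_unsuitable_for_visual",
    "user_must_move_around",
    "high_g_forces_or_anoxia",
}

def speech_output_advice(context):
    """Return the heuristics arguing for and against speech output."""
    pro = SPEECH_FAVOURABLE & context
    con = SPEECH_FAVOURABLE - context   # negation: tentative "do not use"
    return pro, con

pro, con = speech_output_advice({"short_message", "user_must_move_around"})
print("for speech:", sorted(pro))
print("against speech (by negation):", sorted(con))
```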

Page 34: USI module U1-5 Multimodal  interaction

Cohen and Oviatt (1994)
• Spoken communication with machines (both input and output) may be advantageous:
  – when the user's hands or eyes are busy
  – when only a limited keyboard and/or screen is available
  – when the user is disabled
  – when pronunciation is the subject matter of computer use
  – when natural language interaction is preferred

Page 35: USI module U1-5 Multimodal  interaction

Suhm (2000)
Principles for choosing the set of modalities
2. Consider speech input for entry of textual data, dialogue-oriented tasks, and command control. Speech input is generally less efficient for navigation, manipulation of image data, and resolution of object references.
3. Consider written input for corrections, entry of digits, and entry of graphical data (formulas, sketches, etc.).
4. Consider gesture input for indicating the scope or type of commands and for resolving deictic object references.
5. Consider the traditional modalities (keyboard and mouse input) as alternatives, unless the superiority of novel modalities (speech, pen input) is proven.

• Principles to circumvent limitations of recognition technology
• Principles for the implementation of pen-speech interfaces

Page 36: USI module U1-5 Multimodal  interaction

Larsson (2003)
• Satisfy Real-world Constraints
  – Task-oriented Guidelines
  – Physical Guidelines
  – Environmental Guidelines
• Communicate Clearly, Concisely, and Consistently with Users
  – Consistency Guidelines
  – Organizational Guidelines
• Help Users Recover Quickly and Efficiently from Errors
  – Conversational Guidelines
  – Reliability Guidelines
• Make Users Comfortable
  – System Status
  – Human-memory Constraints
  – Social Guidelines
  – …

Page 37: USI module U1-5 Multimodal  interaction

Reeves, Lai et al. (2004)
Propose a set of multimodal design principles founded in perception and cognition science (but the motivation remains implicit)

General areas:
• Designing multimodal input and output
• Adaptivity
• Consistency
• Feedback
• Error prevention/handling

Page 38: USI module U1-5 Multimodal  interaction

Designing Multimodal Input and Output
• Maximize human cognitive and physical abilities.
  Designers need to determine how to support intuitive, streamlined interactions based on users' human information-processing abilities (including attention, working memory, and decision making), for example:
  – Avoid unnecessarily presenting information in two different modalities in cases where the user must simultaneously attend to both sources to comprehend the material being presented; such redundancy can increase cognitive load at the cost of learning the material.
  – Maximize the advantages of each modality to reduce the user's memory load in certain tasks and situations:
    • System visual presentation coupled with user manual input for spatial information and parallel processing;
    • System auditory presentation coupled with user speech input for state information, serial processing, attention alerting, or issuing commands.

Page 39: USI module U1-5 Multimodal  interaction

• Integrate modalities in a manner compatible with user preferences, context, and system functionality. Additional modalities should be added to the system only if they improve satisfaction, efficiency, or other aspects of performance for a given user and context. When using multiple modalities:
  – Match output to acceptable user input style (for example, if the user is constrained by a set grammar, do not design a virtual agent to use unconstrained natural language);
  – Use multimodal cues to improve collaborative speech (for example, a virtual agent's gaze direction or gesture can guide user turn-taking);
  – Ensure system output modalities are well synchronized temporally (for example, map-based display and spoken directions, or virtual display and non-speech audio);
  – Ensure that the current system interaction state is shared across modalities and that appropriate information is displayed in order to support:
    • users in choosing alternative interaction modalities;
    • multidevice and distributed interaction.

Page 40: USI module U1-5 Multimodal  interaction

3. Theoretical approaches
• Modality theory (Bernsen et al.)
  'Modality' defined as 'representational modality'

Page 41: USI module U1-5 Multimodal  interaction

Modality theory (Bernsen)
Aim
• Given any particular class of task domain information which needs to be exchanged between user and system during task performance, identify the set of input/output modalities which constitute an optimal solution to the representation and exchange of that information (Bernsen, 2001).
• Taxonomic analyses:
  – (Representational) input and output modalities are characterized in terms of a limited number of basic features, such as
    – linguistic/non-linguistic,
    – analogue/non-analogue,
    – arbitrary/non-arbitrary,
    – static/dynamic.

Page 42: USI module U1-5 Multimodal  interaction

• Modality properties can then be applied according to the following procedure:
  1. Requirements specification >
  2. Modality properties + natural intelligence >
  3. Advice/insight with respect to modality choice.

Page 43: USI module U1-5 Multimodal  interaction

• [MP1] Linguistic input/output modalities have interpretational scope, which makes them eminently suited for conveying abstract information. They are therefore unsuited for conveying high-specificity information including detailed information on spatial manipulation and location.

• [MP2] Linguistic input/output modalities, being unsuited for specifying detailed information on spatial manipulation, lack an adequate vocabulary for describing the manipulations.

• [MP3] Arbitrary input/output modalities impose a learning overhead which increases with the number of arbitrary items to be learned.

• [MP4] Acoustic input/output modalities are omnidirectional.

• [MP5] Acoustic input/output modalities do not require limb (including haptic) or visual activity.
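
As a sketch of how such modality properties could be stored and consulted in step 2 of the procedure on the previous slide; the table below abridges MP1-MP5, and the lookup logic is invented:

```python
# Abridged property table keyed by the feature a modality has:
MODALITY_PROPERTIES = {
    "MP1": ("linguistic", "suited to abstract information, unsuited to "
                          "high-specificity spatial information"),
    "MP2": ("linguistic", "lack an adequate vocabulary for detailed "
                          "spatial manipulation"),
    "MP3": ("arbitrary", "impose a learning overhead that grows with "
                         "the number of arbitrary items"),
    "MP4": ("acoustic", "omnidirectional"),
    "MP5": ("acoustic", "require no limb (including haptic) or visual "
                        "activity"),
}

def advice(relevant_features):
    """Step 2 of the procedure: retrieve the properties that bear on the
    requirements; the designer's 'natural intelligence' still has to
    weigh them (step 3)."""
    return [(mp, text) for mp, (feature, text) in
            MODALITY_PROPERTIES.items() if feature in relevant_features]

# E.g. a hands-busy, eyes-busy requirement points to acoustic modalities:
for mp, text in advice({"acoustic"}):
    print(mp, "-", text)
```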

Page 44: USI module U1-5 Multimodal  interaction

4. Tools
• SMALTO (Bernsen)
• Multimodal property flowchart (Williams et al., 2002)

Page 45: USI module U1-5 Multimodal  interaction

SMALTO
• Addresses the "speech functionality problem"
• SMALTO has been created by taking a large number of claims or findings from the literature on designing speech or speech-centric interfaces and casting these claims into the structured representation expressing the speech functionality problem

Page 46: USI module U1-5 Multimodal  interaction

• [Combined speech input/output, speech output, or speech input modalities M1, M2 and/or M3 etc.] or [speech modality M1, M2 and/or M3 etc. in combination with non-speech modalities NSM1, NSM2 and/or NSM3 etc.]
• are [useful or not useful]
• for [generic task: GT]
• and/or [speech act type: SA]
• and/or [user group: UG]
• and/or [interaction mode: IM]
• and/or [work environment: WE]
• and/or [generic system: GS]
• and/or [performance parameter: PP]
• and/or [learning parameter: LP]
• and/or [cognitive property: CP]
• and/or [preferable or non-preferable] to [alternative modalities AM1, AM2 and/or AM3 etc.]
• and/or [useful on conditions C1, C2 and/or C3 etc.]
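
The claim template can be read as a record type; the sketch below encodes it as one (the field names paraphrase the bracketed slots, and this is an illustration, not SMALTO's actual data model):

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class SpeechFunctionalityClaim:
    modalities: list[str]                  # speech and/or non-speech modalities
    useful: bool                           # [useful or not useful]
    generic_task: Optional[str] = None     # [generic task: GT]
    speech_act_type: Optional[str] = None  # [speech act type: SA]
    user_group: Optional[str] = None       # [user group: UG]
    interaction_mode: Optional[str] = None # [interaction mode: IM]
    work_environment: Optional[str] = None # [work environment: WE]
    generic_system: Optional[str] = None   # [generic system: GS]
    performance_parameter: Optional[str] = None
    learning_parameter: Optional[str] = None
    cognitive_property: Optional[str] = None
    preferable_to: list[str] = field(default_factory=list)
    conditions: list[str] = field(default_factory=list)

# An invented example claim in this representation:
claim = SpeechFunctionalityClaim(
    modalities=["speech output"],
    useful=False,
    work_environment="noisy factory floor",
    conditions=["unless headphones are available"],
)
print(claim)
```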

Page 47: USI module U1-5 Multimodal  interaction

• SMALTO has been evaluated within the framework of projects involving the creators and in the DISC project

• Informal evidence indicates that it is difficult for "linguistically naïve" designers to apply, because of the way the modality properties are formulated

• This was also the motivation for the Modality Property Flowchart (Williams et al. 2002)

Page 48: USI module U1-5 Multimodal  interaction

Multimodal property flowchart

Page 49: USI module U1-5 Multimodal  interaction

• Multimodal interfaces are a particular type of interface → the multimodal property flowchart needs to be combined with general usability heuristics for interface design (e.g. Nielsen)

Page 50: USI module U1-5 Multimodal  interaction

Main points
• Multimodal interfaces match the natural expressivity of human beings
• Taxonomy of multimodal interaction
• Limitations of signal processing in one modality can be overcome by taking input from another modality into consideration (multimodal disambiguation)
• Mapping of functionalities onto modalities is not always straightforward → support from guidelines and tools