Multimodal Interaction: An Introduction

Abdallah ‘Abdo’ El Ali

http://staff.science.uva.nl/~elali/

Some slides adapted from: Gabriel Skantze (KTH Royal Institute of Technology, Sweden), Denis Lalanne (University of Fribourg, Switzerland)

Who am I?

Currently: PhD in Mobile Human-Computer Interaction - UvA

Crossmodal Interaction in Mobile Environments

MSc in Cognitive Science - UvA, Cognition, Language, & Communication track

BSc in English Language & Literature - American University of Beirut

Screenwriting, Copywriting, Editing

Outline

I. Multimodal Interaction & Interfaces

II. Multimodal Input

III. Multimodal Output

IV. Practical Matters

Multimodal Interaction & Interfaces

A Brief History of Computer Interfaces

Punched cards (late 19th century)

Herman Hollerith - Tabulating Machine Company (1896)

The Command Line Interface (1960s)

Sketchpad (1963) by Ivan Sutherland: a light-pen, pointer-based system to create and manipulate objects in drawings

Alto personal computer (1973) developed at Xerox PARC

Desktop metaphor, WIMP (windows, icons, menus, pointing device)

WYSIWYG

Xerox 8010 Star Information System (1981)

Apple Macintosh (1984)

Windows 1.01 (1985)

Microsoft Windows 3.0 (1990)

Mac OS X (2000s)

[…]

Multimodal Interfaces

Project NATAL for Xbox 360

Kinect for Xbox 360

PlayStation Move

PlayStation EyePet

HCI and Human Characteristics

HCI is a multi-disciplinary topic: Computer Science & AI, Cognitive Science, Sociology, Psychology, Design, […]

In HCI design, it is important to understand something about:

Human information-processing (cognitive architecture, memory, perception, motor skills, etc.)

How human action is structured

The nature of human communication

Human physical and physiological requirements/constraints

Why HCI?

Humans are limited in their capacity to process information

Implications for the interaction design

Multitasking says it all

Important considerations:

Input-output channels (senses and effectors)

Memory

Learning (acquiring skills)

Reasoning / Problem solving (cognitive activity)

Decision making

Use Case: Mobile Interaction

Distinctive aspects of mobile interaction (Chittaro, 2010):

Hardware: small screen, limited I/O

Perceptual: noisy street, sunlight reflection, no device contact

Motor: voluntary movements when in-vehicle, fat-finger problem

Social: phone ring at a conference, gestures in front of strangers

Cognitive: limited attention span, high stress & load, limited memory

Embodiment

Embodied Cognition, Situated Cognition, Embodied Interaction, EEC, Social Computing, Tangible Computing, Active Perception, […]

Gibson (1979) “The Ecological Approach to Visual Perception”: “....perceiving is an act not a response, an act of attention, not a triggered impression, an achievement, not a reflex”

Heidegger (1927) “Being and Time”: present-at-hand vs. ready-to-hand, e.g., hammer as object (presence) vs. hammer as tool (cognitive extension), or mouse as hardware vs. mouse as tool for performing GUI operations

Dourish (1999) “Foundations of Embodied Interaction”: “…interaction is an embodied phenomenon. It happens in the world, and that world (a physical world and a social world) lends form, substance and meaning to the interaction.”

Sensori-motor coordination: perception for action, action for perception

Agent

World

Sensation & Perception

Humans perceive the world through their senses (sensory input) and act on it through the motor control of their effectors

Five major senses: Sight, Hearing, Touch, Taste, Smell (plus proprioception, thermoception, nociception, …)

Effectors: Limbs (arms, legs, body position, …), Fingers, Eyes, Head / Face, Body, Vocal system

Man-Machine Interaction

Interaction can be seen as a dialog between the computer and the user

Interaction styles: Command language / Command line interface, Form-fills and spreadsheets, Menus, Natural language and query language, Question/answer dialog, WIMP, Point-and-click, Direct manipulation, 3D interfaces (virtual reality), Brain-computer interface

Multimodal Interfaces

Multimodal Interaction: the situation where the user is provided with multiple modes for interacting with a system

Multimodal Interfaces: “…process two or more combined user input modes (such as speech, pen, touch, manual gesture, gaze, and head and body movements) in a coordinated manner with multimedia system output. They are a new class of interfaces that aim to recognize naturally occurring forms of human language and behavior, and which incorporate one or more recognition-based technologies (e.g. speech, pen, vision)” (Oviatt et al., 2002)

Multimodality vs. Multimedia

Modality “refers to the type of communication channel used to convey or acquire information. It also covers the way an idea is expressed or perceived, or the manner an action is performed” (Nigay & Coutaz, 1993). Examples: visual, auditory, haptic, etc. Multi- refers to 2 or more such modalities used

Mode “refers to a state that determines the way information is interpreted to extract or convey meaning” (Nigay & Coutaz, 1993)

Multimedia “focuses on the medium or technology rather than the application or user” (Buxton, 1986), e.g., sound clip attached to a presentation. Media channels: text, graphics, animation, video, etc.

Early Example

“Put That There” system (Bolt, 1980)

Speech and gestures used simultaneously

Why Multimodal Interaction?

Advantages over GUI and unimodal systems:

Natural/realism: making use of more (appropriate) senses

New ways of interacting

Flexible: different modalities excel at different tasks

Wearable computers and small devices, e.g., keyboard typing devices require training

Helps the visually/physically impaired

Faster, more efficient, higher information processing bandwidth

Robust: mutual disambiguation of recognition errors

Multimodal interfaces are more engaging

Why Multimodal Interaction?

Human – Human protocols: initiating conversation, turn-taking, interrupting, directing attention, …

Human – Computer protocols: shell interaction, drag-and-drop, dialog boxes, …

Use more of users’ senses

Users perceive multiple things at once

Users do multiple things at once (e.g., speak and use hand gestures, body position, orientation, and gaze)

Questions?

Multimodal Input

Multimodal Input Overview

Multimodal Input:

allows humans to communicate naturally

provides user with multiple input modalities

permits multiple styles of interaction

may be simultaneous or not

must consider modality fusion and temporal constraints

Multimodal Input

Pointing (deixis), (Multi-)Touch

Motion controller: accelerometer, gyro

Speech: free form, fixed, non-speech sounds

Body movement/Gestures: gait, posture

Head position & movements: facial expression, gaze

Tangibles: digital pen and paper

Biometrics: sweat, pulse, respiration, skin conductance

Brain activity (neural): EEG signals, fMRI signals, blood oxygenation

Scent? Odor detection

Taste?

Speech and Gesture Interaction

Speech:

User satisfaction is highly dependent on their profiles and tasks

The learning rate is fast

Error handling is getting better

Perceptual & social usage constraints are important (ambient noise, confidentiality, disturbance, etc.)

Good spoken languages: short sentences with prosody clearly demarcating end of words

Gesture:

Habits are inherited from the usage of mouse

Gesture pointing is direct and reliable (deixis)

Gesture signs may not be natural, making recognition hard

Fundamental Problems

Aligning HCI tasks with modalities (and vice versa)

Aligning multimodal usage to user profiles (and vice versa)

Multimodal Fusion (input): the integration of communication modalities in interactive systems

Multimodal Fission (output): the repartitioning of information among several communication modalities

Multimodal Man-Machine Interaction Model

(Dumas et al., 2009)

Levels of Multimodal Fusion

Data level: e.g., combining 2 webcam video streams, multiple perspectives

Feature level: e.g., combining speech and lip movements

Decision level: e.g., combining gestures and speech
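
As a rough illustration of decision-level fusion, the sketch below combines the n-best hypotheses of two independent recognizers (speech and gesture) only after each has produced its own interpretation. The hypothesis labels, confidence values, compatibility table, and the product-of-confidences scoring rule are all assumptions made for illustration; they are not taken from any particular system.

```python
from itertools import product

# Hypothetical n-best lists: (interpretation, confidence) per recognizer.
speech_nbest = [("delete", 0.55), ("select", 0.45)]
gesture_nbest = [("circle", 0.60), ("cross_out", 0.40)]

# Assumed compatibility table: which spoken command fits which gesture.
COMPATIBLE = {
    ("select", "circle"),
    ("delete", "cross_out"),
}


def fuse_decisions(speech, gesture, compatible):
    """Return the best jointly consistent (command, gesture, score), or None."""
    candidates = [
        (cmd, gest, s_conf * g_conf)
        for (cmd, s_conf), (gest, g_conf) in product(speech, gesture)
        if (cmd, gest) in compatible
    ]
    # Highest joint score wins; an empty candidate list yields None.
    return max(candidates, key=lambda c: c[2], default=None)


best = fuse_decisions(speech_nbest, gesture_nbest, COMPATIBLE)
print(best)
```

With these made-up scores, the top speech hypothesis ("delete") loses to "select" once the gesture evidence is taken into account, which is one simple way to picture mutual disambiguation of recognition errors.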

Unimodal or Multimodal?

MATCH: Multimodal Access to City Help (Johnston et al., 2002)

Interactive city guide and navigation application: provides restaurant and subway information for NY and DC

Dynamic map-based interface on tablet

Input modalities: speech, pen gesture, handwriting, GUI

Commands can be speech, pen, or multimodal

Visual parsing of complex gestural input

Output modalities: coordinated multimodal output combining synthetic speech and dynamic graphics

Example:

Speech: “show inexpensive italian places in chelsea”

Multimodal: “cheap italian places in this area” (pen gesture; right)
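
The multimodal command in this example is only interpretable once “this area” is bound to the pen gesture. A minimal sketch of that binding step, assuming time-stamped input events and a fixed temporal window, might look as follows. The event classes, the 1.5-second window, the list of deictic phrases, and the coordinates are hypothetical; this does not describe how MATCH itself is implemented.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class SpeechEvent:
    text: str
    start: float  # seconds
    end: float    # seconds


@dataclass
class PenGesture:
    region: Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)
    time: float  # seconds


DEICTIC = re.compile(r"\b(this area|here|there)\b")  # assumed phrase list
MAX_GAP = 1.5  # assumed temporal window in seconds


def resolve_deixis(speech: SpeechEvent, gestures: List[PenGesture]) -> Optional[dict]:
    """Bind a deictic phrase in the utterance to the closest-in-time gesture."""
    if DEICTIC.search(speech.text) is None:
        # Purely spoken command: nothing to bind.
        return {"query": speech.text, "region": None}

    def gap(g: PenGesture) -> float:
        # Zero if the gesture falls inside the utterance, else the distance to it.
        return max(speech.start - g.time, g.time - speech.end, 0.0)

    in_window = [g for g in gestures if gap(g) <= MAX_GAP]
    if not in_window:
        return None  # deictic phrase without a usable gesture
    nearest = min(in_window, key=gap)
    return {"query": speech.text, "region": nearest.region}


utterance = SpeechEvent("cheap italian places in this area", start=0.0, end=2.0)
gestures = [PenGesture((10, 10, 50, 40), time=1.2), PenGesture((0, 0, 5, 5), time=9.0)]
result = resolve_deixis(utterance, gestures)
```

Here the gesture drawn during the utterance is selected and the one made seven seconds later is ignored, which is the kind of temporal constraint a fusion component has to enforce.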

NUMACK (Foster and White, 2005)

NUMACK (Northwestern University Multimodal Autonomous Conversational Kiosk)

Embodied Conversational Agent (ECA) that gives directions around Northwestern's campus

Combination of speech, gestures and facial expressions

Uses a grammar-based, computational model of language and gesture planning system

NUMACK's verbal, non-verbal and multimodal behaviors realized through synthesized speech and kinematic body model

System updates its model of context and the world by fusing multimodal user input: stereoscopic head-tracking system, speech, pen

Multimodal Input Advantages

Improved error handling & efficiency: fewer errors, faster task completion

Greater expressive power

Greater precision in visual-spatial tasks (e.g., map scrolling & item localization)

Support for users’ preferred interaction style

Accommodation to diverse users, tasks & usage environments, e.g., accented speakers & mobile environments

Shorter & less complex linguistic constructions, e.g., fewer locative descriptions

Questions?

Multimodal Output

Multimodal Output

Visual: text, graphics, animations, Virtual/Augmented Reality

Auditory: speech (e.g., Embodied Conversational Agent), non-speech sound

Haptics (tactile): force feedback (e.g., PS3 controller), vibrotactile (e.g., phone vibrate)

Scent? Scented mobile phones

Taste?

Multimodal Output

Advantages (Sarter, 2006; Oviatt, 2002): synergy, redundancy, higher information bandwidth

Wickens’ Multiple Resource Theory (1984)

More modalities = better? Higher resource competition when people have to attend to two sources at once (Reeves et al., 2004).

Mobile Multimodal Interfaces

Mobile context means attentional and memory resources are limited (Tamminen et al., 2004)

E.g., map scrolling, talking with friend, crossing the street

Potential of multimodal feedback cues in:

1. addressing issues of accessibility (e.g., to support blind users in navigation) (Magnusson et al., 2009)

2. developing pedestrian navigation aids to support situational impairment and awareness (Brewster et al., 2003)

Examples: PocketNavigator (Pielot et al., 2010), AudioGPS (Holland et al., 2002)

http://feelspace.cogsci.uni-osnabrueck.de/

http://www.lalyagaye.com/

Tactile and Non-Speech Auditory Feedback

Tactons: “Structured, abstract messages that can be used to communicate non-visually” (Brown, 2005). Information encoded in parameters such as: waveform, duration, rhythm, spatial location, frequency, […]

Earcons: “Non-verbal audio messages that are used in the computer/user interface to provide information to the user about some computer object, operation or interaction” (Blattner, 1989). Information encoded in: pitch, amplitude, duration, spatial location, […]

Amodal parameters: consist of information that is not specific to any one sensory modality (Lewkowicz, 1994). Parameters common to both tactile and auditory domains (Lewkowicz, 1994; Hoggan et al., 2009): spatial location, rhythm, texture, duration, frequency, intensity/amplitude

Crossmodal Interaction

Subset of multimodal interaction where the senses receive the ‘same’ information content across invoked sensory modalities (Gibson, 1966; Lewkowicz, 1994)

Cf. Sensory Substitution (Visell, 2009): vOICe (Seeing with Sound application); Braille

Crossmodal Interaction refers to situations where characteristics of one sensory modality may be bi-directionally transformed into the characteristics of another (e.g., audio ⇿ tactile) (Hoggan, 2007; 2009)

Redundancy
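
One way to picture this bi-directional transformation is to route an audio icon and a tactile icon through the amodal parameters they share (rhythm, duration, intensity). The sketch below is an illustration under that assumption only; the class names, the fixed 250 Hz vibration frequency, and the default pitch are hypothetical choices, not values reported in the cited work.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Earcon:
    rhythm: Tuple[float, ...]  # note onsets in seconds
    duration: float            # total length in seconds
    amplitude: float           # 0.0 to 1.0
    pitch_hz: float            # audio-only parameter


@dataclass(frozen=True)
class Tacton:
    rhythm: Tuple[float, ...]  # pulse onsets in seconds
    duration: float            # total length in seconds
    intensity: float           # 0.0 to 1.0
    vibration_hz: float        # tactile-only parameter


ASSUMED_VIBRATION_HZ = 250.0  # hypothetical actuator frequency
ASSUMED_PITCH_HZ = 440.0      # hypothetical default pitch


def earcon_to_tacton(e: Earcon) -> Tacton:
    """Carry over the amodal parameters; fill the tactile-only one with a default."""
    return Tacton(e.rhythm, e.duration, e.amplitude, ASSUMED_VIBRATION_HZ)


def tacton_to_earcon(t: Tacton, pitch_hz: float = ASSUMED_PITCH_HZ) -> Earcon:
    """Inverse mapping: amodal parameters survive, pitch has to be supplied."""
    return Earcon(t.rhythm, t.duration, t.intensity, pitch_hz)


new_message = Earcon(rhythm=(0.0, 0.2, 0.6), duration=0.8, amplitude=0.7, pitch_hz=880.0)
as_tacton = earcon_to_tacton(new_message)
round_trip = tacton_to_earcon(as_tacton, pitch_hz=new_message.pitch_hz)
```

Only the shared parameters cross the modality boundary; modality-specific ones such as pitch are lost in one direction and must be chosen again in the other.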

Crossmodal Output Advantages

Unlike multimodal interaction, little risk of information processing overload

When one sensory modality is knocked out (e.g., noisy environment, body contact), information is still received

Permits both ‘eyes-free’ and ‘hands-free’ interaction
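
The second advantage can be sketched as a simple selection rule: because a crossmodal message exists in more than one modality, the system can deliver it through whichever channel the current context leaves open. The context flags, the reading of "body contact" as "the device must touch the body for tactile cues", and the preference order below are assumptions for illustration only.

```python
from typing import Optional, Set

PREFERENCE = ("audio", "tactile")  # assumed default order for a crossmodal cue


def available_modalities(noisy: bool, device_in_contact: bool) -> Set[str]:
    """Assume audio is masked by ambient noise and tactile cues need body contact."""
    usable = set()
    if not noisy:
        usable.add("audio")
    if device_in_contact:
        usable.add("tactile")
    return usable


def choose_output(noisy: bool, device_in_contact: bool) -> Optional[str]:
    """Pick the first preferred modality that the context has not knocked out."""
    usable = available_modalities(noisy, device_in_contact)
    return next((m for m in PREFERENCE if m in usable), None)


on_noisy_street = choose_output(noisy=True, device_in_contact=True)
in_quiet_room = choose_output(noisy=False, device_in_contact=False)
```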

Questions?

Practical Matters

Multimodal Input Research Areas

Applied Machine Learning

Speech Recognition, Speech Synthesis

Gesture Recognition, Motion Tracking

Head, Gait and Pose Estimation

Multimodal Fusion

HCI

Usability issues in diverse tasks

Social acceptability

Context-aware and ubiquitous computing (which modality to use when)

Design/Prototyping of Multimodal Interfaces (e.g., Wizard of Oz)

Multimodal Output Research Areas

Virtual and Mixed Reality (Immersive Environments)

Embodied Conversational Agents

Haptics: force-feedback, vibrotactile feedback

Audio: feedback, synthesis

Crossmodal Integration

HCI (Usability, Satisfaction)

Multimodal Feedback (in-vehicle/pedestrian navigation, safety and control, surgery, ergonomics, etc.)

Crossmodal Feedback

(Mobile) Multimodal Interface design

International Communities

CHI: ACM CHI Conference on Human Factors in Computing Systems

MobileHCI: ACM Conference on Human-Computer Interaction with Mobile Devices and Services

ICMI: ACM International Conference on Multimodal Interaction

CSCW: ACM Conference on Computer Supported Cooperative Work

ACM MM: ACM Multimedia Conference

INTERACT: IFIP Conference on Human-Computer Interaction

WHC: World Haptics Conference

Resources

Books:

Paul Dourish (2004) “Where the Action is: The foundations of embodied interaction”

Andy Clark (2003) “Natural-Born Cyborgs: Minds, Technologies, and the Future of Human Intelligence”

Bill Buxton (2007) “Sketching User Experiences: Getting the design right and the right design”

Adam Greenfield (2006) “Everyware: The dawning age of ubiquitous computing”

Articles:

Mark Weiser (1991) “The Computer for the 21st Century”, Scientific American

Sharon Oviatt (2002) “Perceptual user interfaces: multimodal interfaces that process what comes naturally”, Communications of the ACM

Sharon Oviatt (1999) “Ten myths of multimodal interaction”, Communications of the ACM

Nadine Sarter (2006) “Multimodal information presentation: Design guidance and research challenges”, International Journal of Industrial Ergonomics

Leah Reeves et al. (2004) “Guidelines for multimodal user interface design”, Communications of the ACM

Summary

We are embodied and embedded creatures, and this influences the way we interact with the world and computational artifacts

Multimodal Interfaces aim at making communication with machines more natural, more efficient, and more engaging

Multimodal Input and Output focus on different aspects within HCI, requiring different skill sets, but multimodal research and development requires both

Multimodal Interaction is an exciting and rapidly growing area that hugely benefits from HCI work

The Future of Computing is Multimodal…

Contact

Address:

Room C3.258, Informatics Institute, Science Park 904, 1098 XH Amsterdam, NL

e: [email protected]

w: http://staff.science.uva.nl/~elali/

t: +31 (0)20 525 8661

Slides available at: http://staff.science.uva.nl/~elali/hci_abdo_2011.pdf

Abdo El Ali

References (1)

Blattner, M. M., Sumikawa, D. A., & Greenberg, R. M. (1989). Earcons and icons: Their structure and common design principles. Human-Computer Interaction, 4(1), 11-44.

Bolt, R. A. (1980). “Put-that-there”: Voice and gesture at the graphics interface. SIGGRAPH Comput. Graph., 14(3), 262-270.

Brown, L. M., Brewster, S. A. and Purchase, H. C. (2005). A First Investigation into the Effectiveness of Tactons. In Proceedings of the First Joint Eurohaptics Conference and Symposium on Haptic Interfaces for Virtual Environment and Teleoperator Systems (WHC '05). IEEE Computer Society, Washington, DC, USA, 167-176.

Brewster, S., Lumsden, J., Bell, M., Hall, M., and Tasker, S. (2003). Multimodal ‘eyes-free’ interaction techniques for wearable devices. In Proc. of CHI '03. ACM Press, New York, NY.

Buxton, W. (1986). There's More to Interaction than Meets the Eye: Some Issues in Manual Input. In Norman, D. A. and Draper, S. W. (Eds.), User Centered System Design: New Perspectives on Human-Computer Interaction. Lawrence Erlbaum Associates, Hillsdale, New Jersey, pp. 319-337.

Chittaro, L. (2010). Distinctive aspects of mobile interaction and their implications for the design of multimodal interfaces. Journal on Multimodal User Interfaces, 3(3), 157-165.

Dourish, P. (2000). Embodied Interaction: Exploring the Foundations of a New Approach to HCI. Transactions on Computer-Human Interaction.

Dumas, B., Lalanne, D. and Oviatt, S. (2009). Multimodal Interfaces: A Survey of Principles, Models and Frameworks. In Human Machine Interaction, Denis Lalanne and Jorg Kohlas (Eds.). Lecture Notes in Computer Science, Vol. 5440. Springer-Verlag, Berlin, Heidelberg, 3-26.

Gibson, J. J. (1966). The Senses Considered as Perceptual Systems. Houghton Mifflin, Boston.

Gibson, J. J. (1979). The Ecological Approach to Visual Perception. Houghton Mifflin, Boston.

Heidegger, M. (1927). Being and Time. (Trans. by John Macquarrie & Edward Robinson, London: SCM Press, 1962).

Hoggan, E. and Brewster, S. A. (2007). Designing Audio and Tactile Crossmodal Icons for Mobile Devices. In ACM International Conference on Multimodal Interfaces (Nagoya, Japan). ACM Press, pp. 162-169.

References (2)

Hoggan, E., Raisamo, R. and Brewster, S. A. (2009). Mapping Information to Audio and Tactile Icons. In Proceedings of ACM ICMI 2009 (Cambridge, MA, USA). ACM Press, pp. 327-334.

Holland, S., Morse, D. R., and Gedenryd, H. (2002). AudioGPS: Spatial audio navigation with a minimal attention interface. Personal Ubiquitous Comput., 6(4), 253-259.

Kopp, S., Tepper, P. and Cassell, J. (2004). “Towards Integrated Microplanning of Language and Iconic Gesture for Multimodal Output.” ICMI 2004.

Lewkowicz, D. J. (1994). Development of intersensory perception in human infants. In Lewkowicz, D. J. & Lickliter, R. (Eds.), Development of Intersensory Perception: Comparative Perspectives. Norwood, N.J.: Lawrence Erlbaum Associates.

Magnusson, C., Tollmar, K., Brewster, S., Sarjakoski, T., Sarjakoski, T., & Roselier, S. (2009). Exploring future challenges for haptic, audio and visual interfaces for mobile maps and location based services. In Proceedings of the 2nd International Workshop on Location and the Web (pp. 8:1-8:4). New York, NY, USA: ACM.

Nigay, L. and Coutaz, J. (1993). A design space for multimodal systems: concurrent processing and data fusion. In Proceedings of the INTERACT '93 and CHI '93 Conference on Human Factors in Computing Systems (CHI '93). ACM, New York, NY, USA, 172-178.

Pielot, M., Krull, O. and Boll, S. (2010b). Where is my team: supporting situation awareness with tactile displays. In Proceedings of the 28th International Conference on Human Factors in Computing Systems (CHI '10). ACM, New York, NY, USA, 1705-1714.

Pielot, M., Poppinga, B., and Boll, S. (2010). PocketNavigator: Vibro-Tactile Waypoint Navigation for Everyday Mobile Devices. Mobile HCI 2010, Lisboa, Portugal.

Reeves, L. M., Lai, J., Larson, J. A., Oviatt, S., Balaji, T. S., Buisine, S., Collings, P., Kraal, B., Martin, J. C., McTear, M., Raman, T. V., Stanney, K. M., Su, H., and Wang, Q. Y. (2004). Guidelines for Multimodal User Interface Design. Commun. ACM, 47(1), 57-59.

Visell, Y. (2009). Tactile sensory substitution: Models for enaction in HCI. Interact. Comput., 21(1-2), 38-53.
