Multimodal Interaction: An Introduction
Abdallah ‘Abdo’ El Ali
http://staff.science.uva.nl/~elali/
Some slides adapted from: Gabriel Skantze (KTH Royal Institute of Technology, Sweden), Denis Lalanne (University of Fribourg, Switzerland)
Who am I?
Currently: PhD in Mobile Human-Computer Interaction - UvA
Crossmodal Interaction in Mobile Environments
MSc in Cognitive Science - UvA
Cognition, Language, & Communication track
BSc in English Language & Literature - American University of Beirut
Screenwriting, Copywriting, Editing
Outline
I. Multimodal Interaction & Interfaces
II. Multimodal Input
III. Multimodal Output
IV. Practical Matters
Multimodal Interaction & Interfaces
A Brief History of Computer Interfaces
Punched cards (late 19th century)
Herman Hollerith - Tabulating Machine Company (1896)
The Command Line Interface (1960s)
Sketchpad (1963) by Ivan Sutherland: light-pen, pointer-based system to create and manipulate objects in drawings
Alto personal computer (1973) developed at Xerox PARC
Desktop metaphor, WIMP (windows, icons, menus, pointing device)
WYSIWYG
Xerox 8010 Star Information System (1981)
Apple Macintosh (1984)
Windows 1.01 (1985)
Microsoft Windows 3.0 (1990)
Mac OS X (2000s)
[…]
Multimodal Interfaces
Project Natal for Xbox 360
Kinect for Xbox 360
PlayStation Move
PlayStation EyePet
HCI and Human Characteristics
HCI is a multi-disciplinary topic: Computer Science & AI, Cognitive Science, Sociology, Psychology, Design, […]
In HCI design, it is important to understand something about:
Human information processing (cognitive architecture, memory, perception, motor skills, etc.)
How human action is structured
The nature of human communication
Human physical and physiological requirements/constraints
Why HCI?
Humans are limited in their capacity to process information
Implications for the interaction design
Multitasking says it all
Important considerations:
Input-output channels (senses and effectors)
Memory
Learning (acquiring skills)
Reasoning / problem solving (cognitive activity)
Decision making
Use Case: Mobile Interaction
Distinctive aspects of mobile interaction (Chittaro, 2010):
Hardware: small screen, limited I/O
Perceptual: noisy street, sunlight reflection, no device contact
Motor: voluntary movements when in-vehicle, fat-finger problem
Social: phone ringing at a conference, gestures in front of strangers
Cognitive: limited attention span, high stress & load, limited memory
Embodiment
Embodied Cognition, Situated Cognition, Embodied Interaction, EEC, Social Computing, Tangible Computing, Active Perception, […]
Gibson (1979) “The Ecological Approach to Visual Perception”: “…perceiving is an act not a response, an act of attention, not a triggered impression, an achievement, not a reflex”
Heidegger (1927) “Being and Time”: present-at-hand vs. ready-to-hand
e.g., hammer as object (presence) vs. hammer as tool (cognitive extension)
e.g., mouse as hardware vs. mouse as tool for performing GUI operations
Dourish (1999) “Foundations of Embodied Interaction”: “…interaction is an embodied phenomenon. It happens in the world, and that world (a physical world and a social world) lends form, substance and meaning to the interaction.”
Sensori-motor coordination between agent and world: perception for action, action for perception
Sensation & Perception
Humans perceive the world through their senses (sensory input) and act on it through the motor control of their effectors
Five major senses: sight, hearing, touch, taste, smell (plus proprioception, thermoception, nociception, …)
Effectors: limbs (arms, legs, body position, …), fingers, eyes, head / face, body, vocal system
Man-Machine Interaction
Interaction can be seen as a dialog between the computer and the user
Interaction styles:
Command language / command line interface
Form-fills and spreadsheets
Menus
Natural language and query language
Question/answer dialog
WIMP
Point-and-click
Direct manipulation
3D interfaces (virtual reality)
Brain-computer interface
Multimodal Interfaces
Multimodal Interaction: the situation where the user is provided with multiple modes for interacting with a system
Multimodal Interfaces: “…process two or more combined user input modes (such as speech, pen, touch, manual gesture, gaze, and head and body movements) in a coordinated manner with multimedia system output. They are a new class of interfaces that aim to recognize naturally occurring forms of human language and behavior, and which incorporate one or more recognition-based technologies (e.g. speech, pen, vision)” (Oviatt et al., 2002)
Multimodality vs. Multimedia
Modality: “refers to the type of communication channel used to convey or acquire information. It also covers the way an idea is expressed or perceived, or the manner an action is performed” (Nigay & Coutaz, 1993)
Visual, auditory, haptic, etc.
Multi- refers to two or more such modalities being used
Mode: “refers to a state that determines the way information is interpreted to extract or convey meaning” (Nigay & Coutaz, 1993)
Multimedia: “focuses on the medium or technology rather than the application or user” (Buxton, 1986)
e.g., a sound clip attached to a presentation
Media channels: text, graphics, animation, video, etc.
Early Example
“Put That There” system (Bolt, 1980)
Speech and gestures used simultaneously
Why Multimodal Interaction?
Advantages over GUI and unimodal systems:
Naturalness/realism: making use of more (appropriate) senses
New ways of interacting
Flexible: different modalities excel at different tasks
Wearable computers and small devices: e.g., keyboard typing devices require training
Helps the visually/physically impaired
Faster, more efficient, higher information processing bandwidth
Robust: mutual disambiguation of recognition errors
Multimodal interfaces are more engaging
Why Multimodal Interaction?
Human-human protocols: initiating conversation, turn-taking, interrupting, directing attention, …
Human-computer protocols: shell interaction, drag-and-drop, dialog boxes, …
Use more of users’ senses
Users perceive multiple things at once
Users do multiple things at once (e.g., speak and use hand gestures, body position, orientation, and gaze)
Questions?
Multimodal Input
Multimodal Input Overview
Multimodal input:
allows humans to communicate naturally
provides the user with multiple input modalities
permits multiple styles of interaction
may be simultaneous or not
must consider modality fusion and temporal constraints
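The last point, fusion under temporal constraints, can be illustrated with a minimal sketch. It is not taken from any of the systems discussed in these slides: the class names, the list of deictic words, and the one-second window `max_gap` are all assumptions chosen for illustration. Each deictic word in a spoken command is paired with the pointing gesture closest to it in time, provided the two fall within the window.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SpeechToken:
    word: str
    time: float  # seconds from the start of the utterance


@dataclass
class PointingEvent:
    target: str  # identifier of whatever the user pointed at
    time: float  # seconds from the start of the utterance


# Hypothetical set of words that need a pointing gesture to be resolved.
DEICTIC_WORDS = {"this", "that", "here", "there"}


def fuse(tokens: List[SpeechToken],
         pointings: List[PointingEvent],
         max_gap: float = 1.0) -> List[str]:
    """Replace each deictic word by the closest-in-time pointing target.

    max_gap is the temporal constraint: a gesture more than max_gap
    seconds away from the word is not treated as part of the same command.
    """
    unused = list(pointings)
    resolved = []
    for token in tokens:
        if token.word not in DEICTIC_WORDS:
            resolved.append(token.word)
            continue
        candidates = [p for p in unused if abs(p.time - token.time) <= max_gap]
        if not candidates:
            resolved.append(token.word)  # no gesture in the window: left unresolved
            continue
        best = min(candidates, key=lambda p: abs(p.time - token.time))
        unused.remove(best)  # each gesture resolves at most one word
        resolved.append(f"<{best.target}>")
    return resolved


utterance = [SpeechToken("put", 0.0), SpeechToken("that", 0.4), SpeechToken("there", 1.5)]
gestures = [PointingEvent("blue_square", 0.5), PointingEvent("top_left", 1.6)]
print(fuse(utterance, gestures))
```

A real system would also have to handle recognition uncertainty and gestures that precede or follow speech by variable amounts; the fixed symmetric window here is only the simplest possible choice.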
Multimodal Input
Pointing (deixis), (multi-)touch
Motion controller: accelerometer, gyro
Speech: free form, fixed, non-speech sounds
Body movement/gestures: gait, posture
Head position & movements: facial expression, gaze
Tangibles: digital pen and paper
Biometrics: sweat, pulse, respiration, skin conductance
Brain activity (neural): EEG signals, fMRI signals, blood oxygenation
Scent? Odor detection
Taste?
Speech and Gesture Interaction
Speech:
User satisfaction is highly dependent on user profiles and tasks
The learning rate is fast
Error handling is getting better
Perceptual & social usage constraints are important (ambient noise, confidentiality, disturbance, etc.)
Good spoken language: short sentences with prosody clearly demarcating the ends of words
Gesture:
Habits are inherited from mouse usage
Gesture pointing is direct and reliable (deixis)
Gesture signs may not be natural, making recognition hard
Fundamental Problems
Aligning HCI tasks with modalities (and vice versa)
Aligning multimodal usage with user profiles (and vice versa)
Multimodal Fusion (input): the integration of communication modalities in interactive systems
Multimodal Fission (output): the repartitioning of information among several communication modalities
Multimodal Man-Machine Interaction Model (Dumas et al., 2009)
Levels of Multimodal Fusion
Data level: e.g., combining two webcam video streams (multiple perspectives)
Feature level: e.g., combining speech and lip movements
Decision level: e.g., combining gestures and speech
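Decision-level fusion can be sketched as follows, assuming each recognizer has already produced an n-best list of (hypothesis, probability) pairs. The hypotheses, probabilities, and compatibility table below are invented for illustration and do not come from any system in these slides. The fused result is the compatible speech/gesture pair with the highest joint score, which also shows how one modality can correct a recognition error in the other (mutual disambiguation).

```python
from itertools import product
from typing import List, Optional, Set, Tuple

Hypothesis = Tuple[str, float]  # (label, recognizer probability)


def fuse_decisions(speech_nbest: List[Hypothesis],
                   gesture_nbest: List[Hypothesis],
                   compatible: Set[Tuple[str, str]]) -> Optional[Tuple[str, str]]:
    """Return the compatible (speech, gesture) pair with the highest joint score.

    The joint score is the product of the two recognizer probabilities,
    which treats the recognizers as independent (a simplifying assumption).
    """
    best_pair = None
    best_score = 0.0
    for (speech, p_speech), (gesture, p_gesture) in product(speech_nbest, gesture_nbest):
        if (speech, gesture) not in compatible:
            continue  # e.g. a zoom command makes no sense with a scroll stroke
        score = p_speech * p_gesture
        if score > best_score:
            best_pair, best_score = (speech, gesture), score
    return best_pair


# Hypothetical recognizer outputs: speech alone would pick "pan map".
speech = [("pan map", 0.5), ("zoom here", 0.4)]
gesture = [("point", 0.6), ("scroll", 0.4)]
COMPATIBLE = {("zoom here", "point"), ("pan map", "scroll")}

print(fuse_decisions(speech, gesture, COMPATIBLE))
```

Here the top speech hypothesis is overruled because the most likely gesture only makes sense with the second speech hypothesis.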
Unimodal or Multimodal?
MATCH: Multimodal Access to City Help (Johnston et al., 2002)
Interactive city guide and navigation application: provides restaurant and subway information for New York and Washington, DC
Dynamic map-based interface on a tablet
Input modalities: speech, pen gesture, handwriting, GUI
Commands can be speech, pen, or multimodal
Visual parsing of complex gestural input
Output modalities: coordinated multimodal output combining synthetic speech and dynamic graphics
Example:
Speech: “show inexpensive italian places in chelsea”
Multimodal: “cheap italian places in this area” (plus pen gesture)
NUMACK (Foster and White, 2005)
NUMACK (Northwestern University Multimodal Autonomous Conversational Kiosk)
Embodied Conversational Agent (ECA) that gives directions around Northwestern's campus
Combination of speech, gestures and facial expressions
Uses a grammar-based computational model of language and a gesture planning system
NUMACK's verbal, non-verbal and multimodal behaviors are realized through synthesized speech and a kinematic body model
System updates its model of context and the world by fusing multimodal user input: stereoscopic head-tracking system, speech, pen
Multimodal Input Advantages
Improved error handling & efficiency: fewer errors, faster task completion
Greater expressive power
Greater precision in visual-spatial tasks (e.g., map scrolling & item localization)
Support for users’ preferred interaction style
Accommodation of diverse users, tasks & usage environments (e.g., accented speakers & mobile environments)
Shorter & less complex linguistic constructions (e.g., fewer locative descriptions)
Questions?
Multimodal Output
Visual: text, graphics, animations, virtual/augmented reality
Auditory: speech (e.g., Embodied Conversational Agent), non-speech sound
Haptics (tactile): force feedback (e.g., PS3 controller), vibrotactile (e.g., phone vibration)
Scent? Scented mobile phones
Taste?
Multimodal Output
Advantages (Sarter, 2006; Oviatt, 2002): synergy, redundancy, higher information bandwidth
Wickens' Multiple Resource Theory (1984)
More modalities = better? Higher resource competition when people have to attend to two sources at once (Reeves et al., 2004).
Mobile Multimodal Interfaces
Mobile context means attentional and memory resources are limited (Tamminen et al., 2004)
E.g., map scrolling, talking with a friend, crossing the street
Potential of multimodal feedback cues in:
1. addressing issues of accessibility (e.g., to support blind users in navigation) (Magnusson et al., 2009)
2. developing pedestrian navigation aids to support situational impairment and awareness (Brewster et al., 2003)
Examples: PocketNavigator (Pielot et al., 2010), AudioGPS (Holland et al., 2002)
http://feelspace.cogsci.uni-osnabrueck.de/
http://www.lalyagaye.com/
Tactile and Non-Speech Auditory Feedback
Tactons: “Structured, abstract messages that can be used to communicate non-visually” (Brown, 2005). Information encoded in parameters such as waveform, duration, rhythm, spatial location, frequency, […]
Earcons: “Non-verbal audio messages that are used in the computer/user interface to provide information to the user about some computer object, operation or interaction” (Blattner, 1989). Information encoded in pitch, amplitude, duration, spatial location, […]
Amodal parameters: information that is not specific to any one sensory modality (Lewkowicz, 1994). Parameters common to both tactile and auditory domains (Lewkowicz, 1994; Hoggan et al., 2009): spatial location, rhythm, texture, duration, frequency, intensity/amplitude
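One way to picture amodal parameters is as a single message description that can be rendered either as an earcon or as a tacton. The sketch below is an illustration only: the class, field names, and output dictionaries are assumptions, not the representation used by Hoggan et al. or any other cited work. Rhythm, intensity, and spatial location are shared; pitch is added only on the audio side because it is modality-specific.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class CrossmodalIcon:
    """A message described only by amodal parameters (all names hypothetical)."""
    rhythm_ms: Tuple[Tuple[int, int], ...]  # (on, off) durations in milliseconds
    intensity: float                         # 0.0 (weakest) to 1.0 (strongest)
    location: str                            # e.g. "left" or "right"


def to_earcon(icon: CrossmodalIcon, pitch_hz: float = 440.0) -> List[Dict]:
    """Render the icon as a sequence of tones; pitch is audio-specific."""
    return [{"pitch_hz": pitch_hz, "amplitude": icon.intensity,
             "duration_ms": on, "pause_ms": off, "pan": icon.location}
            for on, off in icon.rhythm_ms]


def to_tacton(icon: CrossmodalIcon) -> List[Dict]:
    """Render the same icon as a sequence of vibration pulses."""
    return [{"amplitude": icon.intensity, "duration_ms": on,
             "pause_ms": off, "actuator": icon.location}
            for on, off in icon.rhythm_ms]


new_message = CrossmodalIcon(rhythm_ms=((100, 50), (100, 50), (300, 0)),
                             intensity=0.8, location="left")
tones = to_earcon(new_message)
pulses = to_tacton(new_message)
print(len(tones), len(pulses))
```

Because both renderings are derived from the same rhythm and intensity, a user who has learned the audio version should be able to recognize the tactile one, which is the idea behind crossmodal icons.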
Crossmodal Interaction
Subset of multimodal interaction where the senses receive the ‘same’ information content across invoked sensory modalities (Gibson, 1966; Lewkowicz, 1994)
Cf. sensory substitution (Visell, 2009): vOICe “Seeing with Sound” application; Braille
Crossmodal interaction refers to situations where characteristics of one sensory modality may be bi-directionally transformed into the characteristics of another (e.g., audio ⇿ tactile) (Hoggan, 2007; 2009)
Redundancy
Crossmodal Output Advantages
Unlike multimodal interaction, little risk of information processing overload
When one sensory modality is knocked out (e.g., noisy environment, no body contact), information is still received
Permits both ‘eyes-free’ and ‘hands-free’ interaction
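The robustness argument can be made concrete with a small sketch: the same message is sent redundantly on every channel the current context leaves usable, so losing one channel does not lose the message. The context fields and channel names are hypothetical, chosen only to mirror the examples on this slide.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Context:
    noisy: bool             # ambient noise masks audio output
    body_contact: bool      # device touches the body, so vibration can be felt
    eyes_free_needed: bool  # user cannot look at the screen


def usable_channels(ctx: Context) -> List[str]:
    """List the output channels that are not knocked out in this context."""
    channels = []
    if not ctx.noisy:
        channels.append("audio")
    if ctx.body_contact:
        channels.append("tactile")
    if not ctx.eyes_free_needed:
        channels.append("visual")
    return channels


def deliver(message: str, ctx: Context) -> List[str]:
    """Send the same message on every usable channel (redundant output)."""
    return [f"{channel}:{message}" for channel in usable_channels(ctx)]


# Walking along a noisy street with the phone in a pocket.
street = Context(noisy=True, body_contact=True, eyes_free_needed=True)
print(deliver("turn_left", street))
```

A deployed system would infer the context from sensors rather than receive it as three booleans; the point is only that crossmodal encoding lets the remaining channel carry the full message.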
Questions?
Practical Matters
Multimodal Input Research Areas
Applied Machine Learning
Speech recognition, speech synthesis
Gesture recognition, motion tracking
Head, gait and pose estimation
Multimodal fusion
HCI
Usability issues in diverse tasks
Social acceptability
Context-aware and ubiquitous computing (which modality to use when)
Design/prototyping of multimodal interfaces (e.g., Wizard of Oz)
Multimodal Output Research Areas
Virtual and Mixed Reality (immersive environments)
Embodied Conversational Agents
Haptics: force feedback, vibrotactile feedback
Audio: feedback, synthesis
Crossmodal integration
HCI (usability, satisfaction)
Multimodal feedback (in-vehicle/pedestrian navigation, safety and control, surgery, ergonomics, etc.)
Crossmodal feedback
(Mobile) multimodal interface design
International Communities
CHI: ACM CHI Conference on Human Factors in Computing Systems
MobileHCI: ACM Conference on Human-Computer Interaction with Mobile Devices and Services
ICMI: ACM International Conference on Multimodal Interaction
CSCW: ACM Conference on Computer Supported Cooperative Work
ACM MM: ACM Multimedia Conference
INTERACT: IFIP Conference on Human-Computer Interaction
WHC: World Haptics Conference
Resources
Books:
Paul Dourish (2004) “Where the Action Is: The Foundations of Embodied Interaction”
Andy Clark (2003) “Natural-Born Cyborgs: Minds, Technologies, and the Future of Human Intelligence”
Bill Buxton (2007) “Sketching User Experiences: Getting the Design Right and the Right Design”
Adam Greenfield (2006) “Everyware: The Dawning Age of Ubiquitous Computing”
Articles:
Mark Weiser (1991) “The Computer for the 21st Century”, Scientific American
Sharon Oviatt (2002) “Perceptual user interfaces: multimodal interfaces that process what comes naturally”, Communications of the ACM
Sharon Oviatt (1999) “Ten myths of multimodal interaction”, Communications of the ACM
Nadine Sarter (2006) “Multimodal information presentation: Design guidance and research challenges”, International Journal of Industrial Ergonomics
Leah Reeves et al. (2004) “Guidelines for multimodal user interface design”, Communications of the ACM
Summary
We are embodied and embedded creatures, and this influences the way we interact with the world and computational artifacts
Multimodal Interfaces aim at making communication with machines more natural, more efficient, and more engaging
Multimodal Input and Output focus on different aspects within HCI, requiring different skill sets, but multimodal research and development requires both
Multimodal Interaction is an exciting and rapidly growing area that hugely benefits from HCI work
The Future of Computing is Multimodal…
Contact
Abdo El Ali
Address: Room C3.258, Informatics Institute, Science Park 904, 1098 XH Amsterdam, NL
w: http://staff.science.uva.nl/~elali/
t: +31 (0)20 525 8661
Slides available at: http://staff.science.uva.nl/~elali/hci_abdo_2011.pdf
References (1)
Blattner, M. M., Sumikawa, D. A., & Greenberg, R. M. (1989). Earcons and icons: Their structure and common design principles. Human-Computer Interaction, 4(1), 11-44.
Bolt, R. A. (1980). “Put-that-there”: Voice and gesture at the graphics interface. SIGGRAPH Comput. Graph., 14(3), 262-270.
Brown, L. M., Brewster, S. A. and Purchase, H. C. (2005). A First Investigation into the Effectiveness of Tactons. In Proceedings of the First Joint Eurohaptics Conference and Symposium on Haptic Interfaces for Virtual Environment and Teleoperator Systems (WHC '05). IEEE Computer Society, Washington, DC, USA, 167-176.
Brewster, S., Lumsden, J., Bell, M., Hall, M., and Tasker, S. (2003). Multimodal 'eyes-free' interaction techniques for wearable devices. In Proc. of CHI '03. ACM Press, New York, NY.
Buxton, W. (1986). There's More to Interaction than Meets the Eye: Some Issues in Manual Input. In Norman, D. A. and Draper, S. W. (Eds.), User Centered System Design: New Perspectives on Human-Computer Interaction. Lawrence Erlbaum Associates, Hillsdale, New Jersey, pp. 319-337.
Chittaro, L. (2010). Distinctive aspects of mobile interaction and their implications for the design of multimodal interfaces. Journal on Multimodal User Interfaces, 3(3), 157-165.
Dourish, P. (2000). Embodied Interaction: Exploring the Foundations of a New Approach to HCI. Transactions on Computer-Human Interaction.
Dumas, B., Lalanne, D. and Oviatt, S. (2009). Multimodal Interfaces: A Survey of Principles, Models and Frameworks. In Human Machine Interaction, Denis Lalanne and Jorg Kohlas (Eds.). Lecture Notes in Computer Science, Vol. 5440. Springer-Verlag, Berlin, Heidelberg, 3-26.
Gibson, J. J. (1966). The Senses Considered as Perceptual Systems. Houghton Mifflin, Boston.
Gibson, J. J. (1979). The Ecological Approach to Visual Perception. Houghton Mifflin, Boston.
Heidegger, M. (1927). Being and Time. Trans. by John Macquarrie & Edward Robinson. London: SCM Press, 1962.
Hoggan, E. and Brewster, S. A. (2007). Designing Audio and Tactile Crossmodal Icons for Mobile Devices. In ACM International Conference on Multimodal Interfaces (Nagoya, Japan). ACM Press, pp. 162-169.
References (2)
Hoggan, E., Raisamo, R. and Brewster, S. A. (2009). Mapping Information to Audio and Tactile Icons. In Proceedings of ACM ICMI 2009 (Cambridge, MA, USA). ACM Press, pp. 327-334.
Holland, S., Morse, D. R., and Gedenryd, H. (2002). AudioGPS: Spatial audio navigation with a minimal attention interface. Personal Ubiquitous Comput., 6(4), 253-259.
Kopp, S., Tepper, P. and Cassell, J. (2004). “Towards Integrated Microplanning of Language and Iconic Gesture for Multimodal Output.” ICMI 2004.
Lewkowicz, D. J. (1994). Development of intersensory perception in human infants. In Lewkowicz, D. J. & Lickliter, R. (Eds.), Development of Intersensory Perception: Comparative Perspectives. Norwood, N.J.: Lawrence Erlbaum Associates.
Magnusson, C., Tollmar, K., Brewster, S., Sarjakoski, T., Sarjakoski, T., & Roselier, S. (2009). Exploring future challenges for haptic, audio and visual interfaces for mobile maps and location based services. In Proceedings of the 2nd International Workshop on Location and the Web (pp. 8:1-8:4). New York, NY, USA: ACM.
Nigay, L. and Coutaz, J. (1993). A design space for multimodal systems: concurrent processing and data fusion. In Proceedings of the INTERACT '93 and CHI '93 Conference on Human Factors in Computing Systems (CHI '93). ACM, New York, NY, USA, 172-178.
Pielot, M., Krull, O. and Boll, S. (2010b). Where is my team: supporting situation awareness with tactile displays. In Proceedings of the 28th International Conference on Human Factors in Computing Systems (CHI '10). ACM, New York, NY, USA, 1705-1714.
Pielot, M., Poppinga, B., and Boll, S. (2010). PocketNavigator: Vibro-Tactile Waypoint Navigation for Everyday Mobile Devices. MobileHCI 2010, Lisboa, Portugal.
Reeves, L. M., Lai, J., Larson, J. A., Oviatt, S., Balaji, T. S., Buisine, S., Collings, P., Kraal, B., Martin, J. C., McTear, M., Raman, T. V., Stanney, K. M., Su, H., and Wang, Q. Y. (2004). Guidelines for Multimodal User Interface Design. Commun. ACM, 47(1), 57-59.
Visell, Y. (2009). Tactile sensory substitution: Models for enaction in HCI. Interact. Comput., 21(1-2), 38-53.