Multimodal Human-Computer Interaction


Transcript of Multimodal Human-Computer Interaction

Page 1: Multimodal  Human-Computer Interaction

Multimodal Human-Computer Interaction

by Maurits André

The design and implementation of a prototype

Page 2: Multimodal  Human-Computer Interaction

Overview

• Introduction
  – Problem statement
• Technologies used
  – Speech
  – Hand gesture input
  – Gazetracking
• Design of the system
• Multimodal issues

Page 3: Multimodal  Human-Computer Interaction

Overview

• Testing
  – program tests
  – usability tests
  – human factors studies

• Conclusions and recommendations

• Future work

• Video

Page 4: Multimodal  Human-Computer Interaction

CAIP: Center for Computer Aids in Industrial Productivity (Prof. James L. Flanagan)

Research areas:
• Robust Image Understanding
• Machine Vision
• Groupware / Networking
• Visiometrics and Modeling
• VLSI Design
• Multimedia Information Systems
• Virtual Reality
• Speech Generation
• Image & Video Compression
• Adaptive Voice Mimic
• Microphone Arrays
• Speech / Speaker Recognition

Page 5: Multimodal  Human-Computer Interaction

Multimodal HCI

• Currently: mouse and keyboard input
• More natural communication technologies available:
  – sight
  – sound
  – touch
• Robust and intelligent combination of these technologies

Page 6: Multimodal  Human-Computer Interaction


Aim

Page 7: Multimodal  Human-Computer Interaction

Problem statement

• Study three technologies:
  – Speech recognition and synthesis
  – Hand gesture input
  – Gazetracking

• Design prototype (appropriate application)

• Implement prototype

• Test and debug

• Human performance studies

Page 8: Multimodal  Human-Computer Interaction

SR and TTS (speech recognition and text-to-speech)

• Microsoft Whisper system (C++)
• Speaker-independent
• Continuous speech
• Restricted task-specific vocabulary (150 words)

• Finite state grammar

• Sound capture: microphone array
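A finite-state grammar constrains the recognizer to valid command word sequences. A minimal sketch of the idea in Java; the states, words, and transitions here are illustrative, not the deck's actual 150-word grammar:

```java
import java.util.*;

// Minimal finite-state grammar: each state maps an allowed next word to a
// follow-up state; an utterance is accepted if it ends in a final state.
// Vocabulary and transitions are illustrative only.
public class CommandGrammar {
    static final Map<String, Map<String, String>> transitions = new HashMap<>();
    static final Set<String> finalStates = Set.of("DONE");

    static {
        transitions.put("START", Map.of("create", "VERB", "move", "VERB_OBJ", "exit", "DONE"));
        transitions.put("VERB", Map.of("circle", "DONE", "rectangle", "DONE", "line", "DONE"));
        transitions.put("VERB_OBJ", Map.of("tank", "ID"));
        transitions.put("ID", Map.of("seven", "DONE", "nine", "DONE"));
    }

    public static boolean accepts(String utterance) {
        String state = "START";
        for (String word : utterance.toLowerCase().split("\\s+")) {
            Map<String, String> out = transitions.get(state);
            if (out == null || !out.containsKey(word)) return false;
            state = out.get(word);
        }
        return finalStates.contains(state);
    }
}
```

Restricting the search space this way is what makes speaker-independent continuous recognition practical on a small task vocabulary.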

Page 9: Multimodal  Human-Computer Interaction

Hand gesture input

• Advantages:
  – Natural
  – Powerful
  – Direct
• Disadvantages:
  – Fatigue (RSI)
  – Learning
  – Non-intentional gestures
  – Lack of comfort

Page 10: Multimodal  Human-Computer Interaction

Force feedback tactile glove

• Polhemus tracker for wrist position/orientation
• 5 gestures are recognized

Page 11: Multimodal  Human-Computer Interaction

Implemented gestures

• Grab: “Move this”
• Open hand: “Put down”
• Point at an object: “Select” / “Identify”
• Thumb up: “Resize this”
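The gesture-to-command mapping above can be sketched as a simple lookup (type and method names are illustrative):

```java
import java.util.Map;

// Maps each recognized glove gesture to the command it triggers, following
// the list above. POINT is ambiguous between "Select" and "Identify";
// in the prototype the surrounding context resolves which one is meant.
public class GestureMap {
    public enum Gesture { GRAB, OPEN_HAND, POINT, THUMB_UP }

    private static final Map<Gesture, String> COMMANDS = Map.of(
        Gesture.GRAB, "Move this",
        Gesture.OPEN_HAND, "Put down",
        Gesture.POINT, "Select/Identify",
        Gesture.THUMB_UP, "Resize this");

    public static String commandFor(Gesture g) {
        return COMMANDS.get(g);
    }
}
```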

Page 12: Multimodal  Human-Computer Interaction


Eyes output

• Direction of gaze

• Blinks

• Closed eyes

• Part of emotion

Page 13: Multimodal  Human-Computer Interaction

Gazetracker

• ISCAN RK-726 gazetracker
• 60 Hz
• Calibration
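Calibration maps raw tracker coordinates to screen coordinates. The simplest version is a per-axis linear fit from two fixation points; a real calibration uses more points and both axes, and all numbers and names here are illustrative:

```java
// One-dimensional linear calibration sketch: the user fixates two known
// screen positions, and raw tracker output is then mapped to screen
// coordinates by the resulting linear fit. Illustrative only.
public class GazeCalibration {
    private final double scale, offset;

    // rawLeft/rawRight: tracker readings while fixating screenLeft/screenRight.
    public GazeCalibration(double rawLeft, double rawRight,
                           double screenLeft, double screenRight) {
        scale = (screenRight - screenLeft) / (rawRight - rawLeft);
        offset = screenLeft - scale * rawLeft;
    }

    public double toScreen(double raw) {
        return scale * raw + offset;
    }
}
```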

Page 14: Multimodal  Human-Computer Interaction

Application

• Requirements:
  – Multi-user, collaborative
  – Written in Java
  – Simple
• Choice:
  – Drawing program
  – Military mission planning system

Page 15: Multimodal  Human-Computer Interaction


Drawing program

Page 16: Multimodal  Human-Computer Interaction


Military mission planning

Page 17: Multimodal  Human-Computer Interaction

Frames

• Slots
• Inheritance
• Generic properties
• Default values

Example: the Create Figure frame

Slot      Values
Type      circle / rect / line
Color     [white] / red / grn / ...
Height    0..max_y
Width     0..max_x
Location  (x,y)
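The slot/default mechanism of such a frame can be sketched as follows (a null default marks a slot that must be filled; class and method names are illustrative, not the prototype's actual API):

```java
import java.util.HashMap;
import java.util.Map;

// A frame is a set of named slots; unset slots fall back to default values
// (e.g. Color defaults to white in the Create Figure frame above).
// A slot declared with a null default has no default and must be filled.
public class Frame {
    private final Map<String, Object> defaults = new HashMap<>();
    private final Map<String, Object> slots = new HashMap<>();

    public void addSlot(String name, Object defaultValue) {
        defaults.put(name, defaultValue);
    }

    public void fill(String name, Object value) {
        slots.put(name, value);
    }

    public Object get(String name) {
        return slots.getOrDefault(name, defaults.get(name));
    }

    // A frame is complete once every slot without a default has been filled.
    public boolean isComplete() {
        for (Map.Entry<String, Object> e : defaults.entrySet())
            if (e.getValue() == null && !slots.containsKey(e.getKey()))
                return false;
        return true;
    }
}
```

A Create Figure frame would declare Type with no default and Color defaulting to "white"; the control layer can then prompt for whatever `isComplete` reports missing.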

Page 18: Multimodal  Human-Computer Interaction

Command Frame

Command frames and their slots:
• Confirm YES
• Confirm NO
• EXIT
• Command with object: Object ID (0..max_int)
• Create Figure: Type (circle/rect/line), Color ([white]/red/grn/...), Height (0..max_y), Width (0..max_x), Location (x,y)
• Move: Destination (x,y)

Page 19: Multimodal  Human-Computer Interaction

Design

• Input: Gazetracking, Speech Recognition, Gesture Recognition
• Fusion Agent
• Collaborative Workspace
• Output: Speech Synthesis

Page 20: Multimodal  Human-Computer Interaction

Fusion Agent

• Components: Parser, Slot Buffer, Control, Rule-Based Feedback

Example: “Move tank seven here.” with pointing/gaze samples (x1,y1) (x2,y2) (x3,y3) (x4,y4)

Frame: MOVE
  object: Tank 7
  where: (x4,y4)

Page 21: Multimodal  Human-Computer Interaction

Classification of feedback

• Confirmation
  – Exit the system. Are you sure?
• Information retrieval
  – What is this? This is tank seven.
  – Where is tank nine? Visual feedback.
• Missing data
  – Create tank. Need to specify an ID for a tank.
• Semantic error
  – Create tank seven. Tank seven already exists.
  – Resize tank nine. Cannot resize a tank.
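The rule-based feedback component checks each parsed command against the workspace state and picks a feedback category as classified above. A sketch with illustrative command codes and messages:

```java
import java.util.Set;

// Rule-based feedback sketch mirroring the classification above:
// confirmation, missing data, and semantic errors. Command codes,
// signatures, and wording are illustrative, not the prototype's API.
public class FeedbackRules {
    public static String feedback(String cmd, Integer id, Set<Integer> existingTanks) {
        switch (cmd) {
            case "EXIT":
                return "Exit the system. Are you sure?";            // confirmation
            case "CREATE_TANK":
                if (id == null)
                    return "Need to specify an ID for a tank.";     // missing data
                if (existingTanks.contains(id))
                    return "Tank " + id + " already exists.";       // semantic error
                return "OK";
            case "RESIZE_TANK":
                return "Cannot resize a tank.";                     // semantic error
            default:
                return "OK";
        }
    }
}
```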

Page 22: Multimodal  Human-Computer Interaction

Multimodal issues

• Referring to objects
  – describing in speech: Move the big red circle
  – using anaphora: Move it
  – by glove
  – gaze + pronoun: Delete this
  – glove + pronoun: Delete this
• Timestamps
  – Create a red rectangle from here to here
  – Word timestamps T1..T8 are aligned with pointing/gaze samples xy1..xy8
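With every word timestamped, a deictic like “here” can be resolved to the pointing/gaze sample closest in time to the word. A sketch of that matching step (the sample representation is illustrative):

```java
// Resolve a deictic word to the index of the gaze/glove sample whose
// timestamp is nearest to the word's timestamp. times[] holds the
// timestamps of the sampled positions, in ascending order.
public class DeicticResolver {
    public static int resolve(double wordTime, double[] times) {
        int best = 0;
        for (int i = 1; i < times.length; i++)
            if (Math.abs(times[i] - wordTime) < Math.abs(times[best] - wordTime))
                best = i;
        return best; // index of the sample to substitute for "here"
    }
}
```

For “from here to here”, the two occurrences of “here” carry different timestamps and thus resolve to different corners of the rectangle.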

Page 23: Multimodal  Human-Computer Interaction

Multimodal issues

• Ambiguity
  – saying x, looking at y »» x
  – saying x, pointing at y »» x
  – looking at x, pointing at y »» x
  – saying x, gesturing y »» xy or yx
• Redundancy
  – saying x, looking at x »» x
  – etc.
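The conflict rules above amount to a fixed modality priority: speech beats gaze and pointing, and gaze beats pointing. A sketch of that precedence for conflicting referents (the complementary case, saying x while gesturing y, is combined rather than resolved and is not covered here; names are illustrative):

```java
// Resolve conflicting object references by modality priority, following
// the rules above: speech > gaze > pointing. A null referent means the
// modality produced no reference for this command.
public class Disambiguator {
    public static String resolve(String speech, String gaze, String pointing) {
        if (speech != null) return speech;
        if (gaze != null) return gaze;
        return pointing;
    }
}
```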

Page 24: Multimodal  Human-Computer Interaction

Program testing

• Implementation in Java
• Program testing and debugging:
  – Module testing
  – Integration testing
  – Configuration testing
  – Time testing
  – Recovery testing

Page 25: Multimodal  Human-Computer Interaction

Testing

• Usability tests
  – Demonstration with military personnel
  – Human factors study:
    • Script for user
    • Questionnaire for user
    • Tables for observer
    • Log-file for observer

Page 26: Multimodal  Human-Computer Interaction


Lab

Page 27: Multimodal  Human-Computer Interaction

Conclusions: selecting objects

Modality  Accuracy  Speed  Learning
Speech    ++        ++     –
Gaze      +         ++     ++
Glove     +         +      +
Mouse     ++        –      +

Page 28: Multimodal  Human-Computer Interaction

Conclusions / recommendations

• Speech
  – real-time
  – low error rate
  – timestamps
  – misunderstanding
  – grammar in help file
• Glove
  – real-time
  – fatigue
  – low precision
  – non-intentional gestures
  – 2D »» 3D
  – limited number of gestures
• Gaze
  – real-time
  – head movements
  – self-calibration
  – jumpiness of eye movements
  – face tracker
  – object of interest

Page 29: Multimodal  Human-Computer Interaction


General remarks

• Response time within 1 sec.

• Instruction, help files

• Application effective but limited

Page 30: Multimodal  Human-Computer Interaction


Future work

• Human performance studies

• Conversational interaction

• Context-based reasoning and information retrieval

• New design:

Page 31: Multimodal  Human-Computer Interaction

New design: agent blackboard

• Blackboard: Control, Status, History
• Input agents: Gazetracker, Facetracker, Speaker identification, Gesture recognition, Speech recognition
• Output agents: Speech synthesis, Haptic feedback, Graphical feedback
• Reasoning agents: Discourse manager, User model, Disambiguator
• Domain knowledge, Application, Controller
• User

Page 32: Multimodal  Human-Computer Interaction

Problem statement (recap)

• Study three technologies:
  – Speech recognition and synthesis
  – Hand gesture input
  – Gazetracking

• Design prototype (appropriate application)

• Implement prototype

• Test and debug

• Human performance studies