Hands and Speech in Space


Talk given by Mark Billinghurst at the AWE 2014 conference on using multimodal speech and gesture interaction in Augmented Reality applications, May 28th, 2014.

Transcript of Hands and Speech in Space

Hands and Speech in Space

Mark Billinghurst

mark.billinghurst@hitlabnz.org

The HIT Lab NZ, University of Canterbury

May 28th 2014

2012 – Iron Man 2

To Make the Vision Real…

•  Hardware/software requirements
   –  Contact lens displays
   –  Free-space hand/body tracking
   –  Speech/gesture recognition
   –  Etc.

•  Most importantly
   –  Usability/User Experience

Natural Hand Interaction

•  Using bare hands to interact with AR content
   –  MS Kinect depth sensing
   –  Real-time hand tracking
   –  Physics-based simulation model
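
The slide stops at bullet points, but a minimal sketch of the depth-based hand segmentation such a Kinect pipeline typically starts from might look like the following (illustrative function and threshold values, assuming depth frames arrive as NumPy arrays in millimetres; not the lab's actual code):

```python
import numpy as np

def segment_hand(depth_mm: np.ndarray, max_range_mm: int = 600) -> np.ndarray:
    """Return a boolean mask of pixels likely to belong to the user's hand.

    Assumes the hand is the closest valid surface to the sensor, as is common
    when reaching toward AR content in front of a desk-mounted Kinect.
    """
    valid = depth_mm > 0                     # Kinect reports 0 where depth is unknown
    if not valid.any():
        return np.zeros_like(valid)
    nearest = depth_mm[valid].min()          # depth of the closest surface
    # Keep a shallow band behind the nearest point, capped so that objects
    # beyond max_range_mm (e.g. the desk or body) are excluded.
    band = min(nearest + 150, max_range_mm)
    return valid & (depth_mm <= band)
```

The resulting mask would then feed the hand tracker and the physics-based simulation mentioned above.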

Pros and Cons of Gesture-Only Input

•  Gesture-only input is good for
   –  Direct manipulation
   –  Selection, motion
   –  Rapid expressiveness

•  Limitations
   –  Descriptions (e.g. temporal information)
   –  Operating on large numbers of objects
   –  Indirect manipulation, delayed actions

Multimodal Interaction

•  Combined speech and gesture input
•  Gesture and speech are complementary
   –  Speech: modal commands, quantities
   –  Gesture: selection, motion, qualities

•  Previous work found multimodal interfaces intuitive for 2D/3D graphics interaction
•  However, there are few multimodal AR interfaces
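
As a rough sketch of how those complementary streams could be fused into a single command (hypothetical event types and time window for illustration only; the actual fusion architecture appears later in the talk):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class GestureEvent:
    kind: str                   # "point", "grab", "release"
    target_id: Optional[int]    # object under the hand, if any
    timestamp: float            # seconds

@dataclass
class SpeechEvent:
    command: str                # e.g. "red", "bigger", "delete"
    timestamp: float

def fuse(gesture: GestureEvent, speech: SpeechEvent,
         window_s: float = 1.5) -> Optional[dict]:
    """Pair a spoken command with a recent deictic gesture.

    The gesture supplies the object (selection); speech supplies the
    operation. Events further apart than window_s are treated as unrelated.
    """
    if abs(speech.timestamp - gesture.timestamp) > window_s:
        return None
    if gesture.kind == "point" and gesture.target_id is not None:
        return {"object": gesture.target_id, "operation": speech.command}
    return None
```

A fixed time window like this is only a stand-in; the Wizard of Oz results below suggest the gesture usually arrives first, so a real fusion engine would buffer gestures and wait for the speech that completes them.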

Wizard of Oz Study

•  What speech and gesture input would people like to use?
•  Wizard
   –  Performs speech recognition
   –  Command interpretation
•  Domain
   –  3D object interaction/modelling

Lee, M., & Billinghurst, M. (2008, October). A Wizard of Oz study for an AR multimodal interface. In Proceedings of the 10th international conference on Multimodal interfaces (pp. 249-256). ACM.

System Architecture

System Setup

Key Results

•  Most commands were multimodal
   –  Multimodal (63%), Gesture (34%), Speech (4%)

•  Most spoken phrases were short
   –  74% were short phrases, averaging 1.25 words
   –  Sentences (26%) averaged 3 words

•  Main gestures were deictic (65%) and metaphoric (35%)
•  In multimodal commands the gesture was issued first
   –  94% of the time the gesture began before the speech

Free Hand Multimodal Input

•  Use the free hand to interact with AR content
•  Recognize simple gestures
   –  Open hand, closed hand, pointing

Point – Move – Pick/Drop

Lee, M., Billinghurst, M., Baek, W., Green, R., & Woo, W. (2013). A usability study of multimodal input in an augmented reality environment. Virtual Reality, 17(4), 293-305.
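
A toy illustration of how those three poses might be separated once the hand tracker reports how many fingers are extended (the mapping is an assumption for illustration, not the classifier used in the study):

```python
def classify_pose(extended_fingers: int) -> str:
    """Map a fingertip count from the hand tracker to one of the three gestures."""
    if extended_fingers == 0:
        return "closed"   # fist -> pick/drop
    if extended_fingers == 1:
        return "point"    # index finger -> pointing/selection
    return "open"         # open hand -> move
```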

Speech Input

•  MS Speech + MS SAPI (> 90% accuracy)
•  Single-word speech commands
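
Because speech is restricted to single-word commands, the recognition vocabulary stays tiny. A minimal sketch of matching a recognizer's best hypothesis against such a vocabulary (the word list is guessed from the shape/colour tasks described below, not the system's actual grammar):

```python
from typing import Optional

# Hypothetical single-word vocabulary, guessed from the study's tasks.
COMMANDS = {"red", "green", "blue", "cube", "sphere", "cone", "bigger", "smaller"}

def match_command(best_hypothesis: str) -> Optional[str]:
    """Return the command if the recognizer's top result is a known word."""
    word = best_hypothesis.strip().lower()
    return word if word in COMMANDS else None
```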

Multimodal Architecture

Multimodal Fusion

Hand Occlusion

Experimental Setup

Change object shape and colour

User Evaluation

•  Change object shape, colour and position
•  Conditions
   –  (1) Speech only, (2) gesture only, (3) multimodal
•  Measures
   –  Performance time, errors, subjective survey

Results - Performance

•  Average performance time
   –  Gesture: 15.44 s
   –  Speech: 12.38 s
   –  Multimodal: 11.78 s

•  Significant difference across conditions (p < 0.01)
   –  Difference between gesture and speech/MMI

Subjective Results (Likert 1-7)

•  User subjective survey
   –  Gesture rated significantly worse; MMI and speech rated the same
   –  MMI perceived as most efficient

•  Preference
   –  70% MMI, 25% speech only, 5% gesture only

                    Gesture   Speech   MMI
  Naturalness        4.60      5.60    5.80
  Ease of Use        4.00      5.90    6.00
  Efficiency         4.45      5.15    6.05
  Physical Effort    4.75      3.15    3.85

Observations

•  Significant difference in the number of commands used
   –  Gesture (6.14), Speech (5.23), MMI (4.93)

•  MMI: simultaneous vs. sequential commands
   –  79% sequential, 21% simultaneous

•  Reaction to system errors
   –  Users almost always repeated the same command
   –  In MMI they rarely changed modalities

Lessons Learned

•  Multimodal interaction is significantly better than gesture alone in AR interfaces for 3D tasks
   –  Shorter task time, more efficient

•  Multimodal input was more natural, easier, and more effective than gesture or speech alone
   –  Simultaneous input was rarely used

•  More studies need to be conducted
   –  What gesture/speech patterns? Richer input

3D Gesture Tracking

•  3Gear Systems
   –  Kinect/PrimeSense sensor
   –  Two-hand tracking
   –  http://www.threegear.com

Skeleton Interaction + AR

•  HMD AR view
   –  Viewpoint tracking

•  Two-hand input
   –  Skeleton interaction, occlusion
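
One common way to get the occlusion effect mentioned here is a per-pixel depth test between the sensed hand and the rendered content; a minimal NumPy sketch, assuming registered colour and depth frames at the same resolution (illustrative only, not the system's renderer):

```python
import numpy as np

def composite_with_occlusion(camera_rgb: np.ndarray,    # HxWx3 real camera image
                             real_depth: np.ndarray,    # HxW sensed depth (metres)
                             virtual_rgb: np.ndarray,   # HxWx3 rendered AR content
                             virtual_depth: np.ndarray  # HxW depth of AR content
                             ) -> np.ndarray:
    """Overlay virtual pixels only where they are closer than the real scene.

    Where the hand (or any real surface) is in front of the virtual object,
    the camera image is kept, so the hand correctly occludes the AR content.
    """
    # Treat pixels with no depth reading as infinitely far away.
    real = np.where(real_depth > 0, real_depth, np.inf)
    virtual_visible = (virtual_depth > 0) & (virtual_depth < real)
    out = camera_rgb.copy()
    out[virtual_visible] = virtual_rgb[virtual_visible]
    return out
```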

AR Rift Display

Conclusions

•  AR experiences need new interaction methods
•  Combined speech and gesture is more powerful
   –  Complementary input modalities

•  Natural user interfaces are possible
   –  Free-hand gesture, speech, intelligent interfaces

•  Important research directions for the future
   –  What gesture/speech commands should be used?
   –  What is the relationship between speech and gesture?

More Information

•  Mark Billinghurst
   –  Email: mark.billinghurst@hitlabnz.org
   –  Twitter: @marknb00

•  Website
   –  http://www.hitlabnz.org/