Perceptual Analysis of Talking Avatar Head Movements: A Quantitative Perspective

Post on 04-Jan-2016



Xiaohan Ma, Binh H. Le, and Zhigang Deng

Department of Computer Science University of Houston

Motivation

Avatars have been increasingly used in Human-Computer Interfaces
– Teleconferencing, computer-mediated communication, distance education, online virtual worlds, etc.

Human-like avatar gestures influence human perception significantly
– Facial expressions
– Hand gestures
– Lip movements
– Head movements
• One of the crucial visual cues to facilitate engaging social interaction and communication

How do talking head movements affect perception?

Our Quantitative Perspective

Uncover how talking avatar head movements affect human perception
– User-rated naturalness of head animations
– Joint features extracted from head animations (with audio)
• Acoustic speech features
• Head motion patterns
– Quantitatively analyze the association between extracted joint features and user ratings

[Diagram labels: Talking Avatar Head Animations; Feature extraction; Joint Features; User evaluation; Perception (rating); Analysis of the association]

Data Acquisition and Processing

Acquisition of the audio-head motion dataset
– Head motion and speech were recorded simultaneously
– Head motion: optical motion capture system (120 Hz)
– Speech: microphone (48 kHz)

Processing of the captured audio-head motion dataset
– Head motion: 3 Euler rotation angles per frame
– Speech: pitch and RMS energy
– Aligned head and speech datasets to the same frame rate (24 FPS)
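The slides do not say how the two streams were brought to 24 FPS. As a minimal sketch of one possible approach, the snippet below computes one RMS energy value per animation frame from 48 kHz audio and averages blocks of 120 Hz motion capture frames. The function names, the block-averaging strategy, and the synthetic signals are assumptions for illustration only, not the authors' pipeline.

```python
import math

MOCAP_HZ = 120      # head motion capture rate (from the slides)
AUDIO_HZ = 48_000   # microphone sampling rate (from the slides)
TARGET_FPS = 24     # common frame rate (from the slides)


def rms_energy_per_frame(samples, audio_hz=AUDIO_HZ, fps=TARGET_FPS):
    """Return one RMS energy value per animation frame."""
    hop = audio_hz // fps  # 2000 audio samples per frame at 48 kHz / 24 FPS
    frames = []
    for start in range(0, len(samples) - hop + 1, hop):
        window = samples[start:start + hop]
        frames.append(math.sqrt(sum(s * s for s in window) / hop))
    return frames


def downsample_motion(angles, mocap_hz=MOCAP_HZ, fps=TARGET_FPS):
    """Average each block of mocap frames into one animation frame.

    `angles` is a list of (x, y, z) Euler angle tuples in degrees.
    Plain averaging is an assumption; it is only reasonable for small
    rotations that stay away from the angle wrap-around.
    """
    step = mocap_hz // fps  # 5 mocap frames per animation frame
    out = []
    for start in range(0, len(angles) - step + 1, step):
        block = angles[start:start + step]
        out.append(tuple(sum(a[i] for a in block) / step for i in range(3)))
    return out


# One second of synthetic data: a 240 Hz tone and a slow nod about the X axis.
audio = [0.5 * math.sin(2 * math.pi * 240 * n / AUDIO_HZ) for n in range(AUDIO_HZ)]
motion = [(5.0 * math.sin(2 * math.pi * n / MOCAP_HZ), 0.0, 0.0) for n in range(MOCAP_HZ)]

energy = rms_energy_per_frame(audio)
head = downsample_motion(motion)
print(len(energy), len(head))  # both streams now have one value per frame
```

After this step each animation frame carries one energy value and one (x, y, z) rotation triple, so the two streams can be compared frame by frame. Pitch extraction is omitted because the slides do not name the pitch tracker that was used.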

[Figure: head rotation axes (X-axis rotation, Y-axis rotation, Z-axis rotation)]

Subjective Evaluation

Using the captured dataset, we generated 60 head animation clips
– Based on 15 recorded speech clips
– 4 different audio-head motion generation techniques
– Mosaic on the mouth region

User study
– 18 participants
– Ages: 23~28
– Gender: female (16.67%), male (83.33%)
– Language: fluent English speakers
– User rating: 1~5

The 4 generation techniques
– Original data: play back the captured data
– HMMs [Busso et al. 05]
– Mood-Swings [Chuang et al. 05]
– Random: randomly generated

Speech-Head Motion Features and Perception

Measure the correlation between head motion and speech features
– Canonical Correlation Analysis (CCA)

Pitch-head motion and human perception
– Computed Pearson coefficient: 0.731

Energy-head motion and human perception
– Appears random; clearly not linear
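The two measures above can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: it assumes the speech feature is a single per-frame value (pitch) and the head motion is three Euler angles per frame, in which case the first canonical correlation reduces to the multiple correlation coefficient. The helper names `pearson`, `solve`, and `canonical_correlation` and the synthetic signals are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def solve(a, b):
    """Solve the small linear system a @ x = b by Gaussian elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def canonical_correlation(pitch, angles):
    """First canonical correlation between 1-D pitch and 3-D head rotation.

    With a single speech feature, CCA reduces to the multiple correlation
    R = sqrt(r^T Rxx^{-1} r), where r holds the pitch-angle correlations
    and Rxx is the correlation matrix of the three angle channels.
    """
    chans = [[a[i] for a in angles] for i in range(3)]
    r = [pearson(pitch, ch) for ch in chans]
    rxx = [[pearson(ci, cj) for cj in chans] for ci in chans]
    w = solve(rxx, r)
    return math.sqrt(max(0.0, sum(ri * wi for ri, wi in zip(r, w))))


# Synthetic check: pitch built as an exact linear mix of the three angles,
# so the canonical correlation should come out very close to 1.
n = 240
angles = [(math.sin(0.1 * t), math.cos(0.23 * t), t / n) for t in range(n)]
pitch = [100 + 2 * x - y + 0.5 * z for x, y, z in angles]
score = canonical_correlation(pitch, angles)
print(round(score, 6))
```

Given one such score per clip and the mean user rating per clip, `pearson(scores, mean_ratings)` would yield a clip-level coefficient of the kind reported above.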

Speech-Head Motion Features and Perception

Implications for CHI
– Validates the tight coordination between speech and head motion: precise timing in generation is required
• Delayed head movement generation may significantly degrade human perception
– An approximately linear correlation between user ratings and CCA for pitch-head motion
• Prosody-driven head motion synthesis could be fundamentally sound
– No simple linear correlation between user ratings and CCA for RMS energy-head motion
• RMS energy may vary among sentences

Frequency-Domain Analysis of Head Motion

Frequency-domain analysis of head motion
– Head motion: rotation angles
– Frequency spectrum: FFT applied to the head rotation angle vector

Association between head motion spectrum and human perception
– With squared magnitude less than 5 degrees

Plot axes
- X-axis: average user rating (2.1 ~ 4.2)
- Y-axis: the squared magnitude of three Euler angles in the head rotation (0 ~ 5 degrees)
- Z-axis: frequency spectrum (0 ~ 19 Hz)
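A minimal sketch of such a spectrum computation is given below. It uses a naive discrete Fourier transform from the standard library, which gives the same values as an FFT, only more slowly. How the three Euler angles were combined is not stated in the slides; summing the squared magnitudes of the three channels is an assumption, as are the function names, the 120 Hz sampling rate used here, and the synthetic motion.

```python
import cmath
import math


def dft(signal):
    """Naive O(n^2) discrete Fourier transform (same values as an FFT)."""
    n = len(signal)
    return [sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def power_spectrum(angles, sample_hz):
    """Return (frequencies in Hz, squared magnitude summed over X, Y, Z)."""
    n = len(angles)
    half = n // 2 + 1  # keep the non-negative frequencies only
    power = [0.0] * half
    for axis in range(3):
        channel = [a[axis] for a in angles]
        mean = sum(channel) / n
        spectrum = dft([c - mean for c in channel])  # remove the DC offset
        for k in range(half):
            power[k] += (abs(spectrum[k]) / n) ** 2
    freqs = [k * sample_hz / n for k in range(half)]
    return freqs, power


# Synthetic one-second clip at 120 Hz: a 2 Hz nod (4 degrees) about X
# plus a 15 Hz jitter (1 degree) about Y.
SAMPLE_HZ = 120
clip = [(4.0 * math.sin(2 * math.pi * 2 * t / SAMPLE_HZ),
         1.0 * math.sin(2 * math.pi * 15 * t / SAMPLE_HZ),
         0.0) for t in range(SAMPLE_HZ)]

freqs, power = power_spectrum(clip, SAMPLE_HZ)
peak = max(range(len(power)), key=power.__getitem__)
print(freqs[peak])  # dominant head motion frequency in Hz
```

For real clips a library FFT such as `numpy.fft.rfft` would be the usual choice; the naive transform is used here only to keep the sketch dependency-free.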


Frequency-Domain Analysis of Head Motion

Key observations
– Highly rated: low-frequency
• Natural head motion: less than 10 Hz
– Lowly rated: high-frequency
• Typically larger than 12 Hz
• With a small range of head movements

Implications for HCI
– The comfortable head motion frequency zone: 0~12 Hz
– Smoothing as post-processing for generated talking avatar head motions
• Smooth: post-process the synthesized head motions
• Simply crop the high-frequency part from the synthesized head motions
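Cropping the high-frequency part can be sketched as a hard low-pass filter in the frequency domain: transform one rotation channel, zero every bin above the cutoff, and transform back. The `lowpass` helper, the 12 Hz default cutoff taken from the comfortable zone above, and the 120 Hz sampling rate are assumptions for illustration. Note that the signal must be sampled well above 24 Hz for a 12 Hz cutoff to remove anything.

```python
import cmath
import math


def lowpass(signal, sample_hz, cutoff_hz=12.0):
    """Zero every DFT bin above cutoff_hz and transform back."""
    n = len(signal)
    spectrum = [sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
                for k in range(n)]
    for k in range(n):
        freq = min(k, n - k) * sample_hz / n  # bins k and n-k share a frequency
        if freq > cutoff_hz:
            spectrum[k] = 0
    # Inverse DFT; the imaginary parts cancel for a real input signal.
    return [(sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
            for t in range(n)]


# Synthetic X rotation at 120 Hz: a 2 Hz nod plus a 20 Hz jitter.
SAMPLE_HZ = 120
nod = [4.0 * math.sin(2 * math.pi * 2 * t / SAMPLE_HZ) for t in range(SAMPLE_HZ)]
jitter = [0.5 * math.sin(2 * math.pi * 20 * t / SAMPLE_HZ) for t in range(SAMPLE_HZ)]
noisy = [a + b for a, b in zip(nod, jitter)]

smooth = lowpass(noisy, SAMPLE_HZ)
max_error = max(abs(s - a) for s, a in zip(smooth, nod))
print(max_error < 1e-9)  # the 20 Hz jitter is removed, the nod is kept
```

A hard cutoff like this can introduce ringing around abrupt movements, so a gentler filter (for example a Butterworth low-pass) may be preferable in practice; the slides do not specify which smoothing method is intended.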

[Figures: low-frequency patterns; high-frequency patterns]

Conclusion and Future Work

Summary of our findings
– The coupling between pitch and head motion has a strong linear correlation with human perception
– Head motions perceived as natural mainly consist of low-frequency components, and high-frequency components (>12 Hz) significantly degrade human perception

Future work
– Multi-party conversation scenarios
– Analysis of other fundamental speech features: pauses, repetitions, etc.

Acknowledgments: This work is in part supported by NSF IIS-0914965, Texas Norman Hackerman Advanced Research 003652-0058-2007, and research gifts from Google and Nokia.