Collection of multimodal data Face – Speech – Body

Collection of multimodal data Face – Speech – Body George Caridakis ICCS Ginevra Castellano DIST Loic Kessous TAU


Page 1: Collection of multimodal data Face – Speech – Body

Collection of multimodal data: Face – Speech – Body

- George Caridakis, ICCS
- Ginevra Castellano, DIST
- Loic Kessous, TAU

Page 2: Collection of multimodal data Face – Speech – Body

Overview
- Objectives
- Scenario
- Equipment specifications
- Subjects & Procedure
- Visual aspects
- Acoustic aspects
- Future processing
- Please try this at home…

Page 3: Collection of multimodal data Face – Speech – Body

Objectives

- Collection of emotional multimodal data
- Process different modalities
- Holy Grail: “EMOTION RECOGNITION”

Page 4: Collection of multimodal data Face – Speech – Body

Scenario
- Inspired by the GEMEP corpus
- Pseudo-language sentence (“Toko, damato ma gali sa”)
- Standing body posture
- 10 subjects
- 8 emotions uniformly distributed through the quadrants (2D emotion theory, valence-arousal)
- 3 repetitions of an emotion-specific gesture
- 3 repetitions of an emotion-independent gesture
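To get a feel for the size of the corpus, the scenario can be enumerated as a recording plan. This is only a sketch: it assumes every subject performs every emotion with both gesture types, three times each, which the slide suggests but does not state outright. The subject IDs and the `build_plan` helper are hypothetical names introduced for illustration.

```python
from itertools import product

# Hypothetical subject IDs; the slides only say "10 subjects".
SUBJECTS = [f"S{i:02d}" for i in range(1, 11)]
# The eight emotions named in this deck.
EMOTIONS = ["despair", "hot anger", "irritation", "sadness",
            "interest", "pleasure", "joy", "pride"]
GESTURE_TYPES = ["emotion-specific", "emotion-independent"]
REPETITIONS = 3


def build_plan():
    """Return one dict per take, assuming a full crossing of all factors."""
    return [
        {"subject": s, "emotion": e, "gesture": g, "repetition": r}
        for s, e, g in product(SUBJECTS, EMOTIONS, GESTURE_TYPES)
        for r in range(1, REPETITIONS + 1)
    ]


plan = build_plan()
print(len(plan))  # 10 subjects * 8 emotions * 2 gesture types * 3 repetitions = 480
```

Under that assumption the plan contains 480 takes, 48 per subject.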

Page 5: Collection of multimodal data Face – Speech – Body

Emotion-specific gestures

Emotion      Gesture
despair      leave me alone
hot anger    violent descent of hands
irritation   smooth go away
sadness      smooth falling hands
interest     raise hands
pleasure     open hands
joy          italianate/explain
pride        close hands

Page 6: Collection of multimodal data Face – Speech – Body

Equipment specifications

- 2 DV cameras
  - Full body
  - Face
- Wireless microphone (shirt-mounted)
- PC + external sound card
- Uniform dark background
- 2 artificial light sources
- Light-coloured, long-sleeved shirt ;)

Page 7: Collection of multimodal data Face – Speech – Body

Subjects & Procedure
- Subjects
  - 10 “actors”: 6 males, 4 females
  - despair, hot anger, irritation, sadness, interest, pleasure, joy, pride
- Procedure
  - Subject instructions
  - Clap before every execution: synchronize streams
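The clap gives every stream a sharp, shared event to align on. A minimal sketch of how the offset between two recordings could be recovered is shown below. It assumes the clap is the first sample whose amplitude reaches a hand-picked threshold, which is a simplification rather than the authors' procedure. The 48 kHz camera audio rate, the threshold value, and the function names are illustrative assumptions; only the 44.1 kHz microphone rate and the 25 fps PAL rate come from the slides.

```python
def first_onset(samples, threshold):
    """Index of the first sample whose absolute amplitude reaches threshold, or None."""
    for i, x in enumerate(samples):
        if abs(x) >= threshold:
            return i
    return None


def clap_offset_seconds(onset_a, rate_a, onset_b, rate_b):
    """Time by which the clap in stream B lags the clap in stream A (negative if it leads)."""
    return onset_b / rate_b - onset_a / rate_a


# Toy signals: silence followed by a single loud sample standing in for the clap.
mic = [0] * 4410 + [20000] + [0] * 100        # microphone track, 44.1 kHz
camera = [0] * 24000 + [18000] + [0] * 100    # camera audio track, assumed 48 kHz

mic_onset = first_onset(mic, threshold=10000)
cam_onset = first_onset(camera, threshold=10000)
offset = clap_offset_seconds(mic_onset, 44100, cam_onset, 48000)

# Express the offset in PAL video frames (25 frames/second).
frames = round(offset * 25)
print(offset, frames)
```

In this toy case the camera track starts 0.4 s earlier relative to the clap, so it must be shifted by 10 frames to line up with the microphone track. Real recordings would need a more robust onset detector, since speech or handling noise can cross a fixed threshold before the clap does.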

Page 8: Collection of multimodal data Face – Speech – Body

Video quality issues

- Highest possible resolution
- Progressive video (not interlaced)
- Correct exposure
- Good color quality
- No compression artifacts
- Uniform lighting

Page 9: Collection of multimodal data Face – Speech – Body

Interlacing / Over-exposure

- Interlacing / de-interlacing
- Over-exposure
  - 70% zebra pattern
  - Prefer lower exposure so the signal will not be clipped
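The zebra check can also be approximated offline on captured frames. The sketch below marks every pixel whose luma reaches 70% of full scale and reports the flagged fraction. It is a simplification: it treats luma as a plain 0-255 value, whereas a camera's zebra level is defined on the video signal itself, so the threshold here is an assumption and not the camera's exact behaviour. The function names are hypothetical.

```python
def zebra_mask(luma, level=0.70, full_scale=255):
    """Mark pixels whose luma is at or above `level` of full scale."""
    limit = level * full_scale
    return [[y >= limit for y in row] for row in luma]


def flagged_fraction(mask):
    """Fraction of pixels marked in the mask (0.0 for an empty frame)."""
    total = sum(len(row) for row in mask)
    if total == 0:
        return 0.0
    return sum(1 for row in mask for flagged in row if flagged) / total


# Tiny 2x3 luma frame used only for illustration.
frame = [
    [10, 120, 200],
    [179, 178, 255],
]
mask = zebra_mask(frame)
print(mask)
print(flagged_fraction(mask))
```

With a 0-255 scale the limit is 178.5, so 200, 179 and 255 are flagged and half of this toy frame would show zebra stripes. A large flagged area on the face or hands is a hint to lower the exposure before recording.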

Page 10: Collection of multimodal data Face – Speech – Body

Colour/Lighting

- Medium: Y/C resolution, compression artifacts, exposure
- Good video quality; source: DV

Page 11: Collection of multimodal data Face – Speech – Body

Archiving

- PAL: 720x576 @ 25 frames/second
- DV format: ~36 Mbit/sec, ~16 GBytes/hour
- MPEG-2 @ 4-8 Mbit/sec (DVD quality): ~1.8-3.5 GB/hour
- MPEG-1 @ 1.1 Mbit/sec: ~500 MBytes/hour
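The storage figures follow directly from the bitrates: multiply by 3600 seconds and divide by 8 bits per byte. The short sketch below reproduces that arithmetic using the bitrates exactly as quoted on the slide and decimal units (1 GB = 1000 MB). The function name is a hypothetical helper.

```python
def gb_per_hour(mbit_per_s):
    """Convert a bitrate in Mbit/s to decimal gigabytes per hour of footage."""
    seconds_per_hour = 3600
    bits_per_byte = 8
    megabytes = mbit_per_s * seconds_per_hour / bits_per_byte
    return megabytes / 1000


# Bitrates as quoted on the slide.
formats = [("DV", 36), ("MPEG-2 (low)", 4), ("MPEG-2 (high)", 8), ("MPEG-1", 1.1)]
for name, rate in formats:
    print(f"{name}: {gb_per_hour(rate):.3f} GB/hour")
```

This gives about 16.2 GB/hour for the quoted DV rate, 1.8 to 3.6 GB/hour for MPEG-2, and just under 0.5 GB/hour for MPEG-1, in line with the rounded values above.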

Page 12: Collection of multimodal data Face – Speech – Body

Visual Aspects Summary
- Video camera
  - DV or better
  - Progressive scan capability
  - Over-exposure indication (zebra patterns)
- Shooting
  - Use the zebra patterns at 70%
  - Zoom in as much as possible to increase the subject’s resolution
  - Facial features must be visible for facial analysis
  - Try to avoid occlusions (hair, glasses, clothes, hand movement)
  - Uniform lighting conditions
- Archive
  - DV tapes, DV video or frames (not MPEG-1)

Page 13: Collection of multimodal data Face – Speech – Body

Acoustic aspects
- Why “Toko, damato ma gali sa”?
  - Toko: solicitation by naming the interlocutor
  - Vowels found in the majority of languages
  - Meaning: “Toko, can you open it?” (request), for maintaining the semantic aspect
- Sampling frequency: 44.1 kHz
- 16 bits, mono
- Uncompressed .wav files
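Before a session it is worth confirming that recordings really match this audio format. The sketch below uses Python's standard `wave` module to read the header of a WAV file and compare it with the slide's settings (44.1 kHz, 16 bits, mono). The helper names and the in-memory test file are assumptions made for illustration; a real check would open the recorded files instead.

```python
import io
import wave

# Target format from the slide: mono, 16 bits (2 bytes) per sample, 44.1 kHz.
EXPECTED = {"channels": 1, "sample_width_bytes": 2, "frame_rate": 44100}


def wav_format(fileobj):
    """Read channel count, sample width and sampling rate from a WAV file."""
    with wave.open(fileobj, "rb") as w:
        return {
            "channels": w.getnchannels(),
            "sample_width_bytes": w.getsampwidth(),
            "frame_rate": w.getframerate(),
        }


def make_silent_wav(seconds=0.1):
    """Build a short silent WAV in memory, standing in for a real recording."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(44100)
        w.writeframes(b"\x00\x00" * int(44100 * seconds))
    buf.seek(0)
    return buf


found = wav_format(make_silent_wav())
print(found == EXPECTED)
```

At these settings uncompressed audio takes 88,200 bytes per second, roughly 318 MB per hour, which is small next to the video streams.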

Page 14: Collection of multimodal data Face – Speech – Body

Future processing
- Process different modalities
  - Facial feature extraction
  - Gesture expressiveness analysis
  - Acoustic analysis
- Gesture recognition
- Synchronization
- Modalities fusion
  - RNN
  - RSOM + Markov
  - SVM
  - …
- Emotion recognition