Communicating Virtual Agents - uni-bielefeld.deskopp/download/KI3_2.pdf · 2 Kopp & Krämer KI3:...

Post on 22-Sep-2019

KI3

Communicating Virtual Agents

Nicole Krämer

nicole.kraemer@uni-koeln.de

University of Cologne, Germany

Stefan Kopp

skopp@techfak.uni-bielefeld.de

University of Bielefeld, Germany

Part 2: Bases of Multimodal Communication

Kopp & Krämer

KI3: Communicating virtual agents

Overview

I. Introduction
- Motivation, history, recent developments
- Evaluation

II. Bases of multimodal communication
- Channels and functions of multimodal communication
- Synthetic communicative behaviors, e.g., facial & gestural animation, speech synthesis

III. Modeling conversational behavior
- Underlying models & architecture
- Top-down vs. bottom-up
- Outlook & discussion

We need...

...knowledge about communication when implementing virtual agents that communicate in a human-like fashion

- Conversational behavior is highly complex. Since the agent is supposed to behave „autonomously“, we need to know some rules.

- In order to build agents that are accepted and efficient, we need to know about the effects of specific behaviors.

→ Communication research has to provide bases and rules of communication (fundamental research) as well as evaluate the effects of the agents (applied research).

Problem: Most of the relevant bases and rules are not known yet!

Channels of communication behavior (I)

Communication is enormously complex, mainly because of the variety of channels and their interdependence.

• Verbal and nonverbal communication (Scherer & Wallbott, 1979), vocal and nonvocal channels (Laver & Hutcheson, 1972)

• „Basic triple structure“ of communication: language, paralanguage, and kinesics (Poyatos, 1983)

• Studies show that nonverbal behavior in particular is of crucial importance for communication and person perception (Mehrabian & Ferris, 1967; „snap judgements“, Schneider, Hastorf & Ellsworth, 1979).

Channels of communication behavior (II)

Nonverbal behavior channels (according to Wallbott, 1994)

• vocal
- Time dependent aspects
- Voice dependent aspects
- Continuity dependent aspects

• nonvocal
- Motor channels: facial expression, gestures, gaze, posture
- Physio-chemical channels: olfactory, tactile, thermal
- Ecological channels: territory, interpersonal distance, appearance

Further important features

• Dimensional complexity – interdependence with respect to the effects (dependence on various contexts: other channels, interaction partners, situational context)

• Sequential complexity - time structure is very important (turn taking, gestures, lip synch)

• Importance of movements and activity (cf. Grammer et al., 1999)

• Nonverbal behaviors are produced, received, and judged subliminally („communication between limbic systems“, Buck, 1994)

→ So far it remains an open question whether rules can be found that allow reliable production of the „correct“ behavior

Functions of nonverbal behavior (I)

• Modeling functions: Bandura (1977); e.g., security presentations in airplanes
• Discourse functions: Bolinger (1983), McNeill (1992), Chovil (1991)
• Dialogue functions: Duncan (1972)
• Relational functions: Mehrabian (1970), Exline et al. (1975), Frey (1999)

Related agent work: Cassell et al. (1994), Nagao & Takeuchi (1994); Cassell et al. (1999); Thórisson (1996)

All these functions are used in face-to-face (FTF) communication and are therefore expected when a humanoid agent appears on the screen. So they have to be modeled!

Functions of nonverbal behavior (II)

• Discourse functions

- Nonverbal behaviors that are closely related to verbal behavior and can work as complements, supplements, or substitutes of speech

- Especially gestures, but also facial movements such as eyebrow raising (Chovil, 1991), can serve this function

- Concerning gesture, Ekman & Friesen (1979; see Efron, 1941) differentiate illustrators and emblems (as well as adaptors, which do not seem to have a discourse function)

- McNeill (1992) distinguishes iconics, metaphorics, deictics, and beats as different types of spontaneous gestures (→ KW1)

Coverbal gesture

• Coverbal gestures are closely related to speech flow (semantic, pragmatic, and temporal synchrony, McNeill, 1992)

• Speech-gesture synchronization on various levels

- Gestures co-occur with the rheme (Cassell, 2000)

- Stroke onset precedes or co-occurs with the most contrastively stressed syllable in speech and covaries with it in time (De Ruiter, 1999; McNeill, 1992; Kendon, 1986)

→ Characteristic spatiotemporal features and kinematic properties
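The timing rule above (stroke onset precedes or co-occurs with the stressed syllable) can be turned into a small scheduling sketch. Everything here is an assumption for illustration: the function name `schedule_gesture`, the three-field schedule, and the idea that preparation and stroke durations are known in advance are not taken from the slides.

```python
from dataclasses import dataclass


@dataclass
class GestureSchedule:
    preparation_start_ms: float  # when the hand starts moving toward the stroke
    stroke_start_ms: float       # stroke onset
    stroke_end_ms: float         # end of the stroke phase


def schedule_gesture(stress_onset_ms, preparation_ms, stroke_ms, lead_ms=0.0):
    """Place a gesture so that its stroke onset precedes the stressed
    syllable by lead_ms (or co-occurs with it when lead_ms == 0).

    All names and parameters are hypothetical. A negative preparation
    start simply means the gesture begins before the utterance does.
    """
    if lead_ms < 0:
        raise ValueError("lead_ms must be >= 0 so the stroke never lags the stress")
    stroke_start = stress_onset_ms - lead_ms
    return GestureSchedule(
        preparation_start_ms=stroke_start - preparation_ms,
        stroke_start_ms=stroke_start,
        stroke_end_ms=stroke_start + stroke_ms,
    )
```

The stressed-syllable onset would come from the speech synthesis timing; working backwards from it is one simple way to guarantee the ordering constraint.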

Functions of nonverbal behavior (III)

• Dialogue functions

- Consist of turn-taking and backchannel signals
- Serve to guarantee the smooth flow of interaction when exchanging speaker and listener roles
- Sacks, Schegloff & Jefferson (1974) list verbal and paraverbal regulators; Duncan (1972) finds important nonverbal cues
- Controversy about the importance of nonverbal cues (Rimé, 1983 vs. Rutter et al., 1979)

Functions of nonverbal behavior (IV)

• Turn-taking signals (cf. Duncan, 1972)

- Turn-yielding signal: extension of the last syllable or last stressed syllable, terminal clause, termination of gestures, sociocentric sentences, looking at interaction partner

- Speaker-state signal: starting gesticulation, audible breath, rotating the head away, (over)loudness

- Backchannel signal (Yngve, 1970): nods, paraverbal feedback, short questions, repetitions, sentence completion

- Turn-keeping signal: gesture (negates turn-yielding signals), increased head movement activity (Donaghy & Goldberg, 1991)

Functions of nonverbal behavior (V)

• Relational functions

- Socio-emotional effects, definition of the relationship, regulation of emotional climate, impression management

- Mehrabian (1970; cf. Osgood, 1966) differentiates
  • Evaluation (immediacy cues)
  • Dominance (relaxation cues)
  • Activity, responsiveness

- Mehrabian tried to find cues for all different dimensions of nonverbal communication...

Functions of nonverbal behavior (VI)

• Relational functions – Findings

- Evaluation: gaze, smile, touch, forward lean, head tilt, low distance, activity (e.g. facial expressiveness)

- Dominance: turning away, more expansive gestures, leaning backwards, nonreciprocal touch, relaxation cues?

- Activity/responsiveness: synchrony, relation to increased evaluation

Example of multifunctionality: Eye gaze

• Signals search for information

• Helps to regulate flow of conversation (cf. Duncan, 1972; Kendon, 1967)

• Establishes intimacy (cf. Argyle & Dean, 1967)

• Indicates personality characteristics (social status, culture, etc.) (cf. Exline et al., 1975)

• How to generate communicative behaviors automatically?

- Verbal behavior, i.e., speech

- Facial animation for creating facial displays and lip-synched speech

- Skeletal animation for synthetic gesture

Verbal behaviors

• Spoken utterances with natural intonation contour (crucial for intelligibility and believability)

→ Text-to-speech synthesis

• Lexical stress and sentence stress determined by word class, syntactic constituency, surface position

• Emphatic stress determined by information structure (rheme vs. theme, Halliday, 1967)

• Contrastive stress or focus, e.g. „I like blue tiles more than green tiles.“ vs. „I like blue tiles better than blue wallpaper.“

Emphatic & contrastive stress (= primary stress) → main synchronization points for nonverbal behaviors! (de Ruiter, 1999)

TTS for multimodality

• TXT2PHO (IKP) and MBROLA (TCTS)
• SABLE tags for additional intonation commands

[Figure: TTS pipeline. Components: Parse tags, TXT2PHO, Manipulation, MBROLA. Stages: Initialization, Planning, Phonation. Inputs: SABLE-tagged text and external commands; intermediate representation: phonetic text (IPA/XSAMPA); output: speech.]

Example input:

„<SABLE> Drehe <EMPH> die Leiste </EMPH> quer zu <EMPH> der Leiste </EMPH>. </SABLE>“
(English: "Turn the bar crosswise to the bar.")

Example phonetic text (IPA/XSAMPA):

S 105 18 ...
P 90 8 153
a: 104 4 ...
s 71 28 ...
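As a rough illustration of what the "Manipulation" step on phonetic text could involve, the sketch below parses MBROLA-style lines (phoneme symbol, duration in ms, then optional pairs of position in percent and pitch in Hz) and stretches an emphasized span. The function names, the scaling factors, and the idea that emphasis is realized by lengthening and raising pitch are assumptions for illustration, not the authors' implementation.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Phone:
    symbol: str                       # phoneme symbol (SAMPA-style)
    duration_ms: float                # duration in milliseconds
    pitch: list[tuple[float, float]] = field(default_factory=list)  # (position %, Hz)


def parse_pho(text: str) -> list[Phone]:
    """Parse MBROLA-style phonetic text; ';' starts a comment line."""
    phones = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(";"):
            continue
        parts = line.split()
        values = [float(v) for v in parts[2:]]
        if len(values) % 2 != 0:
            raise ValueError(f"pitch targets must come in pairs: {line!r}")
        pitch = list(zip(values[0::2], values[1::2]))
        phones.append(Phone(parts[0], float(parts[1]), pitch))
    return phones


def emphasize(phones, start, end, stretch=1.2, pitch_scale=1.1):
    """Return a copy in which phones[start:end] are longer and higher.

    The default factors are hypothetical placeholders.
    """
    out = []
    for i, p in enumerate(phones):
        if start <= i < end:
            scaled = [(pos, hz * pitch_scale) for pos, hz in p.pitch]
            out.append(Phone(p.symbol, p.duration_ms * stretch, scaled))
        else:
            out.append(p)
    return out


def onset_times(phones):
    """Onset of each phone in ms, usable as synchronization points."""
    onsets, t = [], 0.0
    for p in phones:
        onsets.append(t)
        t += p.duration_ms
    return onsets
```

Because durations are explicit in the phonetic text, `onset_times` gives the timing information that nonverbal behaviors would need in order to align with stressed syllables.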

Nonverbal behaviors

• Generation requires...
- High-level way of specifying movements
- Accuracy w.r.t. both spatial and temporal features
- Reproduction of naturalness, lifelikeness, even subtleties of emotive and individual (personal) expression

→ Computer animation:

Illusion of movement by displaying slightly altered pictures in rapid succession

→ Translation of behaviors into positions and orientations of visual objects for each frame

Computer animation

Computer animation = modeling + motion control + rendering

→ Critical issue due to the high complexity of both object and movement in nonverbal behaviors
→ Motion control on different levels of abstraction...

- Direct specification of all motion parameters (e.g., human body > 240 DOFs)
- Abstract description of movement & automatic generation of low-level parameters

→ Control level hierarchies

[Figure axes: simplicity of motion spec, naturalness of animation]

Representational animations

• The object's representation is subject to the animation

• Soft object animation
- Animated deformations
- Facial animation, „cloth animation“, etc.

• Skeletal animation
- Hierarchical structure of rotational joints connected by rigid links
- Animation by alteration of joint angles
- Additional control methods (tissue simulation, cloth animation, etc.) based on underlying kinematic skeleton

Facial Animation

• Requires control hierarchy for deforming the highly complex facial geometry

- Vertex displacements
- Face muscle simulation
- Action encoding

Action encoding: high-level specification of actions performable on the human face:

- FACS (Ekman & Friesen, 1978): Visible facial actions (emotional or conversational) described at muscle level in terms of action units

- MPA (Kalra et al., 1998): Visible features of both facial expressions and visemes (65 MPAs)

Face muscles

• Eleven muscles responsible for facial animation; four major groups: Jaw (A), mouth (B-G), eye (H,I), brow/neck (J,K)

• Fixed mapping from muscle contractions to vertex displacements

• Examples: Levator labii superioris (B), Zygomaticus major (C)

(Fleming & Dobbs, 1999)

Vertex displacement

• Movement generation by interpolating target positions (morphing)
• Targets given by, e.g., a set of muscle contractions or visual phonemes
• Straight, weighted, or segmented morphing

(Fleming & Dobbs, 1999)
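A minimal sketch of morphing as vertex interpolation, assuming that "straight" morphing means blending from one target to another and "weighted" morphing means adding several weighted target offsets to a neutral face. The function names and the tiny two-vertex meshes in the usage note are hypothetical; segmented morphing (separate targets per face region) is not shown.

```python
Vertex = tuple[float, float, float]


def straight_morph(source, target, t):
    """Linearly interpolate every vertex from source (t=0) to target (t=1)."""
    return [
        tuple(s + t * (g - s) for s, g in zip(sv, tv))
        for sv, tv in zip(source, target)
    ]


def weighted_morph(neutral, targets, weights):
    """Blend several morph targets on top of a neutral mesh.

    targets maps a name to a vertex list with the same length as neutral;
    weights maps the same names to blend weights. Each target contributes
    weight * (target - neutral) per vertex.
    """
    result = []
    for i, base in enumerate(neutral):
        moved = list(base)
        for name, w in weights.items():
            tv = targets[name][i]
            for axis in range(3):
                moved[axis] += w * (tv[axis] - base[axis])
        result.append(tuple(moved))
    return result
```

For example, with a neutral mesh and two targets named "smile" and "open", `weighted_morph(neutral, {"smile": smile, "open": open_mouth}, {"smile": 0.5, "open": 0.25})` mixes half a smile with a slightly opened mouth.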

Speech animation

• Visual phonemes (visemes): mouth positions representing the sounds we hear in speech

• 16 visual phonemes, but reduced sets may be adequate for lip synch

• Heard „ba“ & seen „ga“ → perceived „da“ (McGurk & MacDonald, 1976)

Speech animation

• Creating lip-synched speech
- Determine phonemes and assign visemes
- Animate visemes based on articulation of phonemes
- Coarticulation, e.g., drop phonemes to increase smoothness

• Speech animation + TTS = talking heads
- Baldi (Massaro et al., 2000)
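The lip-synch steps above can be sketched as a small function that maps timed phonemes to a viseme track. The phoneme-to-viseme table, the viseme names, and the 40 ms threshold are made-up placeholders; dropping very short phonemes and merging repeated visemes is only one crude stand-in for coarticulation handling.

```python
# Hypothetical, heavily reduced phoneme-to-viseme table.
PHONEME_TO_VISEME = {
    "p": "closed", "b": "closed", "m": "closed",
    "f": "lip_teeth", "v": "lip_teeth",
    "a": "open",
    "o": "rounded", "u": "rounded",
    "_": "rest",
}


def viseme_track(phones, min_ms=40.0):
    """Turn (phoneme, duration_ms) pairs into (start_ms, viseme) keyframes.

    Phonemes shorter than min_ms are dropped, and a viseme that repeats
    the previous one is not keyed again; time still advances for both.
    Unknown phonemes fall back to the "rest" viseme.
    """
    track = []
    t = 0.0
    for phoneme, duration in phones:
        viseme = PHONEME_TO_VISEME.get(phoneme, "rest")
        too_short = duration < min_ms
        repeats_previous = bool(track) and track[-1][1] == viseme
        if not too_short and not repeats_previous:
            track.append((t, viseme))
        t += duration
    return track
```

The resulting keyframes could then be animated by interpolating between viseme target shapes, with phoneme durations taken from the TTS phonetic text.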

Skeletal animation

• Hierarchy of rotational joints connected by rigid links

• Anthropometric modeling, joint limits
• Redundancy (→ DOF problem, IK problem)

→ Various motion control variables (Cartesian, joint angles, elbow swivel, etc.)

[Figure: forward kinematics (FK) maps joint space R^n to Cartesian space R^3; inverse kinematics (IK) maps R^3 back to R^n]
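To make the FK/IK relation concrete, here is a toy sketch for a planar arm with two rotational joints, far simpler than a human skeleton. The link lengths and function names are assumptions. Even this small arm has two joint-angle solutions (elbow up and elbow down) for most reachable targets, which hints at why IK becomes a redundancy problem with more degrees of freedom.

```python
import math


def forward_kinematics(theta1, theta2, l1=1.0, l2=1.0):
    """Joint angles (radians) -> end-effector position (x, y)."""
    x = l1 * math.cos(theta1) + l2 * math.cos(theta1 + theta2)
    y = l1 * math.sin(theta1) + l2 * math.sin(theta1 + theta2)
    return x, y


def inverse_kinematics(x, y, l1=1.0, l2=1.0):
    """End-effector position -> list of (theta1, theta2) solutions.

    Uses the law of cosines for the elbow angle; raises ValueError
    if the target cannot be reached.
    """
    c2 = (x * x + y * y - l1 * l1 - l2 * l2) / (2.0 * l1 * l2)
    if abs(c2) > 1.0:
        raise ValueError("target out of reach")
    solutions = []
    for sign in (1.0, -1.0):  # the two elbow configurations
        theta2 = sign * math.acos(c2)
        theta1 = math.atan2(y, x) - math.atan2(
            l2 * math.sin(theta2), l1 + l2 * math.cos(theta2)
        )
        solutions.append((theta1, theta2))
    return solutions
```

Feeding any returned solution back into `forward_kinematics` should reproduce the target position, which is a convenient sanity check.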

Keyframing

• Parametric keyframing: automatic generation of intermediate frames for a given set of keyframes by interpolating joint angles

• Quality of movements depends on the number of keyframes

• Still tedious work to define keyframes in low-level control parameters
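Parametric keyframing can be sketched as follows, assuming plain linear interpolation of joint angles between neighboring keyframes; spline interpolation is a common alternative and the slides do not say which is meant. The joint names in the usage note and the function name `interpolate_pose` are hypothetical.

```python
from bisect import bisect_right


def interpolate_pose(keyframes, t):
    """Return the pose (joint -> angle) at time t.

    keyframes is a list of (time, pose) pairs with strictly increasing
    times; every pose uses the same joint names. Times outside the
    keyframed range clamp to the first or last pose.
    """
    times = [time for time, _ in keyframes]
    if t <= times[0]:
        return dict(keyframes[0][1])
    if t >= times[-1]:
        return dict(keyframes[-1][1])
    i = bisect_right(times, t)          # index of the next keyframe after t
    (t0, p0), (t1, p1) = keyframes[i - 1], keyframes[i]
    u = (t - t0) / (t1 - t0)            # normalized position between the two
    return {joint: p0[joint] + u * (p1[joint] - p0[joint]) for joint in p0}
```

With keyframes at 0 s, 1 s, and 2 s for a "shoulder" and an "elbow" angle, calling `interpolate_pose(keyframes, 0.5)` yields the in-between frame halfway from the first to the second keyframe. Fewer keyframes mean longer straight-line segments in joint space, which is one reason movement quality depends on their number.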

Performance animation

• Motion capture: measuring and recording the movements of an actor directly, for immediate or delayed analysis and playback

• Capture data and map to digital character
- Mechanical: joystick, mouse, data gloves, etc.
- Optical: at least two cameras, reflective markers
- Electromagnetic: sensors for tracking keypoints

• High degree of naturalness, but lack of generality & flexibility

Procedural animation

• Motion described algorithmically; calculation of control parameters for a given point in time

• Physics-based animation
- Unconstrained (Newton, Lagrange, etc.) vs. constraint-based methods (constraint forces, spacetime constraints)
- Forward & inverse dynamics
- Generation of secondary movements

• Model-based animations
- Detailed knowledge about targeted movement
- Frequently applied for locomotion

Real-time requirements

(Magnenat-Thalmann & Thalmann, 1998)

Aspect | Frame-by-frame | Real-time
Surface Modeling | No limitations on complexity | Limitations on the number of polygons
Deformations | May be calculated using metaballs, FFD, splines | Requires fast transformations, e.g., based on cross-sections
Skeletal Animation | Any method may be used | Real-time processing may prevent using expensive methods based on inverse dynamics or control theory
Locomotion | Any model/method may be used: motion capture, kinematics, dynamics, biomechanics | Dynamic models may be too CPU intensive
Facial Animation | Complex models may be used including muscles with finite elements | Simplified models should be used; limitations on the facial deformations
Clothes | Calculated using mechanical models | Texture mapping
Skin | Model with wrinkles | Texture mapping
Hair | Individual hairs possible | Only a polygonal shape with possible texture may be applied

Gesture animation

• Flexibility, accuracy, and naturalness!

• Two approaches to skeleton motion control:
- Motion drawn from a database of predefined motions
- Motion dynamically calculated on demand

• Integration of several motion generators vital for designing complex motions!
- hand vs. arm movement
- gesture stroke vs. retraction
- emblematic vs. iconic gestures

• In terms of Laban Movement Analysis: „Gestures [...] exist because they have some distinctiveness in their Effort and Shape parameter.“ (Costa et al., 2000)

Gesture animation

• Start from high-level, parameterizable gesture representations

- Script-based animations, e.g., PaT-Nets (Badler et al., 1993)
- Feature-based descriptions based on some gesture/movement notation system (Calvert et al., 1982; Lebourque & Gibet, 1999; Kopp & Wachsmuth, 2000)

Trajectory formation...

...and modulation

Tomorrow...

I. Introduction
- Motivation, history, recent developments
- Evaluation

II. Bases of multimodal communication
- Channels and functions of multimodal communication
- Synthetic communicative behaviors, e.g., facial & gestural animation, speech synthesis

III. Modeling conversational behavior
- Underlying models & architecture
- Top-down vs. bottom-up
- Outlook & discussion

• Questions? Otherwise....