
How do you like your virtual agent?: Human-agent interaction experience through nonverbal features and personality traits

Authors

Abstract. Recent studies suggest that people's interaction experience with virtual agents can be, to a very large degree, described by people's personality traits. Moreover, the nonverbal behavior of a person has been known to indicate several social constructs in different settings. In this study we analyze human-agent interaction from the perspective of nonverbal behaviors displayed during the interaction and the personality of the human. Based on existing psychological work, we designed and recorded an experiment on human-agent interactions, in which a human communicates with two different virtual agents. Human-agent interactions are described with three self-reported measures: quality, rapport and likeness of the agent. We investigate the use of self-reported personality traits and extracted audio-visual nonverbal features as descriptors of these measures. Our correlation analysis shows strong correlations between the interaction measures and several of the personality traits and nonverbal features, which are supported by both the psychological and the human-agent interaction literature. We further use traits and exported nonverbal cues as features to build regression models for predicting measures of interaction experience. Our results show that the best results are obtained when nonverbal cues and personality traits are used together.

    Keywords: human-agent interaction, nonverbal behavior, personality

    1 Introduction

A growing number of applications seek to provide social abilities and human-like intelligence to computers. Compelling social interactions with computers, or specifically Embodied Conversational Agents (ECAs), are persuasive, engaging, and they increase trust and feelings of likeness, so it is understandable why recent trends show increasing usage of virtual agents in social media, education or social coaching.

Clearly, with the advance of social, user-aware adaptive interfaces, it has become increasingly important to model and reason about social judgment for agents. To help virtual agents interpret human behaviors, a number of observational studies have been conducted: human conversational behaviours are induced in Wizard of Oz experiments with agents, or in interaction with other humans, and the observed behaviors are then used to model both the perception and reasoning components of the agents.

When it comes to evaluation studies and outcomes of human-agent interaction (HAI) with respect to human personality, most prior works have examined only the extraversion trait, with the goal of better understanding human preferences for interactive characters. The most notable studies on this topic come from the early 2000s. Limited by technology, researchers used only vocal behaviors [15], or a still image and textual interface [16], to simulate the extraverted/introverted agent. In a similar and more recent study, the extraversion trait is manipulated via the computer-generated voice and gestures of a 2D cartoon-like agent [3]. These studies have shown that humans are attracted to characters with a similar personality, confirming the similarity rule, as well as to characters with an opposite personality, confirming the complementarity rule (see [3] for an overview).

Recent studies have started to observe the influence of personality traits other than extraversion on various social phenomena of HAI. In [13], two conditions (low and high behavioural realism of the agent) were manipulated while humans interacted with an agent designed to build rapport. Further, the humans' personality traits were correlated with persistent behavioral patterns, such as shyness or fear of interpersonal encounters. The results of the study showed that the extraversion and agreeableness traits have a major impact on human attitudes, more so than gender and age. Moreover, other predictor variables, such as shyness and public self-consciousness, produced a negative effect on likeness towards agents. The other Big 5 traits, neuroticism, openness to experience and conscientiousness, were not found to be significant for the HAI outcome in this study. Other studies on the perception of HAI have associated rapport with a high agreeableness trait [6], and feelings of friendliness with extraversion [4].

In this paper we focus on the perceived experience of HAI, using human personality and nonverbal behavior as descriptors. We design an experimental study in which we collect audio-visual data of humans interacting with agents, along with their Big 5 traits and perceived experience measures. Further, we experiment with regression models to predict the perceived experience measures using the collected data. The motivation for our study comes from several facts. As explained above, personality traits shape the perception and behaviors of humans in human-human and human-agent interaction. Nonverbal cues have also been shown to characterize several social constructs [7], and to be significant in predicting some of the Big 5 traits in social computing (e.g. in social media [2], in meetings [1]). Moreover, recent advances in the social computing community have shown how the fusion of audio-visual data is significant for the prediction of various behavior patterns and phenomena in social dialogue, such as dominance [26] or aggression [27]. Thus, we believe that the fusion of both visual and acoustic cues could be significant for predicting perceived measures of HAI. Our study is similar to the study with the rapport agent [13], but with one major difference: rather than observing the influence of personality traits on HAI experience, we focus on multi-modal analysis of perceived experience from both visual and vocal nonverbal behavior cues.

Specifically, in this paper we investigate nonverbal cues and self-reported Big 5 traits as descriptors of a person's interaction with two different virtual agents. We compose and observe the interaction experience through three aspects, or measures (quality, rapport and likeness) [5]. The virtual agents we use in our study are Sensitive Artificial Listeners (SALs) [8], which are designed with the purpose of inducing specific emotional conversations. There are in total four different agents in the SAL system: a happy, an angry, a sad and a neutral character. Studies suggest that the perceived personality of a social artifact has a significant effect on usability and acceptance [12], so we find these agents relevant for exploring the interaction experience. Though the SALs' understanding capabilities are limited to emotional processing, their personalities have been successfully recognized in a recent evaluation study [8].

Our study has three contributions. First, we examine the relation between the Big 5 traits and interaction experience, comparing with existing work in social psychology and human-agent interaction. Second, we investigate links between nonverbal cues and perceived interaction experience. And finally, we build a method to predict the experience outcome of a human-agent interaction based on automatically extracted nonverbal cues displayed during the interaction and self-reported personality. Given the fact that we record our subjects with a consumer depth camera, we also investigate and discuss the potential of using a cheap markerless tracking system for analyzing nonverbal behaviors.

    2 Data collection

Our data collection contains recordings of 33 subjects, of whom 14 are female and 19 male. 26 are graduate students and researchers in computer science, and 7 are students of management. The subjects come from different cultural backgrounds; however, 85% of them are Caucasian. Subjects were recruited using two mailing lists and were compensated with 10 CHF for participation.

Before the recording session, each subject had to sign the consent form and fill in demographic information and the NEO FFI Big 5 personality questionnaire [17]. The recording session contains three recordings of the subject, where the data has been captured with a Kinect RGB-D camera (see Figure 1). First, the subject was asked to give a 1-minute self-presentation via video call. Then, he/she had two 4-minute interactions with two agents: the first interaction was with the sad Obadiah, and the second with the cheerful Poppy. These characters were selected because the evaluation study on SALs [8] has shown that Poppy is the most consistent and familiar, and Obadiah the most believable, character. Before the interaction, subjects were given an explanation of what SALs are and what they could expect from the interaction. To encourage the interaction, a list of potential conversation topics was placed in the subject's field of view. The topics were: plans for the weekend, vacation plans, things that the subject did yesterday, and the last book the subject read. After each human-agent interaction, the subjects filled in a questionnaire, reporting their perceived interaction experience and mood.

The interaction experience measures are inspired by the study in [5], in which the authors investigate how Big 5 traits are manifested in mixed-sex dyadic interactions of strangers. To measure perceived interaction, they construct a "Perception of Interaction" questionnaire with items which rate various aspects of the participants' interaction experience. We target the same aspects in human-agent interaction: Quality of Interaction (QoI), Degree of Rapport (DoR) and Degree of Likeness of the agent (DoL). Each interaction aspect in our questionnaire was targeted by a group of statements with a five-point Likert scale ((1) Disagree strongly to (5) Agree strongly).

Some of the items used by [5] were excluded, such as "I believe that the partner wants to interact more in the future", given the constrained social and perception abilities of SALs. In total, our interaction questionnaire has 15 items which report QoI (5), DoR (7) and DoL (3). The questions that we used in the questionnaire and the target aspect of each question are shown in Table 1; a minimal scoring sketch is given after the table. The values of these measures are normalized to the range [0, 1]. Additionally, our questionnaire also measures the subject's mood (with the same questionnaire as used in [14]), which is currently excluded from our experiments.

    Table 1. The questions and targeted aspects in the interaction questionnaire

    Question Target Aspect

The interaction with the character was smooth, natural, and relaxed. QoI

    I felt accepted and respected by the character. DoR

    I think the character is likable. DoL

    I got along with the character pretty good. DoR

The interaction with the character was forced, awkward, and strained. QoI

    I did not want to get along with the character. DoL

I was paying attention to the way that the character responds to me and I was adapting my own behaviour to it. DoR

    I felt uncomfortable during the interaction. DoR

    The character often said things completely out of place. QoI

    I think that the character finds me likable. DoR

    The interaction with the character was pleasant and interesting. QoI

    I would like to interact more with the character in the future. DoL

    I felt that character was paying attention to my mood. DoR

I enjoyed the interaction. QoI

    I felt self-conscious during the conversation. DoR
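The scoring step described before Table 1 can be illustrated with a short sketch. The item-to-aspect mapping follows Table 1; the reverse-coding of negatively worded items (e.g. "forced, awkward, and strained") is our assumption and is not stated in the paper, and the identifiers used here are hypothetical.

# Minimal sketch: turning 5-point Likert answers into QoI/DoR/DoL scores in [0, 1].
# Hypothetical item list: (question id, target aspect, reverse-coded?); reverse-coding is assumed.
ITEMS = [
    ("smooth_natural_relaxed", "QoI", False),
    ("forced_awkward_strained", "QoI", True),
    ("felt_accepted", "DoR", False),
    ("character_likable", "DoL", False),
    # ... remaining items from Table 1
]

def aspect_scores(answers):
    """answers: dict mapping question id -> Likert value in {1, ..., 5}."""
    sums, counts = {}, {}
    for qid, aspect, reverse in ITEMS:
        if qid not in answers:
            continue
        value = answers[qid]
        if reverse:                      # assumed reverse-coding of negative items
            value = 6 - value
        normalized = (value - 1) / 4.0   # map 1..5 -> 0..1
        sums[aspect] = sums.get(aspect, 0.0) + normalized
        counts[aspect] = counts.get(aspect, 0) + 1
    return {a: sums[a] / counts[a] for a in sums}

print(aspect_scores({"smooth_natural_relaxed": 4, "forced_awkward_strained": 2,
                     "felt_accepted": 5, "character_likable": 3}))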

At the end of each recording session, several streams were obtained: RGB-D data and audio data from the Kinect, and screen captures and log files with a description of the agent's behaviour.


    Fig. 1. Recording environment with a participant.

    3 Cue Extraction

We extracted nonverbal cues from both the visual and the auditory channel. The selection of features was based on previous studies on human-human interaction and conversational displays in psychology. For visual nonverbal displays we studied the literature on displays of attitude in initial human-human interactions (interactions where the interaction partners meet for the first time). Then, given the fact that previous research has shown that the personality traits of extraversion and agreeableness are important predictors of HAI [13], we also take into account findings on nonverbal cues which are important for predicting personality traits.

Related to attitude in initial human-human interaction, a number of works observe how postural congruence and mimicry are positively related to liking and rapport ([20], [21], or more recently [22]). Mimicry has also been investigated in the human-agent community, with attempts to build automatic models to predict mimicry [23]. Our interaction scenario can only observe facial mimicry, because the SAL agents have only their face visible and they do not make any body leans. Among other nonverbal cues, the psychological literature agrees that frequent eye contact, relaxation, leaning and orienting towards the partner, less fiddling, moving closer, touching, more open arm and leg positions, smiling, and a more expressive face and voice are signs of liking from the observer's (or coder's) point of view [24], [7]. Yet, when it comes to displays of liking associated with self-reported measures, findings are not that evident. In an extensive review of the literature dealing with the posture cue, Mehrabian shows how displays of liking vary with gender and status [9]. He also shows that a larger reclining angle of sideways leaning communicates a more negative attitude, while a smaller reclining angle of a seated communicator, and therefore a smaller degree of trunk relaxation, communicates a more positive attitude. An investigation of nonverbal behaviour cues and liking conducted on initial same-sex dyad interactions [20] shows that the most significant variables in predicting subjects' liking are the actual amount of mutual gaze and the total percentage of time spent looking. Other significant behaviours are: expressiveness of the face and gesturing, the amount of activity in movement and gesture, and synchrony of movement and speech.


Another cross-study [11] examined only kinesic and vocalic behaviours. The results show that increased pitch variety is associated with female actors, whereas an interesting effect is observed for loudness and length of talking, which decrease over interaction time. Though the authors interpret this as disengagement in conversations, another work reports that it signals greater attractiveness [25].

Psychologists have noted that, when observed alone, vocal and paralinguistic features have the highest correlation with judgments of a person's personality traits, at least in certain experimental conditions [18]. This has been confirmed in some studies on the automatic recognition of personality traits which use nonverbal behavior as predictors. A study on the prediction of personality impressions analyzes the predictability of Big 5 personality trait impressions using audio-visual nonverbal cues extracted from vlogs [2]. The nonverbal cues include speaking activity (speaking time, pauses, etc.), prosody (spectral entropy, pitch, etc.), motion (weighted motion energy images, movements in front of the camera), gaze behavior, vertical framing (position of the face), and distance to the camera. Among these cues, speaking time and length, prosody, motion and looking time were the most significant for inferring the perceived personality. Observer judgments of extraversion are positively correlated with high fluency, meaning greater length of the speech segments and a smaller number of speaking turns, as well as with loudness, looking time and motion. People who are observed as more agreeable speak with a higher voice, and people who are observed as more extraverted have higher vocal control. In another study, on meeting videos [19], speech-related measurements (e.g., speaking time, mean energy, pitch) and the percentage of looking time (e.g., amount of received and given gaze) were shown to be significant predictors of personality traits.

Based on the reviewed literature, we extract the following features from the human-agent interaction sequences: speaking activity, prosody, body leans, head direction, visual activity, and hand activity. Every cue, except hand activity, is extracted automatically from whole conversational sequences. While we acknowledge the importance of mimicry, in this experiment we only extract the individual behaviors of the humans, without looking at the agent's behavior.

    3.1 Audio Cues

To extract nonverbal cues from speech, we first applied automatic speaker diarization to the human-agent audio files. We used an in-house speaker diarization system and the MIT Human Dynamics group toolkit [10] to export voice quality measures.

Speaking activity. Based on the diarization output, we extracted the speech segments of the subject and computed the following features for each human-agent sequence: total speaking length (TSL), total speaking turns (TST), filtered turns, and average turn duration (ATD).
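As an illustration only: given diarization output represented as (start, end) speech segments for the subject, the speaking-activity features above could be computed along these lines. The segment representation and the turn-filtering threshold are assumptions, not details taken from the paper.

# Sketch: speaking-activity features from diarization segments (assumed (start_s, end_s) tuples).
def speaking_activity(segments, min_turn_s=1.0):
    durations = [end - start for start, end in segments]
    tsl = sum(durations)                                     # total speaking length (TSL)
    tst = len(segments)                                      # total speaking turns (TST)
    filtered = sum(1 for d in durations if d >= min_turn_s)  # turns above an assumed duration threshold
    atd = tsl / tst if tst else 0.0                          # average turn duration (ATD)
    return {"TSL": tsl, "TST": tst, "filtered_turns": filtered, "ATD": atd}

print(speaking_activity([(0.0, 2.5), (5.0, 5.4), (8.0, 11.0)]))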

Voice quality measures. The voice quality measures are extracted on the subject's speech, based on the diarization output. We extracted the statistics - mean and standard deviation - of the following features: pitch (F0 (m), F0 (std)), pitch confidence (F0 conf (m), F0 conf (std)), spectral entropy (SE (m), SE (std)), delta energy (DE (m), DE (std)), location of autocorrelation peaks (Loc R0 (m), Loc R0 (std)), number of autocorrelation peaks (# R0 (m), # R0 (std)), and value of autocorrelation peaks (Val R0 (m), Val R0 (std)). Furthermore, five other measures were exported: average length of speaking segment (ALSS), average length of voiced segment (ALVS), fraction of time speaking (FTS), voicing rate (VR), and fraction speaking over (FSO).
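The statistics above are simple aggregates over frame-level feature tracks. A minimal sketch of this kind of aggregation is shown below; the frame-level arrays, the voiced-frame mask, and the way FTS is approximated here are all assumptions, since the toolkit computes these internally.

# Sketch: mean/std statistics of a frame-level feature track plus an approximate fraction of time speaking.
import numpy as np

def track_stats(f0, voiced_mask, frame_hop_s, total_duration_s):
    voiced = f0[voiced_mask]                        # keep voiced frames only
    stats = {"F0_mean": float(np.mean(voiced)),
             "F0_std": float(np.std(voiced))}
    # FTS approximated from voiced frames here; the toolkit derives it from speech segments.
    stats["FTS"] = float(voiced_mask.sum() * frame_hop_s / total_duration_s)
    return stats

f0 = np.array([0.0, 180.0, 185.0, 0.0, 190.0])
print(track_stats(f0, f0 > 0, frame_hop_s=0.01, total_duration_s=0.05))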

    3.2 Visual Cues

One of the aspects we wanted to investigate in this study is the potential of using cheap markerless motion capture systems (MS Kinect SDK v1.8) for the purpose of automatic social behaviour analysis. Using the Kinect SDK upper body and face tracking information, we created body lean and head direction classifiers. Since the tracker produced poor results for the arm/hand joints, we manually annotated hand activity.

Body Leans. In this paper we propose a module for the automatic analysis of body leans from the 3D upper body pose and the depth image. We use a support vector machine (SVM) classifier with an RBF kernel, trained with extended 3D upper body pose features. The extended 3D upper body pose is an extended version of the features extracted from the Kinect SDK upper body tracker; along with the x-y position values of the shoulders, neck and head, it also contains torso information and the z-values of the shoulders, neck and torso normalized with respect to the neutral body pose. Using our classifier, the distribution of the following body leans is extracted: neutral, sideways left, sideways right, forward and backward leans (BL). These categories are inspired by psychological work on posture behavior and displays of affect [9]. Along with those distributions we also compute the frequency of shifting between leans.
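A minimal sketch of this classification step, assuming per-frame pose feature vectors and manually labelled lean categories; scikit-learn is used here purely for illustration and is not necessarily the toolkit used by the authors, and the feature dimensionality and random data are placeholders.

# Sketch: RBF-kernel SVM over extended upper-body pose features (one feature vector per frame).
import numpy as np
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

LEANS = ["neutral", "side_left", "side_right", "forward", "backward"]

# X: (n_frames, n_pose_features) pose descriptors, y: lean labels (synthetic placeholders here)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 12))
y = rng.choice(LEANS, size=200)

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
clf.fit(X, y)

# Per-sequence descriptors: distribution of predicted leans and frequency of lean shifts
pred = clf.predict(X)
dist = {lean: float(np.mean(pred == lean)) for lean in LEANS}
shifts = int(np.sum(pred[1:] != pred[:-1]))
print(dist, shifts)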

Head Direction. We use a simple method which outputs three head directions: screen, table, or other (HDO), together with the frequency of shifts (HDFS). The method uses a 3D object approximation of the screen and the table, and head information retrieved from the Kinect face tracker. The method was tested on manually annotated ground truth data and produces satisfying results.
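The idea can be sketched as testing whether the head direction ray intersects the box approximating the screen or the table. The box coordinates, the sampling-based intersection test and the function names below are our assumptions for illustration.

# Sketch: coarse head-direction labelling against 3D box approximations of the screen and table.
import numpy as np

def ray_hits_box(origin, direction, box_min, box_max, max_dist=3.0, steps=100):
    """Sample points along the head-direction ray and test whether any falls inside the box."""
    for t in np.linspace(0.1, max_dist, steps):
        p = origin + t * direction
        if np.all(p >= box_min) and np.all(p <= box_max):
            return True
    return False

def head_direction(head_pos, head_dir, screen_box, table_box):
    if ray_hits_box(head_pos, head_dir, *screen_box):
        return "screen"
    if ray_hits_box(head_pos, head_dir, *table_box):
        return "table"
    return "other"

screen = (np.array([-0.3, -0.2, 1.0]), np.array([0.3, 0.2, 1.1]))   # assumed coordinates (metres)
table = (np.array([-0.5, -0.8, 0.4]), np.array([0.5, -0.5, 1.0]))
print(head_direction(np.zeros(3), np.array([0.0, 0.0, 1.0]), screen, table))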

Visual activity. The visual activity of the subject is extracted using weighted motion energy images (wMEI), an image that describes the spatial motion distribution in the video sequence [2]. The features we extract are statistics of the wMEI: its entropy, mean and median value.
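A minimal sketch of one way to compute a wMEI and its statistics, assuming grayscale frames are already available; the accumulation-of-frame-differences formulation and the histogram-based entropy are our simplifications rather than the exact recipe of [2].

# Sketch: weighted motion energy image (wMEI) from grayscale frames and its statistics.
import numpy as np

def wmei_stats(frames):
    """frames: array of grayscale images (T, H, W), values in [0, 255]."""
    frames = np.asarray(frames, dtype=np.float64)
    motion = np.abs(np.diff(frames, axis=0)).sum(axis=0)    # accumulate per-pixel motion
    wmei = motion / motion.max() if motion.max() > 0 else motion
    hist, _ = np.histogram(wmei, bins=32, range=(0.0, 1.0))
    p = hist / hist.sum()
    entropy = float(-np.sum(p[p > 0] * np.log2(p[p > 0])))
    return {"entropy": entropy, "mean": float(wmei.mean()), "median": float(np.median(wmei))}

rng = np.random.default_rng(0)
print(wmei_stats(rng.integers(0, 256, size=(10, 48, 64))))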

Hand activity. We manually annotated the hand activity of the subject during the interaction. The classes used are hidden hands, hand gestures (GES), gestures on table, hands on table, and self-touch (ST).

    4 Analysis and results

In the first two parts of this section, we present the correlation analysis and the links between the interaction experience and the Big 5 traits, as well as the extracted nonverbal cues. We compare and discuss our results with previous works from the psychology and human-agent interaction literature. We also present the results of our experiments for predicting interaction experience.

    4.1 Personality and interaction experience

We compute the individual correlations between the Big 5 traits of the participants and the individual measures of interaction experience, to understand which traits may be useful for inferring the experience of interaction with the two virtual characters.
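The correlation analyses in this and the following subsection amount to per-pair Pearson tests. A minimal sketch is given below; scipy is used for illustration, the significance threshold is an assumption, and the data here are synthetic placeholders.

# Sketch: pairwise Pearson correlations between Big 5 traits and experience measures.
import numpy as np
from scipy.stats import pearsonr

def significant_correlations(traits, measures, alpha=0.05):
    """traits, measures: dicts mapping name -> array over subjects; alpha is an assumed threshold."""
    results = []
    for t_name, t in traits.items():
        for m_name, m in measures.items():
            r, p = pearsonr(t, m)
            if p < alpha:
                results.append((t_name, m_name, round(r, 2), round(p, 3)))
    return results

rng = np.random.default_rng(0)
n = 33  # number of subjects in the data collection
traits = {"extraversion": rng.normal(size=n), "agreeableness": rng.normal(size=n)}
measures = {"QoI": rng.random(n), "DoR": rng.random(n), "DoL": rng.random(n)}
print(significant_correlations(traits, measures))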

Table 2. Significant Pearson correlation effects between Big Five traits and interaction experience measures: QoI, DoR and DoL (p


to similarity rule, and opposite, complementary rule introduced in Section 1. Our results show two correlation effects between DoL and the Big 5 traits, one for each agent: Obadiah, who is shown to have high neuroticism, and Poppy, who is shown to have high extraversion [8]. Extraverted people in our study show a tendency to like Obadiah (for Poppy no relation is found), whereas people with high neuroticism show a tendency to like Poppy (for Obadiah no relation is found). These results show support for the complementary likeness rule.

    4.2 Nonverbal cues and interaction experience

We study the individual associations between nonverbal features and measures of interaction experience, which are shown in Table 3. As a first result, we found that interactions with agent Poppy result in higher cue utilization (17) than with agent Obadiah (10). One possible explanation for this could be that subjects expressed themselves more freely in the second interaction (with Poppy) because they knew what was expected of them. This has been confirmed for the assessment of personality, meaning that "when strangers get to know each other, information contained early in the interaction may be less useful for making accurate personality assessments" (see [8], p. 315 for discussion). To confirm this theory, slices of videos (e.g. only the second part of the human-Obadiah interaction) could be analyzed, which is left as future work.

Table 3. Significant Pearson correlation effects between interaction experience measures and nonverbal cues (p


designed to induce interaction. Besides, lower speech frequency, voiced segments which last shorter, a higher value of autocorrelation peaks (higher energy), and a lower location of autocorrelation peaks also indicate higher QoI for Poppy. Among the visual cues, only back leans are significant, showing that people who lean back more do not report high QoI with Poppy. Back leans in this case may indicate boredom or lack of interest. A lower average turn duration was found to be significant for higher QoI, DoR and DoL for Obadiah, which may mean that people who feel empathy with the sad Obadiah tend to speak less. Indeed, this was the case with some subjects who reported after the experiment that "they wanted to cheer up Obadiah". To support this theory, the linguistic content of the speech could be analysed.

For Obadiah's QoI, we also obtained two significant results, showing that people who make fewer gestures and whose speech segments last shorter report higher QoI. Shorter speech segments, which are also reported for higher DoL and DoR for Obadiah, indicate that the interaction is indeed two-way, with the agent responding to the subject, which could explain the high QoI. For Obadiah, people who lean back a lot show lower DoR. In the case of Poppy, people who take more turns, lean back less, speak louder, touch themselves less, shift their head less and look away from the screen less show higher DoR. These can all be identified as signals of interest. These features are also found to be significant in human-human interaction [5].

With regard to DoL, in the case of Obadiah only vocal features of the subjects were found to be significant. Average turn duration and average length of speaking segments were also found to be significant for DoR and QoI. A lower fraction of speaking time and fraction speaking over are related to higher DoL, which means that people who speak less tend to like Obadiah more. With regard to visual nonverbal cues and DoL, we found significant results only for Poppy. Compared to the sad Obadiah, the cheerful Poppy is a more likable character. People who like Poppy more lean back less and do not move their head away from the screen a lot. This result is also related to findings on likeness in human-human interaction, showing how people make more direct eye contact with a liked partner [11]. Although in our case only head direction is computed, our results show that it is an acceptable approximation of eye contact in this scenario.

    4.3 Regression Analysis

The task of predicting the interaction experience is addressed by building computational models for predicting the score of the individual measures of perceived experience: QoI, DoR and DoL. For the prediction task we used three different regression models: support vector regression (SVR), neural networks (NN) and kernel ridge regression (KRR). Each model is trained using a double cross-validation (CV) approach, in which we used leave-one-out CV for the outer fold and 5-fold CV for the inner fold. The inner fold is used for parameter optimization. The SVR and KRR models use an RBF kernel. The models are trained with different feature sets: we experimented with (1) all extracted nonverbal behaviour (NVB) cues, (2) all NVB cues and all personality traits (PT), (3) all PT, (4) significant NVB cues, (5) significant PT, and (6) significant NVB and PT (the significant NVB cues and PT are shown in Tables 2 and 3). Additionally, for all feature sets we also applied Principal Component Analysis (PCA) in order to reduce the dimensionality of the data.
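The double cross-validation scheme can be sketched as follows: a leave-one-out outer loop produces the held-out predictions, and a 5-fold inner grid search selects the hyperparameters on each training fold. scikit-learn is used here for illustration only, and the parameter grid and synthetic data are assumptions.

# Sketch: leave-one-out outer CV with a 5-fold inner grid search over an RBF SVR.
import numpy as np
from sklearn.model_selection import LeaveOneOut, GridSearchCV
from sklearn.svm import SVR
from sklearn.metrics import r2_score, mean_squared_error

def double_cv_predict(X, y, param_grid=None):
    param_grid = param_grid or {"C": [0.1, 1, 10], "gamma": ["scale", 0.01, 0.1]}  # assumed grid
    preds = np.zeros_like(y, dtype=float)
    for train_idx, test_idx in LeaveOneOut().split(X):
        inner = GridSearchCV(SVR(kernel="rbf"), param_grid, cv=5)  # inner fold: parameter optimization
        inner.fit(X[train_idx], y[train_idx])
        preds[test_idx] = inner.predict(X[test_idx])
    return preds

rng = np.random.default_rng(0)
X, y = rng.normal(size=(33, 8)), rng.random(33)   # e.g. significant NVB+PT features and one score per subject
preds = double_cv_predict(X, y)
print("R2:", round(r2_score(y, preds), 3), "RMSE:", round(mean_squared_error(y, preds) ** 0.5, 3))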

Table 4. Prediction results for Obadiah and Poppy with different feature sets (personality traits (PT), nonverbal behaviour (NVB), all vs. significant cues). For each feature set we only show the results of the best regression model.

Measure  Feature Set          Meth.  R2     RMSE

Obadiah
QoI      NVB+PT (sig.)        SVR    0.271  0.146
         PT (sig.)            SVR    0.187  0.154
         NVB+PT (all, PCA)    SVR    0.05   0.166
DoR      NVB+PT (sig.)        SVR    0.34   0.137
         NVB (sig.)           KRR    0.106  0.159
         PT (sig.)            KRR    0.134  0.157
DoL      NVB+PT (sig.)        KRR    0.174  0.197
         NVB (sig.)           KRR    0.066  0.201
         PT (sig.)            SVR    0.016  0.215

Poppy
QoI      NVB+PT (sig.)        KRR    0.518  0.126
         NVB (sig.)           KRR    0.158  0.170
         PT (sig.)            SVR    0.017  0.184
DoR      NVB+PT (sig.)        SVR    0.376  0.134
         NVB (sig.)           SVR    0.152  0.195
         NVB+PT (all, PCA)    KRR    0.114  0.199
DoL      NVB+PT (sig.)        SVR    0.241  0.211
         NVB (sig.)           SVR    0.212  0.212
         NVB+PT (all, PCA)    KRR    0.097  0.230

Table 4 shows the results of our experiments, where we report R2 and the Root Mean Square Error (RMSE). Among the different feature sets and regression models that we experimented with, we report the best results for each aspect. To stress the difference between the feature sets, we only show the results of the best regression model for each feature set.

One can first notice that the best results are obtained when personality traits and nonverbal cues are combined, which boosts the performance over each individual input source (PT and NVB used alone). QoI and DoR for both agents are predicted with an R2 of around 0.3 or higher. Although the R2 values found for QoI and DoR are on the low side, they are comparable to the results found in other studies in the social computing literature for predicting several other social aspects, such as personality [1] [2]. DoL is the weakest aspect of our prediction models: the highest R2 is 0.174 for Obadiah and 0.241 for Poppy. With regard to the regression method, SVR performed best for each interaction aspect, except for QoI for Poppy and DoL for Obadiah. However, for QoI for Obadiah, SVR and KRR produced similar results.


    5 Conclusions

Our paper presents a study in which we attempt to analyze and predict the experience of interaction (or perceived interaction) with virtual characters. A novelty of our work is that we use a combination of nonverbal cues and personality traits to predict the experience. We examined self-reported personality traits and extracted nonverbal cues as descriptors of experience and found that personality traits are very significant features, as also reported in [13]. The best results for all experience measures are obtained when nonverbal cues and personality traits are used together as features. The prediction results show that the degree of rapport for agent Obadiah and the quality of interaction for agent Poppy are the most predictable measures. We also found that some of the extracted nonverbal features are related to socio-psychological findings on affect and liking, such as body leaning and head orientation. More significant nonverbal features, or higher cue utilization, were found in the second interaction (with agent Poppy), which could be due to the fact that the conditions in our experimental design are not counterbalanced. The same fact is the reason why we cannot show strong support for the complementary likeness rule in HAI, which is a result of our study. One of our interests with the obtained data was the potential of a cheap markerless motion capture system (MS Kinect v1.8) for the purpose of automatic social behavior analysis. Experiments with head information from the face tracking system were successful and we built a head direction classifier. However, the upper body tracking failed to produce good results for hand tracking, so hand activity was manually annotated. Moreover, additional information was required to build the body leans classifier.

Future work includes several directions. First, to improve the prediction of likeness we consider exporting more significant nonverbal cues related to affect and liking, e.g. eye shifts and eye contact [11]. To improve the prediction of quality of interaction, it might be useful to use overlapped speech segments as a feature. We also plan to explore the prediction of personality from nonverbal cues in both the human-agent interaction and the self-presentation sequences, as the output of those predictors might be used to build automatic predictors of interaction experience using only nonverbal behaviour features. Besides, with regard to the unbalanced nonverbal cue utilization between agents Obadiah and Poppy, we plan to inspect interaction slices instead of whole interaction sequences.

    Acknowledgments.

    References

1. Aran, O., Gatica-Perez, D.: One of a Kind: Inferring Personality Impressions in Meetings. In International Conference on Multimodal Interaction (ICMI), pp. 11-18, Sydney (2013)

2. Biel, J.-I., Gatica-Perez, D.: The YouTube lens: Crowdsourced personality impressions and audiovisual analysis of vlogs. IEEE Transactions on Multimedia, 15(1), pp. 41-55 (2012)


3. Buisine, S., Martin, J. C.: The influence of user's personality and gender on the processing of virtual agents' multimodal behavior. Advances in Psychology Research, pp. 1-14 (2010)

4. Cafaro, A., Vilhjalmsson, H. H., Bickmore, T., Heylen, D., Johannsdottir, K. R., Valgardsson, G. S.: First impressions: Users' judgements of virtual agents' personality and interpersonal attitude in first encounters. In IVA '12, pp. 67-80 (2012)

5. Cuperman, R., Ickes, W.: Big five predictors of behavior and perceptions in initial dyadic interactions: personality similarity helps extraverts and introverts, but hurts disagreeables. Journal of Personality and Social Psychology, 97(4), pp. 667-684 (2009)

6. Kang, S.-H., Gratch, J., Wang, N., Watt, J. H.: Agreeable people like agreeable virtual humans. In Intelligent Virtual Agents (IVA) '08, pp. 253-261 (2008)

7. Knapp, M. L., Hall, J. A., Horgan, T. G.: Nonverbal Communication in Human Interaction. Cengage Learning, 8th edition (2013)

8. McRorie, M., Sneddon, I., McKeown, G., Bevacqua, E., Sevin, E., Pelachaud, C.: Evaluation of four designed virtual agent personalities. Transactions on Affective Computing, 3(3), pp. 311-322 (2012)

9. Mehrabian, A.: Significance of posture and position in the communication of attitude and status relationships. Psychological Bulletin, 71(5), pp. 359-372 (1969)

10. Pentland, A. S.: Honest Signals: How They Shape Our World. The MIT Press (2008)

11. Ray, G. B., Floyd, K.: Nonverbal expressions of liking and disliking in initial interaction: Encoding and decoding perspectives. Southern Communication Journal, 71(1), pp. 45-65 (2006)

12. Tapus, A., Mataric, M. J.: Socially assistive robots: The link between personality, empathy, physiological signals, and task performance. In Emotion, Personality, and Social Behavior, pp. 133-140. AAAI (2008)

13. von der Putten, A. M., Kramer, N. C., Gratch, J.: How our personality shapes our interactions with virtual characters - implications for research and development. In Intelligent Virtual Agents (IVA) '10, Philadelphia, PA (2010)

14. Biel, J.-I., Aran, O., Gatica-Perez, D.: The Good, the Bad, and the Angry: Analyzing Crowdsourced Impressions of Vloggers. In International Conference on Weblogs and Social Media (ICWSM) (2012)

15. Nass, C., Lee, K. M.: Does computer-generated speech manifest personality? An experimental test of similarity-attraction. In CHI '00: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pp. 329-336, New York, NY, USA (2000)

16. Isbister, K., Nass, C.: Consistency of personality in interactive characters: verbal cues, non-verbal cues, and user characteristics. International Journal of Human-Computer Studies, 53(2), pp. 251-267 (2000)

17. McCrae, R. R., Costa, P. T.: A contemplated revision of the NEO Five-Factor Inventory. Personality and Individual Differences, 36(3), pp. 587-596 (2004)

18. Ekman, P., Friesen, W., O'Sullivan, M., Scherer, K.: Relative importance of face, body, and speech in judgments of personality and affect. Journal of Personality and Social Psychology, 38(2), pp. 270-277 (1980)

19. Pianesi, F., Zancanaro, M., Lepri, B., Cappelletti, A.: A multimodal annotated corpus of consensus decision making meetings. Language Resources and Evaluation, 41(3), pp. 409-429 (2007)

20. Maxwell, G. M., Cook, M. W.: Postural congruence and judgements of liking and perceived similarity. New Zealand Journal of Psychology, 15(1), pp. 20-26 (1985)


21. Waldron, J.: Judgement of like-dislike from facial expression and body posture. Perceptual and Motor Skills, 41(3), pp. 799-804 (1975)

22. Vacharkulksemsuk, T., Fredrickson, B. L.: Strangers in sync: Achieving embodied rapport through shared movements. Journal of Experimental Social Psychology, 48(1), pp. 399-402 (2012)

23. Sun, X., Nijholt, A., Truong, K. P., Pantic, M.: Automatic understanding of affective and social signals by multimodal mimicry recognition. In Proceedings of the 4th International Conference on Affective Computing and Intelligent Interaction (ACII '11), D'Mello, S., Graesser, A., Schuller, B., Martin, J.-C. (Eds.), Vol. Part II, Springer-Verlag, Berlin, Heidelberg, pp. 289-296 (2011)

24. Argyle, M.: Bodily communication. Methuen (1988)

25. Palmer, M. T., Simmons, K. B.: Communicating intentions through nonverbal behaviors: conscious and nonconscious encoding of liking. Human Communication Research, 22(1), pp. 128-160 (1995)

26. Sanchez-Cortes, D., Aran, O., Jayagopi, D., Schmid Mast, M., Gatica-Perez, D.: Emergent Leaders through Looking and Speaking: From Audio-Visual Data to Multimodal Recognition. Journal on Multimodal User Interfaces, Special Issue on Multimodal Corpora, 7(1-2), pp. 39-53 (2013)

27. Lefter, I., Burghouts, G. J., Rothkrantz, L. J. M.: Automatic Audio-Visual Fusion for Aggression Detection Using Meta-information. In 2012 IEEE Ninth International Conference on Advanced Video and Signal-Based Surveillance (AVSS), pp. 19-24 (2012)

28. Jung, C.: Psychological types. New York: Harcourt, Brace (1921)