
Jeremy N. Bailenson (bailenson@stanford.edu)
Virtual Human Interaction Lab
Department of Communication
Stanford University
Stanford, CA 94305-2050

Andrew C. Beall
Jack Loomis
Jim Blascovich
Department of Psychology
University of California at Santa Barbara
Santa Barbara, CA 93106

Matthew Turk
Department of Computer Science
University of California at Santa Barbara
Santa Barbara, CA 93106

Presence, Vol. 13, No. 4, August 2004, 428–441

© 2004 by the Massachusetts Institute of Technology

Transformed Social Interaction: Decoupling Representation from Behavior and Form in Collaborative Virtual Environments

Abstract

Computer-mediated communication systems known as collaborative virtual environments (CVEs) allow geographically separated individuals to interact verbally and nonverbally in a shared virtual space in real time. We discuss a CVE-based research paradigm that transforms (i.e., filters and modifies) nonverbal behaviors during social interaction. Because the technology underlying CVEs allows a strategic decoupling of rendered behavior from the actual behavior of the interactants, conceptual and perceptual constraints inherent in face-to-face interaction need not apply. Decoupling algorithms can enhance or degrade facets of nonverbal behavior within CVEs, such that interactants can reap the benefits of nonverbal enhancement or suffer nonverbal degradation. Concepts underlying transformed social interaction (TSI), the ethics and implications of such a research paradigm, and data from a pilot study examining TSI are discussed.

1 Introduction

While conversing, you could look around the room, doodle, fine-groom, peel tiny bits of dead skin away from your cuticles, compose phone-pad haiku, stir things on the stove; you could even carry on a whole separate additional sign-language-and-exaggerated-facial-expression type of conversation with people right there in the room with you, all while seeming to be right there attending closely to the voice on the phone. And yet--and this was the retrospectively marvelous part--even as you were dividing your attention between the phone call and all sorts of other idle little fugue-like activities, you were somehow never haunted by the suspicion that the person on the other end's attention might be similarly divided (Wallace, 1996, p. 146).

In his hypothetical depiction of future, video-based remote interaction, Wallace looks back fondly on traditional phone conversations and notes a distinct advantage that telephone conversations hold over videoconferencing. While remote conferences mediated by telephony limit interactants to a single communication channel, the second channel (i.e., visual information) offered in conferences mediated by video may prove superfluous or even counterproductive to the quality of the interaction.

Collaborative virtual environments (CVEs) that employ 3D computer-generated avatars to represent human interactants (as opposed to direct video feeds) may provide an ideal balance between the limited information offered via audio communication and the problems that seem inherent to videoconferences. In most current CVE implementations, interactants have the opportunity to use two perceptual channels: audition and vision. However, unlike a videoconference, a CVE operating system can be designed to render a carefully chosen subset of interactants' nonverbal behaviors, filter or amplify that subset of behaviors, or even render nonverbal behaviors that interactants may not have performed.

Transformed social interaction (TSI) involves novel techniques that permit changing the nature of social interaction (either positively or negatively) by providing system designers with methods to enhance or degrade interpersonal communication. Tracking nonverbal signals (e.g., eye gaze, facial gestures, body gestures) and rendering them via avatars allows the nonverbal signals transmitted by one interactant to be strategically decoupled from those received (i.e., rendered) by another. For example, eye gaze directed from A to B can be transformed without A's knowledge such that B experiences the opposite: gaze aversion. The idea of decoupling rendered behaviors from actual ones is not new (see the discussion of truthfulness in Benford, Bowers, Fahlen, Greenhalgh, & Snowdon, 1995, as well as Loomis, Blascovich, & Beall, 1999). Here we explore this strategic decoupling. TSI can be applied to some, all, or none of the members of a CVE.
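The decoupling described above can be pictured as a per-viewer rendering step that sits between the tracker and each remote display. The following is a minimal, hypothetical sketch, not the authors' system: the `GazeSample` record, the `render_gaze_for` function, and the convention that `None` means "looking at nobody" are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass(frozen=True)
class GazeSample:
    """One tracked gaze sample: who is looking, and at whom (None = nobody)."""
    source: str
    target: Optional[str]


def render_gaze_for(viewer: str, actual: GazeSample, avert_for: Set[str]) -> Optional[str]:
    """Return the gaze target that `viewer` sees on the avatar of `actual.source`.

    The tracked (transmitted) sample is never modified; only the copy rendered
    for this viewer changes. If the source is really looking at a viewer listed
    in `avert_for`, that viewer is shown gaze aversion (None) instead.
    """
    if actual.target == viewer and viewer in avert_for:
        return None  # render aversion in place of the real mutual gaze
    return actual.target  # everyone else sees the veridical gaze


# A looks at B; B is shown aversion, while a third party C still sees A looking at B.
sample = GazeSample(source="A", target="B")
print(render_gaze_for("B", sample, avert_for={"B"}))  # None
print(render_gaze_for("C", sample, avert_for={"B"}))  # B
```

Because the transform is applied per viewer, a single tracked sample can yield different rendered behavior for different interactants, which is the property the later examples rely on.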

Distorting the veridicality of communication signals certainly raises ethical questions. We do not advocate the unconstrained use of TSI. However, we do believe that as CVEs become widespread, decoupling rendered behavior from actual behavior is inevitable. Indeed, current users of chat rooms and networked video games frequently represent themselves nonveridically (Yee, 2002). Consequently, the ethical implications of TSI warrant serious consideration by anyone who interacts via CVEs. At the very least, CVE system designers should anticipate and try to obviate misuse. Examining TSI now as a basic research question will increase the probability that we can ethically use and manage CVEs in the future.

The remainder of this paper is divided into three sections. First, we review some of the ideas and current implementations of CVEs, nonverbal behavior tracking technology, and the visual nonverbal behaviors in interaction. Second, we provide concrete examples of TSI and discuss possible implications for conversation. Finally, we conclude by discussing some of the ethical implications of TSI, providing pilot data from a study in which participants attempted to detect TSI, and pointing to future directions for research.

2 Nonverbal Behavior and CVEs

Social scientists have long understood that social interaction involves communication of both verbal and nonverbal signals. The former include spoken, written, and signed language; the latter include gaze, gestures and postures, facial expressions, touch, etc., as well as paralanguage cues such as variations in intonation and voice quality. If specific parallel signals were redundant in meaning across channels, little need would exist for multiple channels, and correspondingly little need would exist for sophisticated telecommunication technology beyond simple audio transmission.

However, signals often prove inconsistent across channels (e.g., "He's a winner" can communicate its literal meaning or the opposite, depending on tone). Furthermore, some channels appear less controllable by interactants and hence are judged more veridical (e.g., nonverbal channels communicating feelings or emotions and motivation). Also, signals directed toward specific interactants convey messages to third parties. For example, if two interactants share mutual gaze to the exclusion of a third, the message to the third person can lead to feelings of ostracism (Williams, Cheung, & Choi, 2000).

Although much research on the role of nonverbal signals in social interaction has appeared (for reviews, see Argyle, 1988; DePaulo & Friedman, 1998; Patterson, 1982), for the most part investigators have had to choose between ecological validity (i.e., a realistic setting or environment) and experimental control, forcing the sacrifice of one for the other. Ecologically realistic research has tended to involve qualitative observations. Experimental work ideally examines social behavior in the lab via strict controls over most variables, sometimes even involving confederates or imagined scenarios, but without much in the way of external validity or generalizability. CVEs promise to produce major advances in the understanding of social interaction, both dyadic and group, by allowing much more ecological validity while maintaining a high level of experimental control (Blascovich et al., 2002; Loomis et al., 1999).

Technology has long facilitated social interaction. For centuries, written correspondence has proven highly effective for communicating ideas and, to a lesser extent, feelings. The telegraph permitted more or less real-time interaction. However, the telephone constituted an enormous advance, both because it afforded real-time interaction and because it allowed communication via paralanguage cues, so important for emotional exchange. More recently, videoconferencing has permitted the communication of some visual nonverbal (NV) cues, but with little opportunity for "side-channel" communication among nonconversing group members (e.g., meaningful glances) and typically without allowing for mutual gaze among group members (Gale & Monk, 2002; Lanier, 2001; Vertegaal, 1999). Now CVEs promise to promote more effective dyadic (i.e., 2-person) and n-person interactions (Zhang & Furnas, 2002; Bailenson, Beall, & Blascovich, 2002; Slater, Sadagic, Usoh, & Schroeder, 2000; Normand et al., 1999; Leigh, DeFanti, Johnson, Brown, & Sandin, 1997; Mania & Chalmers, 1998; Schwartz et al., 1998) by sensing and rendering the visual NV signals of multiple interactants. Two approaches in this regard are (1) capturing and interpolating 2D images from multiple video cameras and recovering the 3D models, and (2) tracking gestures using a variety of sensors, including video. The interpolated images or rendered 3D models can then be displayed to each of the interactants.

In addition to making remote human interaction possible, communication technology has important scientific value in terms of facilitating the assessment of the sufficiency or adequacy of transmitted verbal and nonverbal signals. For example, the fact that telephone conversants feel an intimate connection indicates that auditory information is often adequate for personally meaningful dyadic interaction. This sense of connectedness persists despite interactants' awareness that they are actually talking to devices, which indicates that the process of social interaction via telephone is to some extent "cognitively impenetrable" (Pylyshyn, 1980). Mirror talking provides another compelling example. If a room contains a large mirror, people often find themselves conversing with each other's mirror or "virtual" image. Interestingly, no discernible loss in effectiveness of the interaction appears to occur, even though each interactant knows that he or she is not engaging in face-to-face interaction with the actual person. This "transparency" of interaction is also observed in dyadic interaction over properly designed videoconferencing systems (i.e., ones that permit mutual eye gaze) and will be true of CVE systems in the near future, even though interactants know at some level that they see only digital models of other interactants. Transparency of interaction, reflected both in interactants' experience and in the effectiveness of group performance (e.g., in collaborative decision making), speaks to the sufficiency of the verbal and nonverbal signals and also indicates that social interaction is mediated by automatic processes that are quite separate from conscious cognition (Fodor, 1983; Pylyshyn, 1980). Thus, the creation of new communication media can provide insight into human social interaction.

3 Implementations of TSI

In this section, we outline three important TSI components. Each involves a number of theoretical ideas that warrant technical development as well as evaluation via behavioral research. The categories of TSI include self representations (i.e., avatars), sensory capabilities, and contextual situation. Each category also provides researchers with powerful new tools to investigate and improve understanding of psychological processes underlying behavior (Blascovich et al., 2002; Loomis et al., 1999). Specifically, investigators can manipulate the underlying structure of social interaction using TSI by altering the operation of its individual components, thereby "reverse engineering" social interaction. In this paper, however, the focus is to explore the theoretical nature of TSI as its own basic research question and to speculate on its potential implications for communication via CVEs. While we discuss these three categories as separate entities, clearly a system that employs TSI would be most effective as some combination of the three. We keep them separate for the purpose of clarity in this paper.

We realize that not all of the necessary CVE technology may yet be available (see Kraut, Fussell, Brennan, & Siegel, 2002). Furthermore, in order to adequately study and enable transformed social interaction in collaborative virtual environments, the technology used for tracking nonverbal signals must eventually be passive and unobtrusive. Sensors and markers that are worn on the body can limit the naturalness of interaction by causing participants to focus on the technology at the expense of the interaction. Computer vision technology offers the possibility of using passive, noncontact sensing to locate, track, and model human body motion. Subsequently, pattern recognition and classification techniques can be used to recognize meaningful movements and gestures.

In the past dozen or so years, there has been significant and increasing interest in these problems within the computer vision research community (Turk & Kolsch, 2003; Black & Yacoob, 1997; Donato, Bartlett, Hager, Ekman, & Sejnowski, 1999; Feris, Hu, & Turk, 2003; Stiefelhagen, Yang, & Waibel, 1997; Viola & Jones, 2001). Motivated by various application areas, including biometrics, surveillance, multimedia indexing and retrieval, medical applications, and human-computer interaction, researchers have made significant progress in areas such as face detection, face recognition, facial expression analysis, articulated body tracking, and gesture recognition. The state of the art in these areas has not yet reached the point of fully supporting CVEs, as many of these systems tend to be slow and lack robustness in real-world environments (with typical changes in lighting, clothing, etc.).

But the progress is promising, and we expect to see an increased utility of these technologies to track and model nonverbal behaviors in order to transmit and transform them within the context of CVEs. We believe that each of the TSI implementations discussed in the current work is foreseeable, perhaps even in the near future. For the purposes of the current discussion, details of the specific CVE implementation are not critical. TSI should be effective in projection-based CVEs, head-mounted display CVEs, CAVEs, or in certain types of augmented-reality CVEs.

A concrete example of a typical CVE interaction helps describe the specific types of transformations. Generally, TSI should enable interactants to communicate more effectively by providing them with more information as well as providing them (or system designers) with more control in directing their nonverbal behaviors. The latter suggests, on a more cynical note, that the people who profit most from TSI may be those who enter interactions with specific goals, for example, changing the attitudes of the other interactants (Slater, Pertaub, & Steed, 1999). In the subsequent sections, we describe an interaction with a leader and one or more community members evaluating a proposal in a CVE. However, one could just as easily substitute leader with politician, teacher, lawyer, leader, or missionary, and substitute community members with voters, students, jurors, members, or atheists. Hence, the theoretical parameters and implications of TSI have applications across many different contexts.

3.1 Transforming Self Representations

In CVEs, avatars representing interactants can bear varying degrees of photographic or anthropomorphic (Garau et al., 2003; Bailenson, Beall, Blascovich, Raimundo, & Weisbuch, 2001; Sannier & Thalmann, 1998), behavioral (Bailenson et al., 2002; Cassell, 2000; Biocca, 1997), and even dispositional resemblance to the interactants they represent. Assuming that interactants (by their own design or through the actions of system operators) have the freedom to vary both the photographic and behavioral similarity of their avatar to themselves, a number of subtle but potentially drastic (in terms of outcomes of CVE interactions) transformations can occur.

In many instances, similarity breeds attraction (Byrne, 1971). We know that people treat avatars that look like themselves more intimately than avatars that look like others, as indicated by invasion of their personal space and willingness to perform embarrassing acts in front of them, and by how attractive and likable they believe the avatars to be (Bailenson, Blascovich, Beall, & Guadagno, 2004; Bailenson et al., 2001). Given this special relationship, a CVE interactant may use this principle to her advantage. Consider the situation in which a leader and a community member are negotiating via a CVE. A particularly devious leader can represent herself by incorporating characteristics of the member's representation. By making herself appear more similar to the member, the leader can become substantially more persuasive (Chaiken, 1979; Simons, 1976). Indeed, a leader would be able to adjust the structural or textural similarity of her own avatar idiosyncratically for each member of her audience.

This similarity could be achieved in various ways, employing any of a number of techniques that parametrically vary the similarity of computer-generated models via 2D and 3D morphing (Blanz & Vetter, 1999; Busey, 1998; DeCarlo, Metaxas, & Stone, 1998). The leader could be represented as a kind of hybrid, maintaining some percentage of her original facial structure and texture but also incorporating percentages of the member's structure and texture. Alternatively, the leader could be represented completely veridically to her facial structure, but for a few frames per second could replace her own head with the head of the member. Priming familiarity with limited exposure to human faces has proven to be effective with 2D images (Zajonc, 1971). Finally, consider the situation in which the leader is interacting via CVE with two members. The leader can be differentially represented to both members simultaneously, such that each member sees a different hybrid leader avatar incorporating aspects of that member. In other words, the leader does not need a consistent representation across interactants, because the CVE operator is free to render different leader avatars to each member.
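The percentage-based hybrid described above amounts to interpolating between two face models. A minimal sketch, assuming both faces are already in vertex-to-vertex correspondence (as in morphable-model approaches) and are stored as lists of 3D points; `blend_faces`, `per_member_avatars`, and the 0.25 weight are hypothetical choices, and the same interpolation could in principle be applied to texture values.

```python
from typing import Dict, List, Tuple

Vertex = Tuple[float, float, float]


def blend_faces(own: List[Vertex], other: List[Vertex], weight: float) -> List[Vertex]:
    """Linearly morph two faces that share vertex correspondence.

    weight = 0.0 keeps `own` unchanged; weight = 1.0 yields `other`.
    """
    if len(own) != len(other):
        raise ValueError("faces must have the same number of corresponding vertices")
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return [
        tuple((1.0 - weight) * a + weight * b for a, b in zip(v_own, v_other))
        for v_own, v_other in zip(own, other)
    ]


def per_member_avatars(leader: List[Vertex],
                       members: Dict[str, List[Vertex]],
                       weight: float) -> Dict[str, List[Vertex]]:
    """Build a different hybrid leader for each member, blended with that member's face."""
    return {name: blend_faces(leader, face, weight) for name, face in members.items()}


leader_face = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)]
members = {"B": [(1.0, 2.0, 4.0), (1.0, 1.0, 1.0)],
           "C": [(-4.0, 0.0, 0.0), (1.0, 1.0, 1.0)]}
avatars = per_member_avatars(leader_face, members, weight=0.25)
print(avatars["B"][0])  # (0.25, 0.5, 1.0)
print(avatars["C"][0])  # (-1.0, 0.0, 0.0)
```

A low weight corresponds to the subtle blending the text describes, and the per-member dictionary reflects the point that the leader need not have one consistent representation across interactants.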

Incorporating the self-identity of other interactants can also occur via behavioral characteristics. Psychological research has demonstrated that when an experimenter subtly mimics experimental participants (e.g., leans in the same direction as they do, crosses her legs when the participants do), participants subsequently report that they liked the experimenter more and that the conversation flowed more smoothly (Chartrand & Bargh, 1999). This "chameleon effect" could be extremely effective in CVEs. The leader (or the system operator) can use algorithms to detect motions of the other interactants at varying levels of detail and coordinate the animations of her avatar to be a blended combination of her own and those of the others.

Consider a CVE interaction consisting of a leader and two members. In the course of this interaction, patterns of nonverbal behaviors will emerge, and statistics based on a running tabulation can be automatically collected via CVE technology. In other words, if there is a certain rate of head nodding exhibited by person A and another rate exhibited by person B, the leader's head can be made to nod in a way consistent with the statistics (e.g., an average or median). Alternatively, the leader's avatar can simply mimic each interactant individually and render those particular movements only to each corresponding interactant.
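The running tabulation and the two rendering options above (a pooled statistic versus per-viewer mimicry) can be sketched as follows. This assumes the tracker already emits discrete nod events per interactant and uses nods per minute as the rate; `NodTabulator` and its methods are hypothetical names, not part of any real CVE toolkit.

```python
from statistics import mean, median
from typing import Callable, Dict, Iterable, List


class NodTabulator:
    """Running tally of head nods per interactant (hypothetical tracker events)."""

    def __init__(self, interactants: Iterable[str]) -> None:
        self.counts: Dict[str, int] = {name: 0 for name in interactants}
        self.elapsed_min = 0.0

    def advance(self, minutes: float) -> None:
        """Advance the shared clock of the interaction."""
        self.elapsed_min += minutes

    def record_nod(self, name: str) -> None:
        self.counts[name] += 1

    def rates(self) -> Dict[str, float]:
        """Nods per minute for each interactant so far."""
        if self.elapsed_min == 0:
            return {name: 0.0 for name in self.counts}
        return {name: n / self.elapsed_min for name, n in self.counts.items()}

    def pooled_rate(self, summary: Callable[[List[float]], float] = median) -> float:
        """One nod rate for the leader's avatar, rendered identically to everyone."""
        return summary(list(self.rates().values()))

    def mimic_rate(self, viewer: str) -> float:
        """Alternative: mirror each viewer's own rate back only to that viewer."""
        return self.rates()[viewer]


tab = NodTabulator(["A", "B", "C"])
tab.advance(2.0)
for name, nods in {"A": 8, "B": 2, "C": 2}.items():
    for _ in range(nods):
        tab.record_nod(name)
print(tab.rates())            # {'A': 4.0, 'B': 1.0, 'C': 1.0}
print(tab.pooled_rate())      # 1.0
print(tab.pooled_rate(mean))  # 2.0
print(tab.mimic_rate("A"))    # 4.0
```

The choice between median and mean matters when one interactant is unusually expressive: here A's frequent nodding pulls the mean up but leaves the median at the rate of the other two.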

The leader could also morph her representation with that of an unrepresented party, someone not present in the CVE but previously known to possess qualities that inspire certain reactions. Depending on the context, for example, the leader can morph a percentage of famous politicians, historical figures, or even pop stars into her avatar. This feature blending can be explicit and blatant (e.g., the leader looks just like an expert or a religious figure) or more implicit and subterranean (e.g., the leader incorporates subtle features such as cheekbones and hairstyle). Alternatively, the leader can morph herself with a person who may not be famous but with whom the member maintains a deep trust (Gibson, 1984).

A second form of avatar transformation arises from the ability to selectively decouple and reconstruct rendered behavior in CVEs. In other words, not only can interactants render nonverbal behaviors different from the nonverbal behaviors that they actually perform, but, similarly to the discussion above, they can render those behaviors idiosyncratically for each of the other interactants.

Consider what we term Non-Zero-Sum-Mutual-Gaze (NZSMG). Ordinary mutual gaze occurs when individuals look at one another's eyes during discourse. In face-to-face conversation, mutual gaze is zero-sum. In other words, if interactant A maintains eye contact with interactant B for 70 percent of the time, it is not possible for A to maintain eye contact with interactant C for more than 30 percent of the time. However, interaction in CVEs is not bound by this constraint. With digital avatars, A can be made to appear to maintain mutual gaze with both B and C for a majority of the conversation.
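The zero-sum constraint, and the way per-viewer rendering escapes it, can be made concrete with a little arithmetic on frame-by-frame gaze targets. In this hypothetical sketch (the functions, the frame-based timing, and the every-other-frame augmentation rule are all assumptions, not the gaze algorithm used in the authors' studies), A's real gaze is split 70/30, yet each viewer's private stream shows A looking at that viewer for most frames.

```python
from typing import List, Optional, Sequence


def gaze_share(stream: Sequence[Optional[str]], viewer: str) -> float:
    """Fraction of frames in which `stream` shows gaze directed at `viewer`."""
    return sum(1 for target in stream if target == viewer) / len(stream)


def nzsmg_stream(actual: Sequence[Optional[str]], viewer: str, every: int) -> List[Optional[str]]:
    """Gaze stream rendered for one viewer under a hypothetical NZSMG rule:
    keep all real mutual gaze with this viewer and, in addition, show gaze
    at this viewer on every `every`-th frame."""
    return [viewer if (target == viewer or i % every == 0) else target
            for i, target in enumerate(actual)]


# Ten frames of A's real gaze: 70% at B, 30% at C (face to face, the shares sum to 1).
actual = ["B"] * 7 + ["C"] * 3
print(gaze_share(actual, "B"), gaze_share(actual, "C"))  # 0.7 0.3

# In a CVE, B and C each receive their own rendered stream of A's gaze.
share_b = gaze_share(nzsmg_stream(actual, "B", every=2), "B")
share_c = gaze_share(nzsmg_stream(actual, "C", every=2), "C")
print(share_b, share_c)  # 0.8 0.7
```

Both rendered shares exceed one half, so each viewer experiences A as holding mutual gaze for a majority of the interaction even though the shares sum to more than 1.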

Gaze is one of the most thoroughly studied nonverbal gestures in research on social interaction (Rutter, 1984; Kleinke, 1986; Kendon, 1977). Direct eye gaze can provide cues for intimacy, agreement, and interest (Argyle, 1988). Furthermore, gaze can enhance learning during instruction as well as memory for information (Fry & Smith, 1975; Sherwood, 1987). The advantage of using CVEs is that the normal nonverbal behaviors of interactants can be augmented via NZSMG. Moreover, the interactants in a CVE can be either unaware of this transformation (i.e., implicit NZSMG) or aware of it (i.e., explicit NZSMG), as Figure 1 demonstrates. Preliminary work studying implicit NZSMG has demonstrated that interactants are not aware of the decoupling from actual behavior. Furthermore, the interactants respond to the artificial gaze as if it were actual gaze (Beall, Bailenson, Loomis, Blascovich, & Rex, 2003). This method may prove to be most effective during distance learning in educational CVEs (Morgan, Kriz, Howard, das Neves, & Kelso, 2001), in which the instructor uses her augmented gaze as a tool to keep the students more engaged.

Decoupling can also be used to achieve the opposite effect. Consider the situation where the leader wants to scrutinize the nonverbal behaviors of member A but does not want the member to feel uncomfortable from her unwavering gaze. The leader can render herself looking at her shoes, or perhaps at member B in the CVE, while in reality she is watching member A's every move.

In order for such a system to be effective, there must be a convincing algorithm to drive the autonomous eye gaze. In other words, if the leader wants the freedom to employ NZSMG or to wander around the CVE scrutinizing different aspects of the conversation undetected, she (by her own device or assisted by the system operator) must maintain the illusion that her avatar is exhibiting the typical and appropriate nonverbal gestures. There are a number of ways to achieve this. The first is some type of artificial intelligence algorithm that approximates appropriate gestures of the leader's avatar by monitoring the gestures and speech of the other interactants. While there have been significant advances in this regard (Cassell, 2000), the ability of an algorithm to process natural language as well as generate believable responses may still be many years off. A more likely method for achieving this goal would be to use actual humans instead of AI algorithms. In this scenario, the leader employs one or more nonverbal "cyranoids" (Milgram, 1992) to augment the nonverbal behaviors presented to each individual member. To do so, the leader solicits the help of several assistants, each of whose job is to provide the nonverbal behaviors targeted toward a particular member. See Figure 2.

In this many-to-many "Wizard of Oz" implementation, each member is presented with a unified leader who is rendered privately to her. This private representation would be a melding of the actual leader and one of her assistants, so that when the leader's attention was diverted away from that member for long periods of time, the assistant could step in and help maintain a believable interaction by seamlessly serving as the leader's proxy. The leader herself would then act as a conductor, overseeing all the interactions yet being free to focus her attention on individual members when she so desires. In addition, the leader is free to wander about the digital space, consult her notes, take a rest, or conduct a sidebar meeting with another person. Because her avatar is partially cyranic, however, it can continue to exhibit the appropriate nonverbal behaviors to each member all the while. Furthermore, having a number of assistants whose sole focus is to respond with appropriate nonverbal gestures to each of the interactants in the CVE should maximize the members' involvement or sense of presence in the CVE. For important meetings, seminars, or presentations conducted via CVEs, individual interactants may want to use a number of assistants as a core presentation team.

Figure 1. Internal belief states from implicit NZSMG (left) and explicit NZSMG (right).
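One way to picture the arrangement in Figure 2 is as a routing table that decides, for each member, whose voice and whose nonverbal stream drive the leader's avatar. The sketch below is a hypothetical illustration: `CyranoidRouter`, the member and assistant identifiers, and the rule that the leader's own gestures take over for whichever member she is currently attending to are assumptions layered on the description above.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class CyranoidRouter:
    """Hypothetical routing table for a many-to-many 'Wizard of Oz' setup."""
    leader: str
    gesturer_for: Dict[str, str]        # member -> dedicated assistant
    leader_focus: Optional[str] = None  # member the leader is attending to, if any

    def sources_for(self, member: str) -> Tuple[str, str]:
        """Return (voice source, nonverbal source) for the leader avatar shown to `member`."""
        voice = self.leader  # every member hears the leader's actual speech
        if self.leader_focus == member:
            nonverbal = self.leader  # the leader drives her own avatar for this member
        else:
            nonverbal = self.gesturer_for[member]  # the assistant acts as proxy
        return voice, nonverbal


router = CyranoidRouter(leader="L", gesturer_for={"m1": "g1", "m2": "g2", "m3": "g3"})
router.leader_focus = "m2"
for m in ("m1", "m2", "m3"):
    print(m, router.sources_for(m))
# m1 ('L', 'g1')
# m2 ('L', 'L')
# m3 ('L', 'g3')
```

Keeping voice and nonverbal sources as separate fields mirrors the figure's distinction between the leader's shared verbal channel and each member's dedicated gesturer.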

3.2 Transforming Sensory Capabilities

Interactants can be assisted by technology that takes advantage of the ability of CVEs to keep precise running tabs of certain types of behaviors and then display summaries of those behaviors exclusively to individual interactants. For example, consider an educational CVE in which an instructor wants to ensure that she is directing her nonverbal behaviors in a desired fashion. Such an instructor may want to monitor her mutual gaze to ensure that she is not looking at any one student more than others during a presentation. The tracking equipment used to render the scene can keep a running total of the amount of time the instructor gazed at each individual student. The CVE can render a display of this gaze meter as well as use visual or auditory alerts to inform the instructor of disproportionate applications of gaze.
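A gaze meter of the kind described above needs little more than a running total per student and a rule for what counts as disproportionate. In this minimal sketch, `GazeMeter` is hypothetical, gaze targets are assumed to arrive from the tracker as student identifiers paired with a duration, and the threshold of 1.5 times an even share is an arbitrary illustrative choice.

```python
from typing import Dict, Iterable, List, Optional


class GazeMeter:
    """Hypothetical running tally of how long an instructor has gazed at each student."""

    def __init__(self, students: Iterable[str], tolerance: float = 1.5) -> None:
        self.seconds: Dict[str, float] = {s: 0.0 for s in students}
        self.tolerance = tolerance  # alert when a total exceeds tolerance x an even share

    def add(self, target: Optional[str], dt: float) -> None:
        """Credit `dt` seconds of gaze to `target`; gaze at non-students is ignored."""
        if target in self.seconds:
            self.seconds[target] += dt

    def overattended(self) -> List[str]:
        """Students receiving disproportionate gaze, e.g. to trigger an alert."""
        total = sum(self.seconds.values())
        if total == 0:
            return []
        even_share = total / len(self.seconds)
        return sorted(s for s, t in self.seconds.items() if t > self.tolerance * even_share)


meter = GazeMeter(["ana", "ben", "cy"])
for target, dt in [("ana", 60.0), ("ben", 20.0), ("cy", 10.0), (None, 5.0)]:
    meter.add(target, dt)
print(meter.overattended())  # ['ana']
```

Here the even share is 30 seconds, so any student above 45 seconds is flagged; a real system would likely use a sliding time window rather than totals for the whole session.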

Furthermore, interactants can use the tracking data summaries to learn more about the attitudes of the others. Nonverbal gestures are often correlates of specific mental states (Ekman, 1978; Zajonc, Murphy, & Inglehart, 1989). For example, in general, we nod when we agree, smile when we are pleased, tilt our heads when we are confused, and look at something in which we are interested. Interactants will be able to tailor their CVE systems to keep track of nonverbal behaviors with the goal of helping them infer the mental states of the other interactants. For example, a teacher will be able to gauge the percentage of students exhibiting nonverbal behaviors that suggest confusion or a failure to understand a point in a lesson. Similarly, a leader could determine who in a room full of members is responding most positively to her behavior. Tabulating and assessing the nonverbal behaviors of others intuitively is certainly something that humans do constantly in face-to-face interactions. With CVEs, interactants will be able to tabulate these behaviors with greater precision. Interactants can use the objective tabulations from the tracking data to augment their normal intuitions about the gestures occurring in the interaction.
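The teacher and leader examples above reduce to simple aggregates over recent tracked cues. A minimal sketch under loud assumptions: the cue labels, the cue-to-state table (lifted from the nod, smile, and head-tilt examples in the text), and the idea that a tracker has already classified gestures into such labels are all hypothetical, and the output is meant to augment, not replace, the observer's intuition.

```python
from typing import Dict, List

# Illustrative cue-to-state mapping taken from the examples in the text; such
# mappings are heuristic, and real nonverbal cues are far more ambiguous.
CUE_TO_STATE = {"nod": "agree", "smile": "pleased", "head_tilt": "confused"}
POSITIVE = {"agree", "pleased"}


def share_showing(state: str, cues_by_person: Dict[str, List[str]]) -> float:
    """Fraction of people whose recent cues include at least one mapped to `state`."""
    if not cues_by_person:
        return 0.0
    hits = sum(1 for cues in cues_by_person.values()
               if any(CUE_TO_STATE.get(c) == state for c in cues))
    return hits / len(cues_by_person)


def rank_by_positive(cues_by_person: Dict[str, List[str]]) -> List[str]:
    """People ordered by how many positive cues they showed (ties broken by name)."""
    def score(name: str) -> int:
        return sum(1 for c in cues_by_person[name] if CUE_TO_STATE.get(c) in POSITIVE)
    return sorted(cues_by_person, key=lambda name: (-score(name), name))


window = {"ana": ["nod"], "ben": ["head_tilt", "nod"], "cy": ["smile", "nod"], "di": ["head_tilt"]}
print(share_showing("confused", window))  # 0.5
print(rank_by_positive(window))           # ['cy', 'ana', 'ben', 'di']
```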

Another transformation involves filtering or degrading certain signals or nonverbal behaviors. Some visual nonverbal behaviors tend to distract interactants. Using filtering algorithms, interactants can prevent counterproductive distractions in a number of ways. For example, consider the situation in which a speaker in a CVE taps her pen rapidly as she speaks. In face-to-face meetings, this type of behavior can distract interactants. Using a CVE, this type of behavior can be filtered in two ways. First, the speaker can filter the behavior on the transmitting end. If people know that they have difficulty suppressing certain nonverbal behaviors that tend to be perceived in a negative manner, such as a nervous tic, they can activate a filter that prevents the behavior from being rendered. Similarly, in certain situations a CVE interactant may not want to render certain nonverbal behaviors. Consider the leader example. The potential member may benefit from rendering her "poker face," that is, not demonstrating any enthusiasm or disappointment via facial expressions. Consequently, the member may accrue strategic advantage during a negotiation. Second, interactants can filter behaviors on the receiving end. If a speaker's hand motions are distracting, then a listener can simply choose not to render that interactant's hand movements.

Figure 2. A depiction of cyranoids. On the top row are three nonrendered gesturers. Each member on the bottom hears the leader's actual verbal behaviors (dashed lines). However, each member views the nonverbal behavior of her dedicated gesturer rendered onto the avatar of the leader (unbroken lines).
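The two filtering points described above, one on the transmitting end and one on the receiving end, can be sketched as set-based masks over a frame of tracked channels. The channel names, the `Frame` dictionary format, and both function names are assumptions for illustration.

```python
from typing import Any, Dict, Set

Frame = Dict[str, Any]  # tracked channel name -> latest value (hypothetical schema)


def transmit_filter(frame: Frame, suppressed: Set[str]) -> Frame:
    """Sender side: drop channels the sender has chosen never to render
    (e.g. a nervous tic, or facial expression for a 'poker face')."""
    return {ch: v for ch, v in frame.items() if ch not in suppressed}


def receive_filter(frame: Frame, sender: str, muted: Dict[str, Set[str]]) -> Frame:
    """Receiver side: drop channels this listener has muted for this particular sender."""
    hidden = muted.get(sender, set())
    return {ch: v for ch, v in frame.items() if ch not in hidden}


tracked = {"head": "nod", "face": "frown", "right_hand": "pen_tap"}
sent = transmit_filter(tracked, suppressed={"face"})                  # speaker hides her expression
shown = receive_filter(sent, "speaker", {"speaker": {"right_hand"}})  # listener hides the tapping
print(sent)   # {'head': 'nod', 'right_hand': 'pen_tap'}
print(shown)  # {'head': 'nod'}
```

What the avatar does with a dropped channel (for example, holding a neutral pose) is a further design decision that this sketch leaves open.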

Another example of transforming sensory capabilitiesis producing a visual indicator regarding where eachinteractantrsquos attention currently lies as revealed by their

eye direction (Velichkovsky 1995) We have explored atechnique that involves rendering each personrsquos viewfrustrum to indicate the field of view as Figure 3 illus-trates In this example the wire frame frustrums spot-light the 3D space visible to each person This featurecolor coded for each person may be especially helpfulto teachers in a distance learning CVE who could usesuch information to see where students are focusingtheir visual attention without having to look directly atthe studentsrsquo eyes
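A view frustum can also answer the teacher's question directly, by testing which objects fall inside each student's field of view. The Python sketch below checks whether a point lies inside a symmetric frustum. The coordinate conventions, field-of-view angles, and near/far distances are assumed values, and head pitch and roll are ignored for brevity.

```python
import math


def in_view_frustum(viewer_pos, viewer_yaw_deg, point,
                    h_fov_deg=60.0, v_fov_deg=45.0, near=0.1, far=20.0):
    """Return True if `point` lies inside a symmetric view frustum.

    Assumed conventions (illustrative only): y is up, yaw rotates about y,
    and yaw 0 looks along +z. Head pitch and roll are ignored.
    """
    yaw = math.radians(viewer_yaw_deg)
    dx, dy, dz = (p - v for p, v in zip(point, viewer_pos))
    forward = (math.sin(yaw), 0.0, math.cos(yaw))
    across = (math.cos(yaw), 0.0, -math.sin(yaw))  # horizontal, perpendicular
    depth = dx * forward[0] + dz * forward[2]      # distance along gaze axis
    side = dx * across[0] + dz * across[2]         # horizontal offset from axis
    if not (near <= depth <= far):
        return False
    half_w = depth * math.tan(math.radians(h_fov_deg) / 2)
    half_h = depth * math.tan(math.radians(v_fov_deg) / 2)
    return abs(side) <= half_w and abs(dy) <= half_h
```

Running this test for every student and every object of interest would give the teacher a list of what each student can currently see, which the color-coded wire frames present visually.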

Figure 3. View frustums marking the field of view of interactants.

Bailenson et al 435

There are a number of similar tools (i.e., specific objects rendered only to particular interactants) that can assist interactants in a CVE. For example, in our NZSMG studies, an experimenter enters a CVE and attempts to persuade other interactants regarding a certain topic (Beall et al., 2003). In those interactions, we render the interactants' names over their heads on floating billboards for the experimenter to read. In this manner, the experimenter can refer to people by name more easily. There are many other ways to use these floating billboards to assist interactants, for example, reminders about the interactant's preferences or personality (e.g., "doesn't respond well to prolonged mutual gaze").
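Objects rendered only to particular interactants, such as the name billboards above, amount to a per-viewer visibility filter over the scene. A minimal Python sketch, with hypothetical object and interactant names:

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class SceneObject:
    name: str
    # None means "shared": rendered to everyone. Otherwise only these
    # viewers' displays draw the object.
    visible_to: Optional[FrozenSet[str]] = None


def objects_for_viewer(scene, viewer):
    """Select the objects a particular interactant's display should draw."""
    return [obj for obj in scene
            if obj.visible_to is None or viewer in obj.visible_to]


scene = [
    SceneObject("table"),
    SceneObject("billboard: Ana", frozenset({"experimenter"})),
]
```

Here the experimenter sees the billboard naming a participant, while that participant sees only the shared table.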

One of the most useful forms of transforming sensory capabilities may be to enlist one or more human consultants who are rendered to only one member in a CVE (i.e., virtual ghosts). Unlike a face-to-face interaction, a CVE will enable an interactant to have informed human consultants who are free to wander around the virtual meeting space, to scrutinize the actions of other interactants, and to conduct online research and sidebar meetings, in order to provide key interactants with additional information and generally support them. For example, the leader can have her research team actually rendered beside her in the CVE. Members of her team can point out actions by potential members, suggest new strategies, and even provide real-time criticism and feedback concerning the behavior of the leader, without any of the other members having even a hint of awareness of the human consultants' presence. Alternatively, the leader herself can go into "ghost mode" and explore the virtual world with her team while her avatar remains seated and is even controlled by yet another member of her team.

3.3 Transforming the Situation

In addition to transforming their representation and sensory capabilities, CVE interactants can also use algorithms to transform their general spatial or temporal situations. In a CVE, people generally adopt a spatially coherent situational context across all remote interactants that brings everyone together in the shared space. However, there is no reason that the details and arrangements of that virtual space need to be constant for all the interactants in the CVE. Consider the situation for three interactants. Interactant A may choose to form an isosceles triangle with the other two, while both interactants B and C may choose to form equilateral triangles. Interactant A may even choose to flip the locations of B and C. In this scenario, the CVE operating system can preserve the intended eye gaze direction by transforming the amplitudes or direction of head and eye movements in a prescribed manner. While this is a somewhat simple example, with as many as four interactants it is straightforward to design spatial transformations that allow the intended eye and head gaze cues to remain intact across all interactants. While such discordance may eventually cause the quality and smoothness of the interaction to suffer, there are a number of ways that transforming the situation can assist individual interactants.
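One possible way to keep gaze cues intact when each interactant arranges the room differently is to transmit gaze as a target (who is being looked at) rather than as a raw angle, and to recompute the rendered head yaw in each viewer's own layout. The Python sketch below does this on a 2D floor plane. The coordinate convention, the layouts, and the assumption that the gaze target has already been identified are ours, not the paper's.

```python
import math


def yaw_towards(src, dst):
    """Yaw in degrees (0 = +z axis, positive toward +x) from src to dst,
    with positions given as (x, z) on the floor plane."""
    return math.degrees(math.atan2(dst[0] - src[0], dst[1] - src[1]))


def retarget_gaze(layout, gazer, target):
    """Yaw at which `gazer`'s avatar should be drawn in one viewer's layout.

    layout: dict mapping interactant id -> (x, z) seat position, as this
            particular viewer has arranged the room. Because viewers may
            arrange seats differently, the same gaze target can yield a
            different rendered yaw for each viewer.
    """
    return yaw_towards(layout[gazer], layout[target])


shared = {"A": (0.0, 0.0), "B": (-1.0, 2.0), "C": (1.0, 2.0)}
a_view = {"A": (0.0, 0.0), "B": (1.0, 2.0), "C": (-1.0, 2.0)}  # B and C flipped
```

If A flips the seats of B and C, B's glance at C is drawn at 90 degrees in the shared layout but at -90 degrees in A's layout, so in both views B is still seen looking at C.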

One such transformation involves multilateral perspectives. In a normal conversation, each interactant has a unique and privileged perspective. That perspective is a combination of her sensory input (e.g., visual and acoustic fields of view) and internal beliefs about the interaction. In normal face-to-face interactions, people continually use sensory input to update and adjust their internal beliefs (Kendon, 1977). Interactants in a CVE will possess a completely new mechanism to adjust and update internal beliefs: a person's viewpoint can be multilateral as opposed to unilateral (normal). In other words, in a real-time conversation, interactant A can take the viewpoint of interactant B and perceive herself as she performs various verbal and nonverbal gestures during the interaction. In this manner, she can acquire invaluable sensory information pertaining to the interaction and update her internal beliefs concerning the interaction in ways not possible without the CVE.

Consequently, interactants in educational and persuasive interactions may be able to improve performance, because seeing oneself through the eyes of another may allow one to develop a more informed set of internal beliefs about others (Baumeister, 1998). Furthermore, being able to experience an interaction through someone else's eyes may reinforce the fact that one is indeed copresent in the CVE (e.g., Durlach & Slater, 2000). Finally, utilizing multilateral perspectives may assist students in distance learning CVEs in terms of training transfer effects (Rickel & Johnson, 2000) that might occur when an interactant who has been trained in multilateral perspective taking performs similar group tasks in nonmediated situations.

A second situational transformation involves partially recording the interaction and adjusting temporal properties or sequences in real time. Similar to commercial products sold for digitally recording and playing back broadcast television, interactants in a CVE should be able to accelerate and decelerate the perceived flow of time during the mediated interaction. Consider the following situation. A student in a distance learning CVE does not understand an example that the instructor provides. The student can "rewind" the recorded interaction, go back to the beginning of the confusing example, and then play back the example. Once the student has understood the confusing example, she can turn up the rate of playback (e.g., watch the sequence at 2X speed), and eventually she can catch up to the instructor again. By slowing down the rendered flow of time or speeding it up, the interactant can focus differentially on particular topics and can review the same scene from different points of view without missing the remainder of the interaction. Of course, doing so will result in costs to that interactant's contribution to the CVE in terms of interactivity (i.e., what does her avatar do while she rewinds?). Consequently, the disruption of the temporal sequence will necessarily be coupled with some kind of an avatar autopilot.
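The catch-up step follows from simple arithmetic. If the student lags the live interaction by D seconds and plays the recording at rate r > 1, the live interaction advances 1 second per second while her playback advances r seconds, so the lag shrinks by (r - 1) seconds each second and closes after D/(r - 1) seconds of wall-clock time. The Python sketch below assumes a constant playback rate and ignores seek delays.

```python
def catch_up_seconds(lag_s, playback_rate):
    """Wall-clock seconds needed to close a playback lag of `lag_s` seconds.

    The live interaction advances 1 s per second while playback at rate r
    advances r s per second, so the lag shrinks by (r - 1) s each second.
    """
    if playback_rate <= 1.0:
        raise ValueError("playback must be faster than real time to catch up")
    return lag_s / (playback_rate - 1.0)
```

A student who finishes re-watching an example 30 seconds behind the instructor and then plays at 2X rejoins the live interaction after another 30 seconds (60 seconds at 1.5X), during which her avatar would need the autopilot mentioned above.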

Changing the rate of time in a CVE brings up another interesting transformation. Traditionally, CVEs are roughly defined as "geographically separated interactants" interacting over some kind of computer-mediated network in a shared environment. However, by combining some of the concepts discussed in previous sections, it may be possible to include in the definition of a CVE "temporally separated interactants" in a shared environment. Consider a videoconference of a business meeting. Oftentimes, interested parties who cannot attend the meeting will later review a videotape of it. In a CVE, the temporally absent member has an option to involve herself more deeply in the interaction. Specifically, she can situate her avatar in a specific place in the CVE's seating arrangement and use an autopilot to give her representation rudimentary nonverbal behaviors. Furthermore, the absent member can program her avatar to perform simple interactive tasks, such as prerecorded introductions, answers to certain questions about the CVE topic, or, perhaps more realistically for the near term, playback of a recorded performance. The CVE interaction can then proceed in real time, with the temporally absent member's avatar approximating the types of behaviors that she would do and say. As a result, temporally present members would actually direct pieces of the conversation toward the absent member, as well as transmit nonverbal gestures toward her. Later, instead of just reviewing the recording, the temporally absent member can take her place in the CVE and actually feel present in the dialogue, receiving appropriate nonverbal behaviors and maximizing the degree of copresence. Moreover, the members of the CVE who were present at the scheduled time can program their avatars during the replay of the interaction to respond to any post hoc questions that the absent member might have. In this way, the degree of interactivity during the replay can be increased, and perhaps at some point in the not-too-distant future the line between real-time and non-real-time interactions will become interestingly blurred.

4 Implications of TSI and Research Directions

For better or for worse, TSI implemented through CVEs has great potential to change the nature of mediated interaction. The strategic decoupling of rendered behavior from actual behavior allows interactants to break many constraints that are inherent in face-to-face interaction, as well as in other forms of mediated interaction such as telephone and videophone conferences. The effects of TSI remain to be seen. Assuming that implementation of the TSI techniques is technically feasible and that using TSI implementations is conceptually workable for the interactants (both of which are substantial assumptions), one could predict a number of consequences. First, TSI may develop into a worthwhile tool that helps interactants overcome the inadequacies of communicating from remote locations. By augmenting their representational, sensory, and situational characteristics, interactants in CVEs may be able to achieve levels of interaction that actually surpass face-to-face interaction.

On the other hand, people may in fact find the use of these transformations extremely unsettling. The difference between TSI and current CVE implementations could be as drastic as the difference between email and the written letter. As this technology is developed, it is essential to examine people's responses to this new medium (e.g., Reeves & Nass, 1996) and to examine these important potential implications of TSI before the technology becomes widespread.

Along the same lines, the threat of TSI may be the very downfall of CVE interaction. In face-to-face interaction, there tends to be some degree of deception, for example, people using facial expressions to mask their emotions. Clearly, this deception has the potential to be much greater with TSI. If interactants have no faith that their perceptual experience is genuine, they may have little reason ever to enter a CVE. A complete lack of trust in the truthfulness of gestures, the one-to-one correspondence of avatars, and the temporal presence of interactants has the potential to rob the CVE of one of its greatest strengths, namely interactivity, since interactants may not know with whom or with what they are interacting, or when. Similarly, given an expectation of TSI, interactants may be constantly suspicious during interactions; this lack of trust in fellow interactants may lead to unproductive collaborations.

A solution to this breakdown may require the development of TSI detectors for interactants, based either on computer algorithms that analyze nonverbal behaviors or on actual humans who scrutinize the interaction. To examine the possibility of using human TSI detectors, we now discuss what we call the nonverbal Turing Test (NVTT).

In the popular reinterpretation of the Turing Test (Turing, 1950), a judge attempts to determine which of two players (one human, one machine) is a fellow human. In our NVTT pilot studies, experimental participants acting as judges enter a CVE with two virtual humans: one avatar whose head movements are veridical, playing back the movements of another human in real time (i.e., without TSI), and one avatar whose head movements are actually a transformation of the judge's own head movements (i.e., time-delayed and with a reduced range of motion). The judge sees the head movements of a real person on one avatar and some sort of self-mimicked movements on the other. During the interaction, only head movements are permitted (i.e., no verbal communication is allowed), and participants must devise ways to engage and test the two virtual humans through nonverbal means in order to ascertain which is human and which is a mimicker. Figure 4 illustrates this setup.

In the current initial pilot study, we manipulated three independent variables: test trial length (either 16 or 32 seconds), mimic delay (i.e., the computer agent mimics the judge's movements 1, 2, 4, or 8 seconds after they occur), and range of motion (high: pitch, yaw, and roll; or low: yaw only).
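The computer agent in this design can be sketched as a delay line over the judge's tracked head orientation. The paper specifies the delays and the two motion ranges but not the agent's internals, so the sample rate, the neutral pose held before the delay has elapsed, and modeling "yaw only" by zeroing pitch and roll are assumptions of this Python sketch.

```python
from collections import deque


class DelayedMimic:
    """Hypothetical computer agent replaying the judge's head pose after a delay.

    Assumes orientation samples arrive at a fixed rate. The "low" range of
    motion is modeled by zeroing pitch and roll so that only yaw is mimicked.
    """

    def __init__(self, delay_s, sample_rate_hz=60, yaw_only=False):
        self.lag = int(round(delay_s * sample_rate_hz))  # delay in samples
        self.yaw_only = yaw_only
        # Keep exactly the samples needed to look `lag` frames into the past.
        self.history = deque(maxlen=self.lag + 1)

    def update(self, pitch, yaw, roll):
        """Record the judge's current pose; return the pose to render on the agent."""
        self.history.append((pitch, yaw, roll))
        if len(self.history) <= self.lag:
            return (0.0, 0.0, 0.0)  # neutral pose until the delay has elapsed
        p, y, r = self.history[0]   # the sample from `lag` frames ago
        return (0.0, y, 0.0) if self.yaw_only else (p, y, r)
```

With a 1-second delay at 60 samples per second, the agent renders each head pose 60 frames after the judge produced it; the bounded `deque` discards older samples automatically.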

Figure 4. A schematic of the NVTT. The human judge is forced to determine which of the two avatars exhibiting head movements is the real human and which is the computer-generated human mimicking the judge's own movements.

Participants in this study wore head-mounted displays while a render computer tracked their head orientation; tracking, rendering, and networking latencies were all low enough to impart a compelling sense of copresence (see Bailenson et al., 2002, for detailed descriptions of the hardware and software used). Participants were instructed to sit in a virtual room with two virtual people: a human agent (i.e., a representation whose movements are controlled by a real person in another room) and a computer agent (i.e., a computer program that is designed to mimic the user's movements in some way). Participants were instructed to interact with the two other virtual people using head movements in order to determine which one was the human agent. Participants were run in groups of two, with each one acting as the human agent for the other. Each participant sat at a virtual conference table with two virtual humans (similar to that shown in Figure 3). Each participant received a random order of 32 test trials (two instances of each of the 16 conditions resulting from the crossing of the three independent variables). Forty-one undergraduates participated in this study.
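The trial structure described above (2 trial lengths × 4 mimic delays × 2 motion ranges = 16 conditions, each presented twice) can be generated as follows. The paper says only that the order was random, so a full shuffle per participant, with an optional seed for reproducibility, is an assumption of this Python sketch.

```python
import itertools
import random

TRIAL_LENGTHS_S = (16, 32)
MIMIC_DELAYS_S = (1, 2, 4, 8)
MOTION_RANGES = ("high: pitch, yaw, roll", "low: yaw only")


def trial_order(seed=None):
    """Return a random order of 32 trials: each of the 16 crossed conditions twice."""
    conditions = list(itertools.product(TRIAL_LENGTHS_S, MIMIC_DELAYS_S, MOTION_RANGES))
    trials = conditions * 2
    random.Random(seed).shuffle(trials)  # in-place shuffle with its own RNG
    return trials
```

Using a dedicated `random.Random` instance keeps each participant's order reproducible from a seed without disturbing any global random state.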

For the sake of brevity, we focus on two results in particular. First, despite the fact that we explicitly told participants that the computer agent was directly mimicking them, they performed surprisingly poorly when attempting to identify the human avatar. The overall average score was only 66% correct (SD = 10%; chance = 50%; maximum score = 100%). Moreover, of the 41 participants in the study, more than one fourth were not reliably different from chance (i.e., less than 3 SEM from 50%, between 44% and 56%). Second, as Figure 5 demonstrates, participants' scores decreased as the mimic delay increased: there was a linear trend in the logarithm of the delay variable, F(1, 32) = 8.85, p < .01. When the delay was greater than 1 second, participants had more difficulty identifying a mimicker.
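Because the mimic delays double from level to level, their logarithms are equally spaced, so a linear trend in log delay corresponds to the familiar linear contrast weights (-3, -1, 1, 3), here scaled by 1/2. The Python sketch below computes these weights and a per-participant contrast score from made-up accuracy values; it is not the authors' analysis. One common approach would be to test such scores against zero across participants, where a one-sample t test on the contrast scores is equivalent to an F(1, n - 1) test with F = t².

```python
import math

DELAYS_S = (1, 2, 4, 8)


def linear_trend_weights(levels):
    """Centered linear-contrast weights over log2 of the factor levels."""
    logs = [math.log2(x) for x in levels]
    mean = sum(logs) / len(logs)
    return [v - mean for v in logs]


def trend_score(percent_correct_by_delay, weights):
    """Contrast value for one participant; negative means accuracy falls as delay grows."""
    return sum(w * y for w, y in zip(weights, percent_correct_by_delay))


weights = linear_trend_weights(DELAYS_S)  # [-1.5, -0.5, 0.5, 1.5]
```

For a hypothetical participant scoring 80%, 70%, 60%, and 50% at delays of 1, 2, 4, and 8 seconds, the contrast score is -50, indicating declining accuracy with longer delays.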

These data are particularly striking in that we had initially predicted that participants would be able to recognize their own head movements much more easily when aware that one avatar was designed to mimic them. In this study, we explicitly told participants about the use of TSI, and they still had great difficulty detecting it. The effects of implicit TSI (i.e., TSI that is not disclosed) can only be greater. While this pilot study is extremely simple and only scratches the surface of a paradigm that examines TSI, it is still noteworthy that participants did not detect the mimicker across the board.

We are currently exploring other factors underlying the discrimination of human nonverbal behavior from computer-generated behaviors. In future studies, we will use NVTTs to study other nonverbal behaviors, such as facial gestures, eye-head gestures (pointing indications by either system), hand gestures, and interpersonal distance. We have shown that, in admittedly simple scenarios, it is possible to pass the NVTT for a percentage of our test population using TSI. We are confident that as this percentage grows in the near future, important scientific and sociological discoveries will surface along the way.

In conclusion, there are many reasons one might want to avoid TSI; these reasons range from Orwellian concerns to the fear of rendering CVEs (perhaps even the telephone) functionally useless. We are not advocates of TSI as a means to replace normal communication, nor are we staunch believers in avoiding TSI in order to preserve the natural order of communication and conversation. However, we do acknowledge that, as CVEs become more prevalent, the strategic decoupling of representation from behavior is inevitable. For that reason alone, the notion of TSI warrants considerable attention.

Acknowledgments

The authors would like to thank Robin Gilmour and Christopher Rex for helpful suggestions. Furthermore, we thank Christopher Rex and Ryan Jaeger for assistance in collecting data. This research was sponsored in part by NSF Award SBE-9873432 and in part by NSF ITR Award IIS 0205740.

References

Argyle, M. (1988). Bodily communication (2nd ed.). London: Methuen.

Bailenson, J. N., Beall, A. C., & Blascovich, J. (2002). Mutual gaze and task performance in shared virtual environments. Journal of Visualization and Computer Animation, 13, 1–8.

Bailenson, J. N., Beall, A. C., Blascovich, J., Raimmundo, R., & Weisbuch, M. (2001). Intelligent agents who wear your face: Users' reactions to the virtual self. Lecture Notes in Artificial Intelligence, 2190, 86–99.

Bailenson, J. N., Blascovich, J., Beall, A. C., & Guadagno, R. E. (submitted). Self representations in immersive virtual environments.

Figure 5. Percent correct by mimic delay in seconds. These data exclude subjects at chance performance.

Baumeister, R. F. (1998). The self. In D. T. Gilbert, S. T. Fiske, & G. Lindzey (Eds.), Handbook of social psychology (4th ed., pp. 680–740). New York: McGraw-Hill.

Beall, A. C., Bailenson, J. N., Loomis, J., Blascovich, J., & Rex, C. (2003). Non-zero-sum mutual gaze in immersive virtual environments. Proceedings of HCI International 2003.

Benford, S., Bowers, J., Fahlen, L., Greenhalgh, C., & Snowdon, D. (1995). User embodiment in collaborative virtual environments. Proceedings of CHI '95 (pp. 242–249). ACM Press.

Biocca, F. (1997). The cyborg's dilemma: Progressive embodiment in virtual environments. Journal of Computer-Mediated Communication [online], 3. Retrieved from http://www.ascusc.org/jcmc/vol3/issue2/biocca2.html

Black, M., & Yacoob, Y. (1997). Recognizing facial expressions in image sequences using local parameterized models of image motion. International Journal of Computer Vision, 25(1), 23–48.

Blanz, V., & Vetter, T. (1999). A morphable model for the synthesis of 3D faces. SIGGRAPH '99 Conference Proceedings, 187–194.

Blascovich, J., Loomis, J., Beall, A. C., Swinth, K. R., Hoyt, C., & Bailenson, J. N. (2002). Immersive virtual environment technology as a methodological tool for social psychology. Psychological Inquiry, 13, 103–124.

Busey, T. A. (1988). Physical and psychological representations of faces: Evidence from morphing. Psychological Science, 9, 476–483.

Byrne, D. (1971). The attraction paradigm. New York: Academic Press.

Cassell, J. (2000). Nudge nudge wink wink: Elements of face-to-face conversation for embodied conversational agents. In J. Cassell et al. (Eds.), Embodied conversational agents. Cambridge, MA: MIT Press.

Chaiken, S. (1979). Communicator physical attractiveness and persuasion. Journal of Personality and Social Psychology, 37, 1387–1397.

Chartrand, T. L., & Bargh, J. (1999). The chameleon effect: The perception-behavior link and social interaction. Journal of Personality & Social Psychology, 76(6), 893–910.

DeCarlo, D., Metaxas, D., & Stone, M. (1998). An anthropometric face model using variational techniques. Proceedings of SIGGRAPH '98, 67–74.

DePaulo, B. M., & Friedman, H. S. (1998). Nonverbal communication. In D. T. Gilbert, S. T. Fiske, & G. Lindzey (Eds.), The handbook of social psychology (4th ed., Vol. 2, pp. 3–40). Boston: McGraw-Hill.

Donato, G., Bartlett, M. S., Hager, J. C., Ekman, P., & Sejnowski, T. J. (1999). Classifying facial actions. IEEE Transactions on Pattern Analysis and Machine Intelligence, 21(10), 974–989.

Durlach, N., & Slater, M. (2000). Presence in shared virtual environments and virtual togetherness. Presence: Teleoperators and Virtual Environments, 9, 214–217.

Ekman, P. (1978). Facial signs: Facts, fantasies, and possibilities. In T. Sebeok (Ed.), Sight, sound, and sense. Bloomington, IN: Indiana University Press.

Fodor, J. A. (1983). The modularity of mind: An essay on faculty psychology. Cambridge, MA: MIT Press.

Fry, R., & Smith, G. F. (1975). The effects of feedback and eye contact on performance of a digit-encoding task. Journal of Social Psychology, 96, 145–146.

Gale, C., & Monk, A. F. (2002). A look is worth a thousand words: Full gaze awareness in video-mediated conversation. Discourse Processes, 33.

Garau, M., Slater, M., Vinayagamoorthy, V., Brogni, A., Steed, A., & Sasse, M. A. (2003). The impact of avatar realism and eye gaze control on perceived quality of communication in a shared immersive virtual environment. Proceedings of the SIGCHI Conference on Human Factors in Computing Systems.

Gibson, W. (1984). Neuromancer. New York: Ace Books.

Hu, C., Ferris, R., & Turk, M. (2003). Active wavelet networks for face alignment. Proceedings of the British Machine Vision Conference, Norwich, UK.

Kendon, A. (1977). Studies in the behavior of social interaction. Bloomington, IN: Indiana University.

Kleinke, C. L. (1986). Gaze and eye contact: A research review. Psychological Bulletin, 100, 78–100.

Kraut, R. E., Fussell, S. R., Brennan, S. E., & Siegel, J. (2002). Understanding effects of proximity on collaboration: Implications for technologies to support remote collaborative work. In P. Hinds & S. Kiesler (Eds.), Distributed work. Cambridge, MA: MIT Press.

Lanier, J. (2001). Virtually there. Scientific American, April 2001.

Leigh, J., DeFanti, T., Johnson, A., Brown, M., & Sandin, D. (1997). Global telemersion: Better than being there. Proceedings of ICAT '97.

Loomis, J. M., Blascovich, J. J., & Beall, A. C. (1999). Immersive virtual environments as a basic research tool in psychology. Behavior Research Methods, Instruments, and Computers, 31(4), 557–564.


Mania, K., & Chalmers, A. (1998). Proceedings of the Fourth International Conference on Virtual Systems and Multimedia (pp. 177–182). Amsterdam: IOS Press-Ohmsha.

Milgram, S. (1992). The individual in a social world: Essays and experiments (2nd ed.). New York: McGraw-Hill.

Morgan, T., Kriz, R., Howard, T., Dias Neves, F., & Kelso, J. (2001). Extending the use of collaborative virtual environments for instruction to K–12 schools. Insight, 1(1).

Normand, V., Babski, C., Benford, S., Bullock, A., Carion, S., Chrysanthou, Y., et al. (1999). The COVEN project: Exploring applicative, technical, and usage dimensions of collaborative virtual environments. Presence: Teleoperators and Virtual Environments, 8(2), 218–236.

Patterson, M. L. (1982). An arousal model of interpersonal intimacy. Psychological Review, 89, 231–249.

Pylyshyn, Z. W. (1980). Computation and cognition: Issues in the foundations of cognitive science. Behavioral & Brain Sciences, 3, 111–169.

Reeves, B., & Nass, C. (1996). The media equation: How people treat computers, television, and new media like real people and places. New York: Cambridge University Press.

Rickel, J., & Johnson, W. L. (2000). Task-oriented collaboration with embodied agents in virtual worlds. In J. Cassell, J. Sullivan, S. Prevost, & E. Churchill (Eds.), Embodied conversational agents. Cambridge, MA: MIT Press.

Rutter, D. R. (1984). Looking and seeing: The role of visual communication in social interaction. Suffolk, UK: John Wiley & Sons.

Sannier, G., & Thalmann, M. N. (1998). A user friendly texture-fitting methodology for virtual humans. Computer Graphics International '97.

Schwartz, P., Bricker, L., Campbell, B., Furness, T., Inkpen, K., Matheson, L., et al. (1998). Virtual playground: Architectures for a shared virtual world. Proceedings of the ACM Symposium on Virtual Reality Software and Technology 1998, 43–50.

Sherwood, J. V. (1987). Facilitative effects of gaze upon learning. Perceptual and Motor Skills, 64, 1275–1278.

Simons, H. (1976). Persuasion: Understanding, practice, and analysis. Reading, MA: Heath.

Slater, M., Pertaub, D., & Steed, A. (1999). Public speaking in virtual reality: Facing an audience of avatars. IEEE Computer Graphics and Applications, 19(2), 6–9.

Slater, M., Sadagic, A., Usoh, M., & Schroeder, R. (2000). Small group behavior in a virtual and real environment: A comparative study. Presence: Teleoperators and Virtual Environments, 9(1), 37–51.

Stiefelhagen, R., Yang, J., & Waibel, A. (1997). Tracking eyes and monitoring eye gaze. In M. Turk & Y. Takabayashi (Eds.), Proceedings of the Workshop on Perceptual User Interfaces.

Turing, A. (1950). Computing machinery and intelligence. Mind, 59(236).

Turk, M., & Kolsch, M. (in press). Perceptual interfaces. In G. Medioni & S. B. Kang (Eds.), Emerging topics in computer vision. Boston: Prentice Hall.

Velichkovsky, B. M. (1995). Communicating attention: Gaze position transfer in cooperative problem solving. Pragmatics and Cognition, 3(2), 199–222.

Vertegaal, R. (1999). The GAZE groupware system: Mediating joint attention in multiparty communication and collaboration. Proceedings of the CHI '99 Conference on Human Factors in Computing Systems: The CHI is the Limit, 294–301.

Viola, P., & Jones, M. (2001). Rapid object detection using a boosted cascade of simple features. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.

Wallace, D. F. (1996). Infinite jest. Boston: Little, Brown.

Williams, K., Cheung, K. T., & Choi, W. (2000). Cyberostracism: Effects of being ignored over the Internet. Journal of Personality and Social Psychology, 79, 748–762.

Yee, N. (2002). Befriending ogres and wood elves: Understanding relationship formation in MMORPGs. Retrieved from http://www.nickyee.com/hub/relationships/home.html

Zajonc, R. B. (1971). Brainwash: Familiarity breeds comfort. Psychology Today, 3(9), 60–64.

Zajonc, R. B., Murphy, S. T., & Inglehart, M. (1989). Feeling and facial efference: Implications of the vascular theory of emotion. Psychological Review, 96, 395–416.

Zhang, X., & Furnas, G. (2002). Social interactions in multiscale CVEs. Proceedings of the ACM Conference on Collaborative Virtual Environments 2002 (CVE 2002).


generated avatars to represent human interactants (as opposed to direct video feeds) may provide an ideal balance between the limited information offered via audio communication and the problems that seem inherent to videoconferences. In most current CVE implementations, interactants have the opportunity to utilize two perceptual channels: audition and vision. However, unlike a videoconference, a CVE operating system can be designed to render a carefully chosen subset of interactants' nonverbal behaviors, filter or amplify that subset of behaviors, or even render nonverbal behaviors that interactants may not have performed.

Transformed social interaction (TSI) involves novel techniques that permit changing the nature of social interaction (either positively or negatively) by providing system designers with methods to enhance or degrade interpersonal communication. Tracking nonverbal signals (e.g., eye gaze, facial gestures, body gestures) and rendering them via avatars allows for a strategic decoupling of the nonverbal signals transmitted by one interactant from those received (i.e., rendered) by another. For example, eye gaze directed from A to B can be transformed without A's knowledge such that B experiences the opposite, gaze aversion. The idea of decoupling rendered behaviors from actual ones is not new (see the discussion of truthfulness in Benford, Bowers, Fahlen, Greenhalgh, & Snowdon, 1995, as well as Loomis, Blascovich, & Beall, 1999). Here we explore this strategic decoupling. TSI can be applied to some, all, or none of the members of a CVE.
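The A-to-B example can be made concrete as a per-receiver rendering rule: the tracked gaze is transmitted once, and each receiver's display decides what to draw. In this hypothetical Python sketch, a (sender, receiver) pair listed in `avert_for` makes the receiver see the sender's gaze rotated away whenever the sender is actually looking at that receiver, while everyone else sees the veridical yaw. The data layout and the 30-degree offset are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GazeSample:
    sender: str
    target: str     # id of the interactant the sender is actually looking at
    yaw_deg: float  # sender's actual gaze yaw


def rendered_yaw(sample, receiver, avert_for=frozenset(), aversion_deg=30.0):
    """Gaze yaw that `receiver` sees on the sender's avatar.

    If the sender is looking at the receiver and the (sender, receiver) pair
    is in `avert_for`, that receiver is shown gaze aversion instead; all other
    receivers get the veridical yaw. The sender is never informed.
    """
    if sample.target == receiver and (sample.sender, receiver) in avert_for:
        return sample.yaw_deg + aversion_deg
    return sample.yaw_deg
```

Because the transformation happens only on the receiving display, A's own tracking data are unchanged, and a third interactant C still sees A looking at B.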

Distorting the veridicality of communication signals certainly raises ethical questions. We do not advocate the unconstrained use of TSI. However, we do believe that as CVEs become widespread, decoupling rendered behavior from actual behavior is inevitable. Indeed, current users of chat rooms and networked video games frequently represent themselves nonveridically (Yee, 2002). Consequently, the ethical implications of TSI warrant serious consideration by anyone who interacts via CVEs. At the very least, CVE system designers should anticipate and try to obviate misuse. Examining TSI now as a basic research question will increase the probability that we can ethically use and manage CVEs in the future.

The remainder of this paper is divided into three sections. First, we review some of the ideas and current implementations of CVEs, nonverbal behavior tracking technology, and the visual nonverbal behaviors in interaction. Second, we provide concrete examples of TSI and discuss possible implications for conversation. Finally, we conclude by discussing some of the ethical implications of TSI, provide pilot data from a study in which participants attempted to detect TSI, and point to future directions for research.

2 Nonverbal Behavior and CVEs

Social scientists have long understood that social interaction involves communication of both verbal and nonverbal signals. The former include spoken, written, and signed language; the latter include gaze, gestures and postures, facial expressions, touch, etc., as well as paralanguage cues such as variations in intonation and voice quality. If specific parallel signals were redundant in meaning across channels, little need would exist for multiple channels, and correspondingly little need would exist for sophisticated telecommunication technology beyond simple audio transmission.

However, signals often prove inconsistent across channels (e.g., "He's a winner" can communicate its literal meaning or the opposite, depending on tone). Furthermore, some channels appear less controllable by interactants and hence are judged more veridical (e.g., nonverbal channels communicating feelings, emotions, and motivation). Also, signals directed toward specific interactants convey messages to third parties. For example, if two interactants share mutual gaze to the exclusion of a third, the message to the third person can lead to feelings of ostracism (Williams, Cheung, & Choi, 2000).

Although much research on the role of nonverbal signals in social interaction has appeared (for reviews, see Argyle, 1988; DePaulo & Friedman, 1998; Patterson, 1982), for the most part investigators have had to choose between ecological validity (i.e., a realistic setting or environment) and experimental control, sacrificing one for the other. Ecologically realistic research has tended to involve qualitative observations. Experimental work ideally examines social behavior in the lab via strict controls over most variables, sometimes even involving confederates or imagined scenarios, but without much in the way of external validity or generalizability. CVEs promise to produce major advances in the understanding of social interaction, both dyadic and group, by allowing much more ecological validity while maintaining a high level of experimental control (Blascovich et al., 2002; Loomis et al., 1999).

Technology has long facilitated social interaction. For centuries, written correspondence has proven highly effective for communicating ideas and, to a lesser extent, feelings. The telegraph permitted more or less real-time interaction. However, the telephone constituted an enormous advance, both because it afforded real-time interaction and because it allowed communication via paralanguage cues, which are so important for emotional exchange. More recently, videoconferencing has permitted the communication of some visual nonverbal (NV) cues, but with little opportunity for "side-channel" communication among nonconversing group members (e.g., meaningful glances) and typically without allowing for mutual gaze among group members (Gale & Monk, 2002; Lanier, 2001; Vertegaal, 1999). Now CVEs promise to promote more effective dyadic (i.e., two-person) and n-person interactions (Zhang & Furnas, 2002; Bailenson, Beall, & Blascovich, 2002; Slater, Sadagic, Usoh, & Schroeder, 2000; Normand et al., 1999; Leigh, DeFanti, Johnson, Brown, & Sandin, 1997; Mania & Chalmers, 1998; Schwartz et al., 1998) by sensing and rendering the visual NV signals of multiple interactants. Two approaches in this regard are (1) capturing and interpolating 2D images from multiple video cameras and recovering the 3D models, and (2) tracking gestures using a variety of sensors, including video. The interpolated images or rendered 3D models can then be displayed to each of the interactants.

In addition to making remote human interaction possible, communication technology has important scientific value in terms of facilitating the assessment of the sufficiency or adequacy of transmitted verbal and nonverbal signals. For example, the fact that telephone conversants feel an intimate connection indicates that auditory information is often adequate for personally meaningful dyadic interaction. This sense of connectedness persists despite interactants’ awareness that they are

actually talking to devices, which indicates that the process of social interaction via telephone is to some extent “cognitively impenetrable” (Pylyshyn, 1980). Mirror talking provides another compelling example. If a room contains a large mirror, people often find themselves conversing with each other’s mirror or “virtual” image. Interestingly, no discernible loss in the effectiveness of the interaction appears to occur, even though each interactant knows that he or she is not engaging in face-to-face interaction with the actual person. This “transparency” of interaction is also observed in dyadic interaction over properly designed videoconferencing systems (i.e., ones that permit mutual eye gaze) and will be true of CVE systems in the near future, even though interactants know at some level that they see only digital models of other interactants. Transparency of interaction, reflected both in interactants’ experience and in the effectiveness of group performance (e.g., in collaborative decision making), speaks to the sufficiency of the verbal and nonverbal signals and also indicates that social interaction is mediated by automatic processes that are quite separate from conscious cognition (Fodor, 1983; Pylyshyn, 1980). Thus, the creation of new communication media can provide insight into human social interaction.

3 Implementations of TSI

In this section, we outline three important TSI components. Each involves a number of theoretical ideas that warrant technical development as well as evaluation via behavioral research. The categories of TSI include self-representations (i.e., avatars), sensory capabilities, and contextual situation. Each category also provides researchers with powerful new tools to investigate and improve understanding of the psychological processes underlying behavior (Blascovich et al., 2002; Loomis et al., 1999). Specifically, investigators can manipulate the underlying structure of social interaction using TSI by altering the operation of its individual components, thereby “reverse engineering” social interaction. In this paper, however, the focus is to explore the theoretical nature of TSI as its own basic research question and to speculate on its potential implications for communication via CVEs. While we discuss these three categories

430 PRESENCE VOLUME 13 NUMBER 4

as separate entities, clearly a system that employs TSI would be most effective as some combination of the three. We keep them separate for the purpose of clarity in this paper.

We realize that all of the necessary CVE technology may not yet be available (see Kraut, Fussell, Brennan, & Siegel, 2002). Furthermore, in order to adequately study and enable transformed social interaction in collaborative virtual environments, the technology used for tracking nonverbal signals must eventually be passive and unobtrusive. Sensors and markers that are worn on the body can limit the naturalness of interaction by causing participants to focus on the technology at the expense of the interaction. Computer vision technology offers the possibility of using passive, noncontact sensing to locate, track, and model human body motion. Subsequently, pattern recognition and classification techniques can be used to recognize meaningful movements and gestures.

In the past dozen or so years, there has been significant and increasing interest in these problems within the computer vision research community (Turk & Kolsch, 2003; Black & Yacoob, 1997; Donato, Bartlett, Hager, Ekman, & Sejnowski, 1999; Feris, Hu, & Turk, 2003; Stiefelhagen, Yang, & Waibel, 1997; Viola & Jones, 2001). Motivated by various application areas, including biometrics, surveillance, multimedia indexing and retrieval, medical applications, and human-computer interaction, there has been significant progress in areas such as face detection, face recognition, facial expression analysis, articulated body tracking, and gesture recognition. The state of the art in these areas is not yet to the point of fully supporting CVEs, as many of these systems tend to be slow and lack robustness in real-world environments (with typical changes in lighting, clothing, etc.).

But the progress is promising, and we expect to see increased utility of these technologies to track and model nonverbal behaviors in order to transmit and transform them within the context of CVEs. We believe that each of the TSI implementations discussed in the current work is foreseeable, perhaps even in the near future. For the purposes of the current discussion, details of the specific CVE implementation are not critical: TSI should be effective in projection-based CVEs, head-mounted display CVEs, CAVEs, or certain types of augmented-reality CVEs.

A concrete example of a typical CVE interaction helps describe the specific types of transformations. Generally, TSI should enable interactants to communicate more effectively by providing them with more information, as well as by providing them (or systems designers) with more control in directing their nonverbal behaviors. The latter suggests, on a more cynical note, that the people who may profit most from TSI may be those who enter interactions with specific goals, for example, changing the attitudes of the other interactants (Slater, Pertaub, & Steed, 1999). In the subsequent sections, we describe an interaction in which a leader and one or more community members evaluate a proposal in a CVE. However, one could just as easily substitute politician, teacher, lawyer, or missionary for leader, and voters, students, jurors, or atheists for community members. Hence, the theoretical parameters and implications of TSI have applications across many different contexts.

3.1 Transforming Self Representations

In CVEs, avatars representing interactants can bear varying degrees of photographic or anthropomorphic (Garau et al., 2003; Bailenson, Beall, Blascovich, Raimundo, & Weisbuch, 2001; Sannier & Thalmann, 1998), behavioral (Bailenson et al., 2002; Cassell, 2000; Biocca, 1997), and even dispositional resemblance to the interactants they represent. Assuming that interactants (by their own design or through the actions of systems operators) have the freedom to vary both the photographic and behavioral similarity of their avatar to themselves, a number of subtle but potentially drastic (in terms of the outcomes of CVE interactions) transformations can occur.

In many instances, similarity breeds attraction (Byrne, 1971). We know that people treat avatars that look like themselves more intimately than avatars that look like others, as indicated by invasion of the avatars’ personal space, by willingness to perform embarrassing acts in front of them, and by how attractive and likable they believe the avatars to be (Bailenson, Blascovich, Beall, & Guadagno, 2004; Bailenson et al., 2001). Given this special relationship, a CVE interactant may use this principle to her advantage. Consider the situation in which a leader and a community member are negotiating via a CVE. A particularly devious leader can represent herself by incorporating characteristics of the member’s representation. By making herself appear more similar to the member, the leader becomes substantially more persuasive (Chaiken, 1979; Simons, 1976). Indeed, a leader would be able to adjust the structural or textural similarity of her own avatar idiosyncratically for each member of her audience.

This similarity could be achieved in various ways, employing any of a number of techniques to parametrically vary the similarity of computer-generated models via 2D and 3D morphing (Blanz & Vetter, 1999; Busey, 1998; DeCarlo, Metaxas, & Stone, 1998). The leader could be represented as some kind of hybrid, maintaining some percentage of her original facial structure and texture but also incorporating percentages of the member’s structure and texture. Alternatively, the leader could be represented completely veridically to her facial structure but, for a few frames per second, could replace her own head with the head of the member. Priming familiarity with limited exposure to human faces has proven to be effective with 2D images (Zajonc, 1971). Finally, consider the situation in which the leader is interacting via CVE with two members. The leader can be differentially represented to both members simultaneously, such that each member sees a different hybrid leader avatar incorporating aspects of that member. In other words, the leader does not need a consistent representation across interactants, because the CVE operator is free to render a different leader avatar to each member.
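The hybrid avatar described above amounts to interpolating between two faces. What follows is a minimal sketch, assuming both faces have already been registered into dense vertex correspondence (as in morphable-model approaches) and that texture is stored as per-vertex color; the names `hybrid_face` and `per_viewer_hybrids` are hypothetical and not part of any cited system.

```python
from typing import Dict, List, Tuple

Vec3 = Tuple[float, float, float]
Face = Dict[str, List[Vec3]]  # "shape": vertex positions, "color": per-vertex RGB


def lerp_points(a: List[Vec3], b: List[Vec3], alpha: float) -> List[Vec3]:
    """Return (1 - alpha) * a + alpha * b for each pair of corresponding points."""
    if len(a) != len(b):
        raise ValueError("faces must be in dense vertex correspondence")
    return [
        tuple((1.0 - alpha) * p + alpha * q for p, q in zip(va, vb))
        for va, vb in zip(a, b)
    ]


def hybrid_face(leader: Face, member: Face, alpha: float) -> Face:
    """Mix a fraction `alpha` of the member's structure and texture into the leader's face."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return {key: lerp_points(leader[key], member[key], alpha) for key in ("shape", "color")}


def per_viewer_hybrids(leader: Face, members: Dict[str, Face], alpha: float) -> Dict[str, Face]:
    """Build a different hybrid leader for each member; each is rendered only to that member."""
    return {member_id: hybrid_face(leader, face, alpha) for member_id, face in members.items()}
```

With `alpha = 0.0` the leader is rendered veridically; raising `alpha` mixes in more of each viewer's own face, and because the dictionary is keyed by member, no two members need to see the same leader.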

Incorporating the self-identity of other interactants can also occur via behavioral characteristics. Psychological research has demonstrated that when an experimenter subtly mimics experimental participants (e.g., leans in the same direction as they do, crosses her legs when they do), participants subsequently report that they liked the experimenter more and that the conversation flowed more smoothly (Chartrand & Bargh, 1999). This “chameleon effect” could be extremely effective in CVEs. The leader (or the system operator) can use algorithms to detect the motions of the other interactants at varying levels of detail and coordinate the animations of her avatar to be a blended combination of her own motions and those of the others.

Consider a CVE interaction consisting of a leader and two members. In the course of this interaction, patterns of nonverbal behavior will emerge, and statistics based on a running tabulation can be collected automatically via CVE technology. In other words, if person A exhibits a certain rate of head nodding and person B exhibits another, the leader’s head can be made to nod in a way consistent with the statistics (e.g., an average or median). Alternatively, the leader’s avatar can simply mimic each interactant individually and render those particular movements only to the corresponding interactant.
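A running tabulation like the one described could be kept as follows. This is an illustrative sketch only: it assumes an upstream tracker already detects individual nods and reports who produced them, and the class `NodTabulator` is hypothetical. It covers both options from the text: a pooled statistic rendered to everyone, and each member's own rate rendered back only to that member.

```python
from collections import defaultdict
from statistics import mean, median
from typing import Dict, Iterable


class NodTabulator:
    """Running tally of head nods per interactant over the elapsed session time."""

    def __init__(self) -> None:
        self.nods: Dict[str, int] = defaultdict(int)
        self.elapsed_s = 0.0

    def record_nod(self, who: str) -> None:
        self.nods[who] += 1

    def advance(self, seconds: float) -> None:
        self.elapsed_s += seconds

    def rate_per_min(self, who: str) -> float:
        if self.elapsed_s == 0:
            return 0.0
        return 60.0 * self.nods[who] / self.elapsed_s

    def pooled_rate(self, others: Iterable[str], stat: str = "mean") -> float:
        """One nod rate for the leader's avatar, shown to everyone (mean or median)."""
        rates = [self.rate_per_min(p) for p in others]
        return mean(rates) if stat == "mean" else median(rates)

    def per_viewer_rates(self, others: Iterable[str]) -> Dict[str, float]:
        """Alternative: render each member's own nod rate back to that member only."""
        return {p: self.rate_per_min(p) for p in others}
```

For example, six nods from A and two from B over two minutes give rates of 3 and 1 nods per minute, so the pooled (mean or median) rate for the leader's avatar would be 2 per minute.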

The leader could also morph her representation with that of an unrepresented party, not present in the CVE, who is known to possess qualities that inspire certain reactions. Depending on the context, for example, the leader can morph a percentage of famous politicians, historical figures, or even pop stars into her avatar. This feature blending can be explicit and blatant (e.g., the leader looks just like an expert or a religious figure) or more implicit and subterranean (e.g., the leader incorporates subtle features such as cheekbones and hairstyle). Alternatively, the leader can morph herself with a person who may not be famous but whom the member deeply trusts (Gibson, 1984).

A second form of avatar transformation arises from the ability to selectively decouple and reconstruct rendered behavior in CVEs. In other words, not only can interactants render nonverbal behaviors different from those they actually perform but, similarly to the discussion above, they can render those behaviors idiosyncratically for each of the other interactants.

Consider what we term Non-Zero-Sum-Mutual-Gaze (NZSMG). Ordinary mutual gaze occurs when individuals look at one another’s eyes during discourse. In face-to-face conversation, mutual gaze is zero-sum. In other words, if interactant A maintains eye contact with interactant B for 70 percent of the time, it is not possible for A to maintain eye contact with interactant C for more than 30 percent of the time. Interaction in CVEs, however, is not bound by this constraint. With digital avatars, A can be made to appear to maintain mutual gaze with both B and C for a majority of the conversation.
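The per-viewer decoupling behind NZSMG can be sketched as a rendering rule applied separately for each viewer. This is a hypothetical illustration, not the algorithm used in the studies cited here. It assumes gaze targets have already been resolved to participant identifiers and, as a design choice, keeps gaze at non-participants (notes, slides) veridical so that the avatar still looks away occasionally.

```python
from typing import Dict, Hashable, Sequence


def rendered_gaze_target(actual_target: Hashable, viewer: Hashable,
                         participants: set, nzsmg: bool) -> Hashable:
    """What `viewer` sees the gazer looking at in one frame."""
    if nzsmg and actual_target in participants:
        # The gazer is looking at *some* participant: show each viewer direct gaze.
        return viewer
    # Gaze at non-participants (notes, slides), or NZSMG off: render veridically.
    return actual_target


def mutual_gaze_share(actual_targets: Sequence[Hashable], viewers: Sequence[Hashable],
                      nzsmg: bool) -> Dict[Hashable, float]:
    """Fraction of frames in which each viewer perceives mutual gaze."""
    participants = set(viewers)
    n = len(actual_targets)
    return {
        v: sum(rendered_gaze_target(t, v, participants, nzsmg) == v for t in actual_targets) / n
        for v in viewers
    }
```

If A looks at B in 60 percent of frames, at C in 20 percent, and at her notes in 20 percent, the veridical shares are 0.6 and 0.2; with NZSMG enabled, both B and C perceive mutual gaze in 80 percent of frames, which is impossible face to face.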

Gaze is one of the most thoroughly studied nonverbal gestures in research on social interaction (Rutter, 1984; Kleinke, 1986; Kendon, 1977). Direct eye gaze can provide cues for intimacy, agreement, and interest (Argyle, 1988). Furthermore, gaze can enhance learning during instruction as well as memory for information (Fry & Smith, 1975; Sherwood, 1987). The advantage of using CVEs is that the normal nonverbal behaviors of interactants can be augmented via NZSMG. Moreover, the interactants in a CVE can be either unaware of this transformation (i.e., implicit NZSMG) or aware of it (i.e., explicit NZSMG), as Figure 1 demonstrates. Preliminary work studying implicit NZSMG has demonstrated that interactants are not aware of the decoupling from actual behavior and that they respond to the artificial gaze as if it were actual gaze (Beall, Bailenson, Loomis, Blascovich, & Rex, 2003). This method may prove to be most effective during distance learning in educational CVEs (Morgan, Kriz, Howard, das Neves, & Kelso, 2001), in which the instructor uses her augmented gaze as a tool to keep the students more engaged.

Decoupling can also be used to achieve the opposite effect. Consider the situation in which the leader wants to scrutinize the nonverbal behaviors of member A but does not want the member to feel uncomfortable under her unwavering gaze. The leader can render herself looking at her shoes, or perhaps at member B in the CVE, while in reality she is watching member A’s every move.

In order for such a system to be effective, there must be a convincing algorithm to drive the autonomous eye gaze. In other words, if the leader wants the freedom to employ NZSMG or to wander around the CVE scrutinizing different aspects of the conversation undetected, she (by her own device or assisted by the systems operator) must maintain the illusion that her avatar is exhibiting typical and appropriate nonverbal gestures. There are a number of ways to achieve this. The first is some type of artificial intelligence algorithm that approximates appropriate gestures for the leader’s avatar by monitoring the gestures and speech of the other interactants. While there have been significant advances in this regard (Cassell, 2000), the ability of an algorithm to process natural language as well as generate believable responses may still be many years off. A more likely method for achieving this goal would be to use actual humans instead of AI algorithms. In this scenario, the leader employs one or more nonverbal “cyranoids” (Milgram, 1992) to augment the nonverbal behaviors presented to each individual member. To do so, the leader solicits the help of several assistants, each of whose job is to provide the nonverbal behaviors targeted toward a particular member (see Figure 2).

Figure 1. Internal belief states from implicit NZSMG (left) and explicit NZSMG (right).

In this many-to-many “Wizard of Oz” implementation, each member is presented with a unified leader who is rendered privately to her. This private representation would be a melding of the actual leader and one of her assistants, so that when the leader’s attention was diverted away from that member for long periods of time, the assistant could step in and help maintain a believable interaction by seamlessly serving as the leader’s proxy. The leader herself would then act as a conductor, overseeing all the interactions yet remaining free to focus her attention on individual members when she so desires. In addition, the leader is free to wander about the digital space, consult her notes, take a rest, or conduct a sidebar meeting with another person. Because her avatar is partially cyranic, however, it can continue to exhibit appropriate nonverbal behaviors toward each member all the while. Furthermore, having a number of assistants whose sole focus is to respond with appropriate nonverbal gestures to each of the interactants in the CVE should maximize the members’ involvement or sense of presence in the CVE. For important meetings, seminars, or presentations conducted via CVEs, individual interactants may want to utilize a number of assistants as a core presentation team.
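The routing in Figure 2 can be expressed as a small per-frame function. In this sketch (hypothetical names; a single head pose stands in for all tracked nonverbal channels), every member hears the leader’s real voice, while the head pose rendered on that member’s private leader avatar comes from the leader herself only when she is attending to that member, and otherwise from the member’s dedicated assistant.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

Pose = Tuple[float, float, float]  # (yaw, pitch, roll) in degrees


@dataclass(frozen=True)
class AvatarFrame:
    audio: bytes     # leader's voice for this frame
    head_pose: Pose  # nonverbal behavior rendered on the leader's avatar


def route_leader(leader: AvatarFrame, attending_to: Optional[str],
                 assistant_poses: Dict[str, Pose]) -> Dict[str, AvatarFrame]:
    """Compose the private leader avatar that each member receives.

    assistant_poses maps member id -> tracked head pose of that member's dedicated gesturer.
    """
    frames = {}
    for member, gesturer_pose in assistant_poses.items():
        pose = leader.head_pose if member == attending_to else gesturer_pose
        frames[member] = AvatarFrame(audio=leader.audio, head_pose=pose)
    return frames
```

When `attending_to` is `None` (the leader is consulting notes or in a sidebar), every member sees only assistant-driven behavior while still hearing the leader.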

3.2 Transforming Sensory Capabilities

Interactants can be assisted by technology that takes advantage of the ability of CVEs to keep precise running tabs on certain types of behavior and then display summaries of those behaviors exclusively to individual interactants. For example, consider an educational CVE in which an instructor wants to ensure that she is directing her nonverbal behaviors in a desired fashion. Such an instructor may want to monitor her mutual gaze to ensure that she is not looking at any one student more than others during a presentation. The tracking equipment used to render the scene can keep an online total of the amount of time the instructor has gazed at each individual student. The CVE can render a display of this gaze meter, as well as use visual or auditory alerts to inform the instructor of disproportionate applications of gaze.
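A gaze meter of this kind needs little more than a per-student accumulator. The sketch below assumes the tracker already reports which student (if any) the instructor is looking at in each frame; the alert rule (flag any student whose share exceeds 1.5 times an even split) is an illustrative choice, not one proposed in the text, and `GazeMeter` is a hypothetical name.

```python
from collections import Counter
from typing import Dict, Iterable, List, Optional


class GazeMeter:
    """Online total of how long the instructor has gazed at each student."""

    def __init__(self, students: Iterable[str], tolerance: float = 1.5) -> None:
        self.students = list(students)
        self.seconds: Counter = Counter()
        self.tolerance = tolerance  # alert when a share exceeds tolerance x the even share

    def update(self, gaze_target: Optional[str], dt: float) -> None:
        """Called once per tracking frame with whatever the instructor is looking at."""
        if gaze_target in self.students:
            self.seconds[gaze_target] += dt

    def shares(self) -> Dict[str, float]:
        total = sum(self.seconds[s] for s in self.students)
        if total == 0:
            return {s: 0.0 for s in self.students}
        return {s: self.seconds[s] / total for s in self.students}

    def alerts(self) -> List[str]:
        even_share = 1.0 / len(self.students)
        return sorted(s for s, share in self.shares().items() if share > self.tolerance * even_share)
```

For four students gazed at for 50, 20, 20, and 10 seconds, the shares are 0.5, 0.2, 0.2, and 0.1, and only the first student exceeds the 0.375 threshold; time spent looking at the board or elsewhere is ignored.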

Furthermore, interactants can use the tracking data summaries to learn more about the attitudes of the others. Nonverbal gestures are often correlates of specific mental states (Ekman, 1978; Zajonc, Murphy, & Inglehart, 1989). For example, in general, we nod when we agree, smile when we are pleased, tilt our heads when we are confused, and look at something in which we are interested. Interactants will be able to tailor their CVE systems to keep track of nonverbal behaviors with the goal of helping them infer the mental states of the other interactants. For example, a teacher will be able to gauge the percentage of students exhibiting nonverbal behaviors that suggest confusion or a failure to understand a point in a lesson. Similarly, a leader could determine who in a room full of members is responding most positively to her behavior. Tabulating and assessing the nonverbal behaviors of others is, of course, something that humans do intuitively and constantly in face-to-face interactions. With CVEs, interactants will be able to tabulate these behaviors with greater precision and use the objective tabulations from the tracking data to augment their normal intuitions about the gestures occurring in the interaction.
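The two tabulations mentioned above (a share of students showing confusion, and the most positively responding member) could be sketched as follows. The cue-to-state table simply encodes this paragraph’s own examples and is an assumption for illustration; real cue interpretation is probabilistic and context dependent, and cue detection itself is assumed to happen upstream.

```python
from typing import Iterable, Mapping

# Illustrative mapping taken from the examples in the text, not a validated model.
CUE_TO_STATE = {
    "nod": "agreement",
    "smile": "pleasure",
    "head_tilt": "confusion",
    "gaze_at_object": "interest",
}
POSITIVE_STATES = {"agreement", "pleasure"}


def share_showing(state: str, window_cues: Mapping[str, Iterable[str]]) -> float:
    """Fraction of students who showed at least one cue for `state` in the current window."""
    if not window_cues:
        return 0.0
    hits = sum(any(CUE_TO_STATE.get(c) == state for c in cues) for cues in window_cues.values())
    return hits / len(window_cues)


def most_positive(session_counts: Mapping[str, Mapping[str, int]]) -> str:
    """Member whose tallied cues map most often to positive states."""
    def score(counts: Mapping[str, int]) -> int:
        return sum(n for cue, n in counts.items() if CUE_TO_STATE.get(cue) in POSITIVE_STATES)
    return max(session_counts, key=lambda who: score(session_counts[who]))
```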

Figure 2. A depiction of cyranoids. On the top row are three nonrendered gesturers. Each member on the bottom hears the leader’s actual verbal behaviors (dashed lines); however, each member views the nonverbal behavior of her dedicated gesturer rendered onto the avatar of the leader (unbroken lines).

Another transformation involves filtering or degrading certain signals or nonverbal behaviors. Some visual nonverbal behaviors tend to distract interactants, and filtering algorithms can prevent such counterproductive distractions in a number of ways. For example, consider the situation in which a speaker in a CVE taps her pen rapidly as she speaks. In face-to-face meetings, this type of behavior can distract interactants. In a CVE, it can be filtered in two ways. First, the speaker can filter the behavior on the transmitting end. If people know that they have difficulty suppressing certain nonverbal behaviors that tend to be perceived negatively, such as a nervous tic, they can activate a filter that prevents the behavior from being rendered. Similarly, in certain situations a CVE interactant may not want to render certain nonverbal behaviors. Consider the leader example: the potential member may benefit from rendering her “poker face,” that is, not demonstrating any enthusiasm or disappointment via facial expressions, and may thereby accrue a strategic advantage during a negotiation. Second, interactants can filter behaviors on the receiving end. If a speaker’s hand motions are distracting, a listener can simply choose not to render that interactant’s hand movements.
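Both filtering points can be sketched as operations on a per-frame dictionary of tracked channels. The channel names and the dictionary representation are hypothetical. A sender-side filter can either drop a channel (the pen tapping) or substitute a neutral value (the “poker face”), while a receiver-side filter lets each listener hide specific channels from specific senders.

```python
from typing import Any, Dict, Mapping, Optional, Set

BehaviorFrame = Dict[str, Any]  # channel name -> tracked value for one frame


def transmit_filter(frame: BehaviorFrame, suppressed: Set[str],
                    neutral: Optional[Mapping[str, Any]] = None) -> BehaviorFrame:
    """Sender side: drop suppressed channels, or replace them with a neutral value."""
    neutral = neutral or {}
    out: BehaviorFrame = {}
    for channel, value in frame.items():
        if channel not in suppressed:
            out[channel] = value
        elif channel in neutral:
            out[channel] = neutral[channel]  # e.g. a "poker face"
        # otherwise the channel is simply not rendered
    return out


def receive_filter(frame: BehaviorFrame, sender: str,
                   blocked: Mapping[str, Set[str]]) -> BehaviorFrame:
    """Receiver side: skip channels this listener chose not to render for `sender`."""
    drop = blocked.get(sender, set())
    return {channel: value for channel, value in frame.items() if channel not in drop}
```

Substituting a neutral value, rather than dropping the face channel entirely, keeps the avatar’s face animated-looking instead of frozen or missing.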

Another example of transforming sensory capabilities is producing a visual indicator of where each interactant’s attention currently lies, as revealed by eye direction (Velichkovsky, 1995). We have explored a technique that involves rendering each person’s view frustum to indicate the field of view, as Figure 3 illustrates. In this example, the wire-frame frustums spotlight the 3D space visible to each person. This feature, color-coded for each person, may be especially helpful to teachers in a distance learning CVE, who could use such information to see where students are focusing their visual attention without having to look directly at the students’ eyes.
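Rendering a frustum implies being able to test whether a point falls inside it, for example to list which objects each student can currently see. The sketch below is a simplification under stated assumptions: head orientation is reduced to yaw only (no pitch or roll), the field-of-view and clipping values are placeholders rather than values from the system described here, and y is the vertical axis.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]  # (x, y, z); y is up


def in_view_frustum(eye: Vec3, yaw_deg: float, point: Vec3,
                    hfov_deg: float = 60.0, vfov_deg: float = 45.0,
                    near: float = 0.1, far: float = 20.0) -> bool:
    """True if `point` lies inside a yaw-only viewing frustum.

    yaw_deg = 0 looks along +z; positive yaw turns toward +x.
    """
    yaw = math.radians(yaw_deg)
    forward = (math.sin(yaw), 0.0, math.cos(yaw))
    side = (math.cos(yaw), 0.0, -math.sin(yaw))  # horizontal axis perpendicular to forward
    d = tuple(p - e for p, e in zip(point, eye))
    depth = sum(a * b for a, b in zip(d, forward))
    lateral = sum(a * b for a, b in zip(d, side))
    vertical = d[1]
    if not near <= depth <= far:
        return False
    half_w = depth * math.tan(math.radians(hfov_deg) / 2)
    half_h = depth * math.tan(math.radians(vfov_deg) / 2)
    return abs(lateral) <= half_w and abs(vertical) <= half_h
```

A teacher’s display could run this test for each student against the board, slides, and other avatars, and color-code whatever falls inside each frustum.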

There are a number of similar tools (i.e., specific objects rendered only to particular interactants) that can assist interactants in a CVE. For example, in our NZSMG studies, an experimenter enters a CVE and attempts to persuade other interactants regarding a certain topic (Beall et al., 2003). In those interactions, we render the interactants’ names over their heads on floating billboards for the experimenter to read, so that the experimenter can refer to people by name more easily. There are many other ways to use these floating billboards to assist interactants, for example, reminders about an interactant’s preferences or personality (e.g., “doesn’t respond well to prolonged mutual gaze”).

Figure 3. View frustums marking the field of view of interactants.

One of the most useful forms of transforming sensory capabilities may be to enlist one or more human consultants who are rendered to only one member in a CVE (i.e., virtual ghosts). Unlike a face-to-face interaction, a CVE will enable an interactant to have informed human consultants who are free to wander around the virtual meeting space to scrutinize the actions of other interactants, to conduct online research and sidebar meetings in order to provide key interactants with additional information, and generally to provide support for the interactants. For example, the leader can have her research team actually rendered beside her in the CVE. Members of her team can point out actions by potential members, suggest new strategies, and even provide real-time criticism and feedback concerning the behavior of the leader, without any of the other members having even a hint of the human consultants’ presence. Alternatively, the leader herself can go into “ghost mode” and explore the virtual world with her team while her avatar remains seated and is controlled by yet another member of her team.

3.3 Transforming the Situation

In addition to transforming their representation and sensory capabilities, CVE interactants can also use algorithms to transform their general spatial or temporal situations. In a CVE, people generally adopt a spatially coherent situational context across all remote interactants that brings everyone together in the shared space. However, there is no reason that the details and arrangement of that virtual space need to be constant for all the interactants in the CVE. Consider the situation for three interactants. Interactant A may choose to form an isosceles triangle with the other two, while interactants B and C may both choose to form equilateral triangles. Interactant A may even choose to flip the locations of B and C. In this scenario, the CVE operating system can preserve the intended eye gaze direction by transforming the amplitude or direction of head and eye movements in a prescribed manner. While this is a somewhat simple example, with as many as four interactants it is straightforward to design spatial transformations that keep the intended eye and head gaze cues intact across all interactants. Although such discordance may eventually cause the quality and smoothness of the interaction to suffer, there are a number of ways that transforming the situation can assist individual interactants.
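One way to keep gaze cues intact across discordant layouts is to treat gaze semantically: recover whom the gazer is looking at in her own layout, then re-aim her head toward that person’s position in each viewer’s layout. This is a hypothetical sketch restricted to head yaw on the floor plane; it does not model the eye-movement amplitudes the text also mentions, and the function names are illustrative.

```python
import math
from typing import Dict, Tuple

Point = Tuple[float, float]  # (x, z) floor position
Layout = Dict[str, Point]    # avatar name -> position in one interactant's private layout


def yaw_toward(src: Point, dst: Point) -> float:
    """Head yaw in degrees (0 = facing +z, positive toward +x) to face dst from src."""
    dx, dz = dst[0] - src[0], dst[1] - src[1]
    return math.degrees(math.atan2(dx, dz))


def infer_target(gazer: str, tracked_yaw: float, layout: Layout) -> str:
    """Participant in the gazer's own layout whose bearing is closest to the tracked yaw."""
    def angular_error(name: str) -> float:
        diff = (yaw_toward(layout[gazer], layout[name]) - tracked_yaw + 180.0) % 360.0 - 180.0
        return abs(diff)
    return min((n for n in layout if n != gazer), key=angular_error)


def render_head_yaw(gazer: str, target: str, layouts: Dict[str, Layout]) -> Dict[str, float]:
    """For each viewer's private layout, the yaw that keeps the gazer aimed at `target`."""
    return {viewer: yaw_toward(layout[gazer], layout[target])
            for viewer, layout in layouts.items()}
```

In a flipped arrangement, a tracked yaw of 5 degrees in A’s own layout might resolve to B; B’s layout may then require A’s head to be rendered at 90 degrees and C’s layout at 0 degrees, yet both viewers see A looking at B.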

One such transformation involves multilateral perspectives. In a normal conversation, each interactant has a unique and privileged perspective. That perspective is a combination of her sensory input (e.g., visual and acoustic fields of view) and her internal beliefs about the interaction. In normal face-to-face interactions, people continually use sensory input to update and adjust their internal beliefs (Kendon, 1977). Interactants in a CVE will possess a completely new mechanism for adjusting and updating internal beliefs: a person’s viewpoint can be multilateral as opposed to unilateral (normal). In other words, in a real-time conversation, interactant A can take the viewpoint of interactant B and perceive herself as she performs various verbal and nonverbal gestures during the interaction. In this manner, she can acquire invaluable sensory information pertaining to the interaction and update her internal beliefs concerning the interaction in ways not possible without the CVE.

Consequently, interactants in educational and persuasive interactions may be able to improve performance, because seeing oneself through the eyes of another may allow one to develop a more informed set of internal beliefs about others (Baumeister, 1998). Furthermore, being able to experience an interaction through someone else’s eyes may reinforce the fact that one is indeed copresent in the CVE (e.g., Durlach & Slater, 2000). Finally, utilizing multilateral perspectives may assist students in distance learning CVEs in terms of training transfer effects (Rickel & Johnson, 2000) that might occur when an interactant who has been trained in multilateral perspective taking later performs similar group tasks in nonmediated situations.

A second situational transformation involves partially recording the interaction and adjusting temporal properties or sequences in real time. Similar to commercial products sold for digitally recording and playing back broadcast television, interactants in a CVE should be able to accelerate and decelerate the perceived flow of time during the mediated interaction. Consider the following situation. A student in a distance learning CVE does not understand an example that the instructor provides. The student can “rewind” the recorded interaction, go back to the beginning of the confusing example, and play it back. Once she has understood the example, she can turn up the rate of playback (e.g., watch the sequence at 2X speed) and eventually catch up to the instructor again. By slowing down or speeding up the rendered flow of time, the interactant can focus differentially on particular topics and can review the same scene from different points of view without missing the remainder of the interaction. Of course, doing so will result in costs to that interactant’s contribution to the CVE in terms of interactivity (i.e., what does her avatar do while she rewinds?). Consequently, the disruption of the temporal sequence will necessarily be coupled with some kind of avatar autopilot.
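The catch-up step is simple arithmetic. While the live interaction advances one second per second, playback at speed s advances s seconds, so a lag of r seconds shrinks by (s - 1) seconds each second and closes after r / (s - 1) seconds. The sketch below (hypothetical names) shows that formula alongside a playhead that drops back to real time once it reaches the live edge.

```python
def catch_up_time(lag_s: float, speed: float) -> float:
    """Wall-clock seconds needed to close a lag of `lag_s` when playing at `speed`x.

    The live interaction advances 1 s per second while playback advances `speed` s,
    so the lag shrinks by (speed - 1) s per second.
    """
    if speed <= 1.0:
        raise ValueError("playback must be faster than real time to catch up")
    return lag_s / (speed - 1.0)


class TimeShiftPlayhead:
    """One interactant's playhead into the recorded interaction."""

    def __init__(self) -> None:
        self.live = 0.0   # seconds of interaction recorded so far
        self.head = 0.0   # position this interactant is currently viewing
        self.speed = 1.0  # playback rate (2.0 = "2X")

    @property
    def lag(self) -> float:
        return self.live - self.head

    def rewind(self, seconds: float) -> None:
        self.head = max(0.0, self.head - seconds)

    def step(self, dt: float) -> None:
        """Advance wall-clock time by dt seconds."""
        self.live += dt
        self.head = min(self.live, self.head + self.speed * dt)
        if self.head == self.live:
            self.speed = 1.0  # caught up: drop back to real time
```

Rewinding 30 seconds and then watching at 2X closes the gap after 30 seconds of wall-clock time; at 1.5X it takes 60 seconds. Rewatching at 1X in between leaves the lag unchanged.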

Changing the rate of time in a CVE brings up another interesting transformation. Traditionally, CVEs are roughly defined as “geographically separated interactants” interacting over some kind of computer-mediated network in a shared environment. However, by combining some of the concepts discussed in previous sections, it may be possible to include in the definition of a CVE “temporally separated interactants” in a shared environment. Consider a videoconference of a business meeting. Oftentimes, interested parties who cannot attend the meeting will later review a videotape of it. In a CVE, the temporally absent member has the option of involving herself more deeply in the interaction. Specifically, she can situate her avatar in a specific place in the CVE’s seating arrangement and use an autopilot to give her representation rudimentary nonverbal behaviors. Furthermore, the absent member can program her avatar to perform simple interactive tasks (prerecorded introductions, answers to certain questions about the CVE topic, or, perhaps more realistically for the near term, playback of a recorded performance). The CVE interaction can then proceed in real time, with the temporally absent member’s avatar approximating the types of behaviors that she would exhibit and the things she would say. As a result, temporally present members would actually direct pieces of the conversation toward the absent member, as well as transmit nonverbal gestures toward her. Later on, instead of just reviewing the recording, the temporally absent member can take her place in the CVE and actually feel present in the dialogue, receiving appropriate nonverbal behaviors and maximizing the degree of copresence. Moreover, the members of the CVE who were present at the scheduled time can program their avatars to respond, during the replay of the interaction, to any post hoc questions that the absent member might have. In this way, the degree of interactivity during the replay can be increased, and perhaps at some point in the not-too-distant future the line between real-time and non-real-time interactions will become interestingly blurred.

4 Implications of TSI and Research Directions

For better or for worse, TSI implemented through CVEs has great potential to change the nature of mediated interaction. The strategic decoupling of rendered behavior from actual behavior allows interactants to break many constraints that are inherent in face-to-face interaction, as well as in other forms of mediated interaction such as telephone and videophone conferences. The effects of TSI remain to be seen. Assuming that implementation of the TSI techniques is technically feasible and that using TSI implementations is conceptually workable for the interactants (both of which are substantial assumptions), one could predict a number of consequences. First, TSI may develop into a worthwhile tool that assists interactants in overcoming the inadequacies of communicating from remote locations. By augmenting their representational, sensory, and situational characteristics, interactants in CVEs may be able to achieve levels of interaction that actually surpass face-to-face interaction.

On the other hand, people may in fact find the use of these transformations extremely unsettling. The difference between TSI and current CVE implementations could be as drastic as the difference between email and the written letter. As this technology is developed, it is essential to examine people’s responses to this new medium (e.g., Reeves & Nass, 1996) and to examine these important potential implications of TSI before the technology becomes widespread.

Along the same lines, the threat of TSI may be the very downfall of CVE interaction. In face-to-face interaction, there tends to be some degree of deception, for example, people using facial expressions to mask their emotions. Clearly, this deception has the potential to be much greater with TSI. If interactants have no faith that their perceptual experience is genuine, they may have little reason ever to enter a CVE. A complete lack of trust in the truthfulness of gestures, the one-to-one correspondence of avatars, and the temporal presence of interactants has the potential to rob the CVE of one of its greatest strengths, namely interactivity, since the interactants may not know with whom, with what, or when they are interacting. Similarly, given an expectation of TSI, interactants may be constantly suspicious during interactions, and this lack of trust in fellow interactants may lead to unproductive collaborations.

A solution to this breakdown may require the development of TSI detectors for interactants, based either on computer algorithms that analyze nonverbal behaviors or on actual humans who scrutinize the interaction. To examine the possibility of using human TSI detectors, we now discuss what we call the nonverbal Turing Test (NVTT).

In the popular reinterpretation of the Turing Test (Turing, 1950), a judge attempts to determine which of two players (one human, one machine) is a fellow human. In our NVTT pilot studies, experimental participants acting as judges enter a CVE with two virtual humans: one avatar whose head movements are veridical, playing back the movements of another human in real time (i.e., without TSI), and one avatar whose head movements are actually a transformation of the judge’s own head movements (i.e., time-delayed and reduced in motion range). The judge sees the head movements of a real person on one avatar and some sort of self-mimicked movements on the other. During the interaction, only head movements are permitted (i.e., no verbal communication is allowed), and participants must devise ways to engage and test the two virtual humans through nonverbal means in order to ascertain which is human and which is a mimicker. Figure 4 illustrates this setup.
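The computer agent’s behavior, as described, is a delayed and range-reduced echo of the judge’s own head orientation. A sketch of such a mimic follows; the frame-based buffer, the 60 Hz default tracking rate, and the neutral pose held before enough history accumulates are assumptions rather than details reported for the pilot study, and `HeadMimic` is a hypothetical name. The `yaw_only` flag corresponds to the low range-of-motion condition.

```python
from collections import deque
from typing import Tuple

Pose = Tuple[float, float, float]  # (yaw, pitch, roll) in degrees
NEUTRAL: Pose = (0.0, 0.0, 0.0)


class HeadMimic:
    """Replays the judge's head orientation after a fixed delay."""

    def __init__(self, delay_s: float, rate_hz: float = 60.0, yaw_only: bool = False) -> None:
        self.delay_frames = round(delay_s * rate_hz)
        self.yaw_only = yaw_only
        # Holds the current frame plus `delay_frames` earlier ones.
        self.history = deque(maxlen=self.delay_frames + 1)

    def update(self, yaw: float, pitch: float, roll: float) -> Pose:
        """Feed one tracked frame of the judge; return the pose to render on the mimic."""
        self.history.append((yaw, pitch, roll))
        if len(self.history) <= self.delay_frames:
            return NEUTRAL  # not enough history yet
        y, p, r = self.history[0]  # the frame from `delay_frames` ago
        return (y, 0.0, 0.0) if self.yaw_only else (y, p, r)
```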

In the current initial pilot study, we manipulated three independent variables: test trial length (either 16 or 32 seconds), mimic delay (i.e., the computer agent mimics the judge's movements 1, 2, 4, or 8 seconds after they occur), and range of motion (high: pitch, yaw, and roll; or low: yaw only).
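The mimic agent's transformation can be sketched as a delay buffer followed by a range-of-motion reduction. This is a minimal sketch under stated assumptions, not the system used in the study. The tracker sampling rate (`rate_hz`) is an assumption, and the "low" condition is assumed to pass yaw through while holding pitch and roll at zero. The class name `HeadMimic` is hypothetical.

```python
from collections import deque
from typing import Deque, Tuple

Orientation = Tuple[float, float, float]  # (pitch, yaw, roll) in degrees


class HeadMimic:
    """Replays a judge's head orientation after a fixed delay (hypothetical sketch).

    Assumptions not taken from the paper: tracking runs at a fixed sample rate,
    and the "low" range-of-motion condition keeps yaw and zeroes pitch and roll.
    """

    def __init__(self, delay_s: float, rate_hz: int = 60, motion_range: str = "high"):
        if motion_range not in ("high", "low"):
            raise ValueError("motion_range must be 'high' or 'low'")
        self.lag = int(round(delay_s * rate_hz))  # delay expressed in samples
        self.motion_range = motion_range
        self.buffer: Deque[Orientation] = deque()

    def update(self, judge: Orientation) -> Orientation:
        """Push the judge's latest sample and return the mimic's rendered pose."""
        self.buffer.append(judge)
        if len(self.buffer) <= self.lag:
            return (0.0, 0.0, 0.0)  # neutral pose until the delay has elapsed
        pitch, yaw, roll = self.buffer.popleft()  # sample from `lag` steps ago
        if self.motion_range == "low":
            return (0.0, yaw, 0.0)  # yaw-only condition
        return (pitch, yaw, roll)
```

At an assumed 60 Hz, the 8-second delay condition holds 480 samples in the buffer.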

Figure 4. A schematic of the NVTT. The human judge is forced to determine which of the two avatars exhibiting head movements is the real human and which is the computer-generated human mimicking the judge's own movements.

Participants in this study wore head-mounted displays while a render computer tracked their head orientation. Tracking, rendering, and networking latencies were all low enough to impart a compelling sense of copresence (see Bailenson et al., 2002, for detailed descriptions of the hardware and software used). Participants were instructed to sit in a virtual room with two virtual people: a human agent (i.e., a representation whose movements are controlled by a real person in another room) and a computer agent (i.e., a computer program designed to mimic the user's movements in some way). They were instructed to interact with these two virtual people using head movements in order to determine which one was the human agent. Participants were run in pairs, with each one acting as the human agent for the other. Each participant sat at a virtual conference table with two virtual humans (similar to that shown in Figure 3). Each participant received a random order of 32 test trials (two instances of the 16 conditions resulting from the crossing of the three independent variables). Forty-one undergraduates participated in this study.
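The trial structure above is a full 2 × 4 × 2 factorial with each cell repeated twice, giving 32 trials. The sketch below shows one way to generate a participant's order. It assumes a simple full shuffle, since the paper does not say how the random order was produced. The function name `trial_order` and the `seed` parameter are hypothetical.

```python
import itertools
import random
from typing import List, Optional, Tuple

TRIAL_LENGTHS_S = (16, 32)       # test trial length in seconds
MIMIC_DELAYS_S = (1, 2, 4, 8)    # mimic delay in seconds
MOTION_RANGES = ("high", "low")  # high: pitch, yaw, and roll; low: yaw only

Condition = Tuple[int, int, str]


def trial_order(repetitions: int = 2, seed: Optional[int] = None) -> List[Condition]:
    """Cross the three factors (2 x 4 x 2 = 16 cells), repeat each cell,
    and shuffle the result into one participant's trial sequence."""
    cells = list(itertools.product(TRIAL_LENGTHS_S, MIMIC_DELAYS_S, MOTION_RANGES))
    trials = cells * repetitions
    random.Random(seed).shuffle(trials)  # assumed: unconstrained full shuffle
    return trials
```

A stricter design might block the repetitions, so that every cell appears once before any repeats. The sketch does not impose that constraint.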

For the sake of brevity, we focus on two results in particular. First, although we explicitly told participants that the computer agent was directly mimicking them, they performed surprisingly poorly when attempting to identify the human avatar. The overall average score was only 66% correct (SD = 10%; chance = 50%; maximum score = 100%). Moreover, more than one fourth of the 41 participants were not reliably different from chance (i.e., less than 3 SEM from 50%, between 44% and 56%). Second, as Figure 5 demonstrates, participants' scores decreased as the mimic delay increased: there was a linear trend in the logarithm of the delay variable, F(1, 32) = 8.85, p < .01. When the delay was greater than 1 second, participants had more difficulty identifying the mimicker.
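Because the delays 1, 2, 4, and 8 seconds are equally spaced on a log₂ scale (0, 1, 2, 3), a linear trend in log delay can be tested with the standard orthogonal-polynomial weights (−3, −1, 1, 3). The paper does not say exactly how its F statistic was computed. The sketch below shows one common within-subjects approach: compute a contrast score per participant, run a one-sample t-test against zero, and report F(1, n − 1) = t². The function names are hypothetical, and the example inputs in any usage are made up for illustration, not study data.

```python
import math
import statistics
from typing import Sequence

DELAYS_S = (1, 2, 4, 8)
# log2 of the delays is 0, 1, 2, 3: equally spaced, so the usual
# orthogonal-polynomial weights for a linear trend over four levels apply.
LINEAR_WEIGHTS = (-3, -1, 1, 3)


def linear_contrast(scores_by_delay: Sequence[float]) -> float:
    """One participant's linear-trend score across the four delay levels."""
    return sum(w * s for w, s in zip(LINEAR_WEIGHTS, scores_by_delay))


def trend_F(participants: Sequence[Sequence[float]]) -> float:
    """F(1, n - 1) for a within-subjects linear trend: the square of a
    one-sample t statistic on the participants' contrast scores."""
    c = [linear_contrast(p) for p in participants]
    n = len(c)
    t = statistics.mean(c) / (statistics.stdev(c) / math.sqrt(n))
    return t * t
```

A negative mean contrast indicates scores that fall as log delay rises. Squaring t discards the sign, so the direction has to be read from the contrast means.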

These data are particularly striking because we had initially predicted that participants would recognize their own head movements much more easily when aware that one avatar was designed to mimic them. In this study, we explicitly told participants about the use of TSI, and they still had great difficulty detecting it. The effects of implicit TSI (that is, TSI that is not disclosed) can only be greater. While this pilot study is extremely simple and only scratches the surface of a paradigm for examining TSI, it is still noteworthy that participants did not detect the mimicker across the board.

We are currently exploring other factors underlying the discrimination of human nonverbal behavior from computer-generated behaviors. In future studies, we will use NVTTs to study other nonverbal behaviors, such as facial gestures, eye-head gestures (pointing indications by either system), hand gestures, and interpersonal distance. We have shown that, in admittedly simple scenarios, it is possible to pass the NVTT for a percentage of our test population using TSI. We are confident that, as this percentage grows in the near future, important scientific and sociological discoveries will surface along the way.

In conclusion, there are many reasons one might want to avoid TSI. They range from Orwellian concerns to the fear of rendering CVEs (perhaps even the telephone) functionally useless. We are not advocates of TSI as a means to replace normal communication, nor are we staunch believers in avoiding TSI in order to preserve the natural order of communication and conversation. However, we do acknowledge that, as CVEs become more prevalent, the strategic decoupling of representation from behavior is inevitable. For that reason alone, the notion of TSI warrants considerable attention.

Acknowledgments

The authors would like to thank Robin Gilmour and Christopher Rex for helpful suggestions. Furthermore, we thank Christopher Rex and Ryan Jaeger for assistance in collecting data. This research was sponsored in part by NSF Award SBE-9873432 and in part by NSF ITR Award IIS 0205740.

References

Argyle, M. (1988). Bodily communication (2nd ed.). London: Methuen.

Bailenson, J. N., Beall, A. C., & Blascovich, J. (2002). Mutual gaze and task performance in shared virtual environments. Journal of Visualization and Computer Animation, 13, 1–8.

Bailenson, J. N., Beall, A. C., Blascovich, J., Raimmundo, R., & Weisbuch, M. (2001). Intelligent agents who wear your face: Users' reactions to the virtual self. Lecture Notes in Artificial Intelligence, 2190, 86–99.

Bailenson, J. N., Blascovich, J., Beall, A. C., & Guadagno, R. E. (submitted). Self representations in immersive virtual environments.

Figure 5. Percent correct by mimic delay in seconds. These data exclude subjects at chance performance.

Baumeister, R. F. (1998). The self. In D. T. Gilbert, S. T. Fiske, & G. Lindzey (Eds.), Handbook of social psychology (4th ed., pp. 680–740). New York: McGraw-Hill.

Beall, A. C., Bailenson, J. N., Loomis, J., Blascovich, J., & Rex, C. (2003). Non-zero-sum mutual gaze in immersive virtual environments. Proceedings of HCI International 2003.

Benford, S., Bowers, J., Fahlen, L., Greenhalgh, C., & Snowdon, D. (1995). User embodiment in collaborative virtual environments. Proceedings of CHI '95 (pp. 242–249). ACM Press.

Biocca, F. (1997). The cyborg's dilemma: Progressive embodiment in virtual environments. Journal of Computer-Mediated Communication [online], 3. Retrieved from http://www.ascusc.org/jcmc/vol3/issue2/biocca2.html

Black, M., & Yacoob, Y. (1997). Recognizing facial expressions in image sequences using local parameterized models of image motion. International Journal of Computer Vision, 25(1), 23–48.

Blanz, V., & Vetter, T. (1999). A morphable model for the synthesis of 3D faces. SIGGRAPH '99 Conference Proceedings, 187–194.

Blascovich, J., Loomis, J., Beall, A. C., Swinth, K. R., Hoyt, C., & Bailenson, J. N. (2002). Immersive virtual environment technology as a methodological tool for social psychology. Psychological Inquiry, 13, 103–124.

Busey, T. A. (1998). Physical and psychological representations of faces: Evidence from morphing. Psychological Science, 9, 476–483.

Byrne, D. (1971). The attraction paradigm. New York: Academic Press.

Cassell, J. (2000). Nudge nudge wink wink: Elements of face-to-face conversation for embodied conversational agents. In J. Cassell et al. (Eds.), Embodied conversational agents. Cambridge, MA: MIT Press.

Chaiken, S. (1979). Communicator physical attractiveness and persuasion. Journal of Personality and Social Psychology, 37, 1387–1397.

Chartrand, T. L., & Bargh, J. (1999). The chameleon effect: The perception-behavior link and social interaction. Journal of Personality & Social Psychology, 76(6), 893–910.

Decarlo, D., Metaxas, D., & Stone, M. (1998). An anthropometric face model using variational techniques. Proceedings of SIGGRAPH '98, 67–74.

Depaulo, B. M., & Friedman, H. S. (1998). Nonverbal communication. In D. T. Gilbert, S. T. Fiske, & G. Lindzey (Eds.), The handbook of social psychology (4th ed., Vol. 2, pp. 3–40). Boston: McGraw-Hill.

Donato, G., Bartlett, M. S., Hager, J. C., Ekman, P., & Sejnowski, T. J. (1999). Classifying facial actions. IEEE Transactions on Pattern Analysis and Machine Intelligence, 21(10), 974–989.

Durlach, N., & Slater, M. (2000). Presence in shared virtual environments and virtual togetherness. Presence: Teleoperators and Virtual Environments, 9, 214–217.

Ekman, P. (1978). Facial signs: Facts, fantasies, and possibilities. In T. Sebeok (Ed.), Sight, sound, and sense. Bloomington, IN: Indiana University Press.

Fodor, J. A. (1983). The modularity of mind: An essay on faculty psychology. Cambridge, MA: MIT Press.

Fry, R., & Smith, G. F. (1975). The effects of feedback and eye contact on performance of a digit-encoding task. Journal of Social Psychology, 96, 145–146.

Gale, C., & Monk, A. F. (2002). A look is worth a thousand words: Full gaze awareness in video-mediated conversation. Discourse Processes, 33.

Garau, M., Slater, M., Vinayagamoorthy, V., Brogni, A., Steed, A., & Sasse, M. A. (2003). The impact of avatar realism and eye gaze control on perceived quality of communication in a shared immersive virtual environment. Proceedings of the SIGCHI Conference on Human Factors in Computing Systems.

Gibson, W. (1984). Neuromancer. New York: Ace Books.

Hu, C., Feris, R., & Turk, M. (2003). Active wavelet networks for face alignment. Proceedings of the British Machine Vision Conference, Norwich, UK.

Kendon, A. (1977). Studies in the behavior of social interaction. Bloomington, IN: Indiana University.

Kleinke, C. L. (1986). Gaze and eye contact: A research review. Psychological Bulletin, 100, 78–100.

Kraut, R. E., Fussell, S. R., Brennan, S. E., & Siegel, J. (2002). Understanding effects of proximity on collaboration: Implications for technologies to support remote collaborative work. In P. Hinds & S. Kiesler (Eds.), Distributed work. Cambridge, MA: MIT Press.

Lanier, J. (2001). Virtually there. Scientific American, April 2001.

Leigh, J., DeFanti, T., Johnson, A., Brown, M., & Sandin, D. (1997). Global tele-immersion: Better than being there. Proceedings of ICAT '97.

Loomis, J. M., Blascovich, J. J., & Beall, A. C. (1999). Immersive virtual environments as a basic research tool in psychology. Behavior Research Methods, Instruments, and Computers, 31(4), 557–564.


Mania, K., & Chalmers, A. (1998). Proceedings of the Fourth International Conference on Virtual Systems and Multimedia (pp. 177–182). Amsterdam: IOS Press-Ohmsha.

Milgram, S. (1992). The individual in a social world: Essays and experiments (2nd ed.). New York: McGraw-Hill.

Morgan, T., Kriz, R., Howard, T., Dias Neves, F., & Kelso, J. (2001). Extending the use of collaborative virtual environments for instruction to K–12 schools. Insight, 1(1).

Normand, V., Babski, C., Benford, S., Bullock, A., Carion, S., Chrysanthou, Y., et al. (1999). The COVEN project: Exploring applicative, technical, and usage dimensions of collaborative virtual environments. Presence: Teleoperators and Virtual Environments, 8(2), 218–236.

Patterson, M. L. (1982). An arousal model of interpersonal intimacy. Psychological Review, 89, 231–249.

Pylyshyn, Z. W. (1980). Computation and cognition: Issues in the foundations of cognitive science. Behavioral & Brain Sciences, 3, 111–169.

Reeves, B., & Nass, C. (1996). The media equation: How people treat computers, television, and new media like real people and places. New York: Cambridge University Press.

Rickel, J., & Johnson, W. L. (2000). Task-oriented collaboration with embodied agents in virtual worlds. In J. Cassell, J. Sullivan, S. Prevost, & E. Churchill (Eds.), Embodied conversational agents. Cambridge, MA: MIT Press.

Rutter, D. R. (1984). Looking and seeing: The role of visual communication in social interaction. Suffolk, UK: John Wiley & Sons.

Sannier, G., & Thalmann, M. N. (1998). A user friendly texture-fitting methodology for virtual humans. Computer Graphics International '97.

Schwartz, P., Bricker, L., Campbell, B., Furness, T., Inkpen, K., Matheson, L., et al. (1998). Virtual playground: Architectures for a shared virtual world. Proceedings of the ACM Symposium on Virtual Reality Software and Technology 1998, 43–50.

Sherwood, J. V. (1987). Facilitative effects of gaze upon learning. Perceptual and Motor Skills, 64, 1275–1278.

Simons, H. (1976). Persuasion: Understanding, practice, and analysis. Reading, MA: Heath.

Slater, M., Pertaub, D., & Steed, A. (1999). Public speaking in virtual reality: Facing an audience of avatars. IEEE Computer Graphics and Applications, 19(2), 6–9.

Slater, M., Sadagic, A., Usoh, M., & Schroeder, R. (2000). Small group behavior in a virtual and real environment: A comparative study. Presence: Teleoperators and Virtual Environments, 9(1), 37–51.

Stiefelhagen, R., Yang, J., & Waibel, A. (1997). Tracking eyes and monitoring eye gaze. In M. Turk & Y. Takabayashi (Eds.), Proceedings of the Workshop on Perceptual User Interfaces.

Turing, A. (1950). Computing machinery and intelligence. Mind, 59(236).

Turk, M., & Kolsch, M. (in press). Perceptual interfaces. In G. Medioni & S. B. Kang (Eds.), Emerging topics in computer vision. Boston: Prentice Hall.

Velichkovsky, B. M. (1995). Communicating attention: Gaze position transfer in cooperative problem solving. Pragmatics and Cognition, 3(2), 199–222.

Vertegaal, R. (1999). The GAZE groupware system: Mediating joint attention in multiparty communication and collaboration. Proceedings of the CHI '99 Conference on Human Factors in Computing Systems: The CHI Is the Limit, 294–301.

Viola, P., & Jones, M. (2001). Rapid object detection using a boosted cascade of simple features. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.

Wallace, D. F. (1996). Infinite jest. Boston: Little, Brown.

Williams, K., Cheung, K. T., & Choi, W. (2000). Cyberostracism: Effects of being ignored over the internet. Journal of Personality and Social Psychology, 79, 748–762.

Yee, N. (2002). Befriending ogres and wood elves: Understanding relationship formation in MMORPGs. Retrieved from http://www.nickyee.com/hub/relationships/home.html

Zajonc, R. B. (1971). Brainwash: Familiarity breeds comfort. Psychology Today, 3(9), 60–64.

Zajonc, R. B., Murphy, S. T., & Inglehart, M. (1989). Feeling and facial efference: Implications of the vascular theory of emotion. Psychological Review, 96, 395–416.

Zhang, X., & Furnas, G. (2002). Social interactions in multiscale CVEs. Proceedings of the ACM Conference on Collaborative Virtual Environments 2002 (CVE 2002).


the lab via strict controls over most variables, sometimes even involving confederates or imagined scenarios, but without much in the way of external validity or generalizability. CVEs promise to produce major advances in the understanding of social interaction, both dyadic and group, by allowing much more ecological validity while maintaining a high level of experimental control (Blascovich et al., 2002; Loomis et al., 1999).

Technology has long facilitated social interaction. For centuries, written correspondence has proven highly effective for communicating ideas and, to a lesser extent, feelings. The telegraph permitted more or less real-time interaction. However, the telephone constituted an enormous advance, both because it afforded real-time interaction and because it allowed communication via paralanguage cues, which are so important for emotional exchange. More recently, videoconferencing has permitted the communication of some visual nonverbal (NV) cues. However, it offers little opportunity for "side-channel" communication among nonconversing group members (e.g., meaningful glances) and typically does not allow for mutual gaze among group members (Gale & Monk, 2002; Lanier, 2001; Vertegaal, 1999). Now CVEs promise to promote more effective dyadic (i.e., 2-person) and n-person interactions (Zhang & Furnas, 2002; Bailenson, Beall, & Blascovich, 2002; Slater, Sadagic, Usoh, & Schroeder, 2000; Normand et al., 1999; Leigh, DeFanti, Johnson, Brown, & Sandin, 1997; Mania & Chalmers, 1998; Schwartz et al., 1998) by sensing and rendering the visual NV signals of multiple interactants. Two approaches in this regard are (1) capturing and interpolating 2D images from multiple video cameras and recovering the 3D models, and (2) tracking gestures using a variety of sensors, including video. The interpolated images or rendered 3D models can then be displayed to each of the interactants.

In addition to making remote human interaction possible, communication technology has important scientific value because it helps researchers assess the sufficiency or adequacy of transmitted verbal and nonverbal signals. For example, the fact that telephone conversants feel an intimate connection indicates that auditory information is often adequate for personally meaningful dyadic interaction. This sense of connectedness persists despite interactants' awareness that they are actually talking to devices, which indicates that the process of social interaction via telephone is to some extent "cognitively impenetrable" (Pylyshyn, 1980). Mirror talking provides another compelling example. If a room contains a large mirror, people often find themselves conversing with each other's mirror or "virtual" image. Interestingly, the interaction appears to lose no discernible effectiveness, even though each interactant knows that he or she is not engaging in face-to-face interaction with the actual person. This "transparency" of interaction is also observed in dyadic interaction over properly designed videoconferencing systems (i.e., ones that permit mutual eye gaze). It will be true of CVE systems in the near future, even though interactants know at some level that they see only digital models of other interactants. Transparency of interaction is reflected both in interactants' experience and in the effectiveness of group performance (e.g., in collaborative decision making). It speaks to the sufficiency of the verbal and nonverbal signals and also indicates that social interaction is mediated by automatic processes that are quite separate from conscious cognition (Fodor, 1983; Pylyshyn, 1980). Thus, the creation of new communication media can provide insight into human social interaction.

3 Implementations of TSI

In this section, we outline three important TSI components. Each involves a number of theoretical ideas that warrant technical development as well as evaluation via behavioral research. The categories of TSI are self representations (i.e., avatars), sensory capabilities, and contextual situation. Each category also provides researchers with powerful new tools to investigate and improve understanding of the psychological processes underlying behavior (Blascovich et al., 2002; Loomis et al., 1999). Specifically, investigators can use TSI to manipulate the underlying structure of social interaction by altering the operation of its individual components, thereby "reverse engineering" social interaction. In this paper, however, the focus is to explore the theoretical nature of TSI as its own basic research question and to speculate on its potential implications for communication via CVEs. While we discuss these three categories as separate entities, a system that employs TSI would clearly be most effective as some combination of the three. We keep them separate for clarity in this paper.

We realize that not all of the necessary CVE technology may be available yet (see Kraut, Fussell, Brennan, & Siegel, 2002). Furthermore, in order to adequately study and enable transformed social interaction in collaborative virtual environments, the technology used for tracking nonverbal signals must eventually be passive and unobtrusive. Sensors and markers worn on the body can limit the naturalness of interaction by causing participants to focus on the technology at the expense of the interaction. Computer vision technology offers the possibility of using passive, noncontact sensing to locate, track, and model human body motion. Pattern recognition and classification techniques can then be used to recognize meaningful movements and gestures.

In the past dozen or so years, there has been significant and increasing interest in these problems within the computer vision research community (Turk & Kolsch, 2003; Black & Yacoob, 1997; Donato, Bartlett, Hager, Ekman, & Sejnowski, 1999; Hu, Feris, & Turk, 2003; Stiefelhagen, Yang, & Waibel, 1997; Viola & Jones, 2001). Motivated by various application areas, including biometrics, surveillance, multimedia indexing and retrieval, medical applications, and human-computer interaction, researchers have made significant progress in areas such as face detection, face recognition, facial expression analysis, articulated body tracking, and gesture recognition. The state of the art in these areas is not yet at the point of fully supporting CVEs, as many of these systems tend to be slow and lack robustness in real-world environments (with typical changes in lighting, clothing, etc.).

But the progress is promising, and we expect these technologies to become increasingly useful for tracking and modeling nonverbal behaviors so that they can be transmitted and transformed within CVEs. We believe that each of the TSI implementations discussed in the current work is foreseeable, perhaps even in the near future. For the purposes of the current discussion, the details of the specific CVE implementation are not critical: TSI should be effective in projection-based CVEs, head-mounted display CVEs, CAVEs, or certain types of augmented-reality CVEs.

A concrete example of a typical CVE interaction helps describe the specific types of transformations. Generally, TSI should enable interactants to communicate more effectively by providing them with more information as well as providing them (or systems designers) with more control in directing their nonverbal behaviors. On a more cynical note, the latter suggests that the people who may profit most from TSI are those who enter interactions with specific goals, for example, changing the attitudes of the other interactants (Slater, Pertaub, & Steed, 1999). In the subsequent sections, we describe an interaction in which a leader and one or more community members evaluate a proposal in a CVE. However, one could just as easily substitute leader with politician, teacher, lawyer, leader, or missionary, and community members with voters, students, jurors, members, or atheists. Hence, the theoretical parameters and implications of TSI apply across many different contexts.

3.1 Transforming Self Representations

In CVEs, avatars representing interactants can bear varying degrees of resemblance to the interactants they represent: photographic or anthropomorphic (Garau et al., 2003; Bailenson, Beall, Blascovich, Raimmundo, & Weisbuch, 2001; Sannier & Thalmann, 1998), behavioral (Bailenson et al., 2002; Cassell, 2000; Biocca, 1997), and even dispositional. Assume that interactants (by their own design or through the actions of systems operators) have the freedom to vary both the photographic and behavioral similarity of their avatars to themselves. A number of subtle transformations can then occur, some with potentially drastic consequences for the outcomes of CVE interactions.

In many instances, similarity breeds attraction (Byrne, 1971). We know that people treat avatars that look like themselves more intimately than avatars that look like others. This is indicated by how people invade the avatars' personal space, by their willingness to perform embarrassing acts in front of them, and by how attractive and likable they believe the avatars to be (Bailenson, Blascovich, Beall, & Guadagno, 2004; Bailenson et al., 2001). Given this special relationship, a CVE interactant may use this principle to her advantage. Consider the situation in which a leader and a community member are negotiating via a CVE. A particularly devious leader can represent herself by incorporating characteristics of the member's representation. By making herself appear more similar to the member, the leader becomes substantially more persuasive (Chaiken, 1979; Simons, 1976). Indeed, a leader would be able to adjust the structural or textural similarity of her own avatar idiosyncratically for each member of her audience.

This similarity could be achieved in various ways, using any of a number of 2D and 3D morphing techniques that parametrically vary the similarity of computer-generated models (Blanz & Vetter, 1999; Busey, 1998; Decarlo, Metaxas, & Stone, 1998). The leader could be represented as some kind of hybrid, maintaining some percentage of her original facial structure and texture while also incorporating percentages of the member's structure and texture. Alternatively, the leader could be rendered with her own facial structure, completely veridically, except that for a few frames per second her head is replaced with the member's head. Priming familiarity through limited exposure to human faces has proven effective with 2D images (Zajonc, 1971). Finally, consider the situation in which the leader is interacting via CVE with two members. The leader can be represented differently to the two members simultaneously, such that each member sees a different hybrid leader avatar incorporating aspects of that member. In other words, the leader does not need a consistent representation across interactants, because the CVE operator is free to render a different leader avatar to each member.
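A minimal sketch of the hybrid-avatar idea follows, assuming both face models share one topology, so that vertex *i* and texel *i* correspond across faces, as in a morphable-model parameterization. It linearly interpolates shape and texture with separate weights and builds a different hybrid for each member. Linear interpolation is only the simplest morph. The names `FaceModel`, `morph`, and `per_viewer_avatars` are hypothetical, and no particular morphing system is implied.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Vec3 = Tuple[float, float, float]
RGB = Tuple[float, float, float]


@dataclass
class FaceModel:
    # Assumption: all FaceModels share one topology, so vertex i and texel i
    # correspond across faces (as in a morphable-model parameterization).
    vertices: List[Vec3]
    texture: List[RGB]


def _lerp(a, b, t):
    """Componentwise linear interpolation between tuples a and b."""
    return tuple((1.0 - t) * x + t * y for x, y in zip(a, b))


def morph(leader: FaceModel, member: FaceModel, shape_w: float, texture_w: float) -> FaceModel:
    """Blend the leader toward a member: weight 0 keeps the leader, 1 gives the member."""
    if len(leader.vertices) != len(member.vertices) or len(leader.texture) != len(member.texture):
        raise ValueError("models must share topology")
    return FaceModel(
        vertices=[_lerp(a, b, shape_w) for a, b in zip(leader.vertices, member.vertices)],
        texture=[_lerp(a, b, texture_w) for a, b in zip(leader.texture, member.texture)],
    )


def per_viewer_avatars(leader: FaceModel, members: Dict[str, FaceModel], w: float) -> Dict[str, FaceModel]:
    """Build a different hybrid leader for each member, mixing in that member's face."""
    return {name: morph(leader, face, w, w) for name, face in members.items()}
```

Keeping `shape_w` and `texture_w` separate reflects the distinction drawn above between structural and textural similarity.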

Incorporating the self-identity of other interactants can also occur via behavioral characteristics. Psychological research has demonstrated the effect of an experimenter subtly mimicking participants (e.g., leaning in the same direction as they do, or crossing her legs when they do). Participants subsequently report that they liked the experimenter more and that the conversation flowed more smoothly (Chartrand & Bargh, 1999). This "chameleon effect" could be extremely effective in CVEs. The leader (or the system operator) can use algorithms to detect the motions of the other interactants at varying levels of detail and coordinate her avatar's animations as a blended combination of her own motions and those of the others.
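One simple way to realize such blended animation is sketched below. On each frame, the leader's rendered head pose is a weighted mix of her own pose and a member's pose from a few frames earlier. The lag, the weight, and the use of plain linear blending are all assumptions. Linear blending is only reasonable for small head angles, because it ignores angle wrap-around. The function name is hypothetical.

```python
from typing import Sequence, Tuple

Pose = Tuple[float, float, float]  # (pitch, yaw, roll) in degrees


def chameleon_pose(own: Pose, member_history: Sequence[Pose], weight: float, lag: int) -> Pose:
    """Blend the leader's own head pose with a member's pose from `lag` frames ago.

    weight = 0 renders the leader veridically; weight = 1 renders pure mimicry.
    Assumes small head angles, so linear blending is acceptable (no wrap-around).
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be in [0, 1]")
    if len(member_history) <= lag:
        return own  # not enough history yet: render the leader unchanged
    target = member_history[-1 - lag]  # member_history[-1] is the newest frame
    return tuple((1.0 - weight) * o + weight * m for o, m in zip(own, target))
```

A nonzero lag reflects the fact that, in the Chartrand and Bargh studies, the mimicry followed the participant's behavior rather than happening simultaneously. The specific value is a design choice, not something the text prescribes.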

Consider a CVE interaction consisting of a leader and two members. In the course of this interaction, patterns of nonverbal behavior will emerge, and statistics based on a running tabulation can be collected automatically via CVE technology. In other words, if person A exhibits one rate of head nodding and person B another, the leader's head can be made to nod in a way consistent with those statistics (e.g., an average or median). Alternatively, the leader's avatar can simply mimic each interactant individually and render those particular movements only to the corresponding interactant.
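The running tabulation described above can be sketched as follows. This is a hypothetical helper, assuming an upstream gesture recognizer reports each detected nod as a (person, timestamp) event. It computes each member's nod rate and derives a target rate for the leader's avatar from their mean or median.

```python
import statistics
from collections import defaultdict
from typing import Dict, List


class NodTally:
    """Running tabulation of head nods per interactant (hypothetical helper).

    Assumes an upstream detector reports each nod as (person, time in seconds).
    """

    def __init__(self) -> None:
        self.nods: Dict[str, List[float]] = defaultdict(list)

    def record(self, person: str, t: float) -> None:
        self.nods[person].append(t)

    def rate_per_min(self, person: str, elapsed_s: float) -> float:
        """Nods per minute over the elapsed interaction time."""
        return 60.0 * len(self.nods[person]) / elapsed_s

    def leader_target_rate(self, people: List[str], elapsed_s: float, how: str = "mean") -> float:
        """Nod rate for the leader's avatar: the mean or median of the members' rates."""
        rates = [self.rate_per_min(p, elapsed_s) for p in people]
        return statistics.median(rates) if how == "median" else statistics.mean(rates)
```

For the per-member alternative, no aggregate is needed. Each member's own motion stream would be mimicked and rendered only to that member, as in the blending sketch for the chameleon effect.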

The leader could also morph her representation with that of a party who is not present in the CVE but is known to possess qualities that inspire certain reactions. Depending on the context, for example, the leader can morph a percentage of famous politicians, historical figures, or even pop stars into her avatar. This feature blending can be explicit and blatant (e.g., the leader looks just like an expert or a religious figure) or more implicit and subterranean (e.g., the leader incorporates subtle features such as cheekbones and hairstyle). Alternatively, the leader can morph herself with a person who may not be famous but in whom the member places deep trust (Gibson, 1984).

A second form of avatar transformation arises from the ability to selectively decouple and reconstruct rendered behavior in CVEs. Not only can interactants render nonverbal behaviors that differ from the ones they actually perform; as in the discussion above, they can also render those behaviors idiosyncratically for each of the other interactants.

Consider what we term Non-Zero-Sum Mutual Gaze (NZSMG). Ordinary mutual gaze occurs when individuals look at one another's eyes during discourse. In face-to-face conversation, mutual gaze is zero-sum. In other words, if interactant A maintains eye contact with interactant B for 70 percent of the time, A cannot maintain eye contact with interactant C for more than 30 percent of the time. However, interaction in CVEs is not bound by this constraint. With digital avatars, A can be made to appear to maintain mutual gaze with both B and C for a majority of the conversation.
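The arithmetic of NZSMG can be made concrete with a small simulation. Physically, the leader's gaze time is divided among the members. With per-viewer rendering, each augmented viewer can see the avatar looking at them whenever the leader is looking at any member. The rendering rule here is one illustrative assumption, not the algorithm used by Beall et al. (2003). The function names are hypothetical.

```python
from typing import Dict, List, Optional


def rendered_gaze(actual: Optional[str], viewer: str, members: List[str], augment: bool) -> Optional[str]:
    """Gaze target of the leader's avatar as drawn in `viewer`'s display.

    Illustrative NZSMG rule (an assumption): whenever the leader actually looks
    at any member, an augmented display shows the avatar looking at its viewer.
    """
    if augment and actual in members:
        return viewer
    return actual  # veridical rendering (also used when looking away)


def gaze_shares(frames: List[Optional[str]], members: List[str], augment: bool) -> Dict[str, float]:
    """Fraction of frames in which each member sees the avatar looking at them."""
    seen = {m: 0 for m in members}
    for actual in frames:
        for viewer in members:
            if rendered_gaze(actual, viewer, members, augment) == viewer:
                seen[viewer] += 1
    return {m: n / len(frames) for m, n in seen.items()}
```

With 7 frames directed at B and 3 at C, the veridical shares are 0.7 and 0.3, which is the zero-sum case. The augmented rule gives each member 1.0.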

Gaze is one of the most thoroughly studied nonverbal gestures in research on social interaction (Rutter, 1984; Kleinke, 1986; Kendon, 1977). Direct eye gaze can provide cues for intimacy, agreement, and interest (Argyle, 1988). Furthermore, gaze can enhance learning during instruction as well as memory for information (Fry & Smith, 1975; Sherwood, 1987). The advantage of using CVEs is that the normal nonverbal behaviors of interactants can be augmented via NZSMG. Furthermore, as Figure 1 demonstrates, the interactants in a CVE can be either unaware of this transformation (i.e., implicit NZSMG) or aware of it (i.e., explicit NZSMG). Preliminary work on implicit NZSMG has demonstrated that interactants are not aware of the decoupling from actual behavior and that they respond to the artificial gaze as if it were actual gaze (Beall, Bailenson, Loomis, Blascovich, & Rex, 2003). This method may prove most effective for distance learning in educational CVEs (Morgan, Kriz, Howard, das Neves, & Kelso, 2001), where the instructor can use her augmented gaze as a tool to keep the students more engaged.

Decoupling can also be used to achieve the opposite effect. Consider the situation where the leader wants to scrutinize the nonverbal behaviors of member A but does not want her unwavering gaze to make the member feel uncomfortable. The leader can render herself looking at her shoes, or perhaps at member B, in the CVE, while in reality she is watching member A's every move.

In order for such a system to be effective, there must be a convincing algorithm to drive the autonomous eye gaze. Suppose the leader wants the freedom to employ NZSMG, or to wander around the CVE scrutinizing different aspects of the conversation undetected. She (by her own device or with help from the systems operator) must then maintain the illusion that her avatar is exhibiting typical and appropriate nonverbal gestures. There are a number of ways to achieve this. The first is some type of artificial intelligence algorithm that approximates appropriate gestures for the leader's avatar by monitoring the gestures and speech of the other interactants. While there have been significant advances in this regard (Cassell, 2000), the ability of an algorithm to process natural language as well as generate believable responses may still be many years off. A more likely method for achieving this goal would be to use actual humans instead of AI algorithms. In this scenario, the leader employs one or more nonverbal "cyranoids" (Milgram, 1992) to augment the nonverbal behaviors presented to each individual member. To do so, the leader solicits the help of several assistants, each of whose job is to provide the nonverbal behaviors targeted toward a particular member. See Figure 2.

Figure 1. Internal belief states from implicit NZSMG (left) and explicit NZSMG (right).

In this many-to-many "Wizard of Oz" implementation, each member is presented with a unified leader who is rendered privately to her. This private representation would be a melding of the actual leader and one of her assistants. When the leader's attention was diverted away from that member for long periods of time, the assistant could step in and help maintain a believable interaction by seamlessly serving as the leader's proxy. The leader herself would then act as a conductor, overseeing all the interactions yet remaining free to focus her attention on individual members when she so desires. In addition, the leader is free to wander about the digital space, consult her notes, take a rest, or conduct a sidebar meeting with another person. Because her avatar is partially cyranic, it can continue to exhibit appropriate nonverbal behaviors to each member all the while. Furthermore, having a number of assistants whose sole focus is to respond with appropriate nonverbal gestures to each of the interactants in the CVE should maximize the members' involvement or sense of presence in the CVE. For important meetings, seminars, or presentations conducted via CVEs, individual interactants may want to use a number of assistants as a core presentation team.
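The hand-off between the leader and her assistants can be sketched as a per-member routing decision. Each member's view of the leader's avatar is driven by the leader's own tracked nonverbal stream while she has recently attended to that member. Otherwise, it is driven by that member's dedicated assistant. Only the nonverbal channel is routed here, since in this scenario every member still hears the leader's actual voice. The class name and the `max_away_s` threshold are hypothetical assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class CyranoidRouter:
    """Chooses, per member, whose nonverbal stream drives the leader's avatar.

    Hypothetical sketch: `assistants` maps each member to a dedicated gesturer,
    and `max_away_s` is an assumed threshold for how long the leader's attention
    may leave a member before that member's assistant takes over.
    """
    assistants: Dict[str, str]
    max_away_s: float = 5.0
    last_attended: Dict[str, float] = field(default_factory=dict)

    def attend(self, member: str, t: float) -> None:
        """Record that the leader is attending to `member` at time t (seconds)."""
        self.last_attended[member] = t

    def source_for(self, member: str, now: float) -> str:
        """Return "leader" or the assistant whose gestures this member should see."""
        last = self.last_attended.get(member)
        if last is not None and now - last <= self.max_away_s:
            return "leader"
        return self.assistants[member]
```

A real system would also need to blend smoothly at the moment of hand-off rather than switch abruptly, so that the proxy remains seamless.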

3.2 Transforming Sensory Capabilities

Interactants can be assisted by technology that exploits the ability of CVEs to keep precise running tabs on certain types of behavior and then display summaries of those behaviors exclusively to individual interactants. For example, consider an educational CVE in which an instructor wants to ensure that she is directing her nonverbal behaviors in a desired fashion. Such an instructor may want to monitor her mutual gaze to ensure that she is not looking at any one student more than others during a presentation. The tracking equipment used to render the scene can keep an online total of the amount of time the instructor gazed at each individual student. The CVE can render a display of this gaze meter and use visual or auditory alerts to inform the instructor of disproportionate applications of gaze.
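A gaze meter of this kind amounts to an accumulator plus an alert rule. The sketch below assumes the tracker reports, for each frame, which student (if any) the instructor is looking at and the frame duration. It flags any student whose share of gaze exceeds an equal share by more than an assumed tolerance. The class name, the tolerance, and the alert rule are hypothetical choices.

```python
from typing import Dict, Iterable, List, Optional


class GazeMeter:
    """Online total of instructor gaze time per student (hypothetical helper).

    Assumes the tracker reports, each frame, which student (if any) the
    instructor is looking at, plus the frame duration in seconds.
    """

    def __init__(self, students: Iterable[str], tolerance: float = 0.5) -> None:
        self.seconds: Dict[str, float] = {s: 0.0 for s in students}
        self.tolerance = tolerance  # allowed excess over an equal share (assumed)

    def update(self, target: Optional[str], dt: float) -> None:
        if target in self.seconds:  # ignore frames spent looking elsewhere
            self.seconds[target] += dt

    def shares(self) -> Dict[str, float]:
        total = sum(self.seconds.values())
        if total == 0:
            return {s: 0.0 for s in self.seconds}
        return {s: t / total for s, t in self.seconds.items()}

    def over_gazed(self) -> List[str]:
        """Students receiving more than (1 + tolerance) times an equal share."""
        limit = (1.0 + self.tolerance) / len(self.seconds)
        return [s for s, share in self.shares().items() if share > limit]
```

The list returned by `over_gazed` is what would trigger the visual or auditory alert. The same shares could also drive an on-screen meter.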

Furthermore, interactants can use the tracking data summaries to learn more about the attitudes of others. Nonverbal gestures are often correlates of specific mental states (Ekman, 1978; Zajonc, Murphy, & Inglehart, 1989). For example, in general we nod when we agree, smile when we are pleased, tilt our heads when we are confused, and look at something in which we are interested. Interactants will be able to tailor their CVE systems to keep track of nonverbal behaviors with the goal of helping them infer the mental states of the other interactants. For example, a teacher will be able to gauge the percentage of students exhibiting nonverbal behaviors that suggest confusion or a failure to understand a point in a lesson. Similarly, a leader could determine who in a room full of members is responding most positively to her behavior. Intuitively tabulating and assessing the nonverbal behaviors of others is certainly something that humans do constantly in face-to-face interactions; with CVEs, interactants will be able to tabulate these behaviors with greater precision. Interactants can use the objective tabulations from the tracking data to augment their normal intuitions about the gestures occurring in the interaction.
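One way to turn such tabulations into the "percentage of confused students" the text describes is a simple lookup from detected cues to candidate mental states. The sketch assumes a hypothetical upstream recognizer that already labels each student's recent behavior with cues such as `"head_tilt"`; the cue-to-state table merely restates the examples above and is a rough heuristic, not a validated classifier.

```python
# Cue -> inferred state, restating the examples in the text (heuristic only).
CUE_TO_STATE = {
    "nod": "agreement",
    "smile": "pleased",
    "head_tilt": "confused",
}


def state_percentages(cues_by_student):
    """Percentage of students showing at least one cue linked to each state.

    `cues_by_student` maps a student id to the set of cue labels an
    (assumed) upstream recognizer detected in the last time window.
    """
    n = len(cues_by_student)
    counts = {state: 0 for state in set(CUE_TO_STATE.values())}
    for cues in cues_by_student.values():
        # Count each state at most once per student, ignoring unknown cues.
        states = {CUE_TO_STATE[c] for c in cues if c in CUE_TO_STATE}
        for state in states:
            counts[state] += 1
    return {state: 100.0 * k / n if n else 0.0 for state, k in counts.items()}


window = {
    "s1": {"head_tilt"},
    "s2": {"nod", "smile"},
    "s3": {"head_tilt", "gaze_away"},
    "s4": set(),
}
print(state_percentages(window)["confused"])  # 50.0
```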

Figure 2. A depiction of cyranoids. On the top row are three nonrendered gesturers. Each member on the bottom hears the leader's actual verbal behaviors (dashed lines). However, each member views the nonverbal behavior of her dedicated gesturer rendered onto the avatar of the leader (unbroken lines).

Another transformation involves filtering or degrading certain signals or nonverbal behaviors. Some visual nonverbal behaviors tend to distract interactants. Using filtering algorithms, interactants can prevent counterproductive distractions in a number of ways. For example, consider the situation in which a speaker in a CVE taps her pen rapidly as she speaks. In face-to-face meetings, this type of behavior can distract interactants. Using a CVE, this type of behavior can be filtered in two ways. First, the speaker can filter the behavior on the transmitting end. If people know that they have difficulty suppressing certain nonverbal behaviors that tend to be perceived negatively, such as a nervous tic, they can activate a filter that prevents the behavior from being rendered. Similarly, in certain situations a CVE interactant may not want to render certain nonverbal behaviors. Consider the leader example: the potential member may benefit from rendering her "poker face," that is, not demonstrating any enthusiasm or disappointment via facial expressions. Consequently, the member may gain a strategic advantage during a negotiation. Second, interactants can filter behaviors on the receiving end. If a speaker's hand motions are distracting, then a listener can simply choose not to render that interactant's hand movements.
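The two filtering points, at the transmitting end and at the receiving end, can be sketched as channel masks applied to a frame of tracked behavior. Everything here is a hypothetical illustration: the channel names, the `NEUTRAL` substitution used for a "poker face," and the frame format are assumptions made for the example.

```python
# A frame maps nonverbal channel names to values (hypothetical format).
NEUTRAL = {"face": "neutral", "right_hand": "rest", "left_hand": "rest", "head": "forward"}


def sender_filter(frame, suppressed):
    """Transmitting end: replace channels the speaker chose to hide (e.g. a
    nervous tic, or the face for a 'poker face') with neutral values.
    Channels with no defined neutral value are sent as None."""
    return {ch: (NEUTRAL.get(ch) if ch in suppressed else val)
            for ch, val in frame.items()}


def receiver_filter(frame, hidden):
    """Receiving end: the listener simply does not render chosen channels."""
    return {ch: val for ch, val in frame.items() if ch not in hidden}


raw = {"face": "smile", "right_hand": "tap_pen", "head": "nod"}
sent = sender_filter(raw, suppressed={"face"})        # poker face
shown = receiver_filter(sent, hidden={"right_hand"})  # listener hides pen tapping
print(shown)  # {'face': 'neutral', 'head': 'nod'}
```

The asymmetry is deliberate in this sketch: a sender-side filter still has to transmit something plausible to everyone, so it substitutes a neutral value, whereas a receiver-side filter affects only one listener's view and can drop the channel outright.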

Another example of transforming sensory capabilities is producing a visual indicator of where each interactant's attention currently lies, as revealed by their eye direction (Velichkovsky, 1995). We have explored a technique that involves rendering each person's view frustum to indicate the field of view, as Figure 3 illustrates. In this example, the wire-frame frustums spotlight the 3D space visible to each person. This feature, color-coded for each person, may be especially helpful to teachers in a distance learning CVE, who could use such information to see where students are focusing their visual attention without having to look directly at the students' eyes.
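Whether a given point falls inside someone's frustum, and so would be lit by the color-coded wire frame, can be approximated with a cone test around the head's forward vector. This is a simplifying assumption: a true view frustum is a truncated pyramid with separate horizontal and vertical fields of view plus near and far planes, whereas the sketch below treats it as a circular cone with a hypothetical `half_fov_deg` and `far` distance.

```python
import math


def in_view_cone(eye, forward, point, half_fov_deg=45.0, far=20.0):
    """Approximate visibility test: is `point` within a cone of half-angle
    `half_fov_deg` around `forward` (need not be normalized, must be nonzero)
    starting at `eye`, and no farther than `far` units away?"""
    to_pt = [p - e for p, e in zip(point, eye)]
    dist = math.sqrt(sum(c * c for c in to_pt))
    if dist == 0.0:
        return True
    if dist > far:
        return False
    f_len = math.sqrt(sum(c * c for c in forward))
    cos_angle = sum(a * b for a, b in zip(to_pt, forward)) / (dist * f_len)
    return cos_angle >= math.cos(math.radians(half_fov_deg))


student_eye = (0.0, 1.2, 0.0)
student_forward = (0.0, 0.0, -1.0)  # looking down -z
whiteboard = (0.5, 1.5, -4.0)
teacher = (3.0, 1.6, 1.0)           # behind and to the side
print(in_view_cone(student_eye, student_forward, whiteboard))  # True
print(in_view_cone(student_eye, student_forward, teacher))     # False
```

Running this test for every object and every student would let a teacher's display summarize what each student can currently see, rather than only drawing the frustums.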

Figure 3. View frustums marking the field of view of interactants.

There are a number of similar tools (i.e., specific objects rendered only to particular interactants) that can assist interactants in a CVE. For example, in our NZSMG studies, an experimenter enters a CVE and attempts to persuade other interactants regarding a certain topic (Beall et al., 2003). In those interactions, we render the interactants' names over their heads on floating billboards for the experimenter to read. In this manner, the experimenter can refer to people by name more easily. There are many other ways to use these floating billboards to assist interactants, for example, reminders about an interactant's preferences or personality (e.g., "doesn't respond well to prolonged mutual gaze").
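Objects "rendered only to particular interactants," such as the name billboards, amount to attaching a visibility set to each scene object and filtering per viewer before drawing. The sketch below is a hypothetical data model, not the lab's software; an object with no visibility set is treated as shared by everyone.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass
class SceneObject:
    name: str
    label: str = ""
    # None means every interactant sees it; otherwise only the listed viewers do.
    visible_to: Optional[FrozenSet[str]] = None


def render_list(scene, viewer):
    """Objects that should be drawn in `viewer`'s private rendering of the scene."""
    return [o for o in scene if o.visible_to is None or viewer in o.visible_to]


scene = [
    SceneObject("table"),
    SceneObject("avatar_kim"),
    SceneObject("billboard_kim", label="Kim: doesn't respond well to prolonged mutual gaze",
                visible_to=frozenset({"experimenter"})),
]
print([o.name for o in render_list(scene, "experimenter")])
# ['table', 'avatar_kim', 'billboard_kim']
print([o.name for o in render_list(scene, "kim")])
# ['table', 'avatar_kim']
```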

One of the most useful forms of transforming sensory capabilities may be to enlist one or more human consultants who are rendered to only one member in a CVE (i.e., virtual ghosts). Unlike a face-to-face interaction, a CVE will enable an interactant to have informed human consultants who are free to wander around the virtual meeting space, to scrutinize the actions of other interactants, to conduct online research and sidebar meetings in order to provide key interactants with additional information, and generally to provide support for the interactants. For example, the leader can have her research team actually rendered beside her in the CVE. Members of her team can point out actions by potential members, suggest new strategies, and even provide real-time criticism and feedback concerning the behavior of the leader, without any of the other members having even a hint of awareness of the human consultants' presence. Alternatively, the leader herself can go into "ghost mode" and explore the virtual world with her team while her avatar remains seated and is even controlled by yet another member of her team.

3.3 Transforming the Situation

In addition to transforming their representation and sensory capabilities, CVE interactants can also use algorithms to transform their general spatial or temporal situations. In a CVE, people generally adopt a spatially coherent situational context across all remote interactants that brings everyone together in the shared space. However, there is no reason that the details and arrangements of that virtual space need to be constant for all the interactants in the CVE. Consider the situation for three interactants. Interactant A may choose to form an isosceles triangle with the other two, while both interactants B and C may choose to form equilateral triangles. Interactant A may even choose to flip the locations of B and C. In this scenario, the CVE operating system can preserve the intended eye gaze direction by transforming the amplitudes or direction of head and eye movements in a prescribed manner. While this is a somewhat simple example, with as many as four interactants it is straightforward to design spatial transformations that allow the intended eye and head gaze cues to remain intact across all interactants. While such discordance may eventually cause the quality and smoothness of the interaction to suffer, there are a number of ways that transforming the situation can assist individual interactants.
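The gaze-preserving transformation can be sketched by storing whom each person is looking at rather than a raw head direction, then recomputing a head yaw toward that target inside each viewer's private seating layout. This is a 2D, yaw-only simplification of the head and eye transformations described above; the seat coordinates and layout dictionaries are hypothetical.

```python
import math


def yaw_toward(src, dst):
    """Heading in degrees (0 = +x axis, counterclockwise) from seat src to seat dst."""
    return math.degrees(math.atan2(dst[1] - src[1], dst[0] - src[0]))


def rendered_yaw(layout, looker, target):
    """Head yaw to render on `looker`'s avatar in one viewer's private `layout`
    (name -> (x, y) seat) so that the avatar still faces its intended `target`."""
    return yaw_toward(layout[looker], layout[target])


# Viewer A's layout swaps B and C relative to the layout B and C share.
shared = {"A": (0.0, 0.0), "B": (2.0, 1.0), "C": (2.0, -1.0)}
a_view = {"A": (0.0, 0.0), "B": (2.0, -1.0), "C": (2.0, 1.0)}

# B is actually looking at C; each viewer gets a yaw pointing at C in its own layout.
print(round(rendered_yaw(shared, "B", "C"), 1))  # -90.0
print(round(rendered_yaw(a_view, "B", "C"), 1))  # 90.0
```

Because the transmitted quantity is the gaze target rather than the head angle, the rendered cue stays semantically correct even though no two viewers see the same geometry.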

One such transformation involves multilateral perspectives. In a normal conversation, each interactant has a unique and privileged perspective. That perspective is a combination of her sensory input (e.g., visual and acoustic fields of view) and her internal beliefs about the interaction. In normal face-to-face interactions, people continually use sensory input to update and adjust their internal beliefs (Kendon, 1977). Interactants in a CVE will possess a completely new mechanism to adjust and update internal beliefs: a person's viewpoint can be multilateral as opposed to unilateral (normal). In other words, in a real-time conversation, interactant A can take the viewpoint of interactant B and perceive herself as she performs various verbal and nonverbal gestures during the interaction. In this manner, she can acquire invaluable sensory information pertaining to the interaction and update her internal beliefs concerning the interaction in ways not possible without the CVE.

Consequently, interactants in educational and persuasive interactions may be able to improve performance, because seeing oneself through the eyes of another may allow one to develop a more informed set of internal beliefs about others (Baumeister, 1998). Furthermore, being able to experience an interaction through someone else's eyes may reinforce the fact that one is indeed copresent in the CVE (e.g., Durlach & Slater, 2000). Finally, utilizing multilateral perspectives may assist students in distance learning CVEs in terms of training transfer effects (Rickel & Johnson, 2000) that might occur when an interactant who has been trained in multilateral perspective taking performs similar group tasks in nonmediated situations.

A second situational transformation involves partially recording the interaction and adjusting temporal properties or sequences in real time. Similar to commercial products sold for digitally recording and playing back broadcast television, interactants in a CVE should be able to accelerate and decelerate the perceived flow of time during the mediated interaction. Consider the following situation. The student in a distance learning CVE does not understand an example that the instructor provides. The student can "rewind" the recorded interaction, go back to the beginning of the confusing example, and then play back the example. Once the student has understood the confusing example, she can turn up the rate of playback (e.g., watch the sequence at 2X speed), and eventually she can catch up to the instructor again. By slowing down or speeding up the rendered flow of time, the interactant can focus differentially on particular topics and can review the same scene from different points of view without missing the remainder of the interaction. Of course, doing so will result in costs to that interactant's contribution to the CVE in terms of interactivity (i.e., what does her avatar do while she rewinds?). Consequently, the disruption of the temporal sequence will necessarily be coupled with some kind of avatar autopilot.
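The catch-up arithmetic is simple. If the student is \(L\) seconds behind the live interaction and plays back at rate \(s > 1\), she gains \(s - 1\) seconds of recorded material per wall-clock second, so she rejoins the live stream after \(L/(s - 1)\) seconds; while she rewatches at 1X, her lag stays constant. The sketch below computes this and checks it by brute-force stepping; the numbers in the usage lines are illustrative, not taken from the paper.

```python
def catch_up_seconds(lag_s, rate):
    """Wall-clock seconds needed to return to live playback when `lag_s`
    seconds behind and replaying the recording at `rate` times real time."""
    if rate <= 1.0:
        raise ValueError("playback rate must exceed 1x to catch up")
    return lag_s / (rate - 1.0)


def simulate_catch_up(lag_s, rate, dt=0.01):
    """Brute-force check: advance the live head at 1x and the playback head
    at `rate` x until playback reaches live; return elapsed wall-clock seconds."""
    live, play, elapsed = lag_s, 0.0, 0.0
    while play < live:
        live += dt
        play += rate * dt
        elapsed += dt
    return elapsed


# Illustrative numbers: 45 s behind at 1.5X, or 60 s behind at 2X.
print(catch_up_seconds(45, 1.5))  # 90.0
print(catch_up_seconds(60, 2.0))  # 60.0
```

The same quantity bounds how long the avatar autopilot must stand in for her: the time spent rewatching at 1X plus the catch-up time.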

Changing the rate of time in a CVE brings up another interesting transformation. Traditionally, CVEs are roughly defined as "geographically separated interactants" interacting over some kind of computer-mediated network in a shared environment. However, by combining some of the concepts discussed in previous sections, it may be possible to include in the definition of a CVE "temporally separated interactants" in a shared environment. Consider a videoconference of a business meeting. Oftentimes, interested parties who cannot attend the meeting will later review a videotape of it. In a CVE, the temporally absent member has an option to involve herself more deeply in the interaction. Specifically, she can situate her avatar in a specific place in the CVE's seating arrangement and use an autopilot to give her representation rudimentary nonverbal behaviors. Furthermore, the absent member can program her avatar to perform simple interactive tasks: prerecorded introductions, answers to certain questions about the CVE topic, or, perhaps more realistically for the near term, playback of a recorded performance. The CVE interaction can then proceed in real time, with the temporally absent member's avatar approximating the types of behaviors that she would do and say. As a result, temporally present members would actually direct pieces of the conversation towards the absent member, as well as transmit nonverbal gestures towards her. Later on, instead of just reviewing the recording, the temporally absent member can take her place in the CVE and actually feel present in the dialogue, receiving appropriate nonverbal behaviors and maximizing the degree of copresence. Moreover, the members of the CVE who were present at the scheduled time can program their avatars, during the replay of the interaction, to respond to any post hoc questions that the absent member might have. In this way, the degree of interactivity during the replay can be increased, and perhaps at some point in the not-too-distant future, the line between real-time and non-real-time interactions will become interestingly blurred.
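A rudimentary autopilot of the kind described, an avatar that gives a prerecorded introduction, answers a few anticipated questions, and otherwise idles plausibly, might look like the following. The keyword matching, the clip names, and the idle fallback are hypothetical placeholders; a deployed system would need far more robust question detection than splitting on whitespace.

```python
import random


class Autopilot:
    """Stand-in behavior for a temporally absent member's avatar."""

    def __init__(self, intro_clip, answers, idle_clips, seed=None):
        self.intro_clip = intro_clip
        self.answers = answers        # keyword -> prerecorded answer clip
        self.idle_clips = idle_clips  # rudimentary nonverbal filler
        self.rng = random.Random(seed)
        self.introduced = False

    def next_clip(self, heard_text=""):
        """Return the name of the clip the avatar should play next."""
        if not self.introduced:
            self.introduced = True
            return self.intro_clip
        words = heard_text.lower().split()
        for keyword, clip in self.answers.items():
            if keyword in words:
                return clip
        return self.rng.choice(self.idle_clips)


pilot = Autopilot("intro.rec", {"budget": "budget_answer.rec"},
                  ["nod.rec", "glance_left.rec"], seed=1)
print(pilot.next_clip())                              # intro.rec
print(pilot.next_clip("what about the budget"))       # budget_answer.rec
print(pilot.next_clip("thanks") in pilot.idle_clips)  # True
```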

4 Implications of TSI and Research Directions

For better or for worse, TSI implemented through CVEs has great potential to change the nature of mediated interaction. The strategic decoupling of rendered behavior from actual behavior allows interactants to break many constraints that are inherent in face-to-face interaction, as well as in other forms of mediated interaction such as telephone and videophone conferences. The effects of TSI remain to be seen. Assuming that implementation of the TSI techniques is technically feasible and that using TSI implementations is conceptually workable for the interactants (both of which are substantial assumptions), one could predict a number of consequences. First, TSI may develop into a worthwhile tool that assists interactants in overcoming the inadequacies of communicating from remote locations. By augmenting their representational, sensory, and situational characteristics, interactants in CVEs may be able to achieve levels of interaction that actually surpass face-to-face interaction.

On the other hand, people may in fact find the use of these transformations extremely unsettling. The difference between TSI and current CVE implementations could be as drastic as the difference between email and the written letter. As this technology is developed, it is essential to examine people's responses to this new medium (e.g., Reeves & Nass, 1996), and to examine these important potential implications of TSI before the technology becomes widespread.

Along the same lines, the threat of TSI may be the very downfall of CVE interaction. In face-to-face interaction there tends to be some degree of deception, for example, people using facial expressions to mask their emotions. Clearly, this deception has the potential to be much greater with TSI. If interactants have no faith that their perceptual experience is genuine, they may have little reason ever to enter a CVE. A complete lack of trust in the truthfulness of gestures, the one-to-one correspondence of avatars, and the temporal presence of interactants has the potential to rob the CVE of one of its greatest strengths, namely interactivity, since interactants may not know who, what, or when they are interacting with. Similarly, given an expectation of TSI, interactants may be constantly suspicious during interactions; this lack of trust in fellow interactants may lead to unproductive collaborations.

A solution to this breakdown may require the development of TSI detectors for interactants, based either on computer algorithms that analyze nonverbal behaviors or on actual humans who scrutinize the interaction. To examine the possibility of using human TSI detectors, we now discuss what we call the nonverbal Turing Test (NVTT).

In the popular reinterpretation of the Turing Test (Turing, 1950), a judge attempts to determine which of two players (one human, one machine) is a fellow human. In our NVTT pilot studies, experimental participants acting as judges enter a CVE with two virtual humans: one avatar whose head movements are veridical, playing back the movements of another human in real time (i.e., without TSI), and one avatar whose head movements are actually a transformation of the judge's own head movements (i.e., time-delayed and reduced in range of motion). The judge sees the head movements of a real person on one avatar and self-mimicked movements on the other. During the interaction, only head movements are permitted (i.e., no verbal communication is allowed), and participants must devise ways to engage and test the two virtual humans through nonverbal means in order to ascertain which is human and which is a mimicker. Figure 4 illustrates this setup.

In the current initial pilot study, we manipulated three independent variables: test trial length (either 16 or 32 seconds), mimic delay (i.e., the computer agent mimics either 1, 2, 4, or 8 seconds after the judge's movements), and range of motion (high: pitch, yaw, and roll; or low: yaw only).
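The computer agent's behavior, replaying the judge's own head orientation after a fixed delay and, in the low range condition, keeping only yaw, can be sketched with a queue of timestamped samples. The sample rate, the data layout, applying the range-of-motion condition by zeroing pitch and roll, and holding a neutral pose until enough history exists are all assumptions for illustration; the text does not say how the mimic was implemented.

```python
from collections import deque


class HeadMimic:
    """Replays a judge's head orientation `delay_s` seconds late."""

    def __init__(self, delay_s, yaw_only=False):
        self.delay_s = delay_s
        self.yaw_only = yaw_only          # "low" range-of-motion condition
        self.history = deque()            # (timestamp_s, (pitch, yaw, roll))
        self.current = (0.0, 0.0, 0.0)    # neutral pose until history exists

    def observe(self, t, pitch, yaw, roll):
        """Record one tracked sample of the judge's head (degrees)."""
        self.history.append((t, (pitch, yaw, roll)))

    def output(self, now):
        """Pose to render on the mimic avatar at time `now`: the latest judge
        sample at least `delay_s` seconds old. Assumes non-decreasing `now`."""
        target = now - self.delay_s
        while self.history and self.history[0][0] <= target:
            _, self.current = self.history.popleft()
        pitch, yaw, roll = self.current
        if self.yaw_only:
            return (0.0, yaw, 0.0)
        return (pitch, yaw, roll)


mimic = HeadMimic(delay_s=1.0, yaw_only=True)
for i in range(31):  # 3 s of samples at 10 Hz
    t = i / 10
    mimic.observe(t, pitch=5.0, yaw=10.0 * t, roll=-3.0)
print(mimic.output(2.5))  # (0.0, 15.0, 0.0)
```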

Figure 4. A schematic of the NVTT. The human judge is forced to determine which of the two avatars exhibiting head movements is the real human and which is the computer-generated human mimicking the judge's own movements.

Participants in this study wore head-mounted displays while a render computer tracked their head orientation; tracking, rendering, and networking latencies were all low enough to impart a compelling sense of copresence (see Bailenson et al., 2002, for detailed descriptions of the hardware and software used). Participants were instructed to sit in a virtual room with two virtual people: a human agent (i.e., a representation whose movements are controlled by a real person in another room) and a computer agent (i.e., a computer program designed to mimic the user's movements in some way). Participants were instructed to interact with the two other virtual people using head movements in order to determine which one was the human agent. Participants were run in groups of two, with each one acting as the human agent for the other. Each participant sat at a virtual conference table with two virtual humans (similar to that shown in Figure 3). Each participant received a random order of 32 test trials (two instances of the 16 conditions resulting from the crossing of the three independent variables). Forty-one undergraduates participated in this study.
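The design arithmetic is 2 trial lengths × 4 mimic delays × 2 ranges of motion = 16 conditions, each presented twice for 32 trials in random order. A sketch of generating one participant's trial list follows; seeding the shuffle per participant is an assumption made here for reproducibility, not a detail from the study.

```python
import itertools
import random

TRIAL_LENGTHS_S = (16, 32)
MIMIC_DELAYS_S = (1, 2, 4, 8)
RANGES = ("high", "low")  # high: pitch, yaw, roll; low: yaw only


def trial_order(participant_seed, repeats=2):
    """Fully crossed conditions, each repeated `repeats` times, then shuffled."""
    conditions = list(itertools.product(TRIAL_LENGTHS_S, MIMIC_DELAYS_S, RANGES))
    trials = conditions * repeats
    random.Random(participant_seed).shuffle(trials)
    return trials


trials = trial_order(participant_seed=7)
print(len(set(trials)), len(trials))  # 16 32
```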

For the purposes of brevity, we focus on two results in particular. First, despite the fact that we explicitly told participants that the computer agent was directly mimicking them, they performed surprisingly poorly when attempting to identify the human avatar. The overall average score was only 66% correct (SD = 10%; chance = 50%; maximum score = 100%). Moreover, of the 41 participants in the study, more than one fourth were not reliably different from chance (i.e., less than 3 SEM from 50%, between 44% and 56%). Second, as Figure 5 demonstrates, participants' scores decreased as the mimic delay increased: there was a linear trend in the logarithm of the delay variable, F(1, 32) = 8.85, p < .01. When the delay was greater than 1 second, participants had more difficulty identifying the mimicker.
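The "linear trend in the logarithm of the delay" can be read as a within-subjects contrast whose weights are the centered log2 delays. For a single-df contrast of this kind, F(1, n − 1) equals the square of a one-sample t statistic computed on per-participant contrast scores. The sketch below shows that standard computation as an illustration; the paper does not report its exact procedure, and the toy input is made up, not the study's data.

```python
import math
import statistics

DELAYS_S = (1, 2, 4, 8)


def log_linear_weights(delays=DELAYS_S):
    """Contrast weights equal to centered log2(delay)."""
    logs = [math.log2(d) for d in delays]
    mean = sum(logs) / len(logs)
    return [x - mean for x in logs]


def linear_trend_F(scores_by_participant, delays=DELAYS_S):
    """F(1, n - 1) for the within-subjects log-linear trend.

    Each row holds one participant's percent correct at each delay, in the
    order of `delays`. F equals t squared, where t is a one-sample t test of
    the per-participant contrast scores against 0.
    """
    w = log_linear_weights(delays)
    contrasts = [sum(wi * si for wi, si in zip(w, row)) for row in scores_by_participant]
    n = len(contrasts)
    t = statistics.mean(contrasts) / (statistics.stdev(contrasts) / math.sqrt(n))
    return t * t, (1, n - 1)


# Toy, made-up scores for three hypothetical participants (not the study's data).
toy = [[80, 70, 65, 60], [75, 72, 60, 55], [70, 68, 66, 50]]
F, df = linear_trend_F(toy)
print(log_linear_weights(), df)  # [-1.5, -0.5, 0.5, 1.5] (1, 2)
```

Negative contrast scores indicate accuracy falling as log delay rises, which is the direction the text reports.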

These data are particularly striking in that we had initially predicted that participants would be able to recognize their own head movements much more easily when aware that one avatar was designed to mimic them. In this study, we explicitly told participants about the use of TSI, and they still had great difficulty detecting it. The effects of implicit TSI (that is, TSI that is not disclosed) can only be greater. While this pilot study is extremely simple and only scratches the surface of a paradigm that examines TSI, it is still noteworthy that participants did not detect the mimicker across the board.

We are currently exploring other factors underlying the discrimination of human nonverbal behavior from computer-generated behaviors. In future studies, we will use NVTTs to study other nonverbal behaviors, such as facial gestures, eye-head gestures (pointing indications by either system), hand gestures, and interpersonal distance. We have shown that, in albeit simple scenarios, it is possible to pass the NVTT for a percentage of our test population using TSI. We are confident that as this percentage grows in the near future, important scientific and sociological discoveries will surface along the way.

In conclusion, there are many reasons one might want to avoid TSI; these reasons range from Orwellian concerns to the fear of rendering CVEs (perhaps even the telephone) functionally useless. We are not advocates of TSI as a means to replace normal communication, nor are we staunch believers in avoiding TSI in order to preserve the natural order of communication and conversation. However, we do acknowledge that as CVEs become more prevalent, the strategic decoupling of representation from behavior is inevitable. For that reason alone, the notion of TSI warrants considerable attention.

Acknowledgments

The authors would like to thank Robin Gilmour and Christopher Rex for helpful suggestions. Furthermore, we thank Christopher Rex and Ryan Jaeger for assistance in collecting data. This research was sponsored in part by NSF Award SBE-9873432 and in part by NSF ITR Award IIS 0205740.

References

Argyle, M. (1988). Bodily communication (2nd ed.). London: Methuen.

Bailenson, J. N., Beall, A. C., & Blascovich, J. (2002). Mutual gaze and task performance in shared virtual environments. Journal of Visualization and Computer Animation, 13, 1–8.

Bailenson, J. N., Beall, A. C., Blascovich, J., Raimmundo, R., & Weisbuch, M. (2001). Intelligent agents who wear your face: Users' reactions to the virtual self. Lecture Notes in Artificial Intelligence, 2190, 86–99.

Figure 5. Percent correct by mimic delay in seconds. These data exclude subjects at chance performance.

Bailenson, J. N., Blascovich, J., Beall, A. C., & Guadagno, R. E. (submitted). Self representations in immersive virtual environments.

Baumeister, R. F. (1998). The self. In D. T. Gilbert, S. T. Fiske, & G. Lindzey (Eds.), Handbook of social psychology (4th ed., pp. 680–740). New York: McGraw-Hill.

Beall, A. C., Bailenson, J. N., Loomis, J., Blascovich, J., & Rex, C. (2003). Non-zero-sum mutual gaze in immersive virtual environments. Proceedings of HCI International 2003.

Benford, S., Bowers, J., Fahlen, L., Greenhalgh, C., & Snowdon, D. (1995). User embodiment in collaborative virtual environments. Proceedings of CHI '95 (pp. 242–249). ACM Press.

Biocca, F. (1997). The cyborg's dilemma: Progressive embodiment in virtual environments. Journal of Computer-Mediated Communication [online], 3. Retrieved from http://www.ascusc.org/jcmc/vol3/issue2/biocca2.html

Black, M., & Yacoob, Y. (1997). Recognizing facial expressions in image sequences using local parameterized models of image motion. International Journal of Computer Vision, 25(1), 23–48.

Blanz, V., & Vetter, T. (1999). A morphable model for the synthesis of 3D faces. SIGGRAPH '99 Conference Proceedings, 187–194.

Blascovich, J., Loomis, J., Beall, A. C., Swinth, K. R., Hoyt, C., & Bailenson, J. N. (2002). Immersive virtual environment technology as a methodological tool for social psychology. Psychological Inquiry, 13, 103–124.

Busey, T. A. (1998). Physical and psychological representations of faces: Evidence from morphing. Psychological Science, 9, 476–483.

Byrne, D. (1971). The attraction paradigm. New York: Academic Press.

Cassell, J. (2000). Nudge nudge wink wink: Elements of face-to-face conversation for embodied conversational agents. In J. Cassell et al. (Eds.), Embodied conversational agents. Cambridge, MA: MIT Press.

Chaiken, S. (1979). Communicator physical attractiveness and persuasion. Journal of Personality and Social Psychology, 37, 1387–1397.

Chartrand, T. L., & Bargh, J. (1999). The chameleon effect: The perception-behavior link and social interaction. Journal of Personality & Social Psychology, 76(6), 893–910.

Decarlo, D., Metaxas, D., & Stone, M. (1998). An anthropometric face model using variational techniques. Proceedings of SIGGRAPH '98, 67–74.

Depaulo, B. M., & Friedman, H. S. (1998). Nonverbal communication. In D. T. Gilbert, S. T. Fiske, & G. Lindzey (Eds.), The handbook of social psychology (4th ed., Vol. 2, pp. 3–40). Boston: McGraw-Hill.

Donato, G., Bartlett, M. S., Hager, J. C., Ekman, P., & Sejnowski, T. J. (1999). Classifying facial actions. IEEE Transactions on Pattern Analysis and Machine Intelligence, 21(10), 974–989.

Durlach, N., & Slater, M. (2000). Presence in shared virtual environments and virtual togetherness. Presence: Teleoperators and Virtual Environments, 9, 214–217.

Ekman, P. (1978). Facial signs: Facts, fantasies, and possibilities. In T. Sebeok (Ed.), Sight, sound, and sense. Bloomington, IN: Indiana University Press.

Fodor, J. A. (1983). The modularity of mind: An essay on faculty psychology. Cambridge, MA: MIT Press.

Fry, R., & Smith, G. F. (1975). The effects of feedback and eye contact on performance of a digit-encoding task. Journal of Social Psychology, 96, 145–146.

Gale, C., & Monk, A. F. (2002). A look is worth a thousand words: Full gaze awareness in video-mediated conversation. Discourse Processes, 33.

Garau, M., Slater, M., Vinayagamoorthy, V., Brogni, A., Steed, A., & Sasse, M. A. (2003). The impact of avatar realism and eye gaze control on perceived quality of communication in a shared immersive virtual environment. Proceedings of the SIGCHI Conference on Human Factors in Computing Systems.

Gibson, W. (1984). Neuromancer. New York: Ace Books.

Hu, C., Feris, R., & Turk, M. (2003). Active wavelet networks for face alignment. Proceedings of the British Machine Vision Conference, Norwich, UK.

Kendon, A. (1977). Studies in the behavior of social interaction. Bloomington, IN: Indiana University.

Kleinke, C. L. (1986). Gaze and eye contact: A research review. Psychological Bulletin, 100, 78–100.

Kraut, R. E., Fussell, S. R., Brennan, S. E., & Siegel, J. (2002). Understanding effects of proximity on collaboration: Implications for technologies to support remote collaborative work. In P. Hinds & S. Kiesler (Eds.), Distributed work. Cambridge, MA: MIT Press.

Lanier, J. (2001). Virtually there. Scientific American, April 2001.

Leigh, J., DeFanti, T., Johnson, A., Brown, M., & Sandin, D. (1997). Global tele-immersion: Better than being there. Proceedings of ICAT '97.

Loomis, J. M., Blascovich, J. J., & Beall, A. C. (1999). Immersive virtual environments as a basic research tool in psychology. Behavior Research Methods, Instruments, and Computers, 31(4), 557–564.


Mania, K., & Chalmers, A. (1998). Proceedings of the Fourth International Conference on Virtual Systems and Multimedia (pp. 177–182). Amsterdam: IOS Press-Ohmsha.

Milgram, S. (1992). The individual in a social world: Essays and experiments (2nd ed.). New York: McGraw-Hill.

Morgan, T., Kriz, R., Howard, T., Dias Neves, F., & Kelso, J. (2001). Extending the use of collaborative virtual environments for instruction to K–12 schools. Insight, 1(1).

Normand, V., Babski, C., Benford, S., Bullock, A., Carion, S., Chrysanthou, Y., et al. (1999). The COVEN project: Exploring applicative, technical, and usage dimensions of collaborative virtual environments. Presence: Teleoperators and Virtual Environments, 8(2), 218–236.

Patterson, M. L. (1982). An arousal model of interpersonal intimacy. Psychological Review, 89, 231–249.

Pylyshyn, Z. W. (1980). Computation and cognition: Issues in the foundations of cognitive science. Behavioral & Brain Sciences, 3, 111–169.

Reeves, B., & Nass, C. (1996). The media equation: How people treat computers, television, and new media like real people and places. New York: Cambridge University Press.

Rickel, J., & Johnson, W. L. (2000). Task-oriented collaboration with embodied agents in virtual worlds. In J. Cassell, J. Sullivan, S. Prevost, & E. Churchill (Eds.), Embodied conversational agents. Cambridge, MA: MIT Press.

Rutter, D. R. (1984). Looking and seeing: The role of visual communication in social interaction. Suffolk, UK: John Wiley & Sons.

Sannier, G., & Thalmann, M. N. (1998). A user friendly texture-fitting methodology for virtual humans. Computer Graphics International '97.

Schwartz, P., Bricker, L., Campbell, B., Furness, T., Inkpen, K., Matheson, L., et al. (1998). Virtual playground: Architectures for a shared virtual world. Proceedings of the ACM Symposium on Virtual Reality Software and Technology 1998, 43–50.

Sherwood, J. V. (1987). Facilitative effects of gaze upon learning. Perceptual and Motor Skills, 64, 1275–1278.

Simons, H. (1976). Persuasion: Understanding, practice, and analysis. Reading, MA: Heath.

Slater, M., Pertaub, D., & Steed, A. (1999). Public speaking in virtual reality: Facing an audience of avatars. IEEE Computer Graphics and Applications, 19(2), 6–9.

Slater, M., Sadagic, A., Usoh, M., & Schroeder, R. (2000). Small group behavior in a virtual and real environment: A comparative study. Presence: Teleoperators and Virtual Environments, 9(1), 37–51.

Stiefelhagen, R., Yang, J., & Waibel, A. (1997). Tracking eyes and monitoring eye gaze. In M. Turk & Y. Takabayashi (Eds.), Proceedings of the Workshop on Perceptual User Interfaces.

Turing, A. (1950). Computing machinery and intelligence. Mind, 59(236).

Turk, M., & Kolsch, M. (in press). Perceptual interfaces. In G. Medioni & S. B. Kang (Eds.), Emerging topics in computer vision. Boston: Prentice Hall.

Velichkovsky, B. M. (1995). Communicating attention: Gaze position transfer in cooperative problem solving. Pragmatics and Cognition, 3(2), 199–222.

Vertegaal, R. (1999). The GAZE groupware system: Mediating joint attention in multiparty communication and collaboration. Proceedings of the CHI '99 Conference on Human Factors in Computing Systems: The CHI is the Limit, 294–301.

Viola, P., & Jones, M. (2001). Rapid object detection using a boosted cascade of simple features. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.

Wallace, D. F. (1996). Infinite jest. Boston: Little, Brown.

Williams, K., Cheung, K. T., & Choi, W. (2000). Cyberostracism: Effects of being ignored over the Internet. Journal of Personality and Social Psychology, 79, 748–762.

Yee, N. (2002). Befriending ogres and wood elves: Understanding relationship formation in MMORPGs. Retrieved from http://www.nickyee.com/hub/relationships/home.html

Zajonc, R. B. (1971). Brainwash: Familiarity breeds comfort. Psychology Today, 3(9), 60–64.

Zajonc, R. B., Murphy, S. T., & Inglehart, M. (1989). Feeling and facial efference: Implications of the vascular theory of emotion. Psychological Review, 96, 395–416.

Zhang, X., & Furnas, G. (2002). Social interactions in multiscale CVEs. Proceedings of the ACM Conference on Collaborative Virtual Environments 2002 (CVE 2002).


as separate entities, clearly a system that employs TSI would be most effective as some combination of the three. We keep them separate for the purpose of clarity in this paper.

We realize that all of the necessary CVE technology may not yet be available (see Kraut, Fussell, Brennan, & Siegel, 2002). Furthermore, in order to adequately study and enable transformed social interaction in collaborative virtual environments, the technology used for tracking nonverbal signals must eventually be passive and unobtrusive. Sensors and markers that are worn on the body can limit the naturalness of interaction by causing participants to focus on the technology at the expense of the interaction. Computer vision technology offers the possibility of using passive, noncontact sensing to locate, track, and model human body motion. Subsequently, pattern recognition and classification techniques can be used to recognize meaningful movements and gestures.

In the past dozen or so years, there has been significant and increasing interest in these problems within the computer vision research community (Turk & Kolsch, 2003; Black & Yacoob, 1997; Donato, Bartlett, Hager, Ekman, & Sejnowski, 1999; Hu, Feris, & Turk, 2003; Stiefelhagen, Yang, & Waibel, 1997; Viola & Jones, 2001). Motivated by various application areas, including biometrics, surveillance, multimedia indexing and retrieval, medical applications, and human-computer interaction, there has been significant progress in areas such as face detection, face recognition, facial expression analysis, articulated body tracking, and gesture recognition. The state of the art in these areas is not yet at the point of fully supporting CVEs, as many of these systems tend to be slow and lack robustness in real-world environments (with typical changes in lighting, clothing, etc.).

But the progress is promising, and we expect to see increased utility of these technologies to track and model nonverbal behaviors in order to transmit and transform them within the context of CVEs. We believe that each of the TSI implementations discussed in the current work is foreseeable, perhaps even in the near future. For the purposes of the current discussion, details of the specific CVE implementation are not critical: TSI should be effective in projection-based CVEs, head-mounted display CVEs, CAVEs, or certain types of augmented-reality CVEs.

A concrete example of a typical CVE interaction helps describe the specific types of transformations. Generally, TSI should enable interactants to communicate more effectively by providing them with more information, as well as providing them (or system designers) with more control in directing their nonverbal behaviors. The latter suggests, on a more cynical note, that the people who may profit most from TSI may be those who enter interactions with specific goals, for example, changing the attitudes of the other interactants (Slater, Pertaub, & Steed, 1999). In the subsequent sections, we describe an interaction in which a leader and one or more community members evaluate a proposal in a CVE. However, one could just as easily substitute leader with politician, teacher, lawyer, or missionary, and substitute community members with voters, students, jurors, or atheists. Hence, the theoretical parameters and implications of TSI have applications across many different contexts.

3.1 Transforming Self Representations

In CVEs, avatars representing interactants can bear varying degrees of photographic or anthropomorphic (Garau et al., 2003; Bailenson, Beall, Blascovich, Raimmundo, & Weisbuch, 2001; Sannier & Thalmann, 1998), behavioral (Bailenson et al., 2002; Cassell, 2000; Biocca, 1997), and even dispositional resemblance to the interactants they represent. Assuming that interactants (by their own design or through the actions of systems operators) have the freedom to vary both the photographic and behavioral similarity of their avatar to themselves, a number of subtle but potentially drastic (in terms of the outcomes of CVE interactions) transformations can occur.

In many instances, similarity breeds attraction (Byrne, 1971). We know that people treat avatars that look like themselves more intimately than avatars that look like others, as indicated by invasion of their personal space and willingness to perform embarrassing acts in front of them, and by how attractive and likable they believe the avatars to be (Bailenson, Blascovich, Beall, & Guadagno, 2004; Bailenson et al., 2001). Given this special relationship, a CVE interactant may use this principle to an advantage. Consider the situation in which a leader and a community member are negotiating via a CVE. A particularly devious leader can represent herself by incorporating characteristics of the member's representation. By making herself appear more similar to the member, the leader becomes substantially more persuasive (Chaiken, 1979; Simons, 1976). Indeed, a leader would be able to adjust the structural or textural similarity of her own avatar idiosyncratically to the members in her audience.

This similarity could be achieved in various ways, employing any of a number of techniques to parametrically vary the similarity of computer-generated models via 2D and 3D morphing (Blanz & Vetter, 1999; Busey, 1998; Decarlo, Metaxas, & Stone, 1998). The leader could be represented as some kind of a hybrid, maintaining some percentage of her original facial structure and texture but also incorporating percentages of the member's structure and texture. Alternatively, the leader could be represented completely veridically to her facial structure but, for a few frames per second, could replace her own head with the head of the member. Priming familiarity with limited exposure to human faces has proven to be effective with 2D images (Zajonc, 1971). Finally, consider the situation in which the leader is interacting via CVE with two members. The leader can be differentially represented to both members simultaneously, such that each member sees a different hybrid leader avatar incorporating aspects of that member. In other words, the leader does not need a consistent representation across interactants, because the CVE operator is free to render different leader avatars to each member.
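The hybrid representation described above can be read as a weighted blend of two faces. The following is a minimal sketch, assuming both faces are already in dense point-to-point correspondence (as in morphable models), so that vertices and texels can be interpolated index by index; `FaceModel`, `hybrid_face`, and `per_viewer_avatars` are hypothetical names, and establishing correspondence, the hard part in practice, is not shown.

```python
from dataclasses import dataclass


@dataclass
class FaceModel:
    """Hypothetical face: vertex positions plus per-texel RGB values, assumed
    to be in dense correspondence with every other FaceModel."""
    vertices: list[tuple[float, float, float]]
    texture: list[tuple[float, float, float]]


def _lerp(a, b, w):
    # Component-wise linear interpolation: (1 - w) * a + w * b.
    return tuple((1.0 - w) * x + w * y for x, y in zip(a, b))


def hybrid_face(own: FaceModel, other: FaceModel, w_shape: float, w_texture: float) -> FaceModel:
    """Blend structure and texture separately; w = 0 keeps the leader's face,
    w = 1 yields the other person's face."""
    if len(own.vertices) != len(other.vertices) or len(own.texture) != len(other.texture):
        raise ValueError("faces must be in dense correspondence")
    return FaceModel(
        vertices=[_lerp(a, b, w_shape) for a, b in zip(own.vertices, other.vertices)],
        texture=[_lerp(a, b, w_texture) for a, b in zip(own.texture, other.texture)],
    )


def per_viewer_avatars(leader: FaceModel, members: dict[str, FaceModel], w: float) -> dict[str, FaceModel]:
    # Each member is rendered a different hybrid that incorporates her own face.
    return {name: hybrid_face(leader, face, w, w) for name, face in members.items()}
```

Separate weights for shape and texture mirror the distinction between structural and textural similarity drawn earlier; setting a weight to 0 leaves that aspect of the leader's face untouched.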

Incorporating the self-identity of other interactants can also occur via behavioral characteristics. Psychological research has demonstrated that when an experimenter subtly mimics experimental participants (e.g., leans in the same direction as they do, crosses her legs when the participants do), participants subsequently report that they liked the experimenter more and that the conversation flowed more smoothly (Chartrand & Bargh, 1999). This "chameleon effect" could be extremely effective in CVEs. The leader (or the system operator) can use algorithms to detect motions of the other interactants at varying levels of detail and coordinate the animations of her avatar to be a blended combination of her own motions and those of the others.
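A blended chameleon avatar can be sketched as a mix of the leader's own tracked head pose and a delayed copy of another interactant's pose. This is an illustration only: the pose format, the frame-based delay, the blending weight, and the class name `BlendedMimic` are assumptions, and component-wise blending of Euler angles is only reasonable for the small head rotations typical of conversation.

```python
from collections import deque

Pose = tuple[float, float, float]  # (pitch, yaw, roll) in degrees


class BlendedMimic:
    """Renders the leader's head as a mix of her own pose and another
    interactant's pose from `delay_frames` frames earlier."""

    def __init__(self, delay_frames: int, mimic_weight: float):
        self.delay_frames = delay_frames
        self.mimic_weight = mimic_weight  # 0 = fully veridical, 1 = pure mimicry
        self._history = deque(maxlen=delay_frames + 1)

    def update(self, own: Pose, other: Pose) -> Pose:
        self._history.append(other)
        if len(self._history) <= self.delay_frames:
            return own  # not enough history yet: render the leader veridically
        delayed = self._history[0]  # the other's pose from delay_frames frames ago
        w = self.mimic_weight
        # Component-wise linear blend; ignores angle wraparound at +/-180 degrees.
        return tuple((1 - w) * o + w * d for o, d in zip(own, delayed))
```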

Consider a CVE interaction consisting of a leader and two members. In the course of this interaction, patterns of nonverbal behaviors will emerge, and statistics based on a running tabulation can be automatically collected via CVE technology. In other words, if there is a certain rate of head nodding exhibited by person A and another rate exhibited by person B, the leader's head can be made to nod in a way consistent with the statistics (e.g., an average or median). Alternatively, the leader's avatar can simply mimic each interactant individually and render those particular movements only to each corresponding interactant.
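The running tabulation above could be kept as per-interactant nod counts over elapsed time. In this sketch, nod detection is assumed to happen upstream, and `NodTally` with its two rendering policies (a single aggregate rate versus a per-viewer rate) is a hypothetical design rather than a description of an existing system.

```python
from statistics import mean, median


class NodTally:
    """Running tabulation of detected head nods per interactant."""

    def __init__(self, interactants):
        # Start every interactant at zero so non-nodders still count in the statistics.
        self.counts = {name: 0 for name in interactants}
        self.elapsed_s = 0.0

    def record_nod(self, who: str) -> None:
        self.counts[who] += 1

    def advance(self, seconds: float) -> None:
        self.elapsed_s += seconds

    def rates_per_min(self) -> dict:
        minutes = self.elapsed_s / 60.0
        if minutes <= 0:
            return {}
        return {who: n / minutes for who, n in self.counts.items()}

    def leader_rate(self, summary: str = "mean") -> float:
        # Aggregate policy: one nod rate, rendered identically to everyone.
        rates = list(self.rates_per_min().values())
        if not rates:
            return 0.0
        return mean(rates) if summary == "mean" else median(rates)

    def per_viewer_rates(self) -> dict:
        # Individual policy: each viewer sees the leader nod at that viewer's own rate.
        return self.rates_per_min()
```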

The leader could also morph her representation with that of an unrepresented party, not present in the CVE, who is previously known to possess qualities that inspire certain reactions. Depending on the context, for example, the leader can morph a percentage of famous politicians, historical figures, or even pop stars into her avatar. This feature blending can be explicit and blatant (e.g., the leader looks just like an expert or a religious figure) or more implicit and subterranean (e.g., the leader incorporates subtle features such as cheekbones and hairstyle). Alternatively, the leader can morph herself with a person who may not be famous but with whom the member maintains a deep trust (Gibson, 1984).

A second form of avatar transformation arises from the ability to selectively decouple and reconstruct rendered behavior in CVEs. In other words, not only can interactants render nonverbal behaviors different from the nonverbal behaviors that they actually perform but, similar to the discussion above, they can render those behaviors idiosyncratically for each of the other interactants.

Consider what we term Non-Zero-Sum-Mutual-Gaze (NZSMG). Ordinary mutual gaze occurs when individuals look at one another's eyes during discourse. In face-to-face conversation, mutual gaze is zero-sum. In other words, if interactant A maintains eye contact with interactant B for 70 percent of the time, it is not possible for A to maintain eye contact with interactant C for more than 30 percent of the time. However, interaction in CVEs is not bound by this constraint. With digital avatars, A can be made to appear to maintain mutual gaze with both B and C for a majority of the conversation.
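The zero-sum constraint, and how decoupling lifts it, can be made concrete by counting how often each viewer sees speaker A looking at her. The rule used here for NZSMG (any gaze at some interactant is redrawn as gaze at whoever is watching) is an assumed, deliberately simple policy, and `perceived_gaze_share` is a hypothetical name; practical systems would likely mix in natural look-aways.

```python
from collections import Counter


def perceived_gaze_share(actual_targets: list[str], viewers: list[str], nzsmg: bool) -> dict[str, float]:
    """Fraction of frames in which each viewer sees speaker A looking at her.

    `actual_targets` holds A's tracked gaze target per frame: a viewer's name,
    or anything else (e.g. "notes") when A looks at no one.
    """
    seen = Counter()
    for target in actual_targets:
        for v in viewers:
            if nzsmg:
                # Assumed rule: gaze at any interactant is redrawn toward whoever is watching.
                looks_at_v = target in viewers
            else:
                # Veridical rendering: only the actual target sees mutual gaze.
                looks_at_v = target == v
            if looks_at_v:
                seen[v] += 1
    n = len(actual_targets)
    return {v: seen[v] / n for v in viewers}
```

With A's actual gaze split 70/30 between B and C, the veridical shares sum to 1, whereas under the assumed NZSMG rule both B and C see A looking at them in every frame in which A looks at anyone.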

Gaze is one of the most thoroughly studied nonverbal gestures in research on social interaction (Rutter, 1984; Kleinke, 1986; Kendon, 1977). Direct eye gaze can provide cues for intimacy, agreement, and interest (Argyle, 1988). Furthermore, gaze can enhance learning during instruction, as well as memory for information (Fry & Smith, 1975; Sherwood, 1987). The advantage of using CVEs is that normal nonverbal behaviors of interactants can be augmented via NZSMG. Furthermore, the interactants in a CVE can either be unaware of this transformation (i.e., implicit NZSMG) or aware of it (i.e., explicit NZSMG), as Figure 1 demonstrates. Preliminary work studying implicit NZSMG has demonstrated that interactants are not aware of the decoupling from actual behavior. Moreover, the interactants respond to the artificial gaze as if it were actual gaze (Beall, Bailenson, Loomis, Blascovich, & Rex, 2003). This method may prove to be most effective during distance learning in educational CVEs (Morgan, Kriz, Howard, das Neves, & Kelso, 2001), in which the instructor uses her augmented gaze as a tool to keep the students more engaged.

Decoupling can also be used to achieve the opposite effect. Consider the situation where the leader wants to scrutinize the nonverbal behaviors of member A but does not want the member to feel uncomfortable from her unwavering gaze. The leader can render herself looking at her shoes, or perhaps at member B in the CVE, while in reality she is watching member A's every move.

In order for such a system to be effective, there must be a convincing algorithm to drive the autonomous eye gaze. In other words, if the leader wants the freedom to employ NZSMG or to wander around the CVE scrutinizing different aspects of the conversation undetected, she (by her own device or assisted by the systems operator) must maintain the illusion that her avatar is exhibiting the typical and appropriate nonverbal gestures. There are a number of ways to achieve this. The first is some type of artificial intelligence algorithm that approximates appropriate gestures of the leader's avatar by monitoring the gestures and speech of the other interactants. While there have been significant advances in this regard (Cassell, 2000), the ability of an algorithm to process natural language as well as generate believable responses may still be many years off. A more likely method for achieving this goal would be to use actual humans instead of AI algorithms. In this scenario, the leader employs one or more nonverbal "cyranoids" (Milgram, 1992) to augment the nonverbal behaviors presented to each individual member. To do so, the leader solicits the help of several assistants, each of whose job is to provide the nonverbal behaviors targeted toward a particular member (see Figure 2).

In this many-to-many "Wizard of Oz" implementation, each member is presented a unified leader who is rendered privately to her. This private representation would be a melding of the actual leader and one of her assistants, so that when the leader's attention was diverted away from that member for long periods of time, the assistant could step in and help maintain a believable interaction by seamlessly serving as the leader's proxy. The leader herself would then act as a conductor, overseeing all the interactions yet being free to focus her attention on individual members when she so desires. In addition, the leader is free to wander about the digital space, consult her notes, take a rest, or conduct a sidebar meeting with another person. However, because her avatar is partially cyranic, it can continue to exhibit the appropriate nonverbal behaviors all the while to each member. Furthermore, having a number of assistants whose sole focus is to respond with appropriate nonverbal gestures to each of the interactants in the CVE should maximize the members' involvement or sense of presence in the CVE. For important meetings, seminars, or presentations conducted via CVEs, individual interactants may want to utilize a number of assistants as a core presentation team.

Figure 1. Internal belief states from implicit NZSMG (left) and explicit NZSMG (right).
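The per-member composition in this "Wizard of Oz" scheme can be sketched as a routing table: every member hears the leader's voice, while the motion driving the leader's avatar comes from the leader when she is attending to that member and from that member's dedicated gesturer otherwise. `RenderedLeader`, `compose_leader`, and the attention signal `leader_focus` are hypothetical; how attention is detected, and how the hand-off is smoothed, are left open.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RenderedLeader:
    audio_source: str      # whose voice is played to this member
    nonverbal_source: str  # whose tracked motion drives the leader's avatar for this member


def compose_leader(members: list[str], assistants: dict[str, str],
                   leader_focus: Optional[str]) -> dict[str, RenderedLeader]:
    """Build the private leader avatar each member sees.

    `assistants` maps each member to her dedicated gesturer; `leader_focus`
    is the member the leader is currently attending to, or None.
    """
    views = {}
    for m in members:
        body = "leader" if m == leader_focus else assistants[m]
        views[m] = RenderedLeader(audio_source="leader", nonverbal_source=body)
    return views
```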

3.2 Transforming Sensory Capabilities

Interactants can be assisted by technology that takes advantage of the ability of CVEs to keep precise running tabs of certain types of behaviors and then display summaries of those behaviors exclusively to individual interactants. For example, consider an educational CVE in which an instructor wants to ensure that she is directing her nonverbal behaviors in a desired fashion. Such an instructor may want to monitor her mutual gaze to ensure that she is not looking at any one student more than others during a presentation. The tracking equipment used to render the scene can keep an online total of the amount of time the instructor gazed at each individual student. The CVE can render a display of this gaze meter, as well as use visual or auditory alerts to inform the instructor of disproportionate applications of gaze.
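A gaze meter of the kind described above might keep per-student totals and flag any student whose share of the instructor's gaze exceeds an even split by some margin. The class name, the alert rule, and the default 50% tolerance are assumptions made for illustration.

```python
class GazeMeter:
    """Online totals of instructor gaze time per student, with alerts for
    students receiving a disproportionate share."""

    def __init__(self, students, tolerance=0.5):
        self.totals = {s: 0.0 for s in students}
        # Allowed excess over an even share: 0.5 means "more than 50% above even".
        self.tolerance = tolerance

    def add(self, target, dt):
        # Gaze at non-students (slides, notes, empty space) is ignored.
        if target in self.totals:
            self.totals[target] += dt

    def shares(self):
        total = sum(self.totals.values())
        return {s: (t / total if total else 0.0) for s, t in self.totals.items()}

    def alerts(self):
        even = 1.0 / len(self.totals)
        limit = even * (1.0 + self.tolerance)
        return sorted(s for s, share in self.shares().items() if share > limit)
```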

Furthermore, interactants can use the tracking data summaries to learn more about the attitudes of the others. Nonverbal gestures are often correlates of specific mental states (Ekman, 1978; Zajonc, Murphy, & Inglehart, 1989). For example, in general, we nod when we agree, smile when we are pleased, tilt our heads when we are confused, and look at something in which we are interested. Interactants will be able to tailor their CVE systems to keep track of nonverbal behaviors with the goal of helping them infer the mental states of the other interactants. For example, a teacher will be able to gauge the percentage of students exhibiting nonverbal behaviors that suggest confusion about a point in a lesson. Similarly, a leader could determine who in a room full of members is responding most positively to her behavior. Intuitively, tabulating and assessing the nonverbal behaviors of others is certainly something that humans do constantly in face-to-face interactions. With CVEs, interactants will be able to tabulate these behaviors with greater precision. Interactants can use the objective tabulations from the tracking data to augment their normal intuitions about the gestures occurring in the interaction.
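As a rough sketch of such a tally, the function below reports the fraction of students whose recent head roll (tilt) is large on average, using tilt as a stand-in for the confusion cue mentioned above. The 15-degree threshold, the sampling window, and the function name are hypothetical, and a single cue like this would be a weak indicator of any mental state on its own.

```python
from statistics import mean


def confused_share(recent_roll_deg: dict[str, list[float]], tilt_threshold_deg: float = 15.0) -> float:
    """Fraction of students whose mean absolute head roll over a recent window
    exceeds a (hypothetical) threshold, as a rough proxy for confusion."""
    if not recent_roll_deg:
        return 0.0
    flagged = [
        student
        for student, rolls in recent_roll_deg.items()
        if rolls and mean(abs(r) for r in rolls) > tilt_threshold_deg  # students with no samples are not flagged
    ]
    return len(flagged) / len(recent_roll_deg)
```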

Another transformation involves filtering or degrading certain signals or nonverbal behaviors. Some visual nonverbal behaviors tend to distract interactants. Using filtering algorithms, interactants can prevent counterproductive distractions in a number of ways. For example, consider the situation in which a speaker in a CVE taps her pen rapidly as she speaks. In face-to-face meetings, this type of behavior can distract interactants. Using a CVE, this type of behavior can be filtered in two ways. First, the speaker can filter the behavior on the transmitting end. If people know that they have difficulty suppressing certain nonverbal behaviors that tend to be perceived in a negative manner, such as a nervous tic, they can activate a filter that prevents the behavior from being rendered. Similarly, in certain situations, a CVE interactant may not want to render certain nonverbal behaviors. Consider the leader example: the potential member may benefit from rendering her "poker face," that is, not demonstrating any enthusiasm or disappointment via facial expressions. Consequently, the member may accrue a strategic advantage during a negotiation. Second, interactants can filter behaviors on the receiving end. If a speaker's hand motions are distracting, then a listener can simply choose not to render that interactant's hand movements.

Figure 2. A depiction of cyranoids. On the top row are three nonrendered gesturers. Each member on the bottom hears the leader's actual verbal behaviors (dashed lines). However, each member views the nonverbal behavior of her dedicated gesturer rendered onto the avatar of the leader (unbroken lines).
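Transmit-side and receive-side filtering can be sketched as two small functions over a frame of named behavior channels. The channel names, the neutral values substituted for suppressed channels, and the function names are all assumptions for illustration.

```python
# Hypothetical neutral values substituted for suppressed channels.
NEUTRAL = {"face": "neutral", "hands": "at_rest", "head": (0.0, 0.0, 0.0)}


def filter_outgoing(frame: dict, suppressed: set) -> dict:
    """Transmitting end: replace each suppressed channel with its neutral value
    (None if no neutral value is defined) before the frame is sent."""
    return {ch: (NEUTRAL.get(ch) if ch in suppressed else val) for ch, val in frame.items()}


def filter_incoming(sender: str, frame: dict, muted: dict) -> dict:
    """Receiving end: drop the channels this listener chose not to render for `sender`.
    `muted` maps a sender's name to the set of channels hidden from this listener."""
    drop = muted.get(sender, set())
    return {ch: val for ch, val in frame.items() if ch not in drop}
```

Filtering on the transmitting end affects every receiver (the "poker face" case), whereas filtering on the receiving end is a private choice of one listener (the distracting-hands case).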

Another example of transforming sensory capabilities is producing a visual indicator of where each interactant's attention currently lies, as revealed by their eye direction (Velichkovsky, 1995). We have explored a technique that involves rendering each person's view frustum to indicate the field of view, as Figure 3 illustrates. In this example, the wire-frame frustums spotlight the 3D space visible to each person. This feature, color coded for each person, may be especially helpful to teachers in a distance learning CVE, who could use such information to see where students are focusing their visual attention without having to look directly at the students' eyes.
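Rendering a view frustum implies the complementary test of whether a point lies inside it. The sketch below checks a point against a symmetric perspective frustum defined by eye position, orthonormal forward and up vectors, field-of-view angles, and near/far planes; these parameters, the default plane distances, and the function name are assumptions, not values from the system described here.

```python
import math

Vec = tuple[float, float, float]


def _dot(a: Vec, b: Vec) -> float:
    return sum(x * y for x, y in zip(a, b))


def _cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def in_view_frustum(eye: Vec, forward: Vec, up: Vec, point: Vec,
                    hfov_deg: float, vfov_deg: float,
                    near: float = 0.1, far: float = 50.0) -> bool:
    """True if `point` lies inside a symmetric perspective frustum.
    `forward` and `up` are assumed to be orthonormal."""
    right = _cross(forward, up)
    d = tuple(p - e for p, e in zip(point, eye))
    z = _dot(d, forward)  # depth along the viewing direction
    if not (near <= z <= far):
        return False
    x, y = _dot(d, right), _dot(d, up)  # lateral and vertical offsets in camera space
    return (abs(x) <= z * math.tan(math.radians(hfov_deg) / 2)
            and abs(y) <= z * math.tan(math.radians(vfov_deg) / 2))
```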

There are a number of similar tools (i.e., specific objects rendered only to particular interactants) that can assist interactants in a CVE. For example, in our NZSMG studies, an experimenter enters a CVE and attempts to persuade other interactants regarding a certain topic (Beall et al., 2003). In those interactions, we render the interactants' names over their heads on floating billboards for the experimenter to read. In this manner, the experimenter can refer to people by name more easily. There are many other ways to use these floating billboards to assist interactants, for example, reminders about the interactant's preferences or personality (e.g., "doesn't respond well to prolonged mutual gaze").

Figure 3. View frustums marking the field of view of interactants.

One of the most useful forms of transforming sensory capabilities may be to enlist one or more human consultants who are rendered to only one member in a CVE (i.e., virtual ghosts). Unlike a face-to-face interaction, a CVE will enable an interactant to have informed human consultants who are free to wander around the virtual meeting space, scrutinize the actions of other interactants, conduct online research and sidebar meetings in order to provide key interactants with additional information, and generally provide support for the interactants. For example, the leader can have her research team actually rendered beside her in the CVE. Members of her team can point out actions by potential members, suggest new strategies, and even provide real-time criticism and feedback concerning the behavior of the leader, without any of the other members having even a hint of awareness concerning the human consultants' presence. Alternatively, the leader herself can go into "ghost mode" and explore the virtual world with her team while her avatar remains seated and is even controlled by yet another member of her team.

3.3 Transforming the Situation

In addition to transforming their representation and sensory capabilities, CVE interactants can also use algorithms to transform their general spatial or temporal situations. In a CVE, people generally adopt a spatially coherent situational context across all remote interactants that brings everyone together in the shared space. However, there is no reason that the details and arrangements of that virtual space need to be constant for all the interactants in the CVE. Consider the situation for three interactants. Interactant A may choose to form an isosceles triangle with the other two, while interactants B and C may both choose to form equilateral triangles. Interactant A may even choose to flip the locations of B and C. In this scenario, the CVE operating system can preserve the intended eye gaze direction by transforming the amplitudes or direction of head and eye movements in a prescribed manner. While this is a somewhat simple example, with as many as four interactants it is straightforward to design spatial transformations that allow the intended eye and head gaze cues to remain intact across all interactants. Although such discordance may eventually cause the quality and smoothness of the interaction to suffer, there are a number of ways that transforming the situation can assist individual interactants.
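One way to keep gaze cues intact across private layouts is to re-render each gaze in terms of its target rather than its raw angle: if A is actually looking at B, each viewer's display turns A's avatar toward wherever B is seated in that viewer's layout. This target-based retargeting is a swapped-in simplification of the amplitude or direction transformation mentioned above; the 2D seat coordinates and the function names are hypothetical.

```python
import math

Layout = dict[str, tuple[float, float]]  # seat positions (x, z) on the floor plane


def yaw_toward(src: tuple[float, float], dst: tuple[float, float]) -> float:
    """Heading in degrees from src to dst, measured from the +z axis toward +x."""
    dx, dz = dst[0] - src[0], dst[1] - src[1]
    return math.degrees(math.atan2(dx, dz))


def retarget_gaze(gazer: str, target: str, viewer_layout: Layout) -> float:
    """Yaw to render for `gazer` in one viewer's private layout so that the
    gazer still appears to look at the person she is actually looking at."""
    return yaw_toward(viewer_layout[gazer], viewer_layout[target])
```

Because the computation depends only on the viewer's own layout, A can flip B and C in her arrangement without affecting what B and C see.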

One such transformation involves multilateral perspectives. In a normal conversation, each interactant has a unique and privileged perspective. That perspective is a combination of her sensory input (e.g., visual and acoustic fields of view) and internal beliefs about the interaction. In normal face-to-face interactions, people continually use sensory input to update and adjust their internal beliefs (Kendon, 1977). Interactants in a CVE will possess a completely new mechanism to adjust and update internal beliefs: a person's viewpoint can be multilateral as opposed to unilateral (normal). In other words, in a real-time conversation, interactant A can take the viewpoint of interactant B and perceive herself as she performs various verbal and nonverbal gestures during the interaction. In this manner, she can acquire invaluable sensory information pertaining to the interaction and update her internal beliefs concerning the interaction in ways not possible without the CVE.

Consequently, interactants in educational and persuasive interactions may be able to improve performance, because seeing oneself through the eyes of another may allow one to develop a more informed set of internal beliefs about others (Baumeister, 1998). Furthermore, it may be the case that being able to experience an interaction through someone else's eyes reinforces the fact that one is indeed copresent in the CVE (e.g., Durlach & Slater, 2000). Finally, utilizing multilateral perspectives may assist students in distance learning CVEs in terms of training transfer effects (Rickel & Johnson, 2000) that might occur after an interactant who has been trained in multilateral perspective taking performs similar group tasks in nonmediated situations.

A second situational transformation involves partially recording the interaction and adjusting temporal properties or sequences in real time. Similar to commercial products sold for digitally recording and playing back broadcast television, interactants in a CVE should be able to accelerate and decelerate the perceived flow of time during the mediated interaction. Consider the following situation. The student in a distance learning CVE does not understand an example that the instructor provides. The student can "rewind" the recorded interaction, go back to the beginning of the confusing example, and then play back the example. Once the student has understood the confusing example, she can then turn up the rate of playback (e.g., watch the sequence at 2X speed) and eventually catch up to the instructor again. By slowing down the rendered flow of time or speeding it up, the interactant can focus differentially on particular topics and can review the same scene from different points of view without missing the remainder of the interaction. Of course, doing so will result in costs to that interactant's contribution to the CVE in terms of interactivity (i.e., what does her avatar do while she rewinds?). Consequently, the disruption of the temporal sequence will necessarily be coupled with some kind of an avatar autopilot.
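The "catch up" step has simple arithmetic behind it: while playing at rate r, the lag behind the live interaction shrinks by (r - 1) seconds per second of wall-clock time, so a lag of d seconds takes d / (r - 1) seconds to close, which at 2X equals the lag itself. The sketch below encodes that and a minimal playback-position tracker; the names are hypothetical, and the avatar autopilot that would cover for the interactant is not modeled.

```python
def catch_up_seconds(lag_s: float, rate: float) -> float:
    """Wall-clock time needed to return to live when playing `lag_s` seconds
    behind at `rate`x speed: the lag shrinks by (rate - 1) s per second."""
    if rate <= 1.0:
        raise ValueError("playback must be faster than real time to catch up")
    return lag_s / (rate - 1.0)


class TimeShiftedView:
    """Tracks one interactant's playback position relative to the live session."""

    def __init__(self):
        self.live_t = 0.0  # session time of the live interaction
        self.view_t = 0.0  # session time currently rendered to this interactant

    def tick(self, dt: float, rate: float = 1.0) -> None:
        self.live_t += dt
        # Advance playback at `rate`, but never beyond the live edge.
        self.view_t = min(self.live_t, self.view_t + rate * dt)

    def rewind_to(self, t: float) -> None:
        self.view_t = max(0.0, min(t, self.live_t))

    @property
    def lag(self) -> float:
        return self.live_t - self.view_t
```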

Changing the rate of time in a CVE brings up another interesting transformation. Traditionally, CVEs are roughly defined as "geographically separated interactants" interacting over some kind of a computer-mediated network in a shared environment. However, by combining some of the concepts discussed in previous sections, it may be possible to include in the definition of a CVE "temporally separated interactants" in a shared environment. Consider a videoconference of a business meeting. Oftentimes, interested parties who cannot attend the meeting will later review a videotape of it. In a CVE, the temporally absent member has an option to involve herself more deeply in the interaction. Specifically, she can situate her avatar in a specific place in the CVE's seating arrangement and use an autopilot to give her representation rudimentary nonverbal behaviors. Furthermore, the absent member can program her avatar to perform simple interactive tasks: prerecorded introductions, answers to certain questions about the CVE topic, or, perhaps more realistically for the near term, playback of a recorded performance. Then the CVE interaction can proceed in real time, with the temporally absent member's avatar approximating the types of behaviors that she would exhibit and the things she would say. As a result, temporally present members would actually direct pieces of the conversation towards the absent member, as well as transmit nonverbal gestures towards her. Later on, instead of just reviewing the recording, the temporally absent member can take her place in the CVE and actually feel present in the dialogue, receiving appropriate nonverbal behaviors and maximizing the degree of copresence. Moreover, the members of the CVE who were present at the scheduled time can program their avatars to respond, during the replay of the interaction, to any post hoc questions that the absent member might have. In this way, the degree of interactivity during the replay can be increased, and perhaps at some point in the not-too-distant future, the line between real-time and non-real-time interactions will become interestingly blurred.

4 Implications of TSI and Research Directions

For better or for worse, TSI implemented through CVEs has great potential to change the nature of mediated interaction. The strategic decoupling of rendered behavior from actual behavior allows interactants to break many constraints that are inherent in face-to-face interaction, as well as in other forms of mediated interaction such as telephone and videophone conferences. The effects of TSI remain to be seen. Assuming that implementation of the TSI techniques is technically feasible and that using TSI implementations is conceptually workable for the interactants (both of which are substantial assumptions), one could predict a number of consequences. First, TSI may develop into a worthwhile tool that assists interactants in overcoming the inadequacies of communicating from remote locations. By augmenting their representational, sensory, and situational characteristics, interactants in CVEs may be able to achieve levels of interaction that actually surpass face-to-face interaction.

On the other hand, people may in fact find the use of these transformations extremely unsettling. The difference between TSI and current CVE implementations has the potential to be as drastic as the difference between email and the written letter. As this technology is developed, it is essential to examine people's responses to this new medium (e.g., Reeves & Nass, 1996) and to examine these important potential implications of TSI before the technology becomes widespread.

Along the same lines, the threat of TSI may be the very downfall of CVE interaction. In face-to-face interaction, there tends to be some degree of deception, for example, people using facial expressions to mask their emotions. Clearly, this deception has the potential to be much greater with TSI. If interactants have no faith that their perceptual experience is genuine, they may have little reason ever to enter a CVE. A complete lack of trust in the truthfulness of gestures, the one-to-one correspondence of avatars, and the temporal presence of interactants has the potential to rob the CVE of one of its greatest strengths, namely interactivity, since the interactants may not know with whom, with what, or when they are interacting. Similarly, given an expectation of TSI, interactants may be constantly suspicious during interactions; this lack of trust in fellow interactants may lead to unproductive collaborations.

A solution to this breakdown may require the development of TSI detectors for interactants, based either on computer algorithms that analyze nonverbal behaviors or on actual humans who scrutinize the interaction. To examine the possibility of using human TSI detectors, we now discuss what we call the nonverbal Turing Test (NVTT).

In the popular reinterpretation of the Turing Test (Turing, 1950), a judge attempts to determine which of two players (one human, one machine) is a fellow human. In our NVTT pilot studies, experimental participants acting as judges enter a CVE with two virtual humans: one avatar whose head movements are veridical, playing back the movements of another human in real time (i.e., without TSI), and one avatar whose head movements are actually a transformation of the judge's own head movements (i.e., time-delayed and reduced in range of motion). The judge sees the head movements of a real person on one avatar and some sort of self-mimicked movements on the other. During the interaction, only head movements are permitted (i.e., no verbal communication is allowed), and participants must devise ways to engage and test the two virtual humans through nonverbal means in order to ascertain which is human and which is a mimicker. Figure 4 illustrates this setup.

In the current initial pilot study, we manipulated three independent variables: test trial length (either 16 or 32 seconds), mimic delay (i.e., the computer agent mimics either 1, 2, 4, or 8 seconds after the judge's movements), and range of motion (high: pitch, yaw, and roll; or low: yaw only).
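The computer agent in this design can be sketched as a delay buffer over the judge's tracked head orientation, with pitch and roll zeroed in the low range-of-motion condition. The tracker rate, the neutral starting pose, and the class name `MimicAgent` are assumptions; the text does not specify how the agent behaved before the delay elapsed.

```python
from collections import deque


class MimicAgent:
    """Sketch of the NVTT computer agent: replays the judge's head orientation
    after a fixed delay, optionally restricted to yaw (the low-range condition)."""

    def __init__(self, delay_s: float, yaw_only: bool, fps: int = 60):
        self.delay_frames = round(delay_s * fps)  # fps is an assumed tracker rate
        self.yaw_only = yaw_only
        self._buf = deque(maxlen=self.delay_frames + 1)

    def step(self, judge_pose):
        """judge_pose = (pitch, yaw, roll) in degrees; returns the agent's pose."""
        self._buf.append(judge_pose)
        if len(self._buf) <= self.delay_frames:
            return (0.0, 0.0, 0.0)  # hold a neutral pose until the delay buffer fills
        pitch, yaw, roll = self._buf[0]  # the judge's pose from delay_frames frames ago
        return (0.0, yaw, 0.0) if self.yaw_only else (pitch, yaw, roll)
```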

Participants in this study wore head-mounted displays while a render computer tracked their head orientation; tracking, rendering, and networking latencies were all low enough to impart a compelling sense of copresence (see Bailenson et al., 2002, for detailed descriptions of the hardware and software used). Participants were instructed to sit in a virtual room with two virtual people: a human agent (i.e., a representation whose movements are controlled by a real person in another room) and a computer agent (i.e., a computer program that is designed to mimic the user's movements in some way). Participants were instructed to interact with the two other virtual people using head movements in order to determine which one was the human agent. Participants were run in groups of two, with each one acting as the human agent for the other. Each participant sat at a virtual conference table with two virtual humans (similar to that shown in Figure 3). Each participant received a random order of 32 test trials (two instances of the 16 conditions resulting from the crossing of the three independent variables). Forty-one undergraduates participated in this study.

Figure 4. A schematic of the NVTT. The human judge is forced to determine which of the two avatars exhibiting head movements is the real human and which is the computer-generated human mimicking the judge's own movements.
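The trial structure can be generated by fully crossing the three factors (2 x 4 x 2 = 16 cells) and repeating each cell twice. This sketch assumes a seeded shuffle per participant; the tuple representation of conditions and the function name are illustrative choices, not a description of the software actually used.

```python
import itertools
import random

TRIAL_LENGTHS_S = (16, 32)
MIMIC_DELAYS_S = (1, 2, 4, 8)
MOTION_RANGES = ("high: pitch+yaw+roll", "low: yaw only")


def make_trial_list(seed=None, repetitions=2):
    """Fully crossed 2 x 4 x 2 design, each cell repeated, in random order."""
    conditions = list(itertools.product(TRIAL_LENGTHS_S, MIMIC_DELAYS_S, MOTION_RANGES))
    trials = conditions * repetitions
    random.Random(seed).shuffle(trials)  # per-participant random order
    return trials
```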

For the purposes of brevity, we focus on two results in particular. First, despite the fact that we explicitly told participants that the computer agent was directly mimicking them, they performed surprisingly poorly when attempting to identify the human avatar. The overall average score was only 66% correct (SD = 10%; chance = 50%; maximum score = 100%). Moreover, of the 41 participants in the study, more than one fourth were not reliably different from chance (i.e., less than 3 SEM from 50%, between 44% and 56%). Second, as Figure 5 demonstrates, participants' scores decreased as the mimic delay increased: there was a linear trend in the logarithm of the delay variable, F(1, 32) = 8.85, p < .01. When the delay was greater than 1 second, participants had more difficulty identifying a mimicker.

These data are particularly striking in that we had initially predicted that participants would be able to recognize their own head movements much more easily when aware that one avatar was designed to mimic them. In this study, we explicitly told participants about the use of TSI, and they still had great difficulty in detecting it. The effects of implicit TSI (i.e., TSI that is not disclosed) are likely to be even greater. While this pilot study is extremely simple and only scratches the surface of a paradigm that examines TSI, it is still noteworthy that participants did not detect the mimicker across the board.

We are currently exploring other factors underlying the discrimination of human nonverbal behavior from computer-generated behaviors. In future studies, we will use NVTTs to study other nonverbal behaviors, such as facial gestures, eye-head gestures (pointing indications by either system), hand gestures, and interpersonal distance. We have shown that in scenarios, albeit simple ones, it is possible to pass the NVTT for a percentage of our test population using TSI. We are confident that as this percentage grows in the near future, important scientific and sociological discoveries will surface along the way.

In conclusion, there are many reasons one might want to avoid TSI; these reasons range from Orwellian concerns to the fear of rendering CVEs (perhaps even the telephone) functionally useless. We are not advocates of TSI as a means to replace normal communication, nor are we staunch believers in avoiding TSI in order to preserve the natural order of communication and conversation. However, we do acknowledge that as CVEs become more prevalent, the strategic decoupling of representation from behavior is inevitable. For that reason alone, the notion of TSI warrants considerable attention.

Acknowledgments

The authors would like to thank Robin Gilmour and Christopher Rex for helpful suggestions. Furthermore, we thank Christopher Rex and Ryan Jaeger for assistance in collecting data. This research was sponsored in part by NSF Award SBE-9873432 and in part by NSF ITR Award IIS 0205740.

References

Argyle, M. (1988). Bodily communication (2nd ed.). London: Methuen.

Bailenson, J. N., Beall, A. C., & Blascovich, J. (2002). Mutual gaze and task performance in shared virtual environments. Journal of Visualization and Computer Animation, 13, 1–8.

Bailenson, J. N., Beall, A. C., Blascovich, J., Raimmundo, R., & Weisbuch, M. (2001). Intelligent agents who wear your face: Users' reactions to the virtual self. Lecture Notes in Artificial Intelligence, 2190, 86–99.

Figure 5. Percent correct by mimic delay in seconds. These data exclude subjects at chance performance.

Bailenson, J. N., Blascovich, J., Beall, A. C., & Guadagno, R. E. (submitted). Self representations in immersive virtual environments.

Baumeister, R. F. (1998). The self. In D. T. Gilbert, S. T. Fiske, & G. Lindzey (Eds.), Handbook of social psychology (4th ed., pp. 680–740). New York: McGraw-Hill.

Beall, A. C., Bailenson, J. N., Loomis, J., Blascovich, J., & Rex, C. (2003). Non-zero-sum mutual gaze in immersive virtual environments. Proceedings of HCI International 2003.

Benford S Bowers J Fahlen L Greenhalgh C amp Snow-don D (1995) User embodiment in collaborative virtualenvironments Proceedings of CHIrsquo95 (pp 242ndash249) ACMPress

Biocca F (1997) The cyborgrsquos dilemma Progressive em-bodiment in virtual environments Journal of Computer-Mediated Communication [online] 3 Retrieved fromhttpwwwascuscorgjcmcvol3issue2biocca2html

Black M amp Yacoob Y (1997) Recognizing facial expres-sions in image sequences using local parameterized modelsof image motion International Journal of Computer Vision25(1) 23ndash48

Blanz V amp Vetter T (1999) A morphable model for thesynthesis of 3D faces SIGGRAPH rsquo99 Conference Proceed-ings 187ndash194

Blascovich J Loomis J Beall A C Swinth K R HoytC amp Bailenson J N (2002) Immersive virtual environ-ment technology as a methodological tool for social psy-chology Psychological Inquiry 13 103ndash124

Busey T A (1988) Physical and psychological representa-tions of faces Evidence from morphing Psychological Sci-ence 9 476ndash483

Byrne D (1971) The attraction paradigm New York Aca-demic Press

Cassell J (2000) Nudge nudge wink wink Elements of face-to-face conversation for embodied conversational agents InJ Cassell et al (Eds) Embodied conversational agentsCambridge MA MIT Press

Chaiken S (1979) Communicator physical attractiveness andpersuasion Journal of Personality and Social Psychology 371387ndash1397

Chartrand T L amp Bargh J (1999) The chameleon effectThe perception-behavior link and social interaction Journalof Personality amp Social Psychology 76(6) 893ndash910

Decarlo D Metaxas D amp Stone M (1998) An anthropo-metric face model using variational techniques Proceedingsof SIGGRAPH rsquo98 67ndash74

Depaulo, B. M., & Friedman, H. S. (1998). Nonverbal communication. In D. T. Gilbert, S. T. Fiske, & G. Lindzey (Eds.), The handbook of social psychology (4th ed., Vol. 2, pp. 3–40). Boston: McGraw-Hill.

Donato, G., Bartlett, M. S., Hager, J. C., Ekman, P., & Sejnowski, T. J. (1999). Classifying facial actions. IEEE Transactions on Pattern Analysis and Machine Intelligence, 21(10), 974–989.

Durlach, N., & Slater, M. (2000). Presence in shared virtual environments and virtual togetherness. Presence: Teleoperators and Virtual Environments, 9, 214–217.

Ekman, P. (1978). Facial signs: Facts, fantasies, and possibilities. In T. Sebeok (Ed.), Sight, sound, and sense. Bloomington, IN: Indiana University Press.

Fodor, J. A. (1983). The modularity of mind: An essay on faculty psychology. Cambridge, MA: MIT Press.

Fry, R., & Smith, G. F. (1975). The effects of feedback and eye contact on performance of a digit-encoding task. Journal of Social Psychology, 96, 145–146.

Gale, C., & Monk, A. F. (2002). A look is worth a thousand words: Full gaze awareness in video-mediated conversation. Discourse Processes, 33.

Garau, M., Slater, M., Vinayagamoorthy, V., Brogni, A., Steed, A., & Sasse, M. A. (2003). The impact of avatar realism and eye gaze control on perceived quality of communication in a shared immersive virtual environment. Proceedings of the SIGCHI Conference on Human Factors in Computing Systems.

Gibson, W. (1984). Neuromancer. New York: Ace Books.

Hu, C., Feris, R., & Turk, M. (2003). Active wavelet networks for face alignment. Proceedings of the British Machine Vision Conference, Norwich, UK.

Kendon, A. (1977). Studies in the behavior of social interaction. Bloomington, IN: Indiana University.

Kleinke, C. L. (1986). Gaze and eye contact: A research review. Psychological Bulletin, 100, 78–100.

Kraut, R. E., Fussell, S. R., Brennan, S. E., & Siegel, J. (2002). Understanding effects of proximity on collaboration: Implications for technologies to support remote collaborative work. In P. Hinds & S. Kiesler (Eds.), Distributed work. Cambridge, MA: MIT Press.

Lanier, J. (2001). Virtually there. Scientific American, April 2001.

Leigh, J., DeFanti, T., Johnson, A., Brown, M., & Sandin, D. (1997). Global tele-immersion: Better than being there. Proceedings of ICAT '97.

Loomis, J. M., Blascovich, J. J., & Beall, A. C. (1999). Immersive virtual environments as a basic research tool in psychology. Behavior Research Methods, Instruments, and Computers, 31(4), 557–564.


Mania, K., & Chalmers, A. (1998). Proceedings of the Fourth International Conference on Virtual Systems and Multimedia (pp. 177–182). Amsterdam: IOS Press-Ohmsha.

Milgram, S. (1992). The individual in a social world: Essays and experiments (2nd ed.). New York: McGraw-Hill.

Morgan, T., Kriz, R., Howard, T., Dias Neves, F., & Kelso, J. (2001). Extending the use of collaborative virtual environments for instruction to K–12 schools. Insight, 1(1).

Normand, V., Babski, C., Benford, S., Bullock, A., Carion, S., Chrysanthou, Y., et al. (1999). The COVEN project: Exploring applicative, technical, and usage dimensions of collaborative virtual environments. Presence: Teleoperators and Virtual Environments, 8(2), 218–236.

Patterson, M. L. (1982). An arousal model of interpersonal intimacy. Psychological Review, 89, 231–249.

Pylyshyn, Z. W. (1980). Computation and cognition: Issues in the foundations of cognitive science. Behavioral & Brain Sciences, 3, 111–169.

Reeves, B., & Nass, C. (1996). The media equation: How people treat computers, television, and new media like real people and places. New York: Cambridge University Press.

Rickel, J., & Johnson, W. L. (2000). Task-oriented collaboration with embodied agents in virtual worlds. In J. Cassell, J. Sullivan, S. Prevost, & E. Churchill (Eds.), Embodied conversational agents. Cambridge, MA: MIT Press.

Rutter, D. R. (1984). Looking and seeing: The role of visual communication in social interaction. Suffolk, UK: John Wiley & Sons.

Sannier, G., & Thalmann, M. N. (1998). A user friendly texture-fitting methodology for virtual humans. Computer Graphics International '97.

Schwartz, P., Bricker, L., Campbell, B., Furness, T., Inkpen, K., Matheson, L., et al. (1998). Virtual playground: Architectures for a shared virtual world. Proceedings of the ACM Symposium on Virtual Reality Software and Technology 1998, 43–50.

Sherwood, J. V. (1987). Facilitative effects of gaze upon learning. Perceptual and Motor Skills, 64, 1275–1278.

Simons, H. (1976). Persuasion: Understanding, practice, and analysis. Reading, MA: Heath.

Slater, M., Pertaub, D., & Steed, A. (1999). Public speaking in virtual reality: Facing an audience of avatars. IEEE Computer Graphics and Applications, 19(2), 6–9.

Slater, M., Sadagic, A., Usoh, M., & Schroeder, R. (2000). Small group behavior in a virtual and real environment: A comparative study. Presence: Teleoperators and Virtual Environments, 9(1), 37–51.

Stiefelhagen, R., Yang, J., & Waibel, A. (1997). Tracking eyes and monitoring eye gaze. In M. Turk & Y. Takabayashi (Eds.), Proceedings of the Workshop on Perceptual User Interfaces.

Turing, A. (1950). Computing machinery and intelligence. Mind, 59(236).

Turk, M., & Kolsch, M. (in press). Perceptual interfaces. In G. Medioni & S. B. Kang (Eds.), Emerging topics in computer vision. Boston: Prentice Hall.

Velichkovsky, B. M. (1995). Communicating attention: Gaze position transfer in cooperative problem solving. Pragmatics and Cognition, 3(2), 199–222.

Vertegaal, R. (1999). The GAZE groupware system: Mediating joint attention in multiparty communication and collaboration. Proceedings of the CHI '99 Conference on Human Factors in Computing Systems: The CHI Is the Limit, 294–301.

Viola, P., & Jones, M. (2001). Rapid object detection using a boosted cascade of simple features. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.

Wallace, D. F. (1996). Infinite jest. Boston: Little, Brown.

Williams, K., Cheung, K. T., & Choi, W. (2000). Cyberostracism: Effects of being ignored over the internet. Journal of Personality and Social Psychology, 79, 748–762.

Yee, N. (2002). Befriending ogres and wood elves: Understanding relationship formation in MMORPGs. Retrieved from http://www.nickyee.com/hub/relationships/home.html

Zajonc, R. B. (1971). Brainwash: Familiarity breeds comfort. Psychology Today, 3(9), 60–64.

Zajonc, R. B., Murphy, S. T., & Inglehart, M. (1989). Feeling and facial efference: Implications of the vascular theory of emotion. Psychological Review, 96, 395–416.

Zhang, X., & Furnas, G. (2002). Social interactions in multiscale CVEs. Proceedings of the ACM Conference on Collaborative Virtual Environments 2002 (CVE 2002).


tionship, a CVE interactant may use this principle to an advantage. Consider the situation in which a leader and a community member are negotiating via a CVE. A particularly devious leader can represent herself by incorporating characteristics of the member's representation. By making herself appear more similar to the member, the leader becomes substantially more persuasive (Chaiken, 1979; Simons, 1976). Indeed, a leader would be able to adjust the structural or textural similarity of her own avatar idiosyncratically to each of the members in her audience.

This similarity could be achieved in various ways, employing any of a number of techniques that parametrically vary the similarity of computer-generated models via 2D and 3D morphing (Blanz & Vetter, 1999; Busey, 1998; Decarlo, Metaxas, & Stone, 1998). The leader could be represented as some kind of hybrid, maintaining some percentage of her original facial structure and texture but also incorporating percentages of the member's structure and texture. Alternatively, the leader could be represented completely veridically to her facial structure but, for a few frames per second, could replace her own head with the head of the member. Priming familiarity with limited exposure to human faces has proven to be effective with 2D images (Zajonc, 1971). Finally, consider the situation in which the leader is interacting via CVE with two members. The leader can be differentially represented to both members simultaneously, such that each member sees a different hybrid leader avatar incorporating aspects of that particular member. In other words, the leader does not need a consistent representation across interactants, because the CVE operator is free to render a different leader avatar to each member.
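The hybrid-avatar idea can be illustrated with a minimal linear blend. The sketch below assumes each face is already stored as a shape vector and a texture vector in dense correspondence (the premise of morphable models such as Blanz & Vetter, 1999); `FaceModel`, `hybrid_face`, and `per_member_avatars` are hypothetical names, and real morphing pipelines are considerably more involved.

```python
from dataclasses import dataclass


@dataclass
class FaceModel:
    # Hypothetical layout: flattened vectors in dense correspondence, i.e.
    # element i of one face matches element i of any other face.
    shape: list[float]
    texture: list[float]


def lerp(a: list[float], b: list[float], w: float) -> list[float]:
    """Linear interpolation: w = 0 returns a, w = 1 returns b."""
    if len(a) != len(b):
        raise ValueError("faces must share the same topology")
    return [(1.0 - w) * x + w * y for x, y in zip(a, b)]


def hybrid_face(leader: FaceModel, member: FaceModel,
                shape_w: float, texture_w: float) -> FaceModel:
    """Keep (1 - w) of the leader and borrow w of the member, separately for shape and texture."""
    for w in (shape_w, texture_w):
        if not 0.0 <= w <= 1.0:
            raise ValueError("blend weights must lie in [0, 1]")
    return FaceModel(lerp(leader.shape, member.shape, shape_w),
                     lerp(leader.texture, member.texture, texture_w))


def per_member_avatars(leader: FaceModel, members: dict[str, FaceModel],
                       w: float) -> dict[str, FaceModel]:
    """A different hybrid for each viewer, each one borrowing from that viewer's own face."""
    return {name: hybrid_face(leader, face, w, w) for name, face in members.items()}
```

Because the per-member dictionary is computed independently for each viewer, nothing forces the leader's appearance to be consistent across members, which is exactly the property the paragraph describes.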

Incorporating the self-identity of other interactants can also occur via behavioral characteristics. Psychological research has demonstrated that when an experimenter subtly mimics experimental participants (e.g., leans in the same direction as they do, crosses her legs when the participants do), participants subsequently report that they liked the experimenter more and that the conversation flowed more smoothly (Chartrand & Bargh, 1999). This "chameleon effect" could be extremely effective in CVEs. The leader (or the system operator) can use algorithms to detect motions of the other interactants at varying levels of detail and coordinate the animations of her avatar to be a blended combination of her own and those of the others.

Consider a CVE interaction consisting of a leader and two members. In the course of this interaction, patterns of nonverbal behaviors will emerge, and statistics based on a running tabulation can be automatically collected via CVE technology. In other words, if there is a certain rate of head nodding exhibited by person A and another rate exhibited by person B, the leader's head can be made to nod in a way consistent with the statistics (e.g., an average or median). Alternatively, the leader's avatar can simply mimic each interactant individually and render those particular movements only to each corresponding interactant.
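A sketch of the running tabulation described above, assuming some upstream detector already reports individual nods from the head-tracking stream (that detector is not shown). `NodTabulator` and its methods are illustrative names, not part of any CVE toolkit.

```python
import statistics
from collections import Counter


class NodTabulator:
    """Running tally of head nods per interactant.

    Assumes an upstream detector (not shown) reports each nod it finds in the
    tracked head-pitch signal; this class only does the bookkeeping.
    """

    def __init__(self):
        self.nods = Counter()
        self.elapsed_s = 0.0

    def record_nod(self, who):
        self.nods[who] += 1

    def advance(self, dt_s):
        self.elapsed_s += dt_s

    def rate_per_min(self, who):
        # Counter returns 0 for interactants who have not nodded yet.
        return 60.0 * self.nods[who] / self.elapsed_s if self.elapsed_s else 0.0

    def leader_nod_rate(self, members, how="mean"):
        """Nod rate the leader's avatar should display: mean or median of the members' rates."""
        rates = [self.rate_per_min(m) for m in members]
        return statistics.median(rates) if how == "median" else statistics.fmean(rates)
```

With A nodding six times and B twice over one minute, the leader's avatar would target a mean rate of 4 nods per minute; the per-member mimicry alternative would instead skip the aggregation and replay each member's own nods only to that member.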

The leader could also morph her representation with that of an unrepresented party, not present in the CVE but previously known to possess qualities that inspire certain reactions. Depending on the context, for example, the leader can morph a percentage of famous politicians, historical figures, or even pop stars into her avatar. This feature blending can be explicit and blatant (e.g., the leader looks just like an expert or a religious figure) or more implicit and subterranean (e.g., the leader incorporates subtle features such as cheekbones and hairstyle). Alternatively, the leader can morph herself with a person who may not be famous but with whom the member maintains a deep trust (Gibson, 1984).

A second form of avatar transformation arises from the ability to selectively decouple and reconstruct rendered behavior in CVEs. In other words, not only can interactants render nonverbal behaviors different from the nonverbal behaviors that they actually perform but, similarly to the discussion above, they can render those behaviors idiosyncratically for each of the other interactants.

Consider what we term Non-Zero-Sum Mutual Gaze (NZSMG). Ordinary mutual gaze occurs when individuals look at one another's eyes during discourse. In face-to-face conversation, mutual gaze is zero-sum. In other words, if interactant A maintains eye contact with interactant B for 70 percent of the time, it is not possible for A to maintain eye contact with interactant C for more than 30 percent of the time. However, interaction in CVEs is not bound by this constraint. With digital avatars, A can be made to appear to maintain mutual gaze with both B and C for a majority of the conversation.
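The zero-sum constraint and its removal can be made concrete with a toy per-viewer renderer. This is a deliberately simplified sketch: gaze is reduced to a symbolic target per frame, and the function names are hypothetical. Because each viewer's display is computed separately, the perceived shares are no longer forced to sum to 1.

```python
def rendered_gaze_target(actual_target, viewer, nzsmg_viewers):
    """Whom the leader's avatar appears to look at on `viewer`'s display.

    Viewers in `nzsmg_viewers` always see the avatar looking at them; everyone
    else sees the leader's tracked (veridical) gaze target.
    """
    return viewer if viewer in nzsmg_viewers else actual_target


def perceived_gaze_share(actual_timeline, viewers, nzsmg_viewers=frozenset()):
    """Fraction of frames in which each viewer sees the leader looking at them."""
    n = len(actual_timeline)
    return {
        v: sum(rendered_gaze_target(t, v, nzsmg_viewers) == v for t in actual_timeline) / n
        for v in viewers
    }
```

With a tracked timeline of 70 percent gaze at B and 30 percent at C, the veridical shares are 0.7 and 0.3; enabling NZSMG for both B and C gives each a perceived share of 1.0. A real implementation would presumably also add natural gaze aversion so that the avatar does not stare unblinkingly.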

Gaze is one of the most thoroughly studied nonverbal gestures in research on social interaction (Rutter, 1984; Kleinke, 1986; Kendon, 1977). Direct eye gaze can provide cues for intimacy, agreement, and interest (Argyle, 1988). Furthermore, gaze can enhance learning during instruction as well as memory for information (Fry & Smith, 1975; Sherwood, 1987). The advantage of using CVEs is that the normal nonverbal behaviors of interactants can be augmented via NZSMG. Moreover, the interactants in a CVE can either be unaware of this transformation (i.e., implicit NZSMG) or aware of it (i.e., explicit NZSMG), as Figure 1 demonstrates. Preliminary work studying implicit NZSMG has demonstrated that interactants are not aware of the decoupling from actual behavior and that they respond to the artificial gaze as if it were actual gaze (Beall, Bailenson, Loomis, Blascovich, & Rex, 2003). This method may prove to be most effective during distance learning in educational CVEs (Morgan, Kriz, Howard, das Neves, & Kelso, 2001), in which the instructor uses her augmented gaze as a tool to keep the students more engaged.

Decoupling can also be used to achieve the opposite effect. Consider the situation where the leader wants to scrutinize the nonverbal behaviors of member A but does not want the member to feel uncomfortable from her unwavering gaze. The leader can render herself looking at her shoes, or perhaps at member B in the CVE, while in reality she is watching member A's every move.

In order for such a system to be effective, there must be a convincing algorithm to drive the autonomous eye gaze. In other words, if the leader wants the freedom to employ NZSMG or to wander around the CVE scrutinizing different aspects of the conversation undetected, she (by her own device or assisted by the system's operator) must maintain the illusion that her avatar is exhibiting the typical and appropriate nonverbal gestures. There are a number of ways to achieve this. The first is some type of artificial intelligence algorithm that approximates appropriate gestures for the leader's avatar by monitoring the gestures and speech of the other interactants. While there have been significant advances in this regard (Cassell, 2000), the ability of an algorithm to process natural language as well as generate believable responses may still be many years off. A more likely method for achieving this goal would be to use actual humans instead of AI algorithms. In this scenario, the leader employs one or more nonverbal "cyranoids" (Milgram, 1992) to augment the nonverbal behaviors presented to each individual member. To do so, the leader solicits the help of several assistants, each of whose job is to provide the nonverbal behaviors targeted toward a particular member. See Figure 2.

Figure 1. Internal belief states from implicit NZSMG (left) and explicit NZSMG (right).

In this many-to-many "Wizard of Oz" implementation, each member is presented with a unified leader who is rendered privately to her. This private representation would be a melding of the actual leader and one of her assistants, so that when the leader's attention was diverted away from that member for long periods of time, the assistant could step in and help maintain a believable interaction by seamlessly serving as the leader's proxy. The leader herself would then act as a conductor, overseeing all the interactions yet being free to focus her attention on individual members when she so desires. In addition, the leader is free to wander about the digital space, consult her notes, take a rest, or conduct a sidebar meeting with another person. However, because her avatar is partially cyranic, it can continue to exhibit the appropriate nonverbal behaviors to each member all the while. Furthermore, having a number of assistants whose sole focus is to respond with appropriate nonverbal gestures to each of the interactants in the CVE should maximize the members' involvement, or sense of presence, in the CVE. For important meetings, seminars, or presentations conducted via CVEs, individual interactants may want to utilize a number of assistants as a core presentation team.
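The routing logic behind the cyranoid arrangement in Figure 2 might be sketched as follows, assuming each member has exactly one dedicated assistant. `CyranoidRouter` and the channel names are hypothetical; the point is only that voice and nonverbal behavior are sourced independently, per member.

```python
from dataclasses import dataclass, field


@dataclass
class CyranoidRouter:
    """Decides which participant drives each channel of the leader's avatar, per member.

    Hypothetical design: voice always comes from the real leader; nonverbal behavior
    comes from the member's dedicated assistant unless the leader is attending to
    that member herself.
    """
    gesturer_for: dict[str, str]                          # member -> assigned assistant
    leader_focus: set[str] = field(default_factory=set)  # members the leader is attending to

    def sources_for(self, member: str) -> dict[str, str]:
        nonverbal = "leader" if member in self.leader_focus else self.gesturer_for[member]
        return {"voice": "leader", "nonverbal": nonverbal}
```

When the leader turns her attention to a member, adding that member to `leader_focus` hands the nonverbal channel back to her; removing it returns control to the assistant, which is the "conductor" behavior the paragraph describes.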

3.2 Transforming Sensory Capabilities

Interactants can be assisted by technology that exploits the ability of CVEs to keep precise running tabs of certain types of behaviors and then display summaries of those behaviors exclusively to individual interactants. For example, consider an educational CVE in which an instructor wants to ensure that she is directing her nonverbal behaviors in a desired fashion. Such an instructor may want to monitor her mutual gaze to ensure that she is not looking at any one student more than others during a presentation. The tracking equipment used to render the scene can keep an online total of the amount of time the instructor gazed at each individual student. The CVE can render a display of this gaze meter, as well as use visual or auditory alerts to inform the instructor of disproportionate applications of gaze.
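A minimal gaze meter along these lines, assuming the tracker reports which student (if any) the instructor is looking at on each update. The `tolerance` threshold for a "disproportionate" share is a hypothetical parameter; the paper does not specify one.

```python
class GazeMeter:
    """Running total of how long the instructor has gazed at each student.

    `tolerance` is a hypothetical setting: a student is flagged when their share of
    the instructor's gaze exceeds (1 + tolerance) times an even split.
    """

    def __init__(self, students, tolerance=0.5):
        self.seconds = {s: 0.0 for s in students}
        self.tolerance = tolerance

    def update(self, gaze_target, dt_s):
        # Gaze at anything other than a student (slides, the floor) is not counted.
        if gaze_target in self.seconds:
            self.seconds[gaze_target] += dt_s

    def shares(self):
        total = sum(self.seconds.values())
        return {s: (t / total if total else 0.0) for s, t in self.seconds.items()}

    def overgazed(self):
        even = 1.0 / len(self.seconds)
        return [s for s, share in self.shares().items() if share > (1.0 + self.tolerance) * even]
```

With four students, an even split is 25 percent; a student who has received 40 percent of the instructor's gaze exceeds the 37.5 percent threshold and would trigger the alert. A symmetric check for neglected students could be added the same way.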

Furthermore, interactants can use the tracking data summaries to learn more about the attitudes of the others. Nonverbal gestures are often correlates of specific mental states (Ekman, 1978; Zajonc, Murphy, & Inglehart, 1989). For example, in general, we nod when we agree, smile when we are pleased, tilt our heads when we are confused, and look at something in which we are interested. Interactants will be able to tailor their CVE systems to keep track of nonverbal behaviors with the goal of helping them infer the mental states of the other interactants. For example, a teacher will be able to gauge the percentage of students exhibiting nonverbal behaviors that suggest confusion or a failure to understand a point in a lesson. Similarly, a leader could determine who in a room full of members is responding most positively to her behavior. Intuitively, tabulating and assessing the nonverbal behaviors of others is certainly something that humans do constantly in face-to-face interactions. With CVEs, interactants will be able to tabulate these behaviors with greater precision. Interactants can use the objective tabulations from the tracking data to augment their normal intuitions about the gestures occurring in the interaction.
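One way to compute the kind of percentage a teacher might consult is sketched below. The cue-to-state mapping is lifted from the examples in the paragraph and is only a coarse heuristic; cue detection is assumed to happen upstream, and all names are illustrative.

```python
# Coarse cue -> state mapping taken from the examples in the text; these cues are
# only probabilistic correlates of mental states, not reliable indicators.
CUE_TO_STATE = {"nod": "agreement", "smile": "pleasure", "head_tilt": "confusion"}


def state_percentages(events, students):
    """Percentage of students who showed each cue-linked state at least once.

    `events` is a list of (student, cue) pairs, assumed to come from an upstream
    detector running on the tracking data.
    """
    shown = {}
    for student, cue in events:
        state = CUE_TO_STATE.get(cue)
        if state is not None and student in students:
            shown.setdefault(state, set()).add(student)
    return {state: 100.0 * len(who) / len(students) for state, who in shown.items()}
```

Counting each student at most once per state keeps one persistently head-tilting student from inflating the "confusion" figure; cues outside the mapping are simply ignored.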

Figure 2. A depiction of cyranoids. On the top row are three nonrendered gesturers. Each member on the bottom hears the leader's actual verbal behaviors (dashed lines). However, each member views the nonverbal behavior of her dedicated gesturer rendered onto the avatar of the leader (unbroken lines).

Another transformation involves filtering or degrading certain signals or nonverbal behaviors. Some visual nonverbal behaviors tend to distract interactants. Using filtering algorithms, interactants can prevent counterproductive distractions in a number of ways. For example, consider the situation in which a speaker in a CVE taps her pen rapidly as she speaks. In face-to-face meetings, this type of behavior can distract interactants. In a CVE, this type of behavior can be filtered in two ways. First, the speaker can filter the behavior on the transmitting end. If people know that they have difficulty suppressing certain nonverbal behaviors that tend to be perceived negatively, such as a nervous tic, they can activate a filter that prevents the behavior from being rendered. Similarly, in certain situations, a CVE interactant may not want to render certain nonverbal behaviors. Consider the leader example. The potential member may benefit from rendering her "poker face," that is, not demonstrating any enthusiasm or disappointment via facial expressions. Consequently, the member may gain a strategic advantage during a negotiation. Second, interactants can filter behaviors on the receiving end. If a speaker's hand motions are distracting, then a listener can simply choose not to render that interactant's hand movements.
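Both filtering points can be sketched as dictionary filters over a per-frame set of behavior channels. The channel names are hypothetical, and the renderer is assumed to hold a neutral pose for any channel missing from a frame.

```python
def transmit_filter(frame, suppressed):
    """Sender side: drop channels the speaker never wants broadcast (a nervous tic, a poker face)."""
    return {channel: value for channel, value in frame.items() if channel not in suppressed}


def receive_filter(frame, sender, muted_by_sender):
    """Receiver side: drop channels this listener has chosen not to render for `sender`."""
    muted = muted_by_sender.get(sender, set())
    return {channel: value for channel, value in frame.items() if channel not in muted}
```

The asymmetry matters: a transmit-side filter applies to every recipient, whereas a receive-side filter affects only the listener who set it, so other interactants still see the pen tapping.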

Another example of transforming sensory capabilities is producing a visual indicator of where each interactant's attention currently lies, as revealed by their eye direction (Velichkovsky, 1995). We have explored a technique that involves rendering each person's view frustum to indicate the field of view, as Figure 3 illustrates. In this example, the wire-frame frustums spotlight the 3D space visible to each person. This feature, color coded for each person, may be especially helpful to teachers in a distance learning CVE, who could use such information to see where students are focusing their visual attention without having to look directly at the students' eyes.
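Deciding whether an object falls inside a student's view frustum reduces to a standard perspective-frustum test. The sketch assumes head position and yaw/pitch from the tracker, a symmetric frustum, and field-of-view and clip-plane values that are placeholders rather than values from the paper.

```python
import math
from dataclasses import dataclass


@dataclass
class Viewer:
    eye: tuple[float, float, float]
    yaw_deg: float           # 0 looks along +z; positive turns toward +x (assumed convention)
    pitch_deg: float         # positive looks up
    hfov_deg: float = 60.0   # assumed field of view, not taken from the paper
    vfov_deg: float = 45.0
    near: float = 0.1
    far: float = 50.0


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def in_view_frustum(v: Viewer, point) -> bool:
    """True if `point` lies inside the viewer's symmetric perspective frustum."""
    yaw, pitch = math.radians(v.yaw_deg), math.radians(v.pitch_deg)
    # Orthonormal head basis built from yaw and pitch (roll ignored).
    forward = (math.sin(yaw) * math.cos(pitch), math.sin(pitch), math.cos(yaw) * math.cos(pitch))
    up = (-math.sin(yaw) * math.sin(pitch), math.cos(pitch), -math.cos(yaw) * math.sin(pitch))
    right = (math.cos(yaw), 0.0, -math.sin(yaw))
    d = tuple(p - e for p, e in zip(point, v.eye))
    x, y, z = _dot(d, right), _dot(d, up), _dot(d, forward)   # point in head coordinates
    if not v.near <= z <= v.far:
        return False
    return (abs(x) <= z * math.tan(math.radians(v.hfov_deg) / 2)
            and abs(y) <= z * math.tan(math.radians(v.vfov_deg) / 2))
```

A teacher's display could, for example, highlight every scene object for which `in_view_frustum` is true for a given student, which conveys attention without requiring eye contact.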

Figure 3. View frustums marking the field of view of interactants.

There are a number of similar tools (i.e., specific objects rendered only to particular interactants) that can assist interactants in a CVE. For example, in our NZSMG studies, an experimenter enters a CVE and attempts to persuade other interactants regarding a certain topic (Beall et al., 2003). In those interactions, we render the interactants' names over their heads on floating billboards for the experimenter to read. In this manner, the experimenter can refer to people by name more easily. There are many other ways to use these floating billboards to assist interactants, for example, reminders about an interactant's preferences or personality (e.g., "doesn't respond well to prolonged mutual gaze").

One of the most useful forms of transforming sensory capabilities may be to enlist one or more human consultants who are rendered to only one member in a CVE (i.e., virtual ghosts). Unlike a face-to-face interaction, a CVE will enable an interactant to have informed human consultants who are free to wander around the virtual meeting space, to scrutinize the actions of other interactants, to conduct online research and sidebar meetings in order to provide key interactants with additional information, and generally to provide support for the interactants. For example, the leader can have her research team actually rendered beside her in the CVE. Members of her team can point out actions by potential members, suggest new strategies, and even provide real-time criticism and feedback concerning the behavior of the leader, without any of the other members having even a hint of awareness of the human consultants' presence. Alternatively, the leader herself can go into "ghost mode" and explore the virtual world with her team while her avatar remains seated and is even controlled by yet another member of her team.
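Floating billboards and virtual ghosts rest on the same mechanism: per-viewer visibility. A minimal sketch, assuming each scene object carries an optional set of viewers allowed to see it; `SceneObject`, `visible_to`, and `render_list` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SceneObject:
    name: str
    visible_to: Optional[frozenset] = None   # None = shared with every interactant


def render_list(scene, viewer):
    """Names of the objects that `viewer`'s client should draw."""
    return [o.name for o in scene if o.visible_to is None or viewer in o.visible_to]
```

A billboard above member B would be tagged as visible only to the experimenter, and a consultant's ghost avatar as visible only to the leader; every other client simply never draws them.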

3.3 Transforming the Situation

In addition to transforming their representation and sensory capabilities, CVE interactants can also use algorithms to transform their general spatial or temporal situations. In a CVE, people generally adopt a spatially coherent situational context across all remote interactants that brings everyone together in the shared space. However, there is no reason that the details and arrangements of that virtual space need to be constant for all the interactants in the CVE. Consider the situation for three interactants. Interactant A may choose to form an isosceles triangle with the other two, while interactants B and C may both choose to form equilateral triangles. Interactant A may even choose to flip the locations of B and C. In this scenario, the CVE operating system can preserve the intended eye gaze direction by transforming the amplitudes or direction of head and eye movements in a prescribed manner. While this is a somewhat simple example, with as many as four interactants it is straightforward to design spatial transformations that allow the intended eye and head gaze cues to remain intact across all interactants. While such discordance may eventually cause the quality and smoothness of the interaction to suffer, there are a number of ways that transforming the situation can assist individual interactants.
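One way a system might keep gaze cues intact when seating layouts differ is to retarget the intended target rather than copy raw head angles: find whom the sender is facing in her own layout, then turn her avatar toward that same person in each viewer's layout. This is a sketch under simplifying assumptions (yaw only, floor-plane coordinates, discrete targets); the paper does not specify its transformation, and all names are hypothetical.

```python
import math


def yaw_toward(src, dst):
    """Yaw in degrees for an avatar at `src` to face `dst`; points are (x, z) floor coordinates, 0 = +z."""
    return math.degrees(math.atan2(dst[0] - src[0], dst[1] - src[1]))


def _angle_between(a_deg, b_deg):
    # Smallest absolute difference between two headings, wrapping at 360 degrees.
    return abs((a_deg - b_deg + 180.0) % 360.0 - 180.0)


def intended_target(layout, who, tracked_yaw_deg):
    """The interactant `who` is most nearly facing in `who`'s own seating layout."""
    return min((o for o in layout if o != who),
               key=lambda o: _angle_between(yaw_toward(layout[who], layout[o]), tracked_yaw_deg))


def retargeted_yaw(sender_layout, viewer_layout, sender, tracked_yaw_deg):
    """Yaw to render for `sender` in a viewer's layout so the gaze lands on the same person."""
    target = intended_target(sender_layout, sender, tracked_yaw_deg)
    return yaw_toward(viewer_layout[sender], viewer_layout[target])
```

For example, if B (in her equilateral layout) turns about 28 degrees toward C, then in A's flipped isosceles layout, where C sits directly to B's left, B's avatar is rendered turned roughly 90 degrees toward C rather than reproducing the raw 28-degree rotation.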

One such transformation involves multilateral perspectives. In a normal conversation, each interactant has a unique and privileged perspective. That perspective is a combination of her sensory input (e.g., visual and acoustic fields of view) and internal beliefs about the interaction. In normal face-to-face interactions, people continually use sensory input to update and adjust their internal beliefs (Kendon, 1977). Interactants in a CVE will possess a completely new mechanism to adjust and update internal beliefs: a person's viewpoint can be multilateral, as opposed to unilateral (normal). In other words, in a real-time conversation, interactant A can take the viewpoint of interactant B and perceive herself as she performs various verbal and nonverbal gestures during the interaction. In this manner, she can acquire invaluable sensory information pertaining to the interaction and update her internal beliefs concerning the interaction in ways not possible without the CVE.

Consequently, interactants in educational and persuasive interactions may be able to improve performance, because seeing oneself through the eyes of another may allow one to develop a more informed set of internal beliefs about others (Baumeister, 1998). Furthermore, being able to experience an interaction through someone else's eyes may reinforce the fact that one is indeed copresent in the CVE (e.g., Durlach & Slater, 2000). Finally, utilizing multilateral perspectives may assist students in distance learning CVEs in terms of training transfer effects (Rickel & Johnson, 2000) that might occur when an interactant who has been trained in multilateral perspective taking later performs similar group tasks in nonmediated situations.

A second situational transformation involves partially recording the interaction and adjusting temporal properties or sequences in real time. Similar to commercial products sold for digitally recording and playing back broadcast television, interactants in a CVE should be able to accelerate and decelerate the perceived flow of time during the mediated interaction. Consider the following situation. The student in a distance learning CVE does not understand an example that the instructor provides. The student can "rewind" the recorded interaction, go back to the beginning of the confusing example, and then play back the example. Once the student has understood the confusing example, she can turn up the rate of playback (e.g., watch the sequence at 2X speed), and eventually she can catch up to the instructor again. By slowing down the rendered flow of time or speeding it up, the interactant can focus differentially on particular topics and can review the same scene from different points of view without missing the remainder of the interaction. Of course, doing so will result in costs to that interactant's contribution to the CVE in terms of interactivity (i.e., what does her avatar do while she rewinds?). Consequently, the disruption of the temporal sequence will necessarily be coupled with some kind of avatar autopilot.
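The catch-up arithmetic is simple: at playback rate r > 1, each second of wall-clock time advances the viewer's playback by r seconds but the live session by only 1, so a lag of D seconds closes in D / (r - 1) seconds. A student who has fallen 30 seconds behind therefore needs 30 seconds at 2X, or 60 seconds at 1.5X. The sketch below encodes this together with a hypothetical playback cursor over the recorded session; the class and method names are illustrative only.

```python
def catch_up_seconds(lag_s, rate):
    """Wall-clock time for a viewer `lag_s` seconds behind live to catch up at `rate`x playback."""
    if rate <= 1.0:
        raise ValueError("catching up requires playback faster than 1x")
    # Each wall-clock second advances playback by `rate` seconds but the live session
    # by only 1 second, so the lag shrinks by (rate - 1) seconds per second.
    return lag_s / (rate - 1.0)


class TimeShiftedView:
    """Hypothetical playback cursor over a continuously recorded CVE session."""

    def __init__(self):
        self.live_t = 0.0   # session time recorded so far
        self.view_t = 0.0   # session time currently rendered to this viewer
        self.rate = 1.0

    def rewind_to(self, t):
        self.view_t = max(0.0, min(t, self.live_t))

    def tick(self, dt):
        self.live_t += dt
        # Playback can never run ahead of what has actually been recorded.
        self.view_t = min(self.view_t + self.rate * dt, self.live_t)

    @property
    def lag(self):
        return self.live_t - self.view_t
```

While the cursor lags, something else must drive the student's avatar in the live session, which is where the autopilot mentioned above comes in.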

Changing the rate of time in a CVE brings up another interesting transformation. Traditionally, CVEs are roughly defined as "geographically separated interactants" interacting over some kind of computer-mediated network in a shared environment. However, by combining some of the concepts discussed in previous sections, it may be possible to include in the definition of a CVE "temporally separated interactants" in a shared environment. Consider a videoconference of a business meeting. Oftentimes, interested parties who cannot attend the meeting will later review a videotape of it. In a CVE, the temporally absent member has the option of involving herself more deeply in the interaction. Specifically, she can situate her avatar in a specific place in the CVE's seating arrangement and use an autopilot to give her representation rudimentary nonverbal behaviors. Furthermore, the absent member can program her avatar to perform simple interactive tasks, such as prerecorded introductions or answers to certain questions about the CVE topic, or, perhaps more realistically for the near term, she can direct the avatar to play back a recorded performance. The CVE interaction can then proceed in real time, with the temporally absent member's avatar approximating the types of behaviors that she would do and say. As a result, temporally present members would actually direct pieces of the conversation toward the absent member as well as transmit nonverbal gestures toward her. Later on, instead of just reviewing the recording, the temporally absent member can take her place in the CVE and actually feel present in the dialogue, receiving appropriate nonverbal behaviors and maximizing the degree of copresence. Moreover, the members of the CVE who were present at the scheduled time can program their avatars, during the replay of the interaction, to respond to any post hoc questions that the absent member might have. In this way, the degree of interactivity during the replay can be increased, and perhaps at some point in the not-too-distant future the line between real-time and non-real-time interactions will become interestingly blurred.
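A crude sketch of such an autopilot, assuming utterances arrive as text already tagged with whether they are addressed to the absent member. The class name, keyword matching, and gesture labels are all hypothetical placeholders for far richer behavior.

```python
import random


class AbsentMemberAutopilot:
    """Hypothetical stand-in that drives a temporally absent member's avatar.

    Keyword matching is a deliberately crude placeholder for whatever dialogue
    logic a real system would use.
    """

    def __init__(self, canned_answers, idle_gestures, seed=None):
        self.canned_answers = {k.lower(): v for k, v in canned_answers.items()}
        self.idle_gestures = list(idle_gestures)
        self.rng = random.Random(seed)
        self.addressed = []   # questions put to her, kept for when she replays the session

    def on_utterance(self, speaker, text, addressed_to_me):
        if not addressed_to_me:
            return ("gesture", self.rng.choice(self.idle_gestures))
        self.addressed.append((speaker, text))
        for keyword, answer in self.canned_answers.items():
            if keyword in text.lower():
                return ("say", answer)
        return ("gesture", "nod")   # acknowledge now; the real answer can come during replay
```

Logging every question addressed to the avatar gives the absent member a ready agenda when she later takes her place in the replayed session.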

4 Implications of TSI and Research Directions

For better or for worse, TSI implemented through CVEs has great potential to change the nature of mediated interaction. The strategic decoupling of rendered behavior from actual behavior allows interactants to break many constraints that are inherent in face-to-face interaction, as well as in other forms of mediated interaction such as telephone and videophone conferences. The effects of TSI remain to be seen. Assuming that implementation of the TSI techniques is technically feasible and that using TSI implementations is conceptually workable for the interactants (both of which are substantial assumptions), one could predict a number of consequences. First, TSI may develop into a worthwhile tool that assists interactants in overcoming the inadequacies of communicating from remote locations. By augmenting their representational, sensory, and situational characteristics, interactants in CVEs may be able to achieve levels of interaction that actually surpass face-to-face interaction.

On the other hand, people may in fact find the use of these transformations extremely unsettling. The difference between TSI and current CVE implementations could be as drastic as the difference between email and the written letter. As this technology is developed, it is essential to examine people's responses to this new medium (e.g., Reeves & Nass, 1996) and to examine these important potential implications of TSI before the technology becomes widespread.

Along the same lines, the threat of TSI may be the very downfall of CVE interaction. In face-to-face interaction, there tends to be some degree of deception, for example, people using facial expressions to mask their emotions. Clearly, this deception has the potential to be much greater with TSI. If interactants have no faith that their perceptual experience is genuine, they may have little reason ever to enter a CVE. A complete lack of trust in the truthfulness of gestures, the one-to-one correspondence of avatars, and the temporal presence of interactants has the potential to rob the CVE of one of its greatest strengths, namely interactivity, since the interactants may not know with whom, with what, or when they are interacting. Similarly, given an expectation of TSI, interactants may be constantly suspicious during interactions; this lack of trust in fellow interactants may lead to unproductive collaborations.

A solution to this breakdown may require the development of TSI detectors for interactants, based either on computer algorithms that analyze nonverbal behaviors or on actual humans who scrutinize the interaction. To examine the possibility of using human TSI detectors, we now discuss what we call the nonverbal Turing Test (NVTT).
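The text pursues human TSI detectors; for the algorithmic alternative, one simple possibility is to test whether an avatar's head motion is a time-shifted copy of one's own. The Python sketch below is an illustration under stated assumptions, not a method from this paper: it assumes equally sampled yaw traces of equal length for the observer and the avatar, scans candidate lags, and reports the lag with the highest Pearson correlation. The function names, the lag range, and any decision threshold applied to the result are hypothetical.

```python
import math

def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / math.sqrt(va * vb)

def best_mimic_lag(own, other, max_lag, min_overlap=10):
    """Find the lag (in samples) at which other[t] best matches own[t - lag].

    own, other: equally sampled head-yaw traces of the same length (an assumption).
    Returns (lag, r); a high r at a positive lag suggests the other avatar
    is replaying the observer's own movements after a delay.
    """
    best_lag, best_r = 0, -1.0
    for lag in range(1, max_lag + 1):
        a = own[: len(own) - lag]        # observer's motion
        b = other[lag : lag + len(a)]    # avatar's motion, lag samples later
        if len(a) < min_overlap:
            break
        r = pearson(a, b)
        if r > best_r:
            best_lag, best_r = lag, r
    return best_lag, best_r
```

At an assumed tracker rate of 30 Hz, a best lag of 60 samples would correspond to a 2-second delay. A genuine human partner who echoes the judge's nods may also correlate at short lags, so any threshold on r would have to be chosen empirically.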

In the popular reinterpretation of the Turing Test (Turing, 1950), a judge attempts to determine which of two players (one human, one machine) is a fellow human. In our NVTT pilot studies, experimental participants acting as judges enter a CVE with two virtual humans: one avatar whose head movements are veridical, playing back the movements of another human in real time (i.e., without TSI), and one avatar whose head movements are actually a transformation of the judge's own head movements (i.e., time-delayed and with a reduced range of motion). The judge sees the head movements of a real person on one avatar and some sort of self-mimicked movements on the other. During the interaction only head movements are permitted (i.e., no verbal communication is allowed), and participants must devise ways to engage and test the two virtual humans through nonverbal means in order to ascertain which is human and which is a mimicker. Figure 4 illustrates the setup.
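The mimicking avatar described above can be approximated with a fixed-length delay buffer. The sketch below is a hypothetical illustration rather than the software used in these studies: it assumes head orientation arrives as (yaw, pitch, roll) samples at a fixed tracker rate, holds a neutral pose until the buffer has filled, and, when `yaw_only` is set, discards pitch and roll as one possible way of reducing the range of motion.

```python
from collections import deque

class HeadMimic:
    """Hypothetical computer agent that replays the judge's own head orientation after a fixed delay."""

    def __init__(self, delay_s, rate_hz, yaw_only=False, neutral=(0.0, 0.0, 0.0)):
        self.delay_samples = max(1, round(delay_s * rate_hz))
        self.buffer = deque()
        self.yaw_only = yaw_only
        self.neutral = neutral

    def step(self, yaw, pitch, roll):
        """Feed the judge's current orientation; return the mimic's orientation for this frame."""
        self.buffer.append((yaw, pitch, roll))
        if len(self.buffer) <= self.delay_samples:
            return self.neutral  # not enough history yet: hold a neutral pose
        y, p, r = self.buffer.popleft()  # the sample from delay_samples frames ago
        if self.yaw_only:
            return (y, 0.0, 0.0)  # reduced range: side-to-side turns only
        return (y, p, r)
```

At an assumed 60 Hz tracker rate, the longest (8-second) delay would keep 480 samples in the buffer.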

In the current initial pilot study, we manipulated three independent variables: test trial length (either 16 or 32 seconds), mimic delay (i.e., the computer agent mimicked the judge's movements 1, 2, 4, or 8 seconds after they occurred), and range of motion (high: pitch, yaw, and roll; or low: yaw only).

Figure 4. A schematic of the NVTT. The human judge is forced to determine which of the two avatars exhibiting head movements is the real human and which is the computer-generated human mimicking the judge's own movements.

438 PRESENCE: VOLUME 13, NUMBER 4

Participants in this study wore head-mounted displays while a render computer tracked their head orientation; tracking, rendering, and networking latencies were all low enough to impart a compelling sense of copresence (see Bailenson et al., 2002, for detailed descriptions of the hardware and software used). Participants were instructed to sit in a virtual room with two virtual people: a human agent (i.e., a representation whose movements are controlled by a real person in another room) and a computer agent (i.e., a computer program designed to mimic the user's movements in some way). Participants were instructed to interact with the two other virtual people using head movements in order to determine which one was the human agent. Participants were run in groups of two, with each one acting as the human agent for the other. Each participant sat at a virtual conference table with two virtual humans (similar to that shown in Figure 3). Each participant received a random order of 32 test trials (two instances of the 16 conditions resulting from the crossing of the three independent variables). Forty-one undergraduates participated in this study.
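The factorial design can be made concrete in a few lines: crossing two trial lengths, four mimic delays, and two ranges of motion yields 2 × 4 × 2 = 16 conditions, and two instances of each give the 32 trials per participant. This Python sketch assumes a simple full shuffle per participant; the paper states only that trial order was random, so the randomization scheme and the `seed` parameter are illustrative.

```python
import itertools
import random

TRIAL_LENGTHS_S = (16, 32)
MIMIC_DELAYS_S = (1, 2, 4, 8)
MOTION_RANGES = ("high", "low")  # high: pitch, yaw, and roll; low: yaw only

def build_trial_list(seed=None, repeats=2):
    """Return a randomly ordered list of (trial_length_s, delay_s, motion_range) trials."""
    conditions = list(itertools.product(TRIAL_LENGTHS_S, MIMIC_DELAYS_S, MOTION_RANGES))
    trials = conditions * repeats
    random.Random(seed).shuffle(trials)
    return trials
```

Passing a different seed per participant gives each one an independent random order.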

For the purposes of brevity, we focus on two results in particular. First, despite the fact that we explicitly told participants that the computer agent was directly mimicking them, they performed surprisingly poorly when attempting to identify the human avatar. The overall average score was only 66% correct (SD = 10%; chance = 50%; maximum score = 100%). Moreover, of the 41 participants in the study, more than one fourth were not reliably different from chance (i.e., less than 3 SEM from 50%, between 44% and 56%). Second, as Figure 5 demonstrates, participants' scores diminished as the mimic delay increased: there was a linear trend in the logarithm of the delay variable, F(1, 32) = 8.85, p < .01. When the delay was greater than 1 second, participants had more difficulty identifying the mimicker.
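A linear trend in the logarithm of delay can be tested as a single-degree-of-freedom within-subjects contrast. The Python sketch below shows one conventional way to do this, not necessarily the analysis the authors ran: because the delays double at each step, centered log2 values give equally spaced weights (-1.5, -0.5, 0.5, 1.5); each participant's scores are reduced to one contrast score, and a one-sample t test of those scores against zero gives F(1, n - 1) = t².

```python
import math
from statistics import mean, stdev

DELAYS_S = (1, 2, 4, 8)

def log_linear_weights(levels):
    """Centered log2(level) values, used as linear-trend contrast weights."""
    logs = [math.log2(x) for x in levels]
    center = mean(logs)
    return [v - center for v in logs]

def linear_trend_F(scores_by_participant, levels=DELAYS_S):
    """Within-subjects linear-trend test on log(level).

    scores_by_participant: one list per participant, holding a score for each level.
    Returns (F, df1, df2) with F = t**2 from a one-sample t test of the contrast scores.
    Requires at least two participants whose contrast scores are not all identical.
    """
    w = log_linear_weights(levels)
    contrasts = [sum(wi * si for wi, si in zip(w, scores)) for scores in scores_by_participant]
    n = len(contrasts)
    t = mean(contrasts) / (stdev(contrasts) / math.sqrt(n))
    return t * t, 1, n - 1
```

With these weights, a negative mean contrast indicates that accuracy falls as delay grows, which is the direction reported in the text.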

These data are particularly striking in that we had initially predicted that participants would be able to recognize their own head movements much more easily when aware that one avatar was designed to mimic them. In this study we explicitly told participants about the use of TSI, and they still had great difficulty detecting it. The impact of implicit TSI (that is, TSI that is not disclosed) can only be greater. While this pilot study is extremely simple and only scratches the surface of a paradigm for examining TSI, it is still noteworthy that participants did not detect the mimicker across the board.

We are currently exploring other factors underlying the discrimination of human nonverbal behavior from computer-generated behavior. In future studies we will use NVTTs to study other nonverbal behaviors, such as facial gestures, eye-head gestures (pointing indications by either system), hand gestures, and interpersonal distance. We have shown that, albeit in simple scenarios, it is possible to pass the NVTT for a percentage of our test population using TSI. We are confident that as this percentage grows in the near future, important scientific and sociological discoveries will surface along the way.

In conclusion, there are many reasons one might want to avoid TSI; these reasons range from Orwellian concerns to the fear of rendering CVEs (perhaps even the telephone) functionally useless. We are not advocates of TSI as a means to replace normal communication, nor are we staunch believers in avoiding TSI in order to preserve the natural order of communication and conversation. However, we do acknowledge that as CVEs become more prevalent, the strategic decoupling of representation from behavior is inevitable. For that reason alone, the notion of TSI warrants considerable attention.

Acknowledgments

The authors would like to thank Robin Gilmour and Christopher Rex for helpful suggestions. Furthermore, we thank Christopher Rex and Ryan Jaeger for assistance in collecting data. This research was sponsored in part by NSF Award SBE-9873432 and in part by NSF ITR Award IIS 0205740.

References

Argyle, M. (1988). Bodily communication (2nd ed.). London: Methuen.

Bailenson, J. N., Beall, A. C., & Blascovich, J. (2002). Mutual gaze and task performance in shared virtual environments. Journal of Visualization and Computer Animation, 13, 1–8.

Bailenson, J. N., Beall, A. C., Blascovich, J., Raimmundo, R., & Weisbuch, M. (2001). Intelligent agents who wear your face: Users' reactions to the virtual self. Lecture Notes in Artificial Intelligence, 2190, 86–99.

Bailenson, J. N., Blascovich, J., Beall, A. C., & Guadagno, R. E. (submitted). Self representations in immersive virtual environments.

Figure 5. Percent correct by mimic delay in seconds. These data exclude subjects at chance performance.

Baumeister, R. F. (1998). The self. In D. T. Gilbert, S. T. Fiske, & G. Lindzey (Eds.), Handbook of social psychology (4th ed., pp. 680–740). New York: McGraw-Hill.

Beall, A. C., Bailenson, J. N., Loomis, J., Blascovich, J., & Rex, C. (2003). Non-zero-sum mutual gaze in immersive virtual environments. Proceedings of HCI International 2003.

Benford, S., Bowers, J., Fahlen, L., Greenhalgh, C., & Snowdon, D. (1995). User embodiment in collaborative virtual environments. Proceedings of CHI '95 (pp. 242–249). ACM Press.

Biocca, F. (1997). The cyborg's dilemma: Progressive embodiment in virtual environments. Journal of Computer-Mediated Communication [online], 3. Retrieved from http://www.ascusc.org/jcmc/vol3/issue2/biocca2.html

Black, M., & Yacoob, Y. (1997). Recognizing facial expressions in image sequences using local parameterized models of image motion. International Journal of Computer Vision, 25(1), 23–48.

Blanz, V., & Vetter, T. (1999). A morphable model for the synthesis of 3D faces. SIGGRAPH '99 Conference Proceedings, 187–194.

Blascovich, J., Loomis, J., Beall, A. C., Swinth, K. R., Hoyt, C., & Bailenson, J. N. (2002). Immersive virtual environment technology as a methodological tool for social psychology. Psychological Inquiry, 13, 103–124.

Busey, T. A. (1988). Physical and psychological representations of faces: Evidence from morphing. Psychological Science, 9, 476–483.

Byrne, D. (1971). The attraction paradigm. New York: Academic Press.

Cassell, J. (2000). Nudge nudge wink wink: Elements of face-to-face conversation for embodied conversational agents. In J. Cassell et al. (Eds.), Embodied conversational agents. Cambridge, MA: MIT Press.

Chaiken, S. (1979). Communicator physical attractiveness and persuasion. Journal of Personality and Social Psychology, 37, 1387–1397.

Chartrand, T. L., & Bargh, J. (1999). The chameleon effect: The perception-behavior link and social interaction. Journal of Personality & Social Psychology, 76(6), 893–910.

DeCarlo, D., Metaxas, D., & Stone, M. (1998). An anthropometric face model using variational techniques. Proceedings of SIGGRAPH '98, 67–74.

DePaulo, B. M., & Friedman, H. S. (1998). Nonverbal communication. In D. T. Gilbert, S. T. Fiske, & G. Lindzey (Eds.), The handbook of social psychology (4th ed., Vol. 2, pp. 3–40). Boston: McGraw-Hill.

Donato, G., Bartlett, M. S., Hager, J. C., Ekman, P., & Sejnowski, T. J. (1999). Classifying facial actions. IEEE Transactions on Pattern Analysis and Machine Intelligence, 21(10), 974–989.

Durlach, N., & Slater, M. (2000). Presence in shared virtual environments and virtual togetherness. Presence: Teleoperators and Virtual Environments, 9, 214–217.

Ekman, P. (1978). Facial signs: Facts, fantasies, and possibilities. In T. Sebeok (Ed.), Sight, sound, and sense. Bloomington, IN: Indiana University Press.

Fodor, J. A. (1983). The modularity of mind: An essay on faculty psychology. Cambridge, MA: MIT Press.

Fry, R., & Smith, G. F. (1975). The effects of feedback and eye contact on performance of a digit-encoding task. Journal of Social Psychology, 96, 145–146.

Gale, C., & Monk, A. F. (2002). A look is worth a thousand words: Full gaze awareness in video-mediated conversation. Discourse Processes, 33.

Garau, M., Slater, M., Vinayagamoorthy, V., Brogni, A., Steed, A., & Sasse, M. A. (2003). The impact of avatar realism and eye gaze control on perceived quality of communication in a shared immersive virtual environment. Proceedings of the SIGCHI Conference on Human Factors in Computing Systems.

Gibson, W. (1984). Neuromancer. New York: Ace Books.

Hu, C., Ferris, R., & Turk, M. (2003). Active wavelet networks for face alignment. Proceedings of the British Machine Vision Conference, Norwich, UK.

Kendon, A. (1977). Studies in the behavior of social interaction. Bloomington, IN: Indiana University.

Kleinke, C. L. (1986). Gaze and eye contact: A research review. Psychological Bulletin, 100, 78–100.

Kraut, R. E., Fussell, S. R., Brennan, S. E., & Siegel, J. (2002). Understanding effects of proximity on collaboration: Implications for technologies to support remote collaborative work. In P. Hinds & S. Kiesler (Eds.), Distributed work. Cambridge, MA: MIT Press.

Lanier, J. (2001). Virtually there. Scientific American, April 2001.

Leigh, J., DeFanti, T., Johnson, A., Brown, M., & Sandin, D. (1997). Global telemersion: Better than being there. Proceedings of ICAT '97.

Loomis, J. M., Blascovich, J. J., & Beall, A. C. (1999). Immersive virtual environments as a basic research tool in psychology. Behavior Research Methods, Instruments, and Computers, 31(4), 557–564.


Mania, K., & Chalmers, A. (1998). Proceedings of the Fourth International Conference on Virtual Systems and Multimedia (pp. 177–182). Amsterdam: IOS Press-Ohmsha.

Milgram, S. (1992). The individual in a social world: Essays and experiments (2nd ed.). New York: McGraw-Hill.

Morgan, T., Kriz, R., Howard, T., Dias Neves, F., & Kelso, J. (2001). Extending the use of collaborative virtual environments for instruction to K–12 schools. Insight, 1(1).

Normand, V., Babski, C., Benford, S., Bullock, A., Carion, S., Chrysanthou, Y., et al. (1999). The COVEN project: Exploring applicative, technical, and usage dimensions of collaborative virtual environments. Presence: Teleoperators and Virtual Environments, 8(2), 218–236.

Patterson, M. L. (1982). An arousal model of interpersonal intimacy. Psychological Review, 89, 231–249.

Pylyshyn, Z. W. (1980). Computation and cognition: Issues in the foundations of cognitive science. Behavioral & Brain Sciences, 3, 111–169.

Reeves, B., & Nass, C. (1996). The media equation: How people treat computers, television, and new media like real people and places. New York: Cambridge University Press.

Rickel, J., & Johnson, W. L. (2000). Task-oriented collaboration with embodied agents in virtual worlds. In J. Cassell, J. Sullivan, S. Prevost, & E. Churchill (Eds.), Embodied conversational agents. Cambridge, MA: MIT Press.

Rutter, D. R. (1984). Looking and seeing: The role of visual communication in social interaction. Suffolk, UK: John Wiley & Sons.

Sannier, G., & Thalmann, M. N. (1998). A user friendly texture-fitting methodology for virtual humans. Computer Graphics International '97.

Schwartz, P., Bricker, L., Campbell, B., Furness, T., Inkpen, K., Matheson, L., et al. (1998). Virtual playground: Architectures for a shared virtual world. Proceedings of the ACM Symposium on Virtual Reality Software and Technology 1998, 43–50.

Sherwood, J. V. (1987). Facilitative effects of gaze upon learning. Perceptual and Motor Skills, 64, 1275–1278.

Simons, H. (1976). Persuasion: Understanding, practice, and analysis. Reading, MA: Heath.

Slater, M., Pertaub, D., & Steed, A. (1999). Public speaking in virtual reality: Facing an audience of avatars. IEEE Computer Graphics and Applications, 19(2), 6–9.

Slater, M., Sadagic, A., Usoh, M., & Schroeder, R. (2000). Small group behavior in a virtual and real environment: A comparative study. Presence: Teleoperators and Virtual Environments, 9(1), 37–51.

Stiefelhagen, R., Yang, J., & Waibel, A. (1997). Tracking eyes and monitoring eye gaze. In M. Turk & Y. Takabayashi (Eds.), Proceedings of the Workshop on Perceptual User Interfaces.

Turing, A. (1950). Computing machinery and intelligence. Mind, 59(236).

Turk, M., & Kolsch, M. (in press). Perceptual interfaces. In G. Medioni & S. B. Kang (Eds.), Emerging topics in computer vision. Boston: Prentice Hall.

Velichkovsky, B. M. (1995). Communicating attention: Gaze position transfer in cooperative problem solving. Pragmatics and Cognition, 3(2), 199–222.

Vertegaal, R. (1999). The GAZE groupware system: Mediating joint attention in multiparty communication and collaboration. Proceedings of the CHI '99 Conference on Human Factors in Computing Systems: The CHI is the Limit, 294–301.

Viola, P., & Jones, M. (2001). Rapid object detection using a boosted cascade of simple features. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.

Wallace, D. F. (1996). Infinite jest. Boston: Little, Brown.

Williams, K., Cheung, K. T., & Choi, W. (2000). Cyberostracisms: Effects of being ignored over the internet. Journal of Personality and Social Psychology, 79, 748–762.

Yee, N. (2002). Befriending ogres and wood elves: Understanding relationship formation in MMORPGs. Retrieved from http://www.nickyee.com/hub/relationships/home.html

Zajonc, R. B. (1971). Brainwash: Familiarity breeds comfort. Psychology Today, 3(9), 60–64.

Zajonc, R. B., Murphy, S. T., & Inglehart, M. (1989). Feeling and facial efference: Implication of the vascular theory of emotion. Psychological Review, 96, 395–416.

Zhang, X., & Furnas, G. (2002). Social interactions in multiscale CVEs. Proceedings of the ACM Conference on Collaborative Virtual Environments 2002 (CVE 2002).


avatars A can be made to appear to maintain mutual gaze with both B and C for a majority of the conversation.

Gaze is one of the most thoroughly studied nonverbal gestures in research on social interaction (Rutter, 1984; Kleinke, 1986; Kendon, 1977). Direct eye gaze can provide cues for intimacy, agreement, and interest (Argyle, 1988). Furthermore, gaze can enhance learning during instruction as well as memory for information (Fry & Smith, 1975; Sherwood, 1987). The advantage of using CVEs is that the normal nonverbal behaviors of interactants can be augmented via NZSMG. Furthermore, the interactants in a CVE can either be unaware of this transformation (i.e., implicit NZSMG) or aware of it (i.e., explicit NZSMG), as Figure 1 demonstrates. Preliminary work studying implicit NZSMG has demonstrated that interactants are not aware of the decoupling from actual behavior; moreover, they respond to the artificial gaze as if it were actual gaze (Beall, Bailenson, Loomis, Blascovich, & Rex, 2003). This method may prove most effective during distance learning in educational CVEs (Morgan, Kriz, Howard, das Neves, & Kelso, 2001), in which the instructor uses her augmented gaze as a tool to keep the students more engaged.
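Because each interactant's client renders the scene separately, NZSMG amounts to computing a different head orientation for the leader on every viewer's machine. The Python sketch below is a hypothetical top-down (x, z) simplification, not the implementation used by Beall et al. (2003): when augmentation is on, the leader's rendered yaw points at whoever is viewing; otherwise the leader's tracked yaw passes through unchanged. A practical system would presumably augment gaze only part of the time and blend toward the target rather than snapping to it.

```python
import math

def yaw_toward(src, dst):
    """Yaw in degrees for an avatar at src (x, z) to face dst; 0 faces +z, +90 faces +x."""
    return math.degrees(math.atan2(dst[0] - src[0], dst[1] - src[1]))

def rendered_leader_yaw(leader_pos, tracked_yaw, viewer_pos, nzsmg_on=True):
    """Leader's head yaw as drawn on one viewer's display."""
    if not nzsmg_on:
        return tracked_yaw  # veridical: show what the leader is really doing
    return yaw_toward(leader_pos, viewer_pos)  # augmented: face this particular viewer
```

With the leader at the origin, B at (-1, 1), and C at (1, 1), B's client would draw the leader turned about -45 degrees (toward B) while C's client would draw about +45 degrees (toward C), so both members perceive mutual gaze at the same moment.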

Decoupling can also be used to achieve the opposite effect. Consider the situation where the leader wants to scrutinize the nonverbal behaviors of member A but does not want the member to feel uncomfortable under her unwavering gaze. The leader can render herself looking at her shoes, or perhaps at member B in the CVE, while in reality she is watching member A's every move.

In order for such a system to be effective, there must be a convincing algorithm to drive the autonomous eye gaze. In other words, if the leader wants the freedom to employ NZSMG or to wander around the CVE scrutinizing different aspects of the conversation undetected, she (by her own device or assisted by the system's operator) must maintain the illusion that her avatar is exhibiting typical and appropriate nonverbal gestures. There are a number of ways to achieve this. The first is some type of artificial intelligence algorithm that approximates appropriate gestures for the leader's avatar by monitoring the gestures and speech of the other interactants. While there have been significant advances in this regard (Cassell, 2000), the ability of an algorithm to process natural language as well as generate believable responses may still be many years off. A more likely method for achieving this goal would be to use actual humans instead of AI algorithms. In this scenario, the leader employs one or more nonverbal "cyranoids" (Milgram, 1992) to augment the nonverbal behaviors presented to each individual member. To do so, the leader solicits the help of several assistants, each of whose job is to provide the nonverbal behaviors targeted toward a particular member. See Figure 2.

Figure 1. Internal belief states from implicit NZSMG (left) and explicit NZSMG (right).

In this many-to-many "Wizard of Oz" implementation, each member is presented with a unified leader who is rendered privately to her. This private representation would be a melding of the actual leader and one of her assistants, so that when the leader's attention was diverted away from that member for long periods of time, the assistant could step in and help maintain a believable interaction by seamlessly serving as the leader's proxy. The leader herself would then act as a conductor, overseeing all the interactions yet being free to focus her attention on individual members when she so desires. In addition, the leader is free to wander about the digital space, consult her notes, take a rest, or conduct a sidebar meeting with another person. However, because her avatar is partially cyranic, it can continue to exhibit the appropriate nonverbal behaviors to each member all the while. Furthermore, having a number of assistants whose sole focus is to respond with appropriate nonverbal gestures to each of the interactants in the CVE should maximize the members' involvement, or sense of presence, in the CVE. For important meetings, seminars, or presentations conducted via CVEs, individual interactants may want to utilize a number of assistants as a core presentation team.
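The hand-off between the leader and her assistants can be thought of as a per-member choice of whose tracked nonverbal behavior drives the leader's avatar, while every member still hears the leader's actual voice. The Python sketch below is hypothetical: the `CyranoidRouter` class, the idea of timing how long each member has gone unattended, and the 3-second default hand-off threshold are illustrative assumptions, not details from the paper.

```python
class CyranoidRouter:
    """Chooses, per member, whose nonverbal behavior is rendered on the leader's avatar."""

    def __init__(self, assistant_for, handoff_after_s=3.0):
        self.assistant_for = dict(assistant_for)   # member id -> dedicated assistant id
        self.handoff_after_s = handoff_after_s     # assumed threshold, in seconds
        self.unattended_s = {m: 0.0 for m in self.assistant_for}

    def update(self, attended_member, dt):
        """Advance timers by dt seconds; attended_member is whom the leader is watching (or None)."""
        for m in self.unattended_s:
            if m == attended_member:
                self.unattended_s[m] = 0.0
            else:
                self.unattended_s[m] += dt

    def gesture_source(self, member):
        """Return 'leader' or the id of the assistant whose gestures this member should see."""
        if self.unattended_s[member] >= self.handoff_after_s:
            return self.assistant_for[member]
        return "leader"
```

Switching abruptly between two pose streams could itself look unnatural, so a fuller system would likely blend between the leader's and the assistant's tracked motion around the hand-off.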

3.2 Transforming Sensory Capabilities

Interactants can be assisted by technology that takes advantage of the ability of CVEs to keep precise running tabs of certain types of behaviors and then display summaries of those behaviors exclusively to individual interactants. For example, consider an educational CVE in which an instructor wants to ensure that she is directing her nonverbal behaviors in a desired fashion. Such an instructor may want to monitor her mutual gaze to ensure that she is not looking at any one student more than others during a presentation. The tracking equipment used to render the scene can keep an online total of the amount of time the instructor gazed at each individual student. The CVE can render a display of this gaze meter, as well as use visual or auditory alerts to inform the instructor of disproportionate applications of gaze.
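The gaze meter amounts to accumulating, frame by frame, how long the instructor's gaze rests on each student and flagging students who receive a disproportionate share. In the Python sketch below, the current gaze target (for example, the student nearest the instructor's head-direction ray) is assumed to be supplied by the tracking system, and the rule "more than 1.5 times the class mean" is a hypothetical choice of what counts as disproportionate.

```python
class GazeMeter:
    """Running totals of instructor gaze time per student (hypothetical sketch)."""

    def __init__(self, students, imbalance_ratio=1.5):
        self.totals = {s: 0.0 for s in students}
        self.imbalance_ratio = imbalance_ratio

    def add_frame(self, gazed_student, dt):
        """Credit dt seconds to the student currently gazed at (None if nobody)."""
        if gazed_student in self.totals:
            self.totals[gazed_student] += dt

    def shares(self):
        """Fraction of total gaze time received by each student."""
        total = sum(self.totals.values())
        return {s: (t / total if total else 0.0) for s, t in self.totals.items()}

    def overgazed(self):
        """Students whose gaze time exceeds imbalance_ratio times the class mean."""
        mean_t = sum(self.totals.values()) / len(self.totals)
        return sorted(s for s, t in self.totals.items() if mean_t and t > self.imbalance_ratio * mean_t)
```

An on-screen meter could read from `shares()`, while a visual or auditory alert could fire whenever `overgazed()` becomes non-empty.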

Furthermore, interactants can use the tracking data summaries to learn more about the attitudes of others. Nonverbal gestures are often correlates of specific mental states (Ekman, 1978; Zajonc, Murphy, & Inglehart, 1989). For example, in general we nod when we agree, smile when we are pleased, tilt our heads when we are confused, and look at things in which we are interested. Interactants will be able to tailor their CVE systems to keep track of nonverbal behaviors, with the goal of helping them infer the mental states of the other interactants. For example, a teacher will be able to gauge the percentage of students exhibiting nonverbal behaviors that suggest confusion or failure to understand a point in a lesson. Similarly, a leader could determine who in a room full of members is responding most positively to her behavior. Intuitively, tabulating and assessing the nonverbal behaviors of others is certainly something that humans do constantly in face-to-face interactions. With CVEs, interactants will be able to tabulate these behaviors with greater precision and use the objective tabulations from the tracking data to augment their normal intuitions about the gestures occurring in the interaction.
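As a crude illustration of such tabulation, the Python sketch below counts the percentage of students whose head roll exceeded a threshold during a recent window, treating a sideways head tilt as a possible sign of confusion. The 15-degree threshold and the equation of tilt with confusion are simplifying assumptions; as the text notes, such tabulations are meant to augment, not replace, the teacher's own intuitions.

```python
def percent_tilting(roll_by_student, tilt_deg=15.0):
    """Percentage of students whose head roll exceeded tilt_deg (either side) in the window.

    roll_by_student: student id -> list of roll angles in degrees sampled over the window.
    """
    if not roll_by_student:
        return 0.0
    tilting = sum(
        1 for rolls in roll_by_student.values() if any(abs(r) > tilt_deg for r in rolls)
    )
    return 100.0 * tilting / len(roll_by_student)
```

The same pattern could count nods (pitch oscillations) or sustained gaze at a target, each with its own assumed threshold.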

Figure 2. A depiction of cyranoids. On the top row are three nonrendered gesturers. Each member on the bottom hears the leader's actual verbal behaviors (dashed lines). However, each member views the nonverbal behavior of her dedicated gesturer rendered onto the avatar of the leader (unbroken lines).

Another transformation involves filtering or degrading certain signals or nonverbal behaviors. Some visual nonverbal behaviors tend to distract interactants, and with filtering algorithms, interactants can prevent such counterproductive distractions in a number of ways. For example, consider the situation in which a speaker in a CVE taps her pen rapidly as she speaks. In face-to-face meetings, this type of behavior can distract interactants. In a CVE, this type of behavior can be filtered in two ways. First, the speaker can filter the behavior on the transmitting end. If people know that they have difficulty suppressing certain nonverbal behaviors that tend to be perceived negatively, such as a nervous tic, they can activate a filter that prevents the behavior from being rendered. Similarly, in certain situations a CVE interactant may not want to render certain nonverbal behaviors. Consider the leader example: the potential member may benefit from rendering her "poker face," that is, not demonstrating any enthusiasm or disappointment via facial expressions, and may consequently accrue a strategic advantage during a negotiation. Second, interactants can filter behaviors on the receiving end. If a speaker's hand motions are distracting, a listener can simply choose not to render that interactant's hand movements.
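The two filtering points can be modeled as two small functions on each per-frame update: one applied by the sender before transmission, one applied by each receiver before rendering. The Python sketch below is hypothetical; it assumes that nonverbal behavior travels as named channels (such as `head`, `face`, and `hand`) and that the renderer holds a neutral pose for any channel that is missing.

```python
from dataclasses import dataclass

@dataclass
class Frame:
    sender: str
    channels: dict  # channel name -> tracked data, e.g. {"head": ..., "hand": ...}

def filter_outgoing(frame, sender_blocks):
    """Sender side: drop channels the sender chose not to render (e.g. 'face' for a poker face)."""
    blocked = sender_blocks.get(frame.sender, set())
    return Frame(frame.sender, {k: v for k, v in frame.channels.items() if k not in blocked})

def filter_incoming(frame, receiver_mutes):
    """Receiver side: receiver_mutes holds (sender, channel) pairs this listener does not want drawn."""
    kept = {k: v for k, v in frame.channels.items() if (frame.sender, k) not in receiver_mutes}
    return Frame(frame.sender, kept)
```

The design difference matters: a sender-side filter hides the behavior from everyone, whereas a receiver-side filter changes only that one listener's view.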

Another example of transforming sensory capabilities is producing a visual indicator of where each interactant's attention currently lies, as revealed by their eye direction (Velichkovsky, 1995). We have explored a technique that involves rendering each person's view frustum to indicate the field of view, as Figure 3 illustrates. In this example, the wireframe frustums spotlight the 3D space visible to each person. This feature, color-coded for each person, may be especially helpful to teachers in a distance learning CVE, who could use such information to see where students are focusing their visual attention without having to look directly at the students' eyes.
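Deciding what falls inside another person's view frustum is a standard visibility test. The Python sketch below simplifies it to the horizontal plane (a real frustum also has vertical limits and near and far planes) and assumes a 60-degree field of view and a 10-unit range purely for illustration; a teacher's display could run such a check for each student against the objects in the lesson.

```python
import math

def in_horizontal_view(eye, facing_yaw_deg, point, fov_deg=60.0, max_range=10.0):
    """True if point (x, z) lies within the horizontal field of view of a viewer at eye.

    Yaw 0 faces +z and +90 faces +x; angles are compared with wrap-around at 180 degrees.
    """
    dx, dz = point[0] - eye[0], point[1] - eye[1]
    if math.hypot(dx, dz) > max_range:
        return False
    bearing = math.degrees(math.atan2(dx, dz))
    offset = (bearing - facing_yaw_deg + 180.0) % 360.0 - 180.0  # signed difference in [-180, 180)
    return abs(offset) <= fov_deg / 2.0
```

The wrap-around step matters when a viewer faces roughly backward: headings of +170 and -170 degrees are only 20 degrees apart.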

Figure 3. View frustums marking the field of view of interactants.

There are a number of similar tools (i.e., specific objects rendered only to particular interactants) that can assist interactants in a CVE. For example, in our NZSMG studies, an experimenter enters a CVE and attempts to persuade other interactants regarding a certain topic (Beall et al., 2003). In those interactions, we render the interactants' names over their heads on floating billboards for the experimenter to read, so that the experimenter can refer to people by name more easily. There are many other ways to use these floating billboards to assist interactants, for example, reminders about an interactant's preferences or personality (e.g., "doesn't respond well to prolonged mutual gaze").

One of the most useful forms of transforming sensory capabilities may be to enlist one or more human consultants who are rendered to only one member in a CVE (i.e., virtual ghosts). Unlike in a face-to-face interaction, in a CVE an interactant can have informed human consultants who are free to wander around the virtual meeting space, scrutinize the actions of other interactants, conduct online research and sidebar meetings in order to provide key interactants with additional information, and generally provide support. For example, the leader can have her research team actually rendered beside her in the CVE. Members of her team can point out actions by potential members, suggest new strategies, and even provide real-time criticism and feedback concerning the leader's behavior, without any of the other members having even a hint of awareness of the human consultants' presence. Alternatively, the leader herself can go into "ghost mode" and explore the virtual world with her team while her avatar remains seated and is even controlled by yet another member of her team.
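Floating name billboards and virtual ghosts share one mechanism: an object that exists in the shared scene but is drawn only for certain viewers. The Python sketch below assumes each scene object carries an optional set of viewers who may see it, with `None` meaning visible to everyone; the object names and the `scene_for` helper are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional

@dataclass(frozen=True)
class SceneObject:
    name: str
    visible_to: Optional[FrozenSet[str]] = None  # None: shared with every interactant

def scene_for(viewer, objects):
    """Names of the objects a given viewer's client should render."""
    return [o.name for o in objects if o.visible_to is None or viewer in o.visible_to]

objects = [
    SceneObject("conference_table"),
    SceneObject("billboard_over_member_B", frozenset({"leader"})),
    SceneObject("ghost_consultant", frozenset({"leader"})),
]
```

Here the leader's client draws the table, the billboard, and her ghost consultant, while member B's client draws only the table.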

3.3 Transforming the Situation

In addition to transforming their representation and sensory capabilities, CVE interactants can also use algorithms to transform their general spatial or temporal situations. In a CVE, people generally adopt a spatially coherent situational context across all remote interactants that brings everyone together in the shared space. However, there is no reason that the details and arrangements of that virtual space need to be constant for all the interactants in the CVE. Consider the situation for three interactants. Interactant A may choose to form an isosceles triangle with the other two, while interactants B and C may both choose to form equilateral triangles. Interactant A may even choose to flip the locations of B and C. In this scenario, the CVE operating system can preserve the intended eye gaze direction by transforming the amplitudes or direction of head and eye movements in a prescribed manner. While this is a somewhat simple example, with as many as four interactants it is straightforward to design spatial transformations that allow the intended eye and head gaze cues to remain intact across all interactants. While such discordance may eventually cause the quality and smoothness of the interaction to suffer, there are a number of ways that transforming the situation can assist individual interactants.
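One way to preserve intended gaze when layouts differ is to infer whom the looker is facing in her own layout and then re-aim her head at that same person in each viewer's layout, keeping any small residual offset. The Python sketch below is an illustrative assumption, not the paper's algorithm: layouts are top-down (x, z) positions per interactant, and the gaze target is taken to be the interactant whose bearing is closest to the looker's head yaw.

```python
import math

def bearing(src, dst):
    """Yaw in degrees from src to dst in the (x, z) plane; 0 faces +z, +90 faces +x."""
    return math.degrees(math.atan2(dst[0] - src[0], dst[1] - src[1]))

def wrap(deg):
    """Map an angle to [-180, 180)."""
    return (deg + 180.0) % 360.0 - 180.0

def remap_yaw(looker, yaw, own_layout, viewer_layout):
    """Re-express looker's head yaw from her own layout in a viewer's layout.

    Returns (rendered_yaw, target_id).
    """
    here = own_layout[looker]
    others = [p for p in own_layout if p != looker]
    target = min(others, key=lambda p: abs(wrap(yaw - bearing(here, own_layout[p]))))
    residual = wrap(yaw - bearing(here, own_layout[target]))  # small deviation from the target
    aim = bearing(viewer_layout[looker], viewer_layout[target])
    return wrap(aim + residual), target
```

For the flipped arrangement in the text, if A sees B at (-1, 1) and C at (1, 1) but a viewer's layout swaps them, a 40-degree turn by A (5 degrees short of C) is redrawn as a -50-degree turn, so A still appears to look at C.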

One such transformation involves multilateral perspectives. In a normal conversation, each interactant has a unique and privileged perspective. That perspective is a combination of her sensory input (e.g., visual and acoustic fields of view) and internal beliefs about the interaction. In normal face-to-face interactions, people continually use sensory input to update and adjust their internal beliefs (Kendon, 1977). Interactants in a CVE will possess a completely new mechanism to adjust and update internal beliefs: a person's viewpoint can be multilateral as opposed to unilateral (normal). In other words, in a real-time conversation, interactant A can take the viewpoint of interactant B and perceive herself as she performs various verbal and nonverbal gestures during the interaction. In this manner, she can acquire invaluable sensory information pertaining to the interaction and update her internal beliefs about it in ways not possible without the CVE.

Consequently, interactants in educational and persuasive interactions may be able to improve performance, because seeing oneself through the eyes of another may allow one to develop a more informed set of internal beliefs about others (Baumeister, 1998). Furthermore, being able to experience an interaction through someone else's eyes may reinforce the fact that one is indeed copresent in the CVE (e.g., Durlach & Slater, 2000). Finally, utilizing multilateral perspectives may assist students in distance learning CVEs in terms of training transfer effects (Rickel & Johnson, 2000) that might occur when an interactant who has been trained in multilateral perspective taking later performs similar group tasks in nonmediated situations.

A second situational transformation involves partially recording the interaction and adjusting temporal properties or sequences in real time. Similar to commercial products sold for digitally recording and playing back broadcast television, a CVE should allow interactants to accelerate and decelerate the perceived flow of time during the mediated interaction. Consider the following situation. A student in a distance learning CVE does not understand an example that the instructor provides. The student can "rewind" the recorded interaction, go back to the beginning of the confusing example, and then play it back. Once the student has understood the example, she can turn up the rate of playback (e.g., watch the sequence at 2X speed) and eventually catch up to the instructor again. By slowing down or speeding up the rendered flow of time, the interactant can focus differentially on particular topics and can review the same scene from different points of view without missing the remainder of the interaction. Of course, doing so will result in costs to that interactant's contribution to the CVE in terms of interactivity (i.e., what does her avatar do while she rewinds?). Consequently, the disruption of the temporal sequence will necessarily be coupled with some kind of avatar autopilot.
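The catch-up arithmetic is simple: a student who is L seconds behind the live interaction and plays at rate r > 1 closes the gap at (r - 1) seconds per real second, so she needs L / (r - 1) seconds to rejoin. At 2X, a 30-second rewind takes 30 more seconds to recover. The Python sketch below models a hypothetical time-shift buffer along those lines; flagging that the avatar needs an autopilot whenever the playhead lags the live interaction is an assumption that mirrors the text's point about autopilot.

```python
class TimeShiftPlayer:
    """Hypothetical playback state for one interactant's view of a recorded CVE session."""

    def __init__(self):
        self.live = 0.0       # seconds of interaction recorded so far
        self.playhead = 0.0   # position this interactant is currently viewing
        self.rate = 1.0       # playback speed multiplier

    def tick(self, dt):
        """Advance real time by dt seconds; playback can never run ahead of live."""
        self.live += dt
        self.playhead = min(self.live, self.playhead + self.rate * dt)

    def rewind_to(self, t):
        self.playhead = max(0.0, min(t, self.live))

    def lag(self):
        return self.live - self.playhead

    def avatar_on_autopilot(self):
        """While the interactant is behind live, her avatar needs an autopilot."""
        return self.lag() > 0.0

def catch_up_time(lag_s, rate):
    """Real seconds needed to return to live from lag_s behind, playing at rate > 1."""
    if rate <= 1.0:
        raise ValueError("playback rate must exceed 1 to catch up")
    return lag_s / (rate - 1.0)
```

For example, after rewinding 30 seconds and re-watching at normal speed, the student is still 30 seconds behind; switching to 2X and playing for `catch_up_time(30.0, 2.0)` seconds brings her back to live.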

Changing the rate of time in a CVE brings up another interesting transformation. Traditionally, CVEs are roughly defined as "geographically separated interactants" interacting over some kind of computer-mediated network in a shared environment. However, by combining some of the concepts discussed in previous sections, it may be possible to include in the definition of a CVE "temporally separated interactants" in a shared environment. Consider a videoconference of a business meeting. Oftentimes, interested parties who cannot attend the meeting will later review a videotape of it. In a CVE, the temporally absent member has the option to involve herself more deeply in the interaction. Specifically, she can situate her avatar in a specific place in the CVE's seating arrangement and use an autopilot to give her representation rudimentary nonverbal behaviors. Furthermore, the absent member can program her avatar to perform simple interactive tasks, such as prerecorded introductions or answers to certain questions about the CVE topic, or, perhaps more realistically for the near term, direct the avatar to play back a recorded performance. The CVE interaction can then proceed in real time, with the temporally absent member's avatar approximating the types of behaviors she would exhibit and the things she would say. As a result, temporally present members would actually direct pieces of the conversation toward the absent member, as well as transmit nonverbal gestures toward her. Later on, instead of just reviewing the recording, the temporally absent member can take her place in the CVE and actually feel present in the dialogue, receiving appropriate nonverbal behaviors and maximizing the degree of copresence. Moreover, the members of the CVE who were present at the scheduled time can program their avatars, during the replay of the interaction, to respond to any post hoc questions that the absent member might have. In this way, the degree of interactivity during the replay can be increased, and perhaps at some point in the not-too-distant future the line between real-time and non-real-time interactions will become interestingly blurred.

4 Implications of TSI and Research Directions

For better or for worse, TSI implemented through CVEs has great potential to change the nature of mediated interaction. The strategic decoupling of rendered behavior from actual behavior allows interactants to break many constraints that are inherent in face-to-face interaction as well as in other forms of mediated interaction, such as telephone and videophone conferences. The effects of TSI remain to be seen. Assuming that implementation of the TSI techniques is technically feasible and that using TSI implementations is conceptually workable for the interactants (both of which are substantial assumptions), one could predict a number of consequences. First, TSI may develop into a worthwhile tool that helps interactants overcome the inadequacies of communicating from remote locations. By augmenting their representational, sensory, and situational characteristics, interactants in CVEs may be able to achieve levels of interaction that actually surpass face-to-face interaction.

On the other hand, people may in fact find the use of these transformations extremely unsettling. The difference between TSI and current CVE implementations could be as drastic as the difference between email and the written letter. As this technology is developed, it is essential to examine people’s responses to this new medium (e.g., Reeves & Nass, 1996) and to examine these important potential implications of TSI before the technology becomes widespread.

Along the same lines, the threat of TSI may be the very downfall of CVE interaction. In face-to-face interaction there tends to be some degree of deception, for example, people using facial expressions to mask their emotions. Clearly, this deception has the potential to be much greater with TSI. If interactants have no faith that their perceptual experience is genuine, they may have little reason ever to enter a CVE. A complete lack of trust in the truthfulness of gestures, the one-to-one correspondence of avatars, and the temporal presence of interactants has the potential to rob the CVE of one of its greatest strengths, namely interactivity, since interactants may not know with whom, with what, or when they are interacting. Similarly, given an expectation of TSI, interactants may be constantly suspicious during interactions, and this lack of trust in fellow interactants may lead to unproductive collaborations.

A solution to this breakdown may require the development of TSI detectors for interactants, based either on computer algorithms that analyze nonverbal behaviors or on actual humans who scrutinize the interaction. To examine the possibility of using human TSI detectors, we now discuss what we call the nonverbal Turing Test (NVTT).

In the popular reinterpretation of the Turing Test (Turing, 1950), a judge attempts to determine which of two players (one human, one machine) is a fellow human. In our NVTT pilot studies, experimental participants acting as judges enter a CVE with two virtual humans: one avatar whose head movements are veridical, playing back the movements of another human in real time (i.e., without TSI), and one avatar whose head movements are actually a transformation of the judge’s own head movements (i.e., time-delayed and with a reduced range of motion). The judge thus sees a real person’s head movements on one avatar and self-mimicked movements on the other. During the interaction only head movements are permitted (i.e., no verbal communication is allowed), and participants must devise ways to engage and test the two virtual humans through nonverbal means in order to ascertain which is human and which is a mimicker. Figure 4 illustrates this setup.
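To make the mimic condition concrete, here is a minimal Python sketch of a delayed, range-restricted copy of the judge’s head orientation. It is not the authors’ implementation: the class name `DelayedMimic`, the (pitch, yaw, roll) tuple format, the 60 Hz sample rate, and the neutral starting pose are all illustrative assumptions.

```python
from collections import deque


class DelayedMimic:
    """Hypothetical NVTT mimic: replays the judge's own head orientation
    after a fixed delay, optionally restricted to yaw only.

    Poses are (pitch, yaw, roll) tuples in degrees sampled at a fixed
    rate; the rate is an assumption, not a reported parameter.
    """

    def __init__(self, delay_s: float, rate_hz: int = 60, yaw_only: bool = False):
        self.lag = max(1, round(delay_s * rate_hz))  # delay expressed in samples
        self.yaw_only = yaw_only
        # Until enough history exists, the mimic holds a neutral pose.
        self.buffer = deque([(0.0, 0.0, 0.0)] * self.lag, maxlen=self.lag)

    def step(self, judge_pose):
        """Record the judge's current pose and return the pose to render now."""
        pitch, yaw, roll = self.buffer[0]   # pose from `lag` samples ago
        self.buffer.append(judge_pose)      # maxlen evicts the oldest sample
        if self.yaw_only:
            return (0.0, yaw, 0.0)          # "low" range: yaw only
        return (pitch, yaw, roll)           # "high" range: pitch, yaw, roll
```

At the assumed 60 Hz, the longest (8-second) delay needs a 480-sample buffer, which is trivial to hold in memory.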

In the current initial pilot study, we manipulated three independent variables: test trial length (either 16 or 32 seconds), mimic delay (i.e., the computer agent mimics either 1, 2, 4, or 8 seconds after the judge’s movements), and range of motion (high: pitch, yaw, and roll; or low: yaw only).

Figure 4. A schematic of the NVTT. The human judge is forced to determine which of the two avatars exhibiting head movements is the real human and which is the computer-generated human mimicking the judge’s own movements.

Participants in this study wore head-mounted displays while a render computer tracked their head orientation; tracking, rendering, and networking latencies were all low enough to impart a compelling sense of copresence (see Bailenson et al., 2002, for detailed descriptions of the hardware and software used). Participants were instructed to sit in a virtual room with two virtual people: a human agent (i.e., a representation whose movements are controlled by a real person in another room) and a computer agent (i.e., a computer program designed to mimic the user’s movements in some way). Participants were instructed to interact with the two other virtual people using head movements in order to determine which one was the human agent. Participants were run in groups of two, with each one acting as the human agent for the other. Each participant sat at a virtual conference table with two virtual humans (similar to that shown in Figure 3). Each participant received a random order of 32 test trials (two instances of the 16 conditions resulting from the crossing of the three independent variables). Forty-one undergraduates participated in this study.
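The trial structure can be reproduced with a few lines of Python: crossing 2 trial lengths, 4 delays, and 2 motion ranges gives the 16 conditions, and each appears twice among the 32 trials. How the authors actually randomized trial order is not reported; the per-participant shuffle with `random.Random(seed)` below is an assumption, and `make_trial_order` is a hypothetical helper.

```python
import itertools
import random

# Factor levels as reported for the pilot study.
TRIAL_LENGTHS_S = (16, 32)
MIMIC_DELAYS_S = (1, 2, 4, 8)
MOTION_RANGES = ("high", "low")  # high: pitch, yaw, roll; low: yaw only


def make_trial_order(seed=None, repetitions=2):
    """Cross the three factors (2 x 4 x 2 = 16 conditions), repeat each
    condition `repetitions` times, and shuffle into one random order."""
    conditions = list(itertools.product(TRIAL_LENGTHS_S, MIMIC_DELAYS_S, MOTION_RANGES))
    trials = conditions * repetitions
    random.Random(seed).shuffle(trials)  # independent generator per call
    return trials
```

Passing a different seed per participant gives each one a distinct but reproducible order of (trial length, delay, motion range) tuples.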

For the purposes of brevity, we focus on two results in particular. First, despite the fact that we explicitly told participants that the computer agent was directly mimicking them, they performed surprisingly poorly when attempting to identify the human avatar. The overall average score was only 66% correct (SD = 10%; chance = 50%; maximum score = 100%). Moreover, of the 41 participants in the study, more than one fourth were not reliably different from chance (i.e., less than 3 SEM from 50%, between 44% and 56%). Second, as Figure 5 demonstrates, participants’ scores decreased as the magnitude of the mimic delay increased: there was a linear trend in the logarithm of the delay variable, F(1, 32) = 8.85, p < .01. When the delay was greater than 1 second, participants had more difficulty identifying the mimicker.
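The “linear trend in the logarithm of the delay” has a simple interpretation: the doubling delays of 1, 2, 4, and 8 seconds are equally spaced on a log2 scale (0, 1, 2, 3), so the standard orthogonal linear-contrast weights (-3, -1, 1, 3) apply. The sketch below computes a per-participant contrast score. We do not know exactly how the authors computed their F test; one common approach, assumed here, is to test the mean of such scores against zero across participants, in which case F(1, n - 1) equals the square of the one-sample t statistic. The accuracy values used below are invented for illustration only.

```python
import math

MIMIC_DELAYS_S = (1, 2, 4, 8)

# Doubling delays are equally spaced on a log2 scale: 0, 1, 2, 3.
log_delays = [math.log2(d) for d in MIMIC_DELAYS_S]

# Centering equally spaced values gives linear-trend weights; scaling by 2
# yields the textbook integer contrast (-3, -1, 1, 3), which sums to zero.
mean_log = sum(log_delays) / len(log_delays)
weights = [round(2 * (x - mean_log)) for x in log_delays]


def linear_trend_score(pct_correct_by_delay):
    """Per-participant linear contrast over the four delays (in order).
    Negative values mean accuracy fell as the log delay grew."""
    if len(pct_correct_by_delay) != len(weights):
        raise ValueError("expected one score per delay level")
    return sum(w * p for w, p in zip(weights, pct_correct_by_delay))
```

For a hypothetical participant scoring 80, 70, 65, and 60% at the four delays, `linear_trend_score([80, 70, 65, 60])` returns -65; a flat profile returns 0.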

These data are particularly striking in that we had initially predicted that participants would recognize their own head movements much more easily when aware that one avatar was designed to mimic them. In this study, we explicitly told participants about the use of TSI, and they still had great difficulty detecting it. The effects of implicit TSI (i.e., TSI that is not disclosed) can only be stronger. While this pilot study is extremely simple and only scratches the surface of a paradigm that examines TSI, it is still noteworthy that participants did not detect the mimicker across the board.

We are currently exploring other factors underlying the discrimination of human nonverbal behavior from computer-generated behaviors. In future studies, we will use NVTTs to study other nonverbal behaviors, such as facial gestures, eye-head gestures (pointing indications by either system), hand gestures, and interpersonal distance. We have shown that, in albeit simple scenarios, it is possible to pass the NVTT for a percentage of our test population using TSI. We are confident that as this percentage grows in the near future, important scientific and sociological discoveries will surface along the way.

In conclusion, there are many reasons one might want to avoid TSI; these reasons range from Orwellian concerns to the fear of rendering CVEs (perhaps even the telephone) functionally useless. We are not advocates of TSI as a means to replace normal communication, nor are we staunch believers in avoiding TSI in order to preserve the natural order of communication and conversation. However, we do acknowledge that, as CVEs become more prevalent, the strategic decoupling of representation from behavior is inevitable. For that reason alone, the notion of TSI warrants considerable attention.

Acknowledgments

The authors would like to thank Robin Gilmour and Christopher Rex for helpful suggestions. Furthermore, we thank Christopher Rex and Ryan Jaeger for assistance in collecting data. This research was sponsored in part by NSF Award SBE-9873432 and in part by NSF ITR Award IIS 0205740.

References

Argyle, M. (1988). Bodily communication (2nd ed.). London: Methuen.

Bailenson, J. N., Beall, A. C., & Blascovich, J. (2002). Mutual gaze and task performance in shared virtual environments. Journal of Visualization and Computer Animation, 13, 1–8.

Bailenson, J. N., Beall, A. C., Blascovich, J., Raimmundo, R., & Weisbuch, M. (2001). Intelligent agents who wear your face: Users’ reactions to the virtual self. Lecture Notes in Artificial Intelligence, 2190, 86–99.

Bailenson, J. N., Blascovich, J., Beall, A. C., & Guadagno, R. E. (submitted). Self representations in immersive virtual environments.

Figure 5. Percent correct by mimic delay in seconds. These data exclude subjects at chance performance.

Baumeister, R. F. (1998). The self. In D. T. Gilbert, S. T. Fiske, & G. Lindzey (Eds.), Handbook of social psychology (4th ed., pp. 680–740). New York: McGraw-Hill.

Beall, A. C., Bailenson, J. N., Loomis, J., Blascovich, J., & Rex, C. (2003). Non-zero-sum mutual gaze in immersive virtual environments. Proceedings of HCI International 2003.

Benford, S., Bowers, J., Fahlen, L., Greenhalgh, C., & Snowdon, D. (1995). User embodiment in collaborative virtual environments. Proceedings of CHI’95 (pp. 242–249). ACM Press.

Biocca, F. (1997). The cyborg’s dilemma: Progressive embodiment in virtual environments. Journal of Computer-Mediated Communication [online], 3. Retrieved from http://www.ascusc.org/jcmc/vol3/issue2/biocca2.html

Black, M., & Yacoob, Y. (1997). Recognizing facial expressions in image sequences using local parameterized models of image motion. International Journal of Computer Vision, 25(1), 23–48.

Blanz, V., & Vetter, T. (1999). A morphable model for the synthesis of 3D faces. SIGGRAPH ’99 Conference Proceedings, 187–194.

Blascovich, J., Loomis, J., Beall, A. C., Swinth, K. R., Hoyt, C., & Bailenson, J. N. (2002). Immersive virtual environment technology as a methodological tool for social psychology. Psychological Inquiry, 13, 103–124.

Busey, T. A. (1998). Physical and psychological representations of faces: Evidence from morphing. Psychological Science, 9, 476–483.

Byrne, D. (1971). The attraction paradigm. New York: Academic Press.

Cassell, J. (2000). Nudge nudge wink wink: Elements of face-to-face conversation for embodied conversational agents. In J. Cassell et al. (Eds.), Embodied conversational agents. Cambridge, MA: MIT Press.

Chaiken, S. (1979). Communicator physical attractiveness and persuasion. Journal of Personality and Social Psychology, 37, 1387–1397.

Chartrand, T. L., & Bargh, J. (1999). The chameleon effect: The perception-behavior link and social interaction. Journal of Personality & Social Psychology, 76(6), 893–910.

Decarlo, D., Metaxas, D., & Stone, M. (1998). An anthropometric face model using variational techniques. Proceedings of SIGGRAPH ’98, 67–74.

Depaulo, B. M., & Friedman, H. S. (1998). Nonverbal communication. In D. T. Gilbert, S. T. Fiske, & G. Lindzey (Eds.), The handbook of social psychology (4th ed., Vol. 2, pp. 3–40). Boston: McGraw-Hill.

Donato, G., Bartlett, M. S., Hager, J. C., Ekman, P., & Sejnowski, T. J. (1999). Classifying facial actions. IEEE Transactions on Pattern Analysis and Machine Intelligence, 21(10), 974–989.

Durlach, N., & Slater, M. (2000). Presence in shared virtual environments and virtual togetherness. Presence: Teleoperators and Virtual Environments, 9, 214–217.

Ekman, P. (1978). Facial signs: Facts, fantasies, and possibilities. In T. Sebeok (Ed.), Sight, sound, and sense. Bloomington, IN: Indiana University Press.

Fodor, J. A. (1983). The modularity of mind: An essay on faculty psychology. Cambridge, MA: MIT Press.

Fry, R., & Smith, G. F. (1975). The effects of feedback and eye contact on performance of a digit-encoding task. Journal of Social Psychology, 96, 145–146.

Gale, C., & Monk, A. F. (2002). A look is worth a thousand words: Full gaze awareness in video-mediated conversation. Discourse Processes, 33.

Garau, M., Slater, M., Vinayagamoorthy, V., Brogni, A., Steed, A., & Sasse, M. A. (2003). The impact of avatar realism and eye gaze control on perceived quality of communication in a shared immersive virtual environment. Proceedings of the SIGCHI Conference on Human Factors in Computing Systems.

Gibson, W. (1984). Neuromancer. New York: Ace Books.

Hu, C., Ferris, R., & Turk, M. (2003). Active wavelet networks for face alignment. Proceedings of the British Machine Vision Conference, Norwich, UK.

Kendon, A. (1977). Studies in the behavior of social interaction. Bloomington, IN: Indiana University.

Kleinke, C. L. (1986). Gaze and eye contact: A research review. Psychological Bulletin, 100, 78–100.

Kraut, R. E., Fussell, S. R., Brennan, S. E., & Siegel, J. (2002). Understanding effects of proximity on collaboration: Implications for technologies to support remote collaborative work. In P. Hinds & S. Kiesler (Eds.), Distributed work. Cambridge, MA: MIT Press.

Lanier, J. (2001). Virtually there. Scientific American, April 2001.

Leigh, J., DeFanti, T., Johnson, A., Brown, M., & Sandin, D. (1997). Global telemersion: Better than being there. Proceedings of ICAT ’97.

Loomis, J. M., Blascovich, J. J., & Beall, A. C. (1999). Immersive virtual environments as a basic research tool in psychology. Behavior Research Methods, Instruments, and Computers, 31(4), 557–564.


Mania, K., & Chalmers, A. (1998). Proceedings of the Fourth International Conference on Virtual Systems and Multimedia (pp. 177–182). Amsterdam: IOS Press-Ohmsha.

Milgram, S. (1992). The individual in a social world: Essays and experiments (2nd ed.). New York: McGraw-Hill.

Morgan, T., Kriz, R., Howard, T., Dias Neves, F., & Kelso, J. (2001). Extending the use of collaborative virtual environments for instruction to K–12 schools. Insight, 1(1).

Normand, V., Babski, C., Benford, S., Bullock, A., Carion, S., Chrysanthou, Y., et al. (1999). The COVEN project: Exploring applicative, technical, and usage dimensions of collaborative virtual environments. Presence: Teleoperators and Virtual Environments, 8(2), 218–236.

Patterson, M. L. (1982). An arousal model of interpersonal intimacy. Psychological Review, 89, 231–249.

Pylyshyn, Z. W. (1980). Computation and cognition: Issues in the foundations of cognitive science. Behavioral & Brain Sciences, 3, 111–169.

Reeves, B., & Nass, C. (1996). The media equation: How people treat computers, television, and new media like real people and places. New York: Cambridge University Press.

Rickel, J., & Johnson, W. L. (2000). Task-oriented collaboration with embodied agents in virtual worlds. In J. Cassell, J. Sullivan, S. Prevost, & E. Churchill (Eds.), Embodied conversational agents. Cambridge, MA: MIT Press.

Rutter, D. R. (1984). Looking and seeing: The role of visual communication in social interaction. Suffolk, UK: John Wiley & Sons.

Sannier, G., & Thalmann, M. N. (1998). A user friendly texture-fitting methodology for virtual humans. Computer Graphics International ’97.

Schwartz, P., Bricker, L., Campbell, B., Furness, T., Inkpen, K., Matheson, L., et al. (1998). Virtual playground: Architectures for a shared virtual world. Proceedings of the ACM Symposium on Virtual Reality Software and Technology 1998, 43–50.

Sherwood, J. V. (1987). Facilitative effects of gaze upon learning. Perceptual and Motor Skills, 64, 1275–1278.

Simons, H. (1976). Persuasion: Understanding, practice, and analysis. Reading, MA: Heath.

Slater, M., Pertaub, D., & Steed, A. (1999). Public speaking in virtual reality: Facing an audience of avatars. IEEE Computer Graphics and Applications, 19(2), 6–9.

Slater, M., Sadagic, A., Usoh, M., & Schroeder, R. (2000). Small group behavior in a virtual and real environment: A comparative study. Presence: Teleoperators and Virtual Environments, 9(1), 37–51.

Stiefelhagen, R., Yang, J., & Waibel, A. (1997). Tracking eyes and monitoring eye gaze. In M. Turk & Y. Takabayashi (Eds.), Proceedings of the Workshop on Perceptual User Interfaces.

Turing, A. (1950). Computing machinery and intelligence. Mind, 59(236).

Turk, M., & Kolsch, M. (in press). Perceptual interfaces. In G. Medioni & S. B. Kang (Eds.), Emerging topics in computer vision. Boston: Prentice Hall.

Velichkovsky, B. M. (1995). Communicating attention: Gaze position transfer in cooperative problem solving. Pragmatics and Cognition, 3(2), 199–222.

Vertegaal, R. (1999). The GAZE groupware system: Mediating joint attention in multiparty communication and collaboration. Proceedings of the CHI ’99 Conference on Human Factors in Computing Systems: The CHI is the Limit, 294–301.

Viola, P., & Jones, M. (2001). Rapid object detection using a boosted cascade of simple features. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.

Wallace, D. F. (1996). Infinite jest. Boston: Little, Brown.

Williams, K., Cheung, K. T., & Choi, W. (2000). Cyberostracism: Effects of being ignored over the internet. Journal of Personality and Social Psychology, 79, 748–762.

Yee, N. (2002). Befriending ogres and wood elves: Understanding relationship formation in MMORPGs. Retrieved from http://www.nickyee.com/hub/relationships/home.html

Zajonc, R. B. (1971). Brainwash: Familiarity breeds comfort. Psychology Today, 3(9), 60–64.

Zajonc, R. B., Murphy, S. T., & Inglehart, M. (1989). Feeling and facial efference: Implication of the vascular theory of emotion. Psychological Review, 96, 395–416.

Zhang, X., & Furnas, G. (2002). Social interactions in multiscale CVEs. Proceedings of the ACM Conference on Collaborative Virtual Environments 2002 (CVE 2002).


verted away from that member for long periods of time, the assistant could step in and help maintain a believable interaction by seamlessly serving as the leader’s proxy. The leader herself would then act as a conductor, overseeing all the interactions yet remaining free to focus her attention on individual members when she so desires. In addition, the leader is free to wander about the digital space, consult her notes, take a rest, or conduct a sidebar meeting with another person. Meanwhile, because her avatar is partially cyranic, it can continue to exhibit the appropriate nonverbal behaviors to each member. Furthermore, having a number of assistants whose sole focus is to respond with appropriate nonverbal gestures to each of the interactants in the CVE should maximize the members’ involvement, or sense of presence, in the CVE. For important meetings, seminars, or presentations conducted via CVEs, individual interactants may want to utilize a number of assistants as a core presentation team.
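The hand-off described above can be expressed as a per-member choice of who drives the nonverbal channel of the leader’s avatar, while the voice always comes from the leader. This sketch assumes a hypothetical `motion_source` helper, a mapping from members to dedicated assistants, and a 5-second threshold standing in for “long periods” of averted attention; the paper specifies none of these.

```python
def motion_source(member, leader_focus, averted_s, assistant_of, handoff_after_s=5.0):
    """Who should drive the leader avatar's nonverbal channel as seen by
    `member`. The leader's voice is always passed through unchanged.

    member          -- the viewer whose copy of the leader avatar we render
    leader_focus    -- member the leader is currently attending to (or None)
    averted_s       -- seconds since the leader last attended to `member`
    assistant_of    -- maps member -> dedicated assistant (hypothetical)
    handoff_after_s -- assumed threshold before an assistant takes over
    """
    if leader_focus == member or averted_s < handoff_after_s:
        return "leader"
    # Members without a dedicated assistant keep seeing the leader herself.
    return assistant_of.get(member, "leader")
```

Evaluated once per frame for every member, this yields a partially cyranic avatar: each member sees the leader’s own motion while she is attending to them and the assigned assistant’s motion once her attention has been elsewhere for longer than the threshold.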

3.2 Transforming Sensory Capabilities

Interactants can be assisted by technology that takes advantage of the ability of CVEs to keep precise running tabs of certain types of behaviors and then display summaries of those behaviors exclusively to individual interactants. For example, consider an educational CVE in which an instructor wants to ensure that she is directing her nonverbal behaviors in a desired fashion. Such an instructor may want to monitor her mutual gaze to ensure that she is not looking at any one student more than others during a presentation. The tracking equipment used to render the scene can keep an online total of the amount of time the instructor gazed at each individual student. The CVE can render a display of this gaze meter, as well as use visual or auditory alerts to inform the instructor of disproportionate applications of gaze.
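A minimal sketch of such a gaze meter, assuming the tracker already reports which student (if any) the instructor is looking at on each frame. The class name `GazeMeter` and the alert rule (flag anyone above 1.5 times the mean share) are illustrative assumptions, not details from the paper.

```python
class GazeMeter:
    """Hypothetical running tally of how long an instructor's gaze rests
    on each student, with a simple imbalance check."""

    def __init__(self, students, tolerance=1.5):
        self.seconds = {s: 0.0 for s in students}
        # Flag a student whose total exceeds `tolerance` x the mean (assumed rule).
        self.tolerance = tolerance

    def update(self, gazed_student, dt):
        """Credit `dt` seconds to the student currently gazed at.
        Pass None when the instructor is not looking at any student."""
        if gazed_student in self.seconds:
            self.seconds[gazed_student] += dt

    def over_attended(self):
        """Students receiving a disproportionate share of gaze so far."""
        total = sum(self.seconds.values())
        if total == 0:
            return []
        mean = total / len(self.seconds)
        return [s for s, t in self.seconds.items() if t > self.tolerance * mean]
```

A display could poll `over_attended()` every few seconds and raise a visual or auditory cue whenever the list is non-empty; time spent looking at no student is simply not counted.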

Furthermore, interactants can use the tracking data summaries to learn more about the attitudes of others. Nonverbal gestures are often correlates of specific mental states (Ekman, 1978; Zajonc, Murphy, & Inglehart, 1989). For example, in general we nod when we agree, smile when we are pleased, tilt our heads when we are confused, and look at something in which we are interested. Interactants will be able to tailor their CVE systems to keep track of nonverbal behaviors with the goal of helping them infer the mental states of the other interactants. For example, a teacher will be able to gauge the percentage of students exhibiting nonverbal behaviors that suggest confusion about a point in a lesson. Similarly, a leader could determine who in a room full of members is responding most positively to her behavior. Tabulating and assessing the nonverbal behaviors of others is certainly something that humans do intuitively and constantly in face-to-face interactions. With CVEs, interactants will be able to tabulate these behaviors with greater precision, using the objective tabulations from the tracking data to augment their normal intuitions about the gestures occurring in the interaction.
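Estimating what share of a class shows a given cue reduces to counting. The sketch assumes an upstream tracker that already labels cues (for example `head_tilt`) per student over a recent window; both the labels and the cue-to-state mapping are hypothetical, and such counts are at best hints about mental states, not readings of them.

```python
def share_showing(cues_by_student, indicator_cues):
    """Fraction of students whose recent cues include any of `indicator_cues`.

    cues_by_student -- {student: [cue labels observed in a recent window]}
    indicator_cues  -- set of labels treated as a possible sign of a state,
                       e.g. {"head_tilt"} for confusion (an assumed mapping)
    """
    if not cues_by_student:
        return 0.0
    flagged = sum(1 for cues in cues_by_student.values()
                  if indicator_cues & set(cues))
    return flagged / len(cues_by_student)
```

With four students of whom two recently tilted their heads, `share_showing(...)` reports 0.5, which a teacher’s display might render as “50% possibly confused.”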

Figure 2. A depiction of cyranoids. On the top row are three nonrendered gesturers. Each member on the bottom hears the leader’s actual verbal behaviors (dashed lines). However, each member views the nonverbal behavior of her dedicated gesturer rendered onto the avatar of the leader (unbroken lines).

Another transformation involves filtering or degrading certain signals or nonverbal behaviors. Some visual nonverbal behaviors tend to distract interactants, and with filtering algorithms, interactants can prevent such counterproductive distractions in a number of ways. For example, consider the situation in which a speaker in a CVE taps her pen rapidly as she speaks. In face-to-face meetings, this type of behavior can distract interactants. In a CVE, this type of behavior can be filtered in two ways. First, the speaker can filter the behavior on the transmitting end. If people know that they have difficulty suppressing certain nonverbal behaviors that tend to be perceived negatively, such as a nervous tic, they can activate a filter that prevents the behavior from being rendered. Similarly, in certain situations a CVE interactant may not want to render certain nonverbal behaviors. Consider the leader example: a potential member may benefit from rendering her “poker face,” that is, not demonstrating any enthusiasm or disappointment via facial expressions. Consequently, the member may accrue a strategic advantage during a negotiation. Second, interactants can filter behaviors on the receiving end. If a speaker’s hand motions are distracting, a listener can simply choose not to render that interactant’s hand movements.
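The two filtering points map naturally onto a sender-side and a receiver-side pass over the tracked channels. In this minimal sketch, the channel names (`head`, `hands`, `face`), the `NonverbalFrame` and `Interactant` classes, and the choice to drop a channel outright are all assumptions; a real renderer would more likely substitute a neutral pose than leave a limb unanimated.

```python
from dataclasses import dataclass, field


@dataclass
class NonverbalFrame:
    sender: str
    channels: dict  # channel name -> tracked data, e.g. {"hands": ..., "face": ...}


@dataclass
class Interactant:
    name: str
    suppress_outgoing: set = field(default_factory=set)  # e.g. {"face"} for a poker face
    hide_incoming: dict = field(default_factory=dict)    # sender -> channels not to render

    def transmit(self, channels: dict) -> NonverbalFrame:
        """Transmitting-end filter: drop channels this person chose to withhold."""
        kept = {c: v for c, v in channels.items() if c not in self.suppress_outgoing}
        return NonverbalFrame(self.name, kept)

    def render(self, frame: NonverbalFrame) -> dict:
        """Receiving-end filter: drop channels this viewer chose not to see."""
        hidden = self.hide_incoming.get(frame.sender, set())
        return {c: v for c, v in frame.channels.items() if c not in hidden}
```

Here a speaker who withholds her face still transmits head and hand motion, and a listener who hides that speaker’s hands sees only the head: the pen tapping never reaches the listener’s display, while other listeners are unaffected.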

Another example of transforming sensory capabilities is producing a visual indicator of where each interactant’s attention currently lies, as revealed by their eye direction (Velichkovsky, 1995). We have explored a technique that involves rendering each person’s view frustum to indicate the field of view, as Figure 3 illustrates. In this example, the wireframe frustums spotlight the 3D space visible to each person. This feature, color-coded for each person, may be especially helpful to teachers in a distance learning CVE, who could use such information to see where students are focusing their visual attention without having to look directly at the students’ eyes.
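The paper renders wireframe frustums; a teacher-facing tool might also want to ask whether a given object lies within a student’s field of view. The sketch below approximates the rectangular frustum by a circular viewing cone, an assumption made for brevity; `in_view_cone` and its 60° default are hypothetical.

```python
import math


def in_view_cone(eye, forward, point, fov_deg=60.0):
    """Return True if `point` lies inside a circular viewing cone.

    Simplification (assumption): a cone with full angle `fov_deg` around the
    `forward` direction stands in for the rectangular frustum a renderer uses.
    `eye`, `forward`, and `point` are (x, y, z) tuples in world coordinates.
    """
    to_point = [p - e for p, e in zip(point, eye)]
    dist = math.sqrt(sum(c * c for c in to_point))
    norm_f = math.sqrt(sum(c * c for c in forward))
    if dist == 0 or norm_f == 0:
        return False  # degenerate input: no meaningful direction
    cos_angle = sum(a * b for a, b in zip(to_point, forward)) / (dist * norm_f)
    return cos_angle >= math.cos(math.radians(fov_deg / 2))
```

Looping this test over each student’s tracked head pose and the objects in the room would give a teacher a per-student list of what is currently in view, without near and far clipping planes.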

Figure 3. View frustums marking the field of view of interactants.

There are a number of similar tools (i.e., specific objects rendered only to particular interactants) that can assist interactants in a CVE. For example, in our NZSMG studies, an experimenter enters a CVE and attempts to persuade other interactants regarding a certain topic (Beall et al., 2003). In those interactions, we render the interactants’ names over their heads on floating billboards for the experimenter to read. In this manner, the experimenter can refer to people by name more easily. There are many other ways to use these floating billboards to assist interactants, for example, reminders about an interactant’s preferences or personality (e.g., “doesn’t respond well to prolonged mutual gaze”).

One of the most useful forms of transforming sensory capabilities may be to enlist one or more human consultants who are rendered to only one member in a CVE (i.e., virtual ghosts). Unlike a face-to-face interaction, a CVE will enable an interactant to have informed human consultants who are free to wander around the virtual meeting space, to scrutinize the actions of other interactants, to conduct online research and sidebar meetings in order to provide key interactants with additional information, and generally to provide support for the interactants. For example, the leader can have her research team actually rendered beside her in the CVE. Members of her team can point out actions by potential members, suggest new strategies, and even provide real-time criticism and feedback concerning the behavior of the leader, without any of the other members having even a hint of awareness of the human consultants’ presence. Alternatively, the leader herself can go into “ghost mode” and explore the virtual world with her team while her avatar remains seated and is even controlled by yet another member of her team.
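Both the floating billboards and the virtual ghosts amount to per-viewer visibility: each object is either shared or drawn only for a listed set of viewers. A minimal sketch, assuming a hypothetical `SceneObject` record and a renderer that asks for its own draw list:

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class SceneObject:
    name: str
    # None means shared with everyone; otherwise only these viewers see it.
    visible_to: Optional[FrozenSet[str]] = None


def render_list(scene, viewer):
    """Names of the objects that `viewer`'s renderer should draw."""
    return [obj.name for obj in scene
            if obj.visible_to is None or viewer in obj.visible_to]
```

In the test scene, a name billboard and a consultant are drawn for the leader only, so Ben’s renderer never draws them. In a networked system, the ghosts would presumably also need to be withheld from other clients’ scene updates, not merely left undrawn, to stay truly hidden.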

3.3 Transforming the Situation

In addition to transforming their representation and sensory capabilities, CVE interactants can also use algorithms to transform their general spatial or temporal situations. In a CVE, people generally adopt a spatially coherent situational context across all remote interactants that brings everyone together in the shared space. However, there is no reason that the details and arrangements of that virtual space need to be constant for all the interactants in the CVE. Consider the situation for three interactants. Interactant A may choose to form an isosceles triangle with the other two, while interactants B and C may both choose to form equilateral triangles. Interactant A may even choose to flip the locations of B and C. In this scenario, the CVE operating system can preserve the intended eye gaze direction by transforming the amplitudes or directions of head and eye movements in a prescribed manner. While this is a somewhat simple example, with as many as four interactants it is straightforward to design spatial transformations that keep the intended eye and head gaze cues intact across all interactants. Although such discordance may eventually cause the quality and smoothness of the interaction to suffer, there are a number of ways that transforming the situation can assist individual interactants.
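One way to read “preserving the intended eye gaze direction” is that the gaze target is preserved while the rendered angle is recomputed in each viewer’s private layout. The sketch below assumes that reading, a flat 2D seating plan per viewer, and a gaze target already identified from the gazer’s tracked head direction in her own layout; none of these details come from the paper, and `rendered_yaw` is a hypothetical helper.

```python
import math


def rendered_yaw(layouts, viewer, gazer, target):
    """Yaw in degrees (counterclockwise from +x) at which `viewer` should see
    `gazer`'s avatar turned so that it faces `target` in `viewer`'s own layout.

    `layouts` maps each viewer to a private seating plan {name: (x, y)}
    (hypothetical structure). The gaze target is preserved; the angle is
    recomputed per viewer.
    """
    plan = layouts[viewer]
    gx, gy = plan[gazer]
    tx, ty = plan[target]
    return math.degrees(math.atan2(ty - gy, tx - gx))
```

Suppose A looks at B. In A’s own layout B sits on A’s +x axis, so A’s avatar faces 0°; in C’s layout, where B and C are flipped, C should see A’s avatar turned 90° toward B’s new seat rather than toward C.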

One such transformation involves multilateral perspectives. In a normal conversation, each interactant has a unique and privileged perspective. That perspective is a combination of her sensory input (e.g., visual and acoustic fields of view) and internal beliefs about the interaction. In normal face-to-face interactions, people continually use sensory input to update and adjust their internal beliefs (Kendon, 1977). Interactants in a CVE will possess a completely new mechanism to adjust and update internal beliefs: a person’s viewpoint can be multilateral as opposed to unilateral (normal). In other words, in a real-time conversation, interactant A can take the viewpoint of interactant B and perceive herself as she performs various verbal and nonverbal gestures during the interaction. In this manner, she can acquire invaluable sensory information pertaining to the interaction and update her internal beliefs about it in ways not possible without the CVE.

Consequently, interactants in educational and persuasive interactions may be able to improve performance, because seeing oneself through the eyes of another may allow one to develop a more informed set of internal beliefs about others (Baumeister, 1998). Furthermore, being able to experience an interaction through someone else’s eyes may reinforce the fact that one is indeed copresent in the CVE (e.g., Durlach & Slater, 2000). Finally, utilizing multilateral perspectives may assist students in distance learning CVEs in terms of training transfer effects (Rickel & Johnson, 2000) that might occur when an interactant who has been trained in multilateral perspective taking performs similar group tasks in nonmediated situations.

A second situational transformation involves partially recording the interaction and adjusting temporal properties or sequences in real time. Similar to commercial products sold for digitally recording and playing back broadcast television, interactants in a CVE should be able to accelerate and decelerate the perceived flow of time during the mediated interaction. Consider the following situation. A student in a distance learning CVE does not understand an example that the instructor provides. The student can “rewind” the recorded interaction, go back to the beginning of the confusing example, and then play back the example. Once the student has understood the confusing example, she can turn up the rate of playback (e.g., watch the sequence at 2X speed) and eventually catch up to the instructor again. By slowing down the rendered flow of time or speeding it up, the interactant can focus differentially on particular topics and can review the same scene from different points of view without missing the remainder of the interaction. Of course, doing so will result in costs to that interactant’s contribution to the CVE in terms of interactivity (i.e., what does her avatar do while she rewinds?). Consequently, the disruption of the temporal sequence will necessarily be coupled with some kind of avatar autopilot.
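The catch-up arithmetic is simple: if a student has fallen L seconds behind and plays at rate r > 1, the gap closes by (r - 1) seconds per second, so catching up takes L / (r - 1) seconds; at 2X, a 20-second lag takes 20 more seconds. The sketch below models one interactant’s playhead under those assumptions. `TimeShiftedView` and `catch_up_time` are hypothetical names, and the autopilot itself is only flagged, not modeled.

```python
class TimeShiftedView:
    """Hypothetical per-interactant playhead over a recorded CVE session.

    `live` is the session clock and `playhead` is the moment this
    interactant is currently watching. Whenever playhead < live, some
    avatar autopilot (not modeled here) must drive her avatar.
    """

    def __init__(self):
        self.live = 0.0
        self.playhead = 0.0
        self.rate = 1.0

    def rewind(self, seconds):
        self.playhead = max(0.0, self.playhead - seconds)

    def advance(self, dt):
        """Advance wall-clock time by `dt` seconds."""
        self.live += dt
        # Playback can never run ahead of what has actually been recorded.
        self.playhead = min(self.live, self.playhead + self.rate * dt)
        if self.playhead == self.live:
            self.rate = 1.0  # caught up: return to real time

    @property
    def needs_autopilot(self):
        return self.playhead < self.live


def catch_up_time(lag_s, rate):
    """Wall-clock seconds needed to close `lag_s` when playing at `rate` > 1."""
    if rate <= 1:
        raise ValueError("rate must exceed 1 to catch up")
    return lag_s / (rate - 1)
```

In the test scenario, a student rewinds 20 seconds after a minute of class, switches to 2X, and is live again 20 seconds later, matching `catch_up_time(20.0, 2.0)`; `needs_autopilot` is true for exactly that interval.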

Changing the rate of time in a CVE brings up an-other interesting transformation Traditionally CVEsare roughly defined as ldquogeographically separated interac-tantsrdquo interacting over some kind of a computer-mediated network in a shared environment Howeverby combining some of the concepts discussed in previ-ous sections it may be possible to include in the defini-tion of a CVE ldquotemporally separated interactantsrdquo in ashared environment Consider a videoconference of abusiness meeting Oftentimes interested parties whocannot attend the meetings will later review a videotapeof the meeting In a CVE the temporally absent mem-ber has an option to more deeply involve herself in theinteraction Specifically she can situate her avatar in aspecific place in the CVEs seating arrangement and usean autopilot to give her representation rudimentarynonverbal behaviors Furthermore the absent membercan program her avatar to perform simple interactivetasksmdashprerecorded introductions answers to certainquestions about the CVE topic or perhaps more realis-tically for the near-term direct the avatar to play back arecorded performance Then the CVE interaction canproceed in real time with the temporally absent mem-berrsquos avatar approximating the types of behaviors thatshe would do and say As a result temporally presentmembers would actually direct pieces of the conversa-tion towards the absent member as well as transmit

nonverbal gestures towards her Later on instead of justreviewing the recording the temporally absent membercan take her place in the CVE and actually feel presentin the dialogue receiving appropriate nonverbal behav-iors and maximizing the degree of copresence More-over the members of the CVE who were present at thescheduled time can program their avatars during thereplay of the interaction to respond to any post hocquestions that the absent member might have In thisway the degree of interactivity during the replay can beincreased and perhaps at some point in the not-too-distant future the line between real-time and non-real-time interactions will become interestingly blurred

4 Implications of TSI and ResearchDirections

For better or for worse TSI implemented throughCVEs has great potential to change the nature of medi-ated interaction The strategic decoupling of renderedbehavior from actual behavior allows interactants tobreak many constraints that are inherent in face-to-faceinteraction as well as other forms of mediated interac-tions such as telephone and videophone conferencesThe effects of TSI remain to be seen Assuming thatimplementation of the TSI techniques are technicallyfeasible and that using TSI implementations is concep-tually workable for the interactants (both of which aresubstantial assumptions) one could predict a number ofconsequences First TSI may develop into a worthwhiletool that assists interactants in overcoming the inade-quacies of communicating from remote locations Byaugmenting their representational sensory and situa-tional characteristics interactants of CVEs may be ableto achieve levels of interaction that actually surpass face-to-face interaction

On the other hand, people may in fact find the use of these transformations extremely unsettling. The difference between TSI and current CVE implementations could be as drastic as the difference between email and the written letter. As this technology is developed, it is essential to examine people’s responses to this new medium (see Reeves & Nass, 1996) and to examine these important potential implications of TSI before the technology becomes widespread.

Along the same lines, the threat of TSI may be the very downfall of CVE interaction. In face-to-face interaction there tends to be some degree of deception, for example, people using facial expressions to mask their emotions. Clearly, this deception has the potential to be much greater with TSI. If interactants have no faith that their perceptual experience is genuine, they may have little reason ever to enter a CVE. A complete lack of trust in the truthfulness of gestures, in the one-to-one correspondence of avatars, and in the temporal presence of interactants has the potential to rob the CVE of one of its greatest strengths, namely interactivity, since interactants may not know with whom, with what, or when they are interacting. Similarly, given an expectation of TSI, interactants may be constantly suspicious during interactions; this lack of trust in fellow interactants may lead to unproductive collaborations.

A solution to this breakdown may require the development of TSI detectors for interactants, based either on computer algorithms that analyze nonverbal behaviors or on actual humans who scrutinize the interaction. To examine the possibility of using human TSI detectors, we now discuss what we call the nonverbal Turing Test (NVTT).
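One simple form an algorithmic TSI detector could take is sketched below in Python. It is offered as an illustration under stated assumptions, not as a validated detector: it assumes each avatar’s head yaw is available as a list of samples at a fixed rate, and it flags an avatar as a likely self-mimic when that avatar’s motion correlates strongly with the observer’s own motion at some positive lag. The function names, the synthetic data generator, and the 0.8 threshold are all hypothetical.

```python
import random


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / (va * vb) ** 0.5


def best_lag(own, other, max_lag):
    """Find the lag (in samples) at which other[t + lag] best tracks own[t]."""
    best, best_r = 0, float("-inf")
    for lag in range(max_lag + 1):
        r = pearson(own[: len(own) - lag], other[lag:])
        if r > best_r:
            best, best_r = lag, r
    return best, best_r


def looks_like_mimic(own, other, max_lag, threshold=0.8):
    """Hypothetical rule: a strong correlation at a positive lag suggests self-mimicry."""
    lag, r = best_lag(own, other, max_lag)
    return lag > 0 and r >= threshold


def demo_signals(seed=0, n=200, delay=5):
    """Synthetic yaw traces: the observer, a delayed half-range copy, and an unrelated person."""
    rng = random.Random(seed)
    judge = [rng.gauss(0.0, 1.0) for _ in range(n)]
    mimic = [0.0] * delay + [0.5 * y for y in judge[: n - delay]]
    human = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return judge, mimic, human
```

Because Pearson correlation ignores scale, a mimic that also shrinks the range of motion is still caught; a mimic that adds noise or blends in other behavior would lower the correlation, so the threshold trades false alarms against misses.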

In the popular reinterpretation of the Turing Test (Turing, 1950), a judge attempts to determine which of two players (one human, one machine) is a fellow human. In our NVTT pilot studies, experimental participants acting as judges enter a CVE with two virtual humans: one avatar whose head movements are veridical, playing back the movements of another human in real time (i.e., without TSI), and one avatar whose head movements are actually a transformation of the judge’s own head movements (i.e., time-delayed and with a reduced range of motion). The judge sees the head movements of a real person on one avatar and some sort of self-mimicked movements on the other. During the interaction, only head movements are permitted (i.e., no verbal communication is allowed), and participants must devise ways to engage and test the two virtual humans through nonverbal means in order to ascertain which is human and which is a mimicker. Figure 4 illustrates this arrangement.

In the current initial pilot study, we manipulated three independent variables: test trial length (either 16 or 32 seconds), mimic delay (i.e., the computer agent mimics the judge’s movements either 1, 2, 4, or 8 seconds after they occur), and range of motion (high: pitch, yaw, and roll; or low: yaw only).
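A mimicking computer agent of the kind manipulated here can be sketched as a fixed-length delay line over tracked head poses. The Python below is a minimal illustration, not the software used in the study: it assumes poses arrive as (yaw, pitch, roll) tuples at a constant, hypothetical tracker rate, and it interprets the low range-of-motion condition as passing through yaw only. `MimicAgent`, `run`, and their parameters are illustrative names.

```python
from collections import deque


class MimicAgent:
    """Hypothetical NVTT mimic: replays the judge's head pose after a fixed delay."""

    def __init__(self, delay_s, sample_rate_hz, yaw_only=False):
        self.delay_samples = round(delay_s * sample_rate_hz)
        self.yaw_only = yaw_only  # low range-of-motion condition (assumed: yaw only)
        self.buffer = deque()

    def update(self, pose):
        """Push the judge's current (yaw, pitch, roll); return the pose to render now."""
        self.buffer.append(pose)
        if len(self.buffer) <= self.delay_samples:
            return (0.0, 0.0, 0.0)  # neutral pose until enough history has accumulated
        yaw, pitch, roll = self.buffer.popleft()  # the pose from delay_samples ago
        if self.yaw_only:
            return (yaw, 0.0, 0.0)
        return (yaw, pitch, roll)


def run(agent, poses):
    """Feed a sequence of judge poses through the agent and collect the rendered poses."""
    return [agent.update(p) for p in poses]
```

At an assumed 60 Hz tracker rate, for example, the 8-second condition would buffer 480 poses.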

Figure 4. A schematic of the NVTT. The human judge is forced to determine which of the two avatars exhibiting head movements is the real human and which is the computer-generated human mimicking the judge’s own movements.

Participants in this study wore head-mounted displays while a render computer tracked their head orientation; tracking, rendering, and networking latencies were all low enough to impart a compelling sense of copresence (see Bailenson et al., 2002, for detailed descriptions of the hardware and software used). Participants were instructed to sit in a virtual room with two virtual people: a human agent (i.e., a representation whose movements are controlled by a real person in another room) and a computer agent (i.e., a computer program designed to mimic the user’s movements in some way). Participants were instructed to interact with the two other virtual people using head movements in order to determine which one was the human agent. Participants were run in pairs, with each one acting as the human agent for the other. Each participant sat at a virtual conference table with two virtual humans (similar to that shown in Figure 3). Each participant received 32 test trials in random order (two instances of each of the 16 conditions resulting from the crossing of the three independent variables). Forty-one undergraduates participated in this study.
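The trial structure described above (2 trial lengths × 4 mimic delays × 2 ranges of motion = 16 conditions, each presented twice) can be generated as follows. This Python sketch assumes a simple full shuffle per participant driven by a per-participant seed; the text specifies only a “random order,” so `trial_order` and its seeding scheme are assumptions.

```python
import itertools
import random

TRIAL_LENGTHS_S = (16, 32)       # test trial length in seconds
MIMIC_DELAYS_S = (1, 2, 4, 8)    # computer agent's lag behind the judge, in seconds
MOTION_RANGES = ("high", "low")  # high: pitch, yaw, roll; low: yaw only


def trial_order(seed):
    """Return 32 (length, delay, range) trials: each of the 16 conditions twice, shuffled."""
    conditions = list(itertools.product(TRIAL_LENGTHS_S, MIMIC_DELAYS_S, MOTION_RANGES))
    trials = conditions * 2
    random.Random(seed).shuffle(trials)
    return trials
```

Seeding per participant makes each order reproducible while still differing across participants.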

For the sake of brevity, we focus on two results in particular. First, even though we explicitly told participants that the computer agent was directly mimicking them, they performed surprisingly poorly when attempting to identify the human avatar. The overall average score was only 66% correct (SD = 10%; chance = 50%, maximum score = 100%). Moreover, more than one fourth of the 41 participants in the study were not reliably different from chance (i.e., they scored less than 3 SEM from 50%, between 44% and 56%). Second, as Figure 5 demonstrates, participants’ scores decreased as the mimic delay increased: there was a linear trend in the logarithm of the delay variable, F(1, 32) = 8.85, p < .01. When the delay was greater than 1 second, participants had more difficulty identifying the mimicker.
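Because the four mimic delays double at each step, they are equally spaced on a logarithmic scale, which is what allows a single linear-trend contrast across them. The Python sketch below shows one standard way to compute such a test: derive contrast weights from centered log2 delays, score each participant, and run a one-sample t test on the scores, where F(1, n - 1) = t². It is a hedged illustration with made-up numbers, not a reproduction of the reported analysis; the function names are hypothetical.

```python
import math
import statistics

DELAYS_S = (1, 2, 4, 8)


def log_linear_weights(delays):
    """Linear contrast weights: log2 delays centered on their mean (they sum to zero)."""
    logs = [math.log2(d) for d in delays]
    mean = sum(logs) / len(logs)
    return [x - mean for x in logs]


def contrast_score(scores_by_delay, weights):
    """One participant's trend score; negative means accuracy falls as delay grows."""
    return sum(w * s for w, s in zip(weights, scores_by_delay))


def trend_f(contrast_scores):
    """One-sample t test of the mean contrast against 0, reported as (F, (df1, df2))."""
    n = len(contrast_scores)
    t = statistics.mean(contrast_scores) / (statistics.stdev(contrast_scores) / math.sqrt(n))
    return t * t, (1, n - 1)
```

For instance, a hypothetical participant scoring 80%, 70%, 60%, and 50% at delays of 1, 2, 4, and 8 seconds gets a contrast score of -0.5 with weights (-1.5, -0.5, 0.5, 1.5).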

These data are particularly striking in that we had initially predicted that participants would recognize their own head movements much more easily when aware that one avatar was designed to mimic them. In this study we explicitly told participants about the use of TSI, and they still had great difficulty detecting it. Implicit (i.e., undisclosed) TSI would presumably be even harder to detect. While this pilot study is extremely simple and only scratches the surface of a paradigm for examining TSI, it is still noteworthy that participants did not consistently detect the mimicker.

We are currently exploring other factors underlying the discrimination of human nonverbal behavior from computer-generated behaviors. In future studies, we will use NVTTs to study other nonverbal behaviors, such as facial gestures, eye-head gestures (pointing indications by either system), hand gestures, and interpersonal distance. We have shown that, in albeit simple scenarios, it is possible to pass the NVTT for a percentage of our test population using TSI. We are confident that as this percentage grows in the near future, important scientific and sociological discoveries will surface along the way.

In conclusion, there are many reasons one might want to avoid TSI; these reasons range from Orwellian concerns to the fear of rendering CVEs (perhaps even the telephone) functionally useless. We are neither advocates of TSI as a means to replace normal communication nor staunch believers in avoiding TSI in order to preserve the natural order of communication and conversation. However, we do acknowledge that, as CVEs become more prevalent, the strategic decoupling of representation from behavior is inevitable. For that reason alone, the notion of TSI warrants considerable attention.

Figure 5. Percent correct by mimic delay in seconds. These data exclude subjects at chance performance.

Acknowledgments

The authors would like to thank Robin Gilmour and Christopher Rex for helpful suggestions. Furthermore, we thank Christopher Rex and Ryan Jaeger for assistance in collecting data. This research was sponsored in part by NSF Award SBE-9873432 and in part by NSF ITR Award IIS 0205740.

References

Argyle, M. (1988). Bodily communication (2nd ed.). London: Methuen.

Bailenson, J. N., Beall, A. C., & Blascovich, J. (2002). Mutual gaze and task performance in shared virtual environments. Journal of Visualization and Computer Animation, 13, 1–8.

Bailenson, J. N., Beall, A. C., Blascovich, J., Raimmundo, R., & Weisbuch, M. (2001). Intelligent agents who wear your face: Users’ reactions to the virtual self. Lecture Notes in Artificial Intelligence, 2190, 86–99.

Bailenson, J. N., Blascovich, J., Beall, A. C., & Guadagno, R. E. (submitted). Self representations in immersive virtual environments.

Baumeister, R. F. (1998). The self. In D. T. Gilbert, S. T. Fiske, & G. Lindzey (Eds.), Handbook of social psychology (4th ed., pp. 680–740). New York: McGraw-Hill.

Beall, A. C., Bailenson, J. N., Loomis, J., Blascovich, J., & Rex, C. (2003). Non-zero-sum mutual gaze in immersive virtual environments. Proceedings of HCI International 2003.

Benford, S., Bowers, J., Fahlen, L., Greenhalgh, C., & Snowdon, D. (1995). User embodiment in collaborative virtual environments. Proceedings of CHI ’95 (pp. 242–249). ACM Press.

Biocca, F. (1997). The cyborg’s dilemma: Progressive embodiment in virtual environments. Journal of Computer-Mediated Communication [online], 3. Retrieved from http://www.ascusc.org/jcmc/vol3/issue2/biocca2.html

Black, M., & Yacoob, Y. (1997). Recognizing facial expressions in image sequences using local parameterized models of image motion. International Journal of Computer Vision, 25(1), 23–48.

Blanz, V., & Vetter, T. (1999). A morphable model for the synthesis of 3D faces. SIGGRAPH ’99 Conference Proceedings, 187–194.

Blascovich, J., Loomis, J., Beall, A. C., Swinth, K. R., Hoyt, C., & Bailenson, J. N. (2002). Immersive virtual environment technology as a methodological tool for social psychology. Psychological Inquiry, 13, 103–124.

Busey, T. A. (1998). Physical and psychological representations of faces: Evidence from morphing. Psychological Science, 9, 476–483.

Byrne, D. (1971). The attraction paradigm. New York: Academic Press.

Cassell, J. (2000). Nudge nudge wink wink: Elements of face-to-face conversation for embodied conversational agents. In J. Cassell et al. (Eds.), Embodied conversational agents. Cambridge, MA: MIT Press.

Chaiken, S. (1979). Communicator physical attractiveness and persuasion. Journal of Personality and Social Psychology, 37, 1387–1397.

Chartrand, T. L., & Bargh, J. (1999). The chameleon effect: The perception-behavior link and social interaction. Journal of Personality & Social Psychology, 76(6), 893–910.

Decarlo, D., Metaxas, D., & Stone, M. (1998). An anthropometric face model using variational techniques. Proceedings of SIGGRAPH ’98, 67–74.

Depaulo, B. M., & Friedman, H. S. (1998). Nonverbal communication. In D. T. Gilbert, S. T. Fiske, & G. Lindzey (Eds.), The handbook of social psychology (4th ed., Vol. 2, pp. 3–40). Boston: McGraw-Hill.

Donato, G., Bartlett, M. S., Hager, J. C., Ekman, P., & Sejnowski, T. J. (1999). Classifying facial actions. IEEE Transactions on Pattern Analysis and Machine Intelligence, 21(10), 974–989.

Durlach, N., & Slater, M. (2000). Presence in shared virtual environments and virtual togetherness. Presence: Teleoperators and Virtual Environments, 9, 214–217.

Ekman, P. (1978). Facial signs: Facts, fantasies, and possibilities. In T. Sebeok (Ed.), Sight, sound, and sense. Bloomington, IN: Indiana University Press.

Fodor, J. A. (1983). The modularity of mind: An essay on faculty psychology. Cambridge, MA: MIT Press.

Fry, R., & Smith, G. F. (1975). The effects of feedback and eye contact on performance of a digit-encoding task. Journal of Social Psychology, 96, 145–146.

Gale, C., & Monk, A. F. (2002). A look is worth a thousand words: Full gaze awareness in video-mediated conversation. Discourse Processes, 33.

Garau, M., Slater, M., Vinayagamoorthy, V., Brogni, A., Steed, A., & Sasse, M. A. (2003). The impact of avatar realism and eye gaze control on perceived quality of communication in a shared immersive virtual environment. Proceedings of the SIGCHI Conference on Human Factors in Computing Systems.

Gibson, W. (1984). Neuromancer. New York: Ace Books.

Hu, C., Feris, R., & Turk, M. (2003). Active wavelet networks for face alignment. Proceedings of the British Machine Vision Conference, Norwich, UK.

Kendon, A. (1977). Studies in the behavior of social interaction. Bloomington, IN: Indiana University.

Kleinke, C. L. (1986). Gaze and eye contact: A research review. Psychological Bulletin, 100, 78–100.

Kraut, R. E., Fussell, S. R., Brennan, S. E., & Siegel, J. (2002). Understanding effects of proximity on collaboration: Implications for technologies to support remote collaborative work. In P. Hinds & S. Kiesler (Eds.), Distributed work. Cambridge, MA: MIT Press.

Lanier, J. (2001). Virtually there. Scientific American, April 2001.

Leigh, J., DeFanti, T., Johnson, A., Brown, M., & Sandin, D. (1997). Global telemersion: Better than being there. Proceedings of ICAT ’97.

Loomis, J. M., Blascovich, J. J., & Beall, A. C. (1999). Immersive virtual environments as a basic research tool in psychology. Behavior Research Methods, Instruments, and Computers, 31(4), 557–564.


Mania, K., & Chalmers, A. (1998). Proceedings of the Fourth International Conference on Virtual Systems and Multimedia (pp. 177–182). Amsterdam: IOS Press-Ohmsha.

Milgram, S. (1992). The individual in a social world: Essays and experiments (2nd ed.). New York: McGraw-Hill.

Morgan, T., Kriz, R., Howard, T., Dias Neves, F., & Kelso, J. (2001). Extending the use of collaborative virtual environments for instruction to K–12 schools. Insight, 1(1).

Normand, V., Babski, C., Benford, S., Bullock, A., Carion, S., Chrysanthou, Y., et al. (1999). The COVEN project: Exploring applicative, technical, and usage dimensions of collaborative virtual environments. Presence: Teleoperators and Virtual Environments, 8(2), 218–236.

Patterson, M. L. (1982). An arousal model of interpersonal intimacy. Psychological Review, 89, 231–249.

Pylyshyn, Z. W. (1980). Computation and cognition: Issues in the foundations of cognitive science. Behavioral & Brain Sciences, 3, 111–169.

Reeves, B., & Nass, C. (1996). The media equation: How people treat computers, television, and new media like real people and places. New York: Cambridge University Press.

Rickel, J., & Johnson, W. L. (2000). Task-oriented collaboration with embodied agents in virtual worlds. In J. Cassell, J. Sullivan, S. Prevost, & E. Churchill (Eds.), Embodied conversational agents. Cambridge, MA: MIT Press.

Rutter, D. R. (1984). Looking and seeing: The role of visual communication in social interaction. Suffolk, UK: John Wiley & Sons.

Sannier, G., & Thalmann, M. N. (1998). A user-friendly texture-fitting methodology for virtual humans. Computer Graphics International ’97.

Schwartz, P., Bricker, L., Campbell, B., Furness, T., Inkpen, K., Matheson, L., et al. (1998). Virtual playground: Architectures for a shared virtual world. Proceedings of the ACM Symposium on Virtual Reality Software and Technology 1998, 43–50.

Sherwood, J. V. (1987). Facilitative effects of gaze upon learning. Perceptual and Motor Skills, 64, 1275–1278.

Simons, H. (1976). Persuasion: Understanding, practice, and analysis. Reading, MA: Heath.

Slater, M., Pertaub, D., & Steed, A. (1999). Public speaking in virtual reality: Facing an audience of avatars. IEEE Computer Graphics and Applications, 19(2), 6–9.

Slater, M., Sadagic, A., Usoh, M., & Schroeder, R. (2000). Small group behavior in a virtual and real environment: A comparative study. Presence: Teleoperators and Virtual Environments, 9(1), 37–51.

Stiefelhagen, R., Yang, J., & Waibel, A. (1997). Tracking eyes and monitoring eye gaze. In M. Turk & Y. Takabayashi (Eds.), Proceedings of the Workshop on Perceptual User Interfaces.

Turing, A. (1950). Computing machinery and intelligence. Mind, 59(236).

Turk, M., & Kolsch, M. (in press). Perceptual interfaces. In G. Medioni & S. B. Kang (Eds.), Emerging topics in computer vision. Boston: Prentice Hall.

Velichkovsky, B. M. (1995). Communicating attention: Gaze position transfer in cooperative problem solving. Pragmatics and Cognition, 3(2), 199–222.

Vertegaal, R. (1999). The GAZE groupware system: Mediating joint attention in multiparty communication and collaboration. Proceedings of the CHI ’99 Conference on Human Factors in Computing Systems: The CHI is the Limit, 294–301.

Viola, P., & Jones, M. (2001). Rapid object detection using a boosted cascade of simple features. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.

Wallace, D. F. (1996). Infinite jest. Boston: Little, Brown.

Williams, K., Cheung, K. T., & Choi, W. (2000). Cyberostracism: Effects of being ignored over the internet. Journal of Personality and Social Psychology, 79, 748–762.

Yee, N. (2002). Befriending ogres and wood elves: Understanding relationship formation in MMORPGs. Retrieved from http://www.nickyee.com/hub/relationships/home.html

Zajonc, R. B. (1971). Brainwash: Familiarity breeds comfort. Psychology Today, 3(9), 60–64.

Zajonc, R. B., Murphy, S. T., & Inglehart, M. (1989). Feeling and facial efference: Implication of the vascular theory of emotion. Psychological Review, 96, 395–416.

Zhang, X., & Furnas, G. (2002). Social interactions in multiscale CVEs. Proceedings of the ACM Conference on Collaborative Virtual Environments 2002 (CVE 2002).


interactants. Using a CVE, this type of behavior can be filtered in two ways. First, the speaker can filter the behavior on the transmitting end. If people know that they have difficulty suppressing certain nonverbal behaviors that tend to be perceived negatively, such as a nervous tic, they can activate a filter that prevents the behavior from being rendered. Similarly, in certain situations a CVE interactant may not want to render certain nonverbal behaviors. Consider the leader example: the potential member may benefit from rendering her “poker face,” that is, not demonstrating any enthusiasm or disappointment via facial expressions. Consequently, the member may gain a strategic advantage during a negotiation. Second, interactants can filter behaviors on the receiving end. If a speaker’s hand motions are distracting, a listener can simply choose not to render that interactant’s hand movements.
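The two filtering points can be sketched as simple functions applied to each frame of tracked behavior: one run by the sender before transmission and one run by each receiver before rendering. The Python below is illustrative only; the `Frame` structure, the channel names (`face`, `hands`, `head`), and the use of `None` to mean “render a neutral default” are hypothetical.

```python
from dataclasses import dataclass

NEUTRAL = None  # hypothetical marker: the renderer shows a default, expressionless pose


@dataclass
class Frame:
    sender: str
    channels: dict  # channel name -> tracked value, e.g. {"face": ..., "hands": ...}


def _mask(frame, blocked):
    """Copy a frame, replacing every blocked channel with the neutral marker."""
    return Frame(frame.sender, {k: (NEUTRAL if k in blocked else v) for k, v in frame.channels.items()})


def transmit_filter(frame, suppressed):
    """Sender side: never send channels the sender chose to hide (e.g. a poker face)."""
    return _mask(frame, suppressed)


def receive_filter(frame, hidden_by_sender):
    """Receiver side: hide channels this listener finds distracting for a given sender."""
    return _mask(frame, hidden_by_sender.get(frame.sender, set()))
```

The design difference matters: a transmit-side filter keeps the suppressed behavior from ever leaving the sender’s machine, whereas a receive-side filter changes only that one listener’s view.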

Another example of transforming sensory capabilities is producing a visual indicator of where each interactant’s attention currently lies, as revealed by their eye direction (Velichkovsky, 1995). We have explored a technique that involves rendering each person’s view frustum to indicate the field of view, as Figure 3 illustrates. In this example, the wireframe frustums spotlight the 3D space visible to each person. This feature, color coded for each person, may be especially helpful to teachers in a distance learning CVE, who could use such information to see where students are focusing their visual attention without having to look directly at the students’ eyes.
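A frustum indicator like this can be derived from each person’s tracked head pose and the display’s field of view. The following Python sketch assumes a y-up coordinate frame with +z as the forward direction at zero yaw and pitch; it computes the four far-plane corners that a wireframe renderer would connect to the eye point. The function name, angle conventions, and parameters are assumptions, not the system shown in Figure 3.

```python
import math


def frustum_corners(eye, yaw_deg, pitch_deg, hfov_deg, vfov_deg, far):
    """Return the four far-plane corners of a viewer's frustum (y up, +z forward at yaw = pitch = 0)."""
    yaw, pitch = math.radians(yaw_deg), math.radians(pitch_deg)
    forward = (math.sin(yaw) * math.cos(pitch), math.sin(pitch), math.cos(yaw) * math.cos(pitch))
    right = (math.cos(yaw), 0.0, -math.sin(yaw))
    # up = forward x right, which is (0, 1, 0) when the viewer looks straight ahead
    up = (
        forward[1] * right[2] - forward[2] * right[1],
        forward[2] * right[0] - forward[0] * right[2],
        forward[0] * right[1] - forward[1] * right[0],
    )
    half_w = far * math.tan(math.radians(hfov_deg) / 2)
    half_h = far * math.tan(math.radians(vfov_deg) / 2)
    center = tuple(e + far * f for e, f in zip(eye, forward))
    corners = []
    for sx, sy in ((-1, -1), (1, -1), (1, 1), (-1, 1)):
        corners.append(tuple(c + sx * half_w * r + sy * half_h * u for c, r, u in zip(center, right, up)))
    return corners
```

The wireframe is then the four lines from the eye point to each corner plus the rectangle joining the corners, drawn in that person’s assigned color.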

There are a number of similar tools (i.e., specific objects rendered only to particular interactants) that can assist interactants in a CVE. For example, in our NZSMG studies, an experimenter enters a CVE and attempts to persuade other interactants regarding a certain topic (Beall et al., 2003). In those interactions, we render the interactants’ names over their heads on floating billboards for the experimenter to read. In this manner, the experimenter can refer to people by name more easily. There are many other ways to use these floating billboards to assist interactants, for example, reminders about an interactant’s preferences or personality (e.g., “doesn’t respond well to prolonged mutual gaze”).

Figure 3. View frustums marking the field of view of interactants.

One of the most useful forms of transforming sensory capabilities may be to enlist one or more human consultants who are rendered to only one member of a CVE (i.e., virtual ghosts). Unlike a face-to-face interaction, a CVE will enable an interactant to have informed human consultants who are free to wander around the virtual meeting space, scrutinize the actions of other interactants, conduct online research and sidebar meetings in order to provide key interactants with additional information, and generally support the interactants. For example, the leader can have her research team actually rendered beside her in the CVE. Members of her team can point out actions by potential members, suggest new strategies, and even provide real-time criticism and feedback concerning the leader’s behavior, without any of the other members having even a hint of awareness of the human consultants’ presence. Alternatively, the leader herself can go into “ghost mode” and explore the virtual world with her team while her avatar remains seated and is even controlled by yet another member of her team.
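Both the floating billboards and the virtual ghosts can rest on the same mechanism: an object in the shared scene carries a list of the interactants allowed to see it. A minimal Python sketch of that per-viewer filtering follows; the names `SceneObject`, `visible_to`, `render_list`, and the example scene are hypothetical, and the sketch assumes visibility is resolved per viewer before rendering.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class SceneObject:
    name: str
    visible_to: Optional[FrozenSet[str]] = None  # None: shared with every interactant


def render_list(objects: List[SceneObject], viewer: str) -> List[str]:
    """Names of the objects a given viewer's client should draw."""
    return [o.name for o in objects if o.visible_to is None or viewer in o.visible_to]


SCENE = [
    SceneObject("conference table"),
    SceneObject("name billboard over member B", frozenset({"experimenter"})),
    SceneObject("ghost consultant", frozenset({"leader"})),
]
```

Here the leader sees her consultant, the experimenter sees the name billboard, and every other member sees only the shared table.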

3.3 Transforming the Situation

In addition to transforming their representation and sensory capabilities, CVE interactants can also use algorithms to transform their general spatial or temporal situations. In a CVE, people generally adopt a spatially coherent situational context across all remote interactants that brings everyone together in the shared space. However, there is no reason that the details and arrangements of that virtual space need to be constant for all the interactants in the CVE. Consider the situation for three interactants. Interactant A may choose to form an isosceles triangle with the other two, while interactants B and C may both choose to form equilateral triangles. Interactant A may even choose to flip the locations of B and C. In this scenario, the CVE operating system can preserve the intended eye gaze direction by transforming the amplitude or direction of head and eye movements in a prescribed manner. While this is a somewhat simple example, with as many as four interactants it is straightforward to design spatial transformations that keep the intended eye and head gaze cues intact across all interactants. While such discordance may eventually cause the quality and smoothness of the interaction to suffer, there are a number of ways that transforming the situation can assist individual interactants.
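One way to preserve the intended gaze cue when each interactant arranges the space differently is to transmit whom a person is looking at rather than the raw head angle, and let each viewer’s client re-aim the rendered head at that target’s position in the viewer’s own layout. The Python sketch below shows this for head yaw in a top-down 2D layout; the layouts, names, and the angle convention (degrees counterclockwise from the +x axis) are assumptions for illustration.

```python
import math


def rendered_yaw(layout, looker, target):
    """Yaw (degrees, counterclockwise from +x) at which to draw `looker` facing `target`."""
    (lx, ly), (tx, ty) = layout[looker], layout[target]
    return math.degrees(math.atan2(ty - ly, tx - lx))


# Each interactant's private top-down seating layout: name -> (x, y).
LAYOUT_B = {"A": (0.0, 0.0), "B": (2.0, 0.0), "C": (1.0, 1.0)}  # B's (shared default) layout
LAYOUT_A = {"A": (0.0, 0.0), "B": (1.0, 1.0), "C": (2.0, 0.0)}  # A has flipped B and C
```

When B looks at C, B’s own layout implies a yaw of 135 degrees, while A’s client draws B at -45 degrees; the rendered angles differ by 180 degrees, yet A still sees B facing C. Eye-in-head angles could in principle be retargeted the same way.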

One such transformation involves multilateral perspectives. In a normal conversation, each interactant has a unique and privileged perspective. That perspective is a combination of her sensory input (e.g., visual and acoustic fields of view) and her internal beliefs about the interaction. In normal face-to-face interactions, people continually use sensory input to update and adjust their internal beliefs (Kendon, 1977). Interactants in a CVE will possess a completely new mechanism for adjusting and updating internal beliefs: a person’s viewpoint can be multilateral as opposed to unilateral (normal). In other words, in a real-time conversation, interactant A can take the viewpoint of interactant B and perceive herself as she performs various verbal and nonverbal gestures during the interaction. In this manner, she can acquire invaluable sensory information pertaining to the interaction and update her internal beliefs concerning the interaction in ways not possible without the CVE.

Consequently, interactants in educational and persuasive interactions may be able to improve performance, because seeing oneself through the eyes of another may allow one to develop a more informed set of internal beliefs about others (Baumeister, 1998). Furthermore, being able to experience an interaction through someone else’s eyes may reinforce the fact that one is indeed copresent in the CVE (e.g., Durlach & Slater, 2000). Finally, using multilateral perspectives may help students in distance learning CVEs in terms of training transfer effects (Rickel & Johnson, 2000) that might occur after an interactant who has been trained in multilateral perspective taking performs similar group tasks in nonmediated situations.

A second situational transformation involves partially recording the interaction and adjusting temporal properties or sequences in real time. Similar to commercial products sold for digitally recording and playing back broadcast television, a CVE should allow interactants to accelerate and decelerate the perceived flow of time during the mediated interaction. Consider the following situation. A student in a distance learning CVE does not understand an example that the instructor provides. The student can “rewind” the recorded interaction, go back to the beginning of the confusing example, and play it back. Once the student has understood the example, she can turn up the rate of playback (e.g., watch the sequence at 2X speed) and eventually catch up to the instructor again. By slowing down or speeding up the rendered flow of time, the interactant can focus differentially on particular topics and can review the same scene from different points of view without missing the remainder of the interaction. Of course, doing so will result in costs to that interactant’s contribution to the CVE in terms of interactivity (i.e., what does her avatar do while she rewinds?). Consequently, the disruption of the temporal sequence will necessarily be coupled with some kind of avatar autopilot.
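The catch-up behavior follows from simple arithmetic: if the student is behind the live interaction by L seconds and plays the recording at rate r > 1, the gap shrinks by (r - 1) seconds for every second of wall-clock time, so she catches up after L / (r - 1) seconds. At 2X speed, the catch-up time therefore equals the lag. A minimal Python sketch, with hypothetical function names:

```python
def catch_up_seconds(lag_s: float, rate: float) -> float:
    """Wall-clock seconds needed to close a playback lag when replaying at `rate` times real time."""
    if rate <= 1.0:
        raise ValueError("playback must be faster than real time to catch up")
    return lag_s / (rate - 1.0)


def content_watched_seconds(lag_s: float, rate: float) -> float:
    """Seconds of recorded interaction viewed while catching up (the lag plus new live material)."""
    return rate * catch_up_seconds(lag_s, rate)
```

For example, a student who rewinds 30 seconds and then watches at 2X rejoins the instructor after 30 seconds of wall-clock time, having viewed 60 seconds of material; during that interval her avatar would rely on the autopilot mentioned above.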

Changing the rate of time in a CVE brings up an-other interesting transformation Traditionally CVEsare roughly defined as ldquogeographically separated interac-tantsrdquo interacting over some kind of a computer-mediated network in a shared environment Howeverby combining some of the concepts discussed in previ-ous sections it may be possible to include in the defini-tion of a CVE ldquotemporally separated interactantsrdquo in ashared environment Consider a videoconference of abusiness meeting Oftentimes interested parties whocannot attend the meetings will later review a videotapeof the meeting In a CVE the temporally absent mem-ber has an option to more deeply involve herself in theinteraction Specifically she can situate her avatar in aspecific place in the CVEs seating arrangement and usean autopilot to give her representation rudimentarynonverbal behaviors Furthermore the absent membercan program her avatar to perform simple interactivetasksmdashprerecorded introductions answers to certainquestions about the CVE topic or perhaps more realis-tically for the near-term direct the avatar to play back arecorded performance Then the CVE interaction canproceed in real time with the temporally absent mem-berrsquos avatar approximating the types of behaviors thatshe would do and say As a result temporally presentmembers would actually direct pieces of the conversa-tion towards the absent member as well as transmit

nonverbal gestures towards her Later on instead of justreviewing the recording the temporally absent membercan take her place in the CVE and actually feel presentin the dialogue receiving appropriate nonverbal behav-iors and maximizing the degree of copresence More-over the members of the CVE who were present at thescheduled time can program their avatars during thereplay of the interaction to respond to any post hocquestions that the absent member might have In thisway the degree of interactivity during the replay can beincreased and perhaps at some point in the not-too-distant future the line between real-time and non-real-time interactions will become interestingly blurred

4 Implications of TSI and ResearchDirections

For better or for worse TSI implemented throughCVEs has great potential to change the nature of medi-ated interaction The strategic decoupling of renderedbehavior from actual behavior allows interactants tobreak many constraints that are inherent in face-to-faceinteraction as well as other forms of mediated interac-tions such as telephone and videophone conferencesThe effects of TSI remain to be seen Assuming thatimplementation of the TSI techniques are technicallyfeasible and that using TSI implementations is concep-tually workable for the interactants (both of which aresubstantial assumptions) one could predict a number ofconsequences First TSI may develop into a worthwhiletool that assists interactants in overcoming the inade-quacies of communicating from remote locations Byaugmenting their representational sensory and situa-tional characteristics interactants of CVEs may be ableto achieve levels of interaction that actually surpass face-to-face interaction

On the other hand people in fact may find the useof these transformations extremely unsettling Thereis the potential for the difference between TSI andcurrent CVE implementations to be as drastic as dif-ferences between email and the written letter As thistechnology is developed it is essential to examinepeoplersquos responses to this new medium (ie Reevesamp Nass 1996) It is essential to examine these impor-

Bailenson et al 437

tant potential implications of TSI before the technol-ogy becomes widespread

Along the same lines the threat of TSI may be thevery downfall of CVE interaction In face-to-face inter-action there tends to be some degree of deception forexample people using facial expressions to mask theiremotions Clearly this deception has the potential to bemuch greater with TSI If interactants have no faith thattheir perceptual experience is genuine they may havelittle reason to ever enter a CVE A complete lack oftrust in the truthfulness of gestures one-to-one corre-spondence of avatars and temporal presence of interac-tants has the potential to rob the CVE of one of itsgreatest strengths namely interactivity since the inter-actants may not know who what or when they are in-teracting with others Similarly given an expectation ofTSI interactants may be constantly suspicious duringinteractions this lack of trust of fellow interactants maylead to unproductive collaborations

A solution to this breakdown may require the devel-opment of TSI detectors for interactants either basedon computer algorithms that analyze nonverbal behav-iors or based on actual humans that scrutinize the inter-action To examine the possibility of using human TSIdetectors we now discuss what we call the non-verbalTuring Test (NVTT)

In the popular reinterpretation of the Turing Test(Turing 1950) a judge attempts to determine whichof two players (one human one machine) is a fellowhuman In our NVTT pilot studies experimental par-ticipants acting as judges enter a CVE with two vir-tual humans one avatar whose head movements areveridical and playing back the movements of anotherhuman in real time (ie without TSI) and one avatarwhose head movements are actually a transformationof the judgersquos own head movements (ie time-delayed and reduced motion range) The judge seesthe head movements from a real person on one avatarand some sort of self-mimicked movements on theother During the interaction only head movementsare permitted (ie no verbal communication al-lowed) and participants must devise ways to engageand test the two virtual humans through nonverbalmeans in order to ascertain which is human andwhich is a mimicker Figure 4 illustrates

In the current initial pilot study we manipulatedthree independent variables test trial length (either 16or 32 seconds) mimic delay (ie the computer-agentmimics either 1 2 4 or 8 seconds after the judgersquosmovements) and range of motion (high pitch yaw androll or low yaw only)

Participants in this study wore head-mounted displayswhile a render computer tracked their head orientationtracking rendering and networking latencies were alllow enough to impart a compelling sense of copresence(see Bailenson et al 2002 for detailed descriptions ofthe hardware and software used) Participants were in-structed to sit in a virtual room with two virtual peoplea human agent (ie a representation whose movementsare controlled by a real person in another room) and acomputer agent (ie a computer program that is de-signed to mimic the userrsquos movements in some way)Participants were instructed to interact with the twoother virtual people using head movements in order todetermine which one is the human agent Participantswere run in groups of two with each one acting as thehuman agent for the other Each participant sat at a vir-tual conference table with two virtual humans (similarto that shown in Figure 3) Each participant received arandom order of 32 test trials (two instances of the 16conditions resulting from the crossing of the three inde-

Figure 4 A schematic of the NVTT The human judge is forced to

determine which of the two avatars exhibiting head movements is the

real human and which is the computer-generated human mimicking

the judgersquos own movements

438 PRESENCE VOLUME 13 NUMBER 4

pendent variables) Forty-one undergraduates partici-pated in this study

For the purposes of brevity we focus on two resultsin particular First despite the fact that we explicitly toldparticipants that the computer agent was directly mim-icking them they performed surprisingly poorly whenattempting to identify the human avatar The overallaverage score was only 66 correct (SD 10chance 50 maximum score 100) Moreover ofthe 41 participants in the study more than one fourthwas not reliably different from chance (ie less than 3SEM from 50 between 44 and 56) Second as Fig-ure 5 demonstrates participantsrsquo scores diminished in-versely with the magnitude of the mimic delay in thatthere was a linear trend in the logarithm of the delayvariable F(132) 885 p 01 When the delay wasgreater than 1 second participants had more difficultyidentifying a mimicker

These data are particularly striking in that we had initially predicted that participants would be able to recognize their own head movements much more easily when aware that one avatar was designed to mimic them. In this study, we explicitly told participants about the use of TSI, and they still had great difficulty detecting it. Implicit TSI (i.e., TSI that is not disclosed) can only have a greater impact. While this pilot study is extremely simple and only scratches the surface of a paradigm that examines TSI, it is still noteworthy that participants did not detect the mimicker across the board.

We are currently exploring other factors underlying the discrimination of human nonverbal behavior from computer-generated behaviors. In future studies, we will use NVTTs to study other nonverbal behaviors, such as facial gestures, eye-head gestures (pointing indications by either system), hand gestures, and interpersonal distance. We have shown that, in admittedly simple scenarios, it is possible to pass the NVTT for a percentage of our test population using TSI. We are confident that as this percentage grows in the near future, important scientific and sociological discoveries will surface along the way.

In conclusion, there are many reasons one might want to avoid TSI; these reasons range from Orwellian concerns to the fear of rendering CVEs (perhaps even the telephone) functionally useless. We are neither advocates of TSI as a means to replace normal communication nor staunch believers in avoiding TSI in order to preserve the natural order of communication and conversation. However, we do acknowledge that as CVEs become more prevalent, the strategic decoupling of representation from behavior is inevitable. For that reason alone, the notion of TSI warrants considerable attention.

Acknowledgments

The authors would like to thank Robin Gilmour and Christopher Rex for helpful suggestions. Furthermore, we thank Christopher Rex and Ryan Jaeger for assistance in collecting data. This research was sponsored in part by NSF Award SBE-9873432 and in part by NSF ITR Award IIS 0205740.

References

Argyle, M. (1988). Bodily communication (2nd ed.). London: Methuen.

Bailenson, J. N., Beall, A. C., & Blascovich, J. (2002). Mutual gaze and task performance in shared virtual environments. Journal of Visualization and Computer Animation, 13, 1–8.

Bailenson, J. N., Beall, A. C., Blascovich, J., Raimmundo, R., & Weisbuch, M. (2001). Intelligent agents who wear your face: Users' reactions to the virtual self. Lecture Notes in Artificial Intelligence, 2190, 86–99.

Bailenson, J. N., Blascovich, J., Beall, A. C., & Guadagno, R. E. (submitted). Self representations in immersive virtual environments.

Figure 5. Percent correct by mimic delay (in seconds). These data exclude subjects at chance performance.

Baumeister, R. F. (1998). The self. In D. T. Gilbert, S. T. Fiske, & G. Lindzey (Eds.), Handbook of social psychology (4th ed., pp. 680–740). New York: McGraw-Hill.

Beall, A. C., Bailenson, J. N., Loomis, J., Blascovich, J., & Rex, C. (2003). Non-zero-sum mutual gaze in immersive virtual environments. Proceedings of HCI International 2003.

Benford, S., Bowers, J., Fahlen, L., Greenhalgh, C., & Snowdon, D. (1995). User embodiment in collaborative virtual environments. Proceedings of CHI '95 (pp. 242–249). ACM Press.

Biocca, F. (1997). The cyborg's dilemma: Progressive embodiment in virtual environments. Journal of Computer-Mediated Communication [online], 3. Retrieved from http://www.ascusc.org/jcmc/vol3/issue2/biocca2.html

Black, M., & Yacoob, Y. (1997). Recognizing facial expressions in image sequences using local parameterized models of image motion. International Journal of Computer Vision, 25(1), 23–48.

Blanz, V., & Vetter, T. (1999). A morphable model for the synthesis of 3D faces. SIGGRAPH '99 Conference Proceedings, 187–194.

Blascovich, J., Loomis, J., Beall, A. C., Swinth, K. R., Hoyt, C., & Bailenson, J. N. (2002). Immersive virtual environment technology as a methodological tool for social psychology. Psychological Inquiry, 13, 103–124.

Busey, T. A. (1988). Physical and psychological representations of faces: Evidence from morphing. Psychological Science, 9, 476–483.

Byrne, D. (1971). The attraction paradigm. New York: Academic Press.

Cassell, J. (2000). Nudge nudge wink wink: Elements of face-to-face conversation for embodied conversational agents. In J. Cassell et al. (Eds.), Embodied conversational agents. Cambridge, MA: MIT Press.

Chaiken, S. (1979). Communicator physical attractiveness and persuasion. Journal of Personality and Social Psychology, 37, 1387–1397.

Chartrand, T. L., & Bargh, J. (1999). The chameleon effect: The perception-behavior link and social interaction. Journal of Personality & Social Psychology, 76(6), 893–910.

Decarlo, D., Metaxas, D., & Stone, M. (1998). An anthropometric face model using variational techniques. Proceedings of SIGGRAPH '98, 67–74.

Depaulo, B. M., & Friedman, H. S. (1998). Nonverbal communication. In D. T. Gilbert, S. T. Fiske, & G. Lindzey (Eds.), The handbook of social psychology (4th ed., Vol. 2, pp. 3–40). Boston: McGraw-Hill.

Donato, G., Bartlett, M. S., Hager, J. C., Ekman, P., & Sejnowski, T. J. (1999). Classifying facial actions. IEEE Transactions on Pattern Analysis and Machine Intelligence, 21(10), 974–989.

Durlach, N., & Slater, M. (2000). Presence in shared virtual environments and virtual togetherness. Presence: Teleoperators and Virtual Environments, 9, 214–217.

Ekman, P. (1978). Facial signs: Facts, fantasies, and possibilities. In T. Sebeok (Ed.), Sight, sound, and sense. Bloomington, IN: Indiana University Press.

Fodor, J. A. (1983). The modularity of mind: An essay on faculty psychology. Cambridge, MA: MIT Press.

Fry, R., & Smith, G. F. (1975). The effects of feedback and eye contact on performance of a digit-encoding task. Journal of Social Psychology, 96, 145–146.

Gale, C., & Monk, A. F. (2002). A look is worth a thousand words: Full gaze awareness in video-mediated conversation. Discourse Processes, 33.

Garau, M., Slater, M., Vinayagamoorhty, V., Brogni, A., Steed, A., & Sasse, M. A. (2003). The impact of avatar realism and eye gaze control on perceived quality of communication in a shared immersive virtual environment. Proceedings of the SIGCHI Conference on Human Factors in Computing Systems.

Gibson, W. (1984). Neuromancer. New York: Ace Books.

Hu, C., Ferris, R., & Turk, M. (2003). Active wavelet networks for face alignment. Proceedings of the British Machine Vision Conference, Norwich, UK.

Kendon, A. (1977). Studies in the behavior of social interaction. Bloomington, IN: Indiana University.

Kleinke, C. L. (1986). Gaze and eye contact: A research review. Psychological Bulletin, 100, 78–100.

Kraut, R. E., Fussell, S. R., Brennan, S. E., & Siegel, J. (2002). Understanding effects of proximity on collaboration: Implications for technologies to support remote collaborative work. In P. Hinds & S. Kiesler (Eds.), Distributed work. Cambridge, MA: MIT Press.

Lanier, J. (2001). Virtually there. Scientific American, April 2001.

Leigh, J., DeFanti, T., Johnson, A., Brown, M., & Sandin, D. (1997). Global telemersion: Better than being there. Proceedings of ICAT '97.

Loomis, J. M., Blascovich, J. J., & Beall, A. C. (1999). Immersive virtual environments as a basic research tool in psychology. Behavior Research Methods, Instruments, and Computers, 31(4), 557–564.


Mania, K., & Chalmers, A. (1998). Proceedings of the Fourth International Conference on Virtual Systems and Multimedia (pp. 177–182). Amsterdam: IOS Press-Ohmsha.

Milgram, S. (1992). The individual in a social world: Essays and experiments (2nd ed.). New York: McGraw-Hill.

Morgan, T., Kriz, R., Howard, T., Dias Neves, F., & Kelso, J. (2001). Extending the use of collaborative virtual environments for instruction to K–12 schools. Insight, 1(1).

Normand, V., Babski, C., Benford, S., Bullock, A., Carion, S., Chrysanthou, Y., et al. (1999). The COVEN project: Exploring applicative, technical, and usage dimensions of collaborative virtual environments. Presence: Teleoperators and Virtual Environments, 8(2), 218–236.

Patterson, M. L. (1982). An arousal model of interpersonal intimacy. Psychological Review, 89, 231–249.

Pylyshyn, Z. W. (1980). Computation and cognition: Issues in the foundations of cognitive science. Behavioral & Brain Sciences, 3, 111–169.

Reeves, B., & Nass, C. (1996). The media equation: How people treat computers, television, and new media like real people and places. New York: Cambridge University Press.

Rickel, J., & Johnson, W. L. (2000). Task-oriented collaboration with embodied agents in virtual worlds. In J. Cassell, J. Sullivan, S. Prevost, & E. Churchill (Eds.), Embodied conversational agents. Cambridge, MA: MIT Press.

Rutter, D. R. (1984). Looking and seeing: The role of visual communication in social interaction. Suffolk, UK: John Wiley & Sons.

Sannier, G., & Thalmann, M. N. (1998). A user friendly texture-fitting methodology for virtual humans. Computer Graphics International '97.

Schwartz, P., Bricker, L., Campbell, B., Furness, T., Inkpen, K., Matheson, L., et al. (1998). Virtual playground: Architectures for a shared virtual world. Proceedings of the ACM Symposium on Virtual Reality Software and Technology 1998, 43–50.

Sherwood, J. V. (1987). Facilitative effects of gaze upon learning. Perceptual and Motor Skills, 64, 1275–1278.

Simons, H. (1976). Persuasion: Understanding, practice, and analysis. Reading, MA: Heath.

Slater, M., Pertaub, D., & Steed, A. (1999). Public speaking in virtual reality: Facing an audience of avatars. IEEE Computer Graphics and Applications, 19(2), 6–9.

Slater, A., Sadagic, M., Usoh, R., & Schroeder, R. (2000). Small group behavior in a virtual and real environment: A comparative study. Presence: Teleoperators and Virtual Environments, 9(1), 37–51.

Stiefelhagen, R., Yang, J., & Waibel, A. (1997). Tracking eyes and monitoring eye gaze. In M. Turk & Y. Takabayashi (Eds.), Proceedings of the Workshop on Perceptual User Interfaces.

Turing, A. (1950). Computing machinery and intelligence. Mind, 59(236).

Turk, M., & Kolsch, M. (in press). Perceptual interfaces. In G. Medioni & S. B. Kang (Eds.), Emerging topics in computer vision. Boston: Prentice Hall.

Velichkovsky, B. M. (1995). Communicating attention: Gaze position transfer in cooperative problem solving. Pragmatics and Cognition, 3(2), 199–222.

Vertegaal, R. (1999). The GAZE groupware system: Mediating joint attention in multiparty communication and collaboration. Proceedings of the CHI '99 Conference on Human Factors in Computing Systems: The CHI is the Limit (pp. 294–301).

Viola, P., & Jones, M. (2001). Rapid object detection using a boosted cascade of simple features. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.

Wallace, D. F. (1996). Infinite jest. Boston: Little, Brown.

Williams, K., Cheung, K. T., & Choi, W. (2000). Cyberostracisms: Effects of being ignored over the internet. Journal of Personality and Social Psychology, 79, 748–762.

Yee, N. (2002). Befriending ogres and wood elves: Understanding relationship formation in MMORPGs. Retrieved from http://www.nickyee.com/hub/relationships/home.html

Zajonc, R. B. (1971). Brainwash: Familiarity breeds comfort. Psychology Today, 3(9), 60–64.

Zajonc, R. B., Murphy, S. T., & Inglehart, M. (1989). Feeling and facial efference: Implication of the vascular theory of emotion. Psychological Review, 96, 395–416.

Zhang, X., & Furnas, G. (2002). Social interactions in multiscale CVEs. Proceedings of the ACM Conference on Collaborative Virtual Environments 2002 (CVE 2002).


about the interactant's preferences or personality (e.g., "doesn't respond well to prolonged mutual gaze").

One of the most useful forms of transforming sensory capabilities may be to enlist one or more human consultants who are rendered to only one member in a CVE (i.e., virtual ghosts). Unlike in a face-to-face interaction, in a CVE an interactant can have informed human consultants who are free to wander around the virtual meeting space, scrutinize the actions of other interactants, conduct online research and sidebar meetings in order to provide key interactants with additional information, and generally support the interactants. For example, the leader can have her research team actually rendered beside her in the CVE. Members of her team can point out actions by potential members, suggest new strategies, and even provide real-time criticism and feedback concerning the behavior of the leader, without any of the other members having even a hint of awareness of the human consultants' presence. Alternatively, the leader herself can go into "ghost mode" and explore the virtual world with her team while her avatar remains seated and is even controlled by yet another member of her team.
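At the rendering level, a virtual ghost amounts to per-viewer visibility: all clients share one scene, but each client draws only the entities its viewer is permitted to see. Below is a minimal sketch, assuming a hypothetical scene model in which every entity carries an optional set of permitted viewers (None meaning visible to everyone). The class, field, and entity names are illustrative only and are not drawn from any particular CVE system.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional

@dataclass(frozen=True)
class Entity:
    name: str
    # None: visible to every interactant; otherwise only to the listed viewers.
    visible_to: Optional[FrozenSet[str]] = None

def render_list(scene: List[Entity], viewer: str) -> List[str]:
    """Names of the entities this viewer's client should draw."""
    return [e.name for e in scene
            if e.visible_to is None or viewer in e.visible_to]

# Hypothetical meeting: the leader's consultants see each other and the
# leader, but are ghosts to every other member.
TEAM = frozenset({"leader", "consultant_1", "consultant_2"})
scene = [
    Entity("leader"),
    Entity("member_b"),
    Entity("member_c"),
    Entity("consultant_1", visible_to=TEAM),
    Entity("consultant_2", visible_to=TEAM),
]

print(render_list(scene, "leader"))    # includes both consultants
print(render_list(scene, "member_b"))  # ['leader', 'member_b', 'member_c']
```

In a networked CVE, it would probably be safer to apply this filter on the server, so that ghost poses are never transmitted to clients that should not see them.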

3.3 Transforming the Situation

In addition to transforming their representation and sensory capabilities, CVE interactants can also use algorithms to transform their general spatial or temporal situations. In a CVE, people generally adopt a spatially coherent situational context across all remote interactants that brings everyone together in the shared space. However, there is no reason that the details and arrangements of that virtual space need to be constant for all the interactants in the CVE. Consider the situation for three interactants. Interactant A may choose to form an isosceles triangle with the other two, while interactants B and C may both choose to form equilateral triangles. Interactant A may even choose to flip the locations of B and C. In this scenario, the CVE operating system can preserve the intended eye gaze direction by transforming the amplitudes or direction of head and eye movements in a prescribed manner. While this is a somewhat simple example, with as many as four interactants it is straightforward to design spatial transformations that keep the intended eye and head gaze cues intact across all interactants. Although such discordance may eventually cause the quality and smoothness of the interaction to suffer, there are a number of ways that transforming the situation can assist individual interactants.
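One way to realize the gaze-preserving transformation described above is to transmit gaze semantically (who is looking at whom) rather than as a raw head angle. Each viewer's client then re-aims the looker's head toward the target's position in that viewer's private layout. The following is a minimal 2D sketch under that assumption. The layout coordinates and function names are hypothetical, and the text does not specify how the CVE operating system would actually do this.

```python
import math
from typing import Dict, Tuple

Point = Tuple[float, float]

def yaw_toward(src: Point, dst: Point) -> float:
    """Yaw in degrees from src to dst (0 = +x axis, counterclockwise positive)."""
    return math.degrees(math.atan2(dst[1] - src[1], dst[0] - src[0]))

def rendered_yaw(layout: Dict[str, Point], looker: str, target: str) -> float:
    """Head yaw to draw for `looker` in one viewer's private layout so that
    the looker still appears to face `target`."""
    return yaw_toward(layout[looker], layout[target])

h = math.sqrt(3) / 2
# B's private layout: an equilateral triangle.
layout_b = {"A": (0.0, 0.0), "B": (1.0, 0.0), "C": (0.5, h)}
# A's private layout: the same triangle, but with B and C flipped.
layout_a = {"A": (0.0, 0.0), "B": (0.5, h), "C": (1.0, 0.0)}

# B looks at C; each client re-aims B's head within its own layout.
print(round(rendered_yaw(layout_b, "B", "C"), 1))  # 120.0
print(round(rendered_yaw(layout_a, "B", "C"), 1))  # -60.0
```

A fuller version would divide the re-aimed rotation between head and eyes, which is where the text's transformation of head and eye movement amplitudes would come in.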

One such transformation involves multilateral perspectives. In a normal conversation, each interactant has a unique and privileged perspective. That perspective is a combination of her sensory input (e.g., visual and acoustic fields of view) and internal beliefs about the interaction. In normal face-to-face interactions, people continually use sensory input to update and adjust their internal beliefs (Kendon, 1977). Interactants in a CVE will possess a completely new mechanism to adjust and update internal beliefs: a person's viewpoint can be multilateral, as opposed to unilateral (normal). In other words, in a real-time conversation, interactant A can take the viewpoint of interactant B and perceive herself as she performs various verbal and nonverbal gestures during the interaction. In this manner, she can acquire invaluable sensory information pertaining to the interaction and update her internal beliefs about it in ways not possible without the CVE.
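Technically, a multilateral perspective only requires decoupling the rendering camera from the viewer's own tracked head. The client renders the shared scene from another interactant's tracked pose, while every avatar, including the viewer's own, is still driven by its owner's tracking. A minimal sketch, assuming a hypothetical per-frame dictionary of tracked head poses (all names are illustrative):

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

@dataclass(frozen=True)
class HeadPose:
    position: Tuple[float, float, float]     # (x, y, z) in meters
    orientation: Tuple[float, float, float]  # (pitch, yaw, roll) in degrees

def camera_for(viewer: str, tracked: Dict[str, HeadPose],
               perspective_of: Optional[str] = None) -> HeadPose:
    """Camera pose used to render `viewer`'s display this frame.

    Unilateral (normal): the viewer's own tracked head.
    Multilateral: another interactant's tracked head, so the viewer sees the
    scene, including her own avatar, as that interactant currently sees it.
    """
    source = viewer if perspective_of is None else perspective_of
    if source not in tracked:
        raise KeyError(f"no tracking data for {source!r}")
    return tracked[source]

frame = {
    "A": HeadPose((0.0, 1.2, 0.0), (0.0, 30.0, 0.0)),
    "B": HeadPose((1.0, 1.2, 0.0), (-5.0, 150.0, 0.0)),
}
```

Rendering A's display with `camera_for("A", frame, perspective_of="B")` would show A her own avatar gesturing, as seen from B's seat.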

Consequently, interactants in educational and persuasive interactions may be able to improve performance, because seeing oneself through the eyes of another may allow one to develop a more informed set of internal beliefs about others (Baumeister, 1998). Furthermore, being able to experience an interaction through someone else's eyes may reinforce the fact that one is indeed copresent in the CVE (e.g., Durlach & Slater, 2000). Finally, utilizing multilateral perspectives may assist students in distance learning CVEs in terms of training transfer effects (Rickel & Johnson, 2000) that might occur when an interactant who has been trained in multilateral perspective taking performs similar group tasks in nonmediated situations.

A second situational transformation involves partially recording the interaction and adjusting temporal properties or sequences in real time. Similar to commercial products sold for digitally recording and playing back broadcast television, a CVE should allow interactants to accelerate and decelerate the perceived flow of time during the mediated interaction. Consider the following situation. A student in a distance learning CVE does not understand an example that the instructor provides. The student can "rewind" the recorded interaction, go back to the beginning of the confusing example, and then play back the example. Once the student has understood the confusing example, she can turn up the rate of playback (e.g., watch the sequence at 2X speed), and eventually she can catch up to the instructor again. By slowing down or speeding up the rendered flow of time, the interactant can focus differentially on particular topics and can review the same scene from different points of view without missing the remainder of the interaction. Of course, doing so will reduce that interactant's contribution to the CVE in terms of interactivity (i.e., what does her avatar do while she rewinds?). Consequently, the disruption of the temporal sequence will necessarily be coupled with some kind of avatar autopilot.
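The catch-up step is simple arithmetic. If the student has fallen L seconds behind the live interaction and then plays the recording at rate r > 1, the gap shrinks by (r - 1) seconds per second of real time, so catching up takes L / (r - 1) seconds. At 2X speed, the catch-up time therefore equals the lag. Below is a minimal sketch of a time-shifted playback cursor over a recorded session. The class and method names are hypothetical, and the sketch leaves out the avatar autopilot that would have to drive the student's avatar while she lags behind.

```python
class TimeShiftCursor:
    """Hypothetical playback cursor over a recording of a live CVE session."""

    def __init__(self) -> None:
        self.live_t = 0.0  # seconds of the session recorded so far
        self.play_t = 0.0  # point in the recording this student is viewing
        self.rate = 1.0    # playback speed (1.0 = real time)

    @property
    def lag(self) -> float:
        return self.live_t - self.play_t

    def rewind(self, seconds: float) -> None:
        self.play_t = max(0.0, self.play_t - seconds)

    def advance(self, wall_dt: float) -> None:
        """Let wall_dt seconds of real time pass."""
        self.live_t += wall_dt
        # Playback cannot run ahead of what has been recorded.
        self.play_t = min(self.live_t, self.play_t + self.rate * wall_dt)
        if self.play_t >= self.live_t:
            self.rate = 1.0  # caught up: resume real-time viewing


def catch_up_time(lag_s: float, rate: float) -> float:
    """Real seconds needed to close a lag of lag_s when replaying at `rate`."""
    if rate <= 1.0:
        raise ValueError("rate must exceed 1.0 to catch up")
    return lag_s / (rate - 1.0)


cursor = TimeShiftCursor()
cursor.advance(300.0)  # five minutes into the lecture
cursor.rewind(60.0)    # jump back to the start of the confusing example
cursor.advance(60.0)   # replay it at normal speed; still 60 s behind
cursor.rate = 2.0      # then watch at 2X
cursor.advance(catch_up_time(cursor.lag, cursor.rate))
print(cursor.lag, cursor.rate)  # 0.0 1.0
```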

Changing the rate of time in a CVE brings up another interesting transformation. Traditionally, CVEs are roughly defined as "geographically separated interactants" interacting over some kind of computer-mediated network in a shared environment. However, by combining some of the concepts discussed in previous sections, it may be possible to include "temporally separated interactants" in a shared environment in the definition of a CVE. Consider a videoconference of a business meeting. Oftentimes, interested parties who cannot attend the meeting will later review a videotape of it. In a CVE, the temporally absent member has an option to involve herself more deeply in the interaction. Specifically, she can situate her avatar in a specific place in the CVE's seating arrangement and use an autopilot to give her representation rudimentary nonverbal behaviors. Furthermore, the absent member can program her avatar to perform simple interactive tasks (prerecorded introductions, answers to certain questions about the CVE topic) or, perhaps more realistically for the near term, direct the avatar to play back a recorded performance. The CVE interaction can then proceed in real time, with the temporally absent member's avatar approximating the kinds of things she would do and say. As a result, temporally present members would actually direct pieces of the conversation toward the absent member, as well as transmit nonverbal gestures toward her. Later on, instead of just reviewing the recording, the temporally absent member can take her place in the CVE and actually feel present in the dialogue, receiving appropriate nonverbal behaviors and maximizing the degree of copresence. Moreover, the members of the CVE who were present at the scheduled time can program their avatars during the replay of the interaction to respond to any post hoc questions that the absent member might have. In this way, the degree of interactivity during the replay can be increased, and perhaps at some point in the not-too-distant future, the line between real-time and non-real-time interactions will become interestingly blurred.
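The near-term version of this idea can be sketched as a simple lookup: an avatar answers certain questions with prerecorded material and otherwise falls back to rudimentary autopilot behavior. This minimal sketch assumes that questions arrive as text (for example, from a speech recognizer) and that matching is done by keywords. The clip names and the keyword rule are hypothetical simplifications and are not a design proposed in the text.

```python
import re
from typing import Sequence, Tuple

Answer = Tuple[Tuple[str, ...], str]  # (keywords that must all appear, clip)

# Hypothetical prerecorded material left by the absent member.
ANSWERS: Sequence[Answer] = [
    (("introduce",), "intro.rec"),
    (("budget",), "answer_budget.rec"),
    (("deadline", "move"), "answer_deadline_change.rec"),
]

def choose_clip(utterance: str,
                answers: Sequence[Answer] = ANSWERS,
                idle_clip: str = "autopilot_idle") -> str:
    """Pick the first clip whose keywords all occur in the utterance;
    otherwise fall back to rudimentary autopilot behavior."""
    words = set(re.findall(r"[a-z']+", utterance.lower()))
    for keywords, clip in answers:
        if all(k in words for k in keywords):
            return clip
    return idle_clip

print(choose_clip("Could you introduce yourself?"))      # intro.rec
print(choose_clip("Can we move the deadline to June?"))  # answer_deadline_change.rec
print(choose_clip("Does everyone agree?"))               # autopilot_idle
```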

4 Implications of TSI and Research Directions

For better or for worse, TSI implemented through CVEs has great potential to change the nature of mediated interaction. The strategic decoupling of rendered behavior from actual behavior allows interactants to break many constraints that are inherent in face-to-face interaction, as well as in other forms of mediated interaction such as telephone and videophone conferences. The effects of TSI remain to be seen. Assuming that implementation of the TSI techniques is technically feasible and that using TSI implementations is conceptually workable for the interactants (both of which are substantial assumptions), one could predict a number of consequences. First, TSI may develop into a worthwhile tool that assists interactants in overcoming the inadequacies of communicating from remote locations. By augmenting their representational, sensory, and situational characteristics, interactants in CVEs may be able to achieve levels of interaction that actually surpass face-to-face interaction.

On the other hand, people may in fact find the use of these transformations extremely unsettling. The difference between TSI and current CVE implementations could be as drastic as the difference between email and the written letter. As this technology is developed, it is essential to examine people's responses to this new medium (e.g., Reeves & Nass, 1996) and to examine these important potential implications of TSI before the technology becomes widespread.

Along the same lines, the threat of TSI may be the very downfall of CVE interaction. In face-to-face interaction, there tends to be some degree of deception, for example, people using facial expressions to mask their emotions. Clearly, this deception has the potential to be much greater with TSI. If interactants have no faith that their perceptual experience is genuine, they may have little reason ever to enter a CVE. A complete lack of trust in the truthfulness of gestures, the one-to-one correspondence of avatars, and the temporal presence of interactants has the potential to rob the CVE of one of its greatest strengths, namely interactivity, since interactants may not know with whom, with what, or when they are interacting. Similarly, given an expectation of TSI, interactants may be constantly suspicious during interactions; this lack of trust in fellow interactants may lead to unproductive collaborations.

A solution to this breakdown may require the development of TSI detectors for interactants, based either on computer algorithms that analyze nonverbal behaviors or on actual humans who scrutinize the interaction. To examine the possibility of using human TSI detectors, we now discuss what we call the nonverbal Turing Test (NVTT).
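The algorithmic route can be illustrated with one plausible detector, which is not a method used in the study. A mimicking avatar's motion is a delayed copy of the observer's own motion. So one can correlate the observer's head-yaw series with each avatar's head-yaw series over a range of positive lags and flag any avatar whose peak correlation is high. The sampling scheme, lag range, and 0.9 threshold are assumptions, and the traces in the example are synthetic.

```python
import math
from typing import Sequence, Tuple

def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length series (0.0 if either is flat)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def best_lag(own: Sequence[float], other: Sequence[float],
             max_lag: int) -> Tuple[int, float]:
    """Lag (in samples) at which `other` best resembles a delayed copy of `own`.

    Compares own[t] with other[t + lag] for lag = 1 .. max_lag.
    """
    best_k, best_r = 0, -1.0
    for k in range(1, max_lag + 1):
        r = pearson(own[:len(own) - k], other[k:])
        if r > best_r:
            best_k, best_r = k, r
    return best_k, best_r

def looks_like_mimic(own: Sequence[float], other: Sequence[float],
                     max_lag: int, threshold: float = 0.9) -> bool:
    return best_lag(own, other, max_lag)[1] >= threshold

# Synthetic yaw traces (degrees), one sample per tick.
own = [20 * math.sin(0.3 * t) + 10 * math.sin(0.11 * t) for t in range(200)]
mimic = [0.0] * 8 + [0.5 * v for v in own[:-8]]  # 8-tick delay, half range
human = [15 * math.cos(0.17 * t) for t in range(200)]

print(best_lag(own, mimic, max_lag=20))          # lag 8, r close to 1
print(looks_like_mimic(own, human, max_lag=20))  # False
```

For the longest mimic delay in the pilot study (8 s), `max_lag` would have to span 8 s of samples. A mimic that added noise or blended in unrelated motion would lower the peak correlation, so the threshold trades misses against false alarms.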

In the popular reinterpretation of the Turing Test (Turing, 1950), a judge attempts to determine which of two players (one human, one machine) is a fellow human. In our NVTT pilot studies, experimental participants acting as judges enter a CVE with two virtual humans. One avatar's head movements are veridical, playing back the movements of another human in real time (i.e., without TSI). The other avatar's head movements are actually a transformation of the judge's own head movements (i.e., time-delayed and with a reduced range of motion). The judge sees the head movements of a real person on one avatar and some sort of self-mimicked movements on the other. During the interaction, only head movements are permitted (i.e., no verbal communication is allowed), and participants must devise ways to engage and test the two virtual humans through nonverbal means in order to ascertain which is human and which is a mimicker. Figure 4 illustrates this setup.
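The computer agent's behavior, a time-delayed and range-reduced copy of the judge's own head movements, can be sketched as a delay buffer followed by a per-axis transform. This sketch makes several assumptions. Head orientation is taken to arrive as (pitch, yaw, roll) samples at a fixed rate. The `gain` factor is a hypothetical stand-in for "reduced motion range". `yaw_only` is one possible reading of the pilot study's "low" (yaw only) range-of-motion condition. The class is illustrative and is not the software used in the study.

```python
from collections import deque
from typing import Deque, Tuple

Orientation = Tuple[float, float, float]  # (pitch, yaw, roll) in degrees

class MimicAgent:
    """Hypothetical mimic: replays the judge's head orientation after a delay.

    delay_s and rate_hz set the buffer length; gain < 1 shrinks the motion
    range; yaw_only zeroes pitch and roll.
    """

    def __init__(self, delay_s: float, rate_hz: float,
                 gain: float = 0.5, yaw_only: bool = False) -> None:
        self.n = max(1, round(delay_s * rate_hz))  # delay in samples
        self.buf: Deque[Orientation] = deque()
        self.gain = gain
        self.yaw_only = yaw_only

    def step(self, judge: Orientation) -> Orientation:
        """Feed the judge's current pose; return the pose to render on the agent."""
        self.buf.append(judge)
        if len(self.buf) <= self.n:
            return (0.0, 0.0, 0.0)  # neutral pose until the buffer fills
        pitch, yaw, roll = self.buf.popleft()  # the pose from n samples ago
        if self.yaw_only:
            pitch, roll = 0.0, 0.0
        return (self.gain * pitch, self.gain * yaw, self.gain * roll)
```

This sketch leaves one further design choice open: whether the copied rotation should be mirrored or expressed relative to the agent's own facing direction.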

In the current initial pilot study, we manipulated three independent variables: test trial length (either 16 or 32 seconds), mimic delay (i.e., the computer agent mimics either 1, 2, 4, or 8 seconds after the judge's movements), and range of motion (high: pitch, yaw, and roll; or low: yaw only).
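Crossing these three variables gives 2 × 4 × 2 = 16 conditions. Presenting each condition twice matches the 32 test trials per participant reported for this study. Below is a minimal sketch of generating one participant's randomized trial order under that design. The per-participant seeding scheme is an assumption, since the text does not say how the random orders were produced.

```python
import itertools
import random
from typing import List, NamedTuple

class Condition(NamedTuple):
    trial_length_s: int
    mimic_delay_s: int
    motion_range: str  # "high" = pitch, yaw, roll; "low" = yaw only

TRIAL_LENGTHS_S = (16, 32)
MIMIC_DELAYS_S = (1, 2, 4, 8)
MOTION_RANGES = ("high", "low")

def conditions() -> List[Condition]:
    """All 2 x 4 x 2 = 16 crossed conditions."""
    return [Condition(*c) for c in itertools.product(
        TRIAL_LENGTHS_S, MIMIC_DELAYS_S, MOTION_RANGES)]

def trial_order(seed: int, repetitions: int = 2) -> List[Condition]:
    """One participant's randomized order, each condition `repetitions` times."""
    trials = conditions() * repetitions
    random.Random(seed).shuffle(trials)
    return trials

print(len(conditions()), len(trial_order(seed=0)))  # 16 32
```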


Simons H (1976) Persuasion Understanding practice andanalysis Reading MA Heath

Slater M Pertaub D amp Steed A (1999) Public speaking

in virtual reality Facing an audience of avatars IEEE Com-puter Graphics and Applications 19(2) 6ndash9

Slater A Sadagic M Usoh R amp Schroeder R (2000)Small group behavior in a virtual and real environment Acomparative study Presence Teleoperators and Virtual Envi-ronments 9(1) 37ndash51

Stiefelhagen R Yang J amp Waibel A (1997) Tracking eyesand monitoring eye gaze In M Turk amp Y Takabayashi(Eds) Proceedings of the Workshop on Perceptual User Inter-faces

Turing A (1950) Computing machinery and intelligenceMind 59 (236)

Turk M amp Kolsch M (in press) Perceptual interfaces InMedioni G amp Kang S B (Eds) Emerging topics in com-puter vision Boston Prentice Hall

Velichkovsky B M (1995) Communicating attention Gazeposition transfer in cooperative problem solving Pragmaticsand Cognition 3(2) 199ndash222

Vertegaal R (1999) The GAZE groupware system Mediat-ing joint attention in multiparty communication and collab-oration Proceedings of the CHI rsquo99 Conference on HumanFactors in Computing Systems The CHI is the Limit 294ndash301

Viola P amp Jones M (2001) Rapid object detection using aboosted cascade of simple features Proceedings of the IEEEConference on Computer Vision and Pattern Recognition

Wallace D F (1996) Infinite jest Boston Little BrownWilliams K Cheung K T amp Choi W (2000) Cyberostra-

cisms Effects of being ignored over the internet Journal ofPersonality and Social Psychology 79 748ndash762

Yee N (2002) Befriending ogres and wood elvesmdashUnder-standing relationship formation in MMORPGs Retrievedfrom httpwwwnickyeecomhubrelationshipshomehtml

Zajonc R B (1971) Brainwash Familiarity breeds comfortPsychology Today 3(9) 60ndash64

Zajonc R B Murphy S T amp Inglehart M (1989) Feel-ing and facial efference Implication of the vascular theoryof emotion Psychological Review 96 395ndash416

Zhang X amp Furnas G (2002) Social interactions in multi-scale CVEs Proceedings of the ACM Conference on Collabo-rative Virtual Environments 2002 (CVE 2002)

Bailenson et al 441

lowing situation. The student in a distance learning CVE does not understand an example that the instructor provides. The student can “rewind” the recorded interaction, go back to the beginning of the confusing example, and then play back the example. Once she has understood the confusing example, she can turn up the rate of playback (e.g., watch the sequence at 2X speed) and eventually catch up to the instructor again. By slowing down or speeding up the rendered flow of time, the interactant can focus differentially on particular topics and can review the same scene from different points of view without missing the remainder of the interaction. Of course, doing so will cost that interactant some of her contribution to the CVE in terms of interactivity (i.e., what does her avatar do while she rewinds?). Consequently, any disruption of the temporal sequence will need to be coupled with some kind of avatar autopilot.
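The catch-up step in this scenario reduces to simple arithmetic. Suppose the student is L seconds behind the live session and plays the recording at rate r > 1. She then gains r − 1 seconds on the session per wall-clock second, so she rejoins it after L / (r − 1) seconds. At 2X speed, the catch-up time equals the lag itself.

The Python sketch below illustrates this together with sample-and-hold playback from a time-stamped log. It rests on these assumptions:

- The CVE records every avatar state update with a timestamp.
- The names `Recording`, `state_at`, `catch_up_time`, and `playhead` are hypothetical. They do not describe any system used by the authors.

```python
from __future__ import annotations

from bisect import bisect_right
from dataclasses import dataclass, field


@dataclass
class Recording:
    """Hypothetical time-stamped log of avatar states in a CVE session."""

    times: list[float] = field(default_factory=list)
    states: list[dict] = field(default_factory=list)

    def record(self, t: float, state: dict) -> None:
        # Assumes updates arrive in nondecreasing time order.
        self.times.append(t)
        self.states.append(state)

    def state_at(self, t: float) -> dict | None:
        # Sample-and-hold playback: the most recent state at or before time t.
        i = bisect_right(self.times, t)
        return self.states[i - 1] if i else None


def catch_up_time(lag_s: float, rate: float) -> float:
    """Wall-clock seconds needed to close a playback lag at a given rate.

    The viewer advances `rate` seconds of recording per wall-clock second while
    the live session advances 1, so the lag shrinks by (rate - 1) s per second.
    """
    if rate <= 1.0:
        raise ValueError("playback rate must exceed 1x to catch up")
    return lag_s / (rate - 1.0)


def playhead(rewind_to: float, elapsed: float, rate: float) -> float:
    # Position in the recording after `elapsed` wall-clock seconds of playback
    # at `rate`, starting from the rewind point.
    return rewind_to + rate * elapsed
```

Consider a student who falls 30 seconds behind and replays at 2X. She needs 30 further seconds of viewing to rejoin the live session. In the meantime, `rec.state_at(playhead(...))` gives the avatar state to render at each frame.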

Changing the rate of time in a CVE brings up another interesting transformation. Traditionally, CVEs are roughly defined as “geographically separated interactants” interacting over some kind of computer-mediated network in a shared environment. However, by combining some of the concepts discussed in previous sections, it may be possible to include “temporally separated interactants” in a shared environment in the definition of a CVE. Consider a videoconference of a business meeting. Oftentimes, interested parties who cannot attend the meeting will later review a videotape of it. In a CVE, the temporally absent member has the option of involving herself more deeply in the interaction. Specifically, she can situate her avatar in a specific place in the CVE’s seating arrangement and use an autopilot to give her representation rudimentary nonverbal behaviors. Furthermore, the absent member can program her avatar to perform simple interactive tasks: prerecorded introductions, answers to certain questions about the CVE topic, or, perhaps more realistically in the near term, playback of a recorded performance. The CVE interaction can then proceed in real time, with the temporally absent member’s avatar approximating the types of behaviors that she would do and say. As a result, temporally present members would actually direct pieces of the conversation toward the absent member as well as transmit nonverbal gestures toward her. Later on, instead of just reviewing the recording, the temporally absent member can take her place in the CVE and actually feel present in the dialogue, receiving appropriate nonverbal behaviors and maximizing the degree of copresence. Moreover, the members of the CVE who were present at the scheduled time can program their avatars to respond, during the replay of the interaction, to any post hoc questions that the absent member might have. In this way, the degree of interactivity during the replay can be increased. Perhaps at some point in the not-too-distant future, the line between real-time and non-real-time interactions will become interestingly blurred.

4 Implications of TSI and Research Directions

For better or for worse, TSI implemented through CVEs has great potential to change the nature of mediated interaction. The strategic decoupling of rendered behavior from actual behavior allows interactants to break many constraints that are inherent in face-to-face interaction. The same holds for other forms of mediated interaction, such as telephone and videophone conferences. The effects of TSI remain to be seen. Suppose that implementing the TSI techniques is technically feasible and that using TSI implementations is conceptually workable for the interactants; both are substantial assumptions. On that basis, one could predict a number of consequences. First, TSI may develop into a worthwhile tool that helps interactants overcome the inadequacies of communicating from remote locations. By augmenting their representational, sensory, and situational characteristics, interactants in CVEs may be able to achieve levels of interaction that actually surpass face-to-face interaction.

On the other hand, people may in fact find the use of these transformations extremely unsettling. The difference between TSI and current CVE implementations could be as drastic as the difference between email and the written letter. As this technology is developed, it is essential to examine people’s responses to this new medium (e.g., Reeves & Nass, 1996). These important potential implications of TSI need to be studied before the technology becomes widespread.

Along the same lines, the threat of TSI may be the very downfall of CVE interaction. Face-to-face interaction tends to involve some degree of deception; for example, people use facial expressions to mask their emotions. This deception has the potential to be much greater with TSI. If interactants have no faith that their perceptual experience is genuine, they may have little reason ever to enter a CVE. Interactants might lose all trust in three things: the truthfulness of gestures, the one-to-one correspondence of avatars, and the temporal presence of interactants. Such a complete loss of trust could rob the CVE of one of its greatest strengths, namely interactivity, since interactants may not know with whom or with what they are interacting, or when. Similarly, given an expectation of TSI, interactants may be constantly suspicious during interactions, and this lack of trust in fellow interactants may lead to unproductive collaborations.

A solution to this breakdown may require the development of TSI detectors for interactants. These could be computer algorithms that analyze nonverbal behaviors, or actual humans who scrutinize the interaction. To examine the possibility of using human TSI detectors, we now discuss what we call the nonverbal Turing Test (NVTT).

In the popular reinterpretation of the Turing Test (Turing, 1950), a judge attempts to determine which of two players (one human, one machine) is a fellow human. In our NVTT pilot studies, experimental participants acting as judges enter a CVE with two virtual humans:

- one avatar whose head movements are veridical, playing back the movements of another human in real time (i.e., without TSI);
- one avatar whose head movements are actually a transformation of the judge’s own head movements (i.e., time-delayed and, in some conditions, restricted in range of motion).

The judge thus sees a real person’s head movements on one avatar and self-mimicked movements on the other. During the interaction, only head movements are permitted (i.e., no verbal communication is allowed). Participants must therefore devise ways to engage and test the two virtual humans through nonverbal means in order to determine which is human and which is the mimicker. Figure 4 illustrates this setup.

In the current initial pilot study, we manipulated three independent variables:

- test trial length (either 16 or 32 seconds);
- mimic delay (i.e., the computer agent mimics the judge’s movements either 1, 2, 4, or 8 seconds after they occur);
- range of motion (high: pitch, yaw, and roll; or low: yaw only).
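The mimic agent’s transformation delays the judge’s own head movements and, in the low condition, restricts them to yaw. It can be sketched as a time-ordered buffer. This is a minimal Python illustration, not the authors’ implementation, and it makes these assumptions:

- The class names `HeadPose` and `DelayedMimic` are hypothetical.
- Poses arrive as timestamped yaw, pitch, and roll angles in degrees.
- Holding pitch and roll at 0° is only one assumed way of realizing the yaw-only condition.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class HeadPose:
    t: float      # timestamp, seconds
    yaw: float    # degrees
    pitch: float  # degrees
    roll: float   # degrees


class DelayedMimic:
    """Hypothetical mimic agent: replays the judge's head pose after a fixed delay.

    yaw_only=True approximates the "low" range-of-motion condition by holding
    pitch and roll at a neutral 0 degrees (an assumption; the paper does not say
    how the restriction was implemented).
    """

    def __init__(self, delay_s: float, yaw_only: bool) -> None:
        self.delay_s = delay_s
        self.yaw_only = yaw_only
        self._pending: deque[HeadPose] = deque()
        self._shown: HeadPose | None = None

    def update(self, judge: HeadPose) -> HeadPose | None:
        """Feed the judge's latest tracked pose; return the pose for the mimic avatar."""
        self._pending.append(judge)
        cutoff = judge.t - self.delay_s
        # Release every buffered sample at least delay_s old; keep the newest one.
        while self._pending and self._pending[0].t <= cutoff:
            self._shown = self._pending.popleft()
        if self._shown is None:
            return None  # less than delay_s of history so far
        p = self._shown
        if self.yaw_only:
            return HeadPose(judge.t, p.yaw, 0.0, 0.0)
        return HeadPose(judge.t, p.yaw, p.pitch, p.roll)
```

One `DelayedMimic` per trial, constructed with that trial’s delay (1, 2, 4, or 8 s) and range condition, would be fed the judge’s tracked pose every frame. Until a full delay’s worth of history exists, it returns `None`; during that time the avatar could simply hold a neutral pose.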

Figure 4. A schematic of the NVTT. The human judge is forced to determine which of the two avatars exhibiting head movements is the real human and which is the computer-generated human mimicking the judge’s own movements.

Participants in this study wore head-mounted displays while a render computer tracked their head orientation. Tracking, rendering, and networking latencies were all low enough to impart a compelling sense of copresence (see Bailenson et al., 2002, for detailed descriptions of the hardware and software used). Participants were instructed to sit in a virtual room with two virtual people:

- a human agent (i.e., a representation whose movements are controlled by a real person in another room);
- a computer agent (i.e., a computer program designed to mimic the user’s movements in some way).

Participants were instructed to interact with the two other virtual people using head movements in order to determine which one was the human agent. Participants were run in pairs, with each one acting as the human agent for the other. Each participant sat at a virtual conference table with two virtual humans (similar to that shown in Figure 3). Each participant received the 32 test trials in random order. The trials comprised two instances of each of the 16 conditions that result from crossing the three independent variables. Forty-one undergraduates participated in this study.
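The trial structure follows directly from the crossing. Two trial lengths × four mimic delays × two ranges of motion give 16 conditions, and two instances of each give 32 trials. Below is a minimal Python sketch of generating one participant’s random order. The function name `trial_order` and the use of a simple full shuffle are assumptions, since the paper does not describe its randomization procedure.

```python
from __future__ import annotations

import itertools
import random

TRIAL_LENGTHS_S = (16, 32)     # test trial length, seconds
MIMIC_DELAYS_S = (1, 2, 4, 8)  # mimic delay, seconds
RANGES = ("high", "low")       # high: pitch, yaw, roll; low: yaw only


def trial_order(seed: int | None = None, repeats: int = 2) -> list[tuple[int, int, str]]:
    """Return a shuffled list with `repeats` instances of each crossed condition."""
    # 2 x 4 x 2 = 16 (length, delay, range) conditions.
    conditions = list(itertools.product(TRIAL_LENGTHS_S, MIMIC_DELAYS_S, RANGES))
    trials = conditions * repeats
    # A seeded generator makes each participant's order reproducible.
    random.Random(seed).shuffle(trials)
    return trials
```

Passing a different seed per participant (for example, a participant ID) yields a different but reproducible order. A blocked or counterbalanced scheme would be an equally plausible alternative.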

For brevity, we focus on two results in particular. First, participants performed surprisingly poorly when attempting to identify the human avatar, even though we had explicitly told them that the computer agent was directly mimicking them. The overall average score was only 66% correct (SD = 10%; chance = 50%, maximum score = 100%). Moreover, of the 41 participants in the study, more than one fourth were not reliably different from chance (i.e., less than 3 SEM from 50%, between 44% and 56%). Second, as Figure 5 demonstrates, participants’ scores decreased as the mimic delay increased. There was a linear trend in the logarithm of the delay variable, F(1, 32) = 8.85, p < .01. When the delay was greater than 1 second, participants had more difficulty identifying the mimicker.
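The delays of 1, 2, 4, and 8 seconds are equally spaced on a log2 scale (0, 1, 2, 3). A linear trend in log delay can therefore be tested as a within-subject linear contrast with centered weights −1.5, −0.5, 0.5, 1.5. The Python sketch below shows one common way to do this:

1. Compute each participant’s contrast score.
2. Test the mean of those scores against zero, so that F(1, n − 1) = t².

This is an illustration under that assumption, not a reconstruction of the authors’ exact analysis, and `linear_trend_test` is a hypothetical name.

```python
from __future__ import annotations

import math
import statistics

MIMIC_DELAYS_S = (1, 2, 4, 8)


def log_linear_weights(delays=MIMIC_DELAYS_S) -> list[float]:
    """Centered linear contrast weights over log2(delay).

    For delays of 1, 2, 4, 8 s the logs are 0, 1, 2, 3, so the weights are
    -1.5, -0.5, 0.5, 1.5 (proportional to the textbook -3, -1, 1, 3).
    """
    logs = [math.log2(d) for d in delays]
    mean = sum(logs) / len(logs)
    return [x - mean for x in logs]


def linear_trend_test(accuracy_rows: list[list[float]]) -> tuple[float, float, tuple[int, int]]:
    """Within-subject linear-trend test on per-participant accuracy at each delay.

    Each row holds one participant's percent correct at the delays above, in
    order. Returns (t, F, df): t is the one-sample t of the contrast scores,
    and F = t**2 with df (1, n - 1). A negative t means accuracy falls as the
    (log) delay grows.
    """
    w = log_linear_weights()
    # One contrast score per participant: weighted sum of accuracies.
    scores = [sum(wi * ai for wi, ai in zip(w, row)) for row in accuracy_rows]
    n = len(scores)
    t = statistics.mean(scores) / (statistics.stdev(scores) / math.sqrt(n))
    return t, t * t, (1, n - 1)
```

Call the function with one row of percent-correct values per participant. A negative t indicates that accuracy falls as the mimic delay grows, which is the direction of the effect reported above. The function needs at least two participants whose contrast scores are not all identical, otherwise the standard deviation is zero.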

These data are particularly striking because we had initially predicted that participants would recognize their own head movements much more easily when aware that one avatar was designed to mimic them. In this study, we explicitly told participants about the use of TSI, and they still had great difficulty detecting it. The effects of implicit (that is, undisclosed) TSI would presumably be even greater. This pilot study is extremely simple and only scratches the surface of a paradigm for examining TSI. Even so, it is noteworthy that participants did not consistently detect the mimicker.

We are currently exploring other factors underlying the discrimination of human nonverbal behavior from computer-generated behaviors. In future studies, we will use NVTTs to study other nonverbal behaviors, such as facial gestures, eye-head gestures (pointing indications by either system), hand gestures, and interpersonal distance. We have shown that, in admittedly simple scenarios, it is possible to pass the NVTT with TSI for a portion of our test population. We are confident that, as this portion grows in the near future, important scientific and sociological discoveries will surface along the way.

In conclusion, there are many reasons one might want to avoid TSI, ranging from Orwellian concerns to the fear of rendering CVEs (perhaps even the telephone) functionally useless. We are not advocates of TSI as a means to replace normal communication. Nor are we staunch believers in avoiding TSI in order to preserve the natural order of communication and conversation. However, we do acknowledge that, as CVEs become more prevalent, the strategic decoupling of representation from behavior is inevitable. For that reason alone, the notion of TSI warrants considerable attention.

Acknowledgments

The authors would like to thank Robin Gilmour and Christopher Rex for helpful suggestions. Furthermore, we thank Christopher Rex and Ryan Jaeger for assistance in collecting data. This research was sponsored in part by NSF Award SBE-9873432 and in part by NSF ITR Award IIS 0205740.

References

Argyle, M. (1988). Bodily communication (2nd ed.). London: Methuen.

Bailenson, J. N., Beall, A. C., & Blascovich, J. (2002). Mutual gaze and task performance in shared virtual environments. Journal of Visualization and Computer Animation, 13, 1–8.

Bailenson, J. N., Beall, A. C., Blascovich, J., Raimmundo, R., & Weisbuch, M. (2001). Intelligent agents who wear your face: Users’ reactions to the virtual self. Lecture Notes in Artificial Intelligence, 2190, 86–99.

Bailenson, J. N., Blascovich, J., Beall, A. C., & Guadagno, R. E. (submitted). Self representations in immersive virtual environments.

Figure 5. Percent correct by mimic delay (in seconds). These data exclude subjects at chance performance.

Baumeister, R. F. (1998). The self. In D. T. Gilbert, S. T. Fiske, & G. Lindzey (Eds.), Handbook of social psychology (4th ed., pp. 680–740). New York: McGraw-Hill.

Beall, A. C., Bailenson, J. N., Loomis, J., Blascovich, J., & Rex, C. (2003). Non-zero-sum mutual gaze in immersive virtual environments. Proceedings of HCI International 2003.

Benford, S., Bowers, J., Fahlen, L., Greenhalgh, C., & Snowdon, D. (1995). User embodiment in collaborative virtual environments. Proceedings of CHI ’95 (pp. 242–249). ACM Press.

Biocca, F. (1997). The cyborg’s dilemma: Progressive embodiment in virtual environments. Journal of Computer-Mediated Communication [online], 3. Retrieved from http://www.ascusc.org/jcmc/vol3/issue2/biocca2.html

Black, M., & Yacoob, Y. (1997). Recognizing facial expressions in image sequences using local parameterized models of image motion. International Journal of Computer Vision, 25(1), 23–48.

Blanz, V., & Vetter, T. (1999). A morphable model for the synthesis of 3D faces. SIGGRAPH ’99 Conference Proceedings, 187–194.

Blascovich, J., Loomis, J., Beall, A. C., Swinth, K. R., Hoyt, C., & Bailenson, J. N. (2002). Immersive virtual environment technology as a methodological tool for social psychology. Psychological Inquiry, 13, 103–124.

Busey, T. A. (1988). Physical and psychological representations of faces: Evidence from morphing. Psychological Science, 9, 476–483.

Byrne, D. (1971). The attraction paradigm. New York: Academic Press.

Cassell, J. (2000). Nudge nudge wink wink: Elements of face-to-face conversation for embodied conversational agents. In J. Cassell et al. (Eds.), Embodied conversational agents. Cambridge, MA: MIT Press.

Chaiken, S. (1979). Communicator physical attractiveness and persuasion. Journal of Personality and Social Psychology, 37, 1387–1397.

Chartrand, T. L., & Bargh, J. (1999). The chameleon effect: The perception-behavior link and social interaction. Journal of Personality & Social Psychology, 76(6), 893–910.

Decarlo, D., Metaxas, D., & Stone, M. (1998). An anthropometric face model using variational techniques. Proceedings of SIGGRAPH ’98, 67–74.

Depaulo, B. M., & Friedman, H. S. (1998). Nonverbal communication. In D. T. Gilbert, S. T. Fiske, & G. Lindzey (Eds.), The handbook of social psychology (4th ed., Vol. 2, pp. 3–40). Boston: McGraw-Hill.

Donato, G., Bartlett, M. S., Hager, J. C., Ekman, P., & Sejnowski, T. J. (1999). Classifying facial actions. IEEE Transactions on Pattern Analysis and Machine Intelligence, 21(10), 974–989.

Durlach, N., & Slater, M. (2000). Presence in shared virtual environments and virtual togetherness. Presence: Teleoperators and Virtual Environments, 9, 214–217.

Ekman, P. (1978). Facial signs: Facts, fantasies, and possibilities. In T. Sebeok (Ed.), Sight, sound, and sense. Bloomington, IN: Indiana University Press.

Fodor, J. A. (1983). The modularity of mind: An essay on faculty psychology. Cambridge, MA: MIT Press.

Fry, R., & Smith, G. F. (1975). The effects of feedback and eye contact on performance of a digit-encoding task. Journal of Social Psychology, 96, 145–146.

Gale, C., & Monk, A. F. (2002). A look is worth a thousand words: Full gaze awareness in video-mediated conversation. Discourse Processes, 33.

Garau, M., Slater, M., Vinayagamoorthy, V., Brogni, A., Steed, A., & Sasse, M. A. (2003). The impact of avatar realism and eye gaze control on perceived quality of communication in a shared immersive virtual environment. Proceedings of the SIGCHI Conference on Human Factors in Computing Systems.

Gibson, W. (1984). Neuromancer. New York: Ace Books.

Hu, C., Ferris, R., & Turk, M. (2003). Active wavelet networks for face alignment. Proceedings of the British Machine Vision Conference, Norwich, UK.

Kendon, A. (1977). Studies in the behavior of social interaction. Bloomington, IN: Indiana University.

Kleinke, C. L. (1986). Gaze and eye contact: A research review. Psychological Bulletin, 100, 78–100.

Kraut, R. E., Fussell, S. R., Brennan, S. E., & Siegel, J. (2002). Understanding effects of proximity on collaboration: Implications for technologies to support remote collaborative work. In P. Hinds & S. Kiesler (Eds.), Distributed work. Cambridge, MA: MIT Press.

Lanier, J. (2001). Virtually there. Scientific American, April 2001.

Leigh, J., DeFanti, T., Johnson, A., Brown, M., & Sandin, D. (1997). Global telemersion: Better than being there. Proceedings of ICAT ’97.

Loomis, J. M., Blascovich, J. J., & Beall, A. C. (1999). Immersive virtual environments as a basic research tool in psychology. Behavior Research Methods, Instruments, and Computers, 31(4), 557–564.

440 PRESENCE VOLUME 13 NUMBER 4

Mania, K., & Chalmers, A. (1998). Proceedings of the Fourth International Conference on Virtual Systems and Multimedia (pp. 177–182). Amsterdam: IOS Press-Ohmsha.

Milgram, S. (1992). The individual in a social world: Essays and experiments (2nd ed.). New York: McGraw-Hill.

Morgan, T., Kriz, R., Howard, T., Dias Neves, F., & Kelso, J. (2001). Extending the use of collaborative virtual environments for instruction to K–12 schools. Insight, 1(1).

Normand, V., Babski, C., Benford, S., Bullock, A., Carion, S., Chrysanthou, Y., et al. (1999). The COVEN project: Exploring applicative, technical, and usage dimensions of collaborative virtual environments. Presence: Teleoperators and Virtual Environments, 8(2), 218–236.

Patterson, M. L. (1982). An arousal model of interpersonal intimacy. Psychological Review, 89, 231–249.

Pylyshyn, Z. W. (1980). Computation and cognition: Issues in the foundations of cognitive science. Behavioral & Brain Sciences, 3, 111–169.

Reeves, B., & Nass, C. (1996). The media equation: How people treat computers, television, and new media like real people and places. New York: Cambridge University Press.

Rickel, J., & Johnson, W. L. (2000). Task-oriented collaboration with embodied agents in virtual worlds. In J. Cassell, J. Sullivan, S. Prevost, & E. Churchill (Eds.), Embodied conversational agents. Cambridge, MA: MIT Press.

Rutter, D. R. (1984). Looking and seeing: The role of visual communication in social interaction. Suffolk, UK: John Wiley & Sons.

Sannier, G., & Thalmann, M. N. (1998). A user friendly texture-fitting methodology for virtual humans. Computer Graphics International ’97.

Schwartz, P., Bricker, L., Campbell, B., Furness, T., Inkpen, K., Matheson, L., et al. (1998). Virtual playground: Architectures for a shared virtual world. Proceedings of the ACM Symposium on Virtual Reality Software and Technology 1998, 43–50.

Sherwood, J. V. (1987). Facilitative effects of gaze upon learning. Perceptual and Motor Skills, 64, 1275–1278.

Simons, H. (1976). Persuasion: Understanding, practice, and analysis. Reading, MA: Heath.

Slater, M., Pertaub, D., & Steed, A. (1999). Public speaking in virtual reality: Facing an audience of avatars. IEEE Computer Graphics and Applications, 19(2), 6–9.

Slater, A., Sadagic, M., Usoh, R., & Schroeder, R. (2000). Small group behavior in a virtual and real environment: A comparative study. Presence: Teleoperators and Virtual Environments, 9(1), 37–51.

Stiefelhagen, R., Yang, J., & Waibel, A. (1997). Tracking eyes and monitoring eye gaze. In M. Turk & Y. Takabayashi (Eds.), Proceedings of the Workshop on Perceptual User Interfaces.

Turing, A. (1950). Computing machinery and intelligence. Mind, 59(236).

Turk, M., & Kolsch, M. (in press). Perceptual interfaces. In G. Medioni & S. B. Kang (Eds.), Emerging topics in computer vision. Boston: Prentice Hall.

Velichkovsky, B. M. (1995). Communicating attention: Gaze position transfer in cooperative problem solving. Pragmatics and Cognition, 3(2), 199–222.

Vertegaal, R. (1999). The GAZE groupware system: Mediating joint attention in multiparty communication and collaboration. Proceedings of the CHI ’99 Conference on Human Factors in Computing Systems: The CHI is the Limit, 294–301.

Viola, P., & Jones, M. (2001). Rapid object detection using a boosted cascade of simple features. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.

Wallace, D. F. (1996). Infinite jest. Boston: Little, Brown.

Williams, K., Cheung, K. T., & Choi, W. (2000). Cyberostracism: Effects of being ignored over the Internet. Journal of Personality and Social Psychology, 79, 748–762.

Yee, N. (2002). Befriending ogres and wood elves: Understanding relationship formation in MMORPGs. Retrieved from http://www.nickyee.com/hub/relationships/home.html

Zajonc, R. B. (1971). Brainwash: Familiarity breeds comfort. Psychology Today, 3(9), 60–64.

Zajonc, R. B., Murphy, S. T., & Inglehart, M. (1989). Feeling and facial efference: Implication of the vascular theory of emotion. Psychological Review, 96, 395–416.

Zhang, X., & Furnas, G. (2002). Social interactions in multiscale CVEs. Proceedings of the ACM Conference on Collaborative Virtual Environments 2002 (CVE 2002).

Bailenson et al 441


Depaulo B M amp Friedman H S (1998) Nonverbal com-munication In D T Gilbert S T Fiske amp G Lindzey

(Eds) The handbook of social psychology (4th ed Vol 2 pp3ndash40) Boston McGraw-Hill

Donato G Bartlett M S Hager J C Ekman P amp Se-jnowski T J (1999) Classifying facial actions IEEE Trans-actions on Pattern Analysis and Machine Intelligence21(10) 974ndash989

Durlach N amp Slater M (2000) Presence in shared virtualenvironments and virtual togetherness Presence Teleopera-tors and Virtual Environments 9 214ndash217

Ekman P (1978) Facial signs Facts fantasies and possibili-ties In T Sebeok (Ed) Sight sound and sense Blooming-ton IN Indiana University Press

Fodor J A (1983) The modularity of mind An essay on fac-ulty psychology Cambridge MA MIT Press

Fry R amp Smith G F (1975) The effects of feedback andeye contact on performance of a digit-encoding task Jour-nal of Social Psychology 96 145ndash146

Gale C amp Monk A F (2002) A look is worth a thousandwords Full gaze awareness in video-mediated conversationDiscourse Processes 33

Garau M Slater M Vinayagamoorhty V Brogni ASteed A amp Sasse M A (2003) The impact of avatar real-ism and eye gaze control on perceived quality of communi-cation in a shared immersive virtual environment Proceed-ings of the SIGCHI Conference on Human Factors inComputing Systems

Gibson W (1984) Neuromancer New York Ace BooksHu C Ferris R amp Turk M (2003) Active wavelet net-

works for face alignment Proceedings of the British MachineVision Conference Norwich UK

Kendon A (1977) Studies in the behavior of social interactionBloomington IN Indiana University

Kleinke C L (1986) Gaze and eye contact A research re-view Psychological Bulletin 100 78ndash100

Kraut R E Fussell S R Brennan S E amp Siegel J(2002) Understanding effects of proximity on collabora-tion Implications for technologies to support remote col-laborative work In P Hinds amp S Kiesler (Eds) Distrib-uted work Cambridge MA MIT Press

Lanier J (2001) Virtually there Scientific American April2001

Leigh J DeFanti T Johnson A Brown M Sandin D(1997) Global telemersion Better than being there Pro-ceedings of ICAT rsquo97

Loomis J M Blascovich J J amp Beall A C (1999) Im-mersive virtual environments as a basic research tool in psy-chology Behavior Research Methods Instruments and Com-puters 31(4) 557ndash564

440 PRESENCE VOLUME 13 NUMBER 4

Mania K amp Chalmers A (1998) Proceedings of theFourth International Conference on Virtual Systems andMultimedia (pp 177ndash182) Amsterdam IOS Press-Ohmsha

Milgram S (1992) The individual in a social world Essaysand experiments (2nd ed) New York McGraw-Hill

Morgan T Kriz R Howard T Dias Neves F amp Kelso J(2001) Extending the use of collaborative virtual environ-ments for instruction to Kndash12 schools Insight 1(1)

Normand V Babski C Benford S Bullock A Carion SChrysanthou Y et al (1999) The COVEN project Ex-ploring applicative technical and usage dimensions of col-laborative virtual environments Presence Teleoperators andVirtual Environments 8(2) 218ndash236

Patterson M L (1982) An arousal model of interpersonalintimacy Psychological Review 89 231ndash249

Pylyshyn Z W (1980) Computation and cognition Issuesin the foundations of cognitive science Behavioral amp BrainSciences 3 111ndash169

Reeves B amp Nass C (1996) The media equation Howpeople treat computers television and new media like realpeople and places New York Cambridge University Press

Rickel J amp Johnson W L (2000) Task-oriented collabora-tion with embodied agents in virtual worlds In J Cassell JSullivan S Prevost amp E Churchill (Eds) Embodied con-versational agents Cambridge MA MIT Press

Rutter D R (1984) Looking and seeing The role of visualcommunication in social interaction Suffolk UK JohnWiley amp Sons

Sannier G amp Thalmann M N (1998) A user friendly tex-ture-fitting methodology for virtual humans ComputerGraphics International rsquo97

Schwartz P Bricker L Campbell B Furness T InkpenK Matheson L et al (1998) Virtual playground Archi-tectures for a shared virtual world Proceedings of the ACMSymposium on Virtual Reality Software and Technology 199843ndash50

Sherwood J V (1987) Facilitative effects of gaze uponlearning Perceptual and Motor Skills 64 1275ndash1278

Simons H (1976) Persuasion Understanding practice andanalysis Reading MA Heath

Slater M Pertaub D amp Steed A (1999) Public speaking

in virtual reality Facing an audience of avatars IEEE Com-puter Graphics and Applications 19(2) 6ndash9

Slater A Sadagic M Usoh R amp Schroeder R (2000)Small group behavior in a virtual and real environment Acomparative study Presence Teleoperators and Virtual Envi-ronments 9(1) 37ndash51

Stiefelhagen R Yang J amp Waibel A (1997) Tracking eyesand monitoring eye gaze In M Turk amp Y Takabayashi(Eds) Proceedings of the Workshop on Perceptual User Inter-faces

Turing A (1950) Computing machinery and intelligenceMind 59 (236)

Turk M amp Kolsch M (in press) Perceptual interfaces InMedioni G amp Kang S B (Eds) Emerging topics in com-puter vision Boston Prentice Hall

Velichkovsky B M (1995) Communicating attention Gazeposition transfer in cooperative problem solving Pragmaticsand Cognition 3(2) 199ndash222

Vertegaal R (1999) The GAZE groupware system Mediat-ing joint attention in multiparty communication and collab-oration Proceedings of the CHI rsquo99 Conference on HumanFactors in Computing Systems The CHI is the Limit 294ndash301

Viola P amp Jones M (2001) Rapid object detection using aboosted cascade of simple features Proceedings of the IEEEConference on Computer Vision and Pattern Recognition

Wallace D F (1996) Infinite jest Boston Little BrownWilliams K Cheung K T amp Choi W (2000) Cyberostra-

cisms Effects of being ignored over the internet Journal ofPersonality and Social Psychology 79 748ndash762

Yee N (2002) Befriending ogres and wood elvesmdashUnder-standing relationship formation in MMORPGs Retrievedfrom httpwwwnickyeecomhubrelationshipshomehtml

Zajonc R B (1971) Brainwash Familiarity breeds comfortPsychology Today 3(9) 60ndash64

Zajonc R B Murphy S T amp Inglehart M (1989) Feel-ing and facial efference Implication of the vascular theoryof emotion Psychological Review 96 395ndash416

Zhang X amp Furnas G (2002) Social interactions in multi-scale CVEs Proceedings of the ACM Conference on Collabo-rative Virtual Environments 2002 (CVE 2002)

Bailenson et al 441

pendent variables). Forty-one undergraduates participated in this study.

For the purposes of brevity, we focus on two results in particular. First, although we explicitly told participants that the computer agent was directly mimicking them, they performed surprisingly poorly when attempting to identify the human avatar. The overall average score was only 66% correct (SD = 10%; chance = 50%; maximum score = 100%). Moreover, of the 41 participants in the study, more than one fourth were not reliably different from chance (i.e., less than 3 SEM from 50%, between 44% and 56%). Second, as Figure 5 demonstrates, participants’ scores decreased as the mimic delay increased: there was a linear trend in the logarithm of the delay variable, F(1, 32) = 8.85, p < .01. When the delay was greater than 1 second, participants had more difficulty identifying the mimicker.
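The two statistics above can be made concrete with a short sketch. The 44%–56% band is 50% ± 6 percentage points, so 3 SEM corresponds to 6 points, i.e., an SEM of 2 points. The Python below is illustrative only and rests on assumptions the text does not state: it treats the SEM as the binomial standard error of a proportion, sqrt(p(1 − p)/n), with a hypothetical trial count n_trials, and it tests the linear trend with a within-subjects contrast on centred log delays, which is one standard way to obtain an F(1, df) for a trend. We do not know the exact analysis the authors ran, and the function names are hypothetical.

```python
import math
from statistics import mean, stdev

# Hypothetical helpers; the assumptions behind each are stated in its docstring.


def chance_band(n_trials: int, k_sem: float = 3.0, p_chance: float = 0.5) -> tuple[float, float]:
    """Proportion-correct band lying within k_sem standard errors of chance.

    Assumes (not stated in the paper) that the SEM is the binomial
    standard error of a proportion: sqrt(p * (1 - p) / n).
    """
    sem = math.sqrt(p_chance * (1.0 - p_chance) / n_trials)
    return p_chance - k_sem * sem, p_chance + k_sem * sem


def is_at_chance(prop_correct: float, n_trials: int, k_sem: float = 3.0) -> bool:
    """True if a score lies strictly inside the chance band."""
    low, high = chance_band(n_trials, k_sem)
    return low < prop_correct < high


def linear_trend_f(scores: list[list[float]], delays: list[float]) -> tuple[float, int, int]:
    """Within-subjects linear-trend test on log(delay).

    scores[i][j] is participant i's proportion correct at delays[j].
    Each participant's row is collapsed into one contrast value using
    centred log-delay weights; a one-sample t-test on those values
    gives F(1, n - 1) = t ** 2.
    """
    log_delays = [math.log(d) for d in delays]
    centre = mean(log_delays)
    weights = [x - centre for x in log_delays]  # contrast weights sum to zero
    contrasts = [sum(w * y for w, y in zip(weights, row)) for row in scores]
    n = len(contrasts)
    t = mean(contrasts) / (stdev(contrasts) / math.sqrt(n))
    return t * t, 1, n - 1


if __name__ == "__main__":
    # Hypothetical inputs for illustration only; these are not the study's data.
    print(chance_band(625))  # roughly (0.44, 0.56)
    synthetic = [[0.8, 0.7, 0.6], [0.9, 0.7, 0.6], [0.7, 0.7, 0.6]]
    print(linear_trend_f(synthetic, delays=[1.0, 2.0, 4.0]))
```

Under the binomial assumption, chance_band(625) returns approximately (0.44, 0.56); that is, the reported band would match 625 binary trials per participant. This is arithmetic only, not a claim about the study’s procedure. The synthetic matrix in the demo is constructed so that every participant’s score falls with log delay, and the contrast test on it gives F(1, 2) = 12. Testing per-participant contrast values keeps the error term specific to the trend, and for a single-degree-of-freedom contrast F equals t squared.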

These data are particularly striking because we had initially predicted that participants would recognize their own head movements much more easily when they were aware that one avatar was designed to mimic them. In this study we explicitly told participants about the use of TSI, and they still had great difficulty detecting it. The effects of implicit TSI (that is, TSI that is not disclosed) are likely to be even stronger. While this pilot study is extremely simple and only scratches the surface of a paradigm that examines TSI, it is still noteworthy that participants did not reliably detect the mimicker.

We are currently exploring other factors underlying the discrimination of human nonverbal behavior from computer-generated behaviors. In future studies we will use NVTTs to study other nonverbal behaviors, such as facial gestures, eye-head gestures (pointing indications by either system), hand gestures, and interpersonal distance. We have shown that, in admittedly simple scenarios, it is possible to pass the NVTT for a portion of our test population using TSI. We are confident that as this portion grows in the near future, important scientific and sociological discoveries will surface along the way.

In conclusion, there are many reasons one might want to avoid TSI; these reasons range from Orwellian concerns to the fear of rendering CVEs (perhaps even the telephone) functionally useless. We are neither advocates of TSI as a means to replace normal communication nor staunch believers in avoiding TSI in order to preserve the natural order of communication and conversation. However, we do acknowledge that as CVEs become more prevalent, the strategic decoupling of representation from behavior is inevitable. For that reason alone, the notion of TSI warrants considerable attention.

Acknowledgments

The authors would like to thank Robin Gilmour and Christopher Rex for helpful suggestions. Furthermore, we thank Christopher Rex and Ryan Jaeger for assistance in collecting data. This research was sponsored in part by NSF Award SBE-9873432 and in part by NSF ITR Award IIS 0205740.

References

Argyle, M. (1988). Bodily communication (2nd ed.). London: Methuen.

Bailenson, J. N., Beall, A. C., & Blascovich, J. (2002). Mutual gaze and task performance in shared virtual environments. Journal of Visualization and Computer Animation, 13, 1–8.

Bailenson, J. N., Beall, A. C., Blascovich, J., Raimmundo, R., & Weisbuch, M. (2001). Intelligent agents who wear your face: Users’ reactions to the virtual self. Lecture Notes in Artificial Intelligence, 2190, 86–99.

Bailenson, J. N., Blascovich, J., Beall, A. C., & Guadagno, R. E. (submitted). Self representations in immersive virtual environments.

Figure 5. Percent correct by mimic delay (in seconds). These data exclude subjects at chance performance.

Baumeister, R. F. (1998). The self. In D. T. Gilbert, S. T. Fiske, & G. Lindzey (Eds.), Handbook of social psychology (4th ed., pp. 680–740). New York: McGraw-Hill.

Beall, A. C., Bailenson, J. N., Loomis, J., Blascovich, J., & Rex, C. (2003). Non-zero-sum mutual gaze in immersive virtual environments. Proceedings of HCI International 2003.

Benford, S., Bowers, J., Fahlen, L., Greenhalgh, C., & Snowdon, D. (1995). User embodiment in collaborative virtual environments. Proceedings of CHI ’95 (pp. 242–249). ACM Press.

Biocca, F. (1997). The cyborg’s dilemma: Progressive embodiment in virtual environments. Journal of Computer-Mediated Communication [online], 3. Retrieved from http://www.ascusc.org/jcmc/vol3/issue2/biocca2.html

Black, M., & Yacoob, Y. (1997). Recognizing facial expressions in image sequences using local parameterized models of image motion. International Journal of Computer Vision, 25(1), 23–48.

Blanz, V., & Vetter, T. (1999). A morphable model for the synthesis of 3D faces. SIGGRAPH ’99 Conference Proceedings, 187–194.

Blascovich, J., Loomis, J., Beall, A. C., Swinth, K. R., Hoyt, C., & Bailenson, J. N. (2002). Immersive virtual environment technology as a methodological tool for social psychology. Psychological Inquiry, 13, 103–124.

Busey, T. A. (1998). Physical and psychological representations of faces: Evidence from morphing. Psychological Science, 9, 476–483.

Byrne, D. (1971). The attraction paradigm. New York: Academic Press.

Cassell, J. (2000). Nudge nudge wink wink: Elements of face-to-face conversation for embodied conversational agents. In J. Cassell et al. (Eds.), Embodied conversational agents. Cambridge, MA: MIT Press.

Chaiken, S. (1979). Communicator physical attractiveness and persuasion. Journal of Personality and Social Psychology, 37, 1387–1397.

Chartrand, T. L., & Bargh, J. (1999). The chameleon effect: The perception-behavior link and social interaction. Journal of Personality & Social Psychology, 76(6), 893–910.

Decarlo, D., Metaxas, D., & Stone, M. (1998). An anthropometric face model using variational techniques. Proceedings of SIGGRAPH ’98, 67–74.

Depaulo, B. M., & Friedman, H. S. (1998). Nonverbal communication. In D. T. Gilbert, S. T. Fiske, & G. Lindzey (Eds.), The handbook of social psychology (4th ed., Vol. 2, pp. 3–40). Boston: McGraw-Hill.

Donato, G., Bartlett, M. S., Hager, J. C., Ekman, P., & Sejnowski, T. J. (1999). Classifying facial actions. IEEE Transactions on Pattern Analysis and Machine Intelligence, 21(10), 974–989.

Durlach, N., & Slater, M. (2000). Presence in shared virtual environments and virtual togetherness. Presence: Teleoperators and Virtual Environments, 9, 214–217.

Ekman, P. (1978). Facial signs: Facts, fantasies, and possibilities. In T. Sebeok (Ed.), Sight, sound, and sense. Bloomington, IN: Indiana University Press.

Fodor, J. A. (1983). The modularity of mind: An essay on faculty psychology. Cambridge, MA: MIT Press.

Fry, R., & Smith, G. F. (1975). The effects of feedback and eye contact on performance of a digit-encoding task. Journal of Social Psychology, 96, 145–146.

Gale, C., & Monk, A. F. (2002). A look is worth a thousand words: Full gaze awareness in video-mediated conversation. Discourse Processes, 33.

Garau, M., Slater, M., Vinayagamoorhty, V., Brogni, A., Steed, A., & Sasse, M. A. (2003). The impact of avatar realism and eye gaze control on perceived quality of communication in a shared immersive virtual environment. Proceedings of the SIGCHI Conference on Human Factors in Computing Systems.

Gibson, W. (1984). Neuromancer. New York: Ace Books.

Hu, C., Ferris, R., & Turk, M. (2003). Active wavelet networks for face alignment. Proceedings of the British Machine Vision Conference, Norwich, UK.

Kendon, A. (1977). Studies in the behavior of social interaction. Bloomington, IN: Indiana University.

Kleinke, C. L. (1986). Gaze and eye contact: A research review. Psychological Bulletin, 100, 78–100.

Kraut, R. E., Fussell, S. R., Brennan, S. E., & Siegel, J. (2002). Understanding effects of proximity on collaboration: Implications for technologies to support remote collaborative work. In P. Hinds & S. Kiesler (Eds.), Distributed work. Cambridge, MA: MIT Press.

Lanier, J. (2001). Virtually there. Scientific American, April 2001.

Leigh, J., DeFanti, T., Johnson, A., Brown, M., & Sandin, D. (1997). Global telemersion: Better than being there. Proceedings of ICAT ’97.

Loomis, J. M., Blascovich, J. J., & Beall, A. C. (1999). Immersive virtual environments as a basic research tool in psychology. Behavior Research Methods, Instruments, and Computers, 31(4), 557–564.


Mania, K., & Chalmers, A. (1998). Proceedings of the Fourth International Conference on Virtual Systems and Multimedia (pp. 177–182). Amsterdam: IOS Press-Ohmsha.

Milgram, S. (1992). The individual in a social world: Essays and experiments (2nd ed.). New York: McGraw-Hill.

Morgan, T., Kriz, R., Howard, T., Dias Neves, F., & Kelso, J. (2001). Extending the use of collaborative virtual environments for instruction to K–12 schools. Insight, 1(1).

Normand, V., Babski, C., Benford, S., Bullock, A., Carion, S., Chrysanthou, Y., et al. (1999). The COVEN project: Exploring applicative, technical, and usage dimensions of collaborative virtual environments. Presence: Teleoperators and Virtual Environments, 8(2), 218–236.

Patterson, M. L. (1982). An arousal model of interpersonal intimacy. Psychological Review, 89, 231–249.

Pylyshyn, Z. W. (1980). Computation and cognition: Issues in the foundations of cognitive science. Behavioral & Brain Sciences, 3, 111–169.

Reeves, B., & Nass, C. (1996). The media equation: How people treat computers, television, and new media like real people and places. New York: Cambridge University Press.

Rickel, J., & Johnson, W. L. (2000). Task-oriented collaboration with embodied agents in virtual worlds. In J. Cassell, J. Sullivan, S. Prevost, & E. Churchill (Eds.), Embodied conversational agents. Cambridge, MA: MIT Press.

Rutter, D. R. (1984). Looking and seeing: The role of visual communication in social interaction. Suffolk, UK: John Wiley & Sons.

Sannier, G., & Thalmann, M. N. (1998). A user friendly texture-fitting methodology for virtual humans. Computer Graphics International ’97.

Schwartz, P., Bricker, L., Campbell, B., Furness, T., Inkpen, K., Matheson, L., et al. (1998). Virtual playground: Architectures for a shared virtual world. Proceedings of the ACM Symposium on Virtual Reality Software and Technology 1998, 43–50.

Sherwood, J. V. (1987). Facilitative effects of gaze upon learning. Perceptual and Motor Skills, 64, 1275–1278.

Simons, H. (1976). Persuasion: Understanding, practice, and analysis. Reading, MA: Heath.

Slater, M., Pertaub, D., & Steed, A. (1999). Public speaking in virtual reality: Facing an audience of avatars. IEEE Computer Graphics and Applications, 19(2), 6–9.

Slater, A., Sadagic, M., Usoh, R., & Schroeder, R. (2000). Small group behavior in a virtual and real environment: A comparative study. Presence: Teleoperators and Virtual Environments, 9(1), 37–51.

Stiefelhagen, R., Yang, J., & Waibel, A. (1997). Tracking eyes and monitoring eye gaze. In M. Turk & Y. Takabayashi (Eds.), Proceedings of the Workshop on Perceptual User Interfaces.

Turing, A. (1950). Computing machinery and intelligence. Mind, 59(236).

Turk, M., & Kolsch, M. (in press). Perceptual interfaces. In G. Medioni & S. B. Kang (Eds.), Emerging topics in computer vision. Boston: Prentice Hall.

Velichkovsky, B. M. (1995). Communicating attention: Gaze position transfer in cooperative problem solving. Pragmatics and Cognition, 3(2), 199–222.

Vertegaal, R. (1999). The GAZE groupware system: Mediating joint attention in multiparty communication and collaboration. Proceedings of the CHI ’99 Conference on Human Factors in Computing Systems: The CHI is the Limit, 294–301.

Viola, P., & Jones, M. (2001). Rapid object detection using a boosted cascade of simple features. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.

Wallace, D. F. (1996). Infinite jest. Boston: Little, Brown.

Williams, K., Cheung, K. T., & Choi, W. (2000). Cyberostracisms: Effects of being ignored over the internet. Journal of Personality and Social Psychology, 79, 748–762.

Yee, N. (2002). Befriending ogres and wood elves: Understanding relationship formation in MMORPGs. Retrieved from http://www.nickyee.com/hub/relationships/home.html

Zajonc, R. B. (1971). Brainwash: Familiarity breeds comfort. Psychology Today, 3(9), 60–64.

Zajonc, R. B., Murphy, S. T., & Inglehart, M. (1989). Feeling and facial efference: Implication of the vascular theory of emotion. Psychological Review, 96, 395–416.

Zhang, X., & Furnas, G. (2002). Social interactions in multiscale CVEs. Proceedings of the ACM Conference on Collaborative Virtual Environments 2002 (CVE 2002).
